Popular security datasets used in research and workshops.
If you are unsure how to use any of the following datasets, please let me know. If I have time, I will look into it and provide a Python script or Jupyter notebook.
・ Malware
・ Dark_web
・ CANBus
・ Web
・ Game
・ KDD Cup 1999 Data[1]
・ Canadian Institute for Cybersecurity datasets[2]
・ User-Computer Authentication Associations in Time[3]
・ Unified Host and Network Dataset[4]
・ CTU-13 Dataset[5]
・ IoT devices captures[6]
・ Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2[14]
・ MQTT-IOT-IDS2020: MQTT INTERNET OF THINGS INTRUSION DETECTION DATASET[16]
・ Stratosphere IPS[8]
・ The Drebin Dataset[9]
・ UNSW-NB15 data set[10]
・ Microsoft Malware Classification Challenge
・ 2007 TREC Public Spam Corpus[13]
・ SPAM list
・ Malicious URLs Dataset
・ Malware Domain List[11]
・ Malicious and Benign Websites[12]
・ The t5 corpus[15]
・ SynCAN[17]
・ INTRUSION DETECTION IN CAN BUS[18]
・ IN-VEHICLE NETWORK INTRUSION DETECTION CHALLENGE[19]
・ Car-Hacking Dataset[20]
・ ReCAN Data[24]
・ Web-Hacking Dataset[21]
・ Game-Contagion[22]
・ Game Bot Detection[23]
・ Hardware Trojan Power & EM Side-Channel Dataset[25]
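
As a starting point for the KDD Cup 1999 Data listed above, here is a minimal sketch that counts class labels using only the Python standard library. It assumes comma-separated records whose last field is the label, possibly followed by a trailing period (for example `normal.`). The function name `count_labels` and the sample records are hypothetical and shortened for illustration; they are not taken from the dataset files.

```python
import csv
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count class labels in KDD Cup 1999 style CSV records.

    Assumption: the label is the last comma-separated field and may
    carry a trailing period, e.g. "normal.".
    """
    counts: Counter = Counter()
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        label = row[-1].strip().rstrip(".")
        counts[label] += 1
    return counts


# Shortened, made-up records for illustration only; real records
# carry many more feature columns before the label.
sample = [
    "0,tcp,http,SF,181,5450,normal.",
    "0,icmp,ecr_i,SF,1032,0,smurf.",
    "0,tcp,http,SF,239,486,normal.",
]
label_counts = count_labels(sample)
print(label_counts.most_common())
```

To run the same function on a downloaded file, pass an open file object (opened with `newline=""`) instead of the sample list. Checking the label distribution first is a cheap way to spot class imbalance before training an intrusion detection model.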
[1] Tavallaee, Mahbod, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani. "A detailed analysis of the KDD CUP 99 data set." In 2009 IEEE symposium on computational intelligence for security and defense applications, pp. 1-6. IEEE, 2009.
[2] Panigrahi, Ranjit, and Samarjeet Borah. "A detailed analysis of CICIDS2017 dataset for designing Intrusion Detection Systems." International Journal of Engineering & Technology 7, no. 3.24 (2018): 479-482.
[3] Kent, Alexander D. "User-computer authentication associations in time." Los Alamos National Laboratory (2014).
[4] Rubin-Delanchy, Patrick, and Melissa Turcotte, eds. Data Science for Cyber-security. Vol. 3. World Scientific, 2018.
[5] Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino. "An empirical comparison of botnet detection methods", Computers and Security Journal, Elsevier. 2014. Vol 45, pp 100-123.
[6] Miettinen, Markus, Samuel Marchal, Ibbad Hafeez, N. Asokan, Ahmad-Reza Sadeghi, and Sasu Tarkoma. "Iot sentinel: Automated device-type identification for security enforcement in iot." In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2177-2184. IEEE, 2017.
[7] Marco Ramilli, "Malware Training Sets: a machine learning dataset for everyone", https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/, 2016.
[8] Stratosphere. (2015). Stratosphere Laboratory Datasets. Retrieved March 13, 2020, from https://www.stratosphereips.org/datasets-overview
[9] Daniel Arp, Michael Spreitzenbarth, Malte Huebner, Hugo Gascon, and Konrad Rieck. "Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket", 21st Annual Network and Distributed System Security Symposium (NDSS), February 2014.
[10] Moustafa, Nour, et al. "An Ensemble Intrusion Detection Technique based on proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things." IEEE Internet of Things Journal (2018).
[11] List, Malware Domain. "Malware Domain List frequent asked questions." (2013).
[12] Urcuqui, C., Navarro, A., Osorio, J., & García, M. (2017). Machine Learning Classifiers to Detect Malicious Websites. CEUR Workshop Proceedings. Vol 1950, 14-17.
[13] 2007 TREC Public Spam Corpus, https://plg.uwaterloo.ca/~gvcormac/treccorpus07/
[14] Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2, https://data.4tu.nl/articles/dataset/Automotive_Controller_Area_Network_CAN_Bus_Intrusion_Dataset/12696950
[15] Roussev, V., An Evaluation of Forensic Similarity Hashes. In Proceedings of the Eleventh Annual DFRWS Conference, pp. S34-41, Aug 2011, New Orleans, LA.
[16] Hindy, Hanan, Ethan Bayne, Miroslav Bures, Robert Atkinson, Christos Tachtatzis, and Xavier Bellekens. "Machine Learning Based IoT Intrusion Detection System: An MQTT Case Study (MQTT-IoT-IDS2020 Dataset)." In International Networking Conference, pp. 73-84. Springer, Cham, 2020.
[17] Hanselmann, Markus, Thilo Strauss, Katharina Dormann, and Holger Ulmer. "CANet: An unsupervised intrusion detection system for high dimensional CAN bus data." IEEE Access 8 (2020): 58194-58205.
[18] Muhammad Sami, December 30, 2019, "Intrusion Detection in CAN bus", IEEE Dataport, doi: https://dx.doi.org/10.21227/24m9-a446.
[19] Mee Lan Han, Byung Il Kwak, and Huy Kang Kim. “Anomaly intrusion detection method for vehicular networks based on survival analysis.” Vehicular Communications 14 (2018): 52-63.
[20] Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.
[21] Han, M. L., Kwak, B. I., & Kim, H. K. (2019). CBR-Based Decision Support Methodology for Cybercrime Investigation: Focused on the Data-Driven Website Defacement Analysis. Security and Communication Networks, 2019.
[22] Woo, Jiyoung, et al. "Contagion of Cheating Behaviors in Online Social Networks." IEEE Access 6 (2018): 29098-29108.
[23] Kang, A. R., Jeong, S. H., Mohaisen, A., & Kim, H. K. (2016). Multimodal game bot detection using user behavioral characteristics. SpringerPlus, 5(1), 1-19.
[24] Zago, Mattia; Longari, Stefano; Tricarico, Andrea; Carminati, Michele; Gil Pérez, Manuel; Martinez Perez, Gregorio; Zanero, Stefano (2020), “ReCAN Data - Reverse engineering of Controller Area Networks”, Mendeley Data, V2, doi: 10.17632/76knkx3fzv.2
[25] Faezi, Sina, Rozhin Yasaei, Anomadarshi Barua, and Mohammad Abdullah Al Faruque. "Brain-inspired golden chip free hardware trojan detection." IEEE Transactions on Information Forensics and Security 16 (2021): 2697-2708.