
A comprehensive list of amazingly awesome cybersecurity datasets


Arslan1979/security_datasets

This branch is up to date with karapto/security_datasets:master.

Latest commit: 975e40e · Jan 4, 2024 (History: 61 commits)

Security Datasets

Popular datasets on security used in research and workshops.

If you are unsure how to use any of the following datasets, please let me know. If I have time, I will look into it and provide a Python script or Jupyter notebook.
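As a starting point for that kind of script, here is a minimal sketch for the KDD Cup 1999 Data listed under Intrusion_Detection. It assumes the usual layout of that dataset: comma-separated records with no header row, where the last field is the label followed by a trailing dot (for example `normal.`). The function name `count_labels` and the shortened sample records are hypothetical and only for illustration; real records carry 41 feature fields before the label.

```python
import csv
import io
from collections import Counter


def count_labels(lines):
    """Count class labels in KDD Cup 1999 style records.

    `lines` is any iterable of text lines (an open file, a StringIO, a list).
    Assumption: the label is the last comma-separated field and ends with ".".
    """
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        label = row[-1].strip().rstrip(".")  # "normal." -> "normal"
        counts[label] += 1
    return counts


# Shortened, made-up records for illustration only.
# Real KDD Cup 1999 rows have 41 feature fields before the label.
SAMPLE = (
    "0,tcp,http,SF,181,5450,normal.\n"
    "0,icmp,ecr_i,SF,1032,0,smurf.\n"
    "0,tcp,http,SF,239,486,normal.\n"
)

label_counts = count_labels(io.StringIO(SAMPLE))
print(label_counts.most_common())
```

To run it on a downloaded copy, pass an open file instead of the inline sample, for example `open(path, newline="")`, where `path` points to wherever you saved the uncompressed data file.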


List

Intrusion Detection

Malware

Spam Mail

Malicious Domains

Dark_web

Image_dataset

CANBus

Web

Game

Hardware_Trojan


Datasets

Intrusion_Detection

KDD Cup 1999 Data [1]

Canadian Institute for Cybersecurity datasets[2]

User-Computer Authentication Associations in Time[3]

Unified Host and Network Dataset[4]

CTU-13 Dataset[5]

IoT devices captures[6]

Splunk Security Dataset

Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2[14]

MQTT-IOT-IDS2020: MQTT INTERNET OF THINGS INTRUSION DETECTION DATASET[16]

Malware

Malware Training Sets[7]

Stratosphere IPS[8]

The Drebin Dataset[9]

UNSW-NB15 data set[10]

Microsoft Malware Classification Challenge

Spam_Mail

2007 TREC Public Spam Corpus[13]

SPAM list

Malicious_Domains

Malicious URLs Dataset

Malware Domain List[11]

Malicious and Benign Websites[12]

Detecting Malicious URLs

Feodo Tracker

OpenDNS Top Domains List

The Majestic Million

StopForumSpam

Dark_web

CIC-Darknet2020

Dark Net Marketplace Data

Image_dataset

The t5 corpus[15]

CANBus_Network

SynCAN[17]

INTRUSION DETECTION IN CAN BUS[18]

IN-VEHICLE NETWORK INTRUSION DETECTION CHALLENGE[19]

Car-Hacking Dataset[20]

OBDdatasets

ReCAN Data[24]

Web

Web-Hacking Dataset[21]

Game

Game-Contagion[22]

Game Bot Detection[23]

Hardware_Trojan

Hardware-Security

Hardware Trojan Power & EM Side-Channel Dataset[25]

CAS lab HT AI Data Table[26]


Citation and Reference

[1] Tavallaee, Mahbod, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani. "A detailed analysis of the KDD CUP 99 data set." In 2009 IEEE symposium on computational intelligence for security and defense applications, pp. 1-6. IEEE, 2009.

[2] Panigrahi, Ranjit, and Samarjeet Borah. "A detailed analysis of CICIDS2017 dataset for designing Intrusion Detection Systems." International Journal of Engineering & Technology 7, no. 3.24 (2018): 479-482.

[3] Kent, Alexander D. "User-computer authentication associations in time." Los Alamos National Laboratory (2014).

[4] Rubin-Delanchy, Patrick, and Melissa Turcotte, eds. Data Science for Cyber-Security. Vol. 3. World Scientific, 2018.

[5] Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino. "An empirical comparison of botnet detection methods", Computers and Security Journal, Elsevier. 2014. Vol 45, pp 100-123.

[6] Miettinen, Markus, Samuel Marchal, Ibbad Hafeez, N. Asokan, Ahmad-Reza Sadeghi, and Sasu Tarkoma. "IoT Sentinel: Automated device-type identification for security enforcement in IoT." In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2177-2184. IEEE, 2017.

[7] Marco Ramilli, "Malware Training Sets: a machine learning dataset for everyone", https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/, 2016.

[8] Stratosphere. (2015). Stratosphere Laboratory Datasets. Retrieved March 13, 2020, from https://www.stratosphereips.org/datasets-overview

[9] Daniel Arp, Michael Spreitzenbarth, Malte Huebner, Hugo Gascon, and Konrad Rieck. "Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket", 21st Annual Network and Distributed System Security Symposium (NDSS), February 2014.

[10] Moustafa, Nour, et al. "An Ensemble Intrusion Detection Technique based on proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things." IEEE Internet of Things Journal (2018).

[11] List, Malware Domain. "Malware Domain List frequent asked questions." (2013).

[12] Urcuqui, C., Navarro, A., Osorio, J., & García, M. (2017). Machine Learning Classifiers to Detect Malicious Websites. CEUR Workshop Proceedings. Vol 1950, 14-17.

[13] 2007 TREC Public Spam Corpus, https://plg.uwaterloo.ca/~gvcormac/treccorpus07/

[14] Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2, https://data.4tu.nl/articles/dataset/Automotive_Controller_Area_Network_CAN_Bus_Intrusion_Dataset/12696950

[15] Roussev, V., An Evaluation of Forensic Similarity Hashes. In Proceedings of the Eleventh Annual DFRWS Conference, pp. S34-41, Aug 2011, New Orleans, LA.

[16] Hindy, Hanan, Ethan Bayne, Miroslav Bures, Robert Atkinson, Christos Tachtatzis, and Xavier Bellekens. "Machine Learning Based IoT Intrusion Detection System: An MQTT Case Study (MQTT-IoT-IDS2020 Dataset)." In International Networking Conference, pp. 73-84. Springer, Cham, 2020.

[17] Hanselmann, Markus, Thilo Strauss, Katharina Dormann, and Holger Ulmer. "CANet: An unsupervised intrusion detection system for high dimensional CAN bus data." IEEE Access 8 (2020): 58194-58205.

[18] Muhammad Sami, December 30, 2019, "Intrusion Detection in CAN bus", IEEE Dataport, doi: https://dx.doi.org/10.21227/24m9-a446.

[19] Mee Lan Han, Byung Il Kwak, and Huy Kang Kim. "Anomaly intrusion detection method for vehicular networks based on survival analysis." Vehicular Communications 14 (2018): 52-63.

[20] Song, Hyun Min, Jiyoung Woo, and Huy Kang Kim. "In-vehicle network intrusion detection using deep convolutional neural network." Vehicular Communications 21 (2020): 100198.

[21] Han, M. L., Kwak, B. I., & Kim, H. K. (2019). CBR-Based Decision Support Methodology for Cybercrime Investigation: Focused on the Data-Driven Website Defacement Analysis. Security and Communication Networks, 2019.

[22] Woo, Jiyoung, et al. "Contagion of Cheating Behaviors in Online Social Networks." IEEE Access 6 (2018): 29098-29108.

[23] Kang, A. R., Jeong, S. H., Mohaisen, A., & Kim, H. K. (2016). Multimodal game bot detection using user behavioral characteristics. SpringerPlus, 5(1), 1-19.

[24] Zago, Mattia; Longari, Stefano; Tricarico, Andrea; Carminati, Michele; Gil Pérez, Manuel; Martinez Perez, Gregorio; Zanero, Stefano (2020), “ReCAN Data - Reverse engineering of Controller Area Networks”, Mendeley Data, V2, doi: 10.17632/76knkx3fzv.2

[25] Faezi, Sina, Rozhin Yasaei, Anomadarshi Barua, and Mohammad Abdullah Al Faruque. "Brain-inspired golden chip free hardware trojan detection." IEEE Transactions on Information Forensics and Security 16 (2021): 2697-2708.

[26]
