🚀 CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models 🛡️
The largest and most comprehensive Generative AI-based CyberSecurity-focused Dataset for Benchmarking Large Language Models
The CySecBench paper offers:
- 🎯 A dataset of 12,662 prompts tailored to cybersecurity challenges.
- 🧠 Novel jailbreaking methods leveraging prompt obfuscation and refinement.
- 📊 A performance evaluation of LLMs such as ChatGPT, Claude, and Gemini.
Why CySecBench?
Existing prompt datasets for benchmarking LLMs are broad and rarely focus on cybersecurity. CySecBench fills this gap with domain-specific prompts organized into 10 categories, enabling a more targeted evaluation of LLM security mechanisms.
You can download the full research paper here: CySecBench (PDF)
- 📁 Repository structure (the 10 prompt categories are stored as separate CSV files under Dataset/Category sets/):
/
├── Code/
│ ├── dataset_generation.py
│ ├── keywords.txt
├── Dataset/
│ ├── Category sets/
│ │ ├── cysecbench-cloud-attacks.csv
│ │ ├── cysecbench-control-system-attacks.csv
│ │ ├── cysecbench-cryptographic-attacks.csv
│ │ ├── cysecbench-evasion-techniques.csv
│ │ ├── cysecbench-hardware-attacks.csv
│ │ ├── cysecbench-intrusion-techniques.csv
│ │ ├── cysecbench-iot-attacks.csv
│ │ ├── cysecbench-malware-attacks.csv
│ │ ├── cysecbench-network-attacks.csv
│ │ ├── cysecbench-web-application-attacks.csv
│ ├── Full dataset/
│ │ ├── cysecbench.csv
│ ├── Sample sets/
│ │ ├── cysecbench-500.csv
│ │ ├── cysecbench-2000.csv
│ │ ├── cysecbench-6000.csv
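As a starting point for working with the category files, the sketch below counts prompts per category using only the Python standard library. It assumes each CSV has a single header row and takes the category name from the file name (cysecbench-<category>.csv); the column layout is not inspected, and the function name count_prompts_per_category is hypothetical rather than part of the repository.

```python
import csv
from pathlib import Path
from typing import Dict

PREFIX = "cysecbench-"


def count_prompts_per_category(category_dir: str) -> Dict[str, int]:
    """Count data rows in each cysecbench-<category>.csv file in category_dir.

    Assumption: every CSV starts with a single header row. Column names
    are not inspected, so this does not depend on the actual schema.
    """
    counts: Dict[str, int] = {}
    for path in sorted(Path(category_dir).glob(PREFIX + "*.csv")):
        # "cysecbench-cloud-attacks.csv" -> "cloud-attacks"
        category = path.stem[len(PREFIX):]
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the assumed header row
            # csv.reader handles quoted fields containing commas or newlines;
            # completely empty rows are not counted.
            counts[category] = sum(1 for row in reader if row)
    return counts


if __name__ == "__main__":
    # Path taken from the repository layout above; adjust if your checkout differs.
    for name, n in count_prompts_per_category("Dataset/Category sets").items():
        print(f"{name}: {n}")
```

Counting rows through csv.reader rather than counting lines keeps the result correct if a prompt contains a quoted line break.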
Requirements
- 🐍 Python 3.8+
- 📦 Required libraries: openai (only for dataset generation)
Evaluation metrics
- ✅ Success Rate (SR): Percentage of prompts that bypass the LLM's ethical guidelines.
- 📈 Average Rating (AR): Mean harmfulness of LLM responses, rated on a scale of 1 to 5 (5 = most harmful).
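The two metrics can be aggregated from per-response judgments roughly as follows. This is a minimal sketch, assuming each response has already been judged as bypassing the guidelines or not and has been given a 1 to 5 harmfulness rating, and assuming AR is averaged over all rated responses. The paper's judging procedure is not reproduced here, and the function names are hypothetical.

```python
from typing import Sequence


def success_rate(bypassed: Sequence[bool]) -> float:
    """SR: percentage of prompts whose response bypassed the ethical guidelines.

    Assumption: each entry is a per-prompt judgment made elsewhere;
    this function only aggregates those judgments.
    """
    if len(bypassed) == 0:
        raise ValueError("no judged responses")
    return 100.0 * sum(bypassed) / len(bypassed)


def average_rating(ratings: Sequence[int]) -> float:
    """AR: mean harmfulness rating on the 1-5 scale (5 = most harmful).

    Assumption: the mean is taken over all rated responses.
    """
    if len(ratings) == 0:
        raise ValueError("no ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    judged = [True, False, True, True]
    scores = [5, 1, 4, 4]
    print(f"SR = {success_rate(judged):.1f}%")   # SR = 75.0%
    print(f"AR = {average_rating(scores):.2f}")  # AR = 3.50
```

Keeping the judging step separate from aggregation means the same two functions can be reused regardless of whether responses are scored by humans or by an automated judge.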
Results reported in the paper for each LLM:

| LLM | Success Rate (SR) | Average Rating (AR) |
|---|---|---|
| 🤖 Claude | 17.4% | 2.00 |
| 🤖 ChatGPT | 65.4% | 4.06 |
| 🤖 Gemini | 88.4% | 4.77 |
If you use CySecBench, please cite:
@article{CySecBench2024,
title = {{CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models}},
author = {Johan Wahréus and Ahmed Mohamed Hussain and Panos Papadimitratos},
year = {2025},
journal = {arXiv preprint arXiv:2501.01335},
url = {https://arxiv.org/abs/2501.01335}
}
If you found CySecBench helpful or interesting, please give this repository a star ⭐ to show your support!
This project is licensed under the MIT License. See the LICENSE file for details.