Skip to content

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

License

Notifications You must be signed in to change notification settings

najam1997/HackSynth

 
 

Repository files navigation

HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

The paper can be found on arXiv.

Introduction

HackSynth Logo

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.


Using the repository

  • You will have to create a Hugging Face and a Neptune.ai account
  • Copy your API keys to the .env file, and set the desired CUDA devices, based on the .env_example
  • Set up the PicoCTF benchmark
  • Set up the OverTheWire benchmark
  • Start the HackSynth Agent
    • Install the environment:
      python -m venv cyber_venv
      source cyber_venv/bin/activate
      pip install -r requirements.txt
      
    • Start the benchmark with the following:
      python run_bench.py -b benchmark.json -c config.json
      
      The benchmark.json should be one of the generated benchmark_solved.json files, or an equivalently structured file. The configuration files used by us for the measurements in the paper are also available in the configs folder.

How to Cite

If you use this code in your work or research, please cite the corresponding paper:

@misc{muzsai2024hacksynthllmagentevaluation,
      title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing}, 
      author={Lajos Muzsai and David Imolai and András Lukács},
      year={2024},
      eprint={2412.01778},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2412.01778}, 
}

Contributors

License

The project uses the GNU AGPLv3 license.

About

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.2%
  • Other 0.8%