SRDC: Semantics-based Ransomware Detection and Classification with LLM-assisted Pre-training

Ce Zhou, Yilun Liu, Weibin Meng, Shimin Tao, Weinan Tian, Feiyu Yao, Xiaochun Li, Tao Han, Boxing Chen, Hao Yang

News💡

  • [2024.12] SRDC is accepted by AAAI 2025 AISI track! 🎉🎉🎉

Usage 🛠

🚀 Feature_Internal_Semantic_Processing

1. Download the open-source dataset from the RISS group (Link); it is used for fine-tuning and testing the models.

2. Run the feature internal semantic processing script:

python Feature_Internal_Semantic_Processing/Internal_Semantic_Processing.py 

Set the paths to the required files:

  • --RansomwareData_csv_path: path to RansomwareData.csv
  • --VariableNames_txt_path: path to VariableNames.txt

For example, if RansomwareData.csv is at data/path_A.csv and VariableNames.txt is at data/path_B.csv:

python Feature_Internal_Semantic_Processing/Internal_Semantic_Processing.py --RansomwareData_csv_path data/path_A.csv --VariableNames_txt_path data/path_B.csv

🚀 ZeroDay Ransomware Detection

Our model after LDTAP pre-training is available on Hugging Face (Link), and our pre-training corpus is available at Link.

python ZeroDay_Ransomware_Detection/ransomware_0_day_detection.py  

Set the paths to the required files:

  • --Data_Train_path: training data after internal feature semantic processing
  • --Data_Test_path: test data after internal feature semantic processing

Run the internal feature semantic processing step first, then split the processed data into a training set (for fine-tuning) and a test set (for testing).
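The split described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the processed output is a CSV file with a header row, and the helper names `split_rows` and `split_csv` are hypothetical, not part of this repository. It uses a plain random split as a placeholder; the README does not specify how the split was made, and a zero-day evaluation would typically hold out whole ransomware families instead.

```python
import csv
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_csv(processed_path, train_path, test_path, test_fraction=0.2, seed=0):
    """Write train/test CSV files from one processed CSV, repeating the header."""
    # Assumption: the first line of the processed file is a header row.
    with open(processed_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    train, test = split_rows(rows, test_fraction, seed)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(train), len(test)
```

The two files written by `split_csv` would then be passed to --Data_Train_path and --Data_Test_path.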

🚀 Ransomware Family Classification

python Ransomware_Family_Classification/ransomware_family_classifier.py 

Set the path to the required file:

  • --Data_csv_path: data after internal feature semantic processing

Run the internal feature semantic processing step first.

LATAP Training Data Generation

The corpus was generated using the GPT-3.5 Turbo API with custom-designed prompts (Link). The corpus composition at the time of generation was as follows:

| Corpus Source | Manually Collected | Generated by gpt-3.5-turbo |
|---|---|---|
| Windows System APIs | 230 | 1903 |
| Windows System Registry | 1005 | 7380 |
| Introduction to Ransomware | 59 | 0 |

Additionally, all corpora were reviewed by our security experts.
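As a quick check on the corpus composition, the totals and the share of generated material can be worked out from the counts listed above. The snippet only restates that arithmetic; the README does not say what unit the counts are in, so they are treated here as plain item counts.

```python
# Counts copied from the corpus table: (manually collected, generated by gpt-3.5-turbo)
corpus = {
    "Windows System APIs": (230, 1903),
    "Windows System Registry": (1005, 7380),
    "Introduction to Ransomware": (59, 0),
}

manual_total = sum(m for m, _ in corpus.values())
generated_total = sum(g for _, g in corpus.values())
total = manual_total + generated_total
generated_share = generated_total / total

print(manual_total, generated_total, total)  # 1294 9283 10577
print(f"{generated_share:.1%} generated")    # 87.8% generated
```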

More Recent Models

We also tested more recent models, including gpt-4-turbo and claude-3.5-sonnet. The experimental results are as follows:

zero-day ransomware detection:

| Method | Accuracy | Recall | F1-score |
|---|---|---|---|
| gpt-4-turbo | 0.4950 | 0.1000 | 0.1653 |
| claude-3.5-sonnet | 0.4700 | 0.6600 | 0.5546 |
| ours (Seen two classes) | 0.886 | 0.916 | 0.913 |

ransomware family classification:

| Method | Balanced Accuracy |
|---|---|
| gpt-4-turbo | 0.1075 |
| claude-3.5-sonnet | 0.1045 |
| ours | 0.5483 |

The results indicate that both gpt-4-turbo and claude-3.5-sonnet have a limited understanding of the specific dynamic features. They fall back on general knowledge and reasoning to solve the task, which inevitably leads to hallucinations.
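For readers interpreting the metrics reported above, here is a minimal sketch of how they can be computed from predictions. It assumes the standard definitions (binary recall and F1 with ransomware as the positive class, and balanced accuracy as the mean of per-class recall); the README does not state which exact variants were used, and the function names are illustrative only.

```python
from collections import defaultdict


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and F1 for a binary task; `positive` marks ransomware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every family counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Under this definition, a classifier that always predicts the same one of K families scores a balanced accuracy of 1/K, which is a useful baseline when reading the family classification numbers.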

For implementation and deployment, the key consideration is the processing time per sample, that is, whether the system can respond quickly to features collected dynamically online. This is crucial in security products. We report our measurements below and continue to explore this area:

zero-day ransomware detection:

| Method | Speed (seconds per sample) |
|---|---|
| gpt-4-turbo | 0.786 |
| claude-3.5-sonnet | 5.96 |
| ours (Seen two classes) | 0.0866 |

ransomware family classification:

| Method | Speed (seconds per sample) |
|---|---|
| gpt-4-turbo | 0.808 |
| claude-3.5-sonnet | 6.07 |
| ours | 0.0836 |

These results indicate the model’s practical viability in terms of speed and response time, allowing for rapid threat response based on provided features.
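A seconds-per-sample figure like the ones above can be measured with a small timing loop. The sketch below is an assumption about how such a measurement might be taken, not the authors' benchmarking code: `predict` stands for any callable that classifies one sample, and `seconds_per_sample` is a hypothetical helper.

```python
import time


def seconds_per_sample(predict, samples, warmup=1):
    """Average wall-clock seconds that `predict` takes on one sample."""
    if not samples:
        raise ValueError("samples must be non-empty")
    for sample in samples[:warmup]:
        predict(sample)  # warm-up calls are not timed
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples)
```

For API-based models, a measurement of this kind also includes network latency, so results depend on the environment in which it is run.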

For large-scale deployments with many concurrent requests, the pre-trained model can be replicated across multiple clusters after a single training session.

🙋‍♂️ If you have any questions about our project, please email us at [email protected].
