SRDC: Semantics-based Ransomware Detection and Classification with LLM-assisted Pre-training

Ce Zhou, Yilun Liu, Weibin Meng, Shimin Tao, Weinan Tian, Feiyu Yao, Xiaochun Li, Tao Han, Boxing Chen, Hao Yang

News💡

[2024.12] SRDC is accepted by AAAI 2025 AISI track! 🎉🎉🎉

Usage 🛠

🚀 Feature_Internal_Semantic_Processing

1. Download the open-source datasets from the RISS group (Link). They are used for fine-tuning and testing the models.

2. Feature Internal Semantic Processing

python Feature_Internal_Semantic_Processing/Internal_Semantic_Processing.py

Set the paths to the required files:
--RansomwareData_csv_path: path to RansomwareData.csv
--VariableNames_txt_path: path to VariableNames.txt

Example usage, with RansomwareData_csv_path set to data/path_A.csv and VariableNames_txt_path set to data/path_B.csv:

python Feature_Internal_Semantic_Processing/Internal_Semantic_Processing.py --RansomwareData_csv_path data/path_A.csv --VariableNames_txt_path data/path_B.csv
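When the preprocessing step is driven from another Python program, the command above can be assembled and launched programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_preprocessing` are hypothetical, and only the script path and the two flags come from the usage shown above.

```python
import subprocess
import sys
from pathlib import Path

# Script path and flag names are taken from the usage above.
SCRIPT = "Feature_Internal_Semantic_Processing/Internal_Semantic_Processing.py"


def build_command(csv_path, names_path):
    """Return the argv list for the preprocessing script (hypothetical helper)."""
    return [
        sys.executable,
        SCRIPT,
        "--RansomwareData_csv_path", str(csv_path),
        "--VariableNames_txt_path", str(names_path),
    ]


def run_preprocessing(csv_path, names_path):
    """Fail early if an input file is missing, then run the script."""
    for path in (csv_path, names_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(csv_path, names_path), check=True)
```

Calling `run_preprocessing("data/path_A.csv", "data/path_B.csv")` from the repository root would be equivalent to the command line above.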

🚀 ZeroDay Ransomware Detection

Our model after LATAP pre-training is available on Hugging Face (Link), and our pre-training corpus is available at Link.

python ZeroDay_Ransomware_Detection/ransomware_0_day_detection.py

Set the paths to the required files:
--Data_Train_path: training data after internal feature semantic processing
--Data_Test_path: test data after internal feature semantic processing

Run the internal feature semantic processing step first, then split the processed data into a training set (for fine-tuning) and a test set (for testing).
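The README does not state how the training and test sets are divided. One plausible way to simulate a zero-day setting is to keep some ransomware families out of the training set entirely. The sketch below illustrates that idea under stated assumptions: the function `zero_day_split`, the `"benign"` label, the family names, and the 50/50 division of benign samples are all hypothetical and are not taken from the repository.

```python
import random


def zero_day_split(rows, family_of, seen_families,
                   benign_label="benign", benign_train_fraction=0.5, seed=0):
    """Split rows so that unseen ransomware families appear only in the test set.

    rows          : list of samples (any type)
    family_of     : function mapping a sample to its family label
    seen_families : set of ransomware families allowed in training
    """
    benign = [r for r in rows if family_of(r) == benign_label]
    ransomware = [r for r in rows if family_of(r) != benign_label]

    # Benign samples are shuffled reproducibly and divided between both sets.
    rng = random.Random(seed)
    rng.shuffle(benign)
    cut = int(len(benign) * benign_train_fraction)

    train = benign[:cut] + [r for r in ransomware if family_of(r) in seen_families]
    test = benign[cut:] + [r for r in ransomware if family_of(r) not in seen_families]
    return train, test


# Toy data with made-up family names.
rows = [("s1", "benign"), ("s2", "benign"), ("s3", "benign"), ("s4", "benign"),
        ("s5", "FamA"), ("s6", "FamB"), ("s7", "FamC")]
train, test = zero_day_split(rows, lambda r: r[1], {"FamA", "FamB"})
```

The two resulting lists would then be written to the files passed as --Data_Train_path and --Data_Test_path.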

🚀 Ransomware Family Classification

python Ransomware_Family_Classification/ransomware_family_classifier.py

Set the path to the required file:
--Data_csv_path: data after internal feature semantic processing

Run the internal feature semantic processing step first.

LATAP Training Data Generation

The corpus was generated with the GPT-3.5 Turbo API using custom-designed prompts (Link). The composition of the corpus at that time is detailed below:

Corpus Source	Manually Collected	Generated by gpt-3.5-turbo
Windows System APIs	230	1903
Windows System Registry	1005	7380
Introduction to Ransomware	59	0

Additionally, all corpora were reviewed by our security experts.
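As a quick check of the proportions in the table above, the counts can be totalled as follows. This is only arithmetic on the reported numbers; the variable names are illustrative.

```python
# (manually collected, generated by gpt-3.5-turbo), copied from the table above.
corpus = {
    "Windows System APIs": (230, 1903),
    "Windows System Registry": (1005, 7380),
    "Introduction to Ransomware": (59, 0),
}

manual_total = sum(manual for manual, _ in corpus.values())
generated_total = sum(generated for _, generated in corpus.values())
total = manual_total + generated_total

print(manual_total, generated_total, total)   # 1294 9283 10577
print(f"{generated_total / total:.1%}")       # 87.8%
```

Roughly 88% of the corpus counts come from generated text, with the remainder collected manually.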

More Recent Models

We also tested more recent models, including gpt-4-turbo and claude-3.5-sonnet. The experimental results are as follows:

zero-day ransomware detection:

Method	Accuracy	Recall	F1-score
gpt-4-turbo	0.4950	0.1000	0.1653
claude-3.5-sonnet	0.4700	0.6600	0.5546
ours (Seen two classes)	0.886	0.916	0.913
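For readers who want to see how the three columns relate, the sketch below computes accuracy, recall, and F1-score from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the source; they are chosen only because they reproduce the gpt-4-turbo row if the test set contained 100 ransomware and 100 benign samples, which is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, f1) for the positive (ransomware) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 written directly in terms of counts: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, recall, f1


# Hypothetical counts (assumed 100 ransomware + 100 benign test samples).
acc, rec, f1 = binary_metrics(tp=10, fp=11, fn=90, tn=89)
print(f"{acc:.4f} {rec:.4f} {f1:.4f}")  # 0.4950 0.1000 0.1653
```

With counts like these, an accuracy near 0.5 together with a recall of 0.1 means the model labels most samples as benign, missing most ransomware.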

ransomware family classification:

Method	Balanced Accuracy
gpt-4-turbo	0.1075
claude-3.5-sonnet	0.1045
ours	0.5483
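Balanced accuracy is the mean of the per-class recalls, so a family with few samples counts as much as a family with many. A minimal sketch of the metric, using made-up labels rather than the project's data:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    support = defaultdict(int)   # samples per true class
    correct = defaultdict(int)   # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / support[c] for c in support) / len(support)


# Made-up example: class A recall is 3/4, class B recall is 1/2.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
print(balanced_accuracy(y_true, y_pred))  # 0.625
```

In this toy example plain accuracy would be 4/6, while balanced accuracy is 0.625 because the smaller class B is weighted equally.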

The results indicate that both gpt-4-turbo and claude-3.5-sonnet have a limited understanding of the specific dynamic features. They fall back on general knowledge and reasoning to solve the task, which leads to hallucinations.

For implementation and deployment, the main consideration is the processing time per sample, that is, whether the system can respond quickly to dynamically collected online features. This is crucial in security products. We report the measured per-sample processing times here and continue to explore this area:

zero-day ransomware detection:

Method	Time (seconds per sample)
gpt-4-turbo	0.786
claude-3.5-sonnet	5.96
ours (Seen two classes)	0.0866

ransomware family classification:

Method	Time (seconds per sample)
gpt-4-turbo	0.808
claude-3.5-sonnet	6.07
ours	0.0836

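These results indicate that the model is practically viable in terms of speed and response time. At under 0.09 seconds per sample, it is roughly 9 to 10 times faster than gpt-4-turbo and roughly 70 times faster than claude-3.5-sonnet, allowing rapid threat response based on the provided features.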
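A per-sample time like the ones reported above can be measured with a simple wall-clock loop. The sketch below is a generic illustration, not the measurement code used for the tables; `predict` stands in for any model call and is a hypothetical placeholder.

```python
import time


def seconds_per_sample(predict, samples):
    """Average wall-clock time of predict(sample) over all samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples)


# Placeholder "model": any callable that takes one sample.
elapsed = seconds_per_sample(lambda text: len(text), ["feature text"] * 100)
print(f"{elapsed:.6f} seconds per sample")
```

For API-based models the measured time also includes network latency, which should be kept in mind when comparing against a locally hosted model.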

For large-scale deployments with many concurrent requests, the pre-trained model can be deployed across multiple clusters; a single training run supports multi-cluster deployment.
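A rough capacity estimate follows from the per-sample time. The sketch below assumes that each model replica handles one sample at a time at the reported latency and ignores batching, queueing, and network overhead; the function name and the request rate are hypothetical.

```python
import math


def replicas_needed(requests_per_second, seconds_per_sample):
    """Lower bound on replicas so total service time fits in each second."""
    if requests_per_second < 0 or seconds_per_sample <= 0:
        raise ValueError("rate must be >= 0 and latency must be > 0")
    return max(1, math.ceil(requests_per_second * seconds_per_sample))


# 100 requests per second at the reported 0.0866 s per sample.
print(replicas_needed(100, 0.0866))  # 9
```

In practice some headroom above this lower bound would be needed to absorb bursts.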

🙋‍♂️ If you have any questions about our project, please send an email to [email protected].
