This repository contains the code and data for Vulnerability Detection with Fine-grained Interpretations.
Despite the successes of machine learning-based vulnerability detectors (VD), they are limited to providing only the decision on whether a given piece of code is vulnerable or not, without details on what part of the code is relevant to the detected vulnerability. We present IVDetect, an interpretable vulnerability detector with the philosophy of using Artificial Intelligence (AI) to detect vulnerabilities, while using Intelligence Assistant (IA) by providing VD interpretations at the fine-grained level in terms of vulnerable statements. For vulnerability detection, we separately consider the vulnerable statements and their surrounding contexts via data and control dependencies. This allows our model to discriminate vulnerable statements better than using the mixture of vulnerable code and contextual code as in existing approaches. In addition to the coarse-grained vulnerability detection result, we leverage interpretable ML to provide users with fine-grained interpretations that include the sub-graph in the PDG with the crucial statements that are relevant to the detected vulnerability. Our empirical evaluation on vulnerability databases shows that IVDetect outperforms the existing ML-based approaches by 64–122% and 105–255% in top-10 nDCG and MAP ranking scores, respectively. IVDetect correctly points out the vulnerable statements relevant to the vulnerability via its interpretations in 67% of the cases with a top-5 ranked list. It improves over the ATT and GRAD interpretation models by 12.3–400% and 9–400% in accuracy, respectively.
The datasets we used in the paper:
Fan et al.[1]: https://drive.google.com/file/d/1-0VhnHBp9IGh90s2wCNjeCMuy70HPl8X/view?usp=sharing
Reveal [2]: https://drive.google.com/drive/folders/1KuIYgFcvWUXheDhT--cBALsfy1I4utOy
FFMPeg+Qemu [3]: https://drive.google.com/file/d/1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF
In this study, we use Joern to generate ASTs and graphs. However, Joern is updated frequently, and some of its functionality has changed. If you want to use the scripts that we used to generate the graphs, please use:
git checkout cbca30d2631a48aed47be1ba46c6d8b5aa23c103
to roll back Joern to the old version that we previously used. The scripts for generating the graphs can be found in:
If you are using newer versions of Joern or you have any detailed questions about Joern, please go to Joern's website: https://github.com/joernio/joern for more details on AST and graph generation.
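Before generating graphs, it can help to confirm that the local Joern checkout really is at the pinned commit. The following is a minimal sketch, not part of this repository: it assumes Joern was cloned with git, and the path passed to current_commit is a placeholder you would replace with your own checkout location.

```python
import subprocess

# Commit hash taken from the "git checkout" command above.
PINNED_COMMIT = "cbca30d2631a48aed47be1ba46c6d8b5aa23c103"


def current_commit(repo_dir):
    """Return the commit hash that HEAD points to in the given git checkout."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_pinned(commit, pinned=PINNED_COMMIT):
    """Check whether a commit hash matches the pinned Joern version."""
    return commit.strip().lower() == pinned.lower()
```

For example, is_pinned(current_commit("path/to/joern")) should return True after the checkout, where path/to/joern is a hypothetical location.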
We provide an example CSV dataset to show what the generated dataset looks like: https://drive.google.com/file/d/1LHOC4JDpnQ7gWnEHGfc4soQYHAPomlNp/view?usp=sharing See utils/process.py for more details about how to use the generated dataset.
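A quick way to get a feel for the example CSV before reading utils/process.py is to list its columns and peek at a few rows. The sketch below uses only the Python standard library and is not the repository's own loader. The column names in the stand-in data (id, code, label) are hypothetical and are not taken from the real file.

```python
import csv
import io


def summarize_csv(handle, preview_rows=3):
    """Return (column_names, row_count, first_rows) for a CSV file-like object."""
    reader = csv.DictReader(handle)
    preview = []
    count = 0
    for row in reader:
        if count < preview_rows:
            preview.append(row)
        count += 1
    return reader.fieldnames, count, preview


if __name__ == "__main__":
    # Stand-in data with hypothetical columns. For the real file, pass
    # open("<downloaded file>.csv", newline="") instead of this StringIO.
    sample = io.StringIO("id,code,label\n0,int f(){},0\n1,void g(){},1\n")
    columns, n_rows, preview = summarize_csv(sample)
    print(columns, n_rows)
    for row in preview:
        print(row)
```

If the real file contains very long source-code fields, the csv module's field size limit may need to be raised with csv.field_size_limit.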
Based on our findings, the ASTs and graphs generated by different versions of Joern may differ significantly. If you use a newer version of Joern to generate the ASTs and graphs, the model's performance may differ from the results we reported in the paper.
After you generate the ASTs and graphs and store them in the same format as the example data, you can use the preprocessing code we provide in utils/process.py to preprocess the data and generate the features that are used in our model.
Alternatively, you can go directly to the Code section. gen_graphs.py shows how to use the preprocessing code in utils/process.py to generate the features for the model.
Please check all requirements in requirement.txt.
Our approach can use NNI (Auto-ML) to tune the parameters. To do so, uncomment all lines containing nni in main.py and comment out line 195 in main.py. Then run nnictl create --config config.yml to automatically tune the model parameters.
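For readers unfamiliar with NNI, the pattern that a training script typically follows is: ask NNI for the next set of hyperparameters, train with them, and report the final metric back. The sketch below illustrates that pattern only. It is not the code in main.py, and the parameter names (lr, hidden_size) and the train_and_evaluate placeholder are hypothetical. The import is guarded so the script also runs without NNI installed.

```python
try:
    import nni  # only required when the script is launched through nnictl
except ImportError:
    nni = None

# Hypothetical defaults; the real parameters live in main.py and config.yml.
DEFAULT_PARAMS = {"lr": 1e-4, "hidden_size": 128}


def load_params(defaults):
    """Merge defaults with the values proposed by NNI for this trial, if any."""
    params = dict(defaults)
    if nni is not None:
        # Outside an NNI experiment this yields no overrides.
        params.update(nni.get_next_parameter() or {})
    return params


def report(metric):
    """Send the final metric of this trial back to NNI, if it is available."""
    if nni is not None:
        nni.report_final_result(metric)


def train_and_evaluate(params):
    """Placeholder for the actual training loop; returns a dummy score."""
    return 0.0


if __name__ == "__main__":
    params = load_params(DEFAULT_PARAMS)
    score = train_and_evaluate(params)
    report(score)
```

Guarding the NNI calls this way is one design option that avoids commenting lines in and out when switching between tuned and untuned runs.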
- Use git clone https://github.com/vulnerabilitydetection/VulnerabilityDetectionResearch.git to get the repository.
- Run gen_graphs.py. Line 166 sets the output directory and line 52 sets the input data name. This run is expected to end with a file-not-found error.
- Run glove/ash.sh and glove/pdg.sh to generate the GloVe embeddings.
- Comment out line 55 in gen_graphs.py and run gen_graphs.py again.
- Run train_test_valid.py to split the dataset.
- Run main.py to train and test the model.
The pre-trained model can be downloaded from: https://drive.google.com/file/d/1KQv0aRUFCh-_jQCu8K7uQsB0c_5uCQKa/view?usp=sharing
The relevant test dataset can be downloaded from: https://drive.google.com/file/d/1uMnm7_W9DgXN4AbJ0iUir052H1AF4hA1/view?usp=sharing
Because of the randomness in the deep learning model and the different data splitting, the model performance may be different from the results reported in the paper.
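One source of this variation, the data split, can be made repeatable by fixing a random seed. The following is a minimal sketch of a seeded train/validation/test split over sample indices. It is not the logic of train_test_valid.py, and the 80/10/10 ratios and the seed value are assumptions chosen for illustration.

```python
import random


def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices with a fixed seed and cut them into train/valid/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_samples))
    # A private Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * ratios[0])
    n_valid = int(n_samples * ratios[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_indices(1000)
    print(len(train), len(valid), len(test))
```

Fixing the split does not remove the randomness of model initialization and training, so some run-to-run variation would still be expected.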
[1] Jiahao Fan, Yi Li, Shaohua Wang, and Tien Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In The 2020 International Conference on Mining Software Repositories (MSR). IEEE.
[2] Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2020. Deep Learning based Vulnerability Detection: Are We There Yet? arXiv preprint arXiv:2009.07235 (2020).
[3] Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Advances in Neural Information Processing Systems. 10197–10207.