Welcome to the repository for the research paper: "Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers." Our paper has been accepted to the 47th International Conference on Software Engineering (ICSE 2025).
The experiments were conducted with Python 3.9.7 on an Ubuntu 22.04.1 server.
To install all required packages, navigate to the root directory of this project and run:
pip install -r requirements.txt
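Before installing, it can help to confirm that the active interpreter matches the version used in the experiments. The snippet below is a minimal sketch and not part of the repository: the helper name `matches_tested_python` is hypothetical, and it assumes that agreeing on the major and minor version (3.9) is close enough.

```python
import sys

# The experiments were run on Python 3.9.7; only major.minor is compared here.
TESTED_VERSION = (3, 9)


def matches_tested_python(version_info=sys.version_info, tested=TESTED_VERSION):
    """Return True if the major.minor version equals the tested one."""
    return tuple(version_info[:2]) == tuple(tested)


if __name__ == "__main__":
    if matches_tested_python():
        print("Python version matches the tested 3.9 series.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: running Python {found}; the experiments used 3.9.7.")
```

A mismatch does not necessarily mean the code will fail, but pinned packages in requirements.txt may not resolve on other interpreter versions.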
To prepare the datasets used in our study:
- Navigate to the code-generation directory.
- Obtain datasets from either:
- Update the data paths and model specifications in generate.py to reflect your local setup.
- Execute the data generation script:
    python generate.py
Note: You can skip the empirical study if you are only interested in detecting machine-generated code with DetectCodeGPT.
After data preparation, you can proceed to the empirical analysis:
- Navigate to the code-analysis directory.
- Analyze code length:
    python analyze_length.py
- Verify Zipf's and Heaps' laws, and compute token frequencies:
    python analyze_law_and_frequency.py
- Analyze the proportion of different token categories:
    python analyze_proportion.py
- Study the naturalness of code snippets:
    python analyze_naturalness.py
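To illustrate what checking Zipf's and Heaps' laws involves, here is a small self-contained sketch. It is not the repository's implementation: the function names are hypothetical, and whitespace splitting stands in for whatever tokenizer the analysis scripts actually use. Zipf's law predicts that token frequency falls off roughly as a power of rank (a log-log slope near -1), and Heaps' law predicts that vocabulary size grows as a sublinear power of the number of tokens seen.

```python
import math
from collections import Counter


def fit_loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    if len(log_x) < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var


def zipf_slope(tokens):
    """Slope of frequency versus rank on a log-log scale."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    return fit_loglog_slope(ranks, freqs)


def heaps_curve(tokens):
    """Vocabulary size after each token, the quantity Heaps' law models."""
    seen = set()
    sizes = []
    for token in tokens:
        seen.add(token)
        sizes.append(len(seen))
    return sizes


def heaps_exponent(tokens):
    """Slope of vocabulary size versus tokens seen on a log-log scale."""
    sizes = heaps_curve(tokens)
    return fit_loglog_slope(range(1, len(sizes) + 1), sizes)


if __name__ == "__main__":
    # Whitespace tokenization is a simplifying assumption for this sketch.
    sample = "def add ( a , b ) : return a + b def sub ( a , b ) : return a - b"
    tokens = sample.split()
    print("Zipf slope:", zipf_slope(tokens))
    print("Heaps exponent:", heaps_exponent(tokens))
```

On a corpus this small the fitted values are noisy; the sketch only shows the shape of the computation, which becomes meaningful on the full generated and human-written datasets.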
To evaluate our DetectCodeGPT model:
- Navigate to the code-detection directory.
- Configure main.py with the appropriate model and dataset paths.
- Run the model evaluation script:
    python main.py
Note: If you used a custom model to generate code, update 'base_model_name': "codellama/CodeLlama-7b-hf" in main.py to your model name for the detection stage.
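As a rough sketch of that change, the snippet below overrides the entry without mutating the defaults. Only the key 'base_model_name' and its default value come from the note above; the dictionary name, the helper `with_base_model`, and the replacement model name are hypothetical, and the actual layout of main.py may differ.

```python
def with_base_model(config, model_name):
    """Return a copy of config with 'base_model_name' replaced."""
    updated = dict(config)
    updated["base_model_name"] = model_name
    return updated


# Default taken from the note above; any other settings are omitted here.
default_config = {"base_model_name": "codellama/CodeLlama-7b-hf"}

# "my-org/my-code-model" is a placeholder, not a real model identifier.
custom_config = with_base_model(default_config, "my-org/my-code-model")
```

Keeping the same model name in generation and detection matters because the detector scores code against the model that is assumed to have produced it.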
The code is adapted from the original repositories of DetectGPT and DetectLLM. We thank the authors for their contributions.
If you use DetectCodeGPT in your research, please cite our paper:
@inproceedings{shi2025detectcodegpt,
title={Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers},
author={Shi, Yuling and Zhang, Hongyu and Wan, Chengcheng and Gu, Xiaodong},
booktitle={Proceedings of the 47th International Conference on Software Engineering (ICSE 2025)},
year={2025},
organization={IEEE}
}