Code Summarization with Structure-induced Transformer

This repo is the official implementation of the Findings of ACL 2021 paper "Code Summarization with Structure-induced Transformer".

If you have any questions, feel free to email me.

Dependency

pip install -r requirements.txt

Data

For Python, we follow the pipeline in https://github.com/wanyao1992/code_summarization_public.

For Java, we fetch the data from https://github.com/xing-hu/TL-CodeSum.

For the paper, we wrote our own scripts to parse code into ASTs, which turned out to be a tough task. We are still looking for a cleaner way to do this and then to experiment with it under SiT.
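To give a rough idea of what parsing code into an AST and deriving an adjacency matrix involves, here is a minimal Python sketch built on the standard `ast` module. It is not the script used in the paper: the function name `ast_adjacency` and the choice of an undirected parent-child adjacency with self-loops over AST nodes are assumptions made for illustration only.

```python
import ast


def ast_adjacency(source):
    """Return (labels, adj) for the AST of a Python snippet.

    labels[i] is the node type of the i-th node in pre-order;
    adj is a symmetric 0/1 matrix linking each node to its parent
    and children, with self-loops on the diagonal (an assumption).
    """
    tree = ast.parse(source)
    labels, edges = [], []

    def visit(node, parent):
        # Index by visit order, not object identity, because some AST
        # nodes (e.g. Load contexts) may be shared between parents.
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)

    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1
    for parent, child in edges:
        adj[parent][child] = adj[child][parent] = 1
    return labels, adj


labels, adj = ast_adjacency("def add(a, b):\n    return a + b\n")
print(labels[:2])   # the root node followed by the function definition
print(len(adj))     # number of AST nodes
```

Java would need a separate parser, and a real pipeline would also have to align AST nodes with the code tokens fed to the model, which this sketch does not attempt.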

If you only want to reproduce the results, you can download the data we used directly from here and put both the python and java folders in the data directory.

The adjacency data is too large to load all at once on my personal server, so I assign a GUID to each code snippet in .guid and retrieve the adjacency entries one by one. All you need to do is:

cd sit3
unzip adjacency.zip
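The one-by-one retrieval described above can be sketched as follows. The on-disk layout used here (one GUID per line in a `.guid` file, one JSON file per GUID) and the class name `LazyAdjacencyStore` are hypothetical and chosen only to make the example self-contained; check the unzipped files for the actual format.

```python
import json
import tempfile
from pathlib import Path


class LazyAdjacencyStore:
    """Load one adjacency entry per lookup instead of all at once.

    Assumes (hypothetically) that `guid_file` lists one GUID per line
    and that `adjacency_dir` holds a `<guid>.json` file for each one.
    """

    def __init__(self, guid_file, adjacency_dir):
        self.guids = Path(guid_file).read_text().split()
        self.adjacency_dir = Path(adjacency_dir)

    def __len__(self):
        return len(self.guids)

    def __getitem__(self, index):
        guid = self.guids[index]
        # Only this snippet's adjacency is read into memory.
        with open(self.adjacency_dir / f"{guid}.json") as handle:
            return json.load(handle)


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.guid").write_text("g0\ng1\n")
    (root / "g0.json").write_text(json.dumps([[1, 1], [1, 1]]))
    (root / "g1.json").write_text(json.dumps([[1]]))

    store = LazyAdjacencyStore(root / "train.guid", root)
    count = len(store)
    first = store[0]
    print(count, first)
```

Keeping only the GUID list in memory trades some disk reads per example for a much smaller footprint, which is the point of the per-snippet lookup.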

Quick Start

Training

cd main
python train.py --dataset_name python --model_name YOUR_MODEL_NAME

View the training log with:

vi ../modelx/YOUR_MODEL_NAME.txt

In the paper, we train SiT for 150 epochs. For example, on Java the validation log at epoch 150 reads:

01/18/2021 01:12:25 PM: [ dev valid official: Epoch = 150 | bleu = 44.89 | rouge_l = 55.25 | Precision = 61.14 | Recall = 57.81 | F1 = 56.95 | examples = 8714 | valid time = 58.93 (s) ]
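If you want to compare runs or epochs programmatically, a validation line like the one above can be turned into a dictionary of metrics. This is a small sketch that assumes every validation line follows exactly the format shown; the helper name `parse_valid_log` is ours and is not part of the repository.

```python
LOG_LINE = (
    "01/18/2021 01:12:25 PM: [ dev valid official: Epoch = 150 | "
    "bleu = 44.89 | rouge_l = 55.25 | Precision = 61.14 | "
    "Recall = 57.81 | F1 = 56.95 | examples = 8714 | "
    "valid time = 58.93 (s) ]"
)


def parse_valid_log(line):
    """Extract `name = value` pairs from one validation log line."""
    # Keep only the text between the square brackets.
    body = line[line.index("[") + 1 : line.rindex("]")]
    metrics = {}
    for field in body.split("|"):
        key, _, value = field.partition("=")
        # Drop prefixes such as "dev valid official:" from the first field.
        key = key.split(":")[-1].strip()
        # Ignore trailing units such as "(s)".
        number = value.split()[0]
        metrics[key] = float(number) if "." in number else int(number)
    return metrics


metrics = parse_valid_log(LOG_LINE)
print(metrics["Epoch"], metrics["bleu"], metrics["rouge_l"])
```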

Testing

python test.py --dataset_name python --beam_size 5 --model_name YOUR_MODEL_NAME

Acknowledgement: The implementation is based on https://github.com/wasiahmad/NeuralCodeSum.

Citation

@inproceedings{hongqiu2021summarization,
 author = {Wu, Hongqiu and Zhao, Hai and Zhang, Min},
 booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)},
 title = {Code summarization with structure-induced transformer},
 year = {2021}
}