CoDas4CG

Contests based Dataset for Code Generation

If you are using the dataset, please cite the following paper: H. Liu, M. Shen, J. Zhu, N. Niu, G. Li and L. Zhang, "Deep Learning Based Program Generation from Requirements Text: Are We There Yet?," in IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2020.3018481. Available at: https://ieeexplore.ieee.org/document/9173704

There are seven folders: AssistantTools, DatasetInSQL, Dataset, TestCases, Tools, CodeOfApproaches, and GeneratedPrograms.

/Dataset contains the programming tasks and their corresponding implementations in different programming languages. Each subfolder under /Dataset corresponds to a single programming task. Note that the commercial script used to crawl the data is not included.

/TestCases contains the test cases for the programming tasks in /Dataset. Each subfolder of /TestCases is named after a programming task, so you can use the name to find the corresponding task under /Dataset. Note that these test cases were collected from programming contest websites; no test case generation tools were used.

/Tools contains the source code of our tool kit.

/GeneratedPrograms contains the programs generated by each approach.

/CodeOfApproaches contains the implementations of the evaluated approaches.

/DatasetInSQL: This folder contains a database that holds the whole dataset (Python programs only).

If your approach is designed specifically for Python, we strongly suggest using this database rather than the /Dataset folder.

How to use the database:

#Retrieving original data (without preprocessing)

RetrieveTasks(): Returns the descriptions (requirements) of all tasks; each requirement is a text string.

SQL: select question from question

RetrieveTask(ID): Returns the description of the task with the specified ID.

SQL: select question from question where numId = ID

RetrieveImplementations(): Returns all implementations; each one corresponds to a Python file.

SQL: select code from code

RetrieveImplementations(ID): Returns all implementations of the task with the specified ID; each one corresponds to a Python file.

SQL: select code from code where numId = ID
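
The queries above can be wrapped in small Python helpers. The following is only a sketch: it uses an in-memory SQLite database with placeholder rows so that it runs on its own, and it assumes nothing about the database engine shipped in /DatasetInSQL. The table and column names (question, code, numId) come from the queries above; the helper names, the column types, and the sample rows are illustrative assumptions. Other drivers may use a different placeholder style than `?`.

```python
import sqlite3


def retrieve_tasks(conn):
    """Return the descriptions of all tasks as a list of strings."""
    return [row[0] for row in conn.execute("SELECT question FROM question")]


def retrieve_task(conn, task_id):
    """Return the description of one task, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT question FROM question WHERE numId = ?", (task_id,)
    ).fetchone()
    return row[0] if row else None


def retrieve_implementations(conn, task_id=None):
    """Return all implementations, or only those of the given task."""
    if task_id is None:
        rows = conn.execute("SELECT code FROM code")
    else:
        rows = conn.execute("SELECT code FROM code WHERE numId = ?", (task_id,))
    return [row[0] for row in rows]


# Stand-in database with placeholder rows (not taken from the real dataset).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question (numId INTEGER, question TEXT);
    CREATE TABLE code (numId INTEGER, code TEXT);
    """
)
conn.execute(
    "INSERT INTO question VALUES (?, ?)", (1, "Print the sum of two integers.")
)
conn.executemany(
    "INSERT INTO code VALUES (?, ?)",
    [
        (1, "a, b = map(int, input().split())\nprint(a + b)"),
        (1, "print(sum(map(int, input().split())))"),
    ],
)

print(retrieve_task(conn, 1))
print(len(retrieve_implementations(conn, 1)))
```

Passing the ID as a query parameter instead of formatting it into the SQL string avoids quoting problems and SQL injection.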

#Retrieving processed data (preprocessing includes word segmentation, standardization, etc.)

RetrieveTasks(): Returns the descriptions (requirements) of all tasks; each requirement is a text string.

SQL: select question from process_question

RetrieveTask(ID): Returns the description of the task with the specified ID.

SQL: select question from process_question where numId = ID

RetrieveImplementations(): Returns all implementations; each one corresponds to a Python file.

SQL: select code from process_code

RetrieveImplementations(ID): Returns all implementations of the task with the specified ID; each one corresponds to a Python file.

SQL: select code from process_code where numId = ID

RetrieveTestCases(ID): Returns the test cases of the task with the specified ID.

SQL: select input, output from testcase where numId = ID
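
As an illustration of how the test case query might be used, the sketch below fetches the (input, output) pairs of a task and checks a candidate solution against them. It again uses an in-memory SQLite stand-in with placeholder rows. The testcase table and its columns come from the query above; the helper names, the callable-based candidate, and the whitespace-insensitive comparison are assumptions, not the evaluation procedure used by the authors.

```python
import sqlite3


def retrieve_test_cases(conn, task_id):
    """Return the (input, output) pairs stored for one task."""
    return conn.execute(
        "SELECT input, output FROM testcase WHERE numId = ?", (task_id,)
    ).fetchall()


def passes_all(solve, test_cases):
    """Check a candidate, given as a function from input text to output text.

    Leading and trailing whitespace is ignored when comparing outputs
    (an assumption made for this sketch).
    """
    return all(solve(inp).strip() == out.strip() for inp, out in test_cases)


# Stand-in database with placeholder rows (not taken from the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testcase (numId INTEGER, input TEXT, output TEXT)")
conn.executemany(
    "INSERT INTO testcase VALUES (?, ?, ?)",
    [(1, "1 2\n", "3\n"), (1, "10 -4\n", "6\n")],
)


def add_two(text):
    """Hypothetical candidate solution: sum the integers in the input."""
    return str(sum(int(token) for token in text.split()))


cases = retrieve_test_cases(conn, 1)
print(passes_all(add_two, cases))
```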

/AssistantTools: This folder contains tools to calculate the BLEU, and to detect compilation errors in generated programs.

1. ComputeBLEU(pred, refer): Returns the BLEU score between the generated code pred and the reference code refer.

2. ComputeBLEU2(pred, refers): Returns the BLEU score of the generated code pred against a list of reference codes refers.

3. hasCompilerErrors(File name): Checks whether the code has static or dynamic compilation errors.

4. PreProcessALL(File requirements, File implements): Preprocesses both the requirements and the programs.

5. PreProcessReq(File requirements, File implements): Preprocesses the requirements (tasks).

6. PreProcessImp(File requirements, File implements): Preprocesses the programs.
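
To make the first three tools more concrete, here is a minimal sketch of what a BLEU computation and a static error check can look like in Python. It is not the code in /AssistantTools: the function names, the whitespace tokenization, the unsmoothed 4-gram BLEU with a brevity penalty, and the use of the built-in `compile()` to detect syntax errors only are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def compute_bleu(pred, refers, max_n=4):
    """Unsmoothed BLEU of one candidate string against reference strings."""
    pred_tokens = pred.split()
    ref_tokens = [ref.split() for ref in refers]
    if not pred_tokens or not ref_tokens:
        return 0.0
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        pred_counts = ngrams(pred_tokens, n)
        total = sum(pred_counts.values())
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in ref_tokens:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in pred_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision gives 0
        log_precisions += math.log(clipped / total)
    # Brevity penalty uses the reference length closest to the candidate.
    pred_len = len(pred_tokens)
    ref_len = min((abs(len(r) - pred_len), len(r)) for r in ref_tokens)[1]
    penalty = 1.0 if pred_len >= ref_len else math.exp(1 - ref_len / pred_len)
    return penalty * math.exp(log_precisions / max_n)


def has_syntax_errors(source):
    """Return True if the Python source fails to compile (static check only)."""
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return True
    return False


print(compute_bleu("print ( a + b )", ["print ( a + b )"]))
print(has_syntax_errors("def f(:\n    pass"))
```

Errors that only appear at run time are not covered by `compile()`; detecting them would require executing the program, for example against the test cases.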

Copyright.

You must obtain official permission from the authors for commercial use.
