ds4an/CoDas4CG

CoDas4CG

Contests based Dataset for Code Generation

If you are using the dataset, please cite the following paper: H. Liu, M. Shen, J. Zhu, N. Niu, G. Li and L. Zhang, "Deep Learning Based Program Generation from Requirements Text: Are We There Yet?," in IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2020.3018481. Available at: https://ieeexplore.ieee.org/document/9173704

There are seven folders: AssistantTools, DatasetInSQL, Dataset, TestCases, Tools, CodeOfApproaches and GeneratedPrograms.

/Dataset contains the programming tasks and their corresponding implementations in different programming languages. Each subfolder under /Dataset corresponds to a single programming task. Note that the commercial script used to crawl the data is not included.

/TestCases contains the test cases for the programming tasks in folder /Dataset. The subfolder names in /TestCases are the names of the programming tasks, so you can use them to find the corresponding tasks under /Dataset. The test cases were collected from programming contest websites; no test case generation tools were used.
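The name-based correspondence between /TestCases and /Dataset can be sketched as follows. This is a minimal illustration, assuming that a task's subfolder has exactly the same name in both folders; the helper name `pair_tasks_with_tests` is hypothetical and not part of the tool kit.

```python
from pathlib import Path


def pair_tasks_with_tests(dataset_root, testcases_root):
    """Map each task name to (task_dir, testcase_dir) when both folders exist.

    Assumption: a task uses the same subfolder name under both roots.
    """
    dataset_root = Path(dataset_root)
    testcases_root = Path(testcases_root)
    pairs = {}
    for test_dir in sorted(testcases_root.iterdir()):
        if not test_dir.is_dir():
            continue  # skip stray files next to the task folders
        task_dir = dataset_root / test_dir.name
        if task_dir.is_dir():
            pairs[test_dir.name] = (task_dir, test_dir)
    return pairs
```

Calling `pair_tasks_with_tests("Dataset", "TestCases")` from the repository root would return only the tasks that have a folder on both sides, which makes missing test cases easy to spot.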

/Tools contains the source code of our tool kit.

/GeneratedPrograms contains the programs generated by each approach.

/CodeOfApproaches contains the implementations of the evaluated approaches.

/DatasetInSQL: This folder contains a database that holds the whole dataset (Python programs only).

If your approach is designed specifically for Python, we strongly suggest using this database instead of the /Dataset folder.

How to use the database:

#Retrieving original data (without preprocessing)

  1. RetrieveTasks(): Returns the descriptions (requirements) of all tasks; each requirement is a text string.

    SQL: select question from question

  2. RetrieveTask(ID): Returns the description of the task with the specified ID.

    SQL: select question from question where numId = ID

  3. RetreiveImplementations(): Returns all implementations; each implementation corresponds to one Python file.

    SQL: select code from code

  4. RetreiveImplementations(ID): Returns all implementations of the task with the specified ID; each implementation corresponds to one Python file.

    SQL: select code from code where numId = ID
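The four queries above could be wrapped in Python roughly as follows. This is a sketch under stated assumptions: the README does not say which database engine the files in /DatasetInSQL target, so the example uses Python's built-in `sqlite3` module and a toy in-memory database that only mirrors the table and column names used in the queries (`question`, `code`, `numId`). The function names and `make_toy_db` are hypothetical. With another driver the placeholder syntax may differ (for example `%s` instead of `?`).

```python
import sqlite3


def retrieve_tasks(conn):
    """Return the descriptions of all tasks as a list of strings."""
    rows = conn.execute("SELECT question FROM question").fetchall()
    return [row[0] for row in rows]


def retrieve_task(conn, task_id):
    """Return the description of one task, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT question FROM question WHERE numId = ?", (task_id,)
    ).fetchone()
    return row[0] if row else None


def retrieve_implementations(conn, task_id=None):
    """Return all programs, or only those of one task when task_id is given."""
    if task_id is None:
        rows = conn.execute("SELECT code FROM code").fetchall()
    else:
        rows = conn.execute(
            "SELECT code FROM code WHERE numId = ?", (task_id,)
        ).fetchall()
    return [row[0] for row in rows]


def make_toy_db():
    """Build a tiny in-memory stand-in for the real database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (numId INTEGER, question TEXT)")
    conn.execute("CREATE TABLE code (numId INTEGER, code TEXT)")
    conn.executemany(
        "INSERT INTO question VALUES (?, ?)",
        [(1, "Print the sum of two integers."), (2, "Reverse a string.")],
    )
    conn.executemany(
        "INSERT INTO code VALUES (?, ?)",
        [
            (1, "a, b = map(int, input().split())\nprint(a + b)"),
            (1, "print(sum(map(int, input().split())))"),
            (2, "print(input()[::-1])"),
        ],
    )
    return conn
```

Passing the ID as a bound parameter rather than formatting it into the SQL string avoids quoting mistakes when IDs come from user input.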

#Retrieving processed data (preprocessing includes word segmentation, standardization, etc.)

  1. RetrieveTasks(): Returns the descriptions (requirements) of all tasks; each requirement is a text string.

    SQL: select question from process_question

  2. RetrieveTask(ID): Returns the description of the task with the specified ID.

    SQL: select question from process_question where numId = ID

  3. RetreiveImplementations(): Returns all implementations; each implementation corresponds to one Python file.

    SQL: select code from process_code

  4. RetreiveImplementations(ID): Returns all implementations of the task with the specified ID; each implementation corresponds to one Python file.

    SQL: select code from process_code where numId = ID

  5. RetreiveTestCasess(ID): Returns the test cases of the task with the specified ID.

    SQL: select input, output from testcase where numId = ID
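The test case query in item 5 can be combined with a simple runner to check a program against a task. The sketch below is an illustration only, not the evaluation procedure used in the paper. It assumes that each program reads its input from standard input and writes its answer to standard output, and that outputs are compared after stripping surrounding whitespace. It again uses `sqlite3` with a toy in-memory table as a stand-in for the real database; `retrieve_test_cases`, `passes_all` and `make_toy_db` are hypothetical names.

```python
import sqlite3
import subprocess
import sys


def retrieve_test_cases(conn, task_id):
    """Return a list of (input, output) pairs for one task."""
    return conn.execute(
        "SELECT input, output FROM testcase WHERE numId = ?", (task_id,)
    ).fetchall()


def passes_all(source, test_cases, timeout=5.0):
    """Run a Python program on every test case and compare stdout.

    Assumption: whitespace at the start and end of the output is ignored.
    """
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test
        if result.returncode != 0:
            return False  # the program crashed
        if result.stdout.strip() != expected.strip():
            return False  # wrong answer
    return True


def make_toy_db():
    """Build a tiny in-memory stand-in for the testcase table (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE testcase (numId INTEGER, input TEXT, output TEXT)")
    conn.executemany(
        "INSERT INTO testcase VALUES (?, ?, ?)",
        [(1, "1 2\n", "3\n"), (1, "10 -4\n", "6\n")],
    )
    return conn
```

Generated programs are untrusted code, so in practice it is worth running them in a sandbox or container rather than directly on the host.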

/AssistantTools: This folder contains tools to compute BLEU scores and to detect compilation errors in generated programs.

1.ComputeBLEU(pred, refer): Returns the BLEU score between the generated code pred and the reference code refer

2.ComputeBLEU2(pred, refers): Returns the BLEU score of the generated code pred against a series of reference codes refers

3.hasCompilerErrors(File name): Checks whether the code has static and dynamic compilation errors

4.PreProcessALL(File requirements, File implements): Preprocesses the requirements and the related programs

5.PreProcessReq(File requirements, File implements): Preprocesses the requirements (tasks)

6.PreProcessImp(File requirements, File implements): Preprocesses the programs
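To make the first three tools more concrete, here is a minimal sketch of a sentence-level BLEU score and a static syntax check. It is not the implementation shipped in /AssistantTools. It assumes whitespace tokenization, n-grams up to length 4 with uniform weights, no smoothing, and the standard brevity penalty. `compute_bleu` and `has_syntax_error` are hypothetical names, and the syntax check covers only static errors that Python's built-in `compile` detects, not errors that appear at run time.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def compute_bleu(pred, refers, max_n=4):
    """BLEU of `pred` against a list of reference strings (no smoothing)."""
    pred_tokens = pred.split()
    ref_tokens = [ref.split() for ref in refers]
    if not pred_tokens or not ref_tokens:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        pred_counts = ngrams(pred_tokens, n)
        total = sum(pred_counts.values())
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in ref_tokens:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in pred_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the prediction.
    pred_len = len(pred_tokens)
    ref_len = min((len(r) for r in ref_tokens),
                  key=lambda length: (abs(length - pred_len), length))
    bp = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return bp * math.exp(sum(log_precisions) / max_n)


def has_syntax_error(source):
    """Return True if the Python source fails to compile (static check only)."""
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return True
    return False
```

A single reference corresponds to the one-reference case, for example `compute_bleu(pred, [refer])`, while a longer list corresponds to the multi-reference case. Scores from this sketch are not guaranteed to match those of the repository's tools, since tokenization and smoothing choices affect BLEU noticeably.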

Copyright.

You should get official permission from the authors for commercial use.
