Source code for building ManyTypes4Py version 0.8
Main script: build_dataset.py, which collects, downloads, and deduplicates the dataset
git clone https://github.com/LangFeng0912/build_MTV0.8.git
pip install -r build_MTV0.8/requirements.txt
pip install build_MTV0.8/
buildmt build --p raw_projects --l 200 --j 4
[--p]
Directory in which the raw dataset is collected, for example: raw_projects
[--l]
Number of projects to collect
[--j]
Number of parallel processes to use
[--c]
Whether to collect repos from GitHub [optional, default=False]
buildmt split --p raw_projects
[--p]
Directory containing the collected raw dataset, for example: raw_projects
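The two steps above can also be driven from a script. The following is a minimal sketch, not part of build_MTV0.8: the helper names build_command and split_command are hypothetical, and only the flags shown above (--p, --l, --j) are used. The --c option is left out because its exact form is not shown here. The commands are printed rather than executed.

```python
import shlex
from typing import List


def build_command(raw_dir: str, limit: int, jobs: int) -> List[str]:
    """Return the argv for `buildmt build` (hypothetical helper)."""
    if limit <= 0 or jobs <= 0:
        raise ValueError("limit and jobs must be positive integers")
    return ["buildmt", "build", "--p", raw_dir, "--l", str(limit), "--j", str(jobs)]


def split_command(raw_dir: str) -> List[str]:
    """Return the argv for `buildmt split` (hypothetical helper)."""
    return ["buildmt", "split", "--p", raw_dir]


if __name__ == "__main__":
    # Both steps must point at the same raw dataset directory.
    raw_dir = "raw_projects"
    for argv in (build_command(raw_dir, 200, 4), split_command(raw_dir)):
        # Print the command; to run it, pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Keeping the directory in one variable avoids running the split step against a different location than the one used for collection.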
Requires Ubuntu 20 or newer; based on LibSA4Py
docker build -t libsa4py .
docker run -v [result]:/results libsa4py -l 32 -j 8
[source]
Location of the raw dataset on the local machine, for example: raw_projects
[result]
Location of the processed dataset on the local machine, for example: processed_projects
[--l]
Number of projects to download
[--j]
Number of processors to use in parallel
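With docker run -v, a host path that is not absolute is interpreted as a named volume rather than a directory, so [result] should be given as an absolute path. The sketch below shows one way to assemble the command. The helper name docker_run_command is hypothetical, and only the [result] mount that appears in the command above is reproduced. The command is printed rather than executed.

```python
import shlex
from pathlib import Path
from typing import List


def docker_run_command(result_dir: str, limit: int, jobs: int) -> List[str]:
    """Return the argv for the `docker run` call above (hypothetical helper)."""
    # Resolve to an absolute path so Docker treats it as a bind mount.
    host_path = Path(result_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{host_path}:/results",
        "libsa4py",
        "-l", str(limit),
        "-j", str(jobs),
    ]


if __name__ == "__main__":
    # Print the command; to run it, pass the list to subprocess.run(..., check=True).
    print(shlex.join(docker_run_command("processed_projects", 32, 8)))
```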