datagen in three phases with Slurm job #275

Closed
hyjorc1 wants to merge 8 commits

hyjorc1 (Contributor) commented Jun 23, 2020

  1. Download repositories: GitHubRepoBareDownloader.java

  2. Generate seq files for each project: SeqRepoGenerator.java, SeqRepoBuilder.java, run-generator.sh, slurmJob.sh

  3. Combine seq files: SeqRepoCombiner.java, run-combiner.sh

hridesh (Member) commented Jun 23, 2020

What would be the impact of this change on local data generation, e.g. on a single machine owned by a Boa user?


File input = new File(INPUT_PATH);

DownloadWorker[] workers = new DownloadWorker[THREAD_NUM];

hridesh (Member) commented Jun 23, 2020

If Boa code is intended to compile with Java 7 or higher, one could use the standard concurrency utilities such as ExecutorService, which are meant to provide all of these features. Just a thought. See https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
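For illustration, the suggestion could look roughly like the sketch below. It is not the code in this PR: `PooledDownloadSketch`, `download`, and the value of `THREAD_NUM` are hypothetical stand-ins, and the real work done by `DownloadWorker` is assumed to fit into a single task per repository. It sticks to Java 7 syntax (anonymous classes, no lambdas).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PooledDownloadSketch {
    // Hypothetical stand-in for the PR's THREAD_NUM constant.
    static final int THREAD_NUM = 4;

    // Hypothetical placeholder for the real per-repository download work.
    static String download(String repoUrl) {
        return "downloaded:" + repoUrl;
    }

    // Submits one task per repository; the pool hands each task to an idle thread.
    static List<String> downloadAll(List<String> repoUrls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREAD_NUM);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String url : repoUrls) {
                futures.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() {
                        return download(url);
                    }
                }));
            }
            // Collect results in submission order; get() blocks until the task is done.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(downloadAll(Arrays.asList("repoA", "repoB", "repoC")));
    }
}
```

With a fixed pool, the array of worker threads and the bookkeeping for which worker is free are handled by the executor's internal queue.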

hridesh (Member) left a comment

This could be merged provided it works for local data generation. The enhancement to use advanced concurrency features could be done later.


boolean assigned = false;
while (!assigned) {
    for (int j = 0; j < THREAD_NUM; j++) {

If Boa code is intended to compile with Java 7 or higher, one could use the standard concurrency utilities such as ExecutorService, which are meant to provide all of these features. Just a thought. See https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
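The excerpt above is truncated, but it appears to loop until some worker slot accepts the next item. Assuming that is its purpose, a bounded `ThreadPoolExecutor` can provide the same backpressure without a hand-written assignment loop. The sketch below is only an illustration: `BoundedPoolSketch`, `processAll`, the queue size, and the trivial task body are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    // Hypothetical stand-in for the PR's THREAD_NUM constant.
    static final int THREAD_NUM = 4;

    // Runs taskCount trivial tasks and returns how many completed.
    static int processAll(int taskCount) throws InterruptedException {
        final AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                THREAD_NUM, THREAD_NUM,                       // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(THREAD_NUM), // bounded backlog of waiting tasks
                // When all threads are busy and the queue is full, the submitting
                // thread runs the task itself, which slows submission down.
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (int i = 0; i < taskCount; i++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    done.incrementAndGet(); // placeholder for real per-item work
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(100));
    }
}
```

The design choice here is the rejection policy: `CallerRunsPolicy` throttles the producer instead of dropping work or throwing, so no polling over worker slots is needed.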

hyjorc1 (Contributor, Author) commented Mar 1, 2021

These changes fix and optimize the JSON retriever for extracting repo metadata.

GetReposByLanguage.java takes 4 arguments:

  1. The path to the token file.
  2. The output path for the JSON files (100 repositories per file).
  3. The minimum number of stars: a repository's star count must be greater than or equal to this number.
  4. The programming languages used by the repository, separated by semicolons (e.g. "java;python").
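As a rough illustration of how these four arguments might be read, here is a minimal sketch. It is not the actual `GetReposByLanguage.java`: the class `RetrieverArgsSketch` and its field names are hypothetical, and only the argument order and the semicolon-separated language list come from the description above.

```java
import java.util.Arrays;
import java.util.List;

public class RetrieverArgsSketch {
    final String tokenFile;       // argument 1: path to the token file
    final String outputPath;      // argument 2: where the JSON files are written
    final int minStars;           // argument 3: minimum star count (inclusive)
    final List<String> languages; // argument 4: languages, split on ';'

    RetrieverArgsSketch(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 arguments: <token file> <output path> <min stars> <languages>");
        }
        tokenFile = args[0];
        outputPath = args[1];
        minStars = Integer.parseInt(args[2]);
        languages = Arrays.asList(args[3].split(";"));
    }

    public static void main(String[] args) {
        RetrieverArgsSketch parsed = new RetrieverArgsSketch(
                new String[] {"tokens.txt", "out", "100", "java;python"});
        System.out.println(parsed.minStars + " " + parsed.languages);
    }
}
```

When passing the language list on a command line, quote it (as in "java;python"), since most shells treat an unquoted semicolon as a command separator.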

hyjorc1 closed this Aug 28, 2022