The Landscape of Deep Learning Supply Chain
- Get the metadata for every distribution released on PyPI from Google BigQuery.
    SELECT metadata_version, name, version, summary, author, author_email,
           maintainer, maintainer_email, license, keywords, classifiers,
           platform, home_page, download_url, requires_python, requires,
           provides, obsoletes, requires_dist, provides_dist, obsoletes_dist,
           requires_external, project_urls, upload_time, filename, size,
           python_version, packagetype, comment_text
    FROM `bigquery-public-data.pypi.distribution_metadata`
- Download the query results to a local file, `/fast/pypi/distribution_metadata.json`, and import them into a MongoDB database.
    mongoimport --db=pypi --collection=distribution_metadata --quiet --drop --numInsertionWorkers=8 --file=/fast/pypi/distribution_metadata.json
- Parse each package's dependencies from the `requires_dist` field.
    python extract_dependencies.py
- Parse the version constraints of each dependency
    python versioned_packages.py
- Collect metadata for the packages in the deep learning supply chain (DL SC)
    python dl_package_metadata.py
- Get the number of dependent packages for each package
    python package_stats.py
- Get World of Code (WoC) dependents for each package
    # Get import names for each package
    python top_level_packages.py
    # Extract Python dependencies from WoC
    python build_woc_dbs.py
    python clean_import_names.py
- Get GitHub dependents for each package
    # Get the repository URL
    python pkg_repo_url.py
    # Crawl and parse the dependency network page
    python github_dependents.py
- Append downstream statistics
    python package_stats.py
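
The dependency-parsing steps read requirement strings from the `requires_dist` field, such as `numpy (>=1.14.5)`. The following is a minimal sketch of how one such string could be split into a package name and its version constraints. It is not the code in `extract_dependencies.py` or `versioned_packages.py`; the function names and the simplified regular expressions are assumptions made for illustration. Environment markers and URL requirements are ignored.

```python
import re

# Simplified PEP 508 pattern (an assumption, not the full grammar):
# a name, optional extras in brackets, then optional version specifiers.
# Anything after ';' (environment markers) is left unparsed.
_REQ = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*\(?(?P<spec>[^;)]*)\)?"
)
# One version clause: a comparison operator followed by a version string.
_SPEC = re.compile(r"(===|~=|==|!=|<=|>=|<|>)\s*([^\s,]+)")


def normalize_name(name):
    """Lowercase the name and collapse runs of '-', '_', '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement(req):
    """Return (normalized_name, [(operator, version), ...]) for one entry."""
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"unparseable requirement: {req!r}")
    specs = _SPEC.findall(match.group("spec") or "")
    return normalize_name(match.group("name")), specs


if __name__ == "__main__":
    for entry in ["numpy (>=1.14.5)", "requests[security]>=2.0,<3"]:
        print(parse_requirement(entry))
```

Normalizing names before storing them keeps `Typing_Extensions` and `typing-extensions` from being counted as two different packages.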
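
Counting the dependent packages of each package amounts to inverting the dependency relation. The sketch below assumes the parsed dependencies are available as an in-memory mapping from a package to the packages it requires. That data shape and the function name `count_dependents` are hypothetical, and the sketch counts direct dependents only; whether `package_stats.py` also counts transitive dependents is not stated here.

```python
from collections import defaultdict


def count_dependents(dependencies):
    """Map each package to the number of distinct packages that require it.

    `dependencies` maps a package name to an iterable of the package names
    it requires (a hypothetical form of the extracted dependency data).
    """
    dependents = defaultdict(set)
    for package, requires in dependencies.items():
        dependents[package]  # make sure packages with no dependents appear
        for dep in requires:
            if dep != package:  # ignore self-references
                dependents[dep].add(package)
    return {name: len(users) for name, users in dependents.items()}


if __name__ == "__main__":
    example = {
        "torchvision": ["torch", "numpy"],
        "torch": ["numpy"],
        "numpy": [],
    }
    print(count_dependents(example))
```

Using a set per package means a dependency declared by several releases of the same dependent is counted once.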
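
For the GitHub dependents step, the dependency network of a repository is shown at `https://github.com/<owner>/<repo>/network/dependents`. The sketch below shows one way a repository URL taken from package metadata could be turned into that page URL. The function name and the URL pattern are illustrative assumptions, not the logic of `pkg_repo_url.py` or `github_dependents.py`. Fetching and parsing the page is left out because its HTML layout is not a documented interface.

```python
import re

# Accepts common GitHub URL spellings (https, http, git://, git@host:path).
# This pattern is an assumption and does not cover every form seen on PyPI.
_GITHUB_REPO = re.compile(
    r"^(?:https?://|git://|git@)?(?:www\.)?github\.com[/:]"
    r"(?P<owner>[^/\s]+)/(?P<repo>[^/\s#?]+)",
    re.IGNORECASE,
)


def dependents_url(repo_url):
    """Return the dependency network page URL for a GitHub repo, or None."""
    match = _GITHUB_REPO.match(repo_url.strip())
    if match is None:
        return None  # not a recognizable GitHub repository URL
    repo = match.group("repo")
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"https://github.com/{match.group('owner')}/{repo}/network/dependents"


if __name__ == "__main__":
    print(dependents_url("https://github.com/pytorch/pytorch"))
    print(dependents_url("git@github.com:numpy/numpy.git"))
```

Returning `None` for non-GitHub URLs lets the caller skip packages whose repository is hosted elsewhere.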