The code recipes, scripts and results of the analysis of the documentation practices of data papers

JoanGi/DataPaperAnalysis

This is the data supporting the study: "On the Readiness of Scientific Data for a Fair and Transparent Use in Machine learning"

In this repository you will find:

1 - Full Results: The full results of the extraction process, covering 4041 data papers annotated using the scripts in the root of this project.

The fullResults.xlsx file contains the complete results of the extraction process, while ResultsSData.xlsx and ResultsDBrief.xlsx contain the results for each journal.

2 - Analysis sheet: The sheet with the charts, counts and analysis used to write the study.

The FullStudyAnalysis.xlsx file contains the full data, the charts, the topic analysis, and high-level insights into the data.

3 - Code: The code used to extract the data, with one extractor per journal. It is provided to help replicate the experiment.

The dataPaperScrapping.ipynb notebook contains the code used to filter the data papers of both journals and to download their PDFs (when possible). If you want to reproduce the experiment, start with this notebook.

Once you have the PDFs from both journals, SDataExtractor.py and DBriefExtractor.py contain the code that performs the extraction for each journal. Note that you will need an OpenAI API key and a running GROBID service to execute them.
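
Before launching the extractors, it can help to confirm that both prerequisites are in place. The snippet below is a minimal sketch and is not part of this repository. It assumes the API key is exposed through the OPENAI_API_KEY environment variable and that GROBID runs locally on its default port (8070), where it answers a health check at /api/isalive. The extractors may read their configuration differently, so adapt the names as needed. The helper names grobid_is_alive and check_prerequisites are hypothetical.

```python
import os
import urllib.request

# Assumption: GROBID runs on this machine on its default port.
GROBID_ALIVE_URL = "http://localhost:8070/api/isalive"


def grobid_is_alive(url=GROBID_ALIVE_URL, timeout=5.0):
    """Return True if the GROBID health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def check_prerequisites(env=None, grobid_url=GROBID_ALIVE_URL):
    """Report which of the two prerequisites appear to be available."""
    env = os.environ if env is None else env
    return {
        # Assumption: the key is provided via OPENAI_API_KEY.
        "openai_api_key": bool(env.get("OPENAI_API_KEY", "").strip()),
        "grobid": grobid_is_alive(grobid_url),
    }
```

Calling check_prerequisites() returns a dictionary with one boolean per prerequisite, so a False value shows what still needs to be set up before running SDataExtractor.py or DBriefExtractor.py.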
