XtractHub File Service is a user metadata file service based on XtractHub from Globus Labs. XtractHub File Service allows users to manage files, process metadata from uploaded files using XtractHub, and view extracted file metadata using a REST API.
These instructions will get a copy of XtractHub File Service running on your local machine for development and testing purposes.
First, clone this repository and activate a virtual environment:
git clone https://github.com/rewong03/xtract_file_service
cd xtract_file_service
python3 -m venv venv
source venv/bin/activate
Next, install the requirements:
pip install -r requirements.txt
deactivate
First, open a terminal and start a redis server:
cd /path/to/redis/
src/redis-server
Then in a second terminal, start a celery worker:
cd /path/to/xtract_file_service/
source venv/bin/activate
venv/bin/celery -A app.celery_app worker -Q celery,priority
In a third terminal, start the flask app:
cd /path/to/xtract_file_service
source venv/bin/activate
venv/bin/flask run
The server is a REST API with no GUI or HTML. All interactions are done using curl
or the xtract_file_service command
line interface. This section is documentation of how to interact with all of the server's features with curl
.
While you can use curl
to interact with the server, it is recommended that you use the xtract_file_service CLI
instead. To learn how to use the CLI, click here.
To create a new user, run:
curl -X POST -d '{"Username": "your_username", "Email": "your_email", "Password": "your_password"}' http://localhost:5000
- This will either return a success message, or a failure message if your input wasn't formatted correctly or if the username has already been taken.
To login, run:
curl -X GET -d '{"Username": "your_username", "Password": "your_password"}' http://localhost:5000/login
- This will return a user ID that will become your
"Authentication"
for other server interactions. Save or copy this so you won't have to login again.
To delete an account, run:
curl -X DELETE -H "Authentication: your_authentication" -d '{"Username": "your_username", "Password": "your_password"}' http://localhost:5000/delete_user
To upload files for automatic metadata processing, run:
curl -X POST -H "Authentication: your_authentication" -H "Extractor: extractor_name" -F "file=@/local/file/path.txt" http://localhost:5000/files
- Note: See the available extractors section below.
- This will return a task ID for the metadata processing job, which can be used to view the status of your job. Omitting
extractor_name
will still return a task ID but will not result in any metadata being processed. - Compressed files will be automatically be decompressed.
To view uploaded files, run:
curl -X GET -H "Authentication: your_authentication" http://localhost:5000/files
- This will return a string containing the names of uploaded files as well as their size.
To delete files, run:
curl -X DELETE -H "Authentication: your_authentication" -d filename http://localhost:5000/files
- This will return a success message or an error message if the file doesn't exist.
To view processed metadata, run:
curl -X GET -H "Authentication: your_authentication" -d filename http://localhost:5000/metadata
- This will return all metadata extracted for a given file.
To process metadata for an uploaded file, run:
curl -X POST -H "Authentication: your_authentication" -d '{"Filename": "filename", "Extractor": "extractor_name"}' http://localhost:5000/metadata
- Note: See the available extractors section below.
- Note: You cannot process metadata for a file and extractor if you have already done so for that file and extractor combination.
- This will return a task ID for the metadata processing job, which can be used to view the status of your job.
Omitting
extractor_name
will still return a task ID but will not result in any metadata being processed.
To delete all metadata for a file, run:
curl -X DELETE -H "Authentication: your_authentication" -d filename http://localhost:5000/metadata
- An additional
-H "Extractor: extractor
header can be passed to specify extractor metadata to delete forfilename
. If omitted, all metadata forfilename
will be deleted.
To view a task status, run:
curl -X GET -d task_id http://localhost:5000/tasks
- This returns a status message for the given task ID.
tabular
:
- Can process tabular/columnar files (.tsv, .csv, etc.).
- Returns preamble, headers, and means, medians, modes, max, min for each column.
- Tabular files containing preamble will automatically process the preamble using the
keyword
extractor.
- Tabular files containing preamble will automatically process the preamble using the
keyword
:
- Can process text files.
- Returns a list of keywords as well as scores for those keywords (calculated using (degree)/(frequency)).
jsonxml
:
- Can process .json/.xml style files.
- Returns depth, headers, columns, and all text found in the file.
- Json/xml files containing text will automatically process the text using the
keyword
extractor.
- Json/xml files containing text will automatically process the text using the
netcdf
:
- Can process netcdf style files.
- Returns attributes, dimensions, size, and variables.
image
:
- Can process any image file.
- Returns a classification of plot, map, map-plot, graphic, or photograph.
map
:
- Can process a map image (works better for maps with text or coordinates).
- Returns any text found from the image as well as locations on the map.
matio
:
- Can process material science files.
matio
was built using the MaterialsIO library. For a more extensive list of accepted file types and outputs, please visit their documentation.