paperai: AI-powered literature discovery and review engine for medical/scientific papers

paperai is an AI-powered literature discovery and review engine for medical/scientific papers. paperai has been used to analyze the COVID-19 Open Research Dataset (CORD-19) and won multiple awards in the CORD-19 Kaggle challenge.

paperai builds an index over medical articles to assist in analysis and data discovery. With the CORD-19 challenge, a series of COVID-19 related research topics were explored to identify relevant articles and help find answers to key scientific questions. paperai can be applied to other medical and scientific research domains.

paperai and/or NeuML has been recognized in the following articles:

Installation

The easiest way to install is via pip and PyPI:

pip install paperai

You can also install paperai directly from GitHub. Using a Python Virtual Environment is recommended.

pip install git+https://github.com/neuml/paperai

Python 3.6+ is supported.

See the troubleshooting link for help resolving environment-specific install issues.

Building a model

paperai indexes databases previously built with paperetl. paperai currently supports querying SQLite databases.

To build an index for a SQLite articles database:

# Can optionally use pre-trained vectors
# https://www.kaggle.com/davidmezzetti/cord19-fasttext-vectors#cord19-300d.magnitude
# Default location: ~/.cord19/vectors/cord19-300d.magnitude
python -m paperai.vectors

# Build embeddings index
python -m paperai.index

The model will be stored in ~/.cord19.

See the CORD-19 Analysis with Sentence Embeddings notebook for a comprehensive example of paperai in action.

Building a report file

A report file is simply a markdown file created from a list of queries. An example report call:

python -m paperai.report tasks/risk-factors.yml

Once complete, a file named tasks/risk-factors.md will be created.
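To make the idea of "a markdown file created from a list of queries" concrete, here is a minimal sketch. It is not paperai's report code: `build_report` and `toy_search` are hypothetical names, the queries are passed as a plain Python list rather than read from a task YAML file (that file format is not described here), and the search function is a toy word-overlap scorer standing in for the embeddings search.

```python
def build_report(name, queries, search, limit=3):
    """Render one markdown section per query from (title, score) results."""
    lines = [f"# {name}", ""]
    for query in queries:
        lines.append(f"## {query}")
        lines.append("")
        results = search(query)[:limit]
        if not results:
            lines.append("No results found.")
        for title, score in results:
            lines.append(f"- {title} (score: {score:.2f})")
        lines.append("")
    return "\n".join(lines)


def toy_search(query):
    """Hypothetical stand-in for a real search: scores titles by word overlap."""
    articles = [
        "Hypertension and disease severity",
        "Smoking as a risk factor",
        "Vaccine trial design",
    ]
    terms = set(query.lower().split())
    scored = []
    for title in articles:
        overlap = len(terms & set(title.lower().split()))
        if overlap:
            scored.append((title, overlap / len(terms)))
    return sorted(scored, key=lambda result: result[1], reverse=True)


report = build_report("Risk Factors", ["smoking risk", "hypertension"], toy_search)
```

Writing `report` to a `.md` file would give a result in the same spirit as tasks/risk-factors.md, with one heading per query followed by its best matches.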

Running queries

The fastest way to run queries is to start a paperai shell:

paperai

A prompt will come up. Queries can be typed directly into the console.
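A prompt that treats every typed line as a query can be sketched with Python's standard `cmd` module. This is only an illustration of the interaction pattern, not paperai's shell: `QueryShell`, `run_queries` and the `search` callable are hypothetical names introduced for this example.

```python
import cmd
import io


class QueryShell(cmd.Cmd):
    """Hypothetical sketch: any line that is not a known command is run as a query."""

    intro = "query shell sketch, type quit to exit"
    prompt = "(query) "

    def __init__(self, search, **kwargs):
        super().__init__(**kwargs)
        self.search = search

    def default(self, line):
        # Called for input that does not match a do_* command
        for result in self.search(line):
            self.stdout.write(f"{result}\n")

    def emptyline(self):
        # Ignore empty input instead of repeating the previous query
        pass

    def do_quit(self, line):
        return True

    def do_EOF(self, line):
        # End of input also exits the loop
        return True


def run_queries(search, lines):
    """Feed lines to the shell non-interactively and return everything it printed."""
    out = io.StringIO()
    shell = QueryShell(search, stdin=io.StringIO("\n".join(lines) + "\n"), stdout=out)
    shell.use_rawinput = False  # read from the supplied stdin instead of input()
    shell.cmdloop()
    return out.getvalue()
```

For interactive use, `QueryShell(search).cmdloop()` would read from the terminal. One limitation of this design is that queries beginning with a built-in command word such as `help` or `quit` are interpreted as commands.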

Tech Overview

The tech stack is built on Python and creates a sentence embeddings index with FastText + BM25. Background on this method can be found in this Medium article and in codequestion, an existing repository that uses this method.
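The general idea behind combining word vectors with BM25 can be sketched as a BM25-weighted average of the word vectors in each sentence, compared by cosine similarity. The sketch below is an assumption about the technique in general, not paperai's implementation: the 2-d vectors are toy values standing in for FastText output, `K1` and `B` are commonly used BM25 defaults, and all function names are hypothetical.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # commonly used BM25 parameters, assumed here


def build_stats(corpus):
    """Compute BM25 IDF per token and the average sentence length."""
    n = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))
    idf = {t: math.log(1 + (n - df + 0.5) / (df + 0.5)) for t, df in doc_freq.items()}
    avg_len = sum(len(tokens) for tokens in corpus) / n
    return idf, avg_len


def bm25_weights(tokens, idf, avg_len):
    """BM25 weight of each distinct token within one sentence."""
    freq = Counter(tokens)
    norm = K1 * (1 - B + B * len(tokens) / avg_len)
    return {t: idf.get(t, 0.0) * tf * (K1 + 1) / (tf + norm) for t, tf in freq.items()}


def embed(tokens, vectors, idf, avg_len):
    """Weighted average of word vectors; tokens without a vector are skipped."""
    dims = len(next(iter(vectors.values())))
    total, weight_sum = [0.0] * dims, 0.0
    for token, weight in bm25_weights(tokens, idf, avg_len).items():
        if token in vectors:
            total = [x + weight * v for x, v in zip(total, vectors[token])]
            weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: 2-d vectors stand in for real FastText word vectors
VECTORS = {"virus": [1.0, 0.0], "infection": [0.9, 0.1],
           "market": [0.0, 1.0], "price": [0.1, 0.9]}
CORPUS = [["virus", "infection"], ["market", "price"], ["virus", "market"]]

IDF, AVG_LEN = build_stats(CORPUS)
INDEX = [embed(tokens, VECTORS, IDF, AVG_LEN) for tokens in CORPUS]


def search(query_tokens):
    """Return the position of the most similar sentence in CORPUS."""
    query = embed(query_tokens, VECTORS, IDF, AVG_LEN)
    scores = [cosine(query, vector) for vector in INDEX]
    return max(range(len(scores)), key=scores.__getitem__)
```

The weighting means rare, informative tokens pull the sentence vector toward their own direction more strongly than common ones, which is the motivation for pairing BM25 with plain vector averaging.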

The model is a combination of the sentence embeddings index and a SQLite database with the articles. Each article is parsed into sentences and stored in SQLite along with the article metadata. FastText vectors are built over the full corpus. The sentence embeddings index only uses tagged articles, which helps produce the most relevant results.
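The storage side of this design can be illustrated with the standard `sqlite3` module. The schema below is hypothetical (table names, column names and the `tagged` flag are assumptions for this example, not the layout paperetl produces), and the sentence splitter is deliberately naive.

```python
import re
import sqlite3


def create_schema(conn):
    """Hypothetical schema: one row per article, one row per sentence."""
    conn.executescript("""
        CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT, published TEXT, tagged INTEGER);
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, article TEXT REFERENCES articles(id), text TEXT);
    """)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def add_article(conn, uid, title, published, text, tagged):
    """Store article metadata and each of its sentences."""
    conn.execute("INSERT INTO articles VALUES (?, ?, ?, ?)",
                 (uid, title, published, int(tagged)))
    conn.executemany("INSERT INTO sentences (article, text) VALUES (?, ?)",
                     [(uid, sentence) for sentence in split_sentences(text)])


def indexable_sentences(conn):
    """Only sentences from tagged articles would be fed to the embeddings index."""
    rows = conn.execute(
        "SELECT s.id, s.text FROM sentences s "
        "JOIN articles a ON a.id = s.article "
        "WHERE a.tagged = 1 ORDER BY s.id")
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_article(conn, "a1", "Risk factors", "2020-04-01",
            "Age is a risk factor. Smoking may be one too.", True)
add_article(conn, "a2", "Unrelated", "2019-01-01",
            "This article is untagged.", False)
```

Here all three sentences are stored, but `indexable_sentences(conn)` returns only the two from the tagged article, mirroring the split between the full corpus and the subset used for the sentence embeddings index.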

Multiple entry points exist to interact with the model.

paperai.report - Builds a markdown report for a series of queries. For each query, the report shows the best articles, the top matches from those articles, and a highlights section with the most relevant sections from the embeddings search for the query.
paperai.query - Runs a single query from the terminal.
paperai.shell - Allows running multiple queries from the terminal.
