llm-testbed

Python 3 interface used to extract data from PubMed publications using LLMs, part of the PubLLican project.

See the Experimental branch for the latest updates, although it may not run without configuration changes.

Setup

Create and activate a virtual environment if your IDE does not do so automatically
Install package dependencies by running pip install -r requirements.txt
Create .env file by running cp .env.example .env
Be careful: this will overwrite your current .env file if you already have one set up
Add any API keys or other environment variables to .env file
Create a config file by running cp config.json.example config.json
Be careful: this will overwrite your current config.json file if you already have one set up
Run setup script by running python setup.py
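Because both cp commands above overwrite an existing file, a copy that skips existing targets may be safer. The following is a minimal sketch, not part of the repository: the helper name copy_if_missing and its root parameter are hypothetical, and only the two file names come from the steps above.

```python
import shutil
from pathlib import Path


def copy_if_missing(example, target, root="."):
    """Copy `example` to `target` inside `root` unless `target` already exists.

    Returns True if a copy was made, False if the existing file was kept.
    (Hypothetical helper, not part of llm-testbed.)
    """
    src = Path(root) / example
    dst = Path(root) / target
    if dst.exists():
        # Leave the user's existing file untouched.
        return False
    shutil.copyfile(src, dst)
    return True
```

Calling copy_if_missing(".env.example", ".env") and copy_if_missing("config.json.example", "config.json") from the repository root would then create each file only when it is not already present.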

Configuration

Most settings can be changed in config.json. The field names are largely self-explanatory.

Changing LLM

In the config file, there is a field called "llm", which looks something like this:

{
  "llm": {
    "current": {
      "type": "anthropic",
      "model": "claude-3-haiku-20240307"
    }
  },
  "rest of config.json file..."
}

The type parameter tells the llms package which kind of model is in use, and therefore which code to run for it. The currently supported types are:

Type	Description	Requirements
`anthropic`	Anthropic's language-based models e.g. Claude	`$ANTHROPIC_API_KEY` environment variable must be set
`openai`	OpenAI's language-based models e.g. ChatGPT	`$OPENAI_API_KEY` environment variable must be set
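The requirements in the table can be checked before any request is sent. This is a minimal sketch under assumptions: the function name check_llm_environment and the REQUIRED_ENV_VARS mapping are hypothetical, not the llms package's actual code, and only the type names and variable names are taken from the table.

```python
import os

# Mapping taken from the table above: LLM type -> required environment variable.
REQUIRED_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def check_llm_environment(llm_type, environ=None):
    """Return the name of the environment variable required by `llm_type`.

    Raises ValueError for an unsupported type and RuntimeError when the
    variable is unset or empty. (Hypothetical helper for illustration.)
    """
    environ = os.environ if environ is None else environ
    if llm_type not in REQUIRED_ENV_VARS:
        supported = ", ".join(sorted(REQUIRED_ENV_VARS))
        raise ValueError(
            f"Unsupported LLM type {llm_type!r}; expected one of: {supported}"
        )
    var_name = REQUIRED_ENV_VARS[llm_type]
    if not environ.get(var_name):
        raise RuntimeError(f"{var_name} must be set to use the {llm_type!r} type")
    return var_name
```

Passing a plain dictionary as environ makes the check easy to try without touching the real environment.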

The model parameter tells the API which specific model to use (if applicable). See the provider's API documentation for the available model names.
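To illustrate how the "llm" field might be read, here is a minimal sketch. It assumes the structure shown in the example above and treats "model" as optional, since the README says it applies only "if applicable". The function name read_current_llm is hypothetical and does not describe how the llms package actually loads its configuration.

```python
import json


def read_current_llm(config_text):
    """Return (type, model) from the "llm" -> "current" entry of a config document."""
    config = json.loads(config_text)
    current = config["llm"]["current"]
    # "model" is treated as optional here, so fall back to None when it is absent.
    return current["type"], current.get("model")


# A trimmed-down config containing only the "llm" field from the example above.
example = """
{
  "llm": {
    "current": {
      "type": "anthropic",
      "model": "claude-3-haiku-20240307"
    }
  }
}
"""

llm_type, model = read_current_llm(example)
print(llm_type, model)  # anthropic claude-3-haiku-20240307
```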

PRs adding support for more LLMs are welcome.

Running the workflow

(Pipeline for the whole workflow is coming soon. For now, the steps can be run manually.)

To run the workflow manually:

Download the paper. There are two options:
- To get the paper JSON (preferred), run python getPaperJSON.py <pmid>
- To get the paper PDF, run python getPaperPDF.py <pmid>
Note that not every publication has a downloadable PDF; in that case, use getPaperJSON instead.
Convert the paper into plaintext
- If getPaperJSON was used, run python getTextFromJSON.py <pmid>
- If getPaperPDF was used, run python getTextFromPDF.py <pmid>
Query the LLM for the paper's species by running python getPaperSpecies.py <pmid>
Query the LLM for the paper's genes by running python getPaperGenes.py <pmid>
Query the LLM for the paper's GO terms by running python getPaperGOTerms.py <pmid>
Validate the GO terms by running python validateGOTermDescriptions.py <pmid>
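Until the full pipeline is available, the manual steps above could be chained with a small driver. This is a sketch under assumptions, not the project's pipeline: the functions build_workflow_commands and run_workflow are hypothetical, the gene-extraction script is taken to be getPaperGenes.py, and each script is assumed to accept the PMID as its only argument and to exit with a non-zero status on failure.

```python
import subprocess
import sys


def build_workflow_commands(pmid, source="json"):
    """Return the commands for the manual workflow, in order.

    `source` selects the download route: "json" (preferred) or "pdf".
    """
    if source not in ("json", "pdf"):
        raise ValueError("source must be 'json' or 'pdf'")
    suffix = "JSON" if source == "json" else "PDF"
    scripts = [
        f"getPaper{suffix}.py",           # download the paper
        f"getTextFrom{suffix}.py",        # convert it into plaintext
        "getPaperSpecies.py",             # query the LLM for species
        "getPaperGenes.py",               # query the LLM for genes
        "getPaperGOTerms.py",             # query the LLM for GO terms
        "validateGOTermDescriptions.py",  # validate the GO terms
    ]
    # sys.executable keeps every step in the active virtual environment.
    return [[sys.executable, script, str(pmid)] for script in scripts]


def run_workflow(pmid, source="json"):
    """Run each step in turn, stopping at the first one that fails."""
    for command in build_workflow_commands(pmid, source):
        print("Running:", " ".join(command[1:]))
        subprocess.run(command, check=True)
```

Calling run_workflow("<pmid>") from the repository root, with a real PMID substituted, would run the JSON route; passing source="pdf" would switch to the PDF scripts.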

tonyatliv/llm-testbed
