AI Model Information Extractor

This project automatically extracts and analyzes information about AI models from academic papers. It processes both PDF and LaTeX sources, extracts text and images, and uses large language models (Claude and GPT-4) to answer specific questions about the models described in the papers.

Features

Paper acquisition from sources like arXiv
Content extraction from PDF and LaTeX files
Text and image analysis using advanced AI models (Claude and GPT-4)
Information extraction for various model fields (e.g., parameters, training compute, dataset size)
Reasoning and calculation based on extracted information
User interface for validation and results viewing
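
As an illustration of the "reasoning and calculation" feature, the sketch below derives a training-compute estimate from two extracted fields. It assumes the common rule of thumb for dense transformers, compute ≈ 6 × parameters × training tokens; the `ExtractedFields` container and the `estimate_training_compute` helper are hypothetical names, and the project's own calculation logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedFields:
    """Hypothetical container for values pulled from a paper."""
    parameters: Optional[float] = None        # number of model parameters
    training_tokens: Optional[float] = None   # number of training tokens
    training_compute: Optional[float] = None  # FLOP, if the paper reports it


def estimate_training_compute(fields: ExtractedFields) -> Optional[float]:
    """Return training compute in FLOP, preferring the value reported in the paper."""
    if fields.training_compute is not None:
        return fields.training_compute
    if fields.parameters is None or fields.training_tokens is None:
        return None  # not enough information to estimate
    # Assumed rule of thumb: about 6 FLOP per parameter per training token.
    return 6.0 * fields.parameters * fields.training_tokens


if __name__ == "__main__":
    # Example values chosen for illustration only.
    fields = ExtractedFields(parameters=175e9, training_tokens=300e9)
    print(f"{estimate_training_compute(fields):.2e} FLOP")
```

Preferring a reported value over a derived one keeps the estimate as a fallback, which is usually the safer default when a paper states its compute budget directly.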

Installation

Clone the repository
Install the required dependencies:

pip install -r requirements.txt

Set up your environment variables:

Create a .env file in the root directory and add your OpenAI API key:

OPENAI_API_KEY=your_api_key_here
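
For reference, here is a minimal standard-library sketch of how a .env file like the one above could be read at startup. The helpers `load_env_file` and `require_key` are hypothetical; the project may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines and export them through os.environ.

    Hypothetical helper: variables already set in the environment are kept.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def require_key(name: str) -> str:
    """Return an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `load_env_file()` followed by `require_key("OPENAI_API_KEY")` early in the program would surface a missing key before any paper is processed.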

Usage

Run the main script to process a paper:

python main.py

The script will download the paper, extract information, and present a user interface for validation and viewing results.
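
The download step can be pictured as mapping an arXiv reference to the PDF and LaTeX source locations. The sketch below assumes arXiv's public URL patterns (`https://arxiv.org/pdf/<id>` and `https://arxiv.org/e-print/<id>`); the function names are hypothetical and are not taken from this project's code.

```python
import re
import urllib.request
from pathlib import Path

# New-style arXiv identifiers such as 2005.14165 or 2005.14165v4.
# Old-style identifiers (for example hep-th/9901001) are not handled here.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def parse_arxiv_id(reference: str) -> str:
    """Extract an arXiv identifier from a bare ID or an arxiv.org URL."""
    match = _ARXIV_ID.search(reference)
    if match is None:
        raise ValueError(f"no arXiv identifier found in {reference!r}")
    return match.group(0)


def arxiv_urls(arxiv_id: str) -> dict:
    """Return the PDF and source (LaTeX bundle) URLs for an identifier."""
    return {
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        "source": f"https://arxiv.org/e-print/{arxiv_id}",
    }


def download(url: str, destination: Path) -> Path:
    """Fetch a URL to disk; this performs a network request when called."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        destination.write_bytes(response.read())
    return destination


if __name__ == "__main__":
    paper_id = parse_arxiv_id("https://arxiv.org/abs/2005.14165v4")
    print(arxiv_urls(paper_id))
```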

Project Structure

src/: Contains the main source code
- paper_acquisition/: Handles downloading papers
- content_extraction/: Processes PDF and LaTeX files
- information_extraction/: Analyzes text and images
- reasoning/: Performs calculations and reasoning on extracted data
- user_interface/: Provides a GUI for validation and results viewing
tests/: Contains unit tests
data/: Stores downloaded papers and extracted data
config/: Contains configuration files, including questions.yaml

Key Components

PaperDownloader: Downloads papers from sources like arXiv
PDFProcessor and LaTeXProcessor: Extract content from papers
TextAnalyzer and ImageAnalyzer: Analyze extracted content
PromptingSystem: Manages interactions with AI models for information extraction
ReasoningCalculator: Performs final calculations and reasoning
ValidationInterface and ResultsViewer: Provide user interfaces for interaction
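
One way to picture how these components fit together is as a chain of stages. The sketch below is an assumption about the overall flow, not the project's actual API: `PaperContent`, `Pipeline`, and every field and method name are hypothetical, and each stage is passed in as a plain callable so the example runs without network access or API keys.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PaperContent:
    """Hypothetical output of PDFProcessor or LaTeXProcessor."""
    text: str
    image_paths: List[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """Chains the stages named above; each stage is injected as a callable."""
    download: Callable[[str], str]                      # PaperDownloader: reference -> local path
    extract: Callable[[str], PaperContent]              # PDFProcessor or LaTeXProcessor
    answer: Callable[[PaperContent, str], str]          # PromptingSystem: content + question -> answer
    reason: Callable[[Dict[str, str]], Dict[str, str]]  # ReasoningCalculator

    def run(self, reference: str, questions: List[str]) -> Dict[str, str]:
        path = self.download(reference)
        content = self.extract(path)
        answers = {q: self.answer(content, q) for q in questions}
        return self.reason(answers)


if __name__ == "__main__":
    # Stub stages stand in for the real components.
    demo = Pipeline(
        download=lambda ref: f"data/{ref}.pdf",
        extract=lambda path: PaperContent(text=f"contents of {path}"),
        answer=lambda content, q: f"answer to {q!r}",
        reason=lambda answers: answers,
    )
    print(demo.run("example-paper", ["How many parameters does the model have?"]))
```

Injecting the stages keeps each one replaceable in tests, for example by swapping the model-backed `answer` stage for a stub.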

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

JayThibs/epoch-paper-extractor
