Effortlessly transform academic papers into concise, easy-to-understand threads 🧵✨📄

wiskojo/thread-gpt

ThreadGPT

Struggling to keep up with the latest AI research papers? ThreadGPT is here to help. It transforms complex academic papers into concise, easy-to-understand threads. It not only summarizes the text but also embeds relevant figures, tables, and visuals from the paper directly in the thread. 🧵✨📄

Gradio UI
Gradio App UI

Example Threads
Examples of threads generated by ThreadGPT (@paper_threadoor)

πŸ› οΈ Installation

Clone the repo

git clone https://github.com/wiskojo/thread-gpt

Install dependencies

# Install PyTorch and torchvision
# Refer to the official PyTorch website (https://pytorch.org) for the installation command that matches your system. Example:
pip install torch==2.0.0 torchvision==0.15.1

# Install all other dependencies
pip install -r requirements.txt

Configure environment variables

Copy the .env.template file and fill in your OPENAI_API_KEY.

cp .env.template .env
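
Before launching the app, it can help to confirm that the key was actually filled in. The following is a minimal sketch, not part of ThreadGPT: it assumes the .env file uses plain KEY=VALUE lines, and the helper names parse_env and has_openai_key are hypothetical.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def has_openai_key(env_path: str = ".env") -> bool:
    """Return True if env_path exists and defines a non-empty OPENAI_API_KEY."""
    path = Path(env_path)
    if not path.is_file():
        return False
    return bool(parse_env(path.read_text(encoding="utf-8")).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    print("OPENAI_API_KEY set:", has_openai_key())
```

Run it from the repository root so that the default .env path resolves correctly.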

🚀 Getting Started

Before proceeding, please ensure that all the installation steps have been successfully completed.

🚨 Cost Warning

Using GPT-4 with the Assistants API can incur high costs. Monitor your usage and review OpenAI's pricing details before proceeding.

Gradio

python app.py

CLI

🧵 Create Thread

To create a thread, you can either provide a URL to a file or a local path to a file. Use the following commands:

# For a URL
python thread.py <URL_TO_PDF>

# For a local file
python thread.py <LOCAL_PATH_TO_PDF>

By default, all outputs are written to ./data/<PDF_NAME>, which has the following structure:

./data/<PDF_NAME>/
├── figures/
│   ├── <figure_1_name>.jpg
│   ├── <figure_2_name>.png
│   └── ...
├── <PDF_NAME>.pdf
├── results.json
├── thread.json
├── processed_thread.json
└── processed_thread.md

The final output for user consumption is located at ./data/<PDF_NAME>/processed_thread.md. This file is formatted in Markdown and can be conveniently viewed using any Markdown editor.

All Contents

  1. figures/: This directory contains all the figures, tables, and visuals that have been extracted from the paper.
  2. <PDF_NAME>.pdf: This is the original PDF file.
  3. results.json: This file contains the results of the layout parsing. It includes an index of all figures, their paths, and captions that were passed to OpenAI.
  4. thread.json: This file contains the raw thread that was generated by OpenAI before any post-processing was done.
  5. processed_thread.json: This file is a post-processed version of thread.json. The post-processing includes steps such as removing source annotations and duplicate figures.
  6. processed_thread.md: This is a Markdown version of processed_thread.json. It is the final output provided for user consumption.
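
To check that a run produced everything listed above, a small script can compare the output directory against the expected names. This is a minimal sketch that relies only on the layout documented here; the function name missing_outputs is hypothetical, and it assumes the PDF copy is named after its directory, as the tree above suggests.

```python
from pathlib import Path

# File names taken from the documented output structure.
EXPECTED_FILES = (
    "results.json",
    "thread.json",
    "processed_thread.json",
    "processed_thread.md",
)


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected entries that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Assumption: the original PDF is stored as <PDF_NAME>.pdf inside <PDF_NAME>/.
    pdf_name = f"{root.name}.pdf"
    if not (root / pdf_name).is_file():
        missing.append(pdf_name)
    if not (root / "figures").is_dir():
        missing.append("figures/")
    return missing
```

For example, calling missing_outputs("./data/<PDF_NAME>") with the real directory name returns an empty list when the run completed and every documented file is present.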

📨 Share Thread

To share the thread on X/Twitter, you need to set up credentials in the .env file. This requires creating a developer account and filling in your CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, and ACCESS_SECRET. Then run this command on the generated JSON file:

python tweet.py ./data/<PDF_NAME>/processed_thread.json
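
A quick pre-flight check can catch an empty credential before anything is posted. The sketch below is an illustration, not part of tweet.py: it assumes the four values are visible in the process environment (for example, after the .env file has been loaded), and the function name missing_credentials is hypothetical.

```python
import os
from typing import Mapping

# Credential names as listed in the instructions above.
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required credential names that are unset or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All X/Twitter credentials are set.")
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without touching real credentials.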

🔧 Customize Assistant

ThreadGPT uses OpenAI's Assistants API. To customize the assistant's behavior, modify the create_assistant.py file. This script defines defaults for the prompt, name, tools, and model (gpt-4-1106-preview), and you can change any of these parameters to your liking.
