Struggling to keep up with the latest AI research papers? ThreadGPT is here to help. It transforms complex academic papers into concise, easy-to-understand threads. It not only summarizes the text but also incorporates relevant figures, tables, and visuals from the papers directly into the threads.
Examples of threads generated by ThreadGPT (@paper_threadoor)
git clone https://github.com/wiskojo/thread-gpt
# Install PyTorch, torchvision, and torchaudio
# Please refer to the official PyTorch website (https://pytorch.org) for the installation command that matches your system. Example:
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
# Install all other dependencies
pip install -r requirements.txt
Copy the .env.template file and fill in your OPENAI_API_KEY.
cp .env.template .env
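Before moving on, it can help to confirm that the key actually made it into the .env file. The following is a minimal sketch and not part of ThreadGPT: the helper names `read_env_file` and `missing_keys` are hypothetical, and it assumes the file uses plain KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path, required=("OPENAI_API_KEY",)):
    """Return the required keys that are absent or empty in the file."""
    values = read_env_file(path)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; copy .env.template first.")
    else:
        missing = missing_keys(env_path)
        print("Missing keys:", missing if missing else "none")
```

The same helper could check the X/Twitter credentials by passing `required=("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")`.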
Before proceeding, please ensure that all the installation steps have been successfully completed.
Please be aware that using GPT-4 with the Assistants API can incur high costs. Monitor your usage and make sure you understand OpenAI's pricing before proceeding.
python app.py
To create a thread, you can either provide a URL to a file or a local path to a file. Use the following commands:
# For a URL
python thread.py <URL_TO_PDF>
# For a local file
python thread.py <LOCAL_PATH_TO_PDF>
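Because the same argument can be either a URL or a local path, a wrapper script may want to tell the two apart and fail early on a mistyped path. This is a sketch under assumptions, not how thread.py itself handles its input: `classify_source` and `check_source` are hypothetical helpers, and only http and https links are treated as URLs.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Return "url" for http(s) links, otherwise "path"."""
    scheme = urlparse(source).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


def check_source(source):
    """Raise early if a local PDF path does not exist."""
    kind = classify_source(source)
    if kind == "path" and not Path(source).expanduser().is_file():
        raise FileNotFoundError(f"No such file: {source}")
    return kind
```

Checking the scheme explicitly, rather than just looking for a colon, keeps Windows paths with drive letters from being mistaken for URLs.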
By default, you will find all outputs under ./data/<PDF_NAME>. It will have the following structure.
./data/<PDF_NAME>/
├── figures/
│   ├── <figure_1_name>.jpg
│   ├── <figure_2_name>.png
│   └── ...
├── <PDF_NAME>.pdf
├── results.json
├── thread.json
├── processed_thread.json
└── processed_thread.md
The final output for user consumption is located at ./data/<PDF_NAME>/processed_thread.md. This file is formatted in Markdown and can be conveniently viewed using any Markdown editor.
figures/: This directory contains all the figures, tables, and visuals that have been extracted from the paper.
<PDF_NAME>.pdf: This is the original PDF file.
results.json: This file contains the results of the layout parsing. It includes an index of all figures, their paths, and captions that were passed to OpenAI.
thread.json: This file contains the raw thread that was generated by OpenAI before any post-processing was done.
processed_thread.json: This file is a post-processed version of thread.json. The post-processing includes steps such as removing source annotations and duplicate figures.
processed_thread.md: This is a Markdown version of processed_thread.json. It is the final output provided for user consumption.
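To confirm that a run produced everything listed above, the layout can be checked with a few lines of Python. This is only a sketch: `missing_outputs` is a hypothetical helper, and it assumes the directory name matches <PDF_NAME> as in the structure shown.

```python
from pathlib import Path

# File names taken from the output structure described above.
EXPECTED_FILES = (
    "results.json",
    "thread.json",
    "processed_thread.json",
    "processed_thread.md",
)


def missing_outputs(output_dir):
    """List expected entries that are absent from an output directory."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not (root / "figures").is_dir():
        missing.append("figures/")
    # The original PDF is stored under the same name as the directory.
    if not (root / f"{root.name}.pdf").is_file():
        missing.append(f"{root.name}.pdf")
    return missing
```

An empty list means the directory is complete. If only thread.json is present and the processed files are missing, post-processing probably did not finish.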
To actually share the thread on X/Twitter, you need to set up the credentials in the .env file. This requires creating a developer account and filling in your CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, and ACCESS_SECRET. Then run this command on the created JSON file:
python tweet.py ./data/<PDF_NAME>/processed_thread.json
ThreadGPT uses OpenAI's Assistants API. To customize the assistant's behavior, modify the create_assistant.py file. This script has defaults for the prompt, name, tools, and model (gpt-4-1106-preview). You can customize these parameters to your liking.
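As a rough illustration of what those four settings look like when passed to the OpenAI Python SDK, here is a hedged sketch. It is not the contents of create_assistant.py: the function names, the default name, the instructions text, and the `code_interpreter` tool are all placeholders chosen for the example, and only the model string comes from the text above.

```python
def build_assistant_params(
    name="ThreadGPT",  # placeholder name, not necessarily the project's default
    instructions="Summarize the attached paper as a thread.",  # placeholder prompt
    model="gpt-4-1106-preview",
    tools=None,
):
    """Collect the four customizable settings in one dictionary."""
    return {
        "name": name,
        "instructions": instructions,
        "model": model,
        # Placeholder tool; the project's actual tool list may differ.
        "tools": tools if tools is not None else [{"type": "code_interpreter"}],
    }


def create_assistant(params):
    """Create the assistant; needs the openai package and OPENAI_API_KEY."""
    # Imported lazily so build_assistant_params has no dependencies.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.beta.assistants.create(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review or print a configuration before sending a request that incurs cost.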