A framework for the science of machine thinking
ThoughtSource is a central, open resource and community around data and tools related to chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and development.
- Create a repository of chain-of-thought (CoT) datasets converted to a unified format. ✅
- Create a conceptual model of different CoT reasoning styles and errors.
- Create tools for diagnosing, annotating and evaluating CoT reasoning.
- Provide models fine-tuned on high-quality CoT data.
- Apply CoT reasoning to high-impact use-cases such as biomedical research or clinical decision making.
Datasets can be browsed online through the Dataset Viewer 🔎.
We created dataloaders that give you access to the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format.
- commonsense_qa: Multiple-choice commonsense knowledge question-answering dataset (Talmor 2018), enriched with explanations from the ECQA dataset (Aggarwal 2021). License: Unknown for CommonsenseQA, Community Data License Agreement - Sharing - Version 1.0 for ECQA.
- strategy_qa: General-domain question-answering data from the StrategyQA dataset (Geva 2021). License: MIT.
- qed: General-domain question-answering data from the QED dataset (Lamm 2020). License: CC BY-SA 3.0.
- worldtree: Scientific question-answering data from the WorldTree v2 dataset (Xie 2020). License: Unknown.
- entailment_bank: Science exam questions with expert-authored explanations from the EntailmentBank dataset (Dalvi 2022). License: CC BY 4.0.
- open_book_qa: Scientific question-answering modeled after open book exams for assessing human understanding from the OpenBookQA dataset (Mihaylov 2018). License: Unknown.
- aqua: Math word problems from the AQUA-RAT (Algebra Question Answering with Rationales) dataset (Ling 2017). License: Apache 2.0.
- asdiv: Math word problems from the Academia Sinica Diverse MWP dataset (Miao 2020). License: Unknown.
- gsm8k: Math word problems from the GSM8K dataset (Cobbe 2021). License: MIT.
- mawps: Math word problems from MAWPS, the Math Word Problem Repository dataset (Koncel-Kedziorski 2016). License: Unknown.
- svamp: Math word problems from the SVAMP dataset (Patel 2021). License: MIT.
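To illustrate what "standardized chain-of-thought format" can mean in practice, here is a minimal sketch that maps two differently shaped source items onto one shared record type. The `CotRecord` class, its field names, and both input shapes are assumptions made for this example; they are not the actual ThoughtSource schema or dataloader API.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class CotRecord:
    """Hypothetical unified record; field names are illustrative only."""
    id: str
    dataset: str
    question: str
    choices: List[str] = field(default_factory=list)  # empty for free-text answers
    cot: List[str] = field(default_factory=list)      # reasoning steps, in order
    answer: Optional[str] = None


def from_math_item(idx: int, item: Dict[str, str]) -> CotRecord:
    # Assumed input shape: {"question": str, "rationale": "step\nstep", "answer": str}
    steps = [s.strip() for s in item["rationale"].split("\n") if s.strip()]
    return CotRecord(
        id=f"math_{idx}",
        dataset="math_example",
        question=item["question"],
        cot=steps,
        answer=item["answer"],
    )


def from_multiple_choice_item(idx: int, item: dict) -> CotRecord:
    # Assumed input shape: {"stem": str, "options": [str], "explanation": str, "label": int}
    return CotRecord(
        id=f"mc_{idx}",
        dataset="mc_example",
        question=item["stem"],
        choices=list(item["options"]),
        cot=[item["explanation"]],
        answer=item["options"][item["label"]],
    )


math_item = {
    "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "rationale": "Tom starts with 3 apples.\nHe buys 2 more, so 3 + 2 = 5.",
    "answer": "5",
}
mc_item = {
    "stem": "Where would you put a plate after washing it?",
    "options": ["cupboard", "oven", "garden"],
    "explanation": "Clean plates are stored in a cupboard.",
    "label": 0,
}

records = [from_math_item(0, math_item), from_multiple_choice_item(0, mc_item)]
rows = [asdict(r) for r in records]  # plain dicts that all share the same keys
```

Because every row has the same keys, a list like `rows` could be handed to any tabular dataset container; the real dataloaders may structure their records differently.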
We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets!
- dataloader: Library for creating and processing ThoughtSource datasets (based on the Hugging Face 🤗 Datasets library).
- dataset-viewer: Streamlit application for browsing ThoughtSource datasets.
- annotator: Web-based tool for annotating chain-of-thought data (soon to be released).
The annotator highlights similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.
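One simple way to find similarities between two reasoning chains is to compare their steps pairwise with a string-similarity score. The sketch below does this with Python's standard `difflib`; the `shared_steps` function and its `threshold` parameter are hypothetical and only illustrate the idea. It is not how the annotator is implemented.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def shared_steps(
    chain_a: List[str], chain_b: List[str], threshold: float = 0.8
) -> List[Tuple[int, int, float]]:
    """Return (index_in_a, index_in_b, similarity) for step pairs that look alike.

    Similarity is difflib's character-level ratio on lowercased text, which is
    a crude stand-in for whatever measure a real tool would use.
    """
    matches = []
    for i, step_a in enumerate(chain_a):
        for j, step_b in enumerate(chain_b):
            ratio = SequenceMatcher(None, step_a.lower(), step_b.lower()).ratio()
            if ratio >= threshold:
                matches.append((i, j, round(ratio, 2)))
    return matches


chain_a = [
    "Tom starts with 3 apples.",
    "He buys 2 more apples.",
    "So he has 5 apples.",
]
chain_b = [
    "Tom starts with 3 apples.",
    "Apples are a kind of fruit.",
    "So he has 5 apples.",
]

for i, j, score in shared_steps(chain_a, chain_b):
    print(f"chain A step {i} ~ chain B step {j} (similarity {score})")
```

Steps that appear in one chain but have no match in the other are the ones a reviewer would likely want to inspect first.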