A framework for the science of machine thinking
ThoughtSource is a central, open resource and community around data and tools related to chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and development.
- Create a repository of chain-of-thought (CoT) datasets converted to a unified format. ✅
- Create a conceptual model of different CoT reasoning styles and errors.
- Create tools for diagnosing, annotating and evaluating CoT reasoning.
- Provide models fine-tuned on high-quality CoT data.
- Apply CoT reasoning to high-impact use-cases such as biomedical research or clinical decision making.
Datasets can be browsed online through the Dataset Viewer 🔎.
We created dataloaders that allow you to access the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format. We post-processed the source datasets (sometimes extensively) in different ways to create coherent reasoning chains.
- commonsense_qa: Multiple-choice commonsense question-answering dataset (Talmor 2018, License: Unknown). Reasoning chains from three different sources are included:
  - Human-generated reasoning chains derived from the ECQA dataset (Aggarwal 2021). Used as gold standard. License: Community Data License Agreement Sharing 1.0 (CDLA-Sharing-1.0).
  - AI-generated (few-shot prompting) reasoning chains from Wei 2022. Only available for the validation split. License: Unknown.
  - AI-generated (zero-shot prompting) reasoning chains from Kojima 2022. Only available for the validation split. License: Unknown.
- strategy_qa: General-domain question-answering data from the StrategyQA dataset (Geva 2021). License: MIT. Reasoning chains from three different sources are included:
  - Human-generated reasoning chains derived from the original dataset. Used as gold standard. License: MIT.
  - AI-generated (few-shot prompting) reasoning chains from Wei 2022. Only available for the train split. License: Unknown.
  - AI-generated (zero-shot prompting) reasoning chains from Kojima 2022. Only available for the train split. License: Unknown.
- qed: General-domain question-answering data and justifications from the QED dataset (Lamm 2020). License: CC BY-SA 3.0.
- worldtree: Scientific question-answering data from the WorldTree v2 dataset (Xie 2020). Human-generated reasoning chains derived from the original dataset. License: Unknown.
- entailment_bank: Science exam questions with expert-authored explanations from the EntailmentBank dataset (Dalvi 2022). Human-generated reasoning chains derived from the original dataset. License: CC BY 4.0. (Note: significant overlap with worldtree v2)
- open_book_qa: Scientific question-answering data from the OpenBookQA dataset (Mihaylov 2018), which is modeled after open-book exams for assessing human understanding. Human-generated reasoning chains derived from the original dataset. License: Unknown.
- aqua: Math word problems from the AQUA-RAT (Algebra Question Answering with Rationales) dataset (Ling 2017). Reasoning chains derived from the original dataset. License: Apache 2.0.
- asdiv: Math word problems from the Academia Sinica Diverse MWP dataset (Miao 2020). Reasoning chains derived from the original dataset. License: Unknown.
- gsm8k: Math word problems from the GSM8K dataset (Cobbe 2021). Reasoning chains derived from the original dataset. License: MIT.
- mawps: Math word problems from MAWPS, the Math Word Problem Repository dataset (Koncel-Kedziorski 2016). Reasoning chains derived from the original dataset. License: Unknown.
- svamp: Math word problems from the SVAMP dataset (Patel 2021). Reasoning chains derived from the original dataset. License: MIT.
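To make the idea of a unified chain-of-thought format more concrete, here is a minimal sketch of how a record in GSM8K's answer format could be converted. GSM8K answers consist of reasoning lines followed by a final `#### <answer>` line, with inline calculator annotations such as `<<48/2=24>>`. The `CoTRecord` class, its field names, and the `gsm8k_to_cot` function are hypothetical illustrations under that assumption, not ThoughtSource's actual schema or dataloader API, and the example text is only illustrative input.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoTRecord:
    """Hypothetical unified chain-of-thought record (illustrative field names)."""
    id: str
    question: str
    cot: List[str]  # reasoning chain, one step per entry
    answer: str
    choices: List[str] = field(default_factory=list)  # empty for free-form answers


# GSM8K embeds calculator annotations such as "<<48/2=24>>" in its rationales.
CALC_ANNOTATION = re.compile(r"<<[^>]*>>")


def gsm8k_to_cot(example_id: str, question: str, raw_answer: str) -> CoTRecord:
    """Split a GSM8K-style answer into reasoning steps and a final answer."""
    # The final answer follows the last "####" marker.
    reasoning, sep, final = raw_answer.rpartition("####")
    if not sep:
        raise ValueError("expected a final '#### <answer>' line")
    # Drop calculator annotations and blank lines; keep one step per line.
    steps = [
        CALC_ANNOTATION.sub("", line).strip()
        for line in reasoning.splitlines()
        if line.strip()
    ]
    return CoTRecord(id=example_id, question=question, cot=steps, answer=final.strip())


record = gsm8k_to_cot(
    "gsm8k_example",
    "Natalia sold clips to 48 of her friends in April, and then she sold half "
    "as many clips in May. How many clips did Natalia sell altogether?",
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72",
)
print(record.cot[0])  # → Natalia sold 48/2 = 24 clips in May.
print(record.answer)  # → 72
```

Storing the chain as a list of steps, rather than one string, keeps individual steps addressable for comparison or annotation; how the actual dataloaders segment chains is not specified here.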
We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets!
- dataloader: Library for creating and processing ThoughtSource datasets (based on the Hugging Face 🤗 Datasets library).
- dataset-viewer: Streamlit application for browsing ThoughtSource datasets
- annotator: Web-based tool for annotating chain-of-thought data (soon to be released)
The annotator highlights similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.
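How the annotator computes similarities is not described here. As a rough illustration of the idea only, the sketch below uses Python's standard `difflib.SequenceMatcher` to find word sequences shared by two reasoning chains, which is one simple way such overlaps could be highlighted. The function name `shared_spans` and the `min_words` threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def shared_spans(chain_a: str, chain_b: str, min_words: int = 3) -> List[str]:
    """Return word sequences of at least `min_words` words found in both chains."""
    words_a, words_b = chain_a.split(), chain_b.split()
    # autojunk=False keeps frequent words from being treated as junk in long chains.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        " ".join(words_a[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


chain_a = "There are 3 cars. 2 more arrive, so there are 3 + 2 = 5 cars."
chain_b = "Initially there are 3 cars. Then 2 more arrive, so 3 + 2 = 5 cars in total."
print(shared_spans(chain_a, chain_b))
# → ['are 3 cars.', '2 more arrive, so', '3 + 2 = 5']
```

Matching here is exact on whitespace-separated tokens, so case and punctuation matter ("There" does not match "there", "cars." does not match "cars"); a practical tool would likely normalize text first.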