Skip to content

wlg1/Easy-Transformer

 
 

Repository files navigation

This repository contains the code for all experiments in the paper "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small" (Wang et al, 2022).

This is intended as a one-time code drop. The authors recommend those interested in mechanistic interpretability using the Transformer Lens library. Contact [email protected] or comment on this PR (sadly issues don't work for forks) for proposed changes.

Quick Start

See and run the experiments on Google Colab: https://colab.research.google.com/drive/1n4Wgulv5ev5rgRUL7ypOw0odga9LEWHA?usp=sharing .

Setup

Option 1) install with pip

pip install git+https://github.com/redwoodresearch/Easy-Transformer.git

Option 2) clone repository (for development, and finer tuning)

git clone https://github.com/redwoodresearch/Easy-Transformer/
pip install -r requirements.txt

In this repo

In this repo, you can find the following notebooks (some are in easy_transformer/):

  • experiments.py: a notebook of several of the most interesting experiments of the IOI project.
  • completeness.py: a notebook that generate the completeness plots in the paper, and implements the completeness functions.
  • minimality.py: as above for minimality.
  • advex.py: a notebook that generates adversarial examples as in the paper. `

Easy Transformer

(later renamed "TransformerLens")

An implementation of transformers tailored for mechanistic interpretability.

It supports the importation of open sources models, a convenient handling of hooks to get access to intermediate activations and features to perform simple emperiments such as ablations and patching.

A demo notebook can be found here and a more comprehensive description of the library can be found here

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%