Starred repositories
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral, ACL 2024 SRW
A Mutation Testing Framework of In-Context Learning Systems
Code and example data for the paper: Rule Based Rewards for Language Model Safety
Code release for "Improved baselines for vision-language pre-training"
Schedule-Free Optimization in PyTorch
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Utilities intended for use with Llama models.
Official code for NeurIPS 2023 paper "Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding".
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
The nnsight package enables interpreting and manipulating the internals of deep learned models.
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
A library for mechanistic interpretability of GPT-style language models
Diffusion on syntax trees for program synthesis
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift. ICML 2024 and ICLRW-DMLR 2024
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥