Stars
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
Simple, unified interface to multiple Generative AI providers
Solve Visual Understanding with Reinforced VLMs
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.
Benchmark to evaluate different LLMs for pragmatic (individual/context-specific) harms
aider is AI pair programming in your terminal
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
We focus on the behavior of AI, and the Cyber Soul. We investigate the alignment dynamics with deliberately designed experiments.
Inspect: A framework for large language model evaluations
A trivial programmatic Llama 3 jailbreak. Sorry Zuck!
A collection of different ways to implement accessing and modifying internal model activations for LLMs
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
Evaluating LLMs with fewer examples
Explore what LLMs are really learning over SFT
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end-to-end problems using Llama mode…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Representation Engineering: A Top-Down Approach to AI Transparency
Solve puzzles. Improve your pytorch.
Conference schedule, top papers, and analysis of the data for NeurIPS 2023!
List of papers on hallucination detection in LLMs.
efosong / epymarl
Forked from uoe-agents/epymarl
An extension of the PyMARL codebase that includes additional algorithms and environment support