Stars
A collaborative note-taking, wiki, and documentation platform that scales. Built with Django and React. Open-source alternative to Notion or Outline.
Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!
Lawma: A lightly fine-tuned Llama model for legal classification tasks.
Code to reproduce the experiments in the paper "Training on the Test Task Confounds Evaluation and Emergence."
BenchBench is a Python package to evaluate multi-task benchmarks.
Code to reproduce the paper "Do causal predictors generalize better to new domains?"
The accompanying code of "What Makes ImageNet Look Unlike LAION."
Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
Achieve error-rate fairness between societal groups for any score-based classifier.
Test-time training on nearest neighbors for large language models
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Compute the inverse of a matrix using the Gauss-Jordan method.
Datasets derived from US census data
This repository provides example code for loading and analyzing data from AHRQ's Medical Expenditure Panel Survey (MEPS). More information about the survey and access to public use data files is av…
Replication materials for "Measuring the predictability of life outcomes using a scientific mass collaboration"
A Python package to assess and improve fairness of machine learning models.
Package for typesetting a book into PDF and HTML using pandoc and a bunch of other tools
A Python sandbox for decision making in dynamics
Differentially private synthetic data
Compile Markdown into an HTML and PDF book based on pandoc.
A work-in-progress, open-source, multi-player city simulation game.
signal-cli provides an unofficial command-line, JSON-RPC, and D-Bus interface for the Signal messenger.
Starter files for using Pandoc Markdown with Tufte CSS
A tool that translates augmented Markdown into HTML or LaTeX