Stars
Agent Framework / shim to use Pydantic with LLMs
An Open Large Reasoning Model for Real-World Solutions
Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs)
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
Efficient Deep Learning Systems course materials (HSE, YSDA)
An open-source, unofficial community implementation of the model from the paper "Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks": https://surgical-robot-transformer.github.io/
Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
Collection of kernels written in Triton language
Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in PyTorch
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Contextual Object Detection with Multimodal Large Language Models
A UI-Focused Agent for Windows OS Interaction.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Scenic: A Jax Library for Computer Vision Research and Beyond
SVIT: Scaling up Visual Instruction Tuning
Data release for the ImageInWords (IIW) paper.
List of references and online resources related to data science, machine learning and deep learning.
Universal LLM Deployment Engine with ML Compilation
Using pre-trained Diffusion models as priors for inference tasks
Generative Diffusion Prior for Unified Image Restoration and Enhancement (CVPR2023)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
🧬 Generative modeling of regulatory DNA sequences with diffusion probabilistic models 💨
Medical Image Segmentation with Diffusion Model
v-objective diffusion inference code for PyTorch.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Dataset introduced in PlotQA: Reasoning over Scientific Plots