Stars
A fork of Anthropic Computer Use that you can run on Mac computers to give Claude and other AI models autonomous access to your computer.
Memento is a Python app that records everything you do on your computer and lets you go back in time, search, and chat with an LLM (Large Language Model) to retrieve information about what you did.
Official repository for our work on micro-budget training of large-scale diffusion models.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
RAGEN is the first open-source reproduction of DeepSeek-R1 on AGENT training.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Benchmarking physical understanding in generative video models
[ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos
Large World Model -- Modeling Text and Video with Millions Context
Implementation of the Snake game based on a diffusion model
A generative world for general-purpose robotics & embodied AI learning.
A small open-source 3D agent simulator based on LLMs.
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Everything about the SmolLM2 and SmolVLM family of models
experiment with concept encoding in LLM
Official Implementation of Iterative Graph Alignment https://arxiv.org/abs/2408.16667
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
NeurIPS 2024 tutorial on LLM Inference
Next Generation Visual Programming System
End-to-end Generative Optimization for AI Agents
A simple OpenAI Gym environment for single and multi-agent reinforcement learning
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Visualize any repo or codebase into diagram or animation
A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.
Original implementation of "3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes"
Model Context Protocol Servers
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?