Starred repositories
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
Low-latency ONNX- and TensorRT-based zero-shot classification and detection with prompts based on contrastive language-image pre-training
An open source implementation of CLIP.
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
🔊 Text-Prompted Generative Audio Model
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition (ASR), and text-to-spee…
Skip YouTube video sponsors (browser extension)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
llama3 implementation one matrix multiplication at a time
Reaching LLaMA2 Performance with 0.1M Dollars
A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
Visualize Your Ideas With Code
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
JavaScript syntax tree transformer, nondestructive pretty-printer, and automatic source map generator
An incremental parsing system for programming tools
GritQL is a query language for searching, linting, and modifying code.
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
Simple image captioning model
Rust+OpenCL+AVX2 implementation of LLaMA inference code
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, code, and related websites
A PyTorch implementation of EmpiricalMVM
A Survey on video and language understanding.
A high-throughput and memory-efficient inference and serving engine for LLMs
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
Display an image created by Vulkan compute shader, with OpenGL
An open-source framework for training large multimodal models.