Stars
A modular graph-based Retrieval-Augmented Generation (RAG) system
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Repository for 360° panorama image generation based on Stable Diffusion
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
CUDA accelerated rasterization of gaussian splatting
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Inference and training library for high-quality TTS models.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
An automation tool that enumerates subdomains, filters out XSS, SQLi, open redirect, LFI, SSRF, and RCE parameters, and then scans for vulnerabilities.
Reverse Engineering: Decompiling Binary Code with Large Language Models
Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication feat…
V3D: Video Diffusion Models are Effective 3D Generators
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Upload a photo of your room to generate your dream room with AI.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
SpeechGPT Series: Speech Large Language Models
A fancy self-hosted monitoring tool
Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)
Instant voice cloning by MIT and MyShell.