Stars
Project page repository for OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
Fay is an agent framework that helps digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) connect to business systems.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
No fortress, purely open ground. OpenManus is Coming.
Unitree robot SDK version 2. https://support.unitree.com/home/zh/developer
Use GitHub Actions to automatically get the Microsoft Edge offline installation package.
Make websites accessible for AI agents
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
This is a speech interaction system built on open-source models, chaining ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …
Real-time voice-interactive digital human, supporting both an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). The avatar and voice can be customized without training, voice cloning is supported, and first-packet latency is as low as 3 s.
The customization marketplace for Windows programs: https://windhawk.net/
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
A framework that helps you quickly build AI-native IDE products. Includes an MCP client that supports Model Context Protocol (MCP) tools via MCP servers.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Janus-Series: Unified Multimodal Understanding and Generation Models
Admin Web Interface for juanfont/headscale
An open source, self-hosted implementation of the Tailscale control server
The easiest, most secure way to use WireGuard and 2FA.
🌐 The Internet OS! Free, Open-Source, and Self-Hostable.
Borgo is a statically typed language that compiles to Go.