
Starred repositories


A quick guide (especially) for trending instruction finetuning datasets

2,780 178 Updated Nov 28, 2023

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 14,920 1,082 Updated Jan 18, 2025

Sky-T1: Train your own O1 preview model within $450

Python 1,826 193 Updated Jan 17, 2025
Python 827 99 Updated Jan 10, 2025

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 292 17 Updated Jan 14, 2025

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,559 97 Updated Jan 17, 2025

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 2,521 163 Updated Dec 20, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,562 864 Updated Jan 17, 2025

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

444 18 Updated Dec 14, 2024
Python 110 9 Updated Aug 13, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,226 258 Updated Jan 11, 2025

The repo provides information about KeSpeech dataset.

131 9 Updated Oct 13, 2022

A generative world for general-purpose robotics & embodied AI learning.

Python 22,908 1,887 Updated Jan 18, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 7,467 581 Updated Jan 17, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,714 164 Updated Dec 26, 2024

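⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. There is no need to purchase the Whisper API: it runs inference on locally hosted Whisper models, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple platforms and providing a powerful, scalable solution for automated media content processing.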

Python 299 32 Updated Jan 8, 2025

Let's build better datasets, together!

Jupyter Notebook 244 29 Updated Dec 20, 2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 802 72 Updated Jan 16, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 234 30 Updated Jan 15, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,023 143 Updated Jan 17, 2025

Ongoing research training transformer models at scale

Python 11,116 2,485 Updated Jan 17, 2025

Convert any PDF into a podcast episode!

Python 1,819 200 Updated Dec 7, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,985 1,192 Updated Jan 15, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,921 5,204 Updated Jan 18, 2025

Podcasts 🎧 Programming, design, vlogs, music, interviews, blogs...

1,992 109 Updated Oct 6, 2023

o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…

Python 2,861 295 Updated Dec 16, 2024
Jupyter Notebook 1,151 145 Updated Sep 24, 2024

Convert any PDF into a podcast episode!

Python 654 281 Updated Nov 15, 2024

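Prompt Engineering Guide, derived from the English version, with an added section on AIGC prompts; translated and updated to lower the learning barrier for students.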

MDX 1,100 116 Updated Sep 14, 2024

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 37,698 5,482 Updated Jan 18, 2025