View sky-walker's full-sized avatar

Starred repositories

Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 31,905 2,120 Updated Feb 22, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 7,924 561 Updated Feb 20, 2025

A simple screen-parsing tool for building a pure vision-based GUI agent

Jupyter Notebook 17,136 1,310 Updated Feb 23, 2025
Python 3,300 250 Updated Feb 21, 2025

Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.

Go 129,088 10,525 Updated Feb 24, 2025

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 5,780 494 Updated Feb 24, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 39,146 5,863 Updated Feb 24, 2025

SOTA Open Source TTS

Python 19,473 1,506 Updated Feb 18, 2025

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,346 1,113 Updated Nov 14, 2024

Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms

Python 2,368 237 Updated Feb 24, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,539 604 Updated Feb 24, 2025

This is a speech interaction system built on open-source models, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …

Python 427 84 Updated Jan 12, 2025

End-to-end stack for WebRTC. SFU media server and SDKs.

Go 11,678 1,010 Updated Feb 24, 2025

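Call center, intelligent outbound calling, LLM inbound-call robot, LLM outbound-call robot, customer service system, ticketing system, open-source call center system, telephony system, intelligent outbound calling system, intelligent telephone outbound calling, call center system, LLM customer service, telephone outbound calling, customer service center, online customer service, LLM call center, inbound-call robot, LLM robot, intelligent telephone outbound calling, open-source call center system, telephone outbound calling, online customer service, LLM callcenter, contactcenter, Call, IPCC, Customer Service, V…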

26 4 Updated Feb 24, 2025

A latent text-to-image diffusion model

Jupyter Notebook 69,685 10,337 Updated Jun 18, 2024

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,638 1,332 Updated Feb 21, 2025

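"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment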

Jupyter Notebook 13,006 1,498 Updated Feb 19, 2025

Get your documents ready for gen AI

Python 22,003 1,251 Updated Feb 24, 2025

Repository containing all the code needed to get started on the SoccerNet Action Spotting challenge. This repository also contains several benchmark methods.

Python 69 9 Updated Feb 7, 2024

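This project aims to share the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications in practice)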

HTML 14,506 1,677 Updated Feb 12, 2025

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 32,661 3,039 Updated Feb 24, 2025

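OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual libraries.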

Python 29,749 2,970 Updated Feb 9, 2025

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 8,784 801 Updated Feb 24, 2025

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Python 39,245 5,594 Updated Feb 24, 2025
Jupyter Notebook 3,479 1,036 Updated Jul 9, 2024

"LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python 12,137 1,691 Updated Feb 24, 2025

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 11,021 1,076 Updated Feb 16, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,683 216 Updated Dec 5, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,168 275 Updated Nov 5, 2024