SGLang is a fast serving framework for large language models and vision language models.

Python 5,465 409 Updated Oct 8, 2024

Large Language Model Text Generation Inference

Python 8,880 1,048 Updated Oct 8, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,605 177 Updated Oct 7, 2024

Large Language Model (LLM) Systems Paper List

594 24 Updated Oct 6, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,233 115 Updated Oct 8, 2024

Sample codes for my CUDA programming book

Cuda 1,536 319 Updated Jul 27, 2023

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,823 1,055 Updated Aug 15, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,471 924 Updated Oct 7, 2024

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

C++ 84 21 Updated Feb 28, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,205 659 Updated Sep 30, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 69,069 8,125 Updated Sep 30, 2024

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Python 12,190 916 Updated Oct 8, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 133,078 26,563 Updated Oct 8, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 8,148 1,459 Updated Oct 8, 2024

Source code for "On the Relationship between Self-Attention and Convolutional Layers"

Python 1,077 127 Updated Jan 10, 2023

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence…

Go 18,597 19 Updated Jul 28, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 36,593 5,766 Updated Aug 19, 2024

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 19,958 2,469 Updated Aug 15, 2024

Development repository for the Triton language and compiler

C++ 12,961 1,577 Updated Oct 8, 2024

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

Rust 2,520 132 Updated Aug 30, 2024

Boot WSL2 machine with static IP

PowerShell 96 13 Updated Dec 21, 2022

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,485 197 Updated Oct 8, 2024

LLM inference service performance testing

Jupyter Notebook 25 2 Updated Dec 17, 2023

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,489 152 Updated Aug 17, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Python 1,383 143 Updated Sep 27, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 28,016 4,137 Updated Oct 8, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,172 850 Updated Sep 13, 2024

Spising: ⚡️Open-source AI LangChain-like RAG (Retrieval-Augmented Generation) knowledge database with web UI and Enterprise SSO⚡️, supports OpenAI, Azure, LLaMA, Google Gemini, HuggingFace, Claude,…

Go 2,745 352 Updated Oct 1, 2024

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

Python 10,425 1,253 Updated Oct 8, 2024