SSebo

28 followers · 40 following

Stars

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 5,465 409 Updated Oct 8, 2024

AmadeusChan / Awesome-LLM-System-Papers

482 22 Updated Sep 5, 2024

huggingface / text-generation-inference

Large Language Model Text Generation Inference

Python 8,880 1,048 Updated Oct 8, 2024

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,605 177 Updated Oct 7, 2024

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

594 24 Updated Oct 6, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 1,233 115 Updated Oct 8, 2024

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

Cuda 1,536 319 Updated Jul 27, 2023

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,823 1,055 Updated Aug 15, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 5,471 924 Updated Oct 7, 2024

tlc-pack / cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

C++ 84 21 Updated Feb 28, 2024

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,205 659 Updated Sep 30, 2024

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 69,064 8,125 Updated Sep 30, 2024

chidiwilliams / buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Python 12,190 916 Updated Oct 8, 2024

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 133,076 26,561 Updated Oct 8, 2024

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 8,148 1,459 Updated Oct 8, 2024

epfml / attention-cnn

Source code for "On the Relationship between Self-Attention and Convolutional Layers"

Python 1,077 127 Updated Jan 10, 2023

buger / goreplay

Forked from taboola/goreplay

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence…

Go 18,597 19 Updated Jul 28, 2024

karpathy / nanoGPT

karpathy / minGPT

triton-lang / triton

evilsocket / cake

ocroz / wsl2-boot

ModelTC / lightllm

pandada8 / llm-inference-benchmark

gkamradt / LLMTest_NeedleInAHaystack

jiaweizzhao / GaLore

vllm-project / vllm

OpenBMB / MiniCPM-V

casibase / casibase

danswer-ai / danswer