Shanghai AI Laboratory
Shanghai (UTC +08:00)

Stars
A native PyTorch Library for large model training
TeleChat2 (星辰语义大模型), a large language model developed and trained by the China Telecom Artificial Intelligence Research Institute; the first open-sourced 100-billion-parameter model trained entirely on domestic Chinese compute
Zero Bubble Pipeline Parallelism
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Ring attention implementation with flash attention
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
A mod manager for Baldur's Gate 3.
Flash Attention in ~100 lines of CUDA (forward pass only)
Transformers with Arbitrarily Large Context
A natural language interface for computers
High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph
Building a quick conversation-based search demo with Lepton AI.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
FpgaNIC is an FPGA-based versatile 100Gb SmartNIC for GPUs
FlagScale is a large model toolkit based on open-sourced projects.
Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
Fast Hadamard transform in CUDA, with a PyTorch interface
Training and serving large-scale neural networks with auto parallelization.
GLake: optimizing GPU memory management and IO transmission.
Synthesizer for optimal collective communication algorithms
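Several of the entries above (the flash-attention and ring-attention repositories) rest on the same online-softmax trick: attention can be accumulated over score/value blocks in a single streaming pass, rescaling the running state whenever a new maximum appears, so the full score matrix is never materialized. A minimal pure-Python sketch of that idea, for scalar values per position; the `streaming_attention` helper is illustrative only, not any repo's API:

```python
import math

def streaming_attention(scores, values):
    """Softmax-weighted sum of `values`, consuming one score at a time.

    Maintains a running maximum `m`, normalizer `l`, and accumulator
    `acc`; when a larger score arrives, the old state is rescaled by
    exp(m - m_new) -- the same correction flash attention applies as it
    walks over key/value tiles.
    """
    m, l, acc = float("-inf"), 0.0, 0.0
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # 0.0 on the first iteration
        w = math.exp(s - m_new)
        l = l * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / l
```

Because each step only rescales by a ratio of exponentials, the computation stays numerically stable for large scores and extends naturally from single elements to the blockwise/tiled form used on GPUs.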