SolenoidWGT
  • Shanghai AI Laboratory
  • Shanghai
  • UTC +08:00


A native PyTorch Library for large model training

Python 2,319 170 Updated Oct 4, 2024

TeleChat2 (星辰语义大模型) is a large language model developed and trained by the China Telecom Artificial Intelligence Research Institute. It is the first open-sourced 100-billion-parameter model trained entirely on domestic Chinese compute.

Python 67 4 Updated Sep 29, 2024

Zero Bubble Pipeline Parallelism

Python 263 13 Updated Sep 4, 2024

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!

Python 3,058 231 Updated Aug 10, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.

Python 1,854 309 Updated Oct 4, 2024
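As a hedged illustration of what FP8 acceleration looks like from the user's side, here is a minimal sketch using the library's PyTorch bindings; the module and recipe names (te.Linear, te.fp8_autocast, DelayedScaling) reflect my understanding of the transformer_engine package, and exact arguments may differ between versions.

```python
# Minimal FP8 sketch with Transformer Engine's PyTorch API (assumes a Hopper/Ada
# GPU and the transformer_engine package; argument names may vary by version).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024, bias=True).cuda()           # drop-in for torch.nn.Linear
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
x = torch.randn(8, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                                           # GEMMs run in FP8 where supported
y.sum().backward()                                         # backward uses FP8-aware kernels too
```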

Best practice for training LLaMA models in Megatron-LM

Python 613 51 Updated Jan 2, 2024

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for long-context transformer model training and inference

Python 323 20 Updated Sep 19, 2024

Ring attention implementation with flash attention

Python 547 42 Updated Sep 20, 2024
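Since several of these entries revolve around the same online-softmax trick, here is a single-process sketch of the ring-attention idea: each simulated rank holds a query shard while K/V shards rotate around the ring, and partial results are merged with a running max and denominator. This is plain PyTorch for illustration only, not the repo's distributed API.

```python
# Illustrative single-process simulation of ring attention: K/V shards rotate
# around the "ring" and outputs are merged via online softmax. Not the repo's API.
import torch

def ring_attention_sim(q_shards, k_shards, v_shards, scale):
    world = len(q_shards)
    outs = []
    for r in range(world):
        q = q_shards[r]                                   # [n, d] query shard held by rank r
        m = torch.full((q.shape[0], 1), float("-inf"))    # running row max
        l = torch.zeros(q.shape[0], 1)                    # running softmax denominator
        acc = torch.zeros_like(q)                         # running numerator (weighted V sum)
        for step in range(world):
            src = (r + step) % world                      # K/V shard arriving at this step
            s = q @ k_shards[src].T * scale               # local attention scores
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)
            correction = torch.exp(m - m_new)             # rescale previous partial results
            l = l * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ v_shards[src]
            m = m_new
        outs.append(acc / l)
    return torch.cat(outs, dim=0)

# Sanity check against full attention on one sequence split into 4 shards.
torch.manual_seed(0)
n, d, world = 32, 16, 4
q, k, v = (torch.randn(n, d) for _ in range(3))
split = lambda t: list(t.chunk(world, dim=0))
out = ring_attention_sim(split(q), split(k), split(v), d ** -0.5)
ref = torch.softmax(q @ k.T * d ** -0.5, dim=-1) @ v
assert torch.allclose(out, ref, atol=1e-4)
```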

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

Python 188 9 Updated Aug 19, 2024

A mod manager for Baldur's Gate 3.

C# 1,226 190 Updated Sep 23, 2024

The official Meta Llama 3 GitHub site

Python 26,484 2,992 Updated Aug 12, 2024

LLM training in simple, raw C/CUDA

Cuda 23,746 2,651 Updated Oct 2, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 572 50 Updated Apr 7, 2024

Transformers with Arbitrarily Large Context

Python 620 48 Updated Aug 12, 2024

A natural language interface for computers

Python 52,483 4,628 Updated Sep 26, 2024

High-performance RDMA-based distributed feature-collection component for training GNN models on extremely large graphs

C++ 46 5 Updated Jul 3, 2022

Building a quick conversation-based search demo with Lepton AI.

TypeScript 7,764 988 Updated Sep 18, 2024

LLM inference in C/C++

C++ 65,878 9,459 Updated Oct 6, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 82,701 22,271 Updated Oct 6, 2024
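For completeness, a tiny self-contained example of the two things that description names, GPU tensors and dynamically built autograd graphs; nothing here is specific to any repository above.

```python
# Tensors on GPU (when available) and a dynamic autograd graph in a few lines.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 3, device=device, requires_grad=True)
w = torch.randn(3, 2, device=device, requires_grad=True)
loss = (x @ w).relu().sum()   # the graph is built dynamically as ops execute
loss.backward()               # gradients populate x.grad and w.grad
print(x.grad.shape, w.grad.shape)
```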

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Python 288 47 Updated Sep 29, 2024

FpgaNIC is a versatile FPGA-based 100Gb SmartNIC for GPUs

TeX 116 17 Updated Aug 17, 2023

RAPIDS Memory Manager

C++ 478 195 Updated Oct 5, 2024

FlagScale is a large model toolkit based on open-source projects.

Python 135 41 Updated Oct 5, 2024

Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.

Cuda 16 2 Updated Apr 25, 2024

Fast Hadamard transform in CUDA, with a PyTorch interface

C 95 14 Updated May 24, 2024
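As a point of reference for what that CUDA kernel computes, here is a pure-PyTorch fast Walsh–Hadamard transform over the last dimension; the function name and scaling argument are illustrative, not the repo's actual interface.

```python
# Pure-PyTorch reference for the fast Walsh-Hadamard transform (illustrative only).
import torch

def hadamard_transform_ref(x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Apply H_n to the last dimension of x; that dimension must be a power of two."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "last dim must be a power of two"
    out = x.clone()
    h = 1
    while h < n:
        # View the last dim as blocks of 2h, split each block into halves a and b,
        # and replace the block with [a + b, a - b] (the classic butterfly step).
        out = out.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = out[..., 0, :], out[..., 1, :]
        out = torch.stack((a + b, a - b), dim=-2).reshape(*x.shape[:-1], n)
        h *= 2
    return out * scale

x = torch.randn(8, 256)
y = hadamard_transform_ref(x, scale=256 ** -0.5)   # orthonormal scaling
# With orthonormal scaling, applying the transform twice recovers the input.
assert torch.allclose(hadamard_transform_ref(y, scale=256 ** -0.5), x, atol=1e-4)
```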

Training and serving large-scale neural networks with auto parallelization.

Python 3,053 353 Updated Dec 9, 2023

GLake: optimizing GPU memory management and IO transmission.

Python 358 33 Updated Aug 3, 2024

Synthesizer for optimal collective communication algorithms

Python 95 23 Updated Apr 8, 2024