Shanghai AI Laboratory
Shanghai (UTC +08:00)

Stars
A native PyTorch Library for large model training
TeleChat2 (星辰语义大模型), a large language model developed and trained by the China Telecom Artificial Intelligence Research Institute; the first open-sourced 100-billion-parameter model trained entirely on domestic Chinese compute
Zero Bubble Pipeline Parallelism
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Ring attention implementation with flash attention
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
A mod manager for Baldur's Gate 3.
Flash Attention in ~100 lines of CUDA (forward pass only)
Transformers with Arbitrarily Large Context
A natural language interface for computers
High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph
Building a quick conversation-based search demo with Lepton AI.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
FpgaNIC is an FPGA-based versatile 100Gb SmartNIC for GPUs
FlagScale is a large model toolkit based on open-sourced projects.
Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
Fast Hadamard transform in CUDA, with a PyTorch interface
Training and serving large-scale neural networks with auto parallelization.
GLake: optimizing GPU memory management and IO transmission.
Synthesizer for optimal collective communication algorithms
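Several of the entries above (the flash-attention and ring-attention repositories) rest on the same online-softmax trick: attention can be accumulated over score/value blocks in a single streaming pass, rescaling the running state whenever a new maximum appears, so the full score matrix is never materialized. A minimal pure-Python sketch of that idea, for scalar values per position; the `streaming_attention` helper is illustrative only, not any repo's API:

```python
import math

def streaming_attention(scores, values):
    """Softmax-weighted sum of `values`, consuming one score at a time.

    Maintains a running maximum `m`, normalizer `l`, and accumulator
    `acc`; when a larger score arrives, the old state is rescaled by
    exp(m - m_new) -- the same correction flash attention applies as it
    walks over key/value tiles.
    """
    m, l, acc = float("-inf"), 0.0, 0.0
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # 0.0 on the first iteration
        w = math.exp(s - m_new)
        l = l * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / l
```

Because each step only rescales by a ratio of exponentials, the computation stays numerically stable for large scores and extends naturally from single elements to the blockwise/tiled form used on GPUs.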