- The University of Texas at Austin
- Austin, TX, USA (UTC -06:00)
- zhenyu.gallery
- @KyriectionZhang
Lists (8)
- 🦾 Benchmarking: LLM Hospital
- 💎 Efficient ML: Prune & Sparse & Quantization & KD & NAS
- 🤖 General Topics: Architectures & Optimization & BlockChain & SSL & Speech & Recsys
- 💍 Large Language Models: Next Step of LLMs
- 🚀 My Stack: Open-source of Our Works
- 💁 Quantum ML: ML for Quantum & Quantum for ML
- 🗼 Toolbox: Visualization & Coding Tool
- 🚩 Trustworthy ML: OoD & Adversarial & Backdoor

Stars
- Training Large Language Model to Reason in a Continuous Latent Space
- Minimalistic 4D-parallelism distributed training framework for education purpose
- A library for mechanistic interpretability of GPT-style language models
- Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
- A Telegram bot to recommend arXiv papers
- DSPy: The framework for programming—not prompting—language models
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
- Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
- Code used in Novy-Marx and Velikov (2024), AI-Powered (Finance) Scholarship
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch
- Recipes to scale inference-time compute of open models
- My learning notes/codes for ML SYS.
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
- A library for advanced large language model reasoning
- Optimisers.jl defines many standard optimisers and utilities for learning loops.
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead
- APOLLO: SGD-like Memory, AdamW-level Performance
- Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count (a…
- Stochastic Automatic Differentiation library for PyTorch.