Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
A latent text-to-image diffusion model
Yahoo! news article recommendation system using LinUCB
Notebooks for learning deep learning
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
VIP cheatsheets for Stanford's CS 229 Machine Learning
songhappy / models
Forked from tensorflow/models
Models and examples built with TensorFlow
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU su…