Founder mode | ex-@google-deepmind Researcher | ex-@aws | ex-@microsoft | MS CS/ME @Stanford | Robotics | RL | AI/ML | Learn everything
- Google DeepMind
- Mountain View
- @0xLingjieKong
- in/0xlingjiekong
Starred repositories
5 stars written in Cuda
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
FlashInfer: Kernel Library for LLM Serving
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.