Stars
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics…
Fast and memory-efficient exact attention
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Fully open reproduction of DeepSeek-R1
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
The English Name List: a collection of the most frequently used English names.
This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.
A script to reset the Cursor editor's device identification system. Helps resolve account restrictions and trial-related issues.
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
An Engine-Agnostic Deep Learning Framework in Java
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Open standard for machine learning interoperability
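Multilingual grapheme-to-phoneme (G2P) conversion in 100 languages
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
Llama Chinese Community: the Llama 3 online demo and fine-tuned models are now available, and the latest Llama 3 learning resources are aggregated in real time. All code has been updated to support Llama 3, with the goal of building the best Chinese Llama large model. Fully open source and available for commercial use.
Examples of using PyTorch in Android applications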
On-device AI across mobile, embedded and edge for PyTorch
⚡ Boost the inference speed of T5 models by 5x and reduce model size by 3x.
Examples for using ONNX Runtime for machine learning inferencing.
Using system APIs directly with adb/root privileges from normal apps through a Java process started with app_process.
All-in-One Development Tool based on PaddlePaddle (a PaddlePaddle low-code development tool)
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
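MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, targeting the 40 TB of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, down to "Martian script" internet slang. It includes every form of plain-text Chinese data: news, essays, novels, books, magazines, papers, film and TV dialogue, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.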
Streamlit: a faster way to build and share data apps.
Build and share delightful machine learning apps, all in Python.