Stars
Seamless operability between C++11 and Python
The open source mesh processing Python library
Monorepo hosting the Proton web clients
🐳 A curated list of Docker resources and projects
A curated list of amazingly awesome open-source sysadmin resources.
A list of Free Software network services and web applications which can be hosted on your own servers
Empowering everyone to build reliable and efficient software.
Flipper Zero firmware source code
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
A tool for extracting plain text from Wikipedia dumps
The simplest, fastest repository for training/finetuning medium-sized GPTs.
QLoRA: Efficient Finetuning of Quantized LLMs
A straightforward collection of Music Generation research resources.
📋 A list of open LLMs available for commercial use.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Finetuning InstructLLaMA with Portuguese data
Alpaca dataset from Stanford, cleaned and curated
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Code and documentation to train Stanford's Alpaca models, and generate the data.
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath