Stars
Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark
Llama from scratch, or How to implement a paper without crying
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Fast and memory-efficient exact attention
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Code and documentation to train Stanford's Alpaca models, and generate the data.
Instruct-tune LLaMA on consumer hardware
The simplest, fastest repository for training/finetuning medium-sized GPTs.
End-to-End Speech Recognition Using TensorFlow
The goal of this project is to extract speech-recognition training data from videos, for use in training automatic subtitle generation.
Implementing Recurrent Neural Network from Scratch
A build-it-yourself, 6-wheel rover based on the rovers on Mars!
End-to-end ASR/LM implementation with PyTorch
Attempt at tracking the state of the art and recent results (bibliography) in speech recognition.
High-performance graph database for real-time use cases
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
A deep-learning-based Chinese speech recognition system
Chinese translation of The Elements of Statistical Learning (ESL), with code implementations and solutions to the exercises.
YSDA course in Natural Language Processing