Stars
veRL: Volcano Engine Reinforcement Learning for LLM
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
tpgh24 / ag4masses
Forked from google-deepmind/alphageometry
Making Google DeepMind's AlphaGeometry accessible to the Masses
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
(ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training
The user home repository for the Mathematics in Lean tutorial.
PyTorch implementation of AlphaZero Connect from scratch (with results)
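PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons that help you sort out the algorithm theory, straighten out the code logic, and master practical applications of decision AI)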
Refine high-quality datasets and visual AI models
Making your benchmark of optimization algorithms simple and open
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
Original transformer paper: Implementation of Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Proper implementation of ResNets for CIFAR10/100 in PyTorch that matches the description in the original paper.
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
This is a list of peer-reviewed representative papers on deep learning dynamics (optimization dynamics of neural networks). The success of deep learning attributes to both network architecture and …
Python implementation of Tabu Search (TB), Genetic Algorithm (GA), and Simulated Annealing (SA) solving Travelling Salesman Problem (TSP). Term project of Intelligent Optimization Methods, UCAS cou…
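Full reproduction of TSP algorithms: genetic algorithm (GA), particle swarm optimization (PSO), simulated annealing (SA), tabu search (ST), ant colony optimization (ACO), and self-organizing map neural network (SOM)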
Model the Sudoku puzzle as an integer program using Google's OR-Tools package in Python