- St Petersburg
- https://t.me/borisshapa
Stars
VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024)
An easy-to-use Python framework to generate adversarial jailbreak prompts.
veRL: Volcano Engine Reinforcement Learning for LLM
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
[COLM 2024] A Survey on Deep Learning for Theorem Proving
The official implementation of Self-Play Preference Optimization (SPPO)
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
Efficient Triton Kernels for LLM Training
A library for advanced large language model reasoning
The code of paper "Toward Optimal LLM Alignments Using Two-Player Games".
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
RodkinIvan / associative-recurrent-memory-transformer
Forked from booydar/recurrent-memory-transformer
[ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating
Recipes to train reward model for RLHF.
Training Sparse Autoencoders on Language Models
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Minimal but scalable implementation of large language models in JAX
A JAX research toolkit for building, editing, and visualizing neural networks.