
VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024)

Python 12 1 Updated Dec 3, 2024

An easy-to-use Python framework to generate adversarial jailbreak prompts.

Python 533 42 Updated Sep 2, 2024

veRL: Volcano Engine Reinforcement Learning for LLM

Python 624 50 Updated Jan 9, 2025

A curated collection of ChatGPT prompts for getting better results from ChatGPT and other LLM tools.

HTML 116,597 15,821 Updated Jan 7, 2025

[COLM 2024] A Survey on Deep Learning for Theorem Proving

159 10 Updated Sep 7, 2024

The official implementation of Self-Play Preference Optimization (SPPO)

Python 513 62 Updated Nov 23, 2024

Deep learning for audio processing

Jupyter Notebook 607 105 Updated Dec 27, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,120 336 Updated Jan 9, 2025

Dataset Reset Policy Optimization

Python 28 1 Updated Apr 12, 2024

Efficient Triton Kernels for LLM Training

Python 4,139 240 Updated Jan 9, 2025

A library for advanced large language model reasoning

Python 1,636 141 Updated Jan 7, 2025

Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games".

Python 15 1 Updated Jun 20, 2024

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Python 33 5 Updated Jul 28, 2024

Source code for Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 260 28 Updated Aug 6, 2024

Monte Carlo tree search in JAX

Python 2,401 195 Updated Dec 11, 2024

Library for industrial alignment.

Python 366 22 Updated Dec 24, 2024

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 3,664 348 Updated Jan 8, 2025

[ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating

Jupyter Notebook 33 6 Updated Jan 9, 2025

Recipes to train reward model for RLHF.

Python 1,123 77 Updated Dec 12, 2024

Training Sparse Autoencoders on Language Models

Jupyter Notebook 562 133 Updated Dec 29, 2024

🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,624 80 Updated Jan 7, 2025

Minimal but scalable implementation of large language models in JAX

Python 28 Updated Nov 2, 2024

Kolmogorov Arnold Networks

Jupyter Notebook 15,307 1,428 Updated Dec 11, 2024

A JAX research toolkit for building, editing, and visualizing neural networks.

Python 1,710 55 Updated Dec 16, 2024

A simple library for scaling up JAX programs

Python 128 10 Updated Nov 2, 2024

Grok open release

Python 49,765 8,343 Updated Aug 30, 2024
Python 4 1 Updated Mar 15, 2024