
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 91 2 Updated Dec 24, 2024

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

498 50 Updated Dec 31, 2024

An o1 replication for coding

Python 307 20 Updated Dec 11, 2024

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

Python 800 53 Updated Nov 4, 2024

Reference implementation for DPO (Direct Preference Optimization)

Python 2,326 192 Updated Aug 11, 2024

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Python 329 13 Updated Jul 15, 2024

The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"

Python 34 3 Updated Jan 12, 2024

Jupyter Notebook 10 Updated Oct 22, 2024

The Open Cookbook for Top-Tier Code Large Language Model

Python 1,554 92 Updated Dec 8, 2024

A large-scale, fine-grained, diverse preference dataset (and models).

Python 325 15 Updated Dec 29, 2023

Baselines for all tasks from Long Code Arena benchmarks 🏟️

Python 25 3 Updated Sep 11, 2024

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…

735 42 Updated Oct 22, 2024

Python 30 1 Updated Sep 14, 2024

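👨‍🎓 Graduate course materials, notes, final exam papers recalled and compiled from memory, and coursework from the School of Computer Science and Technology at Beijing Jiaotong University. Hope they help ❤️. If you like them, remember to leave a star 🌟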

Jupyter Notebook 154 31 Updated Sep 19, 2024

Fetch exercise answers from XuetangX (学堂在线)

Python 3 5 Updated May 11, 2020

Recipes to train reward model for RLHF.

Python 1,094 76 Updated Dec 12, 2024

A recipe for online RLHF and online iterative DPO.

Python 456 51 Updated Dec 28, 2024

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at https://datawhalechina.github.io/easy-rl/

Jupyter Notebook 9,946 1,909 Updated Jan 19, 2025

The [Hands-on Reinforcement Learning] series, based on PyTorch.

Python 54 25 Updated Jun 2, 2021

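AI Quant Lab, focused on applying cutting-edge AI techniques (deep learning / reinforcement learning / knowledge graphs) to quantitative investment in finance.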

HTML 718 178 Updated Sep 19, 2023

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

1,970 125 Updated Jan 17, 2025

BeHonest: Benchmarking Honesty in Large Language Models

JavaScript 31 Updated Aug 15, 2024

Awesome LLMs on Device: A Comprehensive Survey

922 101 Updated Jan 12, 2025

TinyChatEngine: On-Device LLM Inference Library

C++ 793 77 Updated Jul 4, 2024

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook 777 89 Updated Aug 14, 2024

A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights i…

1,124 57 Updated Jan 19, 2025

Automatic ticket grabbing for Damai (大麦), with support for selecting attendees, city, date and session, and price

Python 879 134 Updated Apr 28, 2024

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

Python 159 17 Updated Nov 12, 2024

The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”

Python 15 1 Updated Feb 26, 2024