View initial-h's full-sized avatar

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

Python 1,412 74 Updated Mar 8, 2024
Python 1,233 176 Updated Nov 20, 2024

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 30,730 6,432 Updated Oct 18, 2024

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python 897 48 Updated Dec 6, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,811 118 Updated Oct 30, 2024

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,323 106 Updated Dec 26, 2024

A library for advanced large language model reasoning

Python 1,590 138 Updated Dec 23, 2024

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 3,363 315 Updated Dec 27, 2024

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,062 371 Updated Dec 6, 2024

O1 Replication Journey: A Strategic Progress Report – Part I

1,747 54 Updated Nov 30, 2024

Related papers for Continual Reinforcement Learning.

9 Updated May 27, 2024
Python 625 66 Updated Nov 27, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

5,913 322 Updated Dec 27, 2024

[NeurIPS 2022] PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Python 164 32 Updated May 14, 2024

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | DouDizhu AI

Python 4,155 599 Updated Jun 26, 2024
Python 38 11 Updated Oct 21, 2022

Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.

Python 284 22 Updated Nov 26, 2024
Java 14 Updated Oct 28, 2024

My personal research experience

6,114 365 Updated Dec 8, 2024
Python 1,056 36 Updated Nov 21, 2024

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

638 15 Updated Dec 23, 2024

Proximal Policy Optimization Algorithm applied to MountainCar in discrete environment

Jupyter Notebook 2 Updated Sep 15, 2020

A hyperparameter optimization framework

Python 11,144 1,050 Updated Dec 26, 2024

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Python 9,400 1,734 Updated Dec 21, 2024

Isaac Gym Reinforcement Learning Environments

Python 2,127 439 Updated Oct 26, 2024

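A concise and elegant dictionary and translation app for macOS. It works out of the box, supports offline OCR, and supports Youdao Dictionary, 🍎 Apple System Dictionary, 🍎 Apple System Translation, OpenAI, Gemini, DeepL, Google, Bing, Tencent, Baidu, Alibaba, NiuTrans, Caiyun, and Volcano Translation.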

Objective-C 7,703 385 Updated Dec 25, 2024

TPAMI2020 "Unsupervised Multi-Class Domain Adaptation: Theory, Algorithms, and Practice"

Python 75 14 Updated Apr 14, 2021

Code released for ICML 2019 paper "Bridging Theory and Algorithm for Domain Adaptation".

Python 131 27 Updated Jun 14, 2019

Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"

Python 205 9 Updated Apr 22, 2024

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Python 437 61 Updated Dec 23, 2024