Hongming Zhang initial-h

Shape the way you think.

35 followers · 32 following

www.cnblogs.com/initial-h/

Stars

XueFuzhao / OpenMoE

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

Python · 1,412 stars · 74 forks · Updated Mar 8, 2024

databricks / megablocks

Python · 1,233 stars · 176 forks · Updated Nov 20, 2024

facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python · 30,730 stars · 6,432 forks · Updated Oct 18, 2024

pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python · 897 stars · 48 forks · Updated Dec 6, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python · 1,811 stars · 118 forks · Updated Oct 30, 2024

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python · 1,323 stars · 106 forks · Updated Dec 26, 2024

maitrix-org / llm-reasoners

A library for advanced large language model reasoning

Python · 1,590 stars · 138 forks · Updated Dec 23, 2024

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python · 3,363 stars · 315 forks · Updated Dec 27, 2024

bklieger-groq / g1

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python · 4,062 stars · 371 forks · Updated Dec 6, 2024

GAIR-NLP / O1-Journey

O1 Replication Journey: A Strategic Progress Report – Part I

1,747 stars · 54 forks · Updated Nov 30, 2024

datake / Papers-Of-Continual-RL

Related papers for Continual Reinforcement Learning.

9 stars · Updated May 27, 2024

zhentingqi / rStar

Python · 625 stars · 66 forks · Updated Nov 27, 2024

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

5,913 stars · 322 forks · Updated Dec 27, 2024

Netease-Games-AI-Lab-Guangzhou / PerfectDou

[NeurIPS 2022] PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Python · 164 stars · 32 forks · Updated May 14, 2024

kwai / DouZero

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | DouDizhu AI

Python · 4,155 stars · 599 forks · Updated Jun 26, 2024

submit-paper / Doudizhu_plus

Python · 38 stars · 11 forks · Updated Oct 21, 2022

DigiRL-agent / digirl

Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.

Python · 284 stars · 22 forks · Updated Nov 26, 2024

SijiaCui / play-urts

Java · 14 stars · Updated Oct 28, 2024

pengsida / learning_research

My research experience

6,114 stars · 365 forks · Updated Dec 8, 2024

Open-Source-O1 / Open-O1

Python · 1,056 stars · 36 forks · Updated Nov 21, 2024

jiahaoli57 / Call-for-Reviewers

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

638 stars · 15 forks · Updated Dec 23, 2024

rossettisimone / PPO_MOUNTAINCAR_DISCRETE

Proximal Policy Optimization Algorithm applied to MountainCar in discrete environment

Jupyter Notebook · 2 stars · Updated Sep 15, 2020

optuna / optuna

A hyperparameter optimization framework

Python · 11,144 stars · 1,050 forks · Updated Dec 26, 2024

DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Python · 9,400 stars · 1,734 forks · Updated Dec 21, 2024

isaac-sim / IsaacGymEnvs

Isaac Gym Reinforcement Learning Environments

Python · 2,127 stars · 439 forks · Updated Oct 26, 2024

tisfeng / Easydict

A concise and elegant dictionary and translation macOS app. Works out of the box, supports offline OCR, and supports Youdao Dictionary, 🍎 Apple System Dictionary, 🍎 Apple System Translation, OpenAI, Gemini, DeepL, Google, Bing, Tencent, Baidu, Alibaba, Niutrans, Caiyun, and Volcano Translation.

Objective-C · 7,703 stars · 385 forks · Updated Dec 25, 2024

Gorilla-Lab-SCUT / MultiClassDA

TPAMI2020 "Unsupervised Multi-Class Domain Adaptation: Theory, Algorithms, and Practice"

Python · 75 stars · 14 forks · Updated Apr 14, 2021

thuml / MDD

Code released for ICML 2019 paper "Bridging Theory and Algorithm for Domain Adaptation".

Python · 131 stars · 27 forks · Updated Jun 14, 2019

bytedance / GR-1

Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"

Python · 205 stars · 9 forks · Updated Apr 22, 2024

mees / calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Python · 437 stars · 61 forks · Updated Dec 23, 2024