- Atlanta, Georgia
Stars
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
Witness the aha moment of VLM with less than $3.
Solve Visual Understanding with Reinforced VLMs
A journey to real multimodal R1! We are running large-scale experiments.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
A state-of-the-art open visual language model | Multimodal pretrained model
A framework to enable multimodal models to operate a computer.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Utilize a Raspberry Pi and a Nuand BladeRF to generate your own portable local cell network
Powerful, fast and robust engine for converting 3D models into g-code instructions for 3D printers. It is part of the larger open source project Cura.
Bearle / django-web3-auth
Forked from atereshkin/django-web3-auth
A pluggable Django app that enables login/signup via an Ethereum wallet (a la CryptoKitties)
The project where literally anything* goes.