🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
OpenMMLab Detection Toolbox and Benchmark
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Code for the paper "Language Models are Unsupervised Multitask Learners"
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
End-to-End Object Detection with Transformers
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
🍀 PyTorch implementations of various attention mechanisms, MLPs, re-parameterization, and convolution modules, helpful for understanding the papers in more depth.⭐⭐⭐
Chinese version of GPT-2 training code, using the BERT tokenizer.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
A state-of-the-art open visual language model | multimodal pre-trained model
GLM-4 series: Open Multilingual Multimodal Chat LMs
Count the MACs / FLOPs of your PyTorch model.
Robust recipes to align language models with human and AI preferences
Sequence modeling benchmarks and temporal convolutional networks
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Efficient 3D human pose estimation in video using 2D keypoint trajectories
SOTA Re-identification Methods and Toolbox
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model