Stars
An open source implementation of CLIP.
This project is a tutorial on large language model (LLM) application development aimed at beginner developers. Read online at: https://datawhalechina.github.io/llm-universe/
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Isolated & continuous sign language recognition using CNN+LSTM / 3D CNN / GCN / Encoder-Decoder
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
The official GitHub page for the survey paper "A Survey of Large Language Models".
TensorFlow code and pre-trained models for BERT
Code for ALBEF: a new vision-language pre-training method
Guidance for courses in the Department of Computer Science, Harbin Institute of Technology (Shenzhen)
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
awesome grounding: A curated list of research papers in visual grounding
COCO API - Dataset @ http://cocodataset.org/
Chinese Vision-Language Understanding Evaluation
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
The Natural Language Processing research team at the Beijing Advanced Innovation Center for Big Data, Beihang University, summarizes its research and applications in intelligent question answering, covering knowledge-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC), with related work from both academia and industry surveyed for each task.
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)