[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)

Python 115 11 Updated May 21, 2023

facebookresearch / mvit

Code Release for MViTv2 on Image Recognition.

Python 412 47 Updated Nov 26, 2024

lucidrains / TimeSformer-pytorch

Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification

Python 703 86 Updated Aug 25, 2021

facebookresearch / TimeSformer

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Python 1,595 217 Updated Apr 9, 2024

rishikksh20 / ViViT-pytorch

Implementation of ViViT: A Video Vision Transformer

Python 520 66 Updated Jun 21, 2021

xuejianhuang / EMRFM

An effective multimodal representation and fusion method for multimodal intent recognition

Python 6 1 Updated Jun 7, 2024

Theia-4869 / FasterVLM

Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster."

Python 41 Updated Dec 14, 2024

jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,078 486 Updated May 3, 2024

huggingface / smol-course

A course on aligning smol models.

Jupyter Notebook 3,774 1,225 Updated Dec 30, 2024

Instruction-Tuning-with-GPT-4 / GPT-4-LLM

Instruction Tuning with GPT-4

HTML 4,251 301 Updated Jun 11, 2023

tatsu-lab / stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,706 4,056 Updated Jul 17, 2024

Mooler0410 / LLMsPracticalGuide

A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)

9,610 741 Updated May 31, 2024

AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.

Python 1,286 195 Updated Jan 3, 2025

Tramac / tiny-kinetics-400

Tiny Kinetics-400 for testing

Python 86 10 Updated Feb 21, 2024

qiuqiangkong / audioset_tagging_cnn

Python 1,385 258 Updated Jul 25, 2024

OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Python 874 62 Updated Jul 6, 2024

ViTAE-Transformer / QFormer

The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"

Python 189 10 Updated Apr 10, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 20,969 2,305 Updated Aug 12, 2024

spmallick / learnopencv

Learn OpenCV: C++ and Python Examples

Jupyter Notebook 21,483 11,642 Updated Jan 2, 2025

fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.

Python 249 44 Updated Nov 20, 2024