[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Updated Dec 11, 2024 - Python
Papers, code and datasets about deep learning and multi-modal learning for video analysis
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System
Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.
Awesome papers & datasets specifically focused on long-term videos.
A dataset of 500,000 multimodal short videos with baseline models (TensorFlow 2.0).
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
SoccerAct10 is a dataset of 10 different soccer actions, built from YouTube videos.
Surveillance Perspective Human Action Recognition Dataset: 7759 videos from 14 action classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera-like position.
Tools for loading video datasets and applying video transforms in PyTorch. Video files can be loaded directly, without preprocessing.
The Most Comprehensive Survey of Video Quality Assessment to Date.
🌱 Starter kit for working with the EPIC-KITCHENS-55 dataset for action recognition or anticipation
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)
Official repository for the paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method", accepted to the NeurIPS 2023 Datasets and Benchmarks Track
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
[AAAI 2023] AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Official This-Is-My Dataset published in CVPR 2023
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification