The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.65M action labels with multiple labels per human occurring fr…

318 29 Updated Feb 9, 2022

lucidrains / local-attention

An implementation of local windowed attention for language modeling

Python 410 43 Updated Jan 16, 2025

muhammadshahidwandar / Visual-VAD-Unsupervised-Domain-Adaptation

TensorFlow and MATLAB code for "RealVAD: A Real-world Dataset for Voice Activity Detection" and "Voice Activity Detection by Upper Body Motion Analysis and Unsupervised Domain Adaptation"

Python 5 5 Updated Jul 8, 2020

krantiparida / awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

690 69 Updated Jan 30, 2024

okankop / ASDNet

Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset

Python 59 7 Updated Jan 18, 2022

google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

Python 1,569 320 Updated Sep 25, 2024

tuanchien / asd

Active Speaker Detection

Jupyter Notebook 19 4 Updated Jun 19, 2020

afourast / avobjects

Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"

Python 111 26 Updated Nov 16, 2020

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 4,936 477 Updated Dec 26, 2024

TaoRuijie / TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Python 341 79 Updated Oct 23, 2023

SwinTransformer / Video-Swin-Transformer

Forked from open-mmlab/mmaction2

This is an official implementation for "Video Swin Transformers".

Python 1,489 202 Updated Mar 8, 2023

seon92

Stars

state-spaces / mamba

seon92 / Chainization

seon92 / GOL

ByZ0e / Glance-Focus

doc-doc / NExT-QA

VRU-NExT / VideoQA

Alvin-Zeng / Awesome-Temporal-Action-Localization

ttengwang / Awesome_Long_Form_Video_Understanding

youssefHosni / Efficient-Python-for-Data-Scientists-Book

HuiZeng / Grid-Anchor-based-Image-Cropping-Pytorch

fuankarion / active-speakers-context

SRA2 / SPELL

cvdfoundation / ava-dataset

zihangm / RAL_GNN

KevinMusgrave / pytorch-metric-learning

ksaito-ut / atda

Gorilla-Lab-SCUT / SymNets

yue-zhongqi / tcm

yassersouri / ghiaseddin

zhoushengisnoob / DeepClustering