Yanan Luo*, Jinhui Yi*, Yazan Abu Farha, Moritz Wolter, Juergen Gall
*Equal Contribution
If you like our project, please give us a star ✨ on GitHub for the latest updates.
This is the official implementation of the paper "Rethinking temporal self-similarity for repetitive action counting".
- 2025-03-10: The `*_feature_npz` folders are released. The script for testing directly on videos is released.
- 2024-11-14: The oral presentation video has been released on the YouTube channel. [video]
- 2024-09-25: The code and pre-trained model are available. [pretrained]
- 2024-08-09: This paper has been accepted by the WICV workshop 2024 in ECCV 2024 as an extended abstract.
- 2024-07-12: The preprint of the paper is available. [paper]
- 2024-06-07: This paper has been accepted by ICIP 2024 as an Oral.
We rethink how a temporal self-similarity matrix (TSM) can be utilized for counting repetitive actions and propose a framework (RACnet) that learns embeddings and predicts action start probabilities at full temporal resolution. The number of repeated actions is then inferred from the action start probabilities. We propose a novel loss based on a generated reference TSM, which enforces that the self-similarity of the learned frame-wise embeddings is consistent with the self-similarity of repeated actions.
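The count is then obtained from the predicted frame-wise action start probabilities. As a rough, illustrative sketch (not necessarily the exact decoding used in RACnet; the function name, `threshold`, and `min_distance` are hypothetical), one could count thresholded local maxima of the start probabilities:

```python
# Illustrative sketch only: count repetitions as thresholded local maxima of the
# per-frame action start probabilities. The actual decoding in RACnet may differ.
import numpy as np

def count_from_start_probs(start_probs: np.ndarray,
                           threshold: float = 0.5,
                           min_distance: int = 4) -> int:
    """start_probs: (T,) array of per-frame action start probabilities."""
    count, last_start = 0, -min_distance
    for t in range(1, len(start_probs) - 1):
        # local maximum of the probability curve
        is_peak = start_probs[t] >= start_probs[t - 1] and start_probs[t] > start_probs[t + 1]
        if is_peak and start_probs[t] >= threshold and t - last_start >= min_distance:
            count += 1
            last_start = t
    return count
```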
| Datasets  | MAE ⬇️ | OBO ⬆️ |
|-----------|--------|--------|
| RepCountA | 0.4441 | 0.3933 |
| UCFRep    | 0.5260 | 0.3714 |
| Countix   | 0.5278 | 0.3924 |
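MAE ⬇️ is the mean absolute counting error normalized by the ground-truth count, and OBO ⬆️ is the fraction of videos whose predicted count is within ±1 of the ground truth (the definitions commonly used in repetition counting). A minimal sketch of how these metrics can be computed from predicted and ground-truth counts:

```python
import numpy as np

def mae_obo(pred_counts, gt_counts):
    """Mean absolute error normalized by the GT count, and off-by-one accuracy."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = float(np.mean(np.abs(pred - gt) / np.maximum(gt, 1e-8)))  # normalized counting error
    obo = float(np.mean(np.abs(pred - gt) <= 1))                    # within +/- 1 repetition
    return mae, obo
```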
We provide the `*_feature_npz` folders (without annotation files); please check RACnet_feature_npy. In this case, you can skip directly to step 5 and start training and testing :)
- Download the pretrained backbone model: Video Swin Transformer tiny (github).
- Feature extraction: use the backbone model to extract a 7 × 7 × 768 feature map per frame and flatten it (see the sketch below).
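A hedged sketch of this step, assuming the backbone returns a (T, 7, 7, 768) feature map for T input frames; `backbone` and `extract_per_frame_features` are placeholder names, not part of this repository:

```python
# Hedged sketch: per-frame feature extraction and flattening.
# `backbone` stands in for the Video Swin Transformer tiny feature extractor and
# is assumed to return a (T, 7, 7, 768) feature map for T preprocessed frames.
import numpy as np
import torch

@torch.no_grad()
def extract_per_frame_features(backbone, frames: torch.Tensor) -> np.ndarray:
    """frames: (T, 3, H, W) preprocessed frames of one video."""
    feat = backbone(frames)                 # assumed shape (T, 7, 7, 768)
    flat = feat.reshape(feat.shape[0], -1)  # flatten to (T, 7*7*768)
    return flat.cpu().numpy()
```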
- Generate the reference TSM.

```bash
cd dataset
python gen_refTSM.py
```
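The exact construction of the reference TSM is implemented in `gen_refTSM.py`. Purely as an illustration of the idea (an assumption, not the repository's implementation), a reference TSM can mark frame pairs that belong to the same annotated repetition as similar:

```python
# Illustrative assumption only: a block-structured reference TSM built from
# annotated (start, end) repetition segments; gen_refTSM.py may differ.
import numpy as np

def toy_reference_tsm(num_frames: int, segments: list[tuple[int, int]]) -> np.ndarray:
    tsm = np.zeros((num_frames, num_frames), dtype=np.float32)
    for start, end in segments:      # one similarity block per repetition cycle
        tsm[start:end, start:end] = 1.0
    np.fill_diagonal(tsm, 1.0)       # every frame is similar to itself
    return tsm
```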
- Save the following arrays into one `.npz` file per video.
```python
# The arrays to be included in the file:
# per_frame_features: from step 2
# refTSM: from step 3
# frame_length: see metadata/RepCountA_frame_length.csv
# frame_name: same as above
# count: ground-truth repetition count (UCFRep and Countix only)
import numpy as np

# RepCountA
np.savez(file='video_name.npz', img_feature=per_frame_features, gt_tsm=refTSM, length=frame_length)

# UCFRep and Countix, inference only
np.savez(file='video_name.npz', img_feature=per_frame_features, length=frame_length, count=count)
```
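To sanity-check a saved file, you can load it back and inspect the stored keys ('video_name.npz' is just a placeholder name):

```python
import numpy as np

# Load one saved feature file and inspect the stored arrays.
data = np.load('video_name.npz')
print(data.files)                 # e.g. ['img_feature', 'gt_tsm', 'length'] for RepCountA
print(data['img_feature'].shape)  # (T, 7*7*768) flattened per-frame features
```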
- Data structure.
```
# RepCountA dataset
# .csv files are the annotations from the original dataset
RepCountA_feature_npz/
├── train.csv
├── valid.csv
├── test.csv
├── train/
│   ├── video_1.npz
│   ├── video_2.npz
│   └── ...
├── valid/
│   ├── video_4.npz
│   ├── video_5.npz
│   └── ...
└── test/
    ├── video_7.npz
    ├── video_8.npz
    └── ...

# UCFRep and Countix dataset
*_feature_npz/
├── test.csv
└── test/
    ├── video_7.npz
    ├── video_8.npz
    └── ...
```
Note:
- We will upload the `*_feature_npz` folders soon.
- Countix is a subset of Kinetics. Since some videos are no longer available at test time, we provide the features of the available videos in `Countix_feature_npz`.
We recommend using a conda virtual environment.
```bash
conda create -n racnet python=3.11.5 -y
conda activate racnet
pip install -r requirements.txt  # you may need to change the cuda version based on your machine
```
Please refer to the configs for training and testing separately.
```bash
python train.py configs/train_RACnet.py
python test.py configs/test_RACnet.py
```
For inference directly on videos, see `test4video.py`.
```bibtex
@inproceedings{luo2024rethinking,
  title={Rethinking temporal self-similarity for repetitive action counting},
  author={Luo, Yanan and Yi, Jinhui and Farha, Yazan Abu and Wolter, Moritz and Gall, Juergen},
  booktitle={2024 IEEE International Conference on Image Processing (ICIP)},
  pages={2187--2193},
  year={2024},
  organization={IEEE}
}
```
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. You can view the full license here.