In this repo we study the problem of action recognition (recognizing actions in videos) on the well-known UCF101 dataset.
Here, I reimplemented the two-stream approach for action recognition, using pre-trained Xception networks for both streams (see the references below).
Just clone the Live Demo Two-steam net.ipynb notebook to your Drive and run the cells on Google Colab (something like the demo GIF will be generated, in video format).
A full demo of the code in this repo can be found in the Action_Recognition_Walkthrough.ipynb notebook.
Please clone the Action_Recognition_Walkthrough.ipynb notebook to your Drive account and run it on Google Colab on a Python 3, GPU-enabled instance.
This code requires:
- Python 3.6
- TensorFlow 1.11.0 (GPU-enabled; the code uses the Keras API shipped with TensorFlow)
- imgaug 0.2.6
- OpenCV 3.4.2.17
- NumPy 1.14.1
All of these requirements are satisfied by a Python 3, GPU-enabled Colab instance; just use one, and the Action_Recognition_Walkthrough.ipynb notebook will install the rest :)
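If you want to double-check the environment first, here is an optional, purely illustrative version check (not part of the repo):

```python
# Optional sanity check of the Colab environment (illustrative only).
import tensorflow as tf
import imgaug
import cv2
import numpy as np

print("TensorFlow:", tf.__version__)  # expected: 1.11.0
print("imgaug:", imgaug.__version__)  # expected: 0.2.6
print("OpenCV:", cv2.__version__)     # expected: 3.4.2.17
print("NumPy:", np.__version__)       # expected: 1.14.1
```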
I used the UCF101 dataset, originally found here.
The dataset is also available preprocessed (RGB frames and TV-L1 optical flow), published by feichtenhofer/twostreamfusion; after downloading, reassemble each split archive as shown below the download commands.
- RGB images (a single zip file split into three parts)
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.003
- Optical flow u/v frames (a single zip file split into three parts)
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.003
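Each archive must be concatenated back into a single zip before extraction (the usual approach for split archives of this kind):

cat ucf101_jpegs_256.zip.0* > ucf101_jpegs_256.zip
unzip ucf101_jpegs_256.zip

cat ucf101_tvl1_flow.zip.0* > ucf101_tvl1_flow.zip
unzip ucf101_tvl1_flow.zip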
- You can switch between a variety of models easily.
- Checkpoints are saved at regular intervals and synchronized to Google Drive via the Drive API, which means you can resume training from any Google Colab instance (a sketch of the upload pattern follows this list).
- The code accesses the public models on my drive, so you can resume and fine-tune them from different checkpoints.
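For reference, uploading a checkpoint to Drive from Colab typically looks something like the PyDrive pattern below. This is an assumed illustration of the mechanism, not necessarily the repo's exact code:

```python
# Assumed Colab + PyDrive pattern for uploading a checkpoint archive to Drive.
from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

auth.authenticate_user()  # interactive Colab authentication
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Upload a local checkpoint zip (the file name format is explained below).
checkpoint = drive.CreateFile({"title": "300-0.84298-0.84166.zip"})
checkpoint.SetContentFile("300-0.84298-0.84166.zip")
checkpoint.Upload()
```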
Every checkpoint name follows the pattern EPOCH-BEST_TOP_1_ACC-CURRENT_TOP_1_ACC.
For example, for the checkpoint 300-0.84298-0.84166.zip in the folder heavy-mot-xception-adam-1e-05-imnet:
- epoch = 300
- the best top-1 accuracy so far is 0.84298 (obtained at a checkpoint before epoch 300)
- the current top-1 accuracy is 0.84166
- the experiment is heavy-mot-xception-adam-1e-05-imnet
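A tiny hypothetical helper (not from the repo) that parses such a name:

```python
import os

def parse_checkpoint_name(filename):
    """Split a name like '300-0.84298-0.84166.zip' into its three fields."""
    stem = os.path.basename(filename)
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    epoch, best_acc, current_acc = stem.split("-")
    return int(epoch), float(best_acc), float(current_acc)

print(parse_checkpoint_name("300-0.84298-0.84166.zip"))
# (300, 0.84298, 0.84166)
```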
I used models pre-trained on ImageNet, provided by Keras Applications here.
The best results are obtained with the Xception architecture.
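As a rough sketch, a spatial stream can be built from the pre-trained Xception with tf.keras like this. It is illustrative only; the input size, optimizer, and learning rate are assumptions based on the experiment name above, not the repo's exact training code:

```python
import tensorflow as tf

# ImageNet-pretrained Xception backbone with global average pooling.
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))

# New classification head for UCF101's 101 action classes.
outputs = tf.keras.layers.Dense(101, activation="softmax")(base.output)
spatial_stream = tf.keras.Model(inputs=base.input, outputs=outputs)

spatial_stream.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-5),
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])
```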
| Network | Top-1 accuracy |
| --- | --- |
| Spatial VGG19 stream | ~75% |
| Spatial ResNet50 stream | 81.2% |
| Spatial Xception stream | 86.04% |
| Motion ResNet50 stream | ~75% |
| Motion Xception stream | 84.4% |
| Average fusion | 91.25% |
| Recurrent network fusion | 91.7% |
All the pre-trained models can be found here.
This is the same Drive folder the code accesses during training and when resuming training from a checkpoint.
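The average fusion result in the table is simple late fusion: the per-class softmax scores of the spatial and motion streams are averaged for each video. A minimal sketch, assuming each stream's predictions are already collected as NumPy arrays of shape (num_videos, 101):

```python
import numpy as np

def average_fusion(spatial_probs, motion_probs):
    """Late fusion: average the two streams' per-class softmax scores."""
    fused = (spatial_probs + motion_probs) / 2.0
    return fused.argmax(axis=1)  # predicted class index per video
```

The recurrent network fusion row instead learns the combination of the two streams with a recurrent model rather than a fixed average.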
- [1] Two-stream convolutional networks for action recognition in videos
- [2] Real-time Action Recognition with Enhanced Motion Vector CNNs
- [3] Towards Good Practices for Very Deep Two-Stream ConvNets
- [1] A nice two-stream reimplementation in PyTorch using ResNets; my code is inspired by this repo.
- [2] Two-stream-pytorch
- [3] Hidden-Two-Stream
- [1] Hidden-Two-Stream, which achieves real-time performance by using a deep neural network to generate the optical flow.
- [2] Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? Discusses how 3D convolutions are a natural architecture for video and how pre-training on the Kinetics dataset could retrace ImageNet pre-training.
- [3] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset