We model video information with a two-stream CNN (a spatial stream and a motion stream), both built on ResNet50, on an NDA (Non-Driving-Activity) dataset that we collected ourselves. NDAs are the activities a driver performs when not driving: when a Level 3 autonomous vehicle is in autopilot mode, driving is no longer the driver's primary task. We still need to understand what the driver is doing, because if the driver has to take back control, the reaction time associated with the current activity becomes crucial information. For the motion stream, we use FlowNet 2.0 to generate the optical flow.
- This is the original video data from our NDA dataset. Its size is about 3 GB.
- For the temporal stream, we used FlowNet 2.0 to extract the optical flow. Its size is also about 3 GB.
- You can follow the FlowNet 2.0 Colab notebook to obtain the optical flow for the video stream.
- We use ResNet50 pre-trained on ImageNet as the backbone. The input to the spatial CNN is a frame extracted from the original video stream, with shape (3, 224, 224). Resizing and data augmentation are done automatically in the code.
- The input to the temporal CNN is a stack of optical flow images, each an RGB-visualised optical flow image produced by FlowNet 2.0. The input shape is therefore (30, 224, 224), which can be viewed as a 30-channel image: 3 RGB channels times 10 flow images per stack.
- To reuse the ImageNet pre-trained weights in our model, we modify the weights of the first convolution layer from shape (64, 3, 7, 7) to (64, 30, 7, 7); see the model sketch after this list.
- For every video in a mini-batch, we randomly select 3 frames. A consensus among the frame-level predictions is then derived as the video-level prediction used to compute the loss (see the training sketch after this list).
- In every mini-batch, we randomly select 16 (the batch size) videos from the training set and further randomly select 1 stacked optical flow from each video.
- Both streams apply the same data augmentation techniques, such as random cropping.
- For every test video, we uniformly sample 19 frames, and the video-level prediction is the voting result over all 19 frame-level predictions (see the test-time sketch after this list).
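
As a concrete illustration of the two backbones and the first-layer weight modification described above, here is a minimal PyTorch sketch. The cross-modality initialisation (averaging the three RGB kernels and replicating them across the 30 flow channels) and the `NUM_CLASSES` value are assumptions for illustration, not details taken from this project.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical; set to the number of NDA classes

def build_spatial_stream(num_classes=NUM_CLASSES):
    # ResNet50 pre-trained on ImageNet; expects (3, 224, 224) frames.
    model = models.resnet50(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def build_motion_stream(num_classes=NUM_CLASSES, flow_channels=30):
    # Same backbone, but the first conv must accept a 30-channel
    # stack (10 RGB-visualised flow images, 3 channels each).
    model = models.resnet50(pretrained=True)
    old_conv = model.conv1                      # weight: (64, 3, 7, 7)
    new_conv = nn.Conv2d(flow_channels, 64, kernel_size=7,
                         stride=2, padding=3, bias=False)
    with torch.no_grad():
        # Assumed cross-modality initialisation: average the RGB
        # kernels and replicate them across the 30 flow channels.
        mean_w = old_conv.weight.mean(dim=1, keepdim=True)
        new_conv.weight.copy_(mean_w.repeat(1, flow_channels, 1, 1))
    model.conv1 = new_conv                      # weight: (64, 30, 7, 7)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```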
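
The training-time sampling and consensus for the spatial stream might look like the following; averaging the frame-level logits as the consensus is an assumption, since the text above only says that a consensus is derived.

```python
import random
import torch

def sample_training_frames(num_frames, k=3):
    # Randomly pick k frame indices from one video (spatial stream).
    return sorted(random.sample(range(num_frames), k))

def video_level_logits(model, frames):
    # frames: (k, 3, 224, 224) tensor holding the k sampled frames.
    logits = model(frames)        # (k, num_classes) frame-level logits
    # Assumed consensus: average the frame-level predictions to get
    # the video-level prediction used in the loss.
    return logits.mean(dim=0)     # (num_classes,)
```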
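
At test time, the 19 uniformly spaced frames and the vote could be implemented as below; reading the "voting result" as a majority vote over per-frame argmax predictions is an assumption, and score averaging would be an equally plausible variant.

```python
import numpy as np
import torch

def uniform_indices(num_frames, k=19):
    # k evenly spaced frame indices spanning the whole video.
    return np.linspace(0, num_frames - 1, num=k).astype(int)

@torch.no_grad()
def predict_video(model, frames):
    # frames: (19, 3, 224, 224) tensor of the uniformly sampled frames.
    votes = model(frames).argmax(dim=1)   # one class vote per frame
    # Assumed reading of "voting result": majority vote over frames.
    return torch.mode(votes).values.item()
```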
Network | Top-1 accuracy |
---|---|
Spatial stream | 80.8% |
Motion stream | 83.3% |
Fusion | 96.4% |
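
The Fusion row combines the two streams' outputs. The exact fusion rule is not described here; a minimal sketch, assuming a simple weighted average of the two streams' softmax scores (a common late-fusion choice), is:

```python
import torch

def late_fusion(spatial_logits, motion_logits, w=0.5):
    # Weighted average of the two streams' softmax scores; equal
    # weights (w=0.5) are an assumption, not a documented choice.
    scores = (w * spatial_logits.softmax(dim=-1)
              + (1.0 - w) * motion_logits.softmax(dim=-1))
    return scores.argmax(dim=-1)
```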
- Below is the link to my Google Colab notebook; all instructions can be found in it. If there is any problem, please open an issue on this git repository.
- Google Colab notebook
- This code is modified from jeffreyhuang's implementation. All modifications were made by me to fit the NDA Recognition project.
- The optical flow images were estimated using FlowNet 2.0 from the NVIDIA git repository.