VARify

This repository contains the code to train a multi-task AI model designed for Video Assistant Referee (VAR) applications in football. The model leverages a two-stream neural network architecture to analyze RGB and optical flow inputs, classifying various aspects of fouls, such as action type, offense severity, and body part involved.

Features

Two-Stream Neural Network: Combines RGB and optical flow data for robust video analysis.
Multi-Class Classification: Simultaneously predicts:
- Action class (e.g., Tackling, Diving)
- Offense type (e.g., Offense, No Offense)
- Severity level (1 to 5)
- Body part involved (Upper or Lower body)
- Offense severity (e.g., No Card, Yellow Card, Red Card)
Customizable Training: Supports adjustable parameters like batch size, learning rate, and number of epochs.
Data Preprocessing: Includes utilities for loading and preprocessing video clips.

Requirements

Dependencies

The codebase uses Python and the following libraries:

torch
torchvision
numpy
opencv-python
Pillow
tqdm
scikit-learn

Install dependencies using:

pip install -r requirements.txt

SoccerNet-MVFoul

Follow the link to easily download the SoccerNet pip package.

If you want to download the data and annotations, you will need to fill a NDA to get the password.

Then use the below command to install the dataset by updating the password variable in code:

python dataloader.py

To obtain the data in 720p, add version = "720p" to the input arguments. Unzip each folder while maintaining the naming conventions. (Train, Valid, Test, Chall). If you face issues extracting data from the train_720p.zip folder, the error may come from using the default unzip extractor. Using the app "The Unarchiver" should enable you to unzip it successfully.

The dataset consists of 3901 available actions. Each action is composed of at least two videos depicting the live action and at least one replay. The dataset is divided into a training set (2916 actions), validation set (411 actions), test set (301 actions) and challenge set (273 actions without the annotations).

Dataset

The model uses the SoccerNet dataset or similar datasets containing annotated football match clips. Ensure the dataset is structured as follows:

mvfouls/
├── train/
│   ├── annotations.json
│   ├── action_1/
│   │   ├── clip_0.mp4
│   │   ├── clip_1.mp4
│   ├── ...
├── valid/
│   ├── annotations.json
│   ├── ...
├── test/
│   ├── annotations.json
│   ├── ...

Annotations should include:

Action class
Offense type
Severity level
Body part involved
Offense severity

How to Use

1. Data Preparation

Download and extract the SoccerNet dataset or your custom dataset. Update the data_path variable in the code to point to your dataset directory.

2. Training the Model

Run the training script using:

python train.py

You can customize training parameters such as:

Number of epochs (num_epochs)
Batch size (batch_size)
Learning rate (learning_rate)
Max Samples (max_samples_o, max_samples_no)

Example:

main(data_path='mvfouls', num_epochs=40, batch_size=4, learning_rate=1e-3, max_samples_o=500, max_samples_no=400)

3. Model Architecture

The model is implemented in model.py using PyTorch. It consists of:

Two ResNet50 backbones (one for RGB and one for optical flow).
Fully connected layers for each classification task.

4. Preprocessing

The preprocessing pipeline in preprocess.py includes:

Resizing frames to $$112 \times 112$$
Normalizing RGB frames using ImageNet statistics.
Computing optical flow using Farneback's method.

5. Evaluation

After training, the best model is saved as best_model.pth. Use this model for inference or further evaluation on the test set.

Project Structure

VAR/
├── Interface/          # Interface for using the trained model
│   ├── ui.py           # FastAPI-based web interface
│   ├── model.py        # Model loading and inference logic
│   ├── best_model.pth  # Trained model weights
├── varfoul/            # Model training and preprocessing scripts
│   ├── train.py        # Training script for the model
│   ├── model.py        # Two-stream neural network architecture
│   ├── preprocess.py   # Preprocessing pipeline for RGB and optical flow inputs

Example Outputs

During training, the script logs:

Training loss and accuracy per epoch.
Validation loss and accuracy per epoch.
Saves the best-performing model automatically.

Customization

Adjusting Frame Count

Modify DESIRED_FRAME_COUNT in preprocess.py to change the number of frames used per clip.
Modify num_views in preprocess.py to change number of views per action.

Adding New Tasks

To add new classification tasks:

Update the TwoStreamNetwork class in model.py.
Modify labels in preprocess.py.
Update training logic in train.py.

VARify UI

This section focuses on the implementation of the interactive interface for the VARify system. The interface allows users to upload football match video clips and obtain predictions for various foul-related classifications, such as action type, offense severity, and body part involved.

Interface Overview

The interface is built using FastAPI, providing a simple web-based platform for interacting with the trained VARify model. Users can upload video files, and the system processes them using the trained model to return predictions.

Download the weights of the model: https://drive.google.com/drive/u/0/folders/1aLEd2oMR30rEDBYSFFUs2s15OG047j61 or paste the trained model in the Interface folder.

And save the best_model.pth file in the folder Interface.

1. Running the Interface

To launch the interactive interface:

Navigate to the Interface directory.
Run the FastAPI server:
```
python ui.py
```
Open your browser and go to http://127.0.0.1:8000.

2. Uploading Videos

The interface allows you to upload video clips for analysis. Once uploaded, the system will process the video using the trained model and return predictions for:

Action class
Offense type
Severity level
Body part involved
Offense severity

Note: Upload upto 4 video clips through a web interface and get predictions in real time.

Interface Details

The Interface folder contains:

ui.py: Implements a FastAPI-based web application for uploading videos and returning predictions.
model.py: Loads the trained model (best_model.pth) and performs inference.
best_model.pth: Pre-trained weights of the two-stream neural network.

Example API Endpoints

Homepage
```
GET /
```
Returns an HTML page with instructions.
Upload Endpoint
```
POST /upload/predict_multiple
```
Accepts a video file as input and returns predictions.

Example Outputs

When running inference on a video clip through the interface or directly using model.py, you will receive outputs like:

{
    "action_class": "Tackling",
    "offense_type": "Yes",
    "severity_level": "3",
    "body_part": "Upper body",
    "offense_severity": "Yellow card"
}

Customization

Adding New Tasks or Classes

To add new tasks or classes:

Update TwoStreamNetwork in varfoul/model.py.
Modify label mappings in EVENT_DICTIONARY in Interface/ui.py.
Adjust preprocessing logic in preprocess.py.

Experiments

If you would like to explore and understand more about different model implementations, please refer to notebooks folder.

Future Improvements

Potential enhancements include:

Improving real-time processing speed.
Incorporating advanced optical flow algorithms.

Individual Contributions

Contributor 1: Jainil K. Rana (300362571)

Responsibilities:

UI Development:
- Designed and implemented the FastAPI-based web interface for the VAR system.
- Developed endpoints for video upload, preprocessing, and generating predictions from trained models.
- Integrated the MViT, ResNet R3D-18, and TwoStreamResNet models into the interface for real-time foul classification.
- Ensured smooth interaction between the backend and the trained models.
Model Training - MViT:
- Trained the Multi-scale Vision Transformer (MViT) model for multi-task foul classification.
- Fine-tuned hyperparameters such as learning rate, batch size, and dropout to optimize performance.
- Evaluated MViT's performance on tasks like action classification, offense detection, and severity assessment.

Contributor 2: Omkar P. Harkare (300359531)

Responsibilities:

Model Training - ResNet R3D-18:
- Trained the ResNet R3D-18 model for multi-task classification using spatiotemporal video data.
- Implemented data preprocessing pipelines, including frame resizing and normalization.
- Evaluated ResNet's performance across tasks like body part detection and offense severity classification.
Model Training - TwoStreamResNet:
- Trained the TwoStreamResNet model by combining RGB and optical flow streams.
- Handled feature extraction from both streams using ResNet-50 backbones.
- Tuned task-specific fully connected layers for action recognition, severity assessment, and offense detection.
Dataset Preparation:
- Processed the SoccerNet-MVFoul dataset, ensuring balanced class distributions for training.
- Computed optical flow using Farneback's method and prepared data for multi-task learning.

Citations:

If you use this code or parts of it in your research or projects, please cite appropriately.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
Interface		Interface
varfoul		varfoul
MLDiagrams.jpg		MLDiagrams.jpg
Readme.md		Readme.md
UI.png		UI.png
declaration.md		declaration.md
declaration.txt		declaration.txt
requirements.txt		requirements.txt

omkarharkare/VAR

Folders and files

Latest commit

History

Repository files navigation

VARify

Features

Requirements

Dependencies

SoccerNet-MVFoul

Dataset

How to Use

1. Data Preparation

2. Training the Model

3. Model Architecture

4. Preprocessing

5. Evaluation

Project Structure

Example Outputs

Customization

Adjusting Frame Count

Adding New Tasks

VARify UI

Interface Overview

1. Running the Interface

2. Uploading Videos

Interface Details

Example API Endpoints

Example Outputs

Customization

Adding New Tasks or Classes

Experiments

Future Improvements

Individual Contributions

Contributor 1: Jainil K. Rana (300362571)

Responsibilities:

Contributor 2: Omkar P. Harkare (300359531)

Responsibilities:

Citations:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages