Official PyTorch Implementation of our CVPR 2025 paper.
Authors: Nikola Zubić, Davide Scaramuzza
Figure: Overview of the GG-SSM pipeline applied to various tasks, such as event-based vision, time series forecasting, image classification, and optical flow estimation.
State Space Models (SSMs) are powerful tools for modeling sequential data in computer vision and time series analysis. However, traditional SSMs are limited by fixed, one-dimensional sequential processing, which restricts their ability to model non-local interactions in high-dimensional data. While methods like Mamba and VMamba introduce selective and flexible scanning strategies, they rely on predetermined paths and therefore fail to efficiently capture complex dependencies.
We introduce Graph-Generating State Space Models (GG-SSMs), a novel framework that overcomes these limitations by dynamically constructing graphs based on feature relationships. Using Chazelle's Minimum Spanning Tree algorithm, GG-SSMs adapt to the inherent data structure, enabling robust feature propagation across dynamically generated graphs and efficiently modeling complex dependencies.
We validate GG-SSMs on 11 diverse datasets, including event-based eye-tracking, ImageNet classification, optical flow estimation, and six time series datasets. GG-SSMs achieve state-of-the-art performance across all tasks, surpassing existing methods by significant margins. Specifically, GG-SSM attains a top-1 accuracy of 84.9% on ImageNet, outperforming prior SSMs by 1%, reducing the KITTI-15 error rate to 2.77%, and improving eye-tracking detection rates by up to 0.33% with fewer parameters. These results demonstrate that dynamic scanning based on feature relationships significantly improves SSMs' representational power and efficiency, offering a versatile tool for various applications in computer vision and beyond.
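To make the core idea concrete, below is a minimal, illustrative sketch of dynamic graph construction (not the paper's implementation): pairwise feature dissimilarities define edge weights, a minimum spanning tree is extracted, and its edges define the scan topology. SciPy's standard MST routine stands in for Chazelle's algorithm, and the function name is our own.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def build_feature_mst(features: np.ndarray) -> np.ndarray:
    """Derive a tree topology from pairwise feature dissimilarity.

    features: (N, D) array of token/patch features.
    Returns an (N, N) symmetric 0/1 adjacency matrix of the MST edges.
    """
    # Pairwise Euclidean distances serve as edge weights of a dense graph.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # SciPy's MST stands in here for the paper's Chazelle-based construction.
    mst = minimum_spanning_tree(dist).toarray()
    return ((mst + mst.T) > 0).astype(np.float32)

# Toy usage: 16 tokens with 8-dim features; a tree over N nodes has N - 1 edges.
adj = build_feature_mst(np.random.randn(16, 8))
print(int(adj.sum()) // 2)  # -> 15
```

State propagation then follows the tree edges instead of a fixed one-dimensional scan order.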
If you find this work helpful, please cite our paper:
```bibtex
@inproceedings{Zubic_2025_CVPR,
    title     = {Graph-Generating State Space Models (GG-SSMs)},
    author    = {Zubic, Nikola and Scaramuzza, Davide},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2025}
}
```
Below are the commands to set up a conda environment and install all necessary dependencies, including custom libraries for graph-based state scanning:
```bash
# 1. Create and activate conda environment
conda create -y -n gg_ssms python=3.11
conda activate gg_ssms

# 2. Install PyTorch and CUDA
conda install -y pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia
conda install -y nvidia::cuda-toolkit

# 3. Install custom dependencies (TreeScan and TreeScanLan)
cd core/convolutional_graph_ssm/third-party/TreeScan/
pip install -v -e .
cd $(git rev-parse --show-toplevel)
cd core/graph_ssm/third-party/TreeScanLan/
pip install -v -e .
```
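After these steps, a quick sanity check confirms that PyTorch and the CUDA toolkit are wired up as expected before you run any of the tasks below:

```python
# Sanity check for the freshly created gg_ssms environment.
import torch

print(torch.__version__)          # expect 2.5.0
print(torch.version.cuda)         # expect 12.4
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```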
Depending on which tasks or modules you want to run, you may need extra Python packages beyond the core requirements listed above. Below is a breakdown of recommended installations for each sub-project:
- INI-30 dataset event-based eye tracking (`eye_tracking_ini_30`):

  ```bash
  cd eye_tracking_ini_30
  pip install dv-processing sinabs tonic thop samna fire
  ```

- LPW dataset event-based eye tracking (`eye_tracking_lpw`):

  ```bash
  cd eye_tracking_lpw
  pip install matplotlib opencv-python tqdm tables easydict wandb timm einops
  ```

- MambaTS (Time Series): check the `requirements.txt` inside the `MambaTS` folder:

  ```bash
  cd MambaTS
  pip install -r requirements.txt
  ```
We provide a Convolutional Graph-Generating SSM for image-based feature extraction and classification in:
```
core/convolutional_graph_ssm/classification/models/graph_ssm.py
```
- Choosing Model Size: On line 545, you can set `config_path` to one of `base`, `small`, or `tiny` to pick the desired model variant.
- Pretrained Weights: Place the corresponding pretrained weight files (e.g., `gg_ssm_base.pth`, `gg_ssm_small.pth`, `gg_ssm_tiny.pth`) inside `core/convolutional_graph_ssm/classification/weights/`. These weights can be downloaded from the Releases page.
To run a forward pass on an image:

```bash
python core/convolutional_graph_ssm/classification/models/graph_ssm.py
```

- By default, this script will load the base model from `config_path='base'`.
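If you prefer to call the model from your own code instead of the script, the pattern looks roughly like the sketch below. The class name `GraphSSM` and its constructor signature are assumptions for illustration; consult `graph_ssm.py` for the actual interface.

```python
import torch
# Hypothetical import: the real class lives in
# core/convolutional_graph_ssm/classification/models/graph_ssm.py and may be
# named differently -- check that file before copying this.
from graph_ssm import GraphSSM  # assumed name and signature

model = GraphSSM(config_path='base')  # one of 'base' | 'small' | 'tiny'
weights = torch.load(
    'core/convolutional_graph_ssm/classification/weights/gg_ssm_base.pth',
    map_location='cpu',
)
model.load_state_dict(weights)
model.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input
    logits = model(image)
print(logits.shape)
```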
A purely temporal Graph-Generating SSM (for sequential or time-series data) is available in:
```
core/graph_ssm/main.py
```
- This module focuses on modeling temporal dependencies using dynamically constructed graphs.
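A hedged usage sketch follows; the import path matches the file above, but the constructor argument (`d_model`) is an assumption, so check `core/graph_ssm/main.py` for the real interface.

```python
import torch
# TemporalGraphSSM is the module this README refers to throughout; the
# argument names here are assumptions, not the verbatim interface.
from core.graph_ssm.main import TemporalGraphSSM

ssm = TemporalGraphSSM(d_model=128)   # hypothetical signature
seq = torch.randn(4, 96, 128)         # (batch, time_steps, features)
out = ssm(seq)                        # features mixed along a generated tree
print(out.shape)                      # expected to match the input shape
```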
You can combine the Convolutional Graph SSM (for spatial modeling) and the Temporal Graph SSM (for sequential/temporal modeling) to create a unified spatio-temporal pipeline. Our event-based eye tracking tasks (see Ini-30 Eye Tracking or LPW Dataset Eye Tracking) demonstrate exactly how these two components are integrated for end-to-end training.
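Conceptually, the combination is a per-frame spatial encoder followed by a temporal SSM over the resulting frame features. The wrapper below is our own illustrative sketch, not the repo's actual integration (for that, see the eye tracking code):

```python
import torch
import torch.nn as nn

class SpatioTemporalGGSSM(nn.Module):
    """Illustrative wiring only; the module roles mirror the README's
    spatial_backbone=ConvGraphSSM / temporal_ssm=TemporalGraphSSM setup."""

    def __init__(self, spatial_backbone: nn.Module, temporal_ssm: nn.Module):
        super().__init__()
        self.spatial = spatial_backbone
        self.temporal = temporal_ssm

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.spatial(clips.reshape(b * t, c, h, w))  # per-frame encoding
        feats = feats.reshape(b, t, -1)                      # (batch, time, dim)
        return self.temporal(feats)                          # temporal graph SSM
```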
We incorporate Graph-Generating SSMs into the MambaTS codebase by replacing the default encoder in `MambaTS/models/MambaTS.py` with our `TemporalGraphSSM`. This allows graph-based temporal modeling for long-horizon forecasting.
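The integration amounts to swapping a single module. The toy stand-ins below only illustrate that swap; the real classes are in `MambaTS/models/MambaTS.py` and `core/graph_ssm/main.py`:

```python
import torch
import torch.nn as nn

class TemporalGraphSSMStub(nn.Module):
    """Stands in for the real TemporalGraphSSM from core/graph_ssm/main.py."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x  # identity placeholder

class ForecastModel(nn.Module):
    def __init__(self, d_model: int = 64, pred_len: int = 96):
        super().__init__()
        # In MambaTS this attribute holds the default encoder; the GG-SSM
        # integration replaces it with the temporal graph SSM.
        self.encoder = TemporalGraphSSMStub()
        self.head = nn.Linear(d_model, pred_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))

out = ForecastModel()(torch.randn(2, 336, 64))
print(out.shape)  # torch.Size([2, 336, 96])
```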
- Scripts Location: All relevant scripts can be found here.
- Adjusting Paths & Parameters: In each script (e.g., `run.py`), you can modify:
  - `CUDA_VISIBLE_DEVICES`: Set to your GPU index (e.g., `export CUDA_VISIBLE_DEVICES=3`).
  - `root_path` / `data_path`: Point these to the folder containing your time series dataset.
  - `model_id` / `model_name`: Namespacing for checkpoints and logging.
  - `seq_len`, `pred_len`: Sequence length and prediction horizon you want to experiment with.
  - Hyperparameters: Adjust `e_layers`, `d_layers`, `batch_size`, `learning_rate`, etc.
- Datasets Download: All datasets can be downloaded from here.
To run, do `cd MambaTS/` from the root and then `bash ./scripts/MambaTS_ETTh2.sh` to train on the ETTh2 dataset. Corresponding scripts for each of the six time series datasets are available. All logs and outputs will be generated inside the `MambaTS` folder.
Our implementation for Ini-30 event-based eye tracking can be found in the `retina` folder:

- `/training/models/baseline_3et.py`: Contains the code where our GG-SSM architecture is integrated for eye tracking, with `spatial_backbone=ConvGraphSSM` and `temporal_ssm=TemporalGraphSSM`.
- From the root you can run `CUDA_VISIBLE_DEVICES=i python retina/scripts/train.py --run_name=graph_ssm --device=i`, where `i` is the GPU ID. The script will automatically log and create a project in Weights & Biases (wandb) named `eye_tracking_ini_30`.
When installing Tonic (needed for event-based data processing), you may encounter a pip dependency error like:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
...
python-tsp 0.5.0 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
```
This means Tonic and python-tsp (used for certain time series tasks) have conflicting NumPy requirements. If you plan to run time series tasks in the same environment, you can:

- Uninstall Tonic once finished with eye tracking (`pip uninstall tonic`),
- Reinstall NumPy at a version that satisfies python-tsp (`pip install "numpy>=2.0.0,<3.0.0"`), and
- Reinstall `python-tsp` for time series (`pip install python-tsp`).
Alternatively, keep separate environments for each task to avoid conflicts.
Our integration for the LPW dataset eye tracking is located in the `eye_tracking_lpw` folder.

- Data Preparation: Follow the instructions provided by cb-convlstm-eyetracking to download and prepare the LPW dataset.
- Path Configuration: In the `eye_tracking_lpw/graph_ssm_train.py` file, set `DATA_DIR_ROOT = "/path/to/your/LPW/dataset"` so that it points to the root directory containing the LPW dataset.
- Run Training: From the project root directory, simply execute `python eye_tracking_lpw/graph_ssm_train.py`. This will start the training process for LPW eye tracking with the Graph-Generating SSM architecture.
This project has used code from the following projects:
- MambaTS - Improved Selective State Space Models for Long-term Time Series Forecasting
- Retina - Low-Power Eye Tracking with Event Camera and Spiking Hardware
- 3ET - Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network
- MemFlow - Optical Flow Estimation and Prediction with Memory
- GrootVL - Tree Topology is All You Need in State Space Model