kaist-dmlab/MOBINS

Mobility Networked Time-Series Forecasting Benchmark Datasets

This is the official repository of MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of movements.

1. Overview

Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, existing research often neglects integrating comprehensive perspectives on spatial dynamics, temporal trends, and other contextual views due to the limitations of existing mobility datasets. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries and cover both transportation and epidemic domains at the administrative area level. Our experiments with nine baseline methods reveal the significant impact of different model backbones on the proposed six datasets. We provide a valuable resource for advancing urban mobility research, and our dataset collection is available at DOI 10.5281/zenodo.14590709.

2. Proposed Benchmark Datasets

Because the dataset collection is large, we released it on an anonymous drive.

Additional information

Formats of datasets

  • csv format datasets in every environment: each dataset has three components.
    • SPATIAL_NETWORK.csv: ( $n * n$ where $n$ = # of nodes )
      • Column name list: INDEX, $N_{0}$, $N_{1}$, $\dots$, $N_{n}$
      • INDEX list: $N_{0}$, $N_{1}$, $\dots$, $N_{n}$
    • NODE_TIME_SERIES_FEATURES.csv: ( $t$ * $p$ ) * ( $n$ * $d$ ) where $t$ = # of timestamps in a day, $p$ = total period, and $d$ = # of variables from time series
      • Column name list: datetime, $N_{0}$ _{VARIABLE_NAME}, $N_{1}$ _{VARIABLE_NAME}, $\dots$, $N_{n}$ _{VARIABLE_NAME}
      • VARIABLE_NAME list: Transportation-[Seoul, Busan, Daegu] datasets (INFLOW, OUTFLOW), Transportation-NYC dataset (RIDERSHIP), Epidemic-[Korea, NYC] datasets (INFECTION)
    • OD_MOVEMENTS.csv: ( $t$ * $p$ ) * ( $n$, $n$ )
      • Column name list: $N_{0}$ _ $N_{0}$, $N_{0}$ _ $N_{1}$, $N_{0}$ _ $N_{2}$, $\dots$ , $N_{n}$ _ $N_{n-1}$ , $N_{n}$ _ $N_{n}$
  • npy format datasets for directly training the models on the Python environments: each dataset has three components.
    • adj_matrix(shape: ( $n$ , $n$ )) where $n$ = # of nodes
    • node_npy(each file is constructed in a daily manner with shape: ( $n$, $t$ , $d$ )) where $t$ = # of timestamps in a day and $d$ = # of variables from time series
    • od_npy (each file is constructed in a daily manner with shape: ( $n$, $n$, $t$ ))

More detailed metadata is located in the datasets/README.md.
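
The relationship between the daily npy shapes and the csv layout can be sketched as follows. This is a minimal illustration on synthetic arrays, not the repository's loader: the toy sizes, the helper names `node_days_to_table` and `od_days_to_table`, and the column ordering (node-major, then variable) are assumptions made for the example.

```python
import numpy as np

# Toy sizes (assumed for illustration, not a real MOBINS dataset)
n, t, d, p = 5, 24, 1, 3  # nodes, timestamps per day, variables, days

rng = np.random.default_rng(0)
# Synthetic stand-ins for p daily files
node_days = [rng.random((n, t, d)) for _ in range(p)]  # each (n, t, d)
od_days = [rng.random((n, n, t)) for _ in range(p)]    # each (n, n, t)


def node_days_to_table(days):
    """Stack daily (n, t, d) arrays into one (t * p, n * d) table."""
    # (n, t, d) -> (t, n, d) -> (t, n * d), then concatenate days along time
    return np.concatenate(
        [day.transpose(1, 0, 2).reshape(day.shape[1], -1) for day in days],
        axis=0,
    )


def od_days_to_table(days):
    """Stack daily (n, n, t) arrays into one (t * p, n * n) table."""
    # (n, n, t) -> (t, n, n) -> (t, n * n), then concatenate days along time
    return np.concatenate(
        [day.transpose(2, 0, 1).reshape(day.shape[2], -1) for day in days],
        axis=0,
    )


node_table = node_days_to_table(node_days)
od_table = od_days_to_table(od_days)
print(node_table.shape, od_table.shape)
```

With this ordering, column `i * n + j` of `od_table` holds the movement from node `i` to node `j`, which mirrors the `N_i _ N_j` column naming of OD_MOVEMENTS.csv.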

Dataset Descriptions

  • Overview of dataset description

  • Target dimension: OD Movements + Time Series = $n^2 + d*n$ where $n$ = # of nodes and $d$ = # of variables
| Dataset | Location | Nodes | Edges | Spatial node unit | Daily movements | Daily amounts | Time interval | Time range | Frames | Target dimension |
|---|---|---|---|---|---|---|---|---|---|---|
| Transportation | Seoul | 128 | 290 | Station-based administrative area | SmartCard: 2.68M | In/Out-flow: 4.02M | 1 hour | 01/01/2022-12/31/2023 | 17520 | 16640 |
| Transportation | Busan | 60 | 121 | Station-based administrative area | SmartCard: 0.63M | In/Out-flow: 0.75M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3720 |
| Transportation | Daegu | 61 | 123 | Station-based administrative area | SmartCard: 0.10M | In/Out-flow: 0.34M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3843 |
| Transportation | NYC | 5 | 12 | Borough | Taxi: 0.10M | Ridership: 3.03M | 1 hour | 02/01/2022-03/31/2024 | 17280 | 30 |
| Epidemic | Korea | 16 | 45 | City&Province | SmartCards: 13.41M | Infection: 25834 | 1 day | 01/20/2020-08/31/2023 | 1320 | 272 |
| Epidemic | NYC | 5 | 12 | Borough | Taxi: 2418 | Infection: 2038 | 1 day | 03/01/2020-12/31/2023 | 1401 | 30 |
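
As a quick check of the target dimension formula $n^2 + d*n$, the short sketch below recomputes the last column of the table. The node counts come from the table and the variable counts from the VARIABLE_NAME list (two variables for Seoul, Busan, and Daegu; one for the others); the dictionary and function names are only illustrative.

```python
# (number of nodes n, number of time-series variables d) per dataset
DATASETS = {
    "Transportation-Seoul": (128, 2),  # INFLOW, OUTFLOW
    "Transportation-Busan": (60, 2),   # INFLOW, OUTFLOW
    "Transportation-Daegu": (61, 2),   # INFLOW, OUTFLOW
    "Transportation-NYC": (5, 1),      # RIDERSHIP
    "Epidemic-Korea": (16, 1),         # INFECTION
    "Epidemic-NYC": (5, 1),            # INFECTION
}


def target_dimension(n, d):
    """OD movements (n * n) plus node time series (d * n)."""
    return n * n + d * n


for name, (n, d) in DATASETS.items():
    print(f"{name}: {target_dimension(n, d)}")
```

For example, Seoul gives $128^2 + 2 \cdot 128 = 16384 + 256 = 16640$.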

3. Experiments with baselines

All experiments are conducted on a server with an NVIDIA RTX 3090Ti GPU.

Baseline list

  • Linear-based: DLinear, NLinear
  • RNN-based: SegRNN
  • Transformer-based: Informer, Reformer, PatchTST
  • CNN-based: TimesNet
  • GNN-based: STGCN, MPNNLSTM

3-1. Requirements and Installations

3-2. How to prepare datasets

  • After you download the MOBINS dataset collection in the npy format, arrange the files according to the structure below.
  • Every dataset follows the same layout: a NODE_TIME_SERIES_FEATURES folder, an OD_MOVEMENTS folder, and a SPATIAL_NETWORK.npy file.
datasets
|_ README.md
|_ Transportation-Seoul
    |_ NODE_TIME_SERIES_FEATURES          # daily record files
       |_ ....
    |_ OD_MOVEMENTS                       # daily record files
       |_ ....
    |_ SPATIAL_NETWORK.npy
|_ Transportation-Busan
    |_ ...
|_ Transportation-Daegu
    |_ ...
|_ Transportation-NYC
    |_ ...
|_ Epidemic-Korea
    |_ ...
|_ Epidemic-NYC
    |_ ...
  • To use a different dataset directory, change ROOT_PATH in the load_datasets function of dataset_loader.py.
def load_datasets(dataset, khop=0, only_adj=False, ar_adj=False, opts=None):
    # Dataset directory
    ROOT_PATH = './dataset/'  # CHANGE YOUR DIR
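
Before training, it may help to confirm that the directory you point ROOT_PATH at matches the layout above. The following is a small sketch using only the standard library; `missing_components` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# The three components each dataset folder is expected to contain
REQUIRED = ("NODE_TIME_SERIES_FEATURES", "OD_MOVEMENTS", "SPATIAL_NETWORK.npy")

DATASET_DIRS = (
    "Transportation-Seoul",
    "Transportation-Busan",
    "Transportation-Daegu",
    "Transportation-NYC",
    "Epidemic-Korea",
    "Epidemic-NYC",
)


def missing_components(root, dataset_dirs=DATASET_DIRS):
    """Return {dataset_dir: [missing component, ...]} for incomplete datasets."""
    root = Path(root)
    missing = {}
    for name in dataset_dirs:
        absent = [c for c in REQUIRED if not (root / name / c).exists()]
        if absent:
            missing[name] = absent
    return missing
```

An empty result from `missing_components('./datasets')` means every listed dataset folder has all three components in place.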

3-3. Configuration

MOBINS was implemented in Python 3.11.5.

  • Edit the main.py file to set the experiment parameters (dataset, seq_length, gpu_id (e.g., 0, 1, 2, 3, 4, 5), etc.), then run it.
python3 main.py

3-4. How to run

  • How to change the parameter options: the main options are listed below.
  --dataset: the name of the dataset (string), e.g., seoul, busan, daegu, nyc, korea_covid, nyc_covid
  --seq_day: the size of the lookback window (integer)
  --pred_day: the size of the prediction window (integer)
  --gpu_id: the GPU id (integer)
  • How to execute the bash file: (e.g.) the DLinear model on the Epidemic-Korea dataset with seq_day 4 and pred_day in (7, 14, 30)
bash example_bash.sh
  • How to execute the Python file directly in the terminal: (e.g.) the DLinear model on the Epidemic-Korea dataset
python3 main.py --model_name DLinear --gpu_id 0 --batch_size 8 --dataset korea_covid --seq_day 4 --pred_day 7
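
To sweep several prediction windows without editing a shell script, the command lines can be generated in Python. This is a sketch under the assumption that main.py accepts exactly the flags shown above; the contents of example_bash.sh are not reproduced here, and `build_commands` is a hypothetical helper.

```python
import shlex


def build_commands(model_name="DLinear", dataset="korea_covid", seq_day=4,
                   pred_days=(7, 14, 30), gpu_id=0, batch_size=8):
    """Build one argv list per prediction window."""
    commands = []
    for pred_day in pred_days:
        commands.append([
            "python3", "main.py",
            "--model_name", model_name,
            "--gpu_id", str(gpu_id),
            "--batch_size", str(batch_size),
            "--dataset", dataset,
            "--seq_day", str(seq_day),
            "--pred_day", str(pred_day),
        ])
    return commands


# Print the commands; pass each argv list to subprocess.run to execute it.
for argv in build_commands():
    print(shlex.join(argv))
```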

4. Evaluate Your Models

  1. Upload your model into the comparisons folder.
  2. Add your model information to the baseline.py file.
# First, add your model name to the `from comparisons import` line
from comparisons import DLinear, Autoformer, PatchTST, TimesNet, Informer, NLinear, SegRNN, Reformer, STGCN, MPNNLSTM, your_model_name

# Second, add your model name to model_dict
class Prediction(object):
    # .... other code
    def build_models(self):
        ### Add your model ####################
        model_dict = {
            'Autoformer': Autoformer,
            'PatchTST': PatchTST,
            'DLinear': DLinear,
            'TimesNet': TimesNet,
            'Informer': Informer,
            'NLinear': NLinear,
            'SegRNN': SegRNN,
            'Reformer': Reformer,
            'STGCN': STGCN,
            'MPNNLSTM': MPNNLSTM,
            'your_model_name': your_model_name
        }
        #########################################
  3. Run your code.
python3 main.py --model_name 'your_model_name'
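
The name passed to --model_name has to match a key in model_dict exactly. The sketch below illustrates that lookup pattern with placeholder classes; it is not the code in baseline.py, and whether the real entries are modules or classes is not shown in this README. `resolve_model` is a hypothetical helper.

```python
class DLinear:
    """Placeholder standing in for a registered baseline."""


class YourModel:
    """Placeholder standing in for a newly added model."""


# Same shape as model_dict above: string key -> model entry
model_dict = {
    "DLinear": DLinear,
    "your_model_name": YourModel,
}


def resolve_model(name, registry):
    """Look up a model by name and fail with a readable message if absent."""
    try:
        return registry[name]
    except KeyError:
        available = ", ".join(sorted(registry))
        raise ValueError(f"Unknown model '{name}'. Available: {available}") from None


model_cls = resolve_model("your_model_name", model_dict)
print(model_cls.__name__)
```

A misspelled name, such as `Your_Model_Name`, raises a `ValueError` that lists the registered keys.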

5. License

6. Data Source Reference

[note] All source websites provide an official English version except for the Smart Transit Card Information System and the Korea Disease Control and Prevention Agency. Therefore, we describe how to contact or use these two data sources.

7. Code Reference

We implemented our benchmark code based on the Time Series Library (TSLib).

Citation

@inproceedings{na2025mobility,
  title={Mobility Networked Time Series Benchmark Datasets},
  author={Na, Jihye and Nam, Youngeun and Yoon, Susik and Song, Hwanjun and Lee, Byung Suk and Lee, Jae-Gil},
  booktitle={ICWSM},
  year={2025},
}

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2023R1A2C2003690).
