This is the official repository of MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of movements.
Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, existing research often neglects integrating comprehensive perspectives on spatial dynamics, temporal trends, and other contextual views due to the limitations of existing mobility datasets. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries and cover both transportation and epidemic domains at the administrative area level. Our experiments with nine baseline methods reveal the significant impact of different model backbones on the proposed six datasets. We provide a valuable resource for advancing urban mobility research, and our dataset collection is available at DOI 10.5281/zenodo.14590709.
Because the dataset is large, we released it on an anonymous drive.
- csv format data: all csv files link. To download an individual dataset, use the links below.
  - Transportation-Seoul: csv file link
  - Transportation-Busan: csv file link
  - Transportation-Daegu: csv file link
  - Transportation-NYC: csv file link
  - Epidemic-Korea: csv file link
  - Epidemic-NYC: csv file link
- npy format dataset to train the baselines: all npy files link. To download an individual dataset, use the links below.
  - Transportation-Seoul: npy file link
  - Transportation-Busan: npy file link
  - Transportation-Daegu: npy file link
  - Transportation-NYC: npy file link
  - Epidemic-Korea: npy file link
  - Epidemic-NYC: npy file link
Additional information
- csv format datasets in every environment: each dataset has three components.
  - SPATIAL_NETWORK.csv: ($n$ * $n$), where $n$ = # of nodes
    - Column name list: INDEX, $N_{0}$, $N_{1}$, $\dots$, $N_{n}$
    - INDEX list: $N_{0}$, $N_{1}$, $\dots$, $N_{n}$
  - NODE_TIME_SERIES_FEATURES.csv: ($t$ * $p$) * ($n$ * $d$), where $t$ = # of timestamps in a day, $p$ = total period, and $d$ = # of variables from time series
    - Column name list: datetime, $N_{0}$_{VARIABLE_NAME}, $N_{1}$_{VARIABLE_NAME}, $\dots$, $N_{n}$_{VARIABLE_NAME}
    - VARIABLE_NAME list: Transportation-[Seoul, Busan, Daegu] datasets (INFLOW, OUTFLOW), Transportation-NYC dataset (RIDERSHIP), Epidemic-[Korea, NYC] datasets (INFECTION)
  - OD_MOVEMENTS.csv: ($t$ * $p$) * ($n$, $n$)
    - Column name list: $N_{0}$_$N_{0}$, $N_{0}$_$N_{1}$, $N_{0}$_$N_{2}$, $\dots$, $N_{n}$_$N_{n-1}$, $N_{n}$_$N_{n}$
- npy format datasets for directly training the models in Python environments: each dataset has three components.
  - adj_matrix (shape: ($n$, $n$)), where $n$ = # of nodes
  - node_npy (each file is constructed in a daily manner with shape: ($n$, $t$, $d$)), where $t$ = # of timestamps in a day and $d$ = # of variables from time series
  - od_npy (each file is constructed in a daily manner with shape: ($n$, $n$, $t$))
More detailed metadata is located in datasets/README.md.
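The csv layout described above can be turned into arrays with a few lines of NumPy. The following is a minimal sketch, not the repository's loader: the node names `A` and `B` and the toy values are hypothetical, the csv reading step is left out (any csv reader that yields column names and a 2-D value array will do), and it assumes that variable names contain no underscore and that OD columns are named origin first.

```python
import numpy as np


def node_features_to_array(columns, values):
    """Reshape NODE_TIME_SERIES_FEATURES values to (time, node, variable).

    columns: column names without `datetime`, e.g. "A_INFLOW".
    values:  2-D array-like of shape (time, len(columns)).
    Assumes variable names contain no underscore, which holds for the
    listed names (INFLOW, OUTFLOW, RIDERSHIP, INFECTION).
    """
    pairs = [col.rsplit("_", 1) for col in columns]
    nodes = list(dict.fromkeys(node for node, _ in pairs))  # first-seen order
    variables = list(dict.fromkeys(var for _, var in pairs))
    values = np.asarray(values, dtype=float)
    out = np.full((values.shape[0], len(nodes), len(variables)), np.nan)
    for j, (node, var) in enumerate(pairs):
        out[:, nodes.index(node), variables.index(var)] = values[:, j]
    return out, nodes, variables


def od_to_array(columns, values, nodes):
    """Reshape OD_MOVEMENTS values to (time, origin, destination).

    Assumes each column is named "<origin>_<destination>" (origin first).
    """
    values = np.asarray(values, dtype=float)
    position = {col: j for j, col in enumerate(columns)}
    order = [position[f"{o}_{d}"] for o in nodes for d in nodes]
    return values[:, order].reshape(values.shape[0], len(nodes), len(nodes))


# Toy example with two hypothetical nodes "A" and "B" and two timestamps.
node_cols = ["A_INFLOW", "A_OUTFLOW", "B_INFLOW", "B_OUTFLOW"]
node_vals = [[1, 3, 5, 7], [2, 4, 6, 8]]
features, nodes, variables = node_features_to_array(node_cols, node_vals)

od_cols = ["A_A", "A_B", "B_A", "B_B"]
od_vals = [[0, 9, 7, 0], [0, 8, 6, 0]]
od = od_to_array(od_cols, od_vals, nodes)

print(features.shape, od.shape)
```

The sketch puts time on the first axis. The npy files use ($n$, $t$, $d$) and ($n$, $n$, $t$) instead, so `np.transpose` would be needed to compare the two formats directly.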
- Overview of dataset description
  - Target dimension: OD Movements + Time Series = $n^2 + d*n$, where $n$ = # of nodes and $d$ = # of variables
Dataset | Location | # of spatial node units | # of edges | Domain | Daily Movements | Daily Amounts | Time interval | Time Range | Frames | Target dimension |
---|---|---|---|---|---|---|---|---|---|---|
Transportation | Seoul | 128 | 290 | Station-based administrative area | SmartCard:2.68M | In/Out-flow:4.02M | 1 hour | 01/01/2022-12/31/2023 | 17520 | 16640 |
Transportation | Busan | 60 | 121 | Station-based administrative area | SmartCard:0.63M | In/Out-flow:0.75M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3720 |
Transportation | Daegu | 61 | 123 | Station-based administrative area | SmartCard:0.10M | In/Out-flow:0.34M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3843 |
Transportation | NYC | 5 | 12 | Borough | Taxi:0.10M | Ridership:3.03M | 1 hour | 02/01/2022-03/31/2024 | 17280 | 30 |
Epidemic | Korea | 16 | 45 | City&Province | SmartCards:13.41M | Infection:25834 | 1 day | 01/20/2020-08/31/2023 | 1320 | 272 |
Epidemic | NYC | 5 | 12 | Borough | Taxi:2418 | Infection:2038 | 1 day | 03/01/2020-12/31/2023 | 1401 | 30 |
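The target dimension formula $n^2 + d*n$ can be checked against the table with a short script. This is only an illustrative sketch: the function name is made up here, and the values of $d$ are inferred from the VARIABLE_NAME list (two variables for the Korean transportation datasets, one for the others).

```python
def target_dimension(n_nodes: int, n_variables: int) -> int:
    """OD movements (n * n) plus node time series (d * n)."""
    return n_nodes ** 2 + n_variables * n_nodes


# (n, d, target dimension reported in the table)
TABLE = {
    "Transportation-Seoul": (128, 2, 16640),
    "Transportation-Busan": (60, 2, 3720),
    "Transportation-Daegu": (61, 2, 3843),
    "Transportation-NYC": (5, 1, 30),
    "Epidemic-Korea": (16, 1, 272),
    "Epidemic-NYC": (5, 1, 30),
}

for name, (n, d, reported) in TABLE.items():
    computed = target_dimension(n, d)
    status = "ok" if computed == reported else "mismatch"
    print(f"{name}: {n}^2 + {d}*{n} = {computed} ({status})")
```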
All experiments are conducted on a server with an NVIDIA RTX 3090Ti GPU.
Baseline list
- Linear-based: DLinear, NLinear
- RNN-based: SegRNN
- Transformer-based: Informer, Reformer, PatchTST
- CNN-based: TimesNet
- GNN-based: STGCN, MPNNLSTM
- Node.js: 16.13.2+
- Anaconda 4 or Miniconda 3
- Python 3.11.5 (Anaconda recommended)
- Ubuntu 18.04.6 LTS
- PyTorch >= 2.1.2
- After you download the MOBINS dataset collection as npy format files, please refer to the file structure below.
- Each dataset has a NODE_TIME_SERIES_FEATURES folder, an OD_MOVEMENTS folder, and a SPATIAL_NETWORK.npy file (i.e., the same file structure).
datasets
|_ README.md
|_ Transportation-Seoul
    |_ NODE_TIME_SERIES_FEATURES # daily record files
        |_ ....
    |_ OD_MOVEMENTS # daily record files
        |_ ....
    |_ SPATIAL_NETWORK.npy
|_ Transportation-Busan
    |_ ...
|_ Transportation-Daegu
    |_ ...
|_ Transportation-NYC
    |_ ...
|_ Epidemic-Korea
    |_ ...
|_ Epidemic-NYC
    |_ ...
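- To change the dataset directory, edit ROOT_PATH in the load_datasets function of dataset_loader.py. Note that the default is './dataset/', whereas the file structure above uses a folder named datasets, so set the path to match your local folder name.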
def load_datasets(dataset, khop=0, only_adj=False, ar_adj=False, opts=None):
    # Dataset directory
    ROOT_PATH = './dataset/'  # CHANGE YOUR DIR
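For a quick look at the npy files outside the provided loader, the folder layout above can be read directly with NumPy. The following is a minimal sketch under stated assumptions, not the body of `load_datasets`: it assumes the daily files end in `.npy`, that sorting their names gives chronological order (the actual file names are not shown here), and that the shapes are ($n$, $t$, $d$) for node files and ($n$, $n$, $t$) for OD files as described earlier. The helper names are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_daily_stack(folder, time_axis):
    """Concatenate all daily .npy files in `folder` along the time axis."""
    files = sorted(Path(folder).glob("*.npy"))  # assumes sortable, date-like names
    if not files:
        raise FileNotFoundError(f"no .npy files found in {folder}")
    return np.concatenate([np.load(f) for f in files], axis=time_axis)


def load_dataset(root, name):
    """Return (adjacency, node features, OD movements) for one dataset folder."""
    base = Path(root) / name
    adj = np.load(base / "SPATIAL_NETWORK.npy")  # (n, n)
    node = load_daily_stack(base / "NODE_TIME_SERIES_FEATURES", time_axis=1)  # (n, T, d)
    od = load_daily_stack(base / "OD_MOVEMENTS", time_axis=2)  # (n, n, T)
    return adj, node, od
```

A call such as `load_dataset("./datasets", "Transportation-Seoul")` would then return three arrays, with `T` equal to the number of daily files times the timestamps per day.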
MOBINS was implemented in Python 3.11.5.
- Edit the main.py file to set experiment parameters (dataset, seq_length, gpu_id (e.g., 0,1,2,3,4,5), etc.), then run:
python3 main.py
- How to change the parameter options: Please refer to the parameter options below.
--dataset: the name of the dataset (string), e.g., seoul, busan, daegu, nyc, korea_covid, nyc_covid
--seq_day: the size of a lookback window (integer)
--pred_day: the size of a prediction window (integer)
--gpu_id: an integer gpu id
- How to execute bash files: e.g., run the DLinear model on Epidemic-Korea with seq_day 4 and pred_day in (7, 14, 30)
bash example_bash.sh
- How to directly execute the Python file in the terminal: e.g., run the DLinear model on the Epidemic-Korea dataset
python3 main.py --gpu_id 0 --model_name DLinear --batch_size 8 --dataset korea_covid --seq_day 4 --pred_day 7
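A sweep over several prediction windows can also be scripted from Python. The sketch below only builds and prints the command lines, using the flags documented above (`--gpu_id`, `--model_name`, `--batch_size`, `--dataset`, `--seq_day`, `--pred_day`). It is not a reproduction of example_bash.sh, whose contents are not shown here, and the helper name `build_command` is hypothetical.

```python
import shlex


def build_command(model_name, dataset, seq_day, pred_day, gpu_id=0, batch_size=8):
    """Return the argument list for one main.py run."""
    return [
        "python3", "main.py",
        "--gpu_id", str(gpu_id),
        "--model_name", model_name,
        "--batch_size", str(batch_size),
        "--dataset", dataset,
        "--seq_day", str(seq_day),
        "--pred_day", str(pred_day),
    ]


# DLinear on Epidemic-Korea with seq_day 4 and pred_day in (7, 14, 30)
commands = [build_command("DLinear", "korea_covid", 4, p) for p in (7, 14, 30)]

for cmd in commands:
    print(shlex.join(cmd))
    # To actually launch a run, pass the list to subprocess.run(cmd, check=True).
```

Keeping each command as a list rather than a single string avoids quoting problems if a dataset or model name ever contains spaces.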
- Upload your model into the comparisons folder.
- Add your model information to the baseline.py file.
# first, add your model name on from comparisons import
from comparisons import DLinear, Autoformer, PatchTST, TimesNet, Informer, NLinear, SegRNN, Reformer, STGCN, MPNNLSTM, your_model_name

# second, add your model name on model_dict
class Prediction(object):
    # .... other code
    def build_models(self):
        ### Add your model ####################
        model_dict = {
            'Autoformer': Autoformer,
            'PatchTST': PatchTST,
            'DLinear': DLinear,
            'TimesNet': TimesNet,
            'Informer': Informer,
            'NLinear': NLinear,
            'SegRNN': SegRNN,
            'Reformer': Reformer,
            'STGCN': STGCN,
            'MPNNLSTM': MPNNLSTM,
            'your_model_name': your_model_name
        }
        #########################################
- Run your code.
python3 main.py --model_name 'your_model_name'
- The Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-NYC datasets are released under a CC BY-NC 4.0 International License.
- The Epidemic-Korea dataset is released under a CC BY-NC-ND 4.0 International License.
- Our code implementation is released under the MIT License.
- References of Origin-Destination Movements
  - Transportation-Seoul: Korea Public Data Portal and Smart Transit Card Information System
  - Transportation-[Busan,Daegu]: Smart Transit Card Information System
  - Transportation-NYC: NYC Taxi and Limousine Commission (TLC)
  - Epidemic-Korea: Smart Transit Card Information System
  - Epidemic-NYC: NYC Taxi and Limousine Commission (TLC)
- References of Time Series
  - Transportation-Seoul: Korea Public Data Portal (Seoul subway line 1-8 and line 9)
  - Transportation-[Busan,Daegu]: Korea Public Data Portal (Busan and Daegu)
  - Transportation-NYC: NYC Data Portal
  - Epidemic-Korea: Korea Disease Control and Prevention Agency
  - Epidemic-NYC: NYC Health
[note] All source websites provide an official English version except Smart Transit Card Information System and Korea Disease Control and Prevention Agency. Therefore, we describe how to contact or use these two sources.
- Uses of Smart Transit Card Information System: Please contact this email ([email protected]).
- Time Series of Epidemic-Korea: direct download link. If you want to contact the reference, please use this official English link.
We implemented our benchmark code based on the Time Series Library (TSLib).
- DLinear: https://github.com/cure-lab/LTSF-Linear
- NLinear: https://github.com/cure-lab/LTSF-Linear
- SegRNN: https://github.com/lss-1138/SegRNN
- Informer: https://github.com/zhouhaoyi/Informer2020
- Reformer: https://github.com/lucidrains/reformer-pytorch
- PatchTST: https://github.com/yuqinie98/PatchTST
- TimesNet: https://github.com/thuml/TimesNet
- STGCN: https://github.com/hazdzz/STGCN
- MPNNLSTM: https://github.com/geopanag/pandemic_tgnn
@inproceedings{na2025mobility,
  title={Mobility Networked Time Series Benchmark Datasets},
  author={Na, Jihye and Nam, Youngeun and Yoon, Susik and Song, Hwanjun and Lee, Byung Suk and Lee, Jae-Gil},
  booktitle={ICWSM},
  year={2025}
}
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2023R1A2C2003690).