* Corresponding author
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; Aerospace Information Research Institute, Chinese Academy of Sciences; School of Geographic Sciences, Hunan Normal University; Department of Computer Science, City University of Hong Kong
UniRS is a vision language model (VLM) that unifies multi-temporal remote sensing tasks. The model parses three types of remote sensing inputs (single images, dual-temporal image pairs, and videos) and generates text responses based on user instructions. We adopt a modular design tailored to each task, devise an inference mechanism that fully exploits the prior knowledge of the base model, VILA-1.5, and perform joint fine-tuning on large-scale datasets, ultimately obtaining a large remote sensing vision language model with strong generalization across multi-temporal remote sensing tasks.
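The three supported input types can be distinguished purely by how many frames the user supplies. The sketch below is illustrative only: the function and task names are hypothetical and are not the actual UniRS API; it merely shows the routing logic the description above implies.

```python
# Hypothetical dispatcher: route a remote sensing input to one of the three
# tasks UniRS supports, based on the number of frames provided.
# These names are illustrative, not the real UniRS interface.

def parse_input(frames):
    """Classify a remote sensing input by its number of frames."""
    if len(frames) == 1:
        return "single-image"      # e.g. VQA on a single image
    if len(frames) == 2:
        return "change-detection"  # dual-temporal image pair
    return "video"                 # multi-frame clip

print(parse_input(["t0"]))        # single-image
print(parse_input(["t0", "t1"]))  # change-detection
```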
- [2025/1] The training code is released!
- [2024/12] Paper is on arXiv!
```bash
./environment_setup.sh
```
or follow the instructions below in order.
```bash
conda create -n unirs python=3.10 -y
conda activate unirs
pip install --upgrade pip  # enable PEP 660 support

# This step is optional if you prefer to use the system's built-in nvcc.
conda install -c nvidia cuda-toolkit -y

# Install a pre-built flash-attention wheel (CUDA 11.8, PyTorch 2.0, Python 3.10).
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

pip install -e .
pip install -e ".[train]"

pip install git+https://github.com/huggingface/[email protected]

# Patch the installed transformers package with this repo's modified files.
site_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* $site_pkg_path/transformers/
```
We mix three datasets for the joint training of UniRS: GeoChat-Instruct, LEVIR-CC, and ERA.
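Joint training over heterogeneous task datasets typically means interleaving their samples into one shuffled stream. A minimal sketch of that idea, assuming placeholder sample lists (the dataset names come from this README, but the sizes and helper below are hypothetical, not the actual training code):

```python
import random

# Placeholder samples standing in for the three datasets named above;
# real sizes and record formats differ.
DATASETS = {
    "GeoChat-Instruct": ["geochat_%d" % i for i in range(6)],
    "LEVIR-CC": ["levircc_%d" % i for i in range(4)],
    "ERA": ["era_%d" % i for i in range(2)],
}

def mix_datasets(datasets, seed=0):
    """Tag each sample with its source task, pool everything,
    and shuffle deterministically for one joint training epoch."""
    rng = random.Random(seed)
    mixed = [(name, s) for name, samples in datasets.items() for s in samples]
    rng.shuffle(mixed)
    return mixed

mixed = mix_datasets(DATASETS)
print(len(mixed))  # 12 samples drawn from all three tasks
```

A fixed seed keeps the sample order reproducible across runs, which matters when resuming joint fine-tuning from a checkpoint.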
- The code is released under the Apache 2.0 license as found in the LICENSE file.
- The pretrained weights are released under the CC-BY-NC-SA-4.0 license.
- The service is a research preview intended for non-commercial use only, and is subject to the following licenses and terms:
  - Model License of LLaMA. For the LLaMA3-VILA checkpoints' terms of use, please refer to the LLaMA3 License for additional details.
  - Dataset licenses for each dataset used during training.
```bibtex
@misc{li2024unirsunifyingmultitemporalremote,
      title={UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models},
      author={Yujie Li and Wenjia Xu and Guangzuo Li and Zijian Yu and Zhiwei Wei and Jiuniu Wang and Mugen Peng},
      year={2024},
      eprint={2412.20742},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.20742},
}
```