
UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models

* Corresponding author

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; Aerospace Information Research Institute, Chinese Academy of Sciences; School of Geographic Sciences, Hunan Normal University; Department of Computer Science, City University of Hong Kong

Website paper Python 3.10+


Overview

UniRS is a vision language model (VLM) for multi-temporal remote sensing. It accepts three types of remote sensing input (single images, dual-temporal image pairs, and videos) and returns text responses to user instructions. We adopt a modular design that adapts to each task, design an inference mechanism that fully utilizes the prior knowledge of the base model, VILA-1.5, and perform joint fine-tuning on large-scale datasets. The result is a large remote sensing vision language model that generalizes well across multi-temporal remote sensing tasks.
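As a rough illustration of the three input types, the sketch below routes an input to a type label by its frame count. The function name `classify_input` and the label strings are hypothetical and are not taken from the UniRS codebase. Treating exactly two frames as a dual-temporal pair is an assumption made for this example only.

```python
from typing import Sequence


def classify_input(frames: Sequence[str]) -> str:
    """Map a list of frame paths to an input-type label (hypothetical labels)."""
    n = len(frames)
    if n == 0:
        raise ValueError("at least one frame is required")
    if n == 1:
        return "single_image"
    if n == 2:
        # Assumption: two frames are a before/after pair, not a short video.
        return "dual_temporal_pair"
    return "video"
```

A caller could use the returned label to pick a task-specific prompt template or preprocessing path.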


Contents

💡 News

  • [2025/1] The training code is released!
  • [2024/12] The paper is on arXiv!

Installation

./environment_setup.sh

or follow the instructions below in order.

conda create -n unirs python=3.10 -y
conda activate unirs

pip install --upgrade pip  # enable PEP 660 support
# optional: skip this step if you prefer the system's built-in nvcc.
conda install -c nvidia cuda-toolkit -y
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install -e .
pip install -e ".[train]"

pip install git+https://github.com/huggingface/transformers@<version>  # version tag lost in extraction; see environment_setup.sh
site_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* "$site_pkg_path/transformers/"
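After installation, a quick sanity check can confirm that the key packages are importable. The snippet below is a minimal sketch and is not part of the repository. The helper name `check_environment` is hypothetical, and the default package list (including `llava` as the name installed by `pip install -e .`) is an assumption based on the commands above.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "flash_attn", "transformers", "llava")):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    # find_spec returns None when a top-level package cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    # The README states Python 3.10+ as the supported version.
    status["python>=3.10"] = sys.version_info >= (3, 10)
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `unirs` environment. Any entry reported as `MISSING` points to an installation step that did not complete.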

Dataset

We mixed three datasets for joint training of UniRS, namely GeoChat-Instruct, LEVIR-CC and ERA.
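One simple way to mix several instruction datasets is to pool their samples and shuffle them with a fixed seed. The sketch below shows that idea only. The sampling strategy actually used for UniRS is not stated here, and the function name `mix_datasets` and the sample format (a list of dicts per dataset) are assumptions.

```python
import random
from typing import Dict, List, Tuple


def mix_datasets(sources: Dict[str, List[dict]], seed: int = 0) -> List[Tuple[str, dict]]:
    """Pool samples from all sources, tag each with its source name, and shuffle."""
    mixed = [(name, sample) for name, samples in sources.items() for sample in samples]
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy placeholders, not real dataset records.
    toy = {
        "GeoChat-Instruct": [{"id": 0}, {"id": 1}],
        "LEVIR-CC": [{"id": 0}],
        "ERA": [{"id": 0}],
    }
    for source, sample in mix_datasets(toy):
        print(source, sample)
```

Keeping the source name with each sample makes it possible to select a task-specific pipeline (image, image pair, or video) later in the data loader.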

Training

Evaluations

Checkpoints

🔒 License

  • The code is released under the Apache 2.0 license as found in the LICENSE file.
  • The pretrained weights are released under the CC-BY-NC-SA-4.0 license.
  • The service is a research preview intended for non-commercial use only, and is subject to the following licenses and terms:

Citations

@misc{li2024unirsunifyingmultitemporalremote,
      title={UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models},
      author={Yujie Li and Wenjia Xu and Guangzuo Li and Zijian Yu and Zhiwei Wei and Jiuniu Wang and Mugen Peng},
      year={2024},
      eprint={2412.20742},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.20742},
}

Acknowledgement
