$ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
$ pip install ftfy regex tqdm torchinfo
$ pip install git+https://github.com/openai/CLIP.git
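To confirm the environment is set up correctly, the following quick check (independent of this repo) loads a CLIP backbone; "ViT-B/32" is just an example checkpoint:

# Sanity-check the install.
import torch
import clip

print(torch.__version__)          # expect 1.11.0
print(torch.cuda.is_available())  # expect True with cudatoolkit 11.3 and a GPU
print(clip.available_models())    # lists the downloadable CLIP checkpoints

# Loading a small backbone confirms the CLIP install works end to end.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)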
Dataset:
Dataset Link
*Note: you only need the left color images of the object data set (12 GB) and the training labels of the object data set (5 MB).
# Organize the files into the following structure:
kitti_dataset
├── testing
│   └── image_2   # only the testing image files
└── training
    ├── image_2   # only the training image files
    └── label_2   # only the label .txt files
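If you want to sanity-check the layout before training, here is a minimal sketch; the root path is an assumption about your setup, so adjust it as needed:

# Count the files in each expected folder.
from pathlib import Path

root = Path("kitti_dataset")  # adjust to wherever you unpacked the dataset

for split, subdirs in [("testing", ["image_2"]), ("training", ["image_2", "label_2"])]:
    for sub in subdirs:
        folder = root / split / sub
        count = len(list(folder.glob("*"))) if folder.is_dir() else 0
        print(f"{folder}: {count} files")
# The full KITTI object set has 7481 training and 7518 testing frames.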
# Before running, edit the dataset path in the script (Line 4: kitti_label_file_path) to match your setup.
python text_generation.py
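text_generation.py presumably converts the KITTI label files into text prompts for CLIP; its internals are not shown here. For reference, each line of a label_2 file starts with the object class, followed by truncation, occlusion, alpha, the 2D box, 3D dimensions, location, and rotation. An illustrative (not the repo's actual) parsing sketch:

# Illustrative only: build a naive text prompt per image from its label file.
from pathlib import Path

label_dir = Path("kitti_dataset/training/label_2")  # adjust to your layout

for label_file in sorted(label_dir.glob("*.txt"))[:3]:
    # The first whitespace-separated field of each line is the object class.
    classes = {line.split()[0] for line in label_file.read_text().splitlines()}
    classes.discard("DontCare")  # KITTI's ignore-region placeholder
    prompt = "a photo of " + " and ".join(sorted(classes)).lower()
    print(label_file.stem, "->", prompt)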
# Replace "../KITTI_DATASET_ROOT/training/image_2/" with the path to your training image_2 folder.
# Full fine-tuning of the whole model
python train.py --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
# Fine-tuning with an adapter
python train.py --adapter --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
# Fine-tuning with VPT (set --vpt_version to 1 or 2)
python train.py --prompt --vpt_version 1 --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
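For context: --adapter follows the AIM-style recipe of freezing the CLIP backbone and training only small bottleneck modules, while --prompt (VPT) instead prepends learnable prompt tokens to the ViT token sequence; the two --vpt_version values presumably correspond to the paper's shallow (input-only) and deep (per-layer) variants, though the exact mapping is this repo's choice. A minimal adapter sketch (the repo's module may differ):

# Minimal bottleneck adapter sketch; the repo's implementation may differ.
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, nonlinearity, up-project, added back residually."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # compress features
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # restore dimensionality
        nn.init.zeros_(self.up.weight)          # start as a near-identity map
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the backbone, then train only the adapter weights.
# for p in clip_model.parameters():
#     p.requires_grad = False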
# Replace "../KITTI_DATASET_ROOT/training/image_2/" with the path to your training image_2 folder.
# Full fine-tuning of the whole model
python test.py --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
# Fine-tuning with an adapter
python test.py --adapter --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
# Fine-tuning with VPT (set --vpt_version to 1 or 2)
python test.py --prompt --vpt_version 1 --kitti_image_file_path "../KITTI_DATASET_ROOT/training/image_2/"
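test.py presumably scores each image by comparing its CLIP image embedding with the generated class text embeddings; below is a generic sketch of that CLIP-style evaluation, with hypothetical prompts and an example file name:

# Generic CLIP classification sketch; test.py's exact pipeline may differ.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical KITTI-style prompts; the repo's generated texts may differ.
prompts = ["a photo of a car", "a photo of a pedestrian", "a photo of a cyclist"]
text = clip.tokenize(prompts).to(device)
image = preprocess(Image.open("000000.png")).unsqueeze(0).to(device)  # example file

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Cosine similarity: L2-normalize, then scaled dot product and softmax.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(prompts, probs.squeeze(0).tolist())))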
This repo builds on CLIP, AIM, and VPT. Thanks for their wonderful work.