JvThunder/BLIP (forked from salesforce/BLIP)

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Our aim is to adapt BLIP to the image retrieval task, specifically in the fashion domain. We added support for datasets such as Fashion200k, FashionIQ, and CIRR. We also propose a new architecture, shown in the diagram below.

[Architecture diagram]
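For context, these datasets target composed image retrieval: the query is a reference image plus a modification caption, and the model ranks gallery images against the fused query. The sketch below illustrates only the final scoring step; the tensor names, dimensions, and random embeddings are placeholders for illustration, not this repo's actual model.

import torch
import torch.nn.functional as F

# Hypothetical tensors standing in for model outputs; the real encoders
# and fusion module differ -- this only illustrates the ranking step.
num_candidates, dim = 1000, 256

# Fused query embedding: reference image + modification text.
query_embed = torch.randn(1, dim)

# Embeddings of all candidate target images in the gallery.
gallery_embeds = torch.randn(num_candidates, dim)

# Rank candidates by cosine similarity to the fused query.
scores = F.cosine_similarity(query_embed, gallery_embeds, dim=-1)
top10 = scores.topk(10).indices  # indices of the 10 best-matching images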

Data preparation

project_base_path

└─── cirr_dataset
     ├─── train
     │    ├─── 0
     │    │    | train-10108-0-img0.png
     │    │    | train-10108-0-img1.png
     │    │    | train-10108-1-img0.png
     │    │    | ...
     │    ├─── 1
     │    │    | train-10056-0-img0.png
     │    │    | train-10056-0-img1.png
     │    │    | train-10056-1-img0.png
     │    │    | ...
     │    └─── ...
     ├─── dev
     │    | dev-0-0-img0.png
     │    | dev-0-0-img1.png
     │    | dev-0-1-img0.png
     │    | ...
     ├─── test1
     │    | test1-0-0-img0.png
     │    | test1-0-0-img1.png
     │    | test1-0-1-img0.png
     │    | ...
     └─── cirr
          ├─── captions
          │    | cap.rc2.test1.json
          │    | cap.rc2.train.json
          │    | cap.rc2.val.json
          └─── image_splits
               | split.rc2.test1.json
               | split.rc2.train.json
               | split.rc2.val.json
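Once the files are in place, a quick sanity check along these lines confirms the layout matches the tree above (a minimal sketch: project_base_path is a placeholder for your own root, and the caption files are assumed to be JSON arrays).

import json
from pathlib import Path

base = Path("project_base_path") / "cirr_dataset"  # placeholder root

# Count the images in each split folder shown in the tree above.
for split_dir in ["train", "dev", "test1"]:
    n_images = len(list((base / split_dir).rglob("*.png")))
    print(f"{split_dir}: {n_images} images")

# Assumption: each cap.rc2.*.json file is a JSON array of annotation records.
for name in ["train", "val", "test1"]:
    caps = json.loads((base / "cirr" / "captions" / f"cap.rc2.{name}.json").read_text())
    print(f"cap.rc2.{name}.json: {len(caps)} entries")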


Preparing CIRR

  1. Clone the CIRR dataset repository: git clone -b cirr_dataset https://github.com/Cuberick-Orion/CIRR.git cirr
  2. Download the raw images from https://lil.nlp.cornell.edu/resources/NLVR2/ and place them under the train, dev, and test1 folders shown above.

Preparing Fashion200k

The instructions to download Fashion200k can be found here.

For the generated test_queries.txt, we use the one produced by the authors of the TIRG paper, which can be found here.

The fashion-200k directory should have the following structure and files:

/fashion-200k/labels/*.txt
/fashion-200k/women/<category>/<caption>/<id>/*.jpeg
/fashion-200k/test_queries.txt
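A similar check works for Fashion200k (again a minimal sketch: the root path is a placeholder, and the label files are only located, not parsed).

from pathlib import Path

root = Path("fashion-200k")  # placeholder: point at your dataset root

# Match the expected layout: labels/*.txt and women/<category>/<caption>/<id>/*.jpeg
label_files = sorted(root.glob("labels/*.txt"))
images = list(root.glob("women/*/*/*/*.jpeg"))
print(f"{len(label_files)} label files, {len(images)} images")

# test_queries.txt comes from the TIRG authors (see above).
assert (root / "test_queries.txt").exists(), "missing test_queries.txt"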
