JvThunder/BLIP (forked from salesforce/BLIP)

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Our aim is to adapt BLIP to the image retrieval task, specifically in the fashion domain. We added support for datasets such as Fashion200k, FashionIQ, and CIRR. We also propose a new architecture, shown in the diagram below.

[Architecture diagram]
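For context, these datasets target composed image retrieval: the query is a reference image plus a modification caption, and the model ranks gallery images against the fused query. The sketch below illustrates only the final scoring step; the tensor names, dimensions, and random embeddings are placeholders for illustration, not this repo's actual model.

import torch
import torch.nn.functional as F

# Hypothetical tensors standing in for model outputs; the real encoders
# and fusion module differ -- this only illustrates the ranking step.
num_candidates, dim = 1000, 256

# Fused query embedding: reference image + modification text.
query_embed = torch.randn(1, dim)

# Embeddings of all candidate target images in the gallery.
gallery_embeds = torch.randn(num_candidates, dim)

# Rank candidates by cosine similarity to the fused query.
scores = F.cosine_similarity(query_embed, gallery_embeds, dim=-1)
top10 = scores.topk(10).indices  # indices of the 10 best-matching images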

Data preparation

project_base_path

└─── cirr_dataset
     ├─── train
     │    ├─── 0
     │    │    | train-10108-0-img0.png
     │    │    | train-10108-0-img1.png
     │    │    | train-10108-1-img0.png
     │    │    | ...
     │    ├─── 1
     │    │    | train-10056-0-img0.png
     │    │    | train-10056-0-img1.png
     │    │    | train-10056-1-img0.png
     │    │    | ...
     │    └─── ...
     ├─── dev
     │    | dev-0-0-img0.png
     │    | dev-0-0-img1.png
     │    | dev-0-1-img0.png
     │    | ...
     ├─── test1
     │    | test1-0-0-img0.png
     │    | test1-0-0-img1.png
     │    | test1-0-1-img0.png
     │    | ...
     └─── cirr
          ├─── captions
          │    | cap.rc2.test1.json
          │    | cap.rc2.train.json
          │    | cap.rc2.val.json
          └─── image_splits
               | split.rc2.test1.json
               | split.rc2.train.json
               | split.rc2.val.json
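Once the files are in place, a quick sanity check along these lines confirms the layout matches the tree above (a minimal sketch: project_base_path is a placeholder for your own root, and the caption files are assumed to be JSON arrays).

import json
from pathlib import Path

base = Path("project_base_path") / "cirr_dataset"  # placeholder root

# Count the images in each split folder shown in the tree above.
for split_dir in ["train", "dev", "test1"]:
    n_images = len(list((base / split_dir).rglob("*.png")))
    print(f"{split_dir}: {n_images} images")

# Assumption: each cap.rc2.*.json file is a JSON array of annotation records.
for name in ["train", "val", "test1"]:
    caps = json.loads((base / "cirr" / "captions" / f"cap.rc2.{name}.json").read_text())
    print(f"cap.rc2.{name}.json: {len(caps)} entries")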


Preparing CIRR

  1. Clone the CIRR dataset repository: git clone -b cirr_dataset https://github.com/Cuberick-Orion/CIRR.git cirr
  2. Download the raw images from https://lil.nlp.cornell.edu/resources/NLVR2/ and place them under the train, dev, and test1 folders shown above.

Preparing Fashion200k

The instructions to download Fashion200k can be found here.

For the generated test_queries.txt, we use the one produced by the authors of the TIRG paper, which can be found here.

The fashion-200k directory should have the following structure and files:

/fashion-200k/labels/*.txt
/fashion-200k/women/<category>/<caption>/<id>/*.jpeg
/fashion-200k/test_queries.txt
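A similar check works for Fashion200k (again a minimal sketch: the root path is a placeholder, and the label files are only located, not parsed).

from pathlib import Path

root = Path("fashion-200k")  # placeholder: point at your dataset root

# Match the expected layout: labels/*.txt and women/<category>/<caption>/<id>/*.jpeg
label_files = sorted(root.glob("labels/*.txt"))
images = list(root.glob("women/*/*/*/*.jpeg"))
print(f"{len(label_files)} label files, {len(images)} images")

# test_queries.txt comes from the TIRG authors (see above).
assert (root / "test_queries.txt").exists(), "missing test_queries.txt"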
