Welcome to the repository of our state-of-the-art image captioning model. We combine the strengths of BLIP-2 (Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models) with LoRA (Low-Rank Adaptation of Large Language Models) to create an effective and precise image captioning tool. Our dataset, rich in image descriptions, was automatically labeled using a combination of multi-modal models.
Check out our GitHub Page! https://diegobonilla98.github.io/PixLore/
The main objective of this project is to generate detailed and accurate captions for a wide range of images. By combining the power of BLIP-2 and LoRA, we aim to make a significant leap in the image captioning domain.
Our dataset consists of 11,000 examples that have been automatically labeled using an assortment of multi-modal models.
- Images: High-quality images randomly sampled from the COCO dataset.
- Captions: Rich and detailed descriptions automatically generated for each image.
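As a rough illustration of the labeling step, the sketch below captions a single COCO image with a BLIP-2 checkpoint from the Hugging Face Hub. The checkpoint name, image path, and decoding settings are assumptions for illustration, not necessarily the exact models used to build the dataset.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Hypothetical checkpoint and image path, used only for illustration.
checkpoint = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(
    checkpoint, torch_dtype=torch.float16
).to("cuda")

# Load one sampled COCO image and prepare it for the model.
image = Image.open("path/to/coco_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)

# Generate a caption for the image; decoding settings are illustrative.
generated_ids = model.generate(**inputs, max_new_tokens=60)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```

Running a loop like this over the sampled images is one way such an automatically labeled set of image-caption pairs could be produced.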
We employ the BLIP-2 architecture, known for effectively combining the strengths of frozen image encoders and large language models. To further enhance its performance, we fine-tune the model with LoRA, a technique that adapts large language models through low-rank weight updates.
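A minimal sketch of this setup, assuming the Hugging Face transformers and peft libraries: the base checkpoint, the LoRA rank, alpha, and target modules, and the single-pair training step below are illustrative assumptions, not the exact configuration used in this repository.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base checkpoint is an assumption; any BLIP-2 variant from the Hub works similarly.
checkpoint = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)

# Inject low-rank adapters into the attention projections of the language model;
# rank, alpha, dropout, and target modules are illustrative choices.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients

# One illustrative training step on a single (image, caption) pair.
image = Image.open("path/to/coco_image.jpg").convert("RGB")   # hypothetical path
caption = "A detailed description of the image."              # placeholder target
inputs = processor(images=image, text=caption, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
```

Because only the low-rank adapter weights receive gradients, the frozen image encoder and language model stay untouched and the memory footprint of fine-tuning drops considerably.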