Merge pull request #3 from boomb0om/dev
First release
boomb0om authored Sep 8, 2023
2 parents 77e15a3 + ec57547 commit 2a15b98
Showing 42 changed files with 2,096 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -157,4 +157,4 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/
104 changes: 102 additions & 2 deletions README.md
@@ -1,2 +1,102 @@
# text2image-benchmark
Benchmark for generative image models
![](assets/logo.png)

This project aims to unify the evaluation of generative text-to-image models and to make it quick and easy to calculate the most popular metrics.

Core features:
- **Unified** metrics and datasets for all models
- **Reproducible** results
- **User-friendly** interface for the most popular metrics: FID, CLIP score, IS

## Table of Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Getting started](#getting-started)
- [Project Structure](#project-structure)
- [Examples](#examples)
- [Documentation](#documentation)
- [Contribution](#contribution)
- [Contacts](#contacts)
- [Citing](#citing)
- [Acknowledgments](#acknowledgments)

## Introduction

Generative text-to-image models have become a popular and widely used tool.
Many papers on text-to-image generation introduce new, more advanced models, yet there is still no uniform way to measure their quality.
To address this issue, we provide implementations of metrics for comparing the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text-to-image models.
We provide the MS-COCO validation subset and precalculated metrics for it.
We also fixed a set of 30,000 captions that must be used to generate images for MS-COCO FID-30K.

You can easily contribute your model to the benchmark and make its FID results reproducible! See the [Contribution](#contribution) section for details.

## Installation

```bash
pip install git+https://github.com/boomb0om/text2image-benchmark
```

## Getting started

Calculate FID for two sets of images:

```python
from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)
```
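
The first element of the returned tuple is the FID value; the second is discarded above. As a further hedged sketch, one way to score generated images against the bundled MS-COCO validation subset, assuming `calculate_fid` also accepts an `ImageDataset` such as `COCOImageDataset` in place of a directory path (this diff only shows the path-based call, so that assumption is unverified):

```python
import numpy as np
from T2IBenchmark import calculate_fid
from T2IBenchmark.datasets import COCOImageDataset

# Assumption: calculate_fid accepts an ImageDataset in place of a path;
# only the two-directory call is shown in this diff.
fid, _ = calculate_fid(
    'path/to/generated_images/',
    COCOImageDataset(preprocess_fn=lambda img: np.asarray(img.convert("RGB"))),
)
print(fid)
```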

## Project Structure

- `T2IBenchmark/`
- `datasets/` - Datasets that can be used for evaluation
- `coco2014/` - MS-COCO 2014 validation subset
- `feature_extractors/` - Implementation of different neural nets used to extract features from images
- `metrics/` - Implementation of metrics
- `utils/` - Utility functions
- `docs/` - Documentation
- `examples/` - Usage examples
- `experiments/` - Experiments
- `assets/` - Assets

## Examples



## Documentation



## Contribution



## Contacts

If you have any questions, please email `[email protected]`.

## Citing

If you use this repository in your research, consider citing it using the following BibTeX entry:

```bibtex
@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}
```

## Acknowledgments

Thanks to:

- [clean-fid](https://github.com/GaParmar/clean-fid/) - An explanation of how various parameters influence FID computation.
- [pytorch-fid](https://github.com/mseitzer/pytorch-fid) - A PyTorch port of the official implementation of Fréchet Inception Distance.
3 changes: 3 additions & 0 deletions T2IBenchmark/__init__.py
@@ -0,0 +1,3 @@
from .pipelines import calculate_fid, calculate_clip_score
from .model_wrapper import T2IModelWrapper, ModelWrapperDataloader
from .metrics import FIDStats
1 change: 1 addition & 0 deletions T2IBenchmark/datasets/__init__.py
@@ -0,0 +1 @@
from .coco2014 import COCOImageDataset
1 change: 1 addition & 0 deletions T2IBenchmark/datasets/coco2014/__init__.py
@@ -0,0 +1 @@
from .dataset import COCOImageDataset
20 changes: 20 additions & 0 deletions T2IBenchmark/datasets/coco2014/dataset.py
@@ -0,0 +1,20 @@
from datasets import load_dataset

from T2IBenchmark.loaders import ImageDataset
from typing import Optional, Callable, Any
from PIL import Image


class COCOImageDataset(ImageDataset):

    def __init__(self, preprocess_fn: Optional[Callable[[Image.Image], Any]] = None):
        super().__init__(paths=[], preprocess_fn=preprocess_fn)
        # Hugging Face dataset holding the MS-COCO validation images
        self.ds = load_dataset("stasstaf/MS-COCO-validation")['test']

    def __getitem__(self, idx: int) -> Any:
        image = self.ds[idx]['image']
        # preprocess_fn is applied unconditionally, so it must be set
        preproc = self.preprocess_fn(image)
        return preproc

    def __len__(self) -> int:
        return len(self.ds)
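
A minimal usage sketch for `COCOImageDataset` as defined above; the lambda below is an arbitrary preprocessing choice, not a library default, since `__getitem__` applies `preprocess_fn` unconditionally:

```python
import numpy as np
from T2IBenchmark.datasets import COCOImageDataset

# An arbitrary example preprocess_fn; __getitem__ calls it on every image,
# so passing one is required in this sketch.
dataset = COCOImageDataset(preprocess_fn=lambda img: np.asarray(img.convert("RGB")))

print(len(dataset))  # number of images in the MS-COCO validation split
first = dataset[0]   # one preprocessed image as a numpy array
print(first.shape)
```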
2 changes: 2 additions & 0 deletions T2IBenchmark/feature_extractors/__init__.py
@@ -0,0 +1,2 @@
from .base_feature_extractor import BaseFeatureExtractor
from .inceptionV3_feature_extractor import InceptionV3FE
52 changes: 52 additions & 0 deletions T2IBenchmark/feature_extractors/base_feature_extractor.py
@@ -0,0 +1,52 @@
from typing import Callable
from abc import ABC, abstractmethod

import numpy as np
from PIL import Image
import torch


class BaseFeatureExtractor(ABC):
    """
    A base class for feature extraction methods.

    This class serves as an interface for feature extraction techniques
    and should be subclassed for specific implementations, such as InceptionV3FE.
    """

    @abstractmethod
    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        """
        Get the preprocessing function for the input images.

        This function should be implemented by the subclass and should
        define the specific preprocessing steps needed for the feature
        extractor.

        Returns
        -------
        Callable[[Image.Image], np.ndarray]
            The preprocessing function that takes an input PIL.Image.Image and
            returns a preprocessed numpy array.
        """
        pass

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Perform the forward pass for the feature extractor.

        This function should be implemented by the subclass and should
        define the forward pass logic for the feature extractor.

        Parameters
        ----------
        x : torch.Tensor
            The input tensor to process.

        Returns
        -------
        torch.Tensor
            The output tensor with the extracted features.
        """
        pass
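
To illustrate the contract, a toy subclass sketch follows; `IdentityFE` is hypothetical and not part of the library, which ships `InceptionV3FE` as the real implementation:

```python
from typing import Callable

import numpy as np
from PIL import Image
import torch

from T2IBenchmark.feature_extractors import BaseFeatureExtractor


class IdentityFE(BaseFeatureExtractor):
    """Hypothetical extractor: resizes to 64x64 and uses raw pixels as features."""

    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        def preprocess(image: Image.Image) -> np.ndarray:
            image = image.convert("RGB").resize((64, 64))
            return np.asarray(image, dtype=np.float32) / 255.0
        return preprocess

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten each image in the batch into a single feature vector
        return x.reshape(x.shape[0], -1)
```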