Merge pull request #3 from boomb0om/dev
First release
Showing 42 changed files with 2,096 additions and 3 deletions.

`README.md`:

@@ -1,2 +1,102 @@

# text2image-benchmark

Benchmark for generative image models

![](assets/logo.png)

This project aims to unify the evaluation of generative text-to-image models and to make it quick and easy to calculate the most popular metrics.

Core features:
- **Unified** metrics and datasets for all models
- **Reproducible** results
- **User-friendly** interface for the most popular metrics: FID, CLIP score, IS

## Table of Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Getting started](#getting-started)
- [Project Structure](#project-structure)
- [Examples](#examples)
- [Documentation](#documentation)
- [Contribution](#contribution)
- [Contacts](#contacts)
- [Citing](#citing)
- [Acknowledgments](#acknowledgments)

## Introduction

Generative text-to-image models have become popular and widely used tools.
Many papers on text-to-image generation introduce new, more capable models, yet there is still no uniform way to measure their quality.
To address this, we provide implementations of metrics for comparing the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text-to-image models.
We provide the MS-COCO validation subset and precalculated metrics for it.
We also provide the 30,000 captions that need to be used to generate images for MS-COCO FID-30K.
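
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the reference and generated image sets (lower is better):

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the means and covariances of the two feature sets.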

You can easily contribute your model to the benchmark and make your FID results reproducible! See the [contribution](#contribution) section for details.

## Installation

```bash
pip install git+https://github.com/boomb0om/text2image-benchmark
```

## Getting started

Calculate FID for two sets of images:

```python
from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)
```
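
The package also exports `calculate_clip_score` (see `T2IBenchmark/__init__.py` below). Its exact signature is not shown in this diff, so for intuition here is a minimal, self-contained sketch of what a CLIP score measures, using Hugging Face `transformers` rather than this package's implementation; the image path and prompt are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# A CLIP score averages image-text similarity in CLIP embedding space
# over an evaluation set; here we score a single pair.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("assets/images/cats/cat.jpg")  # hypothetical path
prompt = "a photo of a cat"

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# image_embeds and text_embeds are L2-normalized projections, so their
# dot product is the cosine similarity.
similarity = (outputs.image_embeds * outputs.text_embeds).sum(dim=-1).item()
print(similarity)
```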

## Project Structure

- `T2IBenchmark/`
  - `datasets/` - Datasets that can be used for evaluation
    - `coco2014/` - MS-COCO 2014 validation subset
  - `feature_extractors/` - Implementations of the neural networks used to extract features from images
  - `metrics/` - Implementations of metrics
  - `utils/` - Utilities
- `docs/` - Documentation
- `examples/` - Usage examples
- `experiments/` - Experiments
- `assets/` - Assets

## Examples

## Documentation

## Contribution

## Contacts

If you have any questions, please email `[email protected]`.

## Citing

If you use this repository in your research, please consider citing it using the following BibTeX entry:

```
@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}
```

## Acknowledgments

Thanks to:

- [clean-fid](https://github.com/GaParmar/clean-fid/) - Explanation of how various parameters influence the FID calculation.
- [pytorch-fid](https://github.com/mseitzer/pytorch-fid) - Port of the official Fréchet Inception Distance implementation to PyTorch.
`T2IBenchmark/__init__.py`:

@@ -0,0 +1,3 @@

```python
from .pipelines import calculate_fid, calculate_clip_score
from .model_wrapper import T2IModelWrapper, ModelWrapperDataloader
from .metrics import FIDStats
```
`T2IBenchmark/datasets/__init__.py`:

@@ -0,0 +1 @@

```python
from .coco2014 import COCOImageDataset
```
`T2IBenchmark/datasets/coco2014/__init__.py`:

@@ -0,0 +1 @@

```python
from .dataset import COCOImageDataset
```
`T2IBenchmark/datasets/coco2014/dataset.py`:

@@ -0,0 +1,20 @@

```python
from typing import Any, Callable, Optional

from datasets import load_dataset
from PIL import Image

from T2IBenchmark.loaders import ImageDataset


class COCOImageDataset(ImageDataset):
    """MS-COCO validation images, loaded from the Hugging Face Hub."""

    def __init__(self, preprocess_fn: Optional[Callable[[Image.Image], Any]] = None):
        super().__init__(paths=[], preprocess_fn=preprocess_fn)
        # The 'test' split of this Hub dataset holds the validation images.
        self.ds = load_dataset("stasstaf/MS-COCO-validation")['test']

    def __getitem__(self, idx: int) -> Any:
        image = self.ds[idx]['image']
        # Guard against the default preprocess_fn=None, in case the base
        # class stores it as-is; the original called it unconditionally.
        if self.preprocess_fn is None:
            return image
        return self.preprocess_fn(image)

    def __len__(self) -> int:
        return len(self.ds)
```
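
A minimal usage sketch (the preprocessing lambda is illustrative; the first call downloads the `stasstaf/MS-COCO-validation` dataset from the Hugging Face Hub):

```python
import numpy as np

from T2IBenchmark.datasets import COCOImageDataset

# Convert each PIL image to a numpy array; any callable would do.
dataset = COCOImageDataset(preprocess_fn=lambda img: np.array(img))
print(len(dataset))      # number of validation images
print(dataset[0].shape)  # e.g. (H, W, 3) for an RGB image
```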
`T2IBenchmark/feature_extractors/__init__.py`:

@@ -0,0 +1,2 @@

```python
from .base_feature_extractor import BaseFeatureExtractor
from .inceptionV3_feature_extractor import InceptionV3FE
```
`T2IBenchmark/feature_extractors/base_feature_extractor.py`:

@@ -0,0 +1,52 @@

```python
from abc import ABC, abstractmethod
from typing import Callable

import numpy as np
import torch
from PIL import Image


class BaseFeatureExtractor(ABC):
    """
    A base class for feature extraction methods.

    This class serves as an interface for feature extraction techniques
    and should be subclassed for specific implementations, such as
    InceptionV3FE.
    """

    @abstractmethod
    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        """
        Get the preprocessing function for the input images.

        This function should be implemented by the subclass and should
        define the specific preprocessing steps needed for the feature
        extractor.

        Returns
        -------
        Callable[[Image.Image], np.ndarray]
            The preprocessing function that takes an input PIL.Image.Image
            and returns a preprocessed numpy array.
        """
        pass

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Perform the forward pass for the feature extractor.

        This function should be implemented by the subclass and should
        define the forward pass logic for the feature extractor.

        Parameters
        ----------
        x : torch.Tensor
            The input tensor to process.

        Returns
        -------
        torch.Tensor
            The output tensor with the extracted features.
        """
        pass
```
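
To illustrate the contract, here is a hypothetical toy subclass. It is not the repository's `InceptionV3FE`; the ResNet-18 backbone and the 224×224 resize are assumptions made for the example:

```python
from typing import Callable

import numpy as np
import torch
import torchvision.models as models
from PIL import Image

from T2IBenchmark.feature_extractors import BaseFeatureExtractor


class ToyResNetFE(BaseFeatureExtractor):
    """Hypothetical extractor: ResNet-18 global-pooled features."""

    def __init__(self):
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the classification head; keep everything up to global pooling.
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        def preprocess(img: Image.Image) -> np.ndarray:
            img = img.convert("RGB").resize((224, 224))
            # HWC uint8 -> CHW float32 in [0, 1]
            return np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0
        return preprocess

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, 224, 224) -> (N, 512) feature vectors
        return self.backbone(x).flatten(1)
```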