Merge pull request #3 from boomb0om/dev
First release
Showing 42 changed files with 2,096 additions and 3 deletions.

`README.md`:

@@ -1,2 +1,102 @@

# text2image-benchmark

Benchmark for generative image models

![](assets/logo.png)

This project aims to unify the evaluation of generative text-to-image models and to make it quick and easy to calculate the most popular metrics.

Core features:
- **Unified** metrics and datasets for all models
- **Reproducible** results
- **User-friendly** interface for the most popular metrics: FID, CLIP score, IS

## Table of Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Getting started](#getting-started)
- [Project Structure](#project-structure)
- [Examples](#examples)
- [Documentation](#documentation)
- [Contribution](#contribution)
- [Contacts](#contacts)
- [Citing](#citing)
- [Acknowledgments](#acknowledgments)

## Introduction

Generative text-to-image models have become popular and widely used tools.
Many papers on text-to-image generation introduce new, more capable models, yet there is still no uniform way to measure their quality.
To address this, we provide implementations of metrics for comparing the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text-to-image models.
We provide the MS-COCO validation subset and precalculated metrics for it.
We also provide the 30,000 captions that need to be used to generate images for MS-COCO FID-30K.
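
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the reference and generated image sets (lower is better):

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the means and covariances of the two feature sets.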

You can easily contribute your model to the benchmark and make your FID results reproducible! See the [contribution](#contribution) section for details.

## Installation

```bash
pip install git+https://github.com/boomb0om/text2image-benchmark
```

## Getting started

Calculate FID for two sets of images:

```python
from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)
```
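
The package also exports `calculate_clip_score` (see `T2IBenchmark/__init__.py` below). Its exact signature is not shown in this diff, so for intuition here is a minimal, self-contained sketch of what a CLIP score measures, using Hugging Face `transformers` rather than this package's implementation; the image path and prompt are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# A CLIP score averages image-text similarity in CLIP embedding space
# over an evaluation set; here we score a single pair.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("assets/images/cats/cat.jpg")  # hypothetical path
prompt = "a photo of a cat"

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# image_embeds and text_embeds are L2-normalized projections, so their
# dot product is the cosine similarity.
similarity = (outputs.image_embeds * outputs.text_embeds).sum(dim=-1).item()
print(similarity)
```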

## Project Structure

- `T2IBenchmark/`
  - `datasets/` - Datasets that can be used for evaluation
    - `coco2014/` - MS-COCO 2014 validation subset
  - `feature_extractors/` - Implementations of the neural networks used to extract features from images
  - `metrics/` - Implementations of metrics
  - `utils/` - Utilities
- `docs/` - Documentation
- `examples/` - Usage examples
- `experiments/` - Experiments
- `assets/` - Assets

## Examples

## Documentation

## Contribution

## Contacts

If you have any questions, please email `[email protected]`.

## Citing

If you use this repository in your research, please consider citing it using the following BibTeX entry:

```
@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}
```

## Acknowledgments

Thanks to:

- [clean-fid](https://github.com/GaParmar/clean-fid/) - Explanation of how various parameters influence the FID calculation.
- [pytorch-fid](https://github.com/mseitzer/pytorch-fid) - Port of the official Fréchet Inception Distance implementation to PyTorch.
`T2IBenchmark/__init__.py`:

@@ -0,0 +1,3 @@

```python
from .pipelines import calculate_fid, calculate_clip_score
from .model_wrapper import T2IModelWrapper, ModelWrapperDataloader
from .metrics import FIDStats
```
`T2IBenchmark/datasets/__init__.py`:

@@ -0,0 +1 @@

```python
from .coco2014 import COCOImageDataset
```
`T2IBenchmark/datasets/coco2014/__init__.py`:

@@ -0,0 +1 @@

```python
from .dataset import COCOImageDataset
```
`T2IBenchmark/datasets/coco2014/dataset.py`:

@@ -0,0 +1,20 @@

```python
from typing import Any, Callable, Optional

from datasets import load_dataset
from PIL import Image

from T2IBenchmark.loaders import ImageDataset


class COCOImageDataset(ImageDataset):
    """MS-COCO validation images, loaded from the Hugging Face Hub."""

    def __init__(self, preprocess_fn: Optional[Callable[[Image.Image], Any]] = None):
        super().__init__(paths=[], preprocess_fn=preprocess_fn)
        # The 'test' split of this Hub dataset holds the validation images.
        self.ds = load_dataset("stasstaf/MS-COCO-validation")['test']

    def __getitem__(self, idx: int) -> Any:
        image = self.ds[idx]['image']
        # Guard against the default preprocess_fn=None, in case the base
        # class stores it as-is; the original called it unconditionally.
        if self.preprocess_fn is None:
            return image
        return self.preprocess_fn(image)

    def __len__(self) -> int:
        return len(self.ds)
```
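
A minimal usage sketch (the preprocessing lambda is illustrative; the first call downloads the `stasstaf/MS-COCO-validation` dataset from the Hugging Face Hub):

```python
import numpy as np

from T2IBenchmark.datasets import COCOImageDataset

# Convert each PIL image to a numpy array; any callable would do.
dataset = COCOImageDataset(preprocess_fn=lambda img: np.array(img))
print(len(dataset))      # number of validation images
print(dataset[0].shape)  # e.g. (H, W, 3) for an RGB image
```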
`T2IBenchmark/feature_extractors/__init__.py`:

@@ -0,0 +1,2 @@

```python
from .base_feature_extractor import BaseFeatureExtractor
from .inceptionV3_feature_extractor import InceptionV3FE
```
`T2IBenchmark/feature_extractors/base_feature_extractor.py`:

@@ -0,0 +1,52 @@

```python
from abc import ABC, abstractmethod
from typing import Callable

import numpy as np
import torch
from PIL import Image


class BaseFeatureExtractor(ABC):
    """
    A base class for feature extraction methods.

    This class serves as an interface for feature extraction techniques
    and should be subclassed for specific implementations, such as
    InceptionV3FE.
    """

    @abstractmethod
    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        """
        Get the preprocessing function for the input images.

        This function should be implemented by the subclass and should
        define the specific preprocessing steps needed for the feature
        extractor.

        Returns
        -------
        Callable[[Image.Image], np.ndarray]
            The preprocessing function that takes an input PIL.Image.Image
            and returns a preprocessed numpy array.
        """
        pass

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Perform the forward pass for the feature extractor.

        This function should be implemented by the subclass and should
        define the forward pass logic for the feature extractor.

        Parameters
        ----------
        x : torch.Tensor
            The input tensor to process.

        Returns
        -------
        torch.Tensor
            The output tensor with the extracted features.
        """
        pass
```
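
To illustrate the contract, here is a hypothetical toy subclass. It is not the repository's `InceptionV3FE`; the ResNet-18 backbone and the 224×224 resize are assumptions made for the example:

```python
from typing import Callable

import numpy as np
import torch
import torchvision.models as models
from PIL import Image

from T2IBenchmark.feature_extractors import BaseFeatureExtractor


class ToyResNetFE(BaseFeatureExtractor):
    """Hypothetical extractor: ResNet-18 global-pooled features."""

    def __init__(self):
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the classification head; keep everything up to global pooling.
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

    def get_preprocess_fn(self) -> Callable[[Image.Image], np.ndarray]:
        def preprocess(img: Image.Image) -> np.ndarray:
            img = img.convert("RGB").resize((224, 224))
            # HWC uint8 -> CHW float32 in [0, 1]
            return np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0
        return preprocess

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, 224, 224) -> (N, 512) feature vectors
        return self.backbone(x).flatten(1)
```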