Reproducibility 3/3 (#1924)

* make tests deterministic * run slow tests * prepare for testing * finish * refactor * add print statements * finish more * correct some test failures * more fixes * set up to correct tests * more corrections * up * fix more * more prints * add * up * up * up * uP * uP * more fixes * uP * up * up * up * up * fix more * up * up * clean tests * up * up * up * more fixes * Apply suggestions from code review Co-authored-by: Suraj Patil <[email protected]> * make * correct * finish * finish Co-authored-by: Suraj Patil <[email protected]>
huggingface · Jan 25, 2023 · 6ba2231
1 parent 008c22d
commit 6ba2231
Showing 47 changed files with 673 additions and 448 deletions.
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -32,6 +32,8 @@
       title: Text-Guided Depth-to-Image
     - local: using-diffusers/reusing_seeds
       title: Reusing seeds for deterministic generation
+    - local: using-diffusers/reproducibility
+      title: Reproducibility
     - local: using-diffusers/custom_pipeline_examples
       title: Community Pipelines
     - local: using-diffusers/contribute_pipeline

diff --git a/docs/source/en/using-diffusers/reproducibility.mdx b/docs/source/en/using-diffusers/reproducibility.mdx
@@ -0,0 +1,159 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Reproducibility
+
+Before reading about reproducibility for Diffusers, it is strongly recommended to take a look at 
+[PyTorch's statement about reproducibility](https://pytorch.org/docs/stable/notes/randomness.html).
+
+PyTorch states that 
+> *completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms.*
+
+While one can never expect exactly the same results across platforms, one can expect results to be reproducible across releases, platforms, etc. within a certain tolerance.
+However, this tolerance varies strongly depending on the diffusion pipeline and checkpoint.
+
+In the following, we show how to best control sources of randomness for diffusion models.
+
+## Inference
+
+During inference, diffusion pipelines rely heavily on random sampling operations, such as creating the
+Gaussian noise tensors to be denoised and adding noise in the scheduling step.
+
+Let's have a look at an example. We run the [DDIM pipeline](./api/pipelines/ddim.mdx)
+for just two inference steps and return a NumPy array to inspect the numerical values of the output.
+
+```python
+from diffusers import DDIMPipeline
+import numpy as np
+
+model_id = "google/ddpm-cifar10-32"
+
+# load model and scheduler
+ddim = DDIMPipeline.from_pretrained(model_id)
+
+# run pipeline for just two steps and return numpy tensor
+image = ddim(num_inference_steps=2, output_type="np").images
+print(np.abs(image).sum())
+```
+
+Running the above prints a value of 1464.2076, but running it again prints a different
+value of 1495.1768. What is going on here? Every time the pipeline is run, Gaussian noise
+is created and denoised step by step. To create the Gaussian noise, [`torch.randn`](https://pytorch.org/docs/stable/generated/torch.randn.html) uses a different random seed every time, which leads to a different result.
+This is a desired property of diffusion pipelines, as it means that the pipeline can create a different random image every time it is run. In many cases, however, one would like to regenerate the exact same image of a certain
+run, in which case an instance of a [PyTorch generator](https://pytorch.org/docs/stable/generated/torch.Generator.html) has to be passed:
+
+```python
+import torch
+from diffusers import DDIMPipeline
+import numpy as np
+
+model_id = "google/ddpm-cifar10-32"
+
+# load model and scheduler
+ddim = DDIMPipeline.from_pretrained(model_id)
+
+# create a generator for reproducibility
+generator = torch.Generator(device="cpu").manual_seed(0)
+
+# run pipeline for just two steps and return numpy tensor
+image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
+print(np.abs(image).sum())
+```
+
+Running the above always prints a value of 1491.1711, even when it is run again, because we
+pass the generator object to all random functions of the pipeline.
+
+If you run this code snippet on your specific hardware and version, you should get a similar, if not the same, result.
+
+<Tip>
+
+It might be a bit unintuitive at first to pass `generator` objects to the pipelines instead of
+just integer values representing the seed, but this is the recommended design when dealing with
+probabilistic models in PyTorch, because generators are *random states* that advance as they are used and can thus be
+passed to multiple pipelines in a sequence.
+
+</Tip>
+
+Great! Now we know how to write reproducible pipelines, but things get a bit trickier, since the above example only runs on the CPU. How do we also achieve reproducibility on GPU?
+In short, one should not expect full reproducibility across different hardware when running pipelines on GPU,
+as matrix multiplications are less deterministic on GPU than on CPU and diffusion pipelines tend to require
+a lot of matrix multiplications. Let's see what we can do to keep the randomness within limits across
+different GPU hardware.
+
+For maximum speed, it is recommended to create the generator directly on GPU when running
+the pipeline on GPU:
+
+```python
+import torch
+from diffusers import DDIMPipeline
+import numpy as np
+
+model_id = "google/ddpm-cifar10-32"
+
+# load model and scheduler
+ddim = DDIMPipeline.from_pretrained(model_id)
+ddim.to("cuda")
+
+# create a generator for reproducibility
+generator = torch.Generator(device="cuda").manual_seed(0)
+
+# run pipeline for just two steps and return numpy tensor
+image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
+print(np.abs(image).sum())
+```
+
+Running the above now prints a value of 1389.8634, even though we're using the exact same seed!
+This is unfortunate, as it means we cannot reproduce on CPU the results we achieved on GPU.
+Nevertheless, it should be expected, since the GPU uses a different random number generator than the CPU.
+
+To circumvent this problem, we created a [`randn_tensor`](#diffusers.utils.randn_tensor) function, which can create random noise
+on the CPU and then move the tensor to GPU if necessary. The function is used everywhere inside the pipelines, allowing the user to **always** pass a CPU generator even if the pipeline is run on GPU:
+
+```python
+import torch
+from diffusers import DDIMPipeline
+import numpy as np
+
+model_id = "google/ddpm-cifar10-32"
+
+# load model and scheduler
+ddim = DDIMPipeline.from_pretrained(model_id)
+ddim.to("cuda")
+
+# create a generator for reproducibility
+generator = torch.manual_seed(0)
+
+# run pipeline for just two steps and return numpy tensor
+image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
+print(np.abs(image).sum())
+```
+
+Running the above now prints a value of 1491.1713, much closer to the value of 1491.1711 obtained when
+the pipeline is run fully on the CPU.
+
+<Tip>
+
+As a consequence, we recommend always passing a CPU generator if reproducibility is important.
+The loss of performance is often negligible, and one can be sure to generate much more similar
+values than if the noise had been sampled with a GPU generator.
+
+</Tip>
+
+Finally, we noticed that more complex pipelines, such as [`UnCLIPPipeline`], are often extremely
+susceptible to precision error propagation, and thus one cannot expect even similar results across
+different GPU hardware or PyTorch versions. In such cases, one has to make sure to run on
+exactly the same hardware and PyTorch version for full reproducibility.
+
+## Randomness utilities
+
+### randn_tensor
+[[autodoc]] diffusers.utils.randn_tensor
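
Reviewer note: the tip in the new doc describes generators as *random states* that advance as they are used. The sketch below illustrates that idea only. It uses Python's standard-library `random.Random` as a stand-in for `torch.Generator` (an assumption made so the snippet runs without PyTorch), and `sample_noise` is a hypothetical helper, not part of Diffusers.

```python
import random


def sample_noise(n, generator):
    """Draw n Gaussian samples from the given generator, advancing its state."""
    return [generator.gauss(0.0, 1.0) for _ in range(n)]


# Stand-in for torch.Generator().manual_seed(0): a seeded, stateful generator.
generator = random.Random(0)

# Passing the same generator to two consumers in sequence advances its state,
# so the second call draws different noise than the first.
first = sample_noise(4, generator)
second = sample_noise(4, generator)

# Re-creating the generator with the same seed replays the whole sequence.
replay = random.Random(0)
replayed_first = sample_noise(4, replay)
replayed_second = sample_noise(4, replay)

print(first == replayed_first, second == replayed_second)
```

An integer seed would have to be re-applied by every consumer, whereas one generator object carries its state through the whole sequence of calls.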
diff --git a/src/diffusers/pipelines/ddim/pipeline_ddim.py b/src/diffusers/pipelines/ddim/pipeline_ddim.py
@@ -17,7 +17,7 @@
 import torch
 
 from ...schedulers import DDIMScheduler
-from ...utils import deprecate, randn_tensor
+from ...utils import randn_tensor
 from ..pipeline_utils import DiffusionPipeline, ImagePipelineOutput
 
 
@@ -78,24 +78,6 @@ def __call__(
             True, otherwise a `tuple. When returning a tuple, the first element is a list with the generated images.
         """
 
-        if (
-            generator is not None
-            and isinstance(generator, torch.Generator)
-            and generator.device.type != self.device.type
-            and self.device.type != "mps"
-        ):
-            message = (
-                f"The `generator` device is `{generator.device}` and does not match the pipeline "
-                f"device `{self.device}`, so the `generator` will be ignored. "
-                f'Please use `generator=torch.Generator(device="{self.device}")` instead.'
-            )
-            deprecate(
-                "generator.device == 'cpu'",
-                "0.13.0",
-                message,
-            )
-            generator = None
-
         # Sample gaussian noise to begin loop
         if isinstance(self.unet.sample_size, int):
             image_shape = (batch_size, self.unet.in_channels, self.unet.sample_size, self.unet.sample_size)

diff --git a/src/diffusers/utils/__init__.py b/src/diffusers/utils/__init__.py
@@ -76,6 +76,7 @@
         load_numpy,
         nightly,
         parse_flag_from_env,
+        print_tensor_test,
         require_torch_gpu,
         slow,
         torch_all_close,
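
Reviewer note: `print_tensor_test`, exported here and defined in `testing_utils.py`, appends one semicolon-separated record per test to a text file. The sketch below reproduces only its string handling so the record format is easier to see. `format_expected_slice` is a hypothetical name, and the test id and tensor string are made-up inputs; the real helper reads `PYTEST_CURRENT_TEST` from the environment and stringifies a tensor itself.

```python
def format_expected_slice(test_name, tensor_str, expected_tensor_name="expected_slice"):
    """Build the semicolon-separated record that would be appended to the output file."""
    # Turn "tensor([...])" into "expected_slice = np.array([...])".
    output_str = tensor_str.replace("tensor", f"{expected_tensor_name} = np.array")
    # The test id is assumed to look like "path::Class::test_fn (call)".
    test_file, test_class, test_fn = test_name.split("::")
    test_fn = test_fn.split()[0]  # drop the trailing " (call)" phase marker
    return ";".join([test_file, test_class, test_fn, output_str])


# Hypothetical values, for illustration only.
record = format_expected_slice(
    "tests/test_example.py::ExampleTests::test_forward (call)",
    "tensor([0.1000, 0.2000, 0.3000])",
)
print(record)
```

Because the test id is unpacked into exactly three parts, this format assumes class-based tests; a module-level test function would have only two `::`-separated parts and the unpacking would fail.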

diff --git a/src/diffusers/utils/testing_utils.py b/src/diffusers/utils/testing_utils.py
@@ -8,7 +8,7 @@
 from distutils.util import strtobool
 from io import BytesIO, StringIO
 from pathlib import Path
-from typing import Union
+from typing import Optional, Union
 
 import numpy as np
 
@@ -45,6 +45,21 @@ def torch_all_close(a, b, *args, **kwargs):
     return True
 
 
+def print_tensor_test(tensor, filename="test_corrections.txt", expected_tensor_name="expected_slice"):
+    test_name = os.environ.get("PYTEST_CURRENT_TEST")
+    if not torch.is_tensor(tensor):
+        tensor = torch.from_numpy(tensor)
+
+    tensor_str = str(tensor.detach().cpu().flatten().to(torch.float32)).replace("\n", "")
+    # format is usually:
+    # expected_slice = np.array([-0.5713, -0.3018, -0.9814, 0.04663, -0.879, 0.76, -1.734, 0.1044, 1.161])
+    output_str = tensor_str.replace("tensor", f"{expected_tensor_name} = np.array")
+    test_file, test_class, test_fn = test_name.split("::")
+    test_fn = test_fn.split()[0]
+    with open(filename, "a") as f:
+        print(";".join([test_file, test_class, test_fn, output_str]), file=f)
+
+
 def get_tests_dir(append_path=None):
     """
     Args:
@@ -150,9 +165,13 @@ def require_onnxruntime(test_case):
     return unittest.skipUnless(is_onnx_available(), "test requires onnxruntime")(test_case)
 
 
-def load_numpy(arry: Union[str, np.ndarray]) -> np.ndarray:
+def load_numpy(arry: Union[str, np.ndarray], local_path: Optional[str] = None) -> np.ndarray:
     if isinstance(arry, str):
-        if arry.startswith("http://") or arry.startswith("https://"):
+        # local_path = "/home/patrick_huggingface_co/"
+        if local_path is not None:
+            # local_path can be passed to correct images of tests
+            return os.path.join(local_path, "/".join([arry.split("/")[-5], arry.split("/")[-2], arry.split("/")[-1]]))
+        elif arry.startswith("http://") or arry.startswith("https://"):
             response = requests.get(arry)
             response.raise_for_status()
             arry = np.load(BytesIO(response.content))
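
Reviewer note: the new `local_path` branch of `load_numpy` keeps three segments of the input string (fifth from last, second from last, and last) and joins them under `local_path`. As written in the diff, that branch returns the joined path rather than a loaded array. The sketch below shows the remapping on a made-up URL. `remap_to_local`, the URL, and the directory are hypothetical, and `posixpath.join` is used instead of `os.path.join` only to keep the result platform-independent.

```python
import posixpath


def remap_to_local(url, local_path):
    """Map a remote file URL to a local path using the same segments as the diff."""
    parts = url.split("/")
    # Keep the 5th-from-last, 2nd-from-last, and last segments.
    relative = "/".join([parts[-5], parts[-2], parts[-1]])
    return posixpath.join(local_path, relative)


# Hypothetical URL and directory, for illustration only.
url = "https://example.com/datasets/some-org/some-repo/resolve/main/subdir/image.npy"
local = remap_to_local(url, "/tmp/fixtures")
print(local)
```

The remapping depends on the URL having a fixed layout; a URL with fewer than five `/`-separated segments would raise an `IndexError`.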

diff --git a/tests/models/test_models_vae.py b/tests/models/test_models_vae.py
@@ -166,7 +166,7 @@ def get_sd_vae_model(self, model_id="CompVis/stable-diffusion-v1-4", fp16=False)
 
     def get_generator(self, seed=0):
         if torch_device == "mps":
-            return torch.Generator().manual_seed(seed)
+            return torch.manual_seed(seed)
         return torch.Generator(device=torch_device).manual_seed(seed)
 
     @parameterized.expand(

diff --git a/tests/pipelines/altdiffusion/test_alt_diffusion.py b/tests/pipelines/altdiffusion/test_alt_diffusion.py
@@ -188,6 +188,7 @@ def test_alt_diffusion_pndm(self):
         expected_slice = np.array(
             [0.51605093, 0.5707241, 0.47365507, 0.50578886, 0.5633877, 0.4642503, 0.5182081, 0.48763484, 0.49084237]
         )
+
         assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
 
 
@@ -207,20 +208,16 @@ def test_alt_diffusion(self):
         alt_pipe.set_progress_bar_config(disable=None)
 
         prompt = "A painting of a squirrel eating a burger"
-        generator = torch.Generator(device=torch_device).manual_seed(0)
-        with torch.autocast("cuda"):
-            output = alt_pipe(
-                [prompt], generator=generator, guidance_scale=6.0, num_inference_steps=20, output_type="np"
-            )
+        generator = torch.manual_seed(0)
+        output = alt_pipe([prompt], generator=generator, guidance_scale=6.0, num_inference_steps=20, output_type="np")
 
         image = output.images
 
         image_slice = image[0, -3:, -3:, -1]
 
         assert image.shape == (1, 512, 512, 3)
-        expected_slice = np.array(
-            [0.8720703, 0.87109375, 0.87402344, 0.87109375, 0.8779297, 0.8925781, 0.8823242, 0.8808594, 0.8613281]
-        )
+        expected_slice = np.array([0.1010, 0.0800, 0.0794, 0.0885, 0.0843, 0.0762, 0.0769, 0.0729, 0.0586])
+
         assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
 
     def test_alt_diffusion_fast_ddim(self):
@@ -231,44 +228,14 @@ def test_alt_diffusion_fast_ddim(self):
         alt_pipe.set_progress_bar_config(disable=None)
 
         prompt = "A painting of a squirrel eating a burger"
-        generator = torch.Generator(device=torch_device).manual_seed(0)
+        generator = torch.manual_seed(0)
 
-        with torch.autocast("cuda"):
-            output = alt_pipe([prompt], generator=generator, num_inference_steps=2, output_type="numpy")
+        output = alt_pipe([prompt], generator=generator, num_inference_steps=2, output_type="numpy")
         image = output.images
 
         image_slice = image[0, -3:, -3:, -1]
 
         assert image.shape == (1, 512, 512, 3)
-        expected_slice = np.array(
-            [0.9267578, 0.9301758, 0.9013672, 0.9345703, 0.92578125, 0.94433594, 0.9423828, 0.9423828, 0.9160156]
-        )
-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
-
-    def test_alt_diffusion_text2img_pipeline_fp16(self):
-        torch.cuda.reset_peak_memory_stats()
-        model_id = "BAAI/AltDiffusion"
-        pipe = AltDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
-        pipe = pipe.to(torch_device)
-        pipe.set_progress_bar_config(disable=None)
-
-        prompt = "a photograph of an astronaut riding a horse"
+        expected_slice = np.array([0.4019, 0.4052, 0.3810, 0.4119, 0.3916, 0.3982, 0.4651, 0.4195, 0.5323])
 
-        generator = torch.Generator(device=torch_device).manual_seed(0)
-        output_chunked = pipe(
-            [prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy"
-        )
-        image_chunked = output_chunked.images
-
-        generator = torch.Generator(device=torch_device).manual_seed(0)
-        with torch.autocast(torch_device):
-            output = pipe(
-                [prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy"
-            )
-            image = output.images
-
-        # Make sure results are close enough
-        diff = np.abs(image_chunked.flatten() - image.flatten())
-        # They ARE different since ops are not run always at the same precision
-        # however, they should be extremely close.
-        assert diff.mean() < 2e-2
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
diff --git a/tests/pipelines/altdiffusion/test_alt_diffusion_img2img.py b/tests/pipelines/altdiffusion/test_alt_diffusion_img2img.py
@@ -162,6 +162,7 @@ def test_stable_diffusion_img2img_default_case(self):
         expected_slice = np.array(
             [0.41293705, 0.38656747, 0.40876025, 0.4782187, 0.4656803, 0.41394007, 0.4142093, 0.47150758, 0.4570448]
         )
+
         assert np.abs(image_slice.flatten() - expected_slice).max() < 1.5e-3
         assert np.abs(image_from_tuple_slice.flatten() - expected_slice).max() < 1.5e-3
 
@@ -196,7 +197,7 @@ def test_stable_diffusion_img2img_fp16(self):
         alt_pipe.set_progress_bar_config(disable=None)
 
         prompt = "A painting of a squirrel eating a burger"
-        generator = torch.Generator(device=torch_device).manual_seed(0)
+        generator = torch.manual_seed(0)
         image = alt_pipe(
             [prompt],
             generator=generator,
@@ -227,7 +228,7 @@ def test_stable_diffusion_img2img_pipeline_multiple_of_8(self):
 
         prompt = "A fantasy landscape, trending on artstation"
 
-        generator = torch.Generator(device=torch_device).manual_seed(0)
+        generator = torch.manual_seed(0)
         output = pipe(
             prompt=prompt,
             image=init_image,
@@ -241,7 +242,8 @@ def test_stable_diffusion_img2img_pipeline_multiple_of_8(self):
         image_slice = image[255:258, 383:386, -1]
 
         assert image.shape == (504, 760, 3)
-        expected_slice = np.array([0.3252, 0.3340, 0.3418, 0.3263, 0.3346, 0.3300, 0.3163, 0.3470, 0.3427])
+        expected_slice = np.array([0.9358, 0.9397, 0.9599, 0.9901, 1.0000, 1.0000, 0.9882, 1.0000, 1.0000])
+
         assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-3
 
 
@@ -275,7 +277,7 @@ def test_stable_diffusion_img2img_pipeline_default(self):
 
         prompt = "A fantasy landscape, trending on artstation"
 
-        generator = torch.Generator(device=torch_device).manual_seed(0)
+        generator = torch.manual_seed(0)
         output = pipe(
             prompt=prompt,
             image=init_image,