
Image output of NVDS is invalid #28

Open
jparismorgan opened this issue Jan 23, 2024 · 9 comments



jparismorgan commented Jan 23, 2024

Hi there, I'm trying to run `infer_NVDS_dpt_bi.py` on some images with `python infer_NVDS_dpt_bi.py --base_dir ./demo_outputs/dpt_init/000423/ --vnum 000423 --infer_w 896 --infer_h 384` on an M2 MacBook Pro (I changed the device to use the CPU).

When I run it, the initial depth network results look okay, e.g. `demo_outputs/dpt_init/000423/initial/gray/frame_000000.png`:

[image: frame_000000]

But the results of the NVDS forward pass are all black, e.g. `demo_outputs/dpt_init/000423/1/gray/frame_000000.png`:

[image: frame_000000]

and `demo_outputs/dpt_init/000423/1/color/0.png`:

[image: 0]

These are my installed packages:

(NVDS) ~/repo/NVDS conda list
# packages in environment at /opt/homebrew/anaconda3/envs/NVDS:
#
# Name                    Version                   Build  Channel
absl-py                   2.1.0                    pypi_0    pypi
appnope                   0.1.2           py38hca03da5_1001  
asttokens                 2.0.5              pyhd3eb1b0_0  
attr                      0.3.2                    pypi_0    pypi
backcall                  0.2.0              pyhd3eb1b0_0  
blas                      1.0                    openblas  
brotli-python             1.0.9            py38hc377ac9_7  
ca-certificates           2023.12.12           hca03da5_0  
cachetools                5.3.2                    pypi_0    pypi
certifi                   2023.11.17       py38hca03da5_0  
cffi                      1.16.0           py38h80987f9_0  
chardet                   5.2.0                    pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
comm                      0.1.2            py38hca03da5_0  
contourpy                 1.1.1                    pypi_0    pypi
cryptography              41.0.3           py38h3c57c4d_0  
cycler                    0.12.1                   pypi_0    pypi
debugpy                   1.6.7            py38h313beb8_0  
decorator                 5.1.1              pyhd3eb1b0_0  
einops                    0.7.0                    pypi_0    pypi
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
executing                 0.8.3              pyhd3eb1b0_0  
filelock                  3.13.1           py38hca03da5_0  
fonttools                 4.47.2                   pypi_0    pypi
freetype                  2.12.1               h1192e45_0  
fsspec                    2023.12.2                pypi_0    pypi
future                    0.18.3             pyhd8ed1ab_0    conda-forge
giflib                    5.2.1                h80987f9_3  
gmp                       6.2.1                hc377ac9_3  
gmpy2                     2.1.2            py38h8c48613_0  
google-auth               2.26.2                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
grpcio                    1.60.0                   pypi_0    pypi
h5py                      3.10.0                   pypi_0    pypi
huggingface-hub           0.20.3                   pypi_0    pypi
idna                      3.6                      pypi_0    pypi
imageio                   2.33.1                   pypi_0    pypi
importlib-metadata        7.0.1            py38hca03da5_0  
importlib-resources       6.1.1                    pypi_0    pypi
importlib_metadata        7.0.1                hd3eb1b0_0  
ipykernel                 6.29.0             pyh3cd1d5f_0    conda-forge
ipython                   8.12.3                   pypi_0    pypi
jedi                      0.18.1           py38hca03da5_1  
jinja2                    3.1.2            py38hca03da5_0  
jpeg                      9e                   h80987f9_1  
jupyter_client            8.6.0            py38hca03da5_0  
jupyter_core              5.5.0            py38hca03da5_0  
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
lcms2                     2.12                 hba8e193_0  
lerc                      3.0                  hc377ac9_0  
libblas                   3.9.0           21_osxarm64_openblas    conda-forge
libcblas                  3.9.0           21_osxarm64_openblas    conda-forge
libcxx                    14.0.6               h848a8c0_0  
libdeflate                1.17                 h80987f9_1  
libffi                    3.4.4                hca03da5_0  
libgfortran               5.0.0           11_3_0_hca03da5_28  
libgfortran5              11.3.0              h009349e_28  
liblapack                 3.9.0           21_osxarm64_openblas    conda-forge
libopenblas               0.3.21               h269037a_0  
libpng                    1.6.39               h80987f9_0  
libprotobuf               3.20.3               h514c7bf_0  
libsodium                 1.0.18               h1a28f6b_0  
libtiff                   4.5.1                h313beb8_0  
libuv                     1.44.2               h80987f9_0  
libwebp                   1.3.2                ha3663a8_0  
libwebp-base              1.3.2                h80987f9_0  
llvm-openmp               14.0.6               hc6e5704_0  
lz4-c                     1.9.4                h313beb8_0  
markdown                  3.5.2                    pypi_0    pypi
markupsafe                2.1.4                    pypi_0    pypi
matplotlib                3.7.4                    pypi_0    pypi
matplotlib-inline         0.1.6            py38hca03da5_0  
mpc                       1.1.0                h8c48613_1  
mpfr                      4.0.2                h695f6f0_1  
mpmath                    1.3.0            py38hca03da5_0  
natsort                   8.4.0                    pypi_0    pypi
ncurses                   6.4                  h313beb8_0  
nest-asyncio              1.5.6            py38hca03da5_0  
networkx                  3.1              py38hca03da5_0  
ninja                     1.10.2               hca03da5_5  
ninja-base                1.10.2               h525c30c_5  
numpy                     1.24.3           py38h1398885_0  
numpy-base                1.24.3           py38h90707a3_0  
oauthlib                  3.2.2                    pypi_0    pypi
olefile                   0.47               pyhd8ed1ab_0    conda-forge
opencv-python             4.9.0.80                 pypi_0    pypi
openjpeg                  2.3.0                h7a6adac_2  
openssl                   1.1.1w               h1a28f6b_0  
packaging                 23.1             py38hca03da5_0  
parso                     0.8.3              pyhd3eb1b0_0  
pexpect                   4.9.0                    pypi_0    pypi
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1           py38hca03da5_0  
platformdirs              3.10.0           py38hca03da5_0  
prompt-toolkit            3.0.43           py38hca03da5_0  
prompt_toolkit            3.0.42               hd8ed1ab_0    conda-forge
protobuf                  4.25.2                   pypi_0    pypi
psutil                    5.9.0            py38h1a28f6b_0  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pure_eval                 0.2.2              pyhd3eb1b0_0  
pyasn1                    0.5.1                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0  
pygments                  2.15.1           py38hca03da5_1  
pyopenssl                 23.2.0           py38hca03da5_0  
pyparsing                 3.1.1                    pypi_0    pypi
pysocks                   1.7.1            py38hca03da5_0  
python                    3.8.13               hbdb9e5c_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python_abi                3.8                      2_cp38    conda-forge
pytorch                   2.1.0           gpu_mps_py38h87e4ab7_100  
pytorch-cpu               1.9.0           cpu_py38hd610c6a_2    conda-forge
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.2           py38h313beb8_0  
readline                  8.2                  h1a28f6b_0  
requests                  2.31.0           py38hca03da5_0  
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
safetensors               0.4.2                    pypi_0    pypi
scikit-image              0.21.0                   pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                68.2.2           py38hca03da5_0  
six                       1.16.0             pyhd3eb1b0_1  
sleef                     3.5.1                h80987f9_2  
sqlite                    3.41.2               h80987f9_0  
stack-data                0.6.3                    pypi_0    pypi
stack_data                0.2.0              pyhd3eb1b0_0  
sympy                     1.12             py38hca03da5_0  
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tifffile                  2023.7.10                pypi_0    pypi
timm                      0.9.12                   pypi_0    pypi
tk                        8.6.12               hb8d0fd4_0  
torchvision               0.15.2          cpu_py38h31aa045_0  
tornado                   6.3.3            py38h80987f9_0  
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.7.1            py38hca03da5_0  
typing-extensions         4.9.0            py38hca03da5_1  
typing_extensions         4.9.0            py38hca03da5_1  
urllib3                   2.1.0                    pypi_0    pypi
wcwidth                   0.2.5              pyhd3eb1b0_0  
werkzeug                  3.0.1                    pypi_0    pypi
wheel                     0.41.2           py38hca03da5_0  
xz                        5.4.5                h80987f9_0  
zeromq                    4.3.5                h313beb8_0  
zipp                      3.17.0           py38hca03da5_0  
zlib                      1.2.13               h5a0b063_0  
zstd                      1.5.5                hd90d995_0  

Have you seen this before, or do you have any suggestions for debugging it? Thank you!


jparismorgan commented Jan 23, 2024

To debug this further (and because it seems generally useful to others), I created a simplified version of `infer_NVDS_dpt_bi.py` that should work with CUDA, MPS on Mac, or CPU as a fallback. It only runs the forward smoothing pass. Running it seems to confirm that something is wrong in the NVDS step, as I get this from the first depth image created by `compute_depth()`:

[image: 0]

And in `smooth_depth()` this is `ref_seq` (stacking the images side by side):

[image: 0_ref_seq]

But I get this from the first depth image created by `smooth_depth()`:

[image: 0]

Here is the code. Perhaps you understand it better and will see if something is wrong with it, or maybe you could run it on your machine and check whether it works for you. You can run it with `python smooth.py --input_dir ./demo_videos/000423/left --infer_w 896 --infer_h 384 --output_dir ./input/smoothed_5`. Thank you for any help!

Note that I have only implemented the `if i<=2:` case so far, not the `elif i>=3 and i<=8:` or `elif i>=9:` cases. That just means we smooth using the same image stacked several times, so it should not cause issues. Once this is working I plan to add the remaining cases.

import os
import argparse
import cv2
import numpy as np
import matplotlib.pyplot as plt
from backbone import *
import torch 
from networks import *
from full_model import *
import glob
from smooth_loss import *
from dpt.models import DPTDepthModel
from natsort import natsorted
import torchvision

# To get this to work on Mac with MPS I had to update to pytorch 2.1.0:
# - conda update pytorch
# - pip install chardet 
# Also added:
# - pip install natsort

torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True

__mean = [0.485, 0.456, 0.406]
__std = [0.229, 0.224, 0.225]
__mean_dpt = [0.5, 0.5, 0.5]
__std_dpt = [0.5, 0.5, 0.5]

def get_args_parser():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--input_dir",
        type=str
    )
    parser.add_argument(
        "--output_dir",
        type=str
    )
    parser.add_argument(
        "--infer_w",
        default=896,
        type=int
    )
    parser.add_argument(
        "--infer_h",
        default=384,
        type=int
    )

    return parser

def img_loader(path):
    image = cv2.imread(path)
    if image.ndim == 2:
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) / 255.0
    return image

def gt_png_loader(path):
    depth = cv2.imread(path, -1) #/ 65535.0
    return depth.astype(np.float32)

def get_depth_model(device):
  dpt = DPTDepthModel(
            path='./dpt/checkpoints/dpt_large-midas-2f21e586.pt',
            backbone="vitl16_384",
            non_negative=True,
            enable_attention_hooks=False,
        ).to(device)
  dpt.eval()
  return dpt

def get_nvds_model(device):
    # The checkpoint was saved from a DataParallel model, so wrap the model
    # before loading the state dict.
    checkpoint = torch.load('./NVDS_checkpoints/NVDS_Stabilizer.pth', map_location='cpu')
    model = NVDS()
    model = torch.nn.DataParallel(model, device_ids=[0]) # .cuda()
    model.load_state_dict(checkpoint)
    model.to(device)
    model.eval()
    return model

def compute_depth(image_paths, device, infer_size, save_disparity_dir, save_colorized_dir):
  os.makedirs(save_disparity_dir, exist_ok=True)
  os.makedirs(save_colorized_dir, exist_ok=True)

  dpt = get_depth_model(device)

  for i in range(len(image_paths)):
    frame = image_paths[i]
    rgb = img_loader(frame)
    rgb = cv2.resize(rgb, infer_size, interpolation=cv2.INTER_CUBIC)
    rgb = (rgb - __mean_dpt) / __std_dpt
    rgb = np.transpose(rgb, (2, 0, 1))
    rgb = torch.Tensor(np.ascontiguousarray(rgb).astype(np.float32)).unsqueeze(0)
    rgb = rgb.to(device)
    with torch.no_grad():
        outputs = dpt.forward(rgb)
        print(f"[{i}][{frame}]")
        print(f"  min(outputs): {torch.min(outputs)}, mean(outputs): {torch.mean(outputs)}, max(outputs): {torch.max(outputs)}")
        plt.imsave(
            save_colorized_dir+str(i)+'.png',
            outputs.cpu().numpy().squeeze(), 
            cmap='Spectral_r',
            vmin=np.min(outputs.cpu().numpy().squeeze()),
            vmax=np.max(outputs.cpu().numpy().squeeze()))
        outputs = outputs.cpu().numpy().squeeze()
        depth_min = outputs.min()
        depth_max = outputs.max()
        outputs = 65535.0 * (outputs - depth_min) / (depth_max - depth_min)
        cv2.imwrite(
            save_disparity_dir+str(i)+'.png',
            outputs.astype("uint16"), 
            [cv2.IMWRITE_PNG_COMPRESSION, 0])

def smooth_depth(image_paths, depth_paths, device, infer_size, save_disparity_dir, save_colorized_dir, save_test_dir):
  os.makedirs(save_disparity_dir, exist_ok=True)
  os.makedirs(save_colorized_dir, exist_ok=True)
  os.makedirs(save_test_dir, exist_ok=True)
  
  model = get_nvds_model(device)
  seq_len = 4

  for i in range(len(image_paths)):
    frame = image_paths[i]
    depth_frame = depth_paths[i]
    print(f"[{i}][{frame}][{depth_frame}]")

    # Get the RGB image.
    img = img_loader(frame)
    img = cv2.resize(img, infer_size, interpolation=cv2.INTER_CUBIC)
    img = (img - __mean) / __std
    img = np.transpose(img, (2, 0, 1))
    img = torch.Tensor(np.ascontiguousarray(img).astype(np.float32))
    img = img.unsqueeze(0)
    torchvision.utils.save_image(img, save_test_dir + str(i) + '_img.png')

    # Get the depth image.
    depth = gt_png_loader(depth_frame)
    depth = cv2.resize(depth, infer_size, interpolation=cv2.INTER_NEAREST)
    depth = torch.Tensor(np.ascontiguousarray(depth.astype(np.float32))).unsqueeze(0)
    depth = (depth-torch.min(depth))/(torch.max(depth)-torch.min(depth))
    depth = depth.unsqueeze(0)
    torchvision.utils.save_image(depth, save_test_dir + str(i) + '_depth.png')

    # Create a RGBD image.
    rgbd = torch.cat([img, depth],dim=1)
    torchvision.utils.save_image(rgbd, save_test_dir + str(i) + '_rgbd.png')

    # Set up the input to the NVDS model.
    for j in range(seq_len):
        if j == 0:
            ref_seq = rgbd
        else:
            ref_seq = torch.cat([ref_seq,rgbd],dim=0)
    ref_seq = ref_seq.unsqueeze(0)
    ref_seq = ref_seq.to(device)
    print(f"  Using ref_seq.shape {ref_seq.shape}")
    concatenated_ref_seq_img = torch.cat([ref_seq[0, k, :4] for k in range(ref_seq.shape[1])], dim=2)  # use k to avoid shadowing the loop variable i
    torchvision.utils.save_image(concatenated_ref_seq_img, f"{save_test_dir}{str(i)}_ref_seq.png")

    # Run the NVDS smoothing model.
    with torch.no_grad():
        outputs = model(ref_seq)
        print(f"  min(outputs): {torch.min(outputs)}, mean(outputs): {torch.mean(outputs)}, max(outputs): {torch.max(outputs)}")
        outputs = outputs.squeeze(1)

        plt.imsave(
            save_colorized_dir+str(i)+'.png',
            outputs.cpu().numpy().squeeze(), 
            cmap='Spectral_r',
            vmin=np.min(outputs.cpu().numpy().squeeze()),
            vmax=np.max(outputs.cpu().numpy().squeeze()))
        
        outputs = outputs.cpu().numpy().squeeze()
        depth_min = outputs.min()
        depth_max = outputs.max()
        outputs = 65535.0 * (outputs - depth_min) / (depth_max - depth_min)
        cv2.imwrite(
            save_disparity_dir+str(i)+'.png',
            outputs.astype("uint16"),
            [cv2.IMWRITE_PNG_COMPRESSION, 0])

def main(args):
    os.makedirs(args.output_dir, exist_ok=True)
    
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
    elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
        device = torch.device("mps:0")
    else:
        device = torch.device("cpu")
    infer_size = (int(args.infer_w),int(args.infer_h))
    print(f"Using device: {device}, infer_size: {infer_size}")

    print('Computing depth maps.')
    frames = natsorted(glob.glob(args.input_dir + '/*.png'))
    save_disparity_dir=args.output_dir + '/0/gray/'
    compute_depth(
       image_paths=frames,
       device=device,
       infer_size=infer_size,
       save_disparity_dir=save_disparity_dir,
       save_colorized_dir=args.output_dir + '/0/color/')
    
    print('Smoothing with a forward pass.')
    depth_frames = natsorted(glob.glob(save_disparity_dir + '/*.png'))
    smooth_depth(
       image_paths=frames,
       depth_paths=depth_frames,
       device=device,
       infer_size=infer_size,
       save_disparity_dir=args.output_dir + '/1/gray/',
       save_colorized_dir=args.output_dir + '/1/color/',
       save_test_dir=args.output_dir + '/1/test/'
    )


if __name__ == '__main__':
  print('Starting.')
  parser = get_args_parser()
  args = parser.parse_args()
  main(args)
  print('Done.')

P.S. Here are my logs, in case they help:

(NVDS) ~/repo/NVDS python smooth.py --input_dir ./demo_videos/000423/left --infer_w 896 --infer_h 384 --output_dir ./input/smoothed_5
Starting.
Using device: mps, infer_size: (896, 384)

Computing depth maps.
[0][./demo_videos/000423/left/frame_000000.png]
  min(outputs): 0.0, mean(outputs): 6.7235894203186035, max(outputs): 37.926063537597656
[1][./demo_videos/000423/left/frame_000001.png]
  min(outputs): 0.0, mean(outputs): 7.067801475524902, max(outputs): 38.03575134277344
...
[123][./demo_videos/000423/left/frame_000123.png]
  min(outputs): 0.0, mean(outputs): 7.774233818054199, max(outputs): 37.75785446166992
Computing 124 depth maps took 60.48824691772461 seconds - 2.0499850188857898 fps

Smoothing with a forward pass.
/opt/homebrew/anaconda3/envs/NVDS/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_5ae0635zuj/croot/pytorch-select_1700511177724/work/aten/src/ATen/native/TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[0][./demo_videos/000423/left/frame_000000.png][./input/smoothed_5/0/gray/0.png]
  Using ref_seq.shape torch.Size([1, 4, 4, 384, 896])
/opt/homebrew/anaconda3/envs/NVDS/lib/python3.8/site-packages/torch/nn/functional.py:4756: UserWarning: The operator 'aten::im2col' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_5ae0635zuj/croot/pytorch-select_1700511177724/work/aten/src/ATen/mps/MPSFallback.mm:13.)
  return torch._C._nn.im2col(input, _pair(kernel_size), _pair(dilation), _pair(padding), _pair(stride))
  min(outputs): 0.02816329151391983, mean(outputs): 0.111056387424469, max(outputs): 0.21592478454113007
[1][./demo_videos/000423/left/frame_000001.png][./input/smoothed_5/0/gray/1.png]
  Using ref_seq.shape torch.Size([1, 4, 4, 384, 896])
  min(outputs): 0.019930804148316383, mean(outputs): 0.10676073282957077, max(outputs): 0.20711131393909454
...
[123][./demo_videos/000423/left/frame_000123.png][./input/smoothed_5/0/gray/123.png]
  Using ref_seq.shape torch.Size([1, 4, 4, 384, 896])
  min(outputs): 0.021556951105594635, mean(outputs): 0.10725495964288712, max(outputs): 0.22037763893604279
Smoothing with a forward pass took 253.15499305725098 seconds - 0.4898185040812425 fps

Done.


luuude commented Jan 26, 2024

I ran your code; I had to alter some small things to get it to run on CUDA. It looks OK to me. But is it a good idea to store the output as a 16-bit PNG? Would it not be better to use 32-bit float in an EXR file? I uploaded the results here: https://ludvig-betong.filemail.com/d/agdtxtcjrtwszao

jparismorgan (Author) commented

Thanks for running it! I don't know about 16 vs. 32 bit; I was just copying code that already exists in the repo, but it could be nice to change it to 32-bit. Would you mind sharing your code with the modifications? Maybe something you had to fix for CUDA will also fix my issue. Thanks!


luuude commented Jan 26, 2024

If I remember correctly, I commented out:

#elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
#    device = torch.device("mps:0")

And maybe added ("cuda:0") somewhere.

Regarding the 16/32-bit float-to-int issue, I think you lose some gradation when converting back and forth, so I imagine it would be best to just keep it in float all the way. I think you can store floats in PNGs, but you may have to modify things slightly; OpenEXR image files handle floats without any modifications. The easiest way to use EXR is with OpenImageIO.
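To illustrate the precision point, here is a small sketch (pure NumPy; the depth values are made up) of what the script's uint16 round-trip costs compared to keeping the data in float:

```python
import numpy as np

# Hypothetical float32 depth values standing in for a network output.
depth = np.linspace(0.0, 37.9, 7, dtype=np.float32)

# The script's uint16 path: normalize to [0, 65535] and truncate.
lo, hi = depth.min(), depth.max()
quantized = (65535.0 * (depth - lo) / (hi - lo)).astype(np.uint16)

# Undo the normalization and measure the round-trip error.
recovered = quantized.astype(np.float32) * (hi - lo) / 65535.0 + lo
step = (hi - lo) / 65535.0
max_err = float(np.abs(recovered - depth).max())

# Truncation costs at most one quantization step (~6e-4 in depth units here);
# a float EXR/TIFF would round-trip bit-exactly.
assert max_err <= step
```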

jparismorgan (Author) commented

Thanks @luuude. I did find the potential cause of this bug: if I use `device = torch.device("cpu")` instead of `device = torch.device("mps:0")`, then the NVDS model runs fine (and the script as a whole works), though it is very slow. So my current guesses at the error:

  1. Somewhere in this codebase there is a bug that makes MPS acceleration not work
  2. There is a bug in PyTorch and we are hitting it in this code path

Not quite sure which it is, but if anyone has suggestions for debugging, I would love to hear them!
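For what it's worth, one way to narrow this kind of thing down (a sketch, not specific to NVDS; the tiny `block` below is a made-up stand-in for one of its layers) is to run the same module on CPU and on the accelerator and compare the outputs:

```python
import torch

def compare_devices(module, example, device, atol=1e-4):
    # Run the module on CPU and on `device`; a large difference points at
    # a backend bug rather than at the calling code.
    module = module.eval()
    with torch.no_grad():
        ref = module.cpu()(example.cpu())
        out = module.to(device)(example.to(device)).cpu()
    diff = (ref - out).abs().max().item()
    return diff, diff <= atol

# Fall back to CPU on machines without MPS (then the check is trivial).
mps = getattr(torch.backends, "mps", None)
device = "mps" if mps is not None and mps.is_available() else "cpu"

block = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())
diff, ok = compare_devices(block, torch.randn(4, 8), device)
print(f"max |cpu - {device}| = {diff:.2e}, within tolerance: {ok}")
```

Bisecting with this over the model's submodules should point at the first layer whose MPS output diverges.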

bigipalabom commented

> Thanks @luuude. I did find the potential cause of this bug: if I use `device = torch.device("cpu")` instead of `device = torch.device("mps:0")`, then the NVDS model runs fine (and the script as a whole works), though it is very slow. So my current guesses at the error:
>
> 1. Somewhere in this codebase there is a bug that makes MPS acceleration not work
> 2. There is a bug in PyTorch and we are hitting it in this code path
>
> Not quite sure which it is, but if anyone has suggestions for debugging, I would love to hear them!

Hi, I located the issue:

huggingface/transformers#22468

@amyeroberts, just FYI: thanks to the clue from the PyTorch dev team, the problem can be solved by using a `contiguous()` tensor here:

https://github.com/huggingface/transformers/blob/1670be4bdec19d5a8893f943bf78a8d9b3dc8911/src/transformers/models/glpn/modeling_glpn.py#L283

something like this: `hidden_states = self.intermediate_act_fn(hidden_states.contiguous())`

Not sure it's worth a PR, since it's not a solution to the root problem, but if so, please let me know and I'll create one.
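The failure mode can be sketched in isolation: apply GELU to a transposed (non-contiguous) view and compare it against the `.contiguous()` workaround. On a correct backend the two results match; on the affected MPS builds the view path silently produced wrong values:

```python
import torch

x = torch.randn(2, 3, 4)
nc = x.transpose(1, 2)  # a view: same storage, but not contiguous
assert not nc.is_contiguous()

act = torch.nn.GELU()
out_view = act(nc)               # the path that hit the backend bug
out_copy = act(nc.contiguous())  # the workaround from this thread

# On a correct backend both paths agree.
assert torch.allclose(out_view, out_copy)
```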


jparismorgan commented Jan 27, 2024

Hi @bigipalabom, amazing, that is a great find, thank you! And beyond being a great find, it looks like it worked 😁 In `backbone.py` I updated the activation call to `x.contiguous()` and now the NVDS smoothing model works:

    def forward(self, x, H, W):
        x = self.fc1(x)
        x = self.dwconv(x, H, W)
        # NOTE(paris): self.act is nn.GELU
        # Before it was x = self.act(x)
        x = self.act(x.contiguous())
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x

I don't know enough PyTorch to know whether there are potential issues with this change. But cc @RaymondWang987: this seems like a nice update to make if it's safe? It would unblock other Mac users who want to use MPS acceleration.


velaia commented Jan 27, 2024

Great work @jparismorgan @bigipalabom! I've tried to understand the issue a little more deeply: the `.contiguous()` call seems to be about the memory layout of the tensors, according to this post. `self.act` is probably `nn.GELU` in this case, which falls back to the functional implementation of GELU. I looked for the MPS implementation of GELU and found `Activation.mm`, copyrighted by Apple, so I guess I'm not totally on the wrong track.

In the corresponding commits I checked for activity mentioning GELU and found a couple of people who have worked on it in the past. Maybe one of you can help, @malfet @qqaatw @DenisVieriu97? My best guess is that the MPS GELU implementation assumes the tensor memory is contiguous, which is usually not the case here. Unfortunately my Objective-C is non-existent 😉

CC pytorch/pytorch#98212
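To make the memory-layout point concrete, here is a small sketch of what `.contiguous()` actually changes:

```python
import torch

# A transpose is a view: the storage is shared and only the strides are
# permuted, so the elements are no longer laid out row-major in memory.
x = torch.arange(6).reshape(2, 3)
t = x.t()
assert x.is_contiguous() and not t.is_contiguous()
assert t.data_ptr() == x.data_ptr()  # same underlying buffer
assert t.stride() == (1, 3)          # x.stride() is (3, 1)

# .contiguous() materializes a fresh row-major copy.
c = t.contiguous()
assert c.is_contiguous() and c.data_ptr() != x.data_ptr()
```

A kernel that walks the buffer assuming row-major order would read the transposed view's elements in the wrong positions, which is consistent with the symptoms above.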

bigipalabom commented

> Hi @bigipalabom, amazing, that is a great find, thank you! And beyond being a great find, it looks like it worked 😁 In `backbone.py` I updated the activation call to `x.contiguous()` and now the NVDS smoothing model works:
>
>     def forward(self, x, H, W):
>         x = self.fc1(x)
>         x = self.dwconv(x, H, W)
>         # NOTE(paris): self.act is nn.GELU
>         # Before it was x = self.act(x)
>         x = self.act(x.contiguous())
>         x = self.drop(x)
>         x = self.fc2(x)
>         x = self.drop(x)
>         return x
>
> I don't know enough PyTorch to know whether there are potential issues with this change. But cc @RaymondWang987: this seems like a nice update to make if it's safe? It would unblock other Mac users who want to use MPS acceleration.

Haha, glad it helped. I remember Apple having similar problems ever since ancient Macintosh times. As Apple says, "Think different".
