
[Bug] torch.OutOfMemoryError #1773

Open
2 tasks done
QingChengLineOne opened this issue Dec 23, 2024 · 3 comments
@QingChengLineOne
Prerequisites

Problem type

I am evaluating with an officially supported task / model / dataset.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0',
'GPU 0,1': 'NVIDIA GeForce RTX 4090',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'OpenCV': '4.10.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX2\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.8+',
'sys.platform': 'linux',
'transformers': '4.47.1'}

Reproduces the problem - code/configuration sample

CUDA_VISIBLE_DEVICES=0,1 python run.py \
    --datasets tydiqa_gen \
    --hf-type chat \
    --hf-path /public/zzy/model/Phi/Phi-3-mini-128k-instruct \
    --batch-size 1 \
    --debug

Reproduces the problem - command or script

The Python command above.

Reproduces the problem - error message

/public/zzy/fintuning/opencompass/opencompass/__init__.py:19: UserWarning: Starting from v0.4.0, all AMOTIC configuration files currently located in ./configs/datasets, ./configs/models, and ./configs/summarizers will be migrated to the opencompass/configs/ package. Please update your configuration file paths accordingly.
_warn_about_config_migration()
12/22 15:16:50 - OpenCompass - WARNING - Found ambiguous patterns, using the first matched config.
+----------------------+---------------------------------------------------------------------------------------+
| Ambiguous patterns | Matched files |
|----------------------+---------------------------------------------------------------------------------------|
| tydiqa_gen | configs/datasets/tydiqa/tydiqa_gen.py |
| | /public/zzy/fintuning/opencompass/opencompass/configs/./datasets/tydiqa/tydiqa_gen.py |
+----------------------+---------------------------------------------------------------------------------------+
12/22 15:16:50 - OpenCompass - INFO - Loading tydiqa_gen: configs/datasets/tydiqa/tydiqa_gen.py
12/22 15:16:50 - OpenCompass - WARNING - Found ambiguous patterns, using the first matched config.
+----------------------+--------------------------------------------------------------------------------+
| Ambiguous patterns | Matched files |
|----------------------+--------------------------------------------------------------------------------|
| example | configs/summarizers/example.py |
| | /public/zzy/fintuning/opencompass/opencompass/configs/./summarizers/example.py |
+----------------------+--------------------------------------------------------------------------------+
12/22 15:16:50 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
12/22 15:16:50 - OpenCompass - INFO - Current exp folder: outputs/default/20241222_151650
12/22 15:16:51 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
12/22 15:16:51 - OpenCompass - INFO - Partitioned into 1 tasks.
12/22 15:16:52 - OpenCompass - WARNING - Only use 1 GPUs for total 2 available GPUs in debug mode.
12/22 15:16:52 - OpenCompass - INFO - Task [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_arabic,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_bengali,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_english,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_finnish,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_indonesian,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_japanese,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_korean,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_russian,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_swahili,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_telugu,Phi-3-mini-128k-instruct_hf/tydiqa-goldp_thai]
flash-attention package not found, consider installing for better performance: No module named 'flash_attn'.
Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████| 2/2 [01:20<00:00, 40.22s/it]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
12/22 15:18:13 - OpenCompass - INFO - using stop words: ['<|assistant|>', '<|endoftext|>', '<|end|>']
12/22 15:18:14 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_arabic]
[2024-12-22 15:18:14,186] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 15:18:14,187] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/921 [00:00<?, ?it/s]The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
get_max_cache() is deprecated for all Cache classes. Use get_max_cache_shape() instead. Calling get_max_cache() will raise error from v4.48
You are not running the flash-attention implementation, expect numerical differences.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 921/921 [1:47:55<00:00, 7.03s/it]
12/22 17:06:09 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_bengali]
[2024-12-22 17:06:09,609] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 17:06:09,609] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 113/113 [17:35<00:00, 9.34s/it]
12/22 17:23:45 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_english]
[2024-12-22 17:23:45,217] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 17:23:45,217] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 440/440 [12:06<00:00, 1.65s/it]
12/22 17:35:51 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_finnish]
[2024-12-22 17:35:51,916] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 17:35:51,916] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 782/782 [1:16:55<00:00, 5.90s/it]
12/22 18:52:47 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_indonesian]
[2024-12-22 18:52:47,531] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 18:52:47,531] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 565/565 [28:03<00:00, 2.98s/it]
12/22 19:20:50 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_japanese]
[2024-12-22 19:20:50,824] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 19:20:50,824] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 455/455 [25:30<00:00, 3.36s/it]
12/22 19:46:21 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_korean]
[2024-12-22 19:46:21,344] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 19:46:21,344] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 276/276 [27:02<00:00, 5.88s/it]
12/22 20:13:23 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_russian]
[2024-12-22 20:13:23,953] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 20:13:23,953] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 812/812 [23:46<00:00, 1.76s/it]
12/22 20:37:10 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_swahili]
[2024-12-22 20:37:10,967] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 20:37:10,967] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 499/499 [36:18<00:00, 4.37s/it]
12/22 21:13:29 - OpenCompass - INFO - Start inferencing [Phi-3-mini-128k-instruct_hf/tydiqa-goldp_telugu]
[2024-12-22 21:13:30,003] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-12-22 21:13:30,003] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
16%|███████████████▌ | 110/669 [25:30<2:09:37, 13.91s/it]
Traceback (most recent call last):
File "/public/zzy/fintuning/opencompass/run.py", line 4, in <module>
main()
File "/public/zzy/fintuning/opencompass/opencompass/cli/main.py", line 308, in main
runner(tasks)
File "/public/zzy/fintuning/opencompass/opencompass/runners/base.py", line 38, in __call__
status = self.launch(tasks)
File "/public/zzy/fintuning/opencompass/opencompass/runners/local.py", line 128, in launch
task.run(cur_model=getattr(self, 'cur_model',
File "/public/zzy/fintuning/opencompass/opencompass/tasks/openicl_infer.py", line 89, in run
self._inference()
File "/public/zzy/fintuning/opencompass/opencompass/tasks/openicl_infer.py", line 134, in _inference
inferencer.inference(retriever,
File "/public/zzy/fintuning/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 153, in inference
results = self.model.generate_from_template(
File "/public/zzy/fintuning/opencompass/opencompass/models/base.py", line 201, in generate_from_template
return self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "/public/zzy/fintuning/opencompass/opencompass/models/huggingface_above_v4_33.py", line 479, in generate
outputs = self.model.generate(**tokens, **generation_kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2252, in generate
result = self._sample(
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 3251, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Phi-3-mini-128k-instruct/modeling_phi3.py", line 1286, in forward
outputs = self.model(
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Phi-3-mini-128k-instruct/modeling_phi3.py", line 1164, in forward
layer_outputs = decoder_layer(
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Phi-3-mini-128k-instruct/modeling_phi3.py", line 885, in forward
attn_outputs, self_attn_weights, present_key_value = self.self_attn(
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Phi-3-mini-128k-instruct/modeling_phi3.py", line 405, in forward
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(value_states.dtype)
File "/root/anaconda/envs/opencompass/lib/python3.10/site-packages/torch/nn/functional.py", line 2142, in softmax
ret = input.softmax(dim, dtype=dtype)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.83 GiB. GPU 0 has a total capacity of 23.65 GiB of which 9.06 GiB is free. Process 43896 has 954.00 MiB memory in use. Process 44069 has 13.65 GiB memory in use. Of the allocated memory 12.72 GiB is allocated by PyTorch, and 485.35 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
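The size of the failed allocation is consistent with eager attention materializing the full score tensor of shape [batch, num_heads, q_len, k_len] in float32 before the softmax at modeling_phi3.py:405. A rough back-of-the-envelope sketch, assuming Phi-3-mini's 32 attention heads (a config detail not shown in the log):

```python
def eager_attn_softmax_bytes(num_heads: int, q_len: int, k_len: int,
                             batch: int = 1, bytes_per_el: int = 4) -> int:
    """Bytes needed for the [batch, num_heads, q_len, k_len] float32
    attention-score tensor that eager attention materializes before softmax."""
    return batch * num_heads * q_len * k_len * bytes_per_el

GiB = 1024 ** 3
# During prefill of a single ~11.9k-token Telugu context, q_len == k_len,
# so the score tensor alone needs roughly 16.9 GiB:
print(eager_attn_softmax_bytes(num_heads=32, q_len=11_900, k_len=11_900) / GiB)
```

This is why batch size 1 does not bound the peak: one unusually long TyDiQA context is enough to request ~17 GiB for a single sample, which exceeds what is free on a 24 GB RTX 4090.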

Other information

No response

@tonysy
Collaborator

tonysy commented Dec 24, 2024

You can decrease the batch size to avoid running out of memory.
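If reducing the batch size is not enough, two other knobs may help: the allocator hint printed in the OOM message itself, and capping the input length so eager attention never builds a huge score matrix. A sketch, assuming this OpenCompass version accepts a --max-seq-len flag (verify with `python run.py --help`):

```shell
# Allocator hint taken from the OOM message (may reduce fragmentation)
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Cap the prompt length so no single sample requests ~17 GiB of scores
CUDA_VISIBLE_DEVICES=0,1 python run.py \
    --datasets tydiqa_gen \
    --hf-type chat \
    --hf-path /public/zzy/model/Phi/Phi-3-mini-128k-instruct \
    --batch-size 1 \
    --max-seq-len 4096 \
    --debug
```

Truncating long contexts trades some accuracy on the longest TyDiQA passages for a bounded memory footprint.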

@QingChengLineOne
Author

You can decrease the batch size to avoid out-of-memory

But my batch size is already 1.

@tonysy
Collaborator

tonysy commented Dec 25, 2024

Got it. If the OOM still occurs, tensor parallelism may be required.
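A possible sketch of that route, assuming LMDeploy is installed (the environment above reports it is not) and that this OpenCompass version supports the -a/--accelerator switch (verify with `python run.py --help`):

```shell
# LMDeploy is reported as "not installed" in the environment dump above
pip install lmdeploy

# Re-run with the LMDeploy backend instead of the HF eager-attention path
CUDA_VISIBLE_DEVICES=0,1 python run.py \
    --datasets tydiqa_gen \
    --hf-type chat \
    --hf-path /public/zzy/model/Phi/Phi-3-mini-128k-instruct \
    --batch-size 1 \
    -a lmdeploy
```

Sharding across both 4090s (tp=2) is then set in the LMDeploy engine config rather than on this command line; the exact knob depends on the OpenCompass and LMDeploy versions in use.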
