===================================BUG REPORT===================================
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib')}
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//www.kaggle.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('gcr.io/kaggle-gpu-images/python@sha256'), PosixPath('141219e230dab548ccc19aa4e62bcf805ed9de0b4d5112227e28f5f1a25991f8')}
The following directories listed in your path were found to be non-existent: {PosixPath('tf2-gpu/2-16+cu123')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib')}
The following directories listed in your path were found to be non-existent: {PosixPath('/kaggle/lib/kagglegym')}
The following directories listed in your path were found to be non-existent: {PosixPath('//dp.kaggle.net'), PosixPath('https')}
DEBUG: Possible options found for libcudart.so: {PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=123, Highest Compute Capability: 7.5.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
CUDA driver not installed
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
warn(msg)
================================================================================
3. CUDA not installed
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/nvidia/lib')}
4. You have multiple conflicting CUDA libraries
Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//www.kaggle.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('141219e230dab548ccc19aa4e62bcf805ed9de0b4d5112227e28f5f1a25991f8'), PosixPath('gcr.io/kaggle-gpu-images/python@sha256')}
CUDA SETUP: Something unexpected happened. Please compile from source:
The following directories listed in your path were found to be non-existent: {PosixPath('tf2-gpu/2-16+cu123')}
git clone https://github.com/TimDettmers/bitsandbytes.git
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/nvidia/lib')}
cd bitsandbytes
The following directories listed in your path were found to be non-existent: {PosixPath('/kaggle/lib/kagglegym')}
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//dp.kaggle.net')}
CUDA_VERSION=123
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so')}
python setup.py install
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
return importlib.import_module("." + module_name, self.name)
File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
return importlib.import_module("." + module_name, self.name)
File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "", line 241, in _call_with_frames_removed
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 190, in
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 190, in
from peft import PeftModel
from peft import PeftModel
File "/opt/conda/lib/python3.10/site-packages/peft/__init__.py", line 22, in
File "/opt/conda/lib/python3.10/site-packages/peft/__init__.py", line 22, in
from .auto import (
from .auto import (
File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 30, in
File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 30, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 20, in
File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 20, in
from .peft_model import (
from .peft_model import (
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 39, in
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 39, in
from .tuners import (
from .tuners import (
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in
from .lora import LoraConfig, LoraModel
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 42, in
from .lora import LoraConfig, LoraModel
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 42, in
import bitsandbytes as bnb
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in
import bitsandbytes as bnb
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in
from . import cuda_setup, utils, research
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in
from . import cuda_setup, utils, research
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in
from . import nn
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in
from . import nn
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in
from .modules import LinearFP8Mixed, LinearFP8Global
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in
from .modules import LinearFP8Mixed, LinearFP8Global
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in
from bitsandbytes.optim import GlobalOptimManager
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in
from bitsandbytes.optim import GlobalOptimManager
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
raise RuntimeError('''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py", line 25, in <module>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py", line 25, in
from GOT.train.trainer_vit_fixlr import GOTTrainer
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/trainer_vit_fixlr.py", line 5, in
from GOT.train.trainer_vit_fixlr import GOTTrainer
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/trainer_vit_fixlr.py", line 5, in
from transformers import Trainer
File "", line 1075, in _handle_fromlist
from transformers import Trainer
File "", line 1075, in _handle_fromlist
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
module = self._get_module(self._class_to_module[name])
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
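Reading the setup output above: PyTorch reports `CUDA_VERSION=123` (CUDA 12.3), bitsandbytes then looks for `libbitsandbytes_cuda123.so`, does not find it, and falls back to `libbitsandbytes_cpu.so`, which is why `import bitsandbytes` (pulled in by `peft` when `transformers.Trainer` is imported) raises the RuntimeError. The Python sketch below is one way to check this without importing bitsandbytes. It assumes the prebuilt `libbitsandbytes_cuda*.so` files sit directly in the installed package directory and follow the `libbitsandbytes_cuda<major><minor>.so` pattern seen in the log; the helper names are hypothetical.

```python
import glob
import importlib.util
import os
import re


def expected_binary_name(cuda_version):
    """Map a CUDA version such as "12.3" to the file name the log asks for.

    Assumption: the naming seen in the log, where CUDA 12.3 maps to
    libbitsandbytes_cuda123.so.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"


def bundled_cuda_versions(paths):
    """Return the CUDA versions (e.g. "122") encoded in libbitsandbytes_cuda*.so names.

    Also accepts the *_nocublaslt.so variants that some releases ship.
    """
    versions = set()
    for path in paths:
        match = re.fullmatch(
            r"libbitsandbytes_cuda(\d+)(?:_nocublaslt)?\.so", os.path.basename(path)
        )
        if match:
            versions.add(match.group(1))
    return sorted(versions, key=int)


def list_bundled_binaries():
    """List the prebuilt CUDA binaries inside the installed bitsandbytes package.

    find_spec locates a top-level package without executing it, so this does not
    trigger the RuntimeError that `import bitsandbytes` raises.
    Assumption: the .so files sit directly in the package directory.
    """
    spec = importlib.util.find_spec("bitsandbytes")
    if spec is None or not spec.submodule_search_locations:
        return []
    package_dir = list(spec.submodule_search_locations)[0]
    return sorted(glob.glob(os.path.join(package_dir, "libbitsandbytes_cuda*.so")))


def main():
    try:
        import torch  # the same PyTorch install the training script imports
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    cuda_version = torch.version.cuda  # None for CPU-only PyTorch builds
    if cuda_version is None:
        print("This PyTorch build has no CUDA support")
        return
    wanted = expected_binary_name(cuda_version)
    found = list_bundled_binaries()
    print("PyTorch CUDA version:", cuda_version)
    print("bitsandbytes looks for:", wanted)
    print("Bundled CUDA builds:", bundled_cuda_versions(found))
    print("Exact match present:", any(os.path.basename(p) == wanted for p in found))


if __name__ == "__main__":
    main()
```

Run it with the same interpreter that `deepspeed` uses (`/opt/conda/bin/python3.10` in the log). If no bundled build matches 12.3, the log itself lists the two ways forward: overriding `BNB_CUDA_VERSION` or compiling from source.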
I get the error below when I launch training. Can you help me?
deepspeed /GOT-OCR-2.0-master/GOT/train/train_GOT.py \
  --deepspeed /GOT-OCR-2.0-master/zero_config/zero2.json --model_name_or_path /GOT_weights/ \
  --use_im_start_end True \
  --bf16 True \
  --gradient_accumulation_steps 2 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --weight_decay 0. \
  --warmup_ratio 0.001 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --tf32 True \
  --model_max_length 8192 \
  --gradient_checkpointing True \
  --dataloader_num_workers 8 \
  --report_to none \
  --per_device_train_batch_size 2 \
  --num_train_epochs 1 \
  --learning_rate 2e-5 \
  --datasets pdf-ocr+scence \
  --output_dir /your/output/path
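The `Found duplicate ... libcudart.so` warning in the log suggests overriding the detected version with the `BNB_CUDA_VERSION` environment variable (for example `BNB_CUDA_VERSION=122 python ...`). A minimal sketch of picking a value, assuming (this is not confirmed by the log) that a bitsandbytes binary built for an older CUDA 12 minor release, such as 12.2, can run against the CUDA 12.3 runtime: choose the newest bundled build with the same major version that is not newer than PyTorch's. `choose_bnb_cuda_version`, `with_override` and the bundled-version list are hypothetical.

```python
import os


def choose_bnb_cuda_version(torch_cuda, bundled):
    """Pick a BNB_CUDA_VERSION value for a PyTorch CUDA version such as "12.3".

    Heuristic (an assumption, not documented bitsandbytes behaviour): use the
    newest bundled build with the same CUDA major version that is not newer
    than PyTorch's. `bundled` holds version strings such as "118" or "122".
    Returns None when no bundled build has the same major version.
    """
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    target = major * 10 + minor  # "12.3" -> 123, like CUDA_VERSION=123 in the log
    candidates = [
        int(v) for v in bundled if int(v) // 10 == major and int(v) <= target
    ]
    return str(max(candidates)) if candidates else None


def with_override(version, base_env=None):
    """Return a copy of the environment with BNB_CUDA_VERSION set.

    Suitable as the env= argument of subprocess.run when launching from Python.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BNB_CUDA_VERSION"] = version
    return env


if __name__ == "__main__":
    # Hypothetical bundled list; replace it with what your install actually ships.
    choice = choose_bnb_cuda_version("12.3", ["110", "118", "121", "122"])
    if choice is None:
        print("No same-major build is bundled; compiling from source may be needed")
    else:
        print(f"export BNB_CUDA_VERSION={choice}")
```

With the hypothetical list, the script prints `export BNB_CUDA_VERSION=122`. Running that export in the same shell before the `deepspeed` command should let the worker processes the launcher spawns inherit it (an assumption about the launcher). The log also notes that a manual override needs the matching CUDA libraries on `LD_LIBRARY_PATH`.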
Error:
[2024-10-15 08:01:18,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] please install triton==1.0.0 if you want to use sparse attention
/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  def forward(ctx, input, weight, bias=None):
/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  def backward(ctx, grad_output):
[2024-10-15 08:01:24,289] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-10-15 08:01:24,289] [INFO] [runner.py:568:main] cmd = /opt/conda/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None /kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py --deepspeed /kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/zero_config/zero2.json --model_name_or_path /kaggle/working/GOT_OCR2/GOT_weights/ --use_im_start_end True --bf16 True --gradient_accumulation_steps 2 --evaluation_strategy no --save_strategy steps --save_steps 200 --save_total_limit 1 --weight_decay 0. --warmup_ratio 0.001 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 8192 --gradient_checkpointing True --dataloader_num_workers 8 --report_to none --per_device_train_batch_size 2 --num_train_epochs 1 --learning_rate 2e-5 --datasets plain --output_dir /kaggle/working/GOT_OCR2/output
[2024-10-15 08:01:26,184] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] please install triton==1.0.0 if you want to use sparse attention
/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  def forward(ctx, input, weight, bias=None):
/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  def backward(ctx, grad_output):
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.20.3-1+cuda12.3
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.20.3-1
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NCCL_VERSION=2.20.3-1
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.20.3-1+cuda12.3
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-10-15 08:01:30,469] [INFO] [launch.py:139:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.20.3-1
[2024-10-15 08:01:30,469] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-10-15 08:01:30,469] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-10-15 08:01:30,469] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-10-15 08:01:30,469] [INFO] [launch.py:164:main] dist_world_size=2
[2024-10-15 08:01:30,469] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-10-15 08:01:30,470] [INFO] [launch.py:256:main] process 328 spawned with command: ['/opt/conda/bin/python3.10', '-u', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py', '--local_rank=0', '--deepspeed', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/zero_config/zero2.json', '--model_name_or_path', '/kaggle/working/GOT_OCR2/GOT_weights/', '--use_im_start_end', 'True', '--bf16', 'True', '--gradient_accumulation_steps', '2', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '200', '--save_total_limit', '1', '--weight_decay', '0.', '--warmup_ratio', '0.001', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '8192', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '8', '--report_to', 'none', '--per_device_train_batch_size', '2', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--datasets', 'plain', '--output_dir', '/kaggle/working/GOT_OCR2/output']
[2024-10-15 08:01:30,471] [INFO] [launch.py:256:main] process 329 spawned with command: ['/opt/conda/bin/python3.10', '-u', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py', '--local_rank=1', '--deepspeed', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/zero_config/zero2.json', '--model_name_or_path', '/kaggle/working/GOT_OCR2/GOT_weights/', '--use_im_start_end', 'True', '--bf16', 'True', '--gradient_accumulation_steps', '2', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '200', '--save_total_limit', '1', '--weight_decay', '0.', '--warmup_ratio', '0.001', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '8192', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '8', '--report_to', 'none', '--per_device_train_batch_size', '2', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--datasets', 'plain', '--output_dir', '/kaggle/working/GOT_OCR2/output']
False
===================================BUG REPORT===================================
False
===================================BUG REPORT===================================
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib')}
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//www.kaggle.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('gcr.io/kaggle-gpu-images/python@sha256'), PosixPath('141219e230dab548ccc19aa4e62bcf805ed9de0b4d5112227e28f5f1a25991f8')}
The following directories listed in your path were found to be non-existent: {PosixPath('tf2-gpu/2-16+cu123')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib')}
The following directories listed in your path were found to be non-existent: {PosixPath('/kaggle/lib/kagglegym')}
The following directories listed in your path were found to be non-existent: {PosixPath('//dp.kaggle.net'), PosixPath('https')}
DEBUG: Possible options found for libcudart.so: {PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=123, Highest Compute Capability: 7.5.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
python -m bitsandbytes
warn(msg)
================================================================================
3. CUDA not installed
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/nvidia/lib')}
4. You have multiple conflicting CUDA libraries
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//www.kaggle.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('141219e230dab548ccc19aa4e62bcf805ed9de0b4d5112227e28f5f1a25991f8'), PosixPath('gcr.io/kaggle-gpu-images/python@sha256')}
CUDA SETUP: Something unexpected happened. Please compile from source:
The following directories listed in your path were found to be non-existent: {PosixPath('tf2-gpu/2-16+cu123')}
git clone https://github.com/TimDettmers/bitsandbytes.git
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/cuda/lib'), PosixPath('/usr/local/nvidia/lib')}
cd bitsandbytes
The following directories listed in your path were found to be non-existent: {PosixPath('/kaggle/lib/kagglegym')}
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//dp.kaggle.net')}
CUDA_VERSION=123
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so')}
python setup.py install
CUDA SETUP: PyTorch settings found: CUDA_VERSION=123, Highest Compute Capability: 7.5.
CUDA SETUP: Setup Failed!
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
return importlib.import_module("." + module_name, self.name)
File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
return importlib.import_module("." + module_name, self.name)
File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "", line 241, in _call_with_frames_removed
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 190, in
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 190, in
from peft import PeftModel
from peft import PeftModel
File "/opt/conda/lib/python3.10/site-packages/peft/__init__.py", line 22, in
File "/opt/conda/lib/python3.10/site-packages/peft/__init__.py", line 22, in
from .auto import (
from .auto import (
File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 30, in
File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 30, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 20, in
File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 20, in
from .peft_model import (
from .peft_model import (
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 39, in
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 39, in
from .tuners import (
from .tuners import (
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in
from .lora import LoraConfig, LoraModel
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 42, in
from .lora import LoraConfig, LoraModel
File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 42, in
import bitsandbytes as bnb
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/init.py", line 6, in
import bitsandbytes as bnb
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/init.py", line 6, in
from . import cuda_setup, utils, research
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/init.py", line 1, in
from . import cuda_setup, utils, research
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/init.py", line 1, in
from . import nn
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/init.py", line 1, in
from . import nn
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/init.py", line 1, in
from .modules import LinearFP8Mixed, LinearFP8Global
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in
from .modules import LinearFP8Mixed, LinearFP8Global
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in
from bitsandbytes.optim import GlobalOptimManager
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/optim/init.py", line 6, in
from bitsandbytes.optim import GlobalOptimManager
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/optim/init.py", line 6, in
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
The above exception was the direct cause of the following exception:
RuntimeErrorTraceback (most recent call last):
:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py", line 25, in
from GOT.train.trainer_vit_fixlr import GOTTrainer
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/trainer_vit_fixlr.py", line 5, in
from GOT.train.trainer_vit_fixlr import GOTTrainer
File "/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/trainer_vit_fixlr.py", line 5, in
from transformers import Trainer
File "", line 1075, in _handle_fromlist
from transformers import Trainer
File "", line 1075, in _handle_fromlist
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in getattr
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in getattr
module = self._get_module(self._class_to_module[name])
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
module = self._get_module(self._class_to_module[name])
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
[2024-10-15 08:01:53,494] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 328
[2024-10-15 08:01:53,494] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 329
[2024-10-15 08:01:53,495] [ERROR] [launch.py:325:sigkill_handler] ['/opt/conda/bin/python3.10', '-u', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/GOT/train/train_GOT.py', '--local_rank=1', '--deepspeed', '/kaggle/working/GOT_OCR2/GOT-OCR-2.0-master/zero_config/zero2.json', '--model_name_or_path', '/kaggle/working/GOT_OCR2/GOT_weights/', '--use_im_start_end', 'True', '--bf16', 'True', '--gradient_accumulation_steps', '2', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '200', '--save_total_limit', '1', '--weight_decay', '0.', '--warmup_ratio', '0.001', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '8192', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '8', '--report_to', 'none', '--per_device_train_batch_size', '2', '--num_train_epochs', '1', '--learning_rate', '2e-5', '--datasets', 'plain', '--output_dir', '/kaggle/working/GOT_OCR2/output'] exits with return code = 1
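The CUDA setup warnings above report two things: which library-path entries do not exist, and which entries do not contain `libcudart.so`. The following is a minimal, hypothetical diagnostic sketch that checks the same two things for the current `LD_LIBRARY_PATH`. It is not bitsandbytes' own lookup logic. The helper names `split_search_path`, `find_missing_dirs` and `find_libcudart` are made up for illustration. `torch.version.cuda` is the real PyTorch attribute that the warning refers to.

```python
import os
from pathlib import Path


def split_search_path(value: str) -> list[Path]:
    """Hypothetical helper: split a search path such as LD_LIBRARY_PATH
    on os.pathsep (':' on Linux), dropping empty entries."""
    return [Path(p) for p in value.split(os.pathsep) if p]


def find_missing_dirs(paths: list[Path]) -> list[Path]:
    """Hypothetical helper: entries that are not existing directories."""
    return [p for p in paths if not p.is_dir()]


def find_libcudart(paths: list[Path]) -> dict[Path, list[str]]:
    """Hypothetical helper: map each existing directory to the
    libcudart.so* file names it contains (directories with none are omitted)."""
    found: dict[Path, list[str]] = {}
    for p in paths:
        if p.is_dir():
            hits = sorted(f.name for f in p.glob("libcudart.so*"))
            if hits:
                found[p] = hits
    return found


if __name__ == "__main__":
    search = split_search_path(os.environ.get("LD_LIBRARY_PATH", ""))
    print("non-existent:", [str(p) for p in find_missing_dirs(search)])
    for directory, libs in find_libcudart(search).items():
        print(directory, libs)
    try:
        import torch  # optional: the CUDA version PyTorch was built against
        print("torch.version.cuda:", torch.version.cuda)
    except ImportError:
        print("torch is not installed")
```

If the script lists `libcudart.so` files from more than one CUDA release, compare them with `torch.version.cuda` before pinning a version.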