
Running on two 3090s? #9

Open · vijetadeshpande opened this issue Sep 4, 2024 · 1 comment

@vijetadeshpande

Hi authors, I do not have access to solid hardware. What I have for now is two 3090s (24 GB each). I am planning to run/debug the code with this setup and then move the experiments to A100s. On these two 3090s, I have CUDA 12.4. Torch version 2.0.0 (in the requirements) does not support CUDA 12.x. I found that torch 2.1.1 supports CUDA 12.1. This is the only change I have made to the requirements; otherwise the setup is as suggested.
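As a quick sanity check after that version swap, something like the following (illustrative, not from the repo) confirms which CUDA build torch actually sees:

```python
# Illustrative sanity check: verify the installed torch build and the CUDA
# version it was compiled against before launching distributed training.
import torch

print(torch.__version__)    # e.g. 2.1.1+cu121
print(torch.version.cuda)   # CUDA version torch was compiled with, e.g. 12.1
print(torch.cuda.is_available(), torch.cuda.device_count())  # expect True, 2
```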

When I run

torchrun --nproc_per_node=2 --master_port=6000 train.py ...

the code gets stuck at the following step:

LlamaTokenizerFast(name_or_path='meta-llama/Llama-2-7b-hf', vocab_size=32000, model_max_length=32, is_fast=True, padding_side='right', truncation_side='left', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
        0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
/mnt/shared_home/vdeshpande/miniconda3/envs/env_spag/lib/python3.9/site-packages/accelerate/accelerator.py:457: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
  warnings.warn(
Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121 as PyTorch extensions root...
Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3634016513824463 seconds
Time to load cpu_adam op: 3.0814285278320312 seconds
Parameter Offload: Total persistent parameters: 532480 in 130 params
[2024-09-04 15:34:07,215] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 50337 closing signal SIGTERM
[2024-09-04 15:34:22,297] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 50336) of binary: /mnt/shared_home/vdeshpande/miniconda3/envs/env_spag/bin/python

Any insights on resolving this issue?

@Linear95 (Owner) commented Sep 9, 2024

It looks like your CUDA, accelerate, and PyTorch versions are incompatible.

Also, I'm not sure whether a 3090 can run the training... A possible solution might be using LoRA to train your model.
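For anyone hitting the same wall: a minimal LoRA sketch with Hugging Face peft might look like the following. The rank, alpha, and target modules are illustrative assumptions, not settings from this repo. (Note also that exitcode -9 is a SIGKILL, which with DeepSpeed CPU offload, as the cpu_adam lines in the log suggest, often points to host RAM exhaustion rather than GPU memory.)

```python
# Minimal sketch (assumption, not this repo's code): wrap Llama-2-7B with
# LoRA adapters via Hugging Face peft so that only a small fraction of the
# parameters is trained, shrinking optimizer state enough for 24 GB GPUs.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,  # halves weight memory vs. fp32
)

lora_config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],  # Llama attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```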
