Update train.py #109

Open · wants to merge 1 commit into main
Conversation

Li-Guanda

A command like "python train.py task=Ant headless=True sim_device=cpu rl_device=cpu" does not work correctly. The reason is that "rlg_config_dict" does not include the "rl_device" setting.

In "a2c_common.py" of "rl_games" there is the line "self.ppo_device = config.get('device', 'cuda:0')", so the RL algorithm always falls back to cuda:0.
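Roughly, the change amounts to copying rl_device into the dict before it is handed to rl_games. A minimal sketch (variable and helper names follow what train.py already uses; the actual commit may differ):

# Sketch of the kind of change: make sure the device reaches rl_games
# instead of its hard-coded 'cuda:0' default.
rlg_config_dict = omegaconf_to_dict(cfg.train)                        # existing line in train.py
rlg_config_dict['params']['config']['device'] = cfg.rl_device        # read by a2c_common.py
rlg_config_dict['params']['config']['device_name'] = cfg.rl_device   # read by player.py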
@tylerlum

tylerlum commented Apr 11, 2023

I encountered the same issue! This fix works, but I think a cleaner solution would be to avoid changing train.py (it feels more like a hack) and instead modify all the *PPO.yaml files (e.g. AntPPO.yaml).

We should add the following under params.config:

params:
....
  config:
....
    device: ${resolve_default:cuda:0,${....rl_device}}  # Used in rl_games/common/a2c_common.py
    device_name: ${resolve_default:cuda:0,${....rl_device}}  # Used in rl_games/common/player.py

This is similar to other config values like

    name: ${resolve_default:Ant,${....experiment}}
    multi_gpu: ${....multi_gpu}
    num_actors: ${....task.env.numEnvs}
    max_epochs: ${resolve_default:500,${....max_iterations}}

which pull values from the top-level config (which has rl_device) while still providing a default.
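For illustration, this is how the resolve_default resolver behaves. It is registered in isaacgymenvs/train.py; the standalone sketch below re-registers an equivalent resolver (the exact registration in the repo may differ in minor details):

from omegaconf import OmegaConf

# Equivalent of the project's resolve_default resolver: use the override if it
# is set, otherwise fall back to the default given in the yaml.
OmegaConf.register_new_resolver(
    'resolve_default', lambda default, arg: default if arg == '' else arg
)

cfg = OmegaConf.create({
    'max_iterations': '',                  # empty string means "not overridden"
    'train': {'params': {'config': {
        'max_epochs': '${resolve_default:500,${....max_iterations}}',
    }}},
})

print(cfg.train.params.config.max_epochs)  # -> 500 (falls back to the default)
cfg.max_iterations = 1000
print(cfg.train.params.config.max_epochs)  # -> 1000 (the override wins)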

@utomm

utomm commented Apr 25, 2023

Hi, thanks for the fix and discussion. Your solution works well with a device like 'cuda:2' (#129).

However, when using rl_device=cpu the process still crashes. In the training case, python train.py task=Ant headless=True sim_device=cpu rl_device=cpu crashes before the first update of the policy:

Error executing job with overrides: ['task=Ant', 'headless=True', 'sim_device=cpu', 'rl_device=cpu']
Traceback (most recent call last):
  File "train.py", line 161, in launch_rlg_hydra
    'sigma' : None
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 120, in run
    self.run_train(args)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 101, in run_train
    agent.train()
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1173, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1059, in train_epoch
    a_loss, c_loss, entropy, kl, last_lr, lr_mul, cmu, csigma, b_loss = self.train_actor_critic(self.dataset[i])
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/a2c_continuous.py", line 159, in train_actor_critic
    self.calc_gradients(input_dict)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/a2c_continuous.py", line 135, in calc_gradients
    self.scaler.scale(loss).backward()
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 162, in scale
    assert outputs.is_cuda or outputs.device.type == 'xla'
AssertionError

In the testing case, python train.py task=Ant headless=True sim_device=cpu rl_device=cpu test=True, the tensors-on-different-devices error appears again. The output is:

Error executing job with overrides: ['task=Ant', 'headless=True', 'sim_device=cpu', 'rl_device=cpu', 'test=True']
Traceback (most recent call last):
  File "train.py", line 161, in launch_rlg_hydra
    'sigma' : None
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 123, in run
    self.run_play(args)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 108, in run_play
    player.run()
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/player.py", line 208, in run
    action = self.get_action(obses, is_determenistic)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/players.py", line 55, in get_action
    res_dict = self.model(input_dict)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/models.py", line 246, in forward
    input_dict['obs'] = self.norm_obs(input_dict['obs'])
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/models.py", line 49, in norm_obs
    return self.running_mean_std(observation) if self.normalize_input else observation
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hu/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/running_mean_std.py", line 79, in forward
    y = (input - current_mean.float()) / torch.sqrt(current_var.float() + self.epsilon)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@tylerlum

NOTE: I edited the solution above to include device_name. This fixes the problem for python train.py task=Ant headless=True sim_device=cpu rl_device=cpu test=True.

This doesn't fix the other issue, though. I believe it comes from

        self.scaler = torch.cuda.amp.GradScaler(enabled=self.mixed_precision)

in rl_games/common/a2c_common.py, which would need more work to fix (a possible workaround is sketched below).
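For anyone hitting the CPU training crash: a possible workaround (not part of this PR) is to keep the GradScaler disabled when the device is not CUDA, which should also be what setting mixed_precision: False in the *PPO.yaml effectively does. A minimal standalone sketch of the behaviour:

import torch

# The AssertionError above comes from GradScaler: when enabled, scale() asserts
# the loss lives on CUDA. With enabled=False it is a pass-through, so CPU
# training can proceed. Standalone sketch, not the rl_games code itself:
device = torch.device('cpu')
mixed_precision = True                       # e.g. params.config.mixed_precision

scaler = torch.cuda.amp.GradScaler(enabled=mixed_precision and device.type == 'cuda')

loss = torch.nn.functional.mse_loss(
    torch.randn(4, requires_grad=True), torch.zeros(4)
)
scaler.scale(loss).backward()                # no assertion: scaler is disabled on CPU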
