Stochastic Muzero performance was not as expected. #309

Open
walkacross opened this issue Dec 20, 2024 · 4 comments
Labels
bug (Something isn't working), efficiency optimization (Efficiency optimization: time, memory and so on)

Comments

walkacross commented Dec 20, 2024

Hi @puyuan1996, sorry for the late response; training Stochastic MuZero on the 2048 game takes an excessively long time.
I'd like to discuss some experimental results and questions with you.

1 Stochastic Muzero performance was not as expected.

The model reached an episode reward mean of around 50,000 at 2 million environment steps, but then oscillated between 2 million and 14 million steps without significant improvement. This holds for both the collect stage and the evaluate stage.
Screenshot from 2024-12-20 11-48-21
Screenshot from 2024-12-20 11-49-40

2 Question about expected performance.

The performance of Stochastic MuZero reported in the original paper is as follows:
Screenshot from 2024-12-20 12-06-58

It seems the model reaches an episode reward mean of around 250k at 1 billion environment steps. Could you share your experimental results with me?

3 Question about the training time.

I used the original 2048 Stochastic MuZero config in LightZero:

env_id = 'game_2048'
action_space_size = 4
use_ture_chance_label_in_chance_encoder = True
collector_env_num = 8
n_episode = 8
evaluator_env_num = 3
num_simulations = 100
update_per_collect = 200
batch_size = 512
max_env_step = int(1e9)
reanalyze_ratio = 0.
num_of_possible_chance_tile = 2
chance_space_size = 16 * num_of_possible_chance_tile

With this config, the model took 5 days to reach 14 million environment steps. I'd like to ask:

3.1 What is the approximate training duration for your models?
3.2 How long would it take to train for 10 billion environment steps, as stated in the paper?
3.3 Are there any alternative approaches to further reduce the training time?
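
For question 3.2, a rough extrapolation can be sketched from the numbers above. It assumes throughput stays constant at the observed rate (14 million steps in 5 days), which is only an approximation since episodes tend to get longer as the agent improves. The function name `days_to_reach` and the `speedup` factor are illustrative and not part of LightZero.

```python
# Back-of-the-envelope estimate. Assumes constant throughput, which is an
# assumption: real throughput usually changes as episodes get longer.
OBSERVED_STEPS = 14e6  # environment steps reached in the run described above
OBSERVED_DAYS = 5.0    # wall-clock days that run took


def days_to_reach(target_steps: float, speedup: float = 1.0) -> float:
    """Estimate wall-clock days needed to reach target_steps.

    speedup is a hypothetical multiplier on throughput, e.g. 8.0 for an
    idealized 8x faster setup.
    """
    steps_per_day = OBSERVED_STEPS / OBSERVED_DAYS * speedup
    return target_steps / steps_per_day


if __name__ == "__main__":
    for target in (1e9, 1e10):
        days = days_to_reach(target)
        print(f"{target:.0e} steps: about {days:.0f} days at the observed rate")
```

At the observed rate this gives roughly 357 days for 1 billion steps and roughly 3,571 days for 10 billion steps, which is why question 3.3 matters.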

4 Bug: game-2048 does not render on the screen when the render mode is set to "image_realtime_mode".

When the render mode of game-2048 is set to image_realtime_mode, nothing appears on the screen. You can reproduce it with the following script:

import numpy as np
import pytest
from easydict import EasyDict

from game_2048_env import Game2048Env

cfg = Game2048Env.default_config()
print(cfg)
cfg.render_mode = "image_realtime_mode"
print(cfg)


env = Game2048Env(cfg=cfg)


obs = env.reset()
print(obs)
#action = np.random.choice([0,1,2,3])
#print(action)

#obs, reward, done, info = env.step(action)
#print()

for i in range(10000):
    #env.render(mode="image_realtime_mode")
    action = np.random.choice([0,1,2,3])
    obs, reward, done, info = env.step(action)

puyuan1996 added the bug and efficiency optimization labels Dec 23, 2024

@puyuan1996
Collaborator

Question 1 and Question 2

Our previous experiments were also limited to around 2M environment steps, and we did not run longer training sessions. Your preliminary results are consistent with ours. As for the lack of further improvement in later stages, we suspect it is related to the 2048 environment settings. The code currently sets a maximum tile_num (see game_2048_env.py#L116), which might cap the highest score achievable in a single game. In addition, the existing configuration file (stochastic_muzero_2048_config.py) is still an initial version and has not been extensively tuned for performance.

To address this issue, we suggest the following improvements:

  1. Enhance exploration mechanisms.
  2. Tune hyperparameters.
  3. Reward normalization: Introduce techniques like value rescale or symlog to normalize reward values, reduce the dynamic range of rewards, and improve training stability.

Implementing these methods could significantly improve performance in later stages.
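
The two transforms named in item 3 can be sketched as follows. This is a minimal standalone illustration, not LightZero's implementation: the function names are chosen here for clarity, and the `EPS = 0.001` value is an assumed default for the small linear term in value rescaling.

```python
import math

EPS = 0.001  # assumed coefficient of the linear term in value rescaling


def symlog(x: float) -> float:
    """Symmetric log: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


def value_rescale(x: float, eps: float = EPS) -> float:
    """h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def inverse_value_rescale(y: float, eps: float = EPS) -> float:
    """Closed-form inverse of value_rescale."""
    root = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(root * root - 1.0, y)


if __name__ == "__main__":
    reward = 50_000.0  # order of magnitude of the episode reward in this issue
    print(symlog(reward), value_rescale(reward))
```

For a reward of 50,000, symlog gives roughly 10.8 and value rescale roughly 273, which shows how strongly both compress the dynamic range that the value and reward heads have to predict.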


Question 3

To reduce training time, we recommend focusing on the following two aspects:

  1. Multi-GPU distributed training:

    • Refer to the multi-GPU DDP configuration file for the Atari environment (atari_muzero_multigpu_ddp_config.py) and adapt the 2048 environment to the multi-GPU training framework.
    • In theory, distributed training with multiple GPUs should achieve nearly linear speedup.
  2. Environment and configuration optimization:

    • Environment optimization: Carefully analyze the game_2048_env.py code logic to eliminate unnecessary computations (e.g., potential redundancies in state encoding or rendering) and improve interaction efficiency.
    • Configuration optimization: Adjust parameters in stochastic_muzero_2048_config.py, such as reducing num_simulations or fine-tuning batch_size, to lower computational overhead while maintaining performance.

Optimizing these two aspects should improve training efficiency, and may improve overall performance as well.
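
As a small illustration of the configuration point, the sketch below copies the baseline values quoted earlier in this issue and derives a cheaper variant. The value `num_simulations=50` is a hypothetical choice for illustration, not a tested recommendation, and the helper `search_cost_ratio` assumes that search cost per environment step scales roughly linearly with `num_simulations`.

```python
# Baseline values copied from the config quoted earlier in this issue.
baseline = dict(
    num_simulations=100,
    batch_size=512,
    update_per_collect=200,
    collector_env_num=8,
)

# Hypothetical cheaper variant: halve the MCTS budget, keep everything else.
# Whether performance is maintained has to be checked empirically.
faster = {**baseline, "num_simulations": 50}


def search_cost_ratio(new: dict, old: dict) -> float:
    """Relative number of MCTS simulations per environment step.

    Assumes search cost is proportional to num_simulations.
    """
    return new["num_simulations"] / old["num_simulations"]


if __name__ == "__main__":
    ratio = search_cost_ratio(faster, baseline)
    print(f"search cost relative to baseline: {ratio:.2f}")
```

The same pattern can be used to sweep `batch_size` or `update_per_collect` while keeping the remaining settings fixed, so that each change can be evaluated in isolation.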


Question 4

For Question 4, we found that if you uncomment this line, your script will execute correctly, and you’ll be able to see the game being rendered in real time. We’ll fix this bug in a future update.


We plan to start working on efficiency and performance optimizations in the coming weeks. If you’re interested, you can explore these optimizations locally in advance and submit any improvements or questions via a PR or issue. We deeply appreciate your contributions and look forward to seeing your optimization results!

Once again, thank you for supporting the LightZero project!

@walkacross
Author

Hi, thanks for your detailed reply. I will update you if there is any progress.

Khev commented Jan 3, 2025

Dumb question -- how did you generate the performance plots? Is there built-in functionality to plot from the logs?

@puyuan1996
Collaborator

Dumb question -- how did you generate the performance plots? Is there built-in functionality to plot from the logs?

Hello, you can refer to the following documentation (LightZero Documentation). If you have any questions, feel free to ask at any time!
