GitHub - Rain-shadow/ElegantRL: Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥

ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning

ElegantRL is developed for researchers and practitioners with the following advantages:

Lightweight: the core code is under 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
Efficient: in many of our test cases, it is more efficient than Ray RLlib.
Stable: much more stable than Stable Baselines 3 (https://github.com/DLR-RM/stable-baselines3). Stable Baselines 3 can only use a single GPU, whereas ElegantRL can use 1 to 8 GPUs for stable training.

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

DDPG, TD3, SAC, PPO, PPO (GAE), REDQ for continuous actions
DQN, DoubleDQN, D3QN, SAC for discrete actions
QMIX, VDN; MADDPG, MAPPO, MATD3 for multi-agent environments

For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.

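The name of our library, “小雅” (Xiao Ya), comes from the line “他山之石，可以攻玉” (“Stones from other hills can be used to polish jade”) in “He Ming” (The Crane Cries), part of the Xiao Ya (Minor Odes) section of the Shijing (Book of Songs, 《诗经·小雅·鹤鸣》).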

News

[Towardsdatascience] ElegantRL-Podracer: A Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning
[Towardsdatascience] ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library
[Towardsdatascience] ElegantRL: Mastering PPO Algorithms
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I)
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II)

Framework (Helloworld folder)

An agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).

A high-level overview:

1) Instantiate an environment (env.py), and an agent (agent.py) with an Actor network and a Critic network (net.py);
2) In each training step (run.py), the agent interacts with the environment, generating transitions that are stored in a Replay Buffer;
3) The agent fetches a batch of transitions from the Replay Buffer to train its networks;
4) After each update, an evaluator measures the agent's performance (e.g., fitness score or cumulative return) and saves the agent if it achieves a new best score.
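Steps 2 and 3 revolve around the Replay Buffer. The sketch below is a minimal, hypothetical first-in-first-out buffer written with only the Python standard library; the class name SimpleReplayBuffer and its methods are illustrative assumptions, not ElegantRL's actual ReplayBuffer (which relies on NumPy, see Requirements).

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# A transition: (state, action, reward, next_state, done)
Transition = Tuple[list, list, float, list, bool]


class SimpleReplayBuffer:
    """Hypothetical FIFO replay buffer: the oldest transitions are dropped when full."""

    def __init__(self, max_len: int) -> None:
        self.storage: Deque[Transition] = deque(maxlen=max_len)

    def append(self, transition: Transition) -> None:
        # Step 2: store a transition produced by agent-environment interaction.
        self.storage.append(transition)

    def sample_batch(self, batch_size: int) -> List[Transition]:
        # Step 3: draw a uniformly random mini-batch (without replacement) for training.
        return random.sample(list(self.storage), batch_size)

    def __len__(self) -> int:
        return len(self.storage)


if __name__ == "__main__":
    buffer = SimpleReplayBuffer(max_len=3)
    for t in range(5):
        buffer.append(([float(t)], [0.0], 1.0, [float(t + 1)], False))
    print(len(buffer))                           # 3: only the newest three transitions remain
    print([tr[0][0] for tr in buffer.storage])   # [2.0, 3.0, 4.0]
    print(len(buffer.sample_batch(2)))           # 2
```

A bounded deque keeps memory fixed and discards stale experience automatically; a NumPy-backed ring buffer does the same with faster batch indexing.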

Code Structure

Core Codes

elegantrl/agents/net.py # Neural networks.
- Q-Net
- Actor network
- Critic network
elegantrl/agents/Agent___.py # RL algorithms.
- AgentBase
elegantrl/train/run___.py # Runs Demos 1 to 4.
- Parameter initialization
- Training loop
- Evaluator
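All algorithms in the agent files build on AgentBase. A minimal sketch of that pattern, assuming the base class implements exploration once while each algorithm supplies its own action selection and network update, might look like this. The method names explore_env and update_net follow the training loop described in this README; select_action, ToyEnv, AgentConstant, and every signature here are hypothetical and not taken from elegantrl/agents.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Transition = Tuple[float, float, float, float, bool]  # (state, action, reward, next_state, done)


class AgentBase(ABC):
    """Hypothetical base class: shared exploration logic lives here,
    algorithm-specific parts are left to subclasses."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma  # discount factor shared by all algorithms
        self.state = 0.0    # last observed state, kept between explore_env calls

    @abstractmethod
    def select_action(self, state: float) -> float:
        """Algorithm-specific policy (e.g. an Actor network)."""

    @abstractmethod
    def update_net(self, batch: List[Transition]) -> float:
        """Algorithm-specific update (e.g. Actor/Critic gradient steps)."""

    def explore_env(self, env, target_step: int) -> List[Transition]:
        """Shared exploration loop: roll out target_step steps and return the transitions."""
        transitions = []
        for _ in range(target_step):
            action = self.select_action(self.state)
            next_state, reward, done, _ = env.step(action)
            transitions.append((self.state, action, reward, next_state, done))
            self.state = env.reset() if done else next_state  # restart finished episodes
        return transitions


class ToyEnv:
    """Hypothetical 1-D environment using the gym 0.17 reset/step convention."""

    def __init__(self, max_step: int = 3) -> None:
        self.max_step = max_step
        self.t = 0
        self.s = 0.0

    def reset(self) -> float:
        self.t, self.s = 0, 0.0
        return self.s

    def step(self, action: float):
        self.t += 1
        self.s += action
        return self.s, -abs(self.s), self.t >= self.max_step, {}


class AgentConstant(AgentBase):
    """Trivial hypothetical subclass: always moves +1 and learns nothing."""

    def select_action(self, state: float) -> float:
        return 1.0

    def update_net(self, batch: List[Transition]) -> float:
        # Mean reward of the batch, standing in for a real training loss.
        return sum(r for _, _, r, _, _ in batch) / len(batch)


if __name__ == "__main__":
    env, agent = ToyEnv(max_step=3), AgentConstant()
    agent.state = env.reset()
    trajectory = agent.explore_env(env, target_step=4)
    print([t[2] for t in trajectory])   # [-1.0, -2.0, -3.0, -1.0]
    print(agent.update_net(trajectory))  # -1.75
```

Keeping explore_env in the base class means a new algorithm only has to implement its policy and its update rule.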

Utility Codes

elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
- gym_utils.py: A PreprocessEnv class for modifying gym environments.
- Stock_Trading_Env: A custom stock trading environment, provided as an example for users building their own environments.
eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook
eRL_demos.ipynb # Demos 1 to 4 in Jupyter notebooks; shows how to use both the tutorial version and the advanced version.
eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version.
eRL_demo_StockTrading.py # Stock trading application in Jupyter notebooks
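Stock_Trading_Env is offered as a template for custom environments. Below is a minimal, hypothetical sketch of such an environment in the OpenAI gym 0.17 style (reset() returns a state; step(action) returns state, reward, done, info). It does not import gym, and the class name ToyStockEnv, the state layout, the reward definition, and the state_dim / action_dim / if_discrete attributes are illustrative assumptions rather than the actual Stock_Trading_Env implementation.

```python
from typing import List, Tuple


class ToyStockEnv:
    """Hypothetical single-stock trading env following the gym 0.17 reset/step convention.

    State:  [cash, shares held, current price]
    Action: shares to buy (>0) or sell (<0), clipped to what is affordable or held.
    Reward: change in total asset value (cash + shares * price) over one step.
    Call reset() after done is True; stepping further would run past the price list.
    """

    def __init__(self, prices: List[float], initial_cash: float = 1000.0) -> None:
        self.prices = prices
        self.initial_cash = initial_cash
        # Assumed metadata that a training script could use to size its networks.
        self.state_dim = 3
        self.action_dim = 1
        self.if_discrete = False
        self.reset()

    def _state(self) -> List[float]:
        return [self.cash, float(self.shares), self.prices[self.day]]

    def _total_asset(self) -> float:
        return self.cash + self.shares * self.prices[self.day]

    def reset(self) -> List[float]:
        self.day = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        price = self.prices[self.day]
        old_asset = self._total_asset()
        trade = int(action)
        if trade > 0:  # buy, limited by available cash
            trade = min(trade, int(self.cash // price))
        else:          # sell, limited by shares held
            trade = max(trade, -self.shares)
        self.cash -= trade * price
        self.shares += trade
        self.day += 1  # advance to the next trading day
        reward = self._total_asset() - old_asset
        done = self.day == len(self.prices) - 1
        return self._state(), reward, done, {}


if __name__ == "__main__":
    env = ToyStockEnv([10.0, 12.0, 11.0], initial_cash=100.0)
    print(env.reset())    # [100.0, 0.0, 10.0]
    print(env.step(5))    # ([50.0, 5.0, 12.0], 10.0, False, {})
    print(env.step(-10))  # ([110.0, 0.0, 11.0], 0.0, True, {})
```

Defining the reward as the change in total assets keeps the per-step signal dense, which is a common choice for trading environments.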

Start to Train

Initialization:

args : holds the hyper-parameters.
env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
agent = agent.XXX() : creates an agent for a DRL algorithm.
buffer = ReplayBuffer() : stores the transitions.
evaluator = Evaluator() : evaluates and stores the trained model.
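The args object bundles the hyper-parameters. A dataclass is one convenient way to hold them; the class name TrainArgs and all field names and default values below are illustrative assumptions, not ElegantRL's actual configuration class.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainArgs:
    """Hypothetical hyper-parameter container; names and defaults are illustrative."""
    gamma: float = 0.99              # discount factor
    learning_rate: float = 2 ** -12  # optimizer step size
    batch_size: int = 256            # transitions per network update
    target_step: int = 1024          # environment steps per explore_env call
    buffer_size: int = 2 ** 20       # Replay Buffer capacity
    break_step: int = 2 ** 20        # stop training after this many total steps
    target_return: float = 200.0     # stop training once the evaluator reaches this score
    random_seed: int = 0

    def __post_init__(self) -> None:
        # Catch obviously inconsistent settings before training starts.
        if not 0.0 < self.gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        if self.batch_size > self.buffer_size:
            raise ValueError("batch_size cannot exceed buffer_size")


if __name__ == "__main__":
    args = TrainArgs(gamma=0.98, batch_size=128)
    print(asdict(args)["batch_size"])  # 128
```

Validating in __post_init__ surfaces configuration mistakes immediately rather than partway through a long training run.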

Training (a while-loop):

agent.explore_env(…): the agent explores the environment for a target number of steps, generates transitions, and stores them in the ReplayBuffer.
agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., the agent reaches a target score, the maximum number of steps is reached, or the user stops training manually.
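The control flow of this while-loop can be sketched as follows. To keep only the loop visible, the agent, environment, and Replay Buffer are replaced by a one-parameter toy problem whose optimum is 1.0; the toy Evaluator, the train function, and its arguments are hypothetical and are not ElegantRL's classes or API.

```python
import random
from typing import Callable, Optional, Tuple


class Evaluator:
    """Toy evaluator: scores a parameter and remembers the best one seen so far."""

    def __init__(self, eval_fn: Callable[[float], float]) -> None:
        self.eval_fn = eval_fn
        self.best_score = float("-inf")
        self.best_param: Optional[float] = None

    def evaluate_save(self, param: float) -> float:
        score = self.eval_fn(param)
        if score > self.best_score:  # keep only the highest-scoring "model"
            self.best_score, self.best_param = score, param
        return score


def train(target_score: float, break_step: int, steps_per_round: int,
          seed: int = 0) -> Tuple[Optional[float], float, int]:
    """Explore, update, evaluate, and repeat until a stopping condition holds."""
    rng = random.Random(seed)
    param = 0.0  # stand-in for the network parameters
    evaluator = Evaluator(eval_fn=lambda p: -abs(p - 1.0))
    total_step = 0
    while True:
        # Stand-in for agent.explore_env(...): noisy observations of the error (1.0 - param).
        samples = [(1.0 - param) + rng.gauss(0.0, 0.1) for _ in range(steps_per_round)]
        total_step += steps_per_round
        # Stand-in for agent.update_net(...): move halfway along the averaged observation.
        param += 0.5 * sum(samples) / len(samples)
        # evaluator.evaluate_save(...): score the update and keep the best parameters.
        score = evaluator.evaluate_save(param)
        # Stop when the target score is reached or the step budget is used up.
        if score >= target_score or total_step >= break_step:
            break
    return evaluator.best_param, evaluator.best_score, total_step


if __name__ == "__main__":
    print(train(target_score=-0.05, break_step=10_000, steps_per_round=16))
    # Scores are never positive, so this target is unreachable and the step budget ends the loop.
    print(train(target_score=1.0, break_step=64, steps_per_round=16)[2])  # 64
```

Checking both conditions after every evaluation means training ends as soon as the goal is met, while the step budget guarantees termination even if it never is.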

Experiments

Experimental Demos

LunarLanderContinuous-v2

BipedalWalkerHardcore-v2

Note: BipedalWalkerHardcore is a difficult task with a continuous action space, and only a few RL implementations can reach its target reward. Check out an experiment video: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Required:

| Package | Version |
|---|---|
| Python | 3.6+ |
| PyTorch | 1.6+ |

Optional:

| Package | Version | Purpose |
|---|---|---|
| NumPy | 1.18+ | For the ReplayBuffer. NumPy is installed along with PyTorch. |
| gym | 0.17.0 | For environments: Gym provides tutorial environments for DRL training. env.render() is broken with gym==0.18 and pyglet==1.6; use gym==0.17.0 with pyglet==1.5 instead. |
| pybullet | 2.7+ | For environments. We use PyBullet (free) as an alternative to MuJoCo (not free). |
| box2d-py | 2.3.8 | For gym. Install it with pip install Box2D (instead of box2d-py). |
| matplotlib | 3.2 | For plots. |

pip3 install gym==0.17.0 pybullet Box2D matplotlib

To install the StarCraft II environment:
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt

Citation:

To cite this repository:

@misc{erl,
  author = {Liu, Xiao-Yang and Li, Zechu and Wang, Zhaoran and Zheng, Jiahao},
  title = {{ElegantRL}: A Scalable and Elastic Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}

Latest commit: 8a699ec by supersglzc, "Update faq.rst" (Jan 1, 2022); 1,498 commits in total.

| Name | Last commit message | Last commit date |
|---|---|---|
| .idea | faq | Dec 31, 2021 |
| docs | Update faq.rst | Jan 1, 2022 |
| elegantrl | update | Dec 29, 2021 |
| elegantrl_helloworld | Update Qmix | Dec 21, 2021 |
| figs | Update Figure File_structure.png (simplify) | Sep 20, 2021 |
| Awesome_Deep_Reinforcement_Learning_List.md | Update Awesome_Deep_Reinforcement_Learning_List.md | Nov 17, 2021 |
| ChasingVecEnv.ipynb | Created using Colaboratory | Dec 23, 2021 |
| LICENSE | Update LICENSE | Feb 23, 2021 |
| README.md | Update README.md | Dec 28, 2021 |
| setup.py | Update setup.py | Oct 14, 2021 |
| tutorial_BipedalWalker.ipynb | BipedalWalker example | Nov 20, 2021 |
| tutorial_Pendulum.ipynb | Pendulum example | Nov 20, 2021 |
