
Commit

Update README.md
zeonchen authored Apr 14, 2021
1 parent 5703753 commit 9942054
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 2 changes (1 addition & 1 deletion)
@@ -10,7 +10,7 @@ Hyperparameters of MADDPG are fine-tuned; before training, $10,000$ sample data
The value of DO ranges from 0 to 5 $mg/L$, and chemical dosage ranges from 0 to 800 $kg/d$. The first training is implemented under the LCA scenario, with an SRT of
15 days. For other scenarios, transfer learning is applied to reduce the required data size by freezing part of the network, as sketched below.
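A minimal sketch of the freezing step, assuming a PyTorch actor whose early layers are shared across scenarios; the network structure, the dimensions, and the checkpoint name are illustrative assumptions and are not taken from the provided code.

```python
import torch
import torch.nn as nn

# Hypothetical actor network: the "shared" layers are reused across scenarios,
# while the "head" is re-trained for a new scenario (e.g. a different SRT).
class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.head(self.shared(obs))

# Assumed sizes: 8 features (COD, TN, TP, NH3-N, inflow, time, DO, dosage) x 5 timesteps = 40,
# and 2 control variables (DO setpoint and chemical dosage).
actor = Actor(obs_dim=40, act_dim=2)
# Hypothetical checkpoint from the first training (LCA scenario, SRT = 15 d).
actor.load_state_dict(torch.load("actor_lca_srt15.pt"))

# Freeze the shared layers so that only the head is updated with the new scenario's data.
for p in actor.shared.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in actor.parameters() if p.requires_grad), lr=1e-3)
```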

- State: The observation of agents includes historical information of five timesteps: (i) influent COD, TN, TP and NH$_3$-N (in ASM state form); (ii) inflow rate; (iii) time; (iv) current DO and dosage, respectively. After each interaction, a reward signal is released by the environment. (In the provided code, the environment only utilizes information from one timestep for the experiments, which does not yield stable performance.)
- State: The observation of agents includes historical information of five timesteps: (i) influent COD, TN, TP and NH$_3$-N (in ASM state form); (ii) inflow rate; (iii) time; (iv) current DO and dosage, respectively.
- Env: GPS-X and Gym are used to build the environment; practitioners can also use surrogate models to run the code (a minimal Gym-style sketch follows this list).
- Reward: Rewards are formulated from LCA and LCCA perspectives; see the paper for details.
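
A minimal Gym-style environment sketch using the action bounds quoted above (DO in [0, 5] $mg/L$, dosage in [0, 800] $kg/d$) and a five-timestep observation window; the class name `WWTPEnv`, the `_simulate` placeholder standing in for GPS-X (or a surrogate model), and the dummy reward are illustrative assumptions, not the interface of the actual code.

```python
import numpy as np
import gym
from gym import spaces

N_FEATURES = 8   # influent COD, TN, TP, NH3-N, inflow rate, time, current DO, current dosage
N_STEPS = 5      # length of the historical window in the observation

class WWTPEnv(gym.Env):
    """Minimal sketch; `_simulate` is a placeholder for GPS-X or a surrogate model."""

    def __init__(self):
        # Actions: DO setpoint in [0, 5] mg/L, chemical dosage in [0, 800] kg/d.
        self.action_space = spaces.Box(low=np.array([0.0, 0.0]),
                                       high=np.array([5.0, 800.0]),
                                       dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(N_STEPS, N_FEATURES),
                                            dtype=np.float32)
        self.history = np.zeros((N_STEPS, N_FEATURES), dtype=np.float32)

    def reset(self):
        self.history[:] = 0.0
        return self.history.copy()

    def step(self, action):
        # Placeholder for a call to GPS-X or a surrogate model.
        new_state = self._simulate(action)
        # Shift the window and append the newest plant state.
        self.history = np.roll(self.history, shift=-1, axis=0)
        self.history[-1] = new_state
        reward = self._lca_lcca_reward(new_state, action)
        done = False
        return self.history.copy(), reward, done, {}

    def _simulate(self, action):
        return np.random.randn(N_FEATURES).astype(np.float32)  # dummy dynamics

    def _lca_lcca_reward(self, state, action):
        return 0.0  # dummy value; see the paper for the LCA/LCCA formulation
```

With a wrapper of this shape, an off-the-shelf MADDPG implementation can interact with the plant model through the standard reset/step loop.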
