
Commit

Update README.md
zeonchen authored Apr 14, 2021
1 parent 5703753 commit 9942054
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 2 changes (1 addition & 1 deletion)
@@ -10,7 +10,7 @@ Hyperparameters of MADDPG are fine-tuned; before training, $10,000$ sample data
The value of DO ranges from 0 to 5 $mg/L$, and chemical dosage ranges from 0 to 800 $kg/d$. The first training is implemented under the LCA scenario, with an SRT of
15 days. For other scenarios, transfer learning is applied to reduce the required data size by freezing part of the network, as sketched below.
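A minimal sketch of the freezing step, assuming a PyTorch actor whose early layers are shared across scenarios; the network structure, the dimensions, and the checkpoint name are illustrative assumptions and are not taken from the provided code.

```python
import torch
import torch.nn as nn

# Hypothetical actor network: the "shared" layers are reused across scenarios,
# while the "head" is re-trained for a new scenario (e.g. a different SRT).
class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.head(self.shared(obs))

# Assumed sizes: 8 features (COD, TN, TP, NH3-N, inflow, time, DO, dosage) x 5 timesteps = 40,
# and 2 control variables (DO setpoint and chemical dosage).
actor = Actor(obs_dim=40, act_dim=2)
# Hypothetical checkpoint from the first training (LCA scenario, SRT = 15 d).
actor.load_state_dict(torch.load("actor_lca_srt15.pt"))

# Freeze the shared layers so that only the head is updated with the new scenario's data.
for p in actor.shared.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in actor.parameters() if p.requires_grad), lr=1e-3)
```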

- State: The observation of agents includes historical information of five timesteps: (i) influent COD, TN, TP and NH$_3$-N (in ASM state form); (ii) inflow rate; (iii) time; (iv) current DO and dosage, respectively. After each interaction, a reward signal is released by the environment. (In the provided code, the environment only utilizes information from one timestep for the experiments, which does not yield stable performance.)
- State: The observation of agents includes historical information of five timesteps: (i) influent COD, TN, TP and NH$_3$-N (in ASM state form); (ii) inflow rate; (iii) time; (iv) current DO and dosage, respectively.
- Env: GPS-X and Gym are used to build the environment; practitioners can also use surrogate models to run the code (a minimal Gym-style sketch follows this list).
- Reward: Rewards are formulated from LCA and LCCA perspectives; see the paper for details.
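
A minimal Gym-style environment sketch using the action bounds quoted above (DO in [0, 5] $mg/L$, dosage in [0, 800] $kg/d$) and a five-timestep observation window; the class name `WWTPEnv`, the `_simulate` placeholder standing in for GPS-X (or a surrogate model), and the dummy reward are illustrative assumptions, not the interface of the actual code.

```python
import numpy as np
import gym
from gym import spaces

N_FEATURES = 8   # influent COD, TN, TP, NH3-N, inflow rate, time, current DO, current dosage
N_STEPS = 5      # length of the historical window in the observation

class WWTPEnv(gym.Env):
    """Minimal sketch; `_simulate` is a placeholder for GPS-X or a surrogate model."""

    def __init__(self):
        # Actions: DO setpoint in [0, 5] mg/L, chemical dosage in [0, 800] kg/d.
        self.action_space = spaces.Box(low=np.array([0.0, 0.0]),
                                       high=np.array([5.0, 800.0]),
                                       dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(N_STEPS, N_FEATURES),
                                            dtype=np.float32)
        self.history = np.zeros((N_STEPS, N_FEATURES), dtype=np.float32)

    def reset(self):
        self.history[:] = 0.0
        return self.history.copy()

    def step(self, action):
        # Placeholder for a call to GPS-X or a surrogate model.
        new_state = self._simulate(action)
        # Shift the window and append the newest plant state.
        self.history = np.roll(self.history, shift=-1, axis=0)
        self.history[-1] = new_state
        reward = self._lca_lcca_reward(new_state, action)
        done = False
        return self.history.copy(), reward, done, {}

    def _simulate(self, action):
        return np.random.randn(N_FEATURES).astype(np.float32)  # dummy dynamics

    def _lca_lcca_reward(self, state, action):
        return 0.0  # dummy value; see the paper for the LCA/LCCA formulation
```

With a wrapper of this shape, an off-the-shelf MADDPG implementation can interact with the plant model through the standard reset/step loop.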
