-
Data-generation process, used to train the large model: https://github.com/LucasAlegre/sumo-rl
-
The RL training process supplies data for the large model.
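One way to picture this is a rollout loop that logs each transition as a prompt/response record. The sketch below is only illustrative: `ToyIntersection`, `longest_queue_policy`, and the record fields are hypothetical names invented here, and the toy environment is a stand-in, not the sumo-rl API. A real setup would replace it with an actual traffic-signal environment.

```python
import json
import random


class ToyIntersection:
    """Hypothetical stand-in for a traffic-signal environment (NOT the sumo-rl API)."""

    def __init__(self, seed=0, horizon=5):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0
        self.queues = [0, 0]

    def reset(self):
        self.t = 0
        self.queues = [self.rng.randint(0, 5), self.rng.randint(0, 5)]
        return list(self.queues)

    def step(self, action):
        # The approach given green (index `action`) discharges up to 3 vehicles.
        self.queues[action] = max(0, self.queues[action] - 3)
        # New vehicles arrive on both approaches.
        self.queues = [q + self.rng.randint(0, 2) for q in self.queues]
        self.t += 1
        reward = -sum(self.queues)  # fewer queued vehicles is better
        done = self.t >= self.horizon
        return list(self.queues), reward, done


def longest_queue_policy(obs):
    """Give green to the approach with the longest queue (ties pick the first)."""
    return max(range(len(obs)), key=lambda i: obs[i])


def collect_records(env, policy, episodes=1):
    """Roll out `policy` and log every transition as a prompt/response record."""
    records = []
    for ep in range(episodes):
        obs = env.reset()
        done = False
        step = 0
        while not done:
            action = policy(obs)
            next_obs, reward, done = env.step(action)
            records.append({
                "episode": ep,
                "step": step,
                "prompt": f"Queue lengths per approach: {obs}. Which approach gets green?",
                "response": str(action),
                "reward": reward,
            })
            obs = next_obs
            step += 1
    return records


records = collect_records(ToyIntersection(seed=0), longest_queue_policy, episodes=2)
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
```

Keeping the episode and step indices in each record makes it possible to order or weight samples later by when they were collected.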
-
The benefit is that this process is known during interaction.
-
Interaction also provides fine-tuning data (later samples get larger weights). DeepSeek fine-tuning: https://blog.csdn.net/2401_85375186/article/details/145264671
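The "later samples get larger weights" idea could be sketched as a recency weighting over the collected samples. The exponential form and the `decay` value are assumptions made here for illustration; the notes do not specify a weighting scheme, and `recency_weights` and `weighted_mean_loss` are hypothetical helper names.

```python
def recency_weights(n, decay=0.9):
    """Return n weights summing to 1, where later indices get larger weights.

    Assumed scheme: weight_i is proportional to decay ** (n - 1 - i),
    so the newest sample (i = n - 1) has the largest weight.
    """
    if n <= 0:
        return []
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean_loss(losses, decay=0.9):
    """Combine per-sample losses (oldest first) using the recency weights."""
    weights = recency_weights(len(losses), decay)
    return sum(w * loss for w, loss in zip(weights, losses))
```

With `decay` close to 1 the weighting is nearly uniform; smaller values concentrate the fine-tuning signal on the most recent interactions.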
-
Voice announcements; traffic management
Reinforcement Learning Benchmarks for Traffic Signal Control (RESCO)
EcoLight: Reward Shaping in Deep Reinforcement Learning for Ergonomic Traffic Signal Control