Commit d7f11cc: Update README.md
0xC000005 authored Sep 25, 2024 (1 parent: 94a458a)
Showing 1 changed file (README.md) with 61 additions and 1 deletion.

# Revolution
![Repin_17October](https://github.com/social-ai-uoft/revolutions/assets/29427196/3488423c-29a9-4f8a-970b-ac9fd299a92e)
![Image_20240908163948](https://github.com/user-attachments/assets/c50309d7-4df5-4247-b50c-662d37d0d6d0)

## Architecture
```mermaid
classDiagram
    class MultiAgentEnv {
        +teams
        +agents
        +pairings
    }
    class Game {
        +game_player_1
        +game_player_2
    }
    class Team {
        +team_players
    }
    class Player {
        +model
        +replay_buffer
    }
    class DQN_dynamic
    class DoubleDQN_dynamic {
        +dqn
        +target_dqn
    }
    class ActorCritic {
        +actor
        +critic
    }
    class PPO {
        +policy
        +policy_old
        +buffer
    }
    class RolloutBuffer
    nn_Module <|-- DQN_dynamic
    nn_Module <|-- ActorCritic
    MultiAgentEnv --> "*" Team : contains
    MultiAgentEnv --> "*" Player : contains
    MultiAgentEnv --> "*" Game : creates
    Team --> "*" Player : contains
    Game --> "2" Player : references
    Player --> "0..1" DoubleDQN_dynamic : may have
    Player --> "0..1" PPO : may have
    DoubleDQN_dynamic --> "2" DQN_dynamic : contains
    PPO --> ActorCritic : contains
    PPO --> RolloutBuffer : contains
```
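The relationships in the diagram can be sketched as plain Python. This is a minimal sketch with hypothetical names mirroring the diagram, not the repository's actual API; in the repo, the `DQN_dynamic` and `ActorCritic` models are `nn.Module` subclasses:

```python
# Hypothetical sketch of the class relationships in the diagram above.
# All names are illustrative stand-ins for the repo's actual classes.
from dataclasses import dataclass, field
from typing import Optional

class DQNDynamic:  # stands in for DQN_dynamic (an nn.Module in the repo)
    pass

@dataclass
class DoubleDQNDynamic:  # DoubleDQN_dynamic: online network + target network
    dqn: DQNDynamic = field(default_factory=DQNDynamic)
    target_dqn: DQNDynamic = field(default_factory=DQNDynamic)

@dataclass
class PPO:  # PPO: current policy, frozen old policy, rollout storage
    policy: Optional[object] = None      # ActorCritic in the repo
    policy_old: Optional[object] = None
    buffer: list = field(default_factory=list)  # RolloutBuffer in the repo

@dataclass
class Player:  # each Player owns a model and a replay buffer
    name: str
    model: Optional[object] = None  # DoubleDQNDynamic, PPO, or None
    replay_buffer: list = field(default_factory=list)

@dataclass
class Team:
    team_players: list

@dataclass
class Game:  # a Game references exactly two players
    game_player_1: Player
    game_player_2: Player

@dataclass
class MultiAgentEnv:  # top-level container: teams, agents, pairings
    teams: list
    agents: list = field(default_factory=list)
    pairings: list = field(default_factory=list)

    def make_games(self):
        # Pair agents off two at a time into Games (illustrative pairing rule).
        self.pairings = [Game(a, b)
                         for a, b in zip(self.agents[::2], self.agents[1::2])]
        return self.pairings
```

For example, `p.model = DoubleDQNDynamic()` gives a player an online/target network pair, mirroring the `Player --> "0..1" DoubleDQN_dynamic` edge in the diagram.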

## Changelog


**[Version 5.0](https://github.com/social-ai-uoft/revolutions/tree/version_5.0)**
* **Episodic Memory Empowers Agents:** Agents now retain memories of past interactions, specific to each opponent, enabling them to develop sophisticated, adaptive strategies.
* **Replay Buffer Improvement:** Agents now keep a separate replay buffer of past interactions for each opponent, so they learn opponent-specific state transitions and can predict an opponent's behavior more explicitly.
* **Richer State Information:** Agents make more informed decisions by considering additional factors such as relative team performance and opponent identities.
* **NPC Opponents:** Test your agents against a variety of non-player characters (NPCs) with predefined behavioral patterns.
* **Battle of the Sexes Integration:** The reward function now internally includes both the Battle of the Sexes and the Prisoner's Dilemma.
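The per-opponent replay buffers described above can be sketched as one bounded buffer per opponent id. This is an illustrative sketch, not the repository's actual buffer class:

```python
# Hypothetical sketch of per-opponent replay buffers: one bounded deque per
# opponent id, so the agent learns opponent-specific state transitions.
# Names and signatures are illustrative, not the repo's actual API.
import random
from collections import defaultdict, deque

class PerOpponentReplayBuffer:
    def __init__(self, capacity_per_opponent=10_000):
        # A new bounded buffer is created lazily for each unseen opponent.
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_opponent))

    def push(self, opponent_id, state, action, reward, next_state, done):
        # Store one transition under the opponent it was observed against.
        self.buffers[opponent_id].append((state, action, reward, next_state, done))

    def sample(self, opponent_id, batch_size):
        # Sample only from interactions with this specific opponent.
        buf = list(self.buffers[opponent_id])
        return random.sample(buf, min(batch_size, len(buf)))
```

Because sampling is keyed by opponent, a training step can fit the model to transitions from the current opponent only, rather than mixing behavior from every opponent into one buffer.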

**[Version 4.0](https://github.com/social-ai-uoft/revolutions/tree/version_4.0)**
* **Reactive Training:** Agents now adapt their strategies dynamically in response to opponents' predefined actions. We have verified that agents with PPO and episodic memory are in fact able to learn these adaptive strategies.
* **Episodic Memory:** Agents can now associate each opponent with that opponent's previous action within an episode, which significantly improves agent performance.
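The within-episode episodic memory described above can be sketched as a small map from opponent id to that opponent's most recent action, appended to the agent's observation and cleared at each episode reset. All names here are illustrative assumptions, not the repo's actual implementation:

```python
# Hypothetical sketch of within-episode episodic memory: remember each
# opponent's last action, expose it as an extra observation feature, and
# forget everything when the episode resets. Illustrative names only.
class EpisodicMemory:
    NO_HISTORY = -1  # sentinel feature value for "not seen this episode"

    def __init__(self):
        self.last_action = {}

    def remember(self, opponent_id, action):
        # Record the opponent's most recent action.
        self.last_action[opponent_id] = action

    def augment(self, observation, opponent_id):
        # Append the remembered action (or the sentinel) to a feature list.
        return observation + [self.last_action.get(opponent_id, self.NO_HISTORY)]

    def reset(self):
        # Called at the start of every episode: memory is episodic, not global.
        self.last_action.clear()
```

Conditioning the observation on the opponent's last action is what lets a policy react differently to, say, a defector and a cooperator within the same episode.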