monte-carlo q-learning dqn epsilon-greedy policy-gradient dynamic-programming transfer-learning policy-iteration value-iteration model-based-rl behavioral-economics sarsa-learning n-armed-bandit-problem double-q-learning model-learning n-step-expected-sarsa n-step-tree-backup ucb-algorithm cognitive-fallacies
Updated Sep 27, 2021 - Jupyter Notebook