Commit
Minor change on greedy policy variable usage
Chap. 18: use the 'n_outputs' variable defined earlier directly, instead of the hardcoded '2'.
lebaste77 authored Feb 28, 2021
1 parent 0eb31f7 commit 64f0e05
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion 18_reinforcement_learning.ipynb
@@ -1306,7 +1306,7 @@
     "source": [
      "def epsilon_greedy_policy(state, epsilon=0):\n",
      "    if np.random.rand() < epsilon:\n",
-     "        return np.random.randint(2)\n",
+     "        return np.random.randint(n_outputs)\n",
      "    else:\n",
      "        Q_values = model.predict(state[np.newaxis])\n",
      "        return np.argmax(Q_values[0])"
