Releases · markkho/msdm
v0.11 Release
v0.10 Release
Summary of changes/additions:
- Implemented a `Table` class that provides a dict-like and numpy-like interface with a numpy array backend
- `MarkovDecisionProcess` and `PartiallyObservableMDP` algorithms return `Results` objects with attributes in the form of `Table`s (e.g., `state_value`, `action_value`, `policy`) - note that this is a breaking change (see the usage sketch after this list)
- For all MDPs and derived problem classes, `is_terminal` has been changed to `is_absorbing`
- `FunctionalPolicy` and `TabularPolicy` classes introduced
- `PolicyIteration`, `ValueIteration`, and `MultichainPolicyIteration` have been (re-)implemented
- Tests have been streamlined
- Organization of core modules has been streamlined
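A minimal usage sketch of the new `Results`/`Table` interface, assuming a small `GridWorld` domain and a `plan_on` entry point; the constructor arguments and import paths shown here are assumptions and may differ from the actual API:

```python
# Hedged sketch of the v0.10 Results/Table interface described above.
# The GridWorld constructor arguments and the plan_on entry point are
# assumptions, not a verified example.
from msdm.domains import GridWorld
from msdm.algorithms import ValueIteration

mdp = GridWorld(
    tile_array=[
        "..g",
        ".#.",
        "s..",
    ],
)

result = ValueIteration().plan_on(mdp)

# Result attributes are Table objects: dict-like lookup keyed by state,
# backed by a numpy array.
s0 = mdp.initial_state_dist().sample()
print(result.state_value[s0])
print(result.policy)
```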
v0.9 Release
Summary of changes/additions:
- RMAX implementation
- Fix TD Learning bug
- Fix `TabularMDP.reachable_states`
- New tests
v0.8 Release
Summary of changes/additions:
- `LAOStar` error handling
- New `DictDistribution` methods
- New `condition`, `chain`, and `is_normalized` methods in `FiniteDistribution` (see the sketch after this list)
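A small sketch of the `FiniteDistribution` methods named above; the import path and the exact semantics of `condition` and `chain` (renormalization, monad-like bind) are assumptions:

```python
# Hedged sketch of the DictDistribution / FiniteDistribution methods above.
# The import path and the exact condition/chain semantics are assumptions.
from msdm.core.distributions import DictDistribution

dist = DictDistribution({"a": 0.25, "b": 0.25, "c": 0.5})
assert dist.is_normalized()

# condition: keep only outcomes satisfying a predicate
# (assumed to renormalize the remaining probability mass)
not_c = dist.condition(lambda x: x != "c")

# chain: map each outcome to a new distribution and marginalize over results
upper = dist.chain(lambda x: DictDistribution({x.upper(): 1.0}))
print(not_c, upper)
```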
v0.7 Release
Summary of changes/additions:
- POMDP solvers:
  - `FSCBoundedPolicyIteration` (new)
  - `FSCGradientAscent` (minor changes)
- Planning algorithms:
  - Major refactor of `LAOStar` to support event listener pattern (note interface changes)
  - Minor refactor of `LRTDP` to support event listener pattern
- Core classes:
  - Fix to `TabularPolicy.from_q_matrices` calculation of softmax distribution
  - Minor changes to core POMDP implementation
- New domains:
  - `GridMDP` base class and plotting tools
  - `WindyGridWorld` MDP
- Clean up
v0.6
v0.5 Release
This release mainly includes interfaces, algorithms, and test domains for tabular partially observable Markov decision processes (POMDPs).
Summary of changes:
- Core POMDP classes:
  - `PartiallyObservableMDP`
  - `TabularPOMDP`
  - `BeliefMDP`
  - `POMDPPolicy`
  - `ValueBasedTabularPOMDPPolicy`
  - `AlphaVectorPolicy`
  - `FiniteStateController`
  - `StochasticFiniteStateController`
- Domains:
  - `HeavenOrHell`
  - `LoadUnload`
  - `Tiger`
- Algorithms (see the sketch at the end of this release's notes):
  - `PointBasedValueIteration`
  - `QMDP`
  - `FSCGradientAscent`
- JuliaPOMDPs wrapper
- Fixes to Policy Iteration and Value Iteration
- Updated README.md
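A hedged sketch of exercising the POMDP classes and algorithms listed above on the `Tiger` domain; the import locations, default constructor arguments, and the `plan_on` entry point are assumptions:

```python
# Hedged sketch of solving a tabular POMDP with the new solvers.
# The import locations, default constructor arguments, and the plan_on
# entry point are assumptions about this version of the library.
from msdm.domains import Tiger
from msdm.algorithms import QMDP

pomdp = Tiger()
result = QMDP().plan_on(pomdp)

# The returned POMDPPolicy maps beliefs over hidden states to actions.
print(result.policy)
```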
v0.4 Release
New Features
- QLearning, SARSA, Expected SARSA, DoubleQLearning (see the sketch after this list)
- Policy Iteration
- Entropy Regularized Policy Iteration
- Works with Python 3.9
- QuickMDP and QuickTabularMDP constructors
- Construction of TabularMDPs from matrices
- New domains: CliffWalking, GridMDP generic class, Russell & Norvig gridworld example
- Gridworld plotting of action values
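A sketch of running one of the new model-free learners on a small gridworld; the `episodes` parameter and the `train_on` entry point are assumptions:

```python
# Hedged sketch of tabular Q-learning on a small gridworld.
# The episodes parameter and the train_on entry point are assumptions.
from msdm.domains import GridWorld
from msdm.algorithms import QLearning

mdp = GridWorld(tile_array=["..g", "s.."])
result = QLearning(episodes=200).train_on(mdp)

# Learned value estimates and the induced policy come back on the result object.
print(result.policy)
```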
Refactoring of core
Major overhaul of core and tabular methods:
- States/actions are assumed to be hashable (e.g., GridWorld now uses frozendict; no built-in hashing functions; dictionaries are the main way to create maps)
- The distribution classes have been streamlined (Multinomial has been removed and DictDistribution is the main way to represent categorical distributions; .sample() takes a random number generator) - see the sketch after this list
- Policy classes have been simplified
- More thorough type hints
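A sketch of the streamlined distribution interface described above; the import path and the `rng` keyword name are assumptions:

```python
# Hedged sketch: DictDistribution as the categorical representation, with
# sample() accepting an explicit random number generator for reproducibility.
# The import path and the rng keyword name are assumptions.
import random

from msdm.core.distributions import DictDistribution

dist = DictDistribution({"left": 0.5, "right": 0.5})
rng = random.Random(12345)
print(dist.sample(rng=rng))
```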
Minor additions to algorithms
v0.2 Add makefile