Bundle: Classic AI algorithms for graph search, adversarial game search, optimization, and tabular reinforcement learning (-lib ai)
MarkovDecisionProcess
An explicit finite Markov decision process solved by value iteration. States and actions are identified by zero-based integer ids. Each transition maps a (state, action) pair to a (next state, probability, reward) outcome; the probabilities of all outcomes for a given (state, action) pair should sum to 1.
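To make the "sum to 1" requirement concrete, here is a small sketch in Python (used only for illustration; the (state, action) -> outcomes dictionary layout and the helper name unnormalized_pairs are hypothetical, not part of this bundle). It flags (state, action) pairs whose outcome probabilities do not add up to 1. Whether AddTransition or Solve performs such a check is not documented.

```python
from collections import defaultdict
from math import isclose

# Hypothetical layout: (state, action) -> list of (next, prob, reward) outcomes.
Outcomes = list[tuple[int, float, float]]


def unnormalized_pairs(table: dict[tuple[int, int], Outcomes]) -> list[tuple[int, int]]:
    """Return the (state, action) pairs whose outcome probabilities do not sum to 1."""
    bad = []
    for key, outcomes in table.items():
        total = sum(prob for _, prob, _ in outcomes)
        # Compare with a tolerance, since sums of decimal probabilities are subject to rounding.
        if not isclose(total, 1.0, abs_tol=1e-9):
            bad.append(key)
    return bad


table: dict[tuple[int, int], Outcomes] = defaultdict(list)
# State 0, action 1 is stochastic: 70% to state 1, 30% to state 2.
table[(0, 1)].append((1, 0.7, 0.0))
table[(0, 1)].append((2, 0.3, 5.0))
# State 1, action 0 is incomplete: only half of the probability mass is assigned.
table[(1, 0)].append((2, 0.5, 1.0))

print(unnormalized_pairs(table))  # [(1, 0)]
```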
Example
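```objeck
mdp := MarkovDecisionProcess->New(3, 2); # 3 states (ids 0-2), 2 actions (ids 0-1)
mdp->AddTransition(0, 1, 1, 1.0, 0.0);   # state 0, action 1 -> state 1, reward 0
mdp->AddTransition(1, 1, 2, 1.0, 10.0);  # state 1, action 1 -> state 2, reward 10
mdp->Solve(0.9, 0.000001, 1000);         # gamma 0.9, tolerance 1e-6, at most 1000 iterations
policy := mdp->GetPolicy();              # best action for each state
```
Operations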
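AddTransition
Adds one outcome for a (state, action) pair: taking action in state leads to next with probability prob and yields reward.
method : public : AddTransition(state:Int, action:Int, next:Int, prob:Float, reward:Float) ~ Bool
Parameters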
| Name | Type | Description |
|---|---|---|
| state | Int | source state |
| action | Int | action id |
| next | Int | resulting state |
| prob | Float | transition probability |
| reward | Float | transition reward |
Return
| Type | Description |
|---|---|
| Bool | true if added, false on invalid ids |
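A Python sketch of the documented AddTransition contract, assuming zero-based ids as in the Example. The class TransitionTable and its fields are hypothetical; the documentation only states that invalid ids yield false, so this sketch does not validate prob or reward.

```python
# Hypothetical model of AddTransition's documented contract.
class TransitionTable:
    def __init__(self, num_states: int, num_actions: int) -> None:
        self.num_states = num_states
        self.num_actions = num_actions
        # (state, action) -> list of (next, prob, reward) outcomes
        self.outcomes: dict[tuple[int, int], list[tuple[int, float, float]]] = {}

    def add_transition(self, state: int, action: int, next_state: int,
                       prob: float, reward: float) -> bool:
        # Assumed zero-based ids, as in the Example (3 states -> ids 0, 1, 2).
        if not (0 <= state < self.num_states and 0 <= next_state < self.num_states):
            return False
        if not 0 <= action < self.num_actions:
            return False
        # prob and reward are stored as given; range checks on them are not documented.
        self.outcomes.setdefault((state, action), []).append((next_state, prob, reward))
        return True


t = TransitionTable(3, 2)
print(t.add_transition(0, 1, 1, 1.0, 0.0))  # True
print(t.add_transition(0, 2, 1, 1.0, 0.0))  # False: action 2 is out of range
print(t.add_transition(3, 1, 0, 1.0, 0.0))  # False: state 3 is out of range
```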
GetPolicy
Gets the greedy policy: the best action for each state.
method : public : GetPolicy() ~ Int[]
Return
| Type | Description |
|---|---|
| Int[] | policy array indexed by state, or Nil if the MDP has not been solved |
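How a greedy policy is typically read off the state values, sketched in Python under stated assumptions: outcomes live in a hypothetical (state, action) -> [(next, prob, reward)] dictionary, ties go to the lower action id, and a state with no transitions gets the placeholder action -1. The function name greedy_policy is hypothetical, and the bundle's actual tie-breaking and placeholder are not documented.

```python
import math

# Hypothetical layout: (state, action) -> list of (next, prob, reward) outcomes.
Table = dict[tuple[int, int], list[tuple[int, float, float]]]


def greedy_policy(table: Table, values: list[float], num_actions: int,
                  gamma: float) -> list[int]:
    """For each state, pick the action with the highest expected backed-up value."""
    policy = []
    for s in range(len(values)):
        best_action, best_q = -1, -math.inf
        for a in range(num_actions):
            outcomes = table.get((s, a))
            if not outcomes:
                continue  # no transitions: treated as -inf, so never picked
            q = sum(p * (r + gamma * values[n]) for n, p, r in outcomes)
            if q > best_q:  # strict '>' keeps the lower action id on ties
                best_action, best_q = a, q
        policy.append(best_action)  # stays -1 when the state has no transitions
    return policy


# The Example's transitions; with gamma = 0.9 their optimal values are 9, 10 and 0.
table: Table = {(0, 1): [(1, 1.0, 0.0)], (1, 1): [(2, 1.0, 10.0)]}
print(greedy_policy(table, [9.0, 10.0, 0.0], num_actions=2, gamma=0.9))  # [1, 1, -1]
```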
GetValues
Gets the optimal state values.
method : public : GetValues() ~ Float[]
Return
| Type | Description |
|---|---|
| Float[] | value array indexed by state, or Nil if the MDP has not been solved |
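As a hand check of what GetValues would hold for the Example, assume the standard Bellman optimality backup $V(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V(s')]$ with values starting at 0 (this page names value iteration but does not spell out the update). With $\gamma = 0.9$, the only defined action is 1, and every transition has probability 1.0:

$$
\begin{aligned}
V(2) &= 0 && \text{(no transitions)}\\
V(1) &= 1.0\,(10 + 0.9 \cdot V(2)) = 10\\
V(0) &= 1.0\,(0 + 0.9 \cdot V(1)) = 9
\end{aligned}
$$

Under these assumptions, GetValues would return approximately [9.0, 10.0, 0.0] for that example.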
IsSolved
Returns whether the MDP has been solved.
method : public : IsSolved() ~ Bool
Return
| Type | Description |
|---|---|
| Bool | true if solved |
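New (constructor)
Creates an empty MDP with a fixed number of states and actions; transitions are added afterwards with AddTransition.
New(num_states:Int, num_actions:Int)
Parameters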
| Name | Type | Description |
|---|---|---|
| num_states | Int | number of states |
| num_actions | Int | number of actions |
Solve
Solves the MDP by value iteration and extracts the greedy policy. Within a state, an action with no transitions is valued at negative infinity, so it is never selected while another action has transitions; a state with no transitions at all keeps value 0.
method : public : Solve(gamma:Float, tolerance:Float, max_iterations:Int) ~ Bool
Parameters
| Name | Type | Description |
|---|---|---|
| gamma | Float | discount factor |
| tolerance | Float | stop when no state value changes by more than this in one iteration |
| max_iterations | Int | maximum number of iterations |
Return
| Type | Description |
|---|---|
| Bool | true when solved |
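Putting the Solve rules together, the following Python sketch performs synchronous value iteration: actions with no transitions score negative infinity, states with no transitions stay at 0, and the loop stops once no value moves by more than tolerance. It assumes values start at 0 and that each sweep reads only the previous sweep's values. The function name value_iteration and its (values, converged) return pair are hypothetical; this page documents Solve's return only as "true when solved", and the bundle may differ in these details.

```python
import math

# Hypothetical layout: (state, action) -> list of (next, prob, reward) outcomes.
Table = dict[tuple[int, int], list[tuple[int, float, float]]]


def value_iteration(table: Table, num_states: int, num_actions: int, gamma: float,
                    tolerance: float, max_iterations: int) -> tuple[list[float], bool]:
    """Synchronous value iteration; returns (values, converged)."""
    values = [0.0] * num_states  # assumed starting point
    for _ in range(max_iterations):
        new_values = []
        for s in range(num_states):
            q_values = []
            for a in range(num_actions):
                outcomes = table.get((s, a))
                if not outcomes:
                    q_values.append(-math.inf)  # action without transitions
                else:
                    q_values.append(sum(p * (r + gamma * values[n]) for n, p, r in outcomes))
            best = max(q_values, default=-math.inf)
            # A state with no transitions at all keeps value 0.
            new_values.append(0.0 if best == -math.inf else best)
        # Largest change of any state value in this sweep.
        delta = max((abs(x - y) for x, y in zip(new_values, values)), default=0.0)
        values = new_values
        if delta <= tolerance:
            return values, True
    return values, False


# The Example's transitions: 3 states, 2 actions, only action 1 is defined.
table: Table = {(0, 1): [(1, 1.0, 0.0)], (1, 1): [(2, 1.0, 10.0)]}
print(value_iteration(table, 3, 2, 0.9, 0.000001, 1000))  # ([9.0, 10.0, 0.0], True)
print(value_iteration(table, 3, 2, 0.9, 0.000001, 1))     # ([0.0, 10.0, 0.0], False)
```

On the Example, the values settle on the third sweep; with a cap of one iteration the sketch stops early and reports that it did not converge.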