v2026.6.1
Bundle: Classic AI algorithms: graph search, adversarial game search, optimization, and tabular reinforcement learning (-lib ai)

MarkovDecisionProcess

An explicit finite Markov decision process solved by value iteration. Transitions are added as (state, action) -> (next, probability, reward) tuples; the probabilities of all transitions added for a given (state, action) pair should sum to 1.

Example

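# 3 states, 2 actions
mdp := MarkovDecisionProcess->New(3, 2);
# state 0, action 1 -> state 1 with probability 1.0 and reward 0.0
mdp->AddTransition(0, 1, 1, 1.0, 0.0);
# state 1, action 1 -> state 2 with probability 1.0 and reward 10.0
mdp->AddTransition(1, 1, 2, 1.0, 10.0);
# discount factor 0.9, tolerance 0.000001, at most 1000 iterations
mdp->Solve(0.9, 0.000001, 1000);
policy := mdp->GetPolicy();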

Operations

AddTransition

Adds one (next, probability, reward) outcome for the given (state, action) pair.

method : public : AddTransition(state:Int, action:Int, next:Int, prob:Float, reward:Float) ~ Bool

Parameters

Name    Type   Description
state   Int    source state
action  Int    action id
next    Int    resulting state
prob    Float  transition probability
reward  Float  transition reward

Return

Type  Description
Bool  true if added, false on invalid ids
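
The class description asks that the probabilities for each (state, action) pair sum to 1, but this page does not say whether the library checks that. A minimal Python sketch of such a check, run over the tuples before they are passed to AddTransition, might look like the following. The function name unnormalized_pairs and the tolerance are hypothetical and are not part of the bundle.

```python
from collections import defaultdict
import math

def unnormalized_pairs(transitions, tol=1e-9):
    """Return the (state, action) pairs whose probabilities do not sum to 1.

    transitions: (state, action, next, prob, reward) tuples, in the same
    order as the AddTransition arguments. Hypothetical helper, not library API.
    """
    totals = defaultdict(float)
    for state, action, _next, prob, _reward in transitions:
        totals[(state, action)] += prob
    return sorted(pair for pair, total in totals.items()
                  if not math.isclose(total, 1.0, abs_tol=tol))

transitions = [
    (0, 1, 1, 1.0, 0.0),   # complete: the full probability mass is assigned
    (1, 1, 2, 0.7, 10.0),  # incomplete: only 0.7 of the mass is assigned
]
bad = unnormalized_pairs(transitions)
print(bad)
```

Here the pair (1, 1) is reported because its outcomes cover only 0.7 of the probability mass; adding a second outcome with probability 0.3 for that pair would clear it.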

GetPolicy

Gets the greedy policy (best action per state).

method : public : GetPolicy() ~ Int[]

Return

Type   Description
Int[]  policy array, or Nil when unsolved
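
As an illustration of what "greedy" means here, the Python sketch below picks, for each state, the action with the highest one-step lookahead value given a set of state values. It is not the library's code. The function name greedy_policy is hypothetical, and the -1 marker for a state with no transitions is an assumption of this sketch; this page does not say what GetPolicy holds for such a state.

```python
import math

def greedy_policy(num_states, num_actions, transitions, values, gamma):
    """Pick, per state, the action with the highest one-step lookahead value."""
    # Group the outcomes by (state, action).
    outcomes = {}
    for s, a, nxt, p, r in transitions:
        outcomes.setdefault((s, a), []).append((nxt, p, r))

    policy = []
    for s in range(num_states):
        best_action, best_q = -1, -math.inf  # -1: this sketch's "no action" marker
        for a in range(num_actions):
            if (s, a) not in outcomes:
                continue  # action has no transitions from this state
            q = sum(p * (r + gamma * values[nxt]) for nxt, p, r in outcomes[(s, a)])
            if q > best_q:
                best_action, best_q = a, q
        policy.append(best_action)
    return policy

# The MDP from the Example section, with its state values at gamma = 0.9.
transitions = [(0, 1, 1, 1.0, 0.0), (1, 1, 2, 1.0, 10.0)]
values = [9.0, 10.0, 0.0]
print(greedy_policy(3, 2, transitions, values, 0.9))
```

For this input the sketch selects action 1 in states 0 and 1, the only action with transitions there, and leaves state 2 at the placeholder.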

GetValues

Gets the optimal state values.

method : public : GetValues() ~ Float[]

Return

Type     Description
Float[]  value array, or Nil when unsolved

IsSolved

Whether the MDP has been solved.

method : public : IsSolved() ~ Bool

Return

Type  Description
Bool  true if solved

New (constructor)

Creates an MDP with the given number of states and actions.

New(num_states:Int, num_actions:Int)

Parameters

Name         Type  Description
num_states   Int   number of states
num_actions  Int   number of actions

Solve

Solves the MDP by value iteration and extracts the greedy policy. States without transitions for an action treat that action's value as negative infinity; states with no transitions at all hold value 0.

method : public : Solve(gamma:Float, tolerance:Float, max_iterations:Int) ~ Bool

Parameters

Name            Type   Description
gamma           Float  discount factor
tolerance       Float  stop when no state value changes by more than this
max_iterations  Int    iteration cap

Return

Type  Description
Bool  true when solved
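
For readers unfamiliar with value iteration, the Python sketch below follows the rules stated for Solve: an action without transitions counts as negative infinity, a state with no transitions at all holds value 0, and iteration stops once no state value changes by more than the tolerance or the iteration cap is reached. It is an independent illustration, not the library's implementation. The function name value_iteration and the choice to update all states from the previous sweep's values are assumptions of this sketch.

```python
import math

def value_iteration(num_states, num_actions, transitions, gamma, tolerance, max_iterations):
    """Return a list of state values for a finite MDP.

    transitions: (state, action, next, prob, reward) tuples.
    """
    # Group the outcomes by (state, action).
    outcomes = {}
    for s, a, nxt, p, r in transitions:
        outcomes.setdefault((s, a), []).append((nxt, p, r))

    values = [0.0] * num_states
    for _ in range(max_iterations):
        new_values = []
        for s in range(num_states):
            best = -math.inf  # actions without transitions stay at -inf
            for a in range(num_actions):
                if (s, a) in outcomes:
                    q = sum(p * (r + gamma * values[nxt]) for nxt, p, r in outcomes[(s, a)])
                    best = max(best, q)
            # A state with no transitions at all holds value 0.
            new_values.append(0.0 if best == -math.inf else best)
        # Largest change of any state value in this sweep.
        delta = max((abs(n - o) for n, o in zip(new_values, values)), default=0.0)
        values = new_values
        if delta <= tolerance:
            break
    return values

# Same MDP and Solve arguments as in the Example section.
values = value_iteration(3, 2, [(0, 1, 1, 1.0, 0.0), (1, 1, 2, 1.0, 10.0)], 0.9, 0.000001, 1000)
print(values)
```

On the MDP from the Example section this sketch converges to values of 9 for state 0, 10 for state 1, and 0 for state 2: state 1 collects the reward of 10, state 0 sees it one step later, discounted by 0.9, and state 2 has no transitions.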