Objeck Documentation

Bundle Classic AI algorithms: graph search, adversarial game search, optimization and tabular reinforcement learning (-lib ai)

MarkovDecisionProcess

An explicit finite Markov decision process solved by value iteration. Transitions are added as (state, action) -> (next, probability, reward) tuples; probabilities for each (state, action) should sum to 1.

Example

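# 3 states, 2 actions
mdp := MarkovDecisionProcess->New(3, 2);
# state 0, action 1 -> state 1 with probability 1.0 and reward 0.0
mdp->AddTransition(0, 1, 1, 1.0, 0.0);
# state 1, action 1 -> state 2 with probability 1.0 and reward 10.0
mdp->AddTransition(1, 1, 2, 1.0, 10.0);
# gamma 0.9, tolerance 0.000001, at most 1000 iterations
mdp->Solve(0.9, 0.000001, 1000);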
policy := mdp->GetPolicy();
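
The requirement that probabilities for each (state, action) pair sum to 1 can be checked before solving. The following Python sketch is not part of the Objeck library. The tuple layout mirrors the AddTransition arguments, while the helper name `unnormalized_pairs` and its tolerance are assumptions made for illustration.

```python
from collections import defaultdict


def unnormalized_pairs(transitions, tol=1e-9):
    """Return the (state, action) pairs whose probabilities do not sum to 1.

    `transitions` is a list of (state, action, next, prob, reward) tuples,
    in the same order as the AddTransition arguments. Hypothetical helper.
    """
    totals = defaultdict(float)
    for state, action, _next, prob, _reward in transitions:
        totals[(state, action)] += prob
    return sorted(pair for pair, total in totals.items() if abs(total - 1.0) > tol)


# The two transitions from the example above; each pair is deterministic.
example = [(0, 1, 1, 1.0, 0.0), (1, 1, 2, 1.0, 10.0)]
bad = unnormalized_pairs(example)

# A stochastic pair that accounts for only 0.9 of the probability mass.
incomplete = unnormalized_pairs([(0, 0, 1, 0.5, 0.0), (0, 0, 2, 0.4, 1.0)])
```

In this sketch `bad` is empty, while `incomplete` flags the pair (0, 0). To model a stochastic outcome, the same (state, action) pair is added once per possible next state.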

AddTransition

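Adds one transition outcome: taking action in state leads to next with probability prob and yields reward. Call it once per possible outcome of a (state, action) pair.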

method : public : AddTransition(state:Int, action:Int, next:Int, prob:Float, reward:Float) ~ Bool

Parameters

Name	Type	Description
state	Int	source state
action	Int	action id
next	Int	resulting state
prob	Float	transition probability
reward	Float	transition reward

Return

Type	Description
Bool	true if added, false on invalid ids

GetPolicy

Gets the greedy policy (best action per state).

method : public : GetPolicy() ~ Int[]

Return

Type	Description
Int[]	policy array (best action id for each state), or Nil when unsolved

GetValues

Gets the optimal state values.

method : public : GetValues() ~ Float[]

Return

Type	Description
Float[]	value array (one value per state), or Nil when unsolved

IsSolved

Reports whether the MDP has been solved.

method : public : IsSolved() ~ Bool

Return

Type	Description
Bool	true if solved

New

Constructor. Creates an MDP with the given number of states and actions.

New(num_states:Int, num_actions:Int)

Parameters

Name	Type	Description
num_states	Int	number of states
num_actions	Int	number of actions

Solve

Solves the MDP by value iteration and extracts the greedy policy. If a state has no transitions for an action, that action's value is treated as negative infinity, so it cannot be preferred over an action that does have transitions. A state with no transitions at all holds value 0.

method : public : Solve(gamma:Float, tolerance:Float, max_iterations:Int) ~ Bool

Parameters

Name	Type	Description
gamma	Float	discount factor
tolerance	Float	stop when no state value changes by more than this amount in an iteration
max_iterations	Int	iteration cap

Return

Type	Description
Bool	true when solved
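
To illustrate what Solve is described as doing, here is a minimal Python sketch of value iteration with the two conventions stated above: negative infinity for an action without transitions, and value 0 for a state without any transitions. It is not the library's implementation. The function name `value_iteration`, the `converged` flag and the `-1` policy marker for states without transitions are assumptions, since this page does not say what the library returns when the iteration cap is reached or which policy entry it stores for such states.

```python
import math


def value_iteration(num_states, num_actions, transitions, gamma, tolerance, max_iterations):
    """Value iteration over (state, action, next, prob, reward) tuples.

    Returns (values, policy, converged). Illustrative sketch only.
    """
    # Group the outcomes of each (state, action) pair.
    outcomes = {}
    for s, a, nxt, p, r in transitions:
        outcomes.setdefault((s, a), []).append((nxt, p, r))

    def action_value(s, a, values):
        # An action with no transitions from s is worth negative infinity.
        if (s, a) not in outcomes:
            return -math.inf
        return sum(p * (r + gamma * values[nxt]) for nxt, p, r in outcomes[(s, a)])

    values = [0.0] * num_states
    converged = False
    for _ in range(max_iterations):
        new_values = []
        for s in range(num_states):
            best = max((action_value(s, a, values) for a in range(num_actions)),
                       default=-math.inf)
            # A state with no transitions at all holds value 0.
            new_values.append(0.0 if best == -math.inf else best)
        delta = max((abs(n - o) for n, o in zip(new_values, values)), default=0.0)
        values = new_values
        if delta <= tolerance:
            converged = True
            break

    # Greedy policy: best action per state; -1 marks a state with no
    # transitions (an assumption of this sketch).
    policy = []
    for s in range(num_states):
        scores = [action_value(s, a, values) for a in range(num_actions)]
        best = max(scores, default=-math.inf)
        policy.append(-1 if best == -math.inf else scores.index(best))
    return values, policy, converged


# The MDP from the class example: 3 states, 2 actions, two transitions.
transitions = [(0, 1, 1, 1.0, 0.0), (1, 1, 2, 1.0, 10.0)]
values, policy, converged = value_iteration(3, 2, transitions, 0.9, 0.000001, 1000)
```

For the example MDP this sketch settles on values of 9, 10 and 0 for states 0, 1 and 2, and picks action 1 in states 0 and 1. State 2 has no transitions, so it keeps value 0.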