v2026.6.1
Bundle: Classic AI algorithms, covering graph search, adversarial game search, optimization, and tabular reinforcement learning (-lib ai)

Sarsa

On-policy SARSA agent. It follows an epsilon-greedy behavior policy and bootstraps on the next action actually taken (a') rather than on the greedy one: Q(s,a) += alpha*(r + gamma*Q(s',a') - Q(s,a)). The PRNG is seeded so that training is reproducible.
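The update rule can be checked by hand with a small worked example. The following is a minimal Python sketch for illustration only, not the bundle's implementation; the function name `sarsa_update` and the nested-list Q table are assumptions made for this example.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply one SARSA update to q in place and return the TD error.

    q is a hypothetical Q table stored as a list of rows (states x actions).
    """
    # Bootstrap on the action actually chosen in the next state (on-policy).
    td_error = r + gamma * q[s_next][a_next] - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


# Two states, two actions; only Q(1, 0) starts non-zero.
q = [[0.0, 0.0], [2.0, 0.0]]
delta = sarsa_update(q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)
# TD error: 1.0 + 0.5 * 2.0 - 0.0 = 2.0, so Q(0, 1) becomes 0.5 * 2.0 = 1.0
print(delta, q[0][1])  # 2.0 1.0
```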

Operations

BestAction

Gets the greedy action for a state.

method : public : BestAction(state:Int) ~ Int

Parameters

Name   Type  Description
state  Int   state id

Return

Type  Description
Int   best action, or -1 when untrained
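Greedy selection amounts to an argmax over one row of the Q table. The sketch below is an illustrative Python version under stated assumptions: the table is a nested list, an untrained agent is represented by `None`, and ties go to the lowest action index. How the bundle itself breaks ties is not documented here.

```python
def best_action(q, state):
    """Return the index of the highest-valued action for state, or -1 if q is None."""
    if q is None:
        # Mirrors the documented "-1 when untrained" contract.
        return -1
    row = q[state]
    # max() returns the first maximal index, so ties go to the lowest action id
    # (an assumption of this sketch).
    return max(range(len(row)), key=lambda a: row[a])


print(best_action(None, 0))               # -1
print(best_action([[0.1, 0.7, 0.3]], 0))  # 1
```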

GetQ

Gets the learned Q table (states x actions).

method : public : GetQ() ~ Float[,]

Return

Type      Description
Float[,]  Q table, or Nil when untrained

IsTrained

Whether the agent has been trained.

method : public : IsTrained() ~ Bool

Return

Type  Description
Bool  true if trained

New (constructor)

Creates a SARSA agent with the given hyperparameters and PRNG seed.

New(alpha:Float, gamma:Float, epsilon:Float, seed:Int)

Parameters

Name     Type   Description
alpha    Float  learning rate
gamma    Float  discount factor
epsilon  Float  exploration probability
seed     Int    PRNG seed for reproducible training

Train

Trains the agent on the environment for the given number of episodes, capping each episode at max_steps steps.

method : public : Train(env:Environment, episodes:Int, max_steps:Int) ~ Bool

Parameters

Name       Type         Description
env        Environment  environment
episodes   Int          number of episodes
max_steps  Int          per-episode step cap

Return

Type  Description
Bool  true if training ran
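To show how the episode count, step cap, and seed fit together, here is a rough Python sketch of a SARSA training loop. It is an illustration under assumptions, not the bundle's code: the `Corridor` environment, its `reset`/`step` interface, the random tie-breaking, and the zero bootstrap value at terminal states are all hypothetical, and the sketch returns the Q table rather than a Bool.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk right from cell 0 to the last cell."""
    num_states = 5
    num_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp to the corridor so moving left at cell 0 stays in place.
        self.pos = min(max(self.pos + move, 0), self.num_states - 1)
        done = self.pos == self.num_states - 1
        return self.pos, (1.0 if done else 0.0), done


def choose(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    row = q[state]
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    best = max(row)
    # Break ties at random (an assumption of this sketch).
    return rng.choice([a for a, v in enumerate(row) if v == best])


def train(env, episodes, max_steps, alpha, gamma, epsilon, seed):
    rng = random.Random(seed)  # private generator, so runs are repeatable
    q = [[0.0] * env.num_actions for _ in range(env.num_states)]
    for _ in range(episodes):
        s = env.reset()
        a = choose(q, s, epsilon, rng)
        for _ in range(max_steps):  # per-episode step cap
            s2, r, done = env.step(a)
            a2 = choose(q, s2, epsilon, rng)
            # Assumption: terminal states contribute no future value.
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q


q1 = train(Corridor(), episodes=200, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=42)
q2 = train(Corridor(), episodes=200, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=42)
print(q1 == q2)  # True: the same seed reproduces the same table
```

Because every random draw comes from the one seeded generator, two runs with the same seed and hyperparameters produce identical Q tables in this sketch.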