v2026.6.1
Bundle: classic AI algorithms, covering graph search, adversarial game search, optimization, and tabular reinforcement learning (-lib ai)

QLearning

Off-policy tabular Q-learning. The agent acts epsilon-greedily but bootstraps from the greedy action, applying the update Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). The PRNG is seeded so that training is reproducible.
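To make the update rule concrete, here is a minimal language-neutral sketch in Python. It is not the bundle's implementation: the function names `epsilon_greedy` and `q_update`, the list-of-lists table, the lowest-index tie-breaking, and the zero bootstrap at terminal states are all assumptions made for illustration.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice; ties resolve to the lowest action index (an assumption).
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha, gamma, terminal=False):
    """Apply Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)) in place."""
    # Assumption: no bootstrap from a terminal state.
    bootstrap = 0.0 if terminal else max(q[s_next])
    q[s][a] += alpha * (r + gamma * bootstrap - q[s][a])
    return q[s][a]

# Two states, two actions; state 1 already has some learned values.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.95)
```

Here the target is 1.0 + 0.95 * 2.0 = 2.9, so Q(0,1) moves a tenth of the way from 0 toward it, to roughly 0.29. The update uses the maximum over next-state actions regardless of which action the epsilon-greedy policy takes next, which is what makes the method off-policy.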

Example

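# alpha = 0.1, gamma = 0.95, epsilon = 0.2, seed = 7
agent := QLearning->New(0.1, 0.95, 0.2, 7);
# env is an Environment instance (not shown): 500 episodes, at most 100 steps each
agent->Train(env, 500, 100);
# state is an Int state id; the result is -1 if the agent is untrained
best := agent->BestAction(state);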

Operations

BestAction

Gets the greedy action for a state.

method : public : BestAction(state:Int) ~ Int

Parameters

Name   Type  Description
state  Int   state id

Return

Type  Description
Int   best action, or -1 when untrained
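The lookup described above amounts to an argmax over one row of the Q table. A minimal Python sketch follows, assuming an untrained agent is represented by a missing table (`None`) and that ties go to the lowest action index; neither detail is confirmed by this page, and `best_action` is a hypothetical name.

```python
def best_action(q, state):
    """Return the index of the highest-valued action, or -1 if there is no table."""
    if q is None:  # untrained: nothing to look up
        return -1
    row = q[state]
    # Ties resolve to the lowest action index (an assumption).
    return max(range(len(row)), key=lambda a: row[a])

untrained = best_action(None, 0)             # no table yet
trained = best_action([[0.1, 0.7, 0.3]], 0)  # action 1 has the largest value
```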

GetQ

Gets the learned Q table (states x actions).

method : public : GetQ() ~ Float[,]

Return

Type      Description
Float[,]  Q table, or Nil when untrained

IsTrained

Whether the agent has been trained.

method : public : IsTrained() ~ Bool

Return

Type  Description
Bool  true if trained

New

Constructor. Creates an agent with the given learning rate, discount factor, exploration probability, and PRNG seed.

New(alpha:Float, gamma:Float, epsilon:Float, seed:Int)

Parameters

Name     Type   Description
alpha    Float  learning rate
gamma    Float  discount factor
epsilon  Float  exploration probability
seed     Int    PRNG seed for reproducible training

Train

Trains the agent on the environment for the given number of episodes, capping each episode at max_steps steps.

method : public : Train(env:Environment, episodes:Int, max_steps:Int) ~ Bool

Parameters

Name       Type         Description
env        Environment  environment
episodes   Int          number of episodes
max_steps  Int          per-episode step cap

Return

Type  Description
Bool  true if training ran
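To show how the episodes and max_steps parameters interact, here is a minimal Python sketch of a tabular training loop. It is an illustration under stated assumptions, not the bundle's code: `ChainEnv`, its `reset`/`step` methods, and the `train` function are hypothetical and do not reflect the actual Environment interface, which this page does not describe.

```python
import random

class ChainEnv:
    """Hypothetical 4-state corridor: action 1 moves right, action 0 moves left.
    Reaching state 3 yields reward 1.0 and ends the episode."""
    n_states, n_actions = 4, 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == 3
        return self.state, (1.0 if done else 0.0), done

def train(env, episodes, max_steps, alpha=0.1, gamma=0.95, epsilon=0.2, seed=7):
    rng = random.Random(seed)  # seeded so repeated runs produce the same table
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):  # per-episode step cap
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)  # explore
            else:
                a = max(range(env.n_actions), key=lambda i: q[s][i])  # exploit
            s_next, r, done = env.step(a)
            # Greedy bootstrap; assumed to be zero once the episode has ended.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q

q = train(ChainEnv(), episodes=500, max_steps=100)
greedy = [max(range(2), key=lambda i: row[i]) for row in q[:3]]
```

The step cap matters most early in training: before any reward has been seen, every Q value is zero, the agent wanders, and an uncapped episode could run for a long time. After training on this corridor, the greedy action in each non-terminal state should be to move right.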