Objeck Documentation

Bundle Classic AI algorithms: graph search, adversarial game search, optimization and tabular reinforcement learning (-lib ai)

QLearning

Off-policy tabular Q-learning. The agent behaves epsilon-greedily but bootstraps from the greedy value of the next state: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). The PRNG is seeded, so training is reproducible.
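The update rule above can be illustrated with a small sketch. This is Python rather than Objeck and does not reflect the library's internals; the function name `q_update` and the list-of-lists Q table are assumptions made for illustration only.

```python
def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new Q(s,a).

    Hypothetical helper: q is a list of rows, one row of action values per state.
    """
    best_next = max(q[s_next])                   # greedy bootstrap: max_a' Q(s',a')
    td_error = r + gamma * best_next - q[s][a]   # r + gamma*max_a' Q(s',a') - Q(s,a)
    q[s][a] += alpha * td_error
    return q[s][a]


# Two states, two actions; only Q(0,1) is touched by this update.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.95)
```

With these numbers the target is 1.0 + 0.95*2.0 = 2.9, so Q(0,1) moves a tenth of the way from 0 toward it, to roughly 0.29.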

Example

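# alpha = 0.1, gamma = 0.95, epsilon = 0.2, seed = 7
agent := QLearning->New(0.1, 0.95, 0.2, 7);
# env is an Environment; train for 500 episodes of at most 100 steps each
agent->Train(env, 500, 100);
# state is an Int state id; returns -1 if the agent is untrained
best := agent->BestAction(state);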

BestAction

Gets the greedy action for a state.

method : public : BestAction(state:Int) ~ Int

Parameters

Name	Type	Description
state	Int	state id

Return

Type	Description
Int	best action, or -1 when untrained
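As a rough illustration of what "greedy action" means here, the following Python sketch picks the highest-valued action in a state's row and returns -1 when no table exists. It is not the Objeck implementation; the function name `best_action`, the use of `None` for an untrained table, and the tie-breaking rule (lowest index wins) are assumptions.

```python
def best_action(q, state):
    """Return the index of the highest-valued action for `state`.

    Hypothetical helper: q is a list of rows (states x actions), or None
    when nothing has been learned yet. Ties go to the lowest action index,
    which is an assumption of this sketch.
    """
    if q is None:
        return -1
    row = q[state]
    return max(range(len(row)), key=lambda a: row[a])


untrained = best_action(None, 0)
trained = best_action([[0.1, 0.7, 0.3], [0.5, 0.2, 0.2]], 0)
```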

GetQ

Gets the learned Q table (states x actions).

method : public : GetQ() ~ Float[,]

Return

Type	Description
Float[,]	Q table, or Nil when untrained

IsTrained

Whether the agent has been trained.

method : public : IsTrained() ~ Bool

Return

Type	Description
Bool	true if trained

New

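Constructor. Creates an agent with the given learning rate, discount factor, exploration probability and PRNG seed.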

New(alpha:Float, gamma:Float, epsilon:Float, seed:Int)

Parameters

Name	Type	Description
alpha	Float	learning rate
gamma	Float	discount factor
epsilon	Float	exploration probability
seed	Int	PRNG seed for reproducible training
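To show how epsilon and seed interact, here is a minimal Python sketch of epsilon-greedy selection driven by a seeded generator. It is an illustration under stated assumptions, not the library's code: the helper name `epsilon_greedy` and the use of Python's `random.Random` as the PRNG are hypothetical choices.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.

    Hypothetical helper: q_row holds the action values of one state and
    rng is a seeded random.Random instance.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                   # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit


# Two generators with the same seed produce the same action sequence.
rng_a = random.Random(7)
rng_b = random.Random(7)
row = [0.0, 1.0, 0.5]
picks_a = [epsilon_greedy(row, 0.2, rng_a) for _ in range(20)]
picks_b = [epsilon_greedy(row, 0.2, rng_b) for _ in range(20)]
```

Because every random draw comes from the seeded generator, repeating a run with the same seed repeats the same exploration choices, which is what makes training reproducible.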

Train

Trains the agent on the environment for the given number of episodes, taking at most max_steps steps per episode.

method : public : Train(env:Environment, episodes:Int, max_steps:Int) ~ Bool

Parameters

Name	Type	Description
env	Environment	environment
episodes	Int	number of episodes
max_steps	Int	per-episode step cap

Return

Type	Description
Bool	true if training ran
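The roles of episodes and max_steps can be sketched as a training loop. The Python below is a hedged illustration, not the Objeck implementation: the `ChainEnv` class, its `reset`/`step` methods and `num_states`/`num_actions` attributes are hypothetical stand-ins, since the Environment interface is not described in this section. Skipping the bootstrap on terminal transitions is also an assumption of the sketch.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..3 in a row, goal at state 3."""
    num_states = 4
    num_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.num_states - 1)
        done = self.state == self.num_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, episodes, max_steps, alpha, gamma, epsilon, seed):
    """Run tabular Q-learning and return the Q table (states x actions)."""
    rng = random.Random(seed)
    q = [[0.0] * env.num_actions for _ in range(env.num_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):  # per-episode step cap
            if rng.random() < epsilon:
                a = rng.randrange(env.num_actions)
            else:
                a = max(range(env.num_actions), key=lambda i: q[s][i])
            s_next, r, done = env.step(a)
            # Assumption: no bootstrap from a terminal state.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q = train(ChainEnv(), episodes=500, max_steps=100,
          alpha=0.1, gamma=0.95, epsilon=0.2, seed=7)
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
```

In this toy chain the learned greedy policy moves right in every non-terminal state, and rerunning `train` with the same seed yields an identical table.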