Bundle Classic AI algorithms: graph search, adversarial game search, optimization and tabular reinforcement learning (-lib ai)
QLearning
Off-policy Q-learning: actions are chosen epsilon-greedily, while each update bootstraps from the greedy value of the next state, Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). The PRNG is seeded so that training runs are reproducible.
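The update rule above can be sketched in a few lines. This is an illustrative Python version, not the library's source; the function name `q_update` and the list-of-lists table layout are assumptions made for the example.

```python
def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Greedy bootstrap: value of the best action available in the next state.
    target = r + gamma * max(q[s_next])
    td_error = target - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


# Two states, two actions; only Q(1,1) starts with a non-zero value.
q = [[0.0, 0.0], [0.0, 1.0]]
q_update(q, s=0, a=1, r=0.5, s_next=1, alpha=0.1, gamma=0.95)
print(q[0][1])  # Q(0,1) moves a fraction alpha of the way toward the target 1.45
```

The target uses the maximum over next-state actions regardless of which action the behavior policy takes next, which is what makes the method off-policy.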
Example
agent := QLearning->New(0.1, 0.95, 0.2, 7);
agent->Train(env, 500, 100);
best := agent->BestAction(state);
Operations
BestAction
Gets the greedy action for a state.
method : public : BestAction(state:Int) ~ Int
Parameters
| Name | Type | Description |
|---|---|---|
| state | Int | state id |
Return
| Type | Description |
|---|---|
| Int | best action, or -1 when untrained |
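As a rough illustration of what a greedy lookup with an untrained sentinel might look like, here is a short Python sketch. The helper `best_action` is hypothetical, and the tie-breaking rule (lowest action index wins) is an assumption; the source does not say how the library resolves ties.

```python
def best_action(q, state):
    """Return the action index with the largest Q-value, or -1 if there is no table."""
    if q is None:
        return -1  # mirrors the documented "-1 when untrained" result
    row = q[state]
    # max() returns the first maximal element, so ties go to the lowest index.
    return max(range(len(row)), key=lambda a: row[a])


print(best_action(None, 0))               # no table yet
print(best_action([[0.1, 0.7, 0.3]], 0))  # index of the largest entry in the row
```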
GetQ
Gets the learned Q table (states x actions).
method : public : GetQ() ~ Float[,]
Return
| Type | Description |
|---|---|
| Float[,] | Q table, or Nil when untrained |
IsTrained
Whether the agent has been trained.
method : public : IsTrained() ~ Bool
Return
| Type | Description |
|---|---|
| Bool | true if trained |
New
Constructor
New(alpha:Float, gamma:Float, epsilon:Float, seed:Int)
Parameters
| Name | Type | Description |
|---|---|---|
| alpha | Float | learning rate |
| gamma | Float | discount factor |
| epsilon | Float | exploration probability |
| seed | Int | PRNG seed for reproducible training |
Train
Trains the agent on the environment for a fixed number of episodes, capping each episode at max_steps steps.
method : public : Train(env:Environment, episodes:Int, max_steps:Int) ~ Bool
Parameters
| Name | Type | Description |
|---|---|---|
| env | Environment | environment |
| episodes | Int | number of episodes |
| max_steps | Int | per-episode step cap |
Return
| Type | Description |
|---|---|
| Bool | true if training ran |
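To show how the episodes and max_steps parameters might bound a training loop, and how a seed makes the result repeatable, here is a self-contained Python sketch. Everything in it is an assumption for illustration: `ChainEnv`, its `reset`/`step` methods, and the `train` function are hypothetical and do not reflect the library's Environment interface, which is not shown here. Treating the terminal state's bootstrap value as zero is also an assumption.

```python
import random


class ChainEnv:
    """Hypothetical 4-state corridor: action 1 moves right, action 0 moves left."""
    n_states = 4
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the right end
        return self.state, reward, done


def train(env, episodes, max_steps, alpha, gamma, epsilon, seed):
    rng = random.Random(seed)  # private PRNG, so equal seeds give equal runs
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):  # per-episode step cap
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)  # explore
            else:
                a = max(range(env.n_actions), key=lambda i: q[s][i])  # exploit
            s_next, r, done = env.step(a)
            # Assumption: no bootstrap from a terminal state.
            bootstrap = 0.0 if done else max(q[s_next])
            q[s][a] += alpha * (r + gamma * bootstrap - q[s][a])
            s = s_next
            if done:
                break
    return q


q = train(ChainEnv(), episodes=500, max_steps=100,
          alpha=0.1, gamma=0.95, epsilon=0.2, seed=7)
for state, row in enumerate(q):
    print(state, row)
```

Calling `train` twice with the same seed yields identical tables, because all randomness flows through the one seeded generator.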