Bundle Classic AI algorithms: graph search, adversarial game search, optimization and tabular reinforcement learning (-lib ai)
Sarsa
On-policy SARSA: epsilon-greedy behavior bootstrapping on the action actually taken, Q(s,a) += alpha*(r + gamma*Q(s',a') - Q(s,a)). Seeded for reproducible training.
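The update rule above can be worked through on a single transition. The following is a minimal Python sketch, not this library's implementation: the function name `sarsa_update` and the nested-list Q table are assumptions made for illustration.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply one on-policy SARSA update to the table q in place.

    q is a hypothetical states x actions table stored as a list of lists.
    """
    # Bootstrap on the action actually taken in the next state (a_next),
    # not on the greedy maximum as Q-learning would.
    td_target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


# 2 states x 2 actions, all zeros except Q(1, 0) = 1.0
q = [[0.0, 0.0], [1.0, 0.0]]
new_value = sarsa_update(q, s=0, a=1, r=0.5, s_next=1, a_next=0,
                         alpha=0.5, gamma=0.5)
# td_target = 0.5 + 0.5 * 1.0 = 1.0, so Q(0, 1) moves halfway from 0.0 to 1.0
```

With these numbers, `Q(0, 1)` becomes 0.5 after the update.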
Operations
BestAction
Gets the greedy action for a state.
method : public : BestAction(state:Int) ~ Int
Parameters
| Name | Type | Description |
|---|---|---|
| state | Int | state id |
Return
| Type | Description |
|---|---|
| Int | best action, or -1 when untrained |
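As a rough illustration of what "greedy action" means here, the Python sketch below picks the highest-valued action in a state's row and returns -1 when no table exists. The name `best_action`, the use of `None` for an untrained table, and the lowest-index tie-breaking are all assumptions; the library's actual tie-breaking is not documented on this page.

```python
def best_action(q, state):
    """Return the index of the highest-valued action for a state, or -1 if untrained."""
    if q is None:
        # Mirrors the documented -1 result for an untrained agent.
        return -1
    row = q[state]
    # max() returns the first maximal index, so ties go to the lowest action id
    # (an assumption for this sketch).
    return max(range(len(row)), key=lambda a: row[a])


untrained = best_action(None, 0)
greedy = best_action([[0.1, 0.7, 0.3]], 0)
```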
GetQ
Gets the learned Q table (states x actions).
method : public : GetQ() ~ Float[,]
Return
| Type | Description |
|---|---|
| Float[,] | Q table, or Nil when untrained |
IsTrained
Whether the agent has been trained.
method : public : IsTrained() ~ Bool
Return
| Type | Description |
|---|---|
| Bool | true if trained |
New
Constructor
New(alpha:Float, gamma:Float, epsilon:Float, seed:Int)
Parameters
| Name | Type | Description |
|---|---|---|
| alpha | Float | learning rate |
| gamma | Float | discount factor |
| epsilon | Float | exploration probability |
| seed | Int | PRNG seed for reproducible training |
Train
Trains the agent.
method : public : Train(env:Environment, episodes:Int, max_steps:Int) ~ Bool
Parameters
| Name | Type | Description |
|---|---|---|
| env | Environment | environment |
| episodes | Int | number of episodes |
| max_steps | Int | per-episode step cap |
Return
| Type | Description |
|---|---|
| Bool | true if training ran |
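To show how the constructor parameters (alpha, gamma, epsilon, seed) and the training arguments (episodes, max_steps) might fit together, here is a self-contained Python sketch of a seeded SARSA training loop. It is not this library's code: `ChainEnv`, its `reset`/`step` methods, the random tie-breaking in `choose`, and the choice to drop the bootstrap term on terminal transitions are all assumptions, since the `Environment` interface is not described on this page.

```python
import random


class ChainEnv:
    """Hypothetical 4-state corridor: action 0 moves left, action 1 moves right."""
    n_states = 4
    n_actions = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state += 1
        else:
            self.state = max(0, self.state - 1)
        done = self.state == self.n_states - 1
        # Reward 1.0 only on reaching the last state, which ends the episode.
        return self.state, (1.0 if done else 0.0), done


def choose(q, state, epsilon, rng, n_actions):
    """Epsilon-greedy selection; ties among greedy actions are broken randomly."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def train(env, episodes, max_steps, alpha, gamma, epsilon, seed):
    rng = random.Random(seed)  # a private PRNG makes runs reproducible
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        a = choose(q, s, epsilon, rng, env.n_actions)
        for _ in range(max_steps):  # per-episode step cap
            s2, r, done = env.step(a)
            a2 = choose(q, s2, epsilon, rng, env.n_actions)
            # Assumption: no bootstrapping from a terminal state.
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q


params = dict(episodes=200, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=42)
q_a = train(ChainEnv(), **params)
q_b = train(ChainEnv(), **params)
```

Because both runs draw from a PRNG seeded with the same value, `q_a` and `q_b` hold identical tables, which is the kind of reproducibility the `seed` parameter is meant to provide.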