Reinforcement Learning

Fill me in

Grid World

VLQuantitativeFinancePackage.MyPeriodicRectangularGridWorldModel — Type

mutable struct MyPeriodicRectangularGridWorldModel <: AbstractWorldModel

The MyPeriodicRectangularGridWorldModel mutable struct represents a rectangular grid world model.

Required fields

number_of_rows::Int: The number of rows in the grid
number_of_cols::Int: The number of columns in the grid
coordinates::Dict{Int,Tuple{Int,Int}}: A dictionary that holds the coordinates of each cell in the grid where the key is the cell index and the value is a tuple of the row and column index
states::Dict{Tuple{Int,Int},Int}: A dictionary that holds the states of each cell in the grid where the key is the cell coordinates and the value is the state index
moves::Dict{Int,Tuple{Int,Int}}: A dictionary that holds the moves that can be made from each cell in the grid where the key is the move index and the value is a tuple of the row and column changes associated with the move
rewards::Dict{Int,Float64}: A dictionary that holds the rewards for each cell in the grid where the key is the cell index and the value is the reward

source
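A minimal sketch of how the fields listed above could fit together. It assumes row-major cell numbering, four moves, and wrap-around at the grid edges (suggested by the word Periodic in the type name, not stated in the docstring). `PeriodicGridSketch` and `periodic_step` are hypothetical names and are not part of the package; the rewards field is left out for brevity.

```julia
# Hypothetical stand-in for the documented fields; not the package's own type.
struct PeriodicGridSketch
    number_of_rows::Int
    number_of_cols::Int
    coordinates::Dict{Int,Tuple{Int,Int}}   # cell index => (row, col)
    states::Dict{Tuple{Int,Int},Int}        # (row, col) => cell index
    moves::Dict{Int,Tuple{Int,Int}}         # move index => (Δrow, Δcol)
end

function PeriodicGridSketch(nrows::Int, ncols::Int)
    coordinates = Dict{Int,Tuple{Int,Int}}()
    states = Dict{Tuple{Int,Int},Int}()
    index = 1
    for row in 1:nrows, col in 1:ncols      # assumed row-major numbering
        coordinates[index] = (row, col)
        states[(row, col)] = index
        index += 1
    end
    # Assumed move ordering: 1 = up, 2 = down, 3 = left, 4 = right.
    moves = Dict(1 => (-1, 0), 2 => (1, 0), 3 => (0, -1), 4 => (0, 1))
    return PeriodicGridSketch(nrows, ncols, coordinates, states, moves)
end

# Apply move `a` from cell `s` and wrap around the edges.
# mod1 maps any integer into the range 1:n, which gives the periodic boundary.
function periodic_step(grid::PeriodicGridSketch, s::Int, a::Int)
    (row, col) = grid.coordinates[s]
    (Δrow, Δcol) = grid.moves[a]
    wrapped = (mod1(row + Δrow, grid.number_of_rows), mod1(col + Δcol, grid.number_of_cols))
    return grid.states[wrapped]
end

grid = PeriodicGridSketch(3, 4)
periodic_step(grid, 1, 1)   # up from (1, 1) wraps to (3, 1), which is cell 9
```

Keeping both coordinates and states makes the lookup cheap in either direction: index to coordinates when applying a move, coordinates back to index when reporting the next state.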

VLQuantitativeFinancePackage.build — Method

function build(type::MyPeriodicRectangularGridWorldModel, nrows::Int, ncols::Int, 
    rewards::Dict{Tuple{Int,Int}, Float64}; defaultreward::Float64 = -1.0) -> MyPeriodicRectangularGridWorldModel

The build method constructs an instance of the MyPeriodicRectangularGridWorldModel type using the data in the NamedTuple.

Arguments

type::MyPeriodicRectangularGridWorldModel: The type of model to build.
data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

nrows::Int: The number of rows in the grid
ncols::Int: The number of columns in the grid
rewards::Dict{Tuple{Int,Int}, Float64}: A dictionary that maps the coordinates of the grid to the rewards at those coordinates
defaultreward::Float64: The default reward for the grid. This is set to -1.0 by default.

Return

This function returns a populated instance of the MyPeriodicRectangularGridWorldModel type.

source
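The rewards input is keyed by coordinates, while the rewards field of the model is keyed by cell index. The sketch below shows one way a sparse coordinate dictionary could be expanded into a full per-cell dictionary using the default reward. The helper name `fill_rewards` and the row-major numbering are assumptions for illustration, not the package's implementation.

```julia
# Hypothetical helper: expand a sparse (row, col) => reward dictionary into
# the cell-index => reward layout described for the model's rewards field.
function fill_rewards(nrows::Int, ncols::Int,
    rewards::Dict{Tuple{Int,Int},Float64}; defaultreward::Float64 = -1.0)

    filled = Dict{Int,Float64}()
    for row in 1:nrows, col in 1:ncols
        index = (row - 1) * ncols + col     # assumed row-major numbering
        # Cells that are not listed fall back to the default reward.
        filled[index] = get(rewards, (row, col), defaultreward)
    end
    return filled
end

special = Dict((1, 3) => 10.0, (2, 2) => -100.0)
filled = fill_rewards(2, 3, special)
```

With this layout only the exceptional cells (goals, hazards) need to be listed; every other cell carries the default step cost.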

Wolfram policies and grids

Fill me in

VLQuantitativeFinancePackage.MyOneDimensionalElementarWolframRuleModel — Type

mutable struct MyOneDimensionalElementarWolframRuleModel <: AbstractPolicyModel

The MyOneDimensionalElementarWolframRuleModel mutable struct represents a one-dimensional elementary Wolfram rule model.

Required fields

index::Int: The index of the rule
radius::Int: The radius, i.e., the number of cells that influence the next state for this rule
rule::Dict{Int,Int}: A dictionary that holds the rule where the key is the binary representation of the neighborhood and the value is the next state

source

VLQuantitativeFinancePackage.build — Method

function build(modeltype::Type{MyOneDimensionalElementarWolframRuleModel}, data::NamedTuple) -> MyOneDimensionalElementarWolframRuleModel

This build method constructs an instance of the MyOneDimensionalElementarWolframRuleModel type using the data in a NamedTuple.

Arguments

modeltype::Type{MyOneDimensionalElementarWolframRuleModel}: The type of model to build, in this case, the MyOneDimensionalElementarWolframRuleModel type.
data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

index::Int64: The index of the Wolfram rule
colors::Int64: The number of colors in the rule
radius::Int64: The radius, i.e., the number of cells to consider in the rule

Return

This function returns a populated instance of the MyOneDimensionalElementarWolframRuleModel type.

source
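A sketch of how a rule dictionary can be derived from the index, colors, and radius keys, using the standard Wolfram numbering in which the digits of the rule index (in base colors) give the output for each neighborhood. The helper names `rule_table` and `next_row`, the periodic boundary, and the left-to-right neighborhood encoding are assumptions; the package may construct or apply its rule differently.

```julia
# Hypothetical helper mirroring the documented keys (index, colors, radius).
function rule_table(index::Int, colors::Int, radius::Int)
    number_of_neighborhoods = colors^radius
    # digits returns the least-significant digit first, so entry i + 1 is the
    # output for the neighborhood whose base-`colors` value is i.
    outputs = digits(index, base = colors, pad = number_of_neighborhoods)
    return Dict{Int,Int}(i => outputs[i + 1] for i in 0:(number_of_neighborhoods - 1))
end

# Advance one row of cells by one step, wrapping at the ends (an assumption).
function next_row(rule::Dict{Int,Int}, cells::Vector{Int}, colors::Int, radius::Int)
    n = length(cells)
    half = radius ÷ 2                       # assumes an odd radius centered on the cell
    updated = similar(cells)
    for i in 1:n
        key = 0
        for offset in -half:half            # leftmost cell is the most significant digit
            key = key * colors + cells[mod1(i + offset, n)]
        end
        updated[i] = rule[key]
    end
    return updated
end

rule30 = rule_table(30, 2, 3)
next_row(rule30, [0, 0, 0, 1, 0, 0, 0], 2, 3)   # [0, 0, 1, 1, 1, 0, 0]
```

For rule 30 with two colors and three cells, the table has 2^3 = 8 entries, one per neighborhood pattern.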

VLQuantitativeFinancePackage.MyTwoDimensionalTotalisticWolframRuleModel — Type

mutable struct MyTwoDimensionalTotalisticWolframRuleModel <: AbstractPolicyModel

The MyTwoDimensionalTotalisticWolframRuleModel mutable struct represents a two-dimensional totalistic Wolfram rule model.

Required fields

index::Int: The index of the rule
radius::Int: The radius, i.e., the number of cells that influence the next state for this rule
number_of_colors::Int: The number of colors in the rule, i.e., the number of states a cell can exist in
rule::Dict{Int,Int}: A dictionary that holds the rule where the key is the index of the neighborhood state and the value is the next state
neighborhoodstatesmap::Dict{Float64, Int64}: A dictionary whose keys are the averages of the neighborhood states and whose values are the neighborhood state indices

source

VLQuantitativeFinancePackage.build — Method

function build(modeltype::Type{MyTwoDimensionalTotalisticWolframRuleModel}, data::NamedTuple) -> MyTwoDimensionalTotalisticWolframRuleModel

This build method constructs an instance of the MyTwoDimensionalTotalisticWolframRuleModel type using the data in a NamedTuple.

Arguments

modeltype::Type{MyTwoDimensionalTotalisticWolframRuleModel}: The type of model to build.
data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

index::Int64: The index of the Wolfram rule
colors::Int64: The number of colors in the rule
radius::Int64: The radius, i.e., the number of neighbor cells to consider in the rule

Return

This function returns a populated instance of the MyTwoDimensionalTotalisticWolframRuleModel type.

source
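A sketch of how the rule and neighborhoodstatesmap fields could be populated for a totalistic rule, where the next state depends only on the total (equivalently, the average) of the neighborhood. The helper name `totalistic_tables`, the zero-based state indices, and the rounding of averages to two digits are assumptions made for illustration; two-digit rounding only keeps keys distinct for a radius of at most 100.

```julia
# Hypothetical helper mirroring the documented keys (index, colors, radius).
function totalistic_tables(index::Int, colors::Int, radius::Int)
    # With `radius` cells each in 0:(colors - 1), the neighborhood sum ranges
    # over 0:radius*(colors - 1), so there are radius*(colors - 1) + 1 states.
    number_of_states = radius * (colors - 1) + 1

    # Digits of the rule index in base `colors`, least-significant first:
    # entry i + 1 is the next state for neighborhood state i.
    outputs = digits(index, base = colors, pad = number_of_states)
    rule = Dict{Int,Int}(i => outputs[i + 1] for i in 0:(number_of_states - 1))

    # Key each neighborhood state by its average, rounded so that Float64
    # keys computed later from a neighborhood match exactly.
    statesmap = Dict{Float64,Int}(
        round(i / radius, digits = 2) => i for i in 0:(number_of_states - 1))

    return rule, statesmap
end

rule, statesmap = totalistic_tables(6, 2, 4)
```

For two colors and four neighbors there are five possible sums (0 through 4), so the rule has five entries instead of the 2^4 = 16 a general rule would need.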

VLQuantitativeFinancePackage.solve — Method

solve(rulemodel::MyTwoDimensionalTotalisticWolframRuleModel, initialstate::Array{Int64,2}; steps::Int64 = 100) -> Dict{Int64, Array{Int64,2}}

The solve function solves the two-dimensional totalistic Wolfram rule model for a given initial state.

Arguments

rulemodel::MyTwoDimensionalTotalisticWolframRuleModel: An instance of the MyTwoDimensionalTotalisticWolframRuleModel type that defines the rule model parameters.
initialstate::Array{Int64,2}: A two-dimensional array that represents the initial state of the world.
steps::Int64: The number of steps to simulate. The default value is 100.

Returns

Dict{Int64, Array{Int64,2}}: A dictionary where the keys are integers (time steps) and the values are two-dimensional arrays that represent the state of the world at each time step.

source
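A sketch of a simulation loop with the same input and output shapes as solve. The function name `simulate`, the four-cell von Neumann neighborhood, the periodic boundary, and storing the initial state under key 0 are all assumptions; the rule and map tables are written by hand for two colors and do not come from the package.

```julia
using Statistics: mean

function simulate(rule::Dict{Int,Int}, statesmap::Dict{Float64,Int},
    initialstate::Array{Int64,2}; steps::Int64 = 100)

    # Assumed convention: key 0 holds the initial state, key t the state after t steps.
    frames = Dict{Int64,Array{Int64,2}}(0 => copy(initialstate))
    (nrows, ncols) = size(initialstate)
    for t in 1:steps
        current = frames[t - 1]
        next = copy(current)
        for row in 1:nrows, col in 1:ncols
            # Assumed neighborhood: up, down, left, right, wrapped at the edges.
            neighbors = [
                current[mod1(row - 1, nrows), col], current[mod1(row + 1, nrows), col],
                current[row, mod1(col - 1, ncols)], current[row, mod1(col + 1, ncols)]]
            key = round(mean(neighbors), digits = 2)   # average => neighborhood state
            next[row, col] = rule[statesmap[key]]
        end
        frames[t] = next
    end
    return frames
end

# Hand-written tables: a cell becomes 1 when exactly one or two neighbors are 1.
rule = Dict(0 => 0, 1 => 1, 2 => 1, 3 => 0, 4 => 0)
statesmap = Dict(0.0 => 0, 0.25 => 1, 0.5 => 2, 0.75 => 3, 1.0 => 4)
initialstate = zeros(Int64, 5, 5)
initialstate[3, 3] = 1
frames = simulate(rule, statesmap, initialstate; steps = 2)
```

Each step reads only from the previous frame and writes into a fresh copy, so every cell is updated synchronously.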

Wolfram Q-learning

Fill me in

VLQuantitativeFinancePackage.MyWolframRuleQLearningAgentModel — Type

mutable struct MyWolframRuleQLearningAgentModel <: AbstractLearningModel

The MyWolframRuleQLearningAgentModel mutable struct represents a Q-learning agent model for a Wolfram rule.

Required fields

states::Array{Int,1}: The states of the model
actions::Array{Int,1}: The actions of the model
γ::Float64: The discount factor
α::Float64: The learning rate
Q::Array{Float64,2}: The Q-table of the model

source

VLQuantitativeFinancePackage.build — Method

function build(modeltype::Type{MyWolframRuleQLearningAgentModel}, data::NamedTuple) -> MyWolframRuleQLearningAgentModel

source

VLQuantitativeFinancePackage.MyWolframGridWorldModel — Type

mutable struct MyWolframGridWorldModel <: AbstractWorldModel

The MyWolframGridWorldModel mutable struct represents a grid world model for a Wolfram rule.

Required fields

number_of_states::Int64: The number of states of the agent
data::Dict{Int64, Array{Int64,1}}: A dictionary that holds temporal playback data where the key is the time step and the value is an array holding the neighborhood of the agent (the first radius elements) and the next state (the last element).
policymap::Dict{Float64, Int64}: A dictionary that holds the policy map where the key is the average of the neighborhood and the value is the state index
world::Function: A function that represents the world model. The function takes the world model, the time step t, the state s, and the proposed action a, and returns the next state of the agent and the immediate reward for taking action a in state s.

source

VLQuantitativeFinancePackage.build — Method

function build(modeltype::Type{MyWolframGridWorldModel}, data::NamedTuple) -> MyWolframGridWorldModel

source

VLQuantitativeFinancePackage.sample — Method

sample(agent::MyWolframRuleQLearningAgentModel, environment::MyWolframGridWorldModel; maxsteps::Int = 100, ϵ::Float64 = 0.2) -> MyWolframRuleQLearningAgentModel

The sample function samples the environment with the Q-learning agent model for a given number of steps. At each step, the agent selects an action using the epsilon-greedy method, calls the world function contained in the environment model to obtain the next state and the immediate reward, and updates the Q-table using the Q-learning update rule.

Arguments

agent::MyWolframRuleQLearningAgentModel: An instance of the MyWolframRuleQLearningAgentModel type that defines the agent model parameters.
environment::MyWolframGridWorldModel: An instance of the MyWolframGridWorldModel type that defines the world model.
maxsteps::Int: The number of steps to sample. The default value is 100.
ϵ::Float64: The epsilon value for the epsilon-greedy method. The default value is 0.2.

Returns

MyWolframRuleQLearningAgentModel: An updated instance of the MyWolframRuleQLearningAgentModel type, where the Q-table has been updated.

source
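A sketch of the loop described above: epsilon-greedy action selection followed by the standard Q-learning update, Q[s, a] ← Q[s, a] + α (r + γ max Q[s′, ·] − Q[s, a]). The names `QAgentSketch` and `sample_sketch`, the `rng` keyword, the toy two-state world, and the assumption that states and actions are numbered 1:n so they index the Q-table directly are all hypothetical and not taken from the package.

```julia
using Random

# Hypothetical stand-in for the documented agent fields.
mutable struct QAgentSketch
    states::Array{Int,1}
    actions::Array{Int,1}
    γ::Float64              # discount factor
    α::Float64              # learning rate
    Q::Array{Float64,2}     # rows are states, columns are actions
end

function sample_sketch(agent::QAgentSketch, environment; maxsteps::Int = 100,
    ϵ::Float64 = 0.2, rng::AbstractRNG = Random.default_rng())

    s = rand(rng, agent.states)             # assumed random starting state
    for t in 1:maxsteps
        # Epsilon-greedy: explore with probability ϵ, otherwise act greedily.
        a = rand(rng) < ϵ ? rand(rng, agent.actions) : argmax(agent.Q[s, :])
        # The world function returns the next state and the immediate reward.
        (s′, r) = environment.world(environment, t, s, a)
        # Q-learning update rule.
        agent.Q[s, a] += agent.α * (r + agent.γ * maximum(agent.Q[s′, :]) - agent.Q[s, a])
        s = s′
    end
    return agent
end

# Toy world: the state alternates between 1 and 2, and action 2 pays a reward of 1.
toy = (world = (model, t, s, a) -> (s == 1 ? 2 : 1, a == 2 ? 1.0 : 0.0),)
agent = QAgentSketch([1, 2], [1, 2], 0.9, 0.5, zeros(2, 2))
sample_sketch(agent, toy; maxsteps = 5000, rng = MersenneTwister(1))
```

After enough steps, the Q-table in this toy world should rank action 2 above action 1 in both states, since only action 2 is rewarded.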