Reinforcement Learning

Fill me in

Grid World

VLQuantitativeFinancePackage.MyPeriodicRectangularGridWorldModel - Type
mutable struct MyPeriodicRectangularGridWorldModel <: AbstractWorldModel

The MyPeriodicRectangularGridWorldModel mutable struct represents a rectangular grid world model.

Required fields

  • number_of_rows::Int: The number of rows in the grid
  • number_of_cols::Int: The number of columns in the grid
  • coordinates::Dict{Int,Tuple{Int,Int}}: A dictionary that holds the coordinates of each cell in the grid where the key is the cell index and the value is a tuple of the row and column index
  • states::Dict{Tuple{Int,Int},Int}: A dictionary that holds the states of each cell in the grid where the key is the cell coordinates and the value is the state index
  • moves::Dict{Int,Tuple{Int,Int}}: A dictionary that holds the moves that can be made from each cell in the grid where the key is the move index and the value is a tuple of the row and column changes associated with the move
  • rewards::Dict{Int,Float64}: A dictionary that holds the rewards for each cell in the grid where the key is the cell index and the value is the reward
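
As a rough illustration of how the four dictionaries relate to each other, the sketch below fills them for a small grid without using the package. The helper name sketch_gridworld, the row-major cell numbering, and the four-move set are assumptions made for this example; the package's own build method may number cells or moves differently.

```julia
# Hypothetical helper (not part of VLQuantitativeFinancePackage): fills the
# four dictionaries described above for an nrows × ncols grid.
function sketch_gridworld(nrows::Int, ncols::Int,
    special::Dict{Tuple{Int,Int},Float64}; defaultreward::Float64 = -1.0)

    coordinates = Dict{Int,Tuple{Int,Int}}()   # cell index => (row, col)
    states = Dict{Tuple{Int,Int},Int}()        # (row, col) => state index
    rewards = Dict{Int,Float64}()              # cell index => reward

    index = 1
    for row in 1:nrows, col in 1:ncols         # assumed row-major numbering
        coordinates[index] = (row, col)
        states[(row, col)] = index
        # cells without an explicit reward fall back to the default
        rewards[index] = get(special, (row, col), defaultreward)
        index += 1
    end

    # assumed move set: 1 = up, 2 = down, 3 = left, 4 = right
    moves = Dict(1 => (-1, 0), 2 => (1, 0), 3 => (0, -1), 4 => (0, 1))

    return (coordinates = coordinates, states = states, moves = moves, rewards = rewards)
end

world = sketch_gridworld(3, 4, Dict((3, 4) => 10.0))
```

Here coordinates and states are inverse lookups of each other, so world.states[world.coordinates[i]] returns i for every cell index i.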

VLQuantitativeFinancePackage.build - Method
function build(type::Type{MyPeriodicRectangularGridWorldModel}, data::NamedTuple) -> MyPeriodicRectangularGridWorldModel

This build method constructs an instance of the MyPeriodicRectangularGridWorldModel type using the data in a NamedTuple.

Arguments

  • type::Type{MyPeriodicRectangularGridWorldModel}: The type of model to build.
  • data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

  • nrows::Int: The number of rows in the grid
  • ncols::Int: The number of columns in the grid
  • rewards::Dict{Tuple{Int,Int}, Float64}: A dictionary that maps the coordinates of the grid to the rewards at those coordinates
  • defaultreward::Float64: The default reward for the grid. This is set to -1.0 by default.

Return

This function returns a populated instance of the MyPeriodicRectangularGridWorldModel type.


Wolfram policies and grids

Fill me in

VLQuantitativeFinancePackage.MyOneDimensionalElementarWolframRuleModel - Type
mutable struct MyOneDimensionalElementarWolframRuleModel <: AbstractPolicyModel

The MyOneDimensionalElementarWolframRuleModel mutable struct represents a one-dimensional elementary Wolfram rule model.

Required fields

  • index::Int: The index of the rule
  • radius::Int: The radius, i.e., the number of cells that influence the next state for this rule
  • rule::Dict{Int,Int}: A dictionary that holds the rule where the key is the binary representation of the neighborhood and the value is the next state
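
To make the rule dictionary concrete, the sketch below expands a rule index into a lookup table using only Base Julia. The helper name sketch_elementary_rule is hypothetical, and reading the digits of the index least-significant first is the standard Wolfram numbering, assumed here rather than taken from the package source.

```julia
# Hypothetical helper (not part of the package): expands a rule index into a
# lookup table from neighborhood value to next state.
function sketch_elementary_rule(index::Int; colors::Int = 2, radius::Int = 3)
    number_of_neighborhoods = colors^radius            # 2^3 = 8 for elementary rules
    # digits returns the least-significant digit first, so digit i + 1 is the
    # next state for the neighborhood whose base-`colors` value is i
    next_states = digits(index, base = colors, pad = number_of_neighborhoods)
    return Dict(i => next_states[i + 1] for i in 0:(number_of_neighborhoods - 1))
end

rule = sketch_elementary_rule(30)

# look up the next state for the neighborhood (left, center, right) = (1, 0, 0)
neighborhood = [1, 0, 0]
key = parse(Int, join(neighborhood), base = 2)   # binary 100 => 4
next_state = rule[key]
```

For Rule 30 the neighborhood 100 maps to 1, so next_state is 1 in this example.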

VLQuantitativeFinancePackage.build - Method
function build(modeltype::Type{MyOneDimensionalElementarWolframRuleModel}, data::NamedTuple) -> MyOneDimensionalElementarWolframRuleModel

This build method constructs an instance of the MyOneDimensionalElementarWolframRuleModel type using the data in a NamedTuple.

Arguments

  • modeltype::Type{MyOneDimensionalElementarWolframRuleModel}: The type of model to build, in this case, the MyOneDimensionalElementarWolframRuleModel type.
  • data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

  • index::Int64: The index of the Wolfram rule
  • colors::Int64: The number of colors in the rule
  • radius::Int64: The radius, i.e., the number of cells to consider in the rule

Return

This function returns a populated instance of the MyOneDimensionalElementarWolframRuleModel type.


VLQuantitativeFinancePackage.MyTwoDimensionalTotalisticWolframRuleModel - Type
mutable struct MyTwoDimensionalTotalisticWolframRuleModel <: AbstractPolicyModel

The MyTwoDimensionalTotalisticWolframRuleModel mutable struct represents a two-dimensional totalistic Wolfram rule model.

Required fields

  • index::Int: The index of the rule
  • radius::Int: The radius, i.e., the number of cells that influence the next state for this rule
  • number_of_colors::Int: The number of colors in the rule, i.e., the number of states a cell can exist in
  • rule::Dict{Int,Int}: A dictionary that holds the rule where the key is the index of the neighborhood state and the value is the next state
  • neighborhoodstatesmap::Dict{Float64, Int64}: A dictionary whose keys are the averages of the neighborhood states and whose values are the corresponding neighborhood state indices
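
The relationship between rule and neighborhoodstatesmap can be sketched as follows. This is an illustration under stated assumptions, not the package implementation: the helper name sketch_totalistic_rule is hypothetical, the neighborhood is taken to contain exactly radius cells, and the keys are rounded to two digits so that Float64 lookups are robust.

```julia
# Hypothetical helper (not part of the package): builds both dictionaries for
# a totalistic rule with `colors` cell states and `radius` neighbor cells.
function sketch_totalistic_rule(index::Int; colors::Int = 2, radius::Int = 4)
    # the neighborhood sum takes the values 0, 1, ..., radius * (colors - 1)
    number_of_states = radius * (colors - 1) + 1
    next_states = digits(index, base = colors, pad = number_of_states)

    rule = Dict{Int,Int}()
    neighborhoodstatesmap = Dict{Float64,Int}()
    for i in 0:(number_of_states - 1)
        rule[i] = next_states[i + 1]
        # key: average of the neighborhood (rounding is an assumption)
        neighborhoodstatesmap[round(i / radius, digits = 2)] = i
    end
    return rule, neighborhoodstatesmap
end

rule, statesmap = sketch_totalistic_rule(22; colors = 2, radius = 4)

# two of the four neighbors are in state 1, so the average is 0.5
neighbors = [1, 0, 1, 0]
average = round(sum(neighbors) / length(neighbors), digits = 2)
next_state = rule[statesmap[average]]
```

The two-step lookup (average to state index, state index to next state) is what lets the rule table stay integer-keyed while the neighborhood is summarized by a floating-point average.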

VLQuantitativeFinancePackage.build - Method
function build(modeltype::Type{MyTwoDimensionalTotalisticWolframRuleModel}, data::NamedTuple) -> MyTwoDimensionalTotalisticWolframRuleModel

This build method constructs an instance of the MyTwoDimensionalTotalisticWolframRuleModel type using the data in a NamedTuple.

Arguments

  • modeltype::Type{MyTwoDimensionalTotalisticWolframRuleModel}: The type of model to build.
  • data::NamedTuple: The data to use to build the model.

The data::NamedTuple must contain the following keys:

  • index::Int64: The index of the Wolfram rule
  • colors::Int64: The number of colors in the rule
  • radius::Int64: The radius, i.e., the number of neighbor cells to consider in the rule

Return

This function returns a populated instance of the MyTwoDimensionalTotalisticWolframRuleModel type.


VLQuantitativeFinancePackage.solve - Method
solve(rulemodel::MyTwoDimensionalTotalisticWolframRuleModel, initialstate::Array{Int64,2}; steps::Int64 = 100) -> Dict{Int64, Array{Int64,2}}

The solve function evolves the two-dimensional totalistic Wolfram rule model from a given initial state for a fixed number of steps and records the state of the world at each step.

Arguments

  • rulemodel::MyTwoDimensionalTotalisticWolframRuleModel: An instance of the MyTwoDimensionalTotalisticWolframRuleModel type that defines the rule model parameters.
  • initialstate::Array{Int64,2}: A two-dimensional array that represents the initial state of the world.
  • steps::Int64: The number of steps to simulate. The default value is 100.

Returns

  • Dict{Int64, Array{Int64,2}}: A dictionary where the keys are integers (time steps) and the values are two-dimensional arrays that represent the state of the world at each time step.
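
The shape of such a simulation loop can be sketched with Base Julia alone. Everything specific here is an assumption made for illustration: the function name sketch_solve, the von Neumann neighborhood (up, down, left, right), the periodic boundaries, storing the initial state under key 0, and keying the rule directly by the neighborhood sum instead of going through neighborhoodstatesmap.

```julia
# Hypothetical stand-in for solve (not the package implementation).
function sketch_solve(rule::Dict{Int,Int}, initialstate::Array{Int,2}; steps::Int = 100)
    nrows, ncols = size(initialstate)
    frames = Dict{Int,Array{Int,2}}(0 => copy(initialstate))   # key 0: initial state
    for t in 1:steps
        current = frames[t - 1]
        updated = similar(current)
        for row in 1:nrows, col in 1:ncols
            # mod1 wraps indices into 1:n, which gives periodic boundaries
            total = current[mod1(row - 1, nrows), col] + current[mod1(row + 1, nrows), col] +
                    current[row, mod1(col - 1, ncols)] + current[row, mod1(col + 1, ncols)]
            updated[row, col] = rule[total]
        end
        frames[t] = updated
    end
    return frames
end

# toy rule: a cell becomes 1 when exactly one of its four neighbors is 1
rule = Dict(0 => 0, 1 => 1, 2 => 0, 3 => 0, 4 => 0)
initialstate = zeros(Int, 5, 5)
initialstate[3, 3] = 1
frames = sketch_solve(rule, initialstate; steps = 2)
```

Each frame is computed from the previous one only, so the whole grid updates synchronously, as in a standard cellular automaton.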

Wolfram Q-learning

Fill me in

VLQuantitativeFinancePackage.MyWolframRuleQLearningAgentModel - Type
mutable struct MyWolframRuleQLearningAgentModel <: AbstractLearningModel

The MyWolframRuleQLearningAgentModel mutable struct represents a Q-learning agent model for a Wolfram rule.

Required fields

  • states::Array{Int,1}: The states of the model
  • actions::Array{Int,1}: The actions of the model
  • γ::Float64: The discount factor
  • α::Float64: The learning rate
  • Q::Array{Float64,2}: The Q-table of the model
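
How γ, α, and Q interact can be shown with a single tabular Q-learning update. This is a generic sketch, not code from the package: the helper name sketch_update! is hypothetical, and the layout of Q with one row per state and one column per action is an assumption.

```julia
# Hypothetical helper: one tabular Q-learning update,
# Q[s, a] <- Q[s, a] + α * (r + γ * max over a' of Q[s_next, a'] - Q[s, a])
function sketch_update!(Q::Array{Float64,2}, s::Int, a::Int, r::Float64, s_next::Int;
    α::Float64 = 0.1, γ::Float64 = 0.95)
    target = r + γ * maximum(Q[s_next, :])   # bootstrapped estimate of the return
    Q[s, a] += α * (target - Q[s, a])        # move the entry toward the target
    return Q
end

states = [1, 2, 3]
actions = [1, 2]
Q = zeros(length(states), length(actions))   # assumed layout: states × actions

# taking action 2 in state 1 yields reward 1.0 and leads to state 2
sketch_update!(Q, 1, 2, 1.0, 2; α = 0.5, γ = 0.9)
```

With an all-zero table the target equals the reward, so the updated entry Q[1, 2] is α times the reward, here 0.5.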

VLQuantitativeFinancePackage.MyWolframGridWorldModel - Type
mutable struct MyWolframGridWorldModel <: AbstractWorldModel

The MyWolframGridWorldModel mutable struct represents a grid world model for a Wolfram rule.

Required fields

  • number_of_states::Int64: The number of states of the agent
  • data::Dict{Int64, Array{Int64,1}}: A dictionary that holds temporal playback data where the key is the time step and the value is an array holding the neighborhood of the agent (the first radius elements) and the next state (the last element).
  • policymap::Dict{Float64, Int64}: A dictionary that holds the policy map where the key is the average of the neighborhood and the value is the state index
  • world::Function: A function that represents the world model. The function takes the world model, the time step t, the state s, and the proposed action a, and returns the next state of the agent and the immediate reward for taking action a in state s.

VLQuantitativeFinancePackage.sample - Method
sample(agent::MyWolframRuleQLearningAgentModel, environment::MyWolframGridWorldModel; maxsteps::Int = 100, ϵ::Float64 = 0.2) -> MyWolframRuleQLearningAgentModel

The sample function runs the Q-learning agent in the environment for a given number of steps. At each step, the agent selects an action with an epsilon-greedy method, calls the world function contained in the environment model to obtain the next state and the reward, and updates the Q-table using the Q-learning update rule.

Arguments

  • agent::MyWolframRuleQLearningAgentModel: An instance of the MyWolframRuleQLearningAgentModel type that defines the agent model parameters.
  • environment::MyWolframGridWorldModel: An instance of the MyWolframGridWorldModel type that defines the world model.
  • maxsteps::Int: The number of steps to sample. The default value is 100.
  • ϵ::Float64: The epsilon value for the epsilon-greedy method. The default value is 0.2.

Returns

  • MyWolframRuleQLearningAgentModel: The agent model with its Q-table updated by the sampled steps.
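
The sampling loop described above can be sketched in a self-contained form. This is an illustration under stated assumptions, not the package implementation: the name sketch_sample!, the start state 1, the α and γ keyword arguments, and the simplified world signature (t, s, a) -> (s_next, r) are all hypothetical; the documented world function additionally receives the world model as its first argument.

```julia
using Random

# Hypothetical stand-in for the sampling loop.
function sketch_sample!(Q::Array{Float64,2}, world::Function; maxsteps::Int = 100,
    ϵ::Float64 = 0.2, α::Float64 = 0.1, γ::Float64 = 0.95, rng = Random.default_rng())

    number_of_actions = size(Q, 2)
    s = 1                                            # assumed start state
    for t in 1:maxsteps
        # ϵ-greedy: explore with probability ϵ, otherwise take the greedy action
        a = rand(rng) < ϵ ? rand(rng, 1:number_of_actions) : argmax(Q[s, :])
        s_next, r = world(t, s, a)
        # Q-learning update toward the bootstrapped target
        Q[s, a] += α * (r + γ * maximum(Q[s_next, :]) - Q[s, a])
        s = s_next
    end
    return Q
end

# toy world with two states that alternate; action 2 pays 1.0, action 1 pays nothing
toyworld(t, s, a) = (s == 1 ? 2 : 1, a == 2 ? 1.0 : 0.0)

Q = sketch_sample!(zeros(2, 2), toyworld; maxsteps = 2000, rng = MersenneTwister(1))
```

In this toy world the second action is always better, so after enough steps the second column of Q should dominate the first in both rows.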