Dynamic programming and Heuristic Optimization#
Dynamic programming#
Dynamic programming is an approach developed by Richard Bellman in the 1950s that solves problems by breaking them down into smaller subproblems and storing the solutions to these subproblems. This allows the program to avoid recomputing the solution to the same subproblem multiple times and instead reuse the stored solutions, significantly improving the algorithm’s efficiency.
Dynamic programming is typically used for problems that can be divided into overlapping subproblems, i.e., the solution to one subproblem can be used to solve other subproblems. This is often the case with optimization problems, where the goal is to find the optimal solution among possible solutions. It is also true for sequential decision problems.
Dynamic decision problems#
Imagine we have a decision-making agent in some state \(x_{t}\) at time \(t\). The agent takes an action \(a_{t}\) from a set of possible actions \(a_{t}\in\mathcal{A}\left(x_{t}\right)\) that leads to a new state \(x_{t+1} = T(x_{t},a_{t})\) and a payoff \(F(x_{t},a_{t})\). In this situation, an infinite-horizon decision problem takes the form:

\[
V(x_{\circ}) = \max_{\left\{a_{t}\right\}_{t=0}^{\infty}}\sum_{t=0}^{\infty}\beta^{t}F(x_{t},a_{t})
\]
where the function \(V(x_{\circ})\) is the value function, \(x_{\circ}\) is the initial state of the agent and \(0\leq\beta\leq{1}\) denotes the discount factor. Dynamic programming breaks this decision problem into smaller subproblems using Bellman’s principle of optimality (Observation 8):
Observation 8 (Bellman’s principle of optimality)
Principle of optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
The optimality principle suggests that we can pull the first decision out of the sum, which gives a value function of the form:

\[
V(x_{\circ}) = \max_{a_{\circ}\in\mathcal{A}(x_{\circ})}\left\{F(x_{\circ},a_{\circ}) + \beta\left[\max_{\left\{a_{t}\right\}_{t=1}^{\infty}}\sum_{t=1}^{\infty}\beta^{t-1}F(x_{t},a_{t})\right]\right\}
\]
However, the quantity inside the brackets of the \(V(x_{\circ})\) expression is just \(V(x_{1})\), i.e., the value of the decision problem starting at time step 1 instead of time step 0. Thus, we can re-write the expression above as:

\[
V(x_{\circ}) = \max_{a_{\circ}\in\mathcal{A}(x_{\circ})}\left\{F(x_{\circ},a_{\circ}) + \beta{V}(x_{1})\right\}
\]
Dropping the subscripts gives rise to a Bellman equation of the form shown in Definition 46:
Definition 46 (Bellman equation)
The optimal value of a state \(x\), denoted by \(V(x)\), is governed by a Bellman equation of the form:

\[
V(x) = \max_{a\in\mathcal{A}}\left\{F(x,a) + \beta{V}\left(T(x,a)\right)\right\}
\]
where \(\mathcal{A}\) denotes the set of possible actions open to the decision maker and \(0\leq\beta\leq{1}\) represents the discount factor.
Equation (76) is a functional equation, i.e., an equation in which functions appear as unknowns. Thus, solving Eqn. (76) gives the unknown value function \(V(x)\). Further, by calculating the value function, we will find the policy function \(a(x)\) that describes the optimal action as a function of the system state \(x\).
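To make the Bellman equation concrete, here is a minimal Python sketch (not part of the original lecture) that solves a small, hypothetical finite-state problem by value iteration, i.e., by repeatedly applying the Bellman equation until \(V(x)\) stops changing. The 3-state, 2-action transition table `T`, the payoff table `F`, and the discount factor are illustrative assumptions, not values from the course.

```python
import numpy as np

# Hypothetical problem: 3 states, 2 actions (all numbers are illustrative)
n_states, n_actions = 3, 2
beta = 0.9  # discount factor, 0 <= beta <= 1

# T[x, a] -> next state, F[x, a] -> payoff for taking action a in state x
T = np.array([[1, 2],
              [0, 2],
              [2, 0]])
F = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.5, 1.0]])

# Value iteration: repeatedly apply V(x) = max_a { F(x,a) + beta * V(T(x,a)) }
V = np.zeros(n_states)
for _ in range(1000):
    Q = F + beta * V[T]       # Q[x, a] = F(x,a) + beta * V(T(x,a))
    V_new = Q.max(axis=1)     # Bellman update
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)     # a(x): the optimal action in each state
print("V(x) =", V, "a(x) =", policy)
```

Solving the fixed point of the Bellman equation gives us both the value function \(V(x)\) and, as a by-product, the policy function \(a(x)\).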
Heuristic optimization#
Suppose we want to maximize or minimize a nonlinear objective function \(f(x)\) subject to some (potentially nonlinear) constraints \(g(x)\). One of the first things we might try is the method of Lagrange multipliers:
Definition 47 (Method of Lagrange multipliers)
To find the optimum of a function \(f(x)\) subject to the equality constraint \(g(x) = 0\), we form the Lagrangian \(\mathcal{L}(x,\lambda)\):

\[
\mathcal{L}(x,\lambda) = f(x) + \lambda\,g(x)
\]

where \(\lambda\) is the Lagrange multiplier for the constraint \(g(x)\). At a critical point (maximum or minimum), the partial derivatives of the Lagrangian with respect to \(x\) and \(\lambda\) vanish:

\[
\frac{\partial\mathcal{L}}{\partial{x}} = \frac{\partial{f}}{\partial{x}} + \lambda\frac{\partial{g}}{\partial{x}} = 0,\qquad
\frac{\partial\mathcal{L}}{\partial\lambda} = g(x) = 0
\]
This system of equations, which represents first-order optimality conditions, can be solved for the critical points and, thus, the maximum or minimum value of the objective function.
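For instance (a standard textbook example, not from the original notes), to maximize \(f(x_{1},x_{2}) = x_{1}x_{2}\) subject to \(g(x_{1},x_{2}) = x_{1}+x_{2}-1 = 0\), we form \(\mathcal{L} = x_{1}x_{2} + \lambda\left(x_{1}+x_{2}-1\right)\). Setting the partial derivatives to zero gives \(x_{2}+\lambda = 0\), \(x_{1}+\lambda = 0\), and \(x_{1}+x_{2}-1 = 0\), so \(x_{1} = x_{2} = 1/2\), \(\lambda = -1/2\), and the constrained maximum is \(f = 1/4\).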
However, numerically solving this system of first-order optimality equations directly can be challenging, especially when the objective function \(f(x)\) or the constraints \(g(x)\) are non-convex or expensive to evaluate. Alternatively, for unconstrained problems we could use a numerical optimization algorithm such as gradient descent. Gradient descent is an iterative optimization algorithm that uses the gradient of the objective function to update the solution in the direction of steepest descent:

\[
x_{i+1} = x_{i} - \alpha\nabla{f}(x_{i})
\]
where \(x_{i}\) is the current solution, \(\alpha\) is the step size, and \(\nabla{f}(x_{i})\) is the gradient of the objective function at \(x_{i}\). If we have constraints, we can use a penalty function approach to enforce the constraints, i.e., we add a penalty term to the objective function that penalizes violations of the constraints.
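As a rough illustration (not part of the original lecture), here is a minimal Python sketch of gradient descent with a quadratic penalty for an equality constraint. The objective, the constraint \(x_{1}+x_{2}-1=0\), the step size, and the penalty weight are all assumed values chosen for the example.

```python
import numpy as np

def f(x):
    """Objective: a simple convex quadratic (illustrative choice)."""
    return x[0]**2 + 2.0 * x[1]**2

def g(x):
    """Equality constraint g(x) = 0, here x1 + x2 - 1 = 0 (illustrative)."""
    return x[0] + x[1] - 1.0

def penalized_gradient(x, mu):
    """Analytical gradient of f(x) + (mu/2) * g(x)^2."""
    grad_f = np.array([2.0 * x[0], 4.0 * x[1]])
    grad_penalty = mu * g(x) * np.array([1.0, 1.0])
    return grad_f + grad_penalty

# Larger mu enforces the constraint more tightly, but requires a smaller step size alpha
alpha, mu = 0.005, 100.0
x = np.array([0.0, 0.0])   # initial guess

# Gradient descent on the penalized objective: x <- x - alpha * gradient
for _ in range(5000):
    x = x - alpha * penalized_gradient(x, mu)

print("x* ~", x, "f(x*) ~", f(x), "constraint violation ~", g(x))
```

Note that the penalty approach only satisfies the constraint approximately; the violation shrinks as the penalty weight grows.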
However, gradient descent may not always work, especially when the objective function \(f(x)\) or the constraints \(g(x)\) have awkward properties, e.g., they are non-differentiable. Moreover, we may not have a mathematical model for the objective function, e.g., it may rely on the solution of another problem, or the derivative may be too expensive to compute. Finally, the objective function may have many local optima, and gradient descent may get stuck in a local minimum. Thus, gradient descent can be a hassle to implement and may not always work (IMHO!).
We can use heuristic optimization algorithms to find near-optimal solutions in such cases. Heuristic optimization is a family of algorithms inspired by natural phenomena and human behavior. Heuristic methods are often used to solve complex, real-world problems where exact solutions are difficult to obtain or may not even exist. Unlike traditional optimization methods, heuristic approaches use rules of thumb, intuition, and trial-and-error to explore the search space and find the best solution. They use values of the objective function to guide the search, and they may not always find the optimal solution, but they can find a good solution in a reasonable amount of time.
There are several types of heuristic optimization algorithms, including:
Simulated Annealing (SA) is inspired by the process of annealing in metallurgy. It starts with an initial solution and gradually changes it by randomly perturbing it and accepting or rejecting the new solution based on a probability function. SA is commonly used for problems that involve finding the global minimum of a non-convex function.
Genetic Algorithms (GA) are inspired by the process of natural selection in biology. A GA starts with a population of potential solutions represented as chromosomes, which evolve through selection, crossover, and mutation operations to create a new generation of solutions. GAs are commonly used for problems that involve a large number of variables and constraints.
Particle Swarm Optimization (PSO) simulates the behavior of a swarm of particles moving in a multi-dimensional search space. Each particle represents a potential solution to the problem, and it adjusts its position and velocity based on the best solution found by the swarm. PSO is commonly used for problems that require continuous optimization.
Ant Colony Optimization (ACO) is inspired by the behavior of ant colonies. It involves a set of artificial ants that communicate with each other using pheromone trails to find the best path between a start and end point. ACO is commonly used for problems that involve finding the shortest path in a graph.
Artificial Bee Colony (ABC) simulates the behavior of a colony of artificial honey bees. ABC is commonly used for problems that involve finding the optimal solution in a continuous search space. Each bee represents a potential solution to the problem and adjusts its position based on the quality of the nectar source it has found.
Tabu Search (TS) uses a memory-based search strategy to avoid revisiting previously explored solutions. TS is commonly used for problems that involve finding the optimal solution in a combinatorial search space. It maintains a list of tabu moves, which are forbidden moves, to ensure that the search space is explored efficiently.
Simulated annealing#
Simulated annealing (SA) is a heuristic optimization algorithm inspired by the annealing process in metallurgy [Kirkpatrick and Vecchi, 1983]. Simulated annealing is especially good at finding near-optimal solutions to problems with many local optima.
Simulated annealing solves complex optimization problems by randomly selecting a candidate solution and evaluating its fitness relative to the current best solution. The candidate is accepted or rejected based on a probability function; candidates that are much worse than the best solution are less likely to be accepted. The willingness of the simulated annealing algorithm to accept a candidate worse than the best solution changes over time: in the beginning, the SA algorithm is more willing to take a chance, but as time progresses, it only bets on sure things, i.e., new solutions that are strictly better than the best solution found so far.
A pseudo-code implementation for a simulated annealing routine is given in Algorithm 12:
Algorithm 12 (Simulated Annealing)
Inputs The objective function \(f\), temperature, neighbor, and accept functions, initial guess \(x_{\circ}\), initial temperature \(T\), and max iterations \(\mathcal{M}_{\infty}\)
Outputs the \(\min_{x}f(x)\) and \(\arg\min_{x}f(x)\).
Initialize
Set best solution \(\hat{x}\leftarrow{x}_{\circ}\)
Set initial temperature \(T_{0}\leftarrow{T}\)
Main
for \(i\in\left\{1,\dots,\mathcal{M}_{\infty}\right\}\)
update temperature \(T_{i}\leftarrow\text{temperature}(T_{i-1},i)\)
generate new solution \(x^{\prime}\leftarrow\text{neighbor}(\hat{x})\)
compute objective function difference \(\Delta_{i} = f(x^{\prime}) - f(\hat{x})\)
if \(\text{accept}(\Delta_{i},T_{i}) = \text{true}\)
update the current best solution \({\hat{x}}\leftarrow{x}^{\prime}\)
Return the tuple \(f(\hat{x})\) and \(\hat{x}\).
The performance of simulated annealing depends upon the choice of the temperature, neighbor, and accept functions. Example pseudo-code implementations of these functions are shown in Algorithm 13:
Algorithm 13 (Temperature, neighbor, and accept functions)
Temperature function
function temperature(\(T\), \(i\)):
return \(T\leftarrow\alpha\times{T}~\) where \(\alpha<1\)
Accept function
function accept(\(\Delta\), \(T\)):
if \(\Delta < 0\):
return true
if random(0, 1) < \(\exp\left(-\Delta/T\right)\):
return true
return false
Neighbor function
function neighbor(solution):
set new_solution \(\leftarrow\) copy(solution)
select random move
perform move on new_solution
return new_solution
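To make Algorithms 12 and 13 concrete, here is a minimal Python sketch of the same structure (my own translation, not the course solution code). The geometric cooling rate, the small temperature floor, the Gaussian single-coordinate neighbor move, and the step size are illustrative assumptions.

```python
import math
import random

def temperature(T, alpha=0.95):
    """Geometric cooling: T <- alpha * T, floored to avoid dividing by zero (assumed schedule)."""
    return max(alpha * T, 1e-12)

def accept(delta, T):
    """Metropolis-style acceptance: always take improvements, sometimes take worse moves."""
    if delta < 0:
        return True
    return random.random() < math.exp(-delta / T)

def neighbor(x, step=0.1):
    """Perturb one randomly chosen coordinate with a small Gaussian move (assumed move type)."""
    x_new = list(x)
    i = random.randrange(len(x_new))
    x_new[i] += random.gauss(0.0, step)
    return x_new

def simulated_annealing(f, x0, temperature, neighbor, accept, T0=1.0, max_iterations=10_000):
    """Main loop following the structure of Algorithm 12."""
    x_best, T = list(x0), T0
    for _ in range(1, max_iterations + 1):
        T = temperature(T)                  # update temperature
        x_candidate = neighbor(x_best)      # generate a new candidate solution
        delta = f(x_candidate) - f(x_best)  # objective function difference
        if accept(delta, T):
            x_best = x_candidate            # update the current best solution
    return f(x_best), x_best
```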
Let’s explore the performance of a simulated annealing implementation in Example 26:
Example 26 (Minimization of the Sphere function)
Minimize the \(d\)-dimensional sphere function \(f(x)\):

\[
f(x) = \sum_{i=1}^{d}x_{i}^{2}
\]
subject to the bounds constraints \(x_{i}\in\left[-5.12,5.12\right]\) using simulated annealing. The solution code for this example can be found here.
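Using the Python sketch above (again, not the linked solution code), the sphere problem could be set up roughly as follows. Clamping the neighbor moves to \([-5.12, 5.12]\) is one simple, assumed way to handle the bounds constraints; the dimension, initial temperature, and iteration count are arbitrary choices.

```python
import random

def sphere(x):
    """d-dimensional sphere function: f(x) = sum(x_i^2); global minimum f(0) = 0."""
    return sum(xi * xi for xi in x)

def bounded_neighbor(x, step=0.25, lo=-5.12, hi=5.12):
    """Gaussian move on one coordinate, clamped to the bounds constraints."""
    x_new = list(x)
    i = random.randrange(len(x_new))
    x_new[i] = min(hi, max(lo, x_new[i] + random.gauss(0.0, step)))
    return x_new

d = 10
x0 = [random.uniform(-5.12, 5.12) for _ in range(d)]  # random initial guess in bounds
f_best, x_best = simulated_annealing(sphere, x0, temperature=temperature,
                                     neighbor=bounded_neighbor, accept=accept,
                                     T0=10.0, max_iterations=50_000)
print("approximate minimum:", f_best)
```

The printed objective value should be close to the true minimum of zero, although, as with any heuristic, there is no guarantee that a single run finds the global optimum.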
Summary#
Dynamic programming, heuristic optimization, and other combinatorial optimization techniques can all be used to solve optimization problems. In this lecture we introduced several optimization approaches:
Dynamic programming solves optimization problems by breaking them down into smaller subproblems, solving each subproblem once, and storing the solutions in a table or array. It is typically used for problems that can be divided into similar subproblems and for which the optimal solution can be constructed from optimal solutions to the subproblems.
Heuristic optimization is a class of algorithms used to solve nonlinear optimization problems, inspired by natural phenomena and human behavior. These algorithms are particularly useful when the objective function or constraints are non-convex, non-differentiable, or expensive to evaluate. Heuristic optimization algorithms use rules of thumb, intuition, and trial-and-error methods to explore the search space and find near-optimal solutions.