I am busy with a project where I need to calibrate the Heston Model to some Asian options data. The model has been implemented as a function which executes a Monte Carlo (MC) simulation. As a result, the objective function is rather noisy. There are a number of algorithms for dealing with this sort of problem, and here I simply give a brief overview of some of them.
The Objective Function (with constraint)
For illustrative purposes, I will consider a simple objective function:
where solutions will be confined to the first octant (so that x, y and z are all greater than or equal to 0) and subject to the constraint that the product of x and y is greater than or equal to 1. It is clear that the optimum should occur when both x and y are 1 and z equates to pi. However, to make things more interesting, I will evaluate pi using a simple MC method. The objective function is implemented as
Here the constraint on x and y is enforced at the beginning: if the constraint is violated then the objective function returns a value of positive infinity. This has two consequences:
optimisation procedures will naturally reject these values and
the discontinuity will break any method which relies on derivatives.
The global parameter N determines the number of iterations in the MC calculation. Obviously, increasing N results in a less noisy estimate of pi. The effect of N can be seen in the boxplot below, where the objective function was repeatedly evaluated with input parameters x = y = 1 and z = pi. For N = 10, 100 or 1000 there is a significant degree of variability between samples. However, with N = 10000 or larger the values are quite stable. We will use N = 10000 since it will result in a relatively good value for pi but still capture some variability.
The obvious choice for optimisation is Differential Evolution (DE), which is a member of the class of probabilistic metaheuristics. It works by using a population of candidate solutions, from which new candidates are generated on every iteration. Only the fittest candidates are retained, so that the population gradually migrates towards the optimum. DE does not make any assumptions about the form of the function being optimised. It is thus applicable to scenarios in which the objective function is noisy, stochastic or not differentiable. DE does not guarantee that it will reach the global optimum, but given enough time it should get pretty close!
Here the algorithm stopped when it reached the maximum number of iterations. The value obtained for pi is rather respectable, as are the final values for x and y. During the earlier iterations the best fit parameters change fairly rapidly, but towards the end they remain rather static. These results were obtained with the default strategy. Next we will try an alternative strategy:
The final estimate of pi is not as good, but this might be due either to the change in strategy or the random nature of DE. The algorithm again terminates when it reaches the maximum number of iterations. Let’s see what happens if we increase the allowed number of iterations.
Now that is interesting: it seems that by doing more iterations, the final estimate for pi actually gets worse! Not what one would immediately expect… and certainly not the kind of behaviour that one would anticipate from a deterministic objective function. However, this is simply the nature of the beast when optimising in the presence of noise: inevitably you will find sets of parameters which, when combined with noise, will produce progressively lower values for the objective function.
The algorithm still has not converged, but terminated when it reached the maximum number of iterations. To impose a convergence criterion we need to set both the reltol and steptol parameters. The algorithm will converge when the output from the objective function does not change by the specified relative tolerance. This criterion is only applied after steptol number of iterations.
Simulated Annealing (SA) is an alternative optimisation technique, which is also a probabilistic metaheuristic. SA gradually converges on the global optimum by moving probabilistically between states, where the distance between states slowly declines as the algorithm progresses. In contrast to DE, which considers a population of candidate solutions, SA only evolves a single state. SA is implemented in R as an option in the optim() function.
By default optim() should stop after 10000 iterations, however, I pushed this up by a factor of 100. Despite the large number of iterations, the final set of parameters attained is not particularly good. This, again, is due to the noisy objective function. Every time that the objective function is called it uses a new set of random numbers, so that the objective surface is never fixed. Even calling it with the same set of parameters results in a different outcome! The effects of this are more pronounced for SA (as opposed to DE) due to the fact that it only tracks a single state.
Particle Swarm Optimisation
Particle Swarm Optimisation is quite similar to DE in the sense that it uses a population of particles which move around the optimisation domain. Again this is a metaheuristic.
The Objective Function (without constraint)
There are other methods that will work with a noisy objective function. However, they do not function with the internally imposed constraint. To see how these work we need to remove the constraint code. We will simplify the constraint by asserting that both x and y are greater than or equal to 1 (but this is done in the optimiser and not in the objective function). This is a slightly different problem but has the same solution.
This method is described in Kelley, C. T. (1999). Iterative Methods for Optimization. SIAM.
This converges on a reasonable solution without using too many function evaluations. So it is relatively quick! The final set of parameters could probably be improved by tweaking some of the optional arguments to nmkb(). The result might be used as an initial solution for a more intensive optimisation.
Multi-objective Particle Swarm Optimization with Crowding Distance
This method has the capability to simultaneously optimise multiple objective functions. Although this is not applied here, it is certainly a very useful feature!
mopsocd() actually provides the option of applying a constraint as a separate function. I could not, however, get this to work effectively with this example.
A Cautionary Note
Because these optimisation functions used here rely on a stream of random numbers, you should not consider resetting the random number seed within the objective function. Why would you do this? Well, it is one way to ensure that you get a reproducible value out of the objective function for a particular set of parameters. This might be important for debugging purposes, but it will completely break the optimisation.
Thank you to the helpful folk at stackoverflow who gave some excellent suggestions which contributed to the content of this article.