1. Introduction
The financial crisis of 2007–2008 firmly underscored the importance of quantifying counterparty credit risk (CCR): the risk that a counterparty defaults and fails to fulfill its contractual obligations. Important indicators used to measure and price CCR include expected exposure (EE), potential future exposure (PFE), and various valuation adjustments (xVAs), which reflect credit, funding, and capital costs related to OTC derivative trading (Gregory 2015). Most of these metrics depend on the distribution of the potential future losses resulting from a credit event. Due to the complex nature of these distributions, practitioners resort to numerical methods such as Monte Carlo (MC) simulation to approximate the relevant quantities. Typically, this involves scenario generation for the underlying risk factors and subsequent valuation of the contract at each time-step on each path (Zhu and Pykhtin 2007). The latter is generally considered the most involved aspect, because it needs to be carried out for full portfolios. This poses a major computational challenge to financial institutions. Efficient numerical methods for derivative valuation, both on spot and future simulation dates, are therefore highly relevant.
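As a small illustration of how EE and PFE profiles are extracted once simulated portfolio values are available, consider the following sketch; the value matrix here is random toy data standing in for the output of a scenario generator and valuation routine.

```python
import numpy as np

# `values[i, j]` plays the role of the simulated netting-set value on
# path i at monitor date j; producing it is the expensive valuation step
# discussed above. Here it is toy random-walk data.
rng = np.random.default_rng(0)
values = rng.normal(size=(10_000, 50)).cumsum(axis=1)

exposure = np.maximum(values, 0.0)            # only positive value is at risk
ee = exposure.mean(axis=0)                    # expected exposure per date
pfe_97 = np.quantile(exposure, 0.97, axis=0)  # 97% potential future exposure
```

The same path-wise exposure matrix also feeds CVA-type integrals, which is why fast re-valuation on every path and date matters.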
To address this problem, we extend the concept of (semi-)static replication, which has been extensively studied for, for example, equity derivatives, to interest rate derivatives. A traditional dynamic replication, such as a delta hedge, is achieved by constructing an asset portfolio that is rebalanced continuously through time as the market moves. A static replication, on the other hand, is an asset portfolio that mirrors the value of the derivative without the need for rebalancing; the portfolio weights are, so to speak, static. In this work, we consider a semi-static hedge: a replicating portfolio that needs to be updated at only a finite number of instances. Considering a replication in terms of vanilla products, instead of the exotic derivative itself, can greatly simplify risk assessment, as ample machinery is available to analyze vanilla instruments, including closed-form prices and sensitivities.
In the equity world, the static replication problem has been addressed in the literature by, for example, Breeden and Litzenberger (1978), Carr and Bowie (1994), Carr et al. (1999), and Carr and Wu (2014). The main concept is to construct an infinite portfolio of short-dated European options with a continuum of different strike prices. A different but comparable approach is proposed in Derman et al. (1995), where a portfolio of European options with a continuum of different maturities is constructed to replicate the boundary and terminal conditions of exotic derivatives, such as knock-out options. The replication of an American-style option is challenging, as it involves a time-dependent exercise boundary, giving rise to a free boundary problem. In Chung and Shih (2009), this is addressed by composing a portfolio of European options with multiple strikes and maturities, and, in Lokeshwar et al. (2022), a semi-static hedge is constructed using shallow neural network approximations. In the field of interest rate (IR) modeling, however, this topic has received little attention, and the static replication of exotic IR derivatives remains largely an open problem. Where equity options depend on the realization of a single stock, IR derivatives depend on the realization of a full term structure of interest rates, compounding the complexity of the hedge. The articles of Pelsser (2003) and Hagan (2005) are among the few contributions to the literature, treating the static replication of guaranteed annuity options, and of CMS swaps, caps, and floors, respectively, with a portfolio of European swaptions.
In this work, we study the replication problem for Bermudan swaptions under an affine term structure model, possibly multi-factor. Bermudan swaptions are a class of exotic interest rate derivatives that are heavily traded in the OTC market. We show that such a contract can be semi-statically replicated by a portfolio of short-maturity options, such as discount bond options. We propose a regress-later approach, introduced in Lokeshwar et al. (2022) for callable equity options, which combines the approximation power of artificial neural networks (ANNs) with the computational benefits of regress-later schemes. In traditional regress-now schemes, such as that of Longstaff and Schwartz (2001), sampled realizations of the continuation value are regressed against realizations of the risk factors at the preceding monitor date. Advanced variations of this algorithm, in which the polynomial regression functions are replaced by ANNs, include the work of Kohler et al. (2010), Lapeyre and Lelong (2019), and Becker et al. (2020). In contrast, in regress-later schemes, the sampled realizations of the continuation value are regressed against realizations of the risk factors at the same date. The continuation value at the preceding monitor date is then obtained by evaluating the conditional expectation of this regression. An analysis and discussion of the benefits of this approach can be found in Glasserman and Yu (2004), and an example of such a scheme is presented in Jain and Oosterlee (2015).
Novel pricing algorithms that replace costly valuation functions with ANN-based approximations have been the subject of many recent papers. An early attempt to approximate option prices in the Black–Scholes model can be attributed to Hutchinson et al. (1994). Since then, a great number of variations on this approach have been investigated; a comprehensive overview of articles devoted to this topic can be found in the literature review of Ruf and Wang (2020). An accessible introduction to neural networks and an application to derivative valuation is given in, for example, the work of Ferguson and Green (2018). A drawback of directly replacing value functions with ANNs is that the method continues to rely on external pricing methodologies to provide input to the training process. In that sense, it can accelerate, but not fully substitute, traditional valuation routines.
Other approaches in the literature consider an indirect use of ANNs and therefore do not depend on classical benchmarks for training. A noteworthy example is the development of deep backward SDE solvers, which, in a financial context, have been introduced by Henry-Labordere (2017). Where the dynamics of financial risk factors are typically captured by forward SDEs, option prices tend to be the solution to backward SDEs. An application to Bermudan swaption valuation is treated in Wang et al. (2018) and a generalization to a CCR management framework is proposed in Gnoatto et al. (2020). Another example is the development of the deep optimal stopping (DOS) algorithm by Becker et al. (2019). They propose an ANN-based method by directly learning the optimal stopping strategy of callable options, without depending on the approximation of continuation values. In the work of Andersson and Oosterlee (2021), the DOS algorithm is applied to compose exposure profiles for Bermudan contracts.
Our contribution to the existing literature is threefold. First, we propose a semi-static replication method for Bermudan swaptions under a multi-factor short-rate model. In the one-factor case, we argue that replication can be achieved with an options portfolio written on a single discount bond. In the multi-factor case, replication can be achieved with an options portfolio written on a basket of discount bonds. As such, we generalize the Black–Scholes-embedded method presented in Lokeshwar et al. (2022) to an interest rate modeling framework. Additionally, we propose an alternative ANN design, such that a replication with vanilla options (as opposed to basket options) can also be achieved in the multi-factor case. This facilitates highly efficient pricing, which is essential for credit risk applications, such as exposure, VaR, and xVA computations, all of which rely on frequent re-evaluation of the portfolio.
Second, we propose a direct estimator, as well as lower and upper bound estimators, of the contract’s value implied by the semi-static replication. The lower bound results from applying the (non-optimal) exercise strategy on an independent set of Monte Carlo paths. The upper bound is based on the dual formulation of Haugh and Kogan (2004) and Rogers (2002) and, in contrast to other work, can be obtained without resorting to expensive nested simulations. We complement the study of Lokeshwar et al. (2022) by deriving analytical error margins for the lower and upper bound estimators. This provides direct insight into the approximation quality of the proposed estimators and proves their convergence as the regression errors of the ANNs diminish.
Third, we prove that any desired level of accuracy can be achieved in the replication, due to the universal approximating power of ANNs. We support this theoretical result with a range of representative numerical experiments. We demonstrate the pricing accuracy of the proposed algorithm by benchmarking it against the established least-squares method of Longstaff and Schwartz (2001). The regression error and convergence of the method are presented for different contract specifications. Lastly, we study the replication performance for different ANN designs.
The paper is organized as follows: Section 2 introduces the mathematical setting, describes the modeling framework, and provides the problem formulation. Section 3 provides a thorough introduction to the algorithm, motivates the use and interpretation of neural networks, and treats the fitting procedure. Section 4 introduces the lower bound and upper bound estimates to the true option price. In Section 5, we introduce the error bounds on the direct, lower bound, and upper bound estimates brought forth by the algorithm. We finalize the paper by illustrating the method through several numerical examples in Section 6 and providing a conclusion in Section 7.
2. Mathematical Background
In this section, we describe the general framework for our computations and give a detailed introduction to the Bermudan swaption pricing problem.
2.1. Model Formulation
We consider a continuous-time financial market defined on a finite time horizon. We additionally consider a probability space (Ω, F, P), where Ω represents all possible states of the economy, and let the filtration {F_t} represent all information generated by the economy up to time t. The market is assumed to be frictionless and we ignore any transaction costs.
We let B(t) denote the time-t value of the bank account. Investments in the money market are assumed to compound at a continuous, risk-free interest rate r(t), which we refer to as the short rate. B(t) corresponds to the time-t value of a unit of currency invested in the money market at time zero, and we assume it is given by the following expression (see Andersen and Piterbarg 2010a or Brigo and Mercurio 2006):

B(t) = exp( ∫₀ᵗ r(s) ds )
We denote by Q the risk-neutral measure equivalent to P, associated with the bank account B as the numéraire. Attainable claims denominated by the numéraire are assumed to be martingales under Q, which guarantees the absence of arbitrage (Harrison and Pliska 1981).

We assume that the dynamics of the short rate r are captured by an affine term structure model, in accordance with the set-up introduced in Duffie and Kan (1996) and Dai and Singleton (2000). The short rate itself is therefore considered to be an affine function of a—possibly multi-dimensional—latent factor X_t, i.e.,
r(t) = a(t) + b(t)ᵀ X_t    (1)
with a(t) denoting a scalar and b(t) a vector of time-dependent coefficients, respectively. We furthermore assume that the stochastic process X is a bounded Markov process taking values in ℝᵈ, representing all market influences affecting the state of the short rate. Let the dynamics of X be governed by an SDE of the form

dX_t = μ(t, X_t) dt + σ(t, X_t) dW_t    (2)
where W denotes a d-dimensional Brownian motion under Q, adapted to the filtration {F_t}. The measurable functions μ and σ are taken to satisfy the standard regularity conditions under which the SDE in Equation (2) admits a strong solution.

We let P(t, T) denote the time-t value of a zero-coupon bond that matures at T. A zero-coupon bond guarantees the holder one unit of currency at maturity, i.e., P(T, T) := 1. Within the class of affine term structure models, zero-coupon bond prices are exponential-affine in the factor (Andersen and Piterbarg 2010b; Duffie and Kan 1996). Therefore, the value of P(t, T) can be expressed as

P(t, T) = exp( A(t, T) + B(t, T)ᵀ X_t )
where the deterministic coefficients A(t, T) and B(t, T) can be found by solving a system of ODEs of the well-known Riccati type; see Duffie and Kan (1996) or Filipovic (2009) for details. We consider this framework as it is still intensively used for risk management purposes. High-dimensional models, such as Libor market models, can be intractable for quantifying credit risk for large portfolios, particularly in a multi-currency setting. Multi-factor short-rate models are therefore popular amongst practitioners, providing a solid compromise between modeling flexibility and analytical tractability.

For simplicity, we will assume that the collateral rate used for discounting and the instantaneous rate used to derive term rates are both implied by the same short rate r. Thus, we consider a classic single-curve model environment. As term rates, we consider simply compounded rates, which we refer to as LIBOR (Brigo and Mercurio 2006):

L(t, T) = (1 / τ(t, T)) · (1 / P(t, T) − 1)
where τ(t, T) denotes the year fraction between the dates t and T.
2.2. The Bermudan Swaption Pricing Problem
We consider the pricing problem of a Bermudan swaption. A Bermudan swaption is a contract that gives the holder the right, at a number of predefined monitor dates, to enter a swap with fixed maturity. Should the holder decide to exercise the option at any of the monitor dates, the holder immediately enters the underlying swap. The lifetime of this swap is assumed to be equal to the time between the exercise date and a fixed maturity date.
As an underlying, we take a standard interest rate swap that exchanges fixed versus floating cashflows. For simplicity, we will assume that the contract is priced in a single-curve framework and that the cashflow schedules of both legs coincide, yielding a single sequence of fixing dates and payment dates. However, we stress that the algorithm is applicable to any industry-standard contract specifications and is not limited to the simplifying assumptions made here. The time fraction between two consecutive dates is denoted τ_i. Let N be the notional and K the fixed rate of the swap. Assuming that the holder of the option exercises at a given monitor date, the payments of the swap will occur at the remaining payment dates.
We consider the class of pricing problems where the value of the contract is completely determined by the Markov process X as defined in Section 2.1. For each monitor date, let a measurable pay-off function denote the immediate pay-off of the option if exercised at that date. Although the methodology holds for any generalization of these pay-off functions, we will consider those in accordance with the contract specifications described above. This means that the functions are assumed to be given by
where an indicator δ ∈ {1, −1} distinguishes a payer from a receiver swaption. The swap rate and the annuity are defined in the same fashion as in Brigo and Mercurio (2006), with the function F denoting the simply compounded forward rate; for details, we refer to Brigo and Mercurio (2006). Now, consider the set of all discrete stopping times with respect to the filtration, taking values on the grid of monitor dates, and define the value function as
(3)
In this notation, a stopping time beyond the last monitor date indicates that the option is not exercised at all. We aim to approximate the time-zero value of the Bermudan swaption, which satisfies the following equation: (4)
Finding the optimal exercise strategy is typically a non-trivial task. Numerical approximations can, however, be computed by considering the dynamic programming formulation given below, which is shown to be equivalent to (4) in, for example, Glasserman (2013). Let t be one of the monitor dates and consider the value of the option at t, conditioned on the fact that it has not yet been exercised prior to t. This value satisfies the equation (see Glasserman 2013)
(5)
We refer to these random variables as the hold or continuation values. They represent the expected value of the contract if it is not exercised up until t but continues to follow the optimal policy thereafter. Approximations of the dynamic formulation are typically obtained by a backward iteration based on simulations of the underlying risk factors. The objective is then to determine the continuation values as a function of the state of the risk factor. Popular numerical schemes based on regression have been introduced in, for example, Carriere (1996) and Longstaff and Schwartz (2001).

Based on approximations of the continuation values, the optimal policy can be computed as follows. Assume that, for a given scenario, the risk factor takes a particular path through the monitor dates. Then, the holder should continue to hold the option as long as the continuation value exceeds the immediate pay-off, and exercise as soon as the pay-off is at least the continuation value. In other words, the exercise strategy can be determined as
Should, for some scenario, the continuation value be bigger than the immediate pay-off at every monitor date, then the option is never exercised and expires worthless.

3. A Semi-Static Replication for Bermudan Swaptions
The main concept of our method is to construct static hedge portfolios that replicate the dynamic programming formulation in Equation (5) between two consecutive monitor dates. In this section, we introduce the algorithm for a Bermudan swaption that is priced under a multi-factor affine term structure model. The methodology is inspired by the algorithm presented in Lokeshwar et al. (2022) and utilizes a regress-later technique in which the intermediate option values are regressed against simple IR assets, such as discount bonds. The regression model is chosen deliberately to represent the pay-off of an options portfolio written on these assets. An important consequence is that the hedge can be valued in closed form. Throughout this work, we will use the terms semi-static hedge and semi-static replication interchangeably. A hedge in general refers to a trading strategy that reduces the exposure to market risk of an outstanding position. A replication refers to an asset portfolio that mirrors the value of a derivative, which is a common means to set up a hedge. As we regard efficient valuation in the context of credit risk quantification, rather than actual hedging, as the main application, we will put emphasis on the term replication.
3.1. The Algorithm
The regress-later algorithm is executed in an iterative manner, backward in time. The outcome is a set of option portfolios written on pre-selected IR assets. To be more precise, the algorithm determines the weights and strikes of each portfolio, such that it closely mirrors the Bermudan swaption from its composition at one monitor date until its expiry at the next. The pay-off of each portfolio exactly meets the cost of composing the next portfolio, or the Bermudan’s pay-off in case it is exercised. The methodology yields a semi-static hedging strategy, as the portfolio compositions are constant between two consecutive monitor dates. Hence, there is no need for continuous rebalancing, as is the case for a dynamic hedging strategy. The algorithm can roughly be divided into three steps, presented below. Algorithm 1 summarizes the method.
Algorithm 1: The semi-static replication algorithm for a Bermudan swaption.
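A toy version of this backward loop, with ordinary polynomial regression standing in for the neural networks of Section 3.2 and a hypothetical `price_portfolio` stub in place of the closed-form continuation-value formula, might look as follows:

```python
import numpy as np

# Structural sketch of the backward regress-later loop. All ingredients
# are toy stand-ins: `bond` plays the role of the regression asset, and
# `fit_portfolio`/`price_portfolio` are hypothetical placeholders for the
# ANN fit and the closed-form portfolio valuation, respectively.
rng = np.random.default_rng(1)
n_paths, n_monitor = 5_000, 4

x = rng.normal(size=(n_paths, n_monitor))       # sampled risk factors
bond = np.exp(-0.05 - 0.3 * x)                  # toy regression asset per date
payoff = np.maximum(1.0 - bond, 0.0)            # toy exercise values

def fit_portfolio(z, v):
    # Stand-in regression: a cubic polynomial instead of a shallow ANN.
    return np.polyfit(z, v, deg=3)

def price_portfolio(coef, z_prev):
    # Hypothetical stub: a real implementation evaluates the conditional
    # expectation of the fitted pay-off, pricing each hidden node as a
    # bond option in closed form.
    return np.polyval(coef, z_prev)

value = payoff[:, -1]                           # option value at the last date
for m in range(n_monitor - 2, -1, -1):
    coef = fit_portfolio(bond[:, m + 1], value) # regress-later at date m+1
    cont = price_portfolio(coef, bond[:, m])    # continuation value at date m
    value = np.maximum(payoff[:, m], cont)      # dynamic-programming step
direct_estimator = value.mean()
```

The three steps of the algorithm map onto the sampling of `x`, the call to `fit_portfolio`, and the call to `price_portfolio`, respectively.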
3.1.1. Sample the Independent Variables
We start by sampling N realizations of the risk factor on the grid of monitor dates. These realizations will serve as input for the regression data. Different sampling methodologies could be used, such as:
Take a standard quadrature grid for each monitor date , associated with the transition density of the risk factor. For example, if has Gaussian dynamics, one could consider the Gauss–Hermite quadrature scaled and shifted in accordance with the mean and variance of . See, for example, Xiu (2010).
Discretize the SDE of the risk factor and sample by means of an Euler or Milstein scheme. Make sure that a sufficiently fine time-stepping grid is used, which includes the M monitor dates. See, for example, Kloeden and Platen (2013) for details.
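As an illustration of the second sampling approach, a minimal Euler scheme for a one-factor Ornstein–Uhlenbeck-type factor, dX_t = −a X_t dt + σ dW_t, might look as follows; the parameters and the monthly grid are illustrative choices only.

```python
import numpy as np

# Euler discretization of dX = -a*X dt + sigma dW on a monthly grid.
# In practice the grid must contain the M monitor dates of the contract.
rng = np.random.default_rng(7)
a, sigma = 0.05, 0.01                 # illustrative mean reversion and vol
n_paths, n_steps, dt = 10_000, 120, 1.0 / 12.0

x = np.zeros((n_paths, n_steps + 1))  # X starts at zero on every path
for k in range(n_steps):
    dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
    x[:, k + 1] = x[:, k] - a * x[:, k] * dt + sigma * dw
```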
In addition, a regression asset must be selected for each monitor date, subject to the following conditions. The asset should be a square-integrable random variable that is measurable with respect to the information available at that date.
The risk-neutral price of the asset should depend only on the current state of the risk factor and be almost surely unique; that is, the mapping from the state to the price should be continuous and injective. This is required to guarantee a well-defined parametrization of the option value.
3.1.2. Regress the Option Value against an IR Asset
In this phase, we compose the replication portfolios by fitting M regression functions, one per monitor date. Each function assigns a real value to each realization of the selected asset. Fitting is performed recursively, starting at the last monitor date and moving backwards in time until the first exercise opportunity. Approximations of the Bermudan swaption value at each monitor date serve as the dependent variable. At the final monitor date, the value of the contract (given that it has not been exercised) is known to be
Now, assume that, for some monitor date, we have an approximation of the contract value. Let θ denote the vector of unknown regression parameters. The objective is to determine θ such that the regression function reproduces the contract value with the smallest possible error. This is carried out by formulating and solving a related optimization problem. In this case, we choose to minimize the expected square error, given by (6)
There is no exact analytical expression available for the expectation in Equation (6). However, it can be approximated using the sampled regression data, giving rise to an empirical loss function L given by (7)
The parameters are then the result of the fitting procedure. If the regression model is chosen accordingly, the fitted function represents the pay-off at that date of a derivative portfolio written on the selected asset. Details on suggested functional forms, asset selection, and fitting procedures are the subject of Section 3.2.

3.1.3. Compute the Continuation Value
Once the regression is completed, the last step is to compute the continuation value and, subsequently, the option value at the preceding monitor date. For each scenario, we approximate the continuation value as
(8)
As the regression function is chosen to represent the pay-off of a derivative portfolio written on the asset, we argue that computing the conditional expectation in Equation (8) is in fact equivalent to the risk-neutral pricing of this portfolio. In Section 3.2, we treat examples for which this price can be computed in closed form. Finally, the option value at the preceding monitor date is given by
The steps are repeated recursively until we have a representation of the option value at the first monitor date. Discounting this value back to today yields an estimator of the time-zero option value, which we refer to as the direct estimator.

3.2. A Neural Network Approach to the Regression Functions
In this section, we propose to represent the regression functions as shallow, artificial neural networks. The choices that are presented here are adapted to a framework of Gaussian risk factors, such as that presented in Section 2. The method, however, lends itself to be generalized to a broader class of models by considering an appropriate adjustment to the input or structure.
3.2.1. The 1-Factor Case
First, we discuss the one-factor case, d = 1. As a regression function, we consider a fully connected, feed-forward neural network with one hidden layer. The design with only a single hidden layer is graphically represented in Figure 1 and is chosen deliberately to facilitate the network’s interpretation. As an input to the network (the asset), we select a zero-coupon bond, which pays one unit of currency at its maturity date.
-
The first layer consists of a single node and corresponds to the discount bond price, which serves as input. It is represented by the left node in Figure 1. The hidden layer has q hidden nodes, represented by the center layer in Figure 1. The affine transformation acting between the first two layers is of the form
As an activation function acting on the hidden layer, we take the ReLU-function, given by
Note that the ReLU function corresponds to the pay-off function of a European option.
-
The output of the network estimates the contract value and therefore takes values in the real numbers. It is represented by the right node in Figure 1. We consider a linear transformation acting between the hidden and output layers, given by
On top of that, we apply the linear activation, which comes down to an identity function, mapping x to itself.
Combined, the network is specified as the composition of these transformations and activations, and the trainable parameters consist of the hidden-layer weights and biases together with the output weights.

3.2.2. Interpretation of the Neural Network
Now that we have specified the structure of the neural network, we will discuss how each fitted function can be interpreted as a portfolio. In the one-dimensional case, the network output can be expressed as a sum of q terms of the form ω_i max(w_i P + b_i, 0), where P denotes the bond price, w_i and b_i the hidden-layer weights and biases, and ω_i the output weights. We can regard this as the pay-off of a derivative portfolio written on the bond: the portfolio contains q derivatives whose terminal values equal the outcomes of the hidden nodes. In total, we can recognize four types of products, depending on the signs of w_i and b_i.

If w_i > 0 and b_i ≥ 0, we have ω_i max(w_i P + b_i, 0) = ω_i (w_i P + b_i), which is the pay-off of a forward contract on ω_i w_i units of the bond plus ω_i b_i units of currency.

If w_i > 0 and b_i < 0, we have ω_i max(w_i P + b_i, 0) = ω_i w_i max(P − (−b_i/w_i), 0), which is the pay-off corresponding to ω_i w_i units of a European call option written on the bond, with strike price −b_i/w_i.

If w_i < 0 and b_i > 0, we have ω_i max(w_i P + b_i, 0) = −ω_i w_i max(−b_i/w_i − P, 0), which is the pay-off corresponding to −ω_i w_i units of a European put option written on the bond, with strike price −b_i/w_i.

If w_i ≤ 0 and b_i ≤ 0, we have max(w_i P + b_i, 0) = 0, which clearly represents a worthless contract.
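The case distinction above can be automated: given fitted weights, the replicating portfolio is read off row by row. The numbers below are illustrative, not taken from a trained network; each tuple holds the product type, the number of units, and the strike (for the forward, the cash amount instead of a strike).

```python
import numpy as np

w1 = np.array([2.0, -1.5, 1.0, -0.5])   # hidden-layer weights
b1 = np.array([0.3,  1.2, -0.8, -0.1])  # hidden-layer biases
w2 = np.array([0.5,  0.7,  0.4,  0.2])  # output weights

portfolio = []
for w, b, omega in zip(w1, b1, w2):
    if w > 0 and b >= 0:      # w*P + b > 0 for all P > 0: a forward position
        portfolio.append(("forward", omega * w, omega * b))
    elif w > 0 and b < 0:     # omega*w calls on the bond, struck at -b/w
        portfolio.append(("call", omega * w, -b / w))
    elif w < 0 and b > 0:     # omega*|w| puts on the bond, struck at -b/w
        portfolio.append(("put", -omega * w, -b / w))
    else:                     # w <= 0 and b <= 0: pay-off identically zero
        portfolio.append(("worthless", 0.0, 0.0))
```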
3.2.3. The Multi-Factor Case
In the case d > 1, we propose that a basket of d zero-coupon bonds, all maturing at different dates, is required as input to the regression. If the risk factor space is d-dimensional, it can only be parametrized by an asset vector of dimension at least d.
To see why the above statement is true, consider n bonds and note that each log-bond price is an affine function of the d-dimensional factor. The Jacobian of the mapping from the factor to the n bond prices therefore has rank at most min(n, d). It follows that, if n < d, the image of the bond vector does not span the whole risk factor space, whereas, if n > d, the image is still equal to that of the case n = d.

Concluding from the argument above, it would be an obvious choice to take a d-dimensional vector of bonds as the input and generalize the architecture by increasing the input dimension (i.e., the number of nodes in the first layer) from 1 to d. However, in that case, the network represents a derivatives portfolio written on a basket of bonds, by which the tractability of pricing would be lost. Therefore, we suggest two alternative designs, intended to preserve the analytical valuation potential of the replicating portfolio.
The basic specifications of the neural network will remain similar to the one-factor case. We consider a feed-forward neural network with one hidden layer.
-
The first layer consists of d nodes and the hidden layer has q hidden nodes. The affine transformation and activation acting between the first two layers are the direct analogues of the one-factor case, now mapping d inputs to q hidden nodes.
-
The output contains a single node. A linear transformation, together with the linear activation, acts between the hidden and output layers.
-
The network is again given by the composition of these maps.
3.2.4. Suggestion 1: A Locally Connected Neural Network
The outcome of each node in the hidden layer represents the terminal value of a derivative written on the input assets, which together compose the portfolio. In the d-dimensional case, the outcome of a node can be expressed as the positive part of an affine combination of the d bond prices, which corresponds to the pay-off of an arithmetic basket option with a vector of weights and a strike price. Such an exotic option is difficult to price. To overcome this issue, we constrain the first-layer weight matrix to admit only a single non-zero value in each row. The architecture of this suggestion is graphically depicted in Figure 2a. Let the number of hidden nodes be a multiple of the input dimension, i.e., q = p·d for some positive integer p, and let each hidden node be connected to exactly one of the d bonds, with p nodes per bond. As a result, none of the hidden nodes is connected to more than one input node (see Figure 2a). Therefore, the outcome of each node again represents a European option or forward written on a single bond, which can be priced in closed form (see Appendix A.1).
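One way to impose this constraint in an implementation is through a fixed binary mask on the first-layer weights; the cyclic assignment below is one possible arrangement, and the dimensions are illustrative.

```python
import numpy as np

d, p = 3, 4          # input bonds and nodes per bond (illustrative)
q = p * d            # hidden nodes, a multiple of d

# Binary mask with exactly one non-zero entry per row; during training,
# weights (or their gradients) are multiplied by this mask so each hidden
# node stays connected to a single bond.
mask = np.zeros((q, d))
for row in range(q):
    mask[row, row % d] = 1.0   # cyclic assignment of nodes to bonds

rng = np.random.default_rng(3)
w1 = mask * rng.normal(size=(q, d))   # masked hidden-layer weight matrix
```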
We can recognize two drawbacks to this approach. First, the number of trainable parameters for a fixed number of hidden nodes is much lower than in the fully connected case. This can simply be overcome by increasing q. Second, as the network is not fully connected, the universal approximation theorem no longer applies. Therefore, we have no guarantee that the approximation error can be reduced to any desired level. Our numerical experiments, however, indicate that the approximation accuracy of this design is not inferior to that of a fully connected counterpart of the same dimensions; see Section 6.
3.2.5. Suggestion 2: A Fully Connected Neural Network
Our second approach does not entail altering the structure or weights of the network, but suggests taking a different input. We hence consider a fully connected feed-forward neural network with one hidden layer. The architecture is graphically depicted in Figure 2. As a consequence, each hidden node is connected to each input node. However, as an input, we use the logarithms of the n bond prices.
Each node can therefore be compared to the pay-off of a geometric basket option, as a weighted sum of log-bond prices is the logarithm of a weighted geometric average of the bonds. Under the assumption that the dynamics of the risk factor are Gaussian, these options can be priced explicitly, as we show in Appendix A.2. An advantage of this approach is that it employs a fully connected network which, by virtue of the universal approximation theorem (Hornik et al. 1989), can yield any desired level of accuracy. A drawback is that the financial interpretation of the network as a replicating portfolio is not as strong as in suggestion 1, due to the required log in the pay-off.
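To make the closed-form pricing step concrete, the sketch below evaluates one hidden node under an assumed Gaussian model: if the node's affine combination of log-bond prices, say S, is normal with mean m and standard deviation s under the pricing measure, its expected pay-off E[max(S, 0)] follows a Bachelier-type formula. The moments m and s are illustrative inputs here, not outputs of a calibrated model, and discounting is omitted.

```python
import math

# Standard normal CDF and PDF via the error function (stdlib only).
def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hidden_node_value(m, s):
    # E[max(S, 0)] for S ~ N(m, s^2): the Bachelier formula.
    d = m / s
    return m * normal_cdf(d) + s * normal_pdf(d)

price = hidden_node_value(m=0.02, s=0.10)
```

In the full method, m and s would be the conditional moments of the weighted log-bond sum given the factor state at the preceding monitor date, and each hidden node contributes one such term, scaled by its output weight.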
3.3. Training of the Neural Networks
In this section, we specify some of the main considerations related to the fitting procedure. The method requires the training of M shallow feed-forward networks as specified in Section 3.2, one per monitor date. Our numerical experiments indicated that normalization of the training set strongly improved the networks’ fitting accuracy. Details on pre-processing the regression data are treated in Appendix B.
Optimization
The training of each network is performed in an iterative process, starting at the final monitor date and working backwards to the first. The effectiveness of the process depends on several standard choices related to neural network optimization, some of which are listed below.
As an optimizer, we apply AdaMax (Kingma and Ba 2014), a variation of the commonly used Adam algorithm. This is a stochastic, first-order, gradient-based optimizer that updates weights inversely proportionally to the infinity-norm of their current and past gradients, whereas Adam is based on the 2-norm. Our experiments indicate that AdaMax slightly outperforms comparable algorithms in the scope of our objectives.
The batch size, i.e., the number of training points used per weight update, is set to a standard 32. The learning rate, which scales the step size of each update, is kept in the range 0.0001–0.0005.
For the initial network, we use random initialization of the parameters. If the considered contract is a payer Bermudan swaption, the (non-zero) hidden-layer weights and the biases are initialized i.i.d. uniformly on small ranges; for a receiver contract, the roles of the two ranges are interchanged. The output weights are likewise initialized i.i.d. uniformly.
For the subsequent networks, each network is initialized with the final set of weights of the previously trained network.
As a training set for the optimizer, we use a collection of 20,000 data-points.
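For reference, the AdaMax update rule mentioned above can be written out in a few lines; the sketch below applies it to a toy quadratic objective, using the default decay rates from Kingma and Ba (2014) and a learning rate in the range quoted above.

```python
import numpy as np

# One AdaMax parameter update (Kingma and Ba, 2014) in plain NumPy.
def adamax_step(theta, grad, m, u, t, alpha=0.0005, beta1=0.9, beta2=0.999):
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate
    u = np.maximum(beta2 * u, np.abs(grad))     # infinity-norm estimate
    theta = theta - (alpha / (1.0 - beta1**t)) * m / (u + 1e-12)
    return theta, m, u

# Toy objective f(theta) = ||theta||^2, so grad = 2*theta.
theta = np.array([1.0, -2.0])
m = np.zeros(2)
u = np.zeros(2)
for t in range(1, 101):
    grad = 2.0 * theta
    theta, m, u = adamax_step(theta, grad, m, u, t)
```

In the actual method these updates act on the network parameters of Section 3.2, with mini-batches of the regression data supplying the gradients.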
4. Lower and Upper Bound Estimates
The algorithm described in Section 3.1 gives rise to a direct estimator of the true option price V. The accuracy of this estimator depends on the approximation performance of the neural networks at each monitor date. Should each regression yield a perfect fit, the estimation error would automatically be zero. In practice, however, the loss function, defined in Equation (7), never fully converges to zero. As the networks are trained against closed-form exercise and continuation values, error measures such as the mean squared error (MSE) and mean absolute error (MAE) are easily obtained. In particular, the mean absolute errors provide a strong indication of the error bounds on the direct estimator (see Section 5).
Although convergence errors put solid bounds on the accuracy of the estimator, they are typically quite loose. Therefore, they give rise to non-tight confidence bounds. To overcome this issue, we introduce a numerical approximation to a tight lower and upper bound to the true price, in the same spirit as Lokeshwar et al. (2022). These should provide a better indication of the quality of the estimate.
4.1. The Lower Bound
We compute a lower bound approximation by considering the non-optimal exercise strategy implied by the continuation value estimates introduced in Section 3.1. We define as
(9)
where refers to the approximated continuation value given in Equation (8). A strict lower bound is now given by
(10)
where corresponds to the definition given in Equation (3). The term on the right is obtained by changing the measure from the risk-neutral measure to the forward measure Geman et al. (1995). Under the forward measure, the lower bound can be estimated by simulating a fresh set of scenarios of the risk factor . Denote by the zero-coupon bond realization corresponding to . Then, the lower bound can be approximated accordingly.
4.2. The Upper Bound
We compute an upper bound by considering a dual formulation of the price expression Equation (4) as proposed in Haugh and Kogan (2004) and Rogers (2002). Let denote the set of all martingales adapted to such that . An upper bound to the true price is obtained by observing that the following inequality holds (see Haugh and Kogan 2004):
(11)
for any . To find a suitable martingale that yields a tight bound, we consider the Doob–Meyer decomposition of the true discounted option price process . As the price process is a supermartingale, we can write where denotes a martingale and is a predictable, strictly decreasing process such that . Note that Equation (11) attains an equality if we set , i.e., the martingale part of the option price process. The bound will hence be tight if we consider a martingale that is close to the unknown . Let denote the neural networks induced by the algorithm. In the spirit of Andersen and Broadie (2004) and Lokeshwar et al. (2022), we construct a martingale on the discrete time grid as follows:
(12)
Clearly, the process yields a discrete martingale. Furthermore, the process as defined above will coincide with if the approximation errors in equal zero, hence yielding an equality in Equation (11). Note that the recursive relation in Equation (12) can be rewritten as
(13)
We can now estimate the upper bound by again simulating a set of scenarios of the risk factor and approximating under the risk-neutral measure. The upper bound can alternatively be approximated under the forward measure; in that case, the risk factor should be simulated under and the numéraire should be replaced by . By carrying this out, we avoid the need to approximate the numéraire on a coarse simulation grid. Note that, by the deliberate choice of , all the conditional expectations appearing in Equation (13) can be computed in closed form (see Appendix A). Hence, there is no need to resort to nested simulations, in contrast to, for example, Andersen and Broadie (2004) and Becker et al. (2020). Especially if simulations are performed under the forward measure, both lower and upper bound estimations can be obtained at minimal additional computational cost.
5. Error Analysis
In this section, we analyze the errors of the semi-static hedge, the direct estimator, the lower bound estimator, and the upper bound estimator, which are induced by the imprecision of the regression functions . We show that, for a sufficiently large hedging portfolio, the replication error will be arbitrarily small. Furthermore, we provide error margins for the price estimators in terms of the regression imprecision. We thereby show that the direct estimator, lower bound, and upper bound converge to the true option price as the accuracy of the regressions increases. The cornerstone of the subsequent theorems is the universal approximation theorem, as presented in, for example, Hornik et al. (1989). Given that is a continuous function on the compact set , it guarantees that, for each , there exists a neural network such that
for arbitrary . In other words, the regression error can be kept arbitrarily small on any compact domain of the risk factor.
5.1. Accuracy of the Semi-Static Hedge
Let denote the set of monitor dates. For the following theorem, we assume that for some compact set . As can be arbitrarily large, this assumption is loose enough to account for a vast majority of the risk factor scenarios in a standard Monte Carlo sample. On top of that, can be chosen as sufficiently large such that approaches zero. For the proof, we refer to Appendix D.
Let and . Denote by the value of the replication portfolio for a Bermudan swaption, conditional on the fact that it is not exercised prior to time t. Assume that there exist M networks such that
Then, for any , we have that
5.2. Error of the Direct Estimator
Theorem 1 bounds the hedging error of the semi-static hedge in terms of the maximum regression errors. This implicitly provides an error margin to the direct estimator under the aforementioned assumptions. Although the universal approximation theorem guarantees that the supremum errors can be kept at any desired level, in practice, they are substantially higher than, for example, the MSEs or MAEs of the regression function. This is due to inevitable fitting imprecision outside or near the boundaries of the finite training sets. In the following theorem, we propose that the error of the direct estimator can be bounded in terms of the discounted MAEs of the neural networks. These quantities are generally much tighter than the supremum errors and are typically easier to estimate.
The proof of the theorem follows a similar line of thought as the proof of Theorem 1. As the direct estimator at time-zero depends on the expectation of the continuation value at , we can show by an iterative argument that the overall error is bounded by the sum of the mean absolute fitting errors at each monitor date. The error bound in the direct estimator therefore scales linearly with the number of exercise opportunities. For a complete proof, we refer to Appendix E.
Let and assume that . Denote by the time-zero direct estimator for the price of a Bermudan swaption V. Assume that, for each , there is a neural network approximation such that
where denotes the estimator at date . Then, the error in is bounded as given below:
5.3. Tightness of the Lower Bound Estimate
A lower bound to the true price can be computed by considering the non-optimal exercise strategy, implied by the direct estimator (see Section 4.1). This relies on the stopping time
(14)
In the following theorem, we propose that the tightness of can be bounded by the discounted MAEs of the neural network approximations.
The proof of the theorem relies on the fact that, conditioned on any realization of and , the expected difference between and is bounded by the sum of the mean absolute fitting errors at the monitor dates between and . In the proof, we therefore distinguish between the events and . Then, by an inductive argument, we can show that the bound on the spread between and the true price scales linearly with the number of exercise opportunities. For a complete proof, we refer to Appendix F.
Let and assume that . Denote by the lower bound on the true Bermudan swaption price as defined in Equation (10). Assume that, for each , there is a neural network approximation , such that
where denotes the estimator at date . Then, the spread between and is bounded as given below:
5.4. Tightness of the Upper Bound Estimate
An upper bound to the true price can be computed by considering a dual formulation of the dynamic pricing equation Haugh and Kogan (2004); see Section 4.2. From a practical point of view, the difference between the upper bound and the true price can be interpreted as the maximum loss that an investor would incur due to hedging imprecision resulting from the algorithm Lokeshwar et al. (2022). The overall hedging error at some monitor date is the result of all incremental hedging errors occurring from rebalancing the portfolio at preceding monitor dates. As the incremental hedging errors can be bounded by the sum of the expected absolute fitting errors, we propose that the tightness of can be bounded by the discounted MAEs of the neural networks and scales at most quadratically with the number of exercise opportunities.
The proof follows a similar line of thought as that presented in Andersen and Broadie (2004), where it is noted that the difference between the dual formulation of the option and its true price is difficult to bound. Here, we make a similar remark and propose a theoretical maximum spread between and that is relatively loose. Our numerical experiments, however, indicate that the upper bound estimate is much tighter in practice. For a complete proof, we refer to Appendix G.
Let and assume that . Denote by the upper bound on the true Bermudan swaption price as defined in Equation (11). Assume that, for each , there is a neural network approximation , such that
where denotes the estimator at date . Then, the spread between and is bounded as given below:
6. Numerical Experiments
In this section, we treat several numerical examples to illustrate the convergence, pricing, and hedging performance of our proposed method. We will start by considering the price estimate of a vanilla swaption contract in a one-factor model. This is a toy example by which we can demonstrate the accuracy of the direct estimator in comparison to exact benchmarks. We continue with price estimates of Bermudan swaption contracts in a one-factor and a two-factor framework. The performance of the direct estimator will be compared to the established least-square regression method (LSM) introduced in Longstaff and Schwartz (2001), fine-tuned to an interest rate setting as described in Oosterlee et al. (2016). Additionally, we will approximate the lower and upper bound estimates as described in Section 4 and show that they are well inside the error margins introduced in Section 5. Finally, we will illustrate the performance of the static hedge for a swaption in a one-factor model and a Bermudan swaption in a two-factor model. For the one-factor case, we can benchmark the performance by the analytic delta hedge for a swaption, provided in Henrard (2003).
A contract (either a European swaption or a Bermudan swaption) refers to an option written on a swap with a notional amount of 100 and a lifetime between and . This means that and are the first and last monitor dates, respectively, in the case of a Bermudan. The underlying swaps are set to exchange annual payments, yielding year fractions of 1 and annual exercise opportunities. All examples illustrated here have been implemented in Python, using the QuantLib library Ametrano and Ballabio (2003) for standard pricing routines and Keras with a TensorFlow backend Chollet et al. (2015) for constructing, fitting, and evaluating the neural networks.
6.1. 1-Factor Swaption
We start by considering a swaption contract under a one-dimensional risk factor setting. The direct estimator of the true swaption price is computed similarly to that of a Bermudan swaption, but with only a single exercise opportunity at . Therefore, only a single neural network per option needs to be trained to compute the option price. We have used 64 hidden nodes and 20,000 training points, generated through Monte Carlo sampling. We assume the risk factor to be captured by the Hull–White model with constant mean reversion parameter a and constant volatility . The dynamics of the shifted mean-zero process Brigo and Mercurio (2006) are hence given by
(15)
For simplicity, we consider a flat time-zero instantaneous forward rate . The risk-neutral scenarios are generated using a discrete Euler scheme of the process above. Parameter values used in the numerical experiments are summarized in Table 1.
Figure 3a,b show the time-zero option values in basis points (0.01%) of the notional for a and a payer swaption as a function of the moneyness. The moneyness is defined as , where K denotes the fixed strike and S the time-zero swap rate associated with the underlying swap. The exact benchmarks are computed by an application of Jamshidian's decomposition Jamshidian (1989). The relative estimation errors are shown in Figure 3c,d. We observe a close agreement between the estimates and the reference prices, with errors in the order of several basis points of the true option price. In the current setting, the results serve mostly as a validation of the estimator. We point out, however, that the algorithm for swaptions is applicable in general frameworks, such as multi-factor, dual-curve, or non-overlapping payment schemes, for which exact routines are no longer available.
6.2. 1-Factor Bermudan Swaption
As a second example, we consider a Bermudan swaption contract. The same dynamics for the underlying risk factor are assumed as in the previous paragraph, using the parameter settings of Table 1. Monte Carlo scenarios are generated based on a discretized Euler scheme associated with the SDE in Equation (15), taking weekly time-steps.
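A minimal sketch of such a discretized scenario generation is given below, for the mean-zero Hull–White factor with dynamics of the form dx(t) = −a x(t) dt + σ dW(t). The function name, interface, and path/step counts are illustrative choices of ours.

```python
import numpy as np

def simulate_hull_white_x(a=0.01, sigma=0.01, T=5.0, steps=260,
                          n_paths=10_000, seed=42):
    """Euler scheme for the mean-zero Hull-White factor
    dx(t) = -a x(t) dt + sigma dW(t), x(0) = 0, on a weekly grid.
    Returns an (n_paths, steps + 1) array of simulated paths."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.zeros((n_paths, steps + 1))
    for i in range(steps):
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        x[:, i + 1] = x[:, i] - a * x[:, i] * dt + sigma * dw
    return x

paths = simulate_hull_white_x()
# The exact terminal variance, sigma^2 (1 - exp(-2 a T)) / (2 a), can be
# used as a sanity check on the discretization.
```

For the small mean reversion and weekly step size used here, the Euler discretization bias in the terminal variance is negligible relative to the Monte Carlo noise.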
We first demonstrate the convergence property of the direct estimator, which is implied by the replication portfolio. We consider a Bermudan swaption with strike . This strike is selected as it is close to ATM, the moneyness level that is most likely to be liquid in the market. For this analysis, the neural networks were trained on a set of 2000 Monte-Carlo-generated training points. Figure 4a shows the direct estimator as a function of the number of hidden nodes in each neural network, alongside an LSM-based benchmark. In Figure 4b, the error with respect to the LSM estimate is shown on a log scale. We observe that the direct estimator converges to the LSM confidence interval or slightly above it, which is in accordance with the fact that LSM is biased low by definition. The analysis indicates that a portfolio of 16 discount bond options is sufficient to achieve a replication of similar accuracy to the LSM benchmark.
Table 2 depicts numerical pricing results for a , and receiver Bermudan swaption. For each contract, we consider different levels of moneyness, setting the fixed rate K of the underlying swap to, respectively, 80%, 100%, and 120% of the time-zero swap rate. The estimations of the direct, the upper bound, and the lower bound statistics are again reported alongside LSM-based benchmarks. Here, the neural networks have 64 hidden nodes and are fitted using a training set of 20,000 points. The lower and upper bound estimates, as well as the LSM estimates, are based on simulation runs of 200,000 paths each. The given lower and upper bounds are Monte Carlo estimates of the statistics defined in Equations (10) and (11) and are therefore subject to standard errors, which are reported in parentheses. The reference LSM results have been generated using as regression basis functions for approximating the continuation values. The standard errors and confidence intervals are obtained from ten independent Monte Carlo runs. The choice for hyperparameter settings is motivated by the analysis of Appendix C.
The spreads between the lower and upper bound estimates provide a good indication of the accuracy of the method. For the current setting, we obtain spreads in the order of several basis points up to a few dozen basis points. The lower bound estimate is typically very close to the LSM estimate, which itself is also biased low. Their standard errors are of the same order of magnitude. The upper bound estimates prove to be very stable and show a variance that is roughly two orders of magnitude smaller than that of the lower bound. The direct estimate is occasionally slightly less accurate. This can be explained by the fact that it depends on the accuracy of the regression over the full domain of the risk factor, whereas, for the lower bound, only high accuracy near the exercise boundaries is required. In Figure 5, the mean absolute error of each neural network after fitting is presented as a function of the network's index. The errors are displayed in basis points of the notional. We observe that the errors are smallest at maturity and tend to increase with each iteration backward in time. That the errors at the final monitor date are virtually zero can be explained by the fact that the pay-off at is given by
which can be exactly captured by a network with only a single hidden node. With each step backwards, the target function is harder to fit, yielding larger errors. We observe MAEs up to one basis point of the notional amount. The empirical lower–upper bound spreads remain well within the theoretical error margins provided in Section 4.1 and Section 4.2. The spreads are mostly much lower than the sum of the MAEs, indicating that the bound estimates are in practice significantly tighter than their theoretical maximum spread.
6.3. 2-Factor Bermudan Swaption
As a final pricing example, we consider a Bermudan swaption contract under a two-factor model. The dynamics of the underlying risk factors are assumed to follow a G2++ model Brigo and Mercurio (2006). Monte Carlo scenarios are generated based on a discretized Euler scheme, taking weekly time-steps, based on the SDE below:
where and are correlated Brownian motions with . Parameter values used in the numerical experiments are summarized in Table 3.
We again start by demonstrating the convergence property of the direct estimator for both the locally connected and the fully connected neural network designs, as specified in Section 3.2.3. The same Bermudan swaption with strike is used and the networks are each fitted to a set of 6400 training points. Figure 6a shows the direct estimator as a function of the number of hidden nodes in each neural network, alongside an LSM-based benchmark. In Figure 6b, the error with respect to the LSM estimate is shown on a log scale. We observe a similar convergence behavior, where the direct estimators approach the LSM benchmark within the 95% confidence range. Here, a portfolio of eight discount bond options is already sufficient to achieve a replication of similar accuracy to the LSM estimator.
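The two-factor scenario generation can be sketched analogously to the one-factor case: two mean-reverting Euler schemes driven by correlated Brownian increments. The parameter names follow standard G2++ notation (mean reversions a, b, volatilities σ, η, correlation ρ) and the values below are illustrative; the interface is ours.

```python
import numpy as np

def simulate_g2pp(a=0.07, b=0.08, sigma=0.015, eta=0.008, rho=-0.6,
                  T=1.0, steps=52, n_paths=100_000, seed=7):
    """Euler scheme for two correlated mean-reverting G2++ factors:
    dx = -a x dt + sigma dW1,  dy = -b y dt + eta dW2,  d<W1, W2> = rho dt.
    Correlated increments are built from independent normals via a
    Cholesky-style construction.  Returns the terminal factor values."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.zeros(n_paths)
    y = np.zeros(n_paths)
    for _ in range(steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        x += -a * x * dt + sigma * np.sqrt(dt) * z1
        y += -b * y * dt + eta * np.sqrt(dt) * z2
    return x, y

x, y = simulate_g2pp()
```

For the small mean reversions used here, the terminal correlation of the two factors stays very close to the instantaneous correlation ρ, which provides a simple sanity check on the correlated sampling.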
In Table 4, numerical results for a , , and receiver Bermudan swaption are depicted for different levels of moneyness. We again report the direct, the upper bound, and the lower bound estimates for both neural network designs. In this case, all networks have 64 hidden nodes and are fitted to training sets of 20,000 points. As before, the lower bound, the upper bound, and the LSM estimates are the result of 10 independent Monte Carlo simulations of 200,000 scenarios.
For the LSM algorithm, we used as basis functions. Note that the number of monomials grows quadratically with the dimension of the state space and, with that, the number of free parameters. For our method, this number grows at a linear rate. Choices for the hyperparameters are again based on the analysis of Appendix C. The results under the two-factor case share several features with the one-factor results. We observe spreads between the lower and upper bounds ranging from several basis points up to a few dozen basis points of the option price. The lower bound estimates turn out to be very close to the LSM estimates and the same holds for their standard errors. The upper bounds are again very stable with low standard errors and the direct estimator appears as slightly less accurate. If we compare the locally connected to the fully connected case, we observe that the results are overall in close agreement, especially the lower and upper bound estimates. This is remarkable given that the fully connected case gives rise to more trainable parameters, by which we would expect a higher approximation accuracy. In the two-factor setting, the ratio of free parameters for the two designs is 3:4.
In Figure 7, the mean absolute errors of the neural networks after fitting are shown; the MAEs for the locally connected networks are in blue and those for the fully connected networks in red. All are expressed in basis points of the notional amount. We observe that the errors are mostly of the same order of magnitude as in the one-dimensional case. The figures indicate that the locally connected networks slightly outperform the fully connected networks in terms of accuracy, although this does not appear to materialize in tighter estimates of the lower and upper bounds. For the locally connected case, we again observe that the errors are virtually zero at the last monitor date, for the same reasons as in the one-factor setting. In the fully connected representation, an exact replication might not exist, resulting in larger errors. We conjecture that this effect partially carries over to the networks at preceding monitor dates. The empirical lower–upper bound spreads remain well within the theoretical error margins, as the spreads are in all cases lower than the sum of the MAEs. Hence, also in the two-factor setting, we find that the bound estimates are tighter in practice than their theoretical maximum spreads.
6.4. Performance of the Semi-Static Hedge
Finally, we consider the hedging problem of a vanilla swaption under the one-factor model and a Bermudan swaption under the two-factor model.
6.4.1. 1-Factor Swaption
Here, we compare the performance of a static hedge versus a dynamic hedge in the one-factor model. As an example, we take a European receiver swaption at different levels of moneyness. The model set-up is similar to that in Section 6.2, using the same set of parameters reported in Table 1. In the static hedge case, the writer of the option contract aims to hedge the risk using a static portfolio of zero-coupon bond options and discount bonds. The replicating portfolio is composed using a neural network with 64 hidden nodes, optimized using 20,000 training points generated through Monte Carlo sampling. The portfolio is composed at time-zero and kept until the expiry of the option at year. In the dynamic hedge case, the delta-hedging strategy is applied. The replicating portfolio consists of units of the underlying forward-starting swap and an investment in the money market. The dynamic hedge involves periodic rebalancing of the portfolio. The delta for a receiver swaption under the Hull–White model (see Henrard 2003) is given by
(16)
where is the solution of and where denotes the CDF of a standard normal distribution, for , and . The function denotes the instantaneous volatility of a discount bond maturing at T, which, under Hull–White, is given by . We validated the analytic expression above with numerical approximations of the delta obtained by bumping the yield curve. Within the simulation, the dynamic hedge portfolio is rebalanced on a daily basis between time-zero and the expiry of the option. In this experiment, that means it is updated on 255 instances at equidistant monitor dates.
The performance of both hedging strategies is reported in Table 5. The results are based on 10,000 risk-neutral Monte Carlo paths. The hedging error refers to the difference between the option's pay-off at expiry and the replicating portfolio's final value. The quantities are reported in basis points of the notional amount. The empirical distribution of the hedging error is shown in Figure 8. We observe that, overall, the static hedge outperforms the dynamic hedge in terms of accuracy, even though it involves only about a quarter (64 versus 255) of the trades. Although it is not visible in Figure 8b, the static strategy does give rise to occasional outliers in terms of accuracy. These are associated with scenarios that reach or exceed the boundary of the training set. These errors are typically of a similar order of magnitude as the errors observed in the dynamic hedge. The impact of outliers can be reduced by enlarging the training set and thereby broadening the regression domain.
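The daily rebalancing described above follows the standard self-financing bookkeeping. The sketch below shows this accounting for a single path; the swap values, hedge ratios, and money-market growth factors are supplied as arrays, which is our own simplified interface rather than the paper's code.

```python
def delta_hedge_error(v0, swap, delta, growth, payoff):
    """Self-financing delta hedge along one path.
    swap[i]:   value of the hedge instrument at rebalance date i
               (the last entry is the value at expiry);
    delta[i]:  units held over the interval [t_i, t_{i+1});
    growth[i]: money-market growth factor over the same interval;
    v0:        initial option premium received;
    payoff:    option pay-off at expiry.
    Returns the final portfolio value minus the pay-off (the hedge error)."""
    cash = v0 - delta[0] * swap[0]                   # invest the remainder
    for i in range(1, len(delta)):
        cash = cash * growth[i - 1]                  # accrue the cash account
        cash -= (delta[i] - delta[i - 1]) * swap[i]  # self-financing rebalance
    final = cash * growth[-1] + delta[-1] * swap[-1]
    return final - payoff

# Sanity check: holding one unit throughout replicates a forward-type claim
# exactly, so the hedge error is zero.
err = delta_hedge_error(2.0, [2.0, 2.5, 1.8], [1.0, 1.0], [1.0, 1.0], 1.8)
```

In the experiment above, the hedge ratio at each date would be the Hull–White swaption delta of Equation (16), evaluated on the simulated yield curve.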
6.4.2. 2-Factor Bermudan Swaption
Here, we demonstrate the performance of the semi-static hedge for a receiver Bermudan swaption under a two-factor model. We compare the accuracy of the hedging strategy utilizing a locally connected network versus a fully connected neural network. In the former, the replication portfolio consists of zero-coupon bonds and zero-coupon bond options. In the latter, the Bermudan is replicated with options written on hypothetical assets with a pay-off equal to the log of a zero-coupon bond (see Section 3.2.3). The model set-up is similar to that in Section 6.3, using the same set of parameters reported in Table 3. Both networks are composed with 64 hidden nodes and optimized using 20,000 training points generated through Monte Carlo sampling. The portfolio is set up at time-zero and updated at each monitor date of the Bermudan until it is either exercised or expired. We assume that the holder of the Bermudan swaption follows the exercise strategy implied by the algorithm, i.e., the option is exercised as soon as . When a monitor date is reached, the replication portfolio matures with a pay-off equal to . In case the Bermudan is continued, the price to set up a new replication portfolio is given by , which contributes to the hedging error. In case the Bermudan is exercised, the holder will claim , which also contributes to the hedging error. The total error of the semi-static hedge (HE) is therefore computed as
where denotes the direct estimator at date and denotes the stopping time, as defined in Equation (9).
The performance of the strategies related to the locally and fully connected neural networks is reported in Table 6. The results are based on 10,000 risk-neutral Monte Carlo paths and reported in basis points of the notional amount. The empirical distribution of the hedging error is shown in Figure 9. We observe that both approaches yield an accuracy of the same order of magnitude, although the locally connected case slightly outperforms the fully connected case. This is in line with expectations, as the fitting performance of the locally connected networks is generally higher. For similar reasons to the one-factor case, the hedging experiments give rise to occasional outliers in terms of accuracy. These outliers can be in the order of several dozens of basis points. Again, the impact of outliers can be reduced by broadening the regression domain.
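The hedging-error accounting just described can be sketched pathwise as follows. The arrays of matured portfolio pay-offs, new-portfolio set-up costs, and exercise values are assumed to come from the trained networks and the simulated scenarios; the interface is a hypothetical simplification of ours.

```python
def semi_static_hedge_error(portfolio_payoff, setup_cost, exercise_value,
                            exercised, discount):
    """Accumulate the incremental hedging errors of a semi-static hedge
    along one path.  At monitor date j the maturing replication portfolio
    pays portfolio_payoff[j]; if the option is continued this must fund
    setup_cost[j] (the new portfolio), and if it is exercised it must fund
    exercise_value[j] (the holder's claim).  exercised[j] flags the first
    exercise date; discount[j] discounts date j to time zero."""
    he = 0.0
    for j in range(len(portfolio_payoff)):
        claim = exercise_value[j] if exercised[j] else setup_cost[j]
        he += discount[j] * (portfolio_payoff[j] - claim)
        if exercised[j]:
            break  # the hedge terminates at the exercise date
    return he

# Perfect replication at every monitor date yields a zero hedging error.
err = semi_static_hedge_error([2.0, 3.0], [2.0, 9.0], [5.0, 3.0],
                              [False, True], [0.99, 0.97])
```

Each term of the sum corresponds to one incremental hedging error from rebalancing, consistent with the decomposition used in the upper-bound analysis of Section 5.4.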
7. Conclusions
In this paper, we have proposed a semi-static replication algorithm for Bermudan swaptions under an affine term structure model. We have shown that Bermudan swaptions, an exotic interest rate derivative that is heavily traded in the OTC market, can be semi-statically replicated with an options portfolio written on a basket of discount bonds. The static portfolio composition is obtained by regressing the target option's value using a shallow artificial neural network. The choice of the regression basis functions is motivated by their representation of an option portfolio's pay-off, implying an interpretable neural network structure. Leveraging the approximating power of ANNs, we proved that the replication can achieve any desired level of accuracy, given that the portfolio is sufficiently large. We derived a direct estimator of the contract price; an upper bound and a lower bound estimate to this price can be computed at minimal additional computational cost.
The algorithm we presented is inspired by the work of Lokeshwar et al. (2022), which proposes a semi-static replication approach for callable equity options in the Black–Scholes model. We contribute to the literature by extending the concept of (semi-)static replication to the field of interest rate modeling. In addition to the direct, lower bound, and upper bound estimators, we have derived analytical error margins for these statistics. This proves their convergence as the regression error diminishes and provides direct insight into the accuracy of the estimates. Additionally, we propose an alternative ANN design, which constrains the replication to a portfolio of vanilla bond options, even in the case of a multi-factor model. This guarantees efficiency in the portfolio valuation, which is key to many applications in credit risk management.
The performance of the method was demonstrated through several numerical experiments. We focused on Bermudan swaptions under a one- and a two-factor model, which are popular amongst practitioners. The pricing accuracy of the method was determined through a benchmark against the established least-square method of Longstaff and Schwartz (2001). This reference is approached with basis point precision. A convergence analysis showed that a portfolio of 16 bond options suffices to achieve a replication of similar accuracy to the LSM. Finally, the replication performance was studied through an in-model hedging experiment. This showed that the semi-static hedge outperforms a traditional dynamic replication in terms of hedging error.
As an outlook for further research, we consider applying the algorithm to the computation of credit risk measures and various value adjustments (xVAs). These metrics typically rely on generating forward value and sensitivity profiles of (exotic) derivative portfolios. We see the semi-static replication approach, combined with the simple error analysis, as an effective tool for addressing the computational challenges associated with these risk measures. The performance of the method in the context of quantifying CCR will therefore be studied in a forthcoming companion paper.
Conceptualization, J.H., S.J. and D.K.; Formal analysis, J.H., S.J. and D.K.; Investigation, J.H.; Writing—original draft, J.H.; Writing—review and editing, S.J. and D.K.; Visualization, J.H.; Supervision, S.J. and D.K.; Project administration, D.K. All authors have read and agreed to the published version of the manuscript.
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
The authors declare no conflict of interest.
The opinions expressed in this work are solely those of the authors and do not represent in any way those of their current and past employers.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 2. Suggested neural network designs for [Formula omitted. See PDF.]. (a) Locally connected neural network. (b) Fully connected neural network.
Figure 3. Accuracy of the direct estimator for vanilla swaptions. [Formula omitted. See PDF.].
Figure 4. Convergence of the direct estimator for the [Formula omitted. See PDF.] Bermudan swaption price as a function of hidden node count, with respect to the LSM benchmark under a 1-factor model.
Figure 5. Mean absolute errors of neural network fit per monitor date under a 1-factor model.
Figure 6. Convergence of the direct estimator for the 1Y × 5Y Bermudan swaption price as a function of hidden node count, with respect to the LSM benchmark under a 2-factor model.
Figure 7. Accuracy of neural network fit per monitor date under a 2-factor model. Blue lines represent the locally connected (l.c.) case and red lines the fully connected (f.c.) case. The legend in Figure (c) applies to all three graphs.
Figure 8. Hedge error distribution for a [Formula omitted. See PDF.] receiver swaption, based on [Formula omitted. See PDF.] MC paths. [Formula omitted. See PDF.].
Figure 9. Hedge error distribution for a [Formula omitted. See PDF.] receiver Bermudan swaption, based on [Formula omitted. See PDF.] MC paths. [Formula omitted. See PDF.].
Parameters 1F Hull–White model.
| Parameter | a | σ | f(0, t) |
|---|---|---|---|
| Value | 0.01 | 0.01 | 0.03 |
Results of the 1-factor model.

| Type | K/S | Dir. est. | Lower bnd | Upper bnd | UB−LB | LSM est. | LSM 95% CI |
|---|---|---|---|---|---|---|---|
| 1Y × 5Y | 80% | 1.527 | 1.521 (0.001) | 1.528 (0.000) | 0.007 | 1.521 (0.001) | [1.518, 1.523] |
| 1Y × 5Y | 100% | 2.543 | 2.534 (0.002) | 2.542 (0.000) | 0.008 | 2.534 (0.002) | [2.531, 2.538] |
| 1Y × 5Y | 120% | 4.015 | 4.016 (0.002) | 4.018 (0.000) | 0.002 | 4.016 (0.002) | [4.012, 4.021] |
| 3Y × 7Y | 80% | 3.296 | 3.293 (0.002) | 3.295 (0.000) | 0.002 | 3.293 (0.002) | [3.290, 3.296] |
| 3Y × 7Y | 100% | 4.767 | 4.755 (0.004) | 4.761 (0.000) | 0.006 | 4.755 (0.004) | [4.747, 4.762] |
| 3Y × 7Y | 120% | 6.625 | 6.629 (0.004) | 6.631 (0.000) | 0.002 | 6.629 (0.004) | [6.621, 6.638] |
| 1Y × 10Y | 80% | 3.950 | 3.945 (0.005) | 3.960 (0.000) | 0.015 | 3.945 (0.005) | [3.935, 3.955] |
| 1Y × 10Y | 100% | 5.818 | 5.811 (0.003) | 5.818 (0.000) | 0.007 | 5.811 (0.003) | [5.805, 5.816] |
| 1Y × 10Y | 120% | 8.346 | 8.354 (0.005) | 8.360 (0.000) | 0.006 | 8.353 (0.005) | [8.344, 8.362] |
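The bracketed numbers next to the bound estimators are standard errors, and the final column is a 95% confidence interval; both are standard Monte Carlo statistics. A minimal sketch of how such statistics are computed (the function name is an illustrative assumption, not from the paper):

```python
import math

def mc_estimate(samples):
    """Return (mean, standard error, 95% confidence interval)
    for a sequence of Monte Carlo samples."""
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance; standard error of the sample mean.
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    se = math.sqrt(var / n)
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se)
```

In the tables, a tight gap between the lower and upper bounds (the UB−LB column) relative to these standard errors is what signals an accurate replication.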
Parameters of the 2F G2++ model.

| Parameter | [symbol omitted] | [symbol omitted] | [symbol omitted] | [symbol omitted] | [symbol omitted] | [symbol omitted] |
|---|---|---|---|---|---|---|
| Value | 0.07 | 0.08 | 0.015 | 0.008 | −0.6 | 0.03 |
Results of the 2-factor model for the locally connected and fully connected neural network cases.

Locally connected NN:

| Type | K/S | Dir. est. | Lower bnd | Upper bnd | UB−LB | LSM est. | LSM 95% CI |
|---|---|---|---|---|---|---|---|
| 1Y × 5Y | 80% | 1.617 | 1.617 (0.002) | 1.619 (0.000) | 0.002 | 1.617 (0.002) | [1.614, 1.621] |
| 1Y × 5Y | 100% | 2.652 | 2.650 (0.002) | 2.654 (0.000) | 0.004 | 2.650 (0.002) | [2.646, 2.654] |
| 1Y × 5Y | 120% | 4.128 | 4.127 (0.003) | 4.131 (0.000) | 0.004 | 4.127 (0.003) | [4.121, 4.132] |
| 3Y × 7Y | 80% | 3.073 | 3.076 (0.004) | 3.078 (0.000) | 0.002 | 3.077 (0.004) | [3.069, 3.085] |
| 3Y × 7Y | 100% | 4.554 | 4.553 (0.004) | 4.553 (0.000) | 0.000 | 4.552 (0.004) | [4.545, 4.559] |
| 3Y × 7Y | 120% | 6.444 | 6.448 (0.004) | 6.451 (0.000) | 0.003 | 6.446 (0.005) | [6.435, 6.456] |
| 1Y × 10Y | 80% | 3.616 | 3.624 (0.002) | 3.626 (0.000) | 0.002 | 3.622 (0.002) | [3.618, 3.627] |
| 1Y × 10Y | 100% | 5.508 | 5.509 (0.002) | 5.514 (0.000) | 0.005 | 5.508 (0.002) | [5.503, 5.512] |
| 1Y × 10Y | 120% | 8.128 | 8.123 (0.005) | 8.130 (0.000) | 0.007 | 8.121 (0.005) | [8.110, 8.132] |

Fully connected NN:

| Type | K/S | Dir. est. | Lower bnd | Upper bnd | UB−LB | LSM est. | LSM 95% CI |
|---|---|---|---|---|---|---|---|
| 1Y × 5Y | 80% | 1.617 | 1.617 (0.002) | 1.619 (0.000) | 0.002 | 1.617 (0.002) | [1.614, 1.621] |
| 1Y × 5Y | 100% | 2.651 | 2.650 (0.002) | 2.654 (0.000) | 0.004 | 2.650 (0.002) | [2.646, 2.654] |
| 1Y × 5Y | 120% | 4.129 | 4.127 (0.003) | 4.131 (0.000) | 0.004 | 4.127 (0.003) | [4.121, 4.132] |
| 3Y × 7Y | 80% | 3.076 | 3.077 (0.004) | 3.078 (0.000) | 0.001 | 3.077 (0.004) | [3.069, 3.085] |
| 3Y × 7Y | 100% | 4.553 | 4.553 (0.004) | 4.554 (0.000) | 0.001 | 4.552 (0.004) | [4.545, 4.559] |
| 3Y × 7Y | 120% | 6.451 | 6.447 (0.005) | 6.451 (0.000) | 0.004 | 6.446 (0.005) | [6.435, 6.456] |
| 1Y × 10Y | 80% | 3.616 | 3.624 (0.002) | 3.626 (0.000) | 0.002 | 3.622 (0.002) | [3.618, 3.627] |
| 1Y × 10Y | 100% | 5.506 | 5.509 (0.002) | 5.514 (0.000) | 0.005 | 5.508 (0.002) | [5.503, 5.512] |
| 1Y × 10Y | 120% | 8.124 | 8.123 (0.005) | 8.130 (0.000) | 0.007 | 8.121 (0.005) | [8.110, 8.132] |
Hedging errors for the static and dynamic hedging strategies for a [product omitted. See PDF.].

| Hedge Error (bps) | K/S | Static Hedge | Dyn. Hedge |
|---|---|---|---|
| Mean | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| Mean | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| Mean | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
Hedging errors of the semi-static hedging strategy for a [product omitted. See PDF.].

| Hedge Error (bps) | K/S | Loc. conn. NN | Fully conn. NN |
|---|---|---|---|
| Mean | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| Mean | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| Mean | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| St. dev. | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 80% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 100% | [value omitted. See PDF.] | [value omitted. See PDF.] |
| 95%-percentile | 120% | [value omitted. See PDF.] | [value omitted. See PDF.] |
Appendix A. Evaluation of the Conditional Expectation
In this section, we will explicitly compute the conditional expectations related to the continuation values. We will distinguish two approaches associated with the two proposed network structures, i.e., the locally connected case (suggestion 1) and the fully connected case (suggestion 2).
For ease of computation, we will use a simplified, yet equivalent representation of the risk factor dynamics discussed in [reference omitted. See PDF.].
Appendix A.1. The Continuation Value with Locally Connected NN
We consider the network [Formula omitted. See PDF.]
The map [Formula omitted. See PDF.]
If, on the other hand, [Formula omitted. See PDF.]
In the expression above, the function [Formula omitted. See PDF.]
Appendix A.2. The Continuation Value with Fully Connected NN
Once again, we consider the network [Formula omitted. See PDF.]
Let [Formula omitted. See PDF.]
Appendix B. Pre-Processing the Regression Data
A procedure that significantly improves the fitting performance of the neural networks is the normalization of the training data. The linear rescaling of the input to the optimizer is a common form of data pre-processing
Another argument for pre-processing the input is that large data values typically induce large weights. Large weights can lead to exploding network outputs in the feed-forward process
In practice, we propose the following rescaling of the data. Denote by [notation omitted. See PDF.]

The locally connected NN case: Consider the outcome of the [Formula omitted. See PDF.] hidden node and denote the input of the network as [Formula omitted. See PDF.]. Then [Formula omitted. See PDF.], where k is the index of the only non-zero entry of [Formula omitted. See PDF.], the row of the weight matrix [Formula omitted. See PDF.]. The transformation implies that [Formula omitted. See PDF.]. As a consequence, in the analysis of Appendix A.1, the transformations [Formulas omitted. See PDF.] should be taken into account. Additionally, the transformation [Formula omitted. See PDF.] is required to account for the scaling of [Formula omitted. See PDF.].

The fully connected NN case: Again, consider the outcome of the [Formula omitted. See PDF.] hidden node. This time, the transformation implies that [Formula omitted. See PDF.]. As a consequence, in the analysis of Appendix A.2, the transformations [Formulas omitted. See PDF.] should be taken into account. And, again, the transformation [Formula omitted. See PDF.] is required to account for the scaling of [Formula omitted. See PDF.].
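The normalization that Appendix B describes can be sketched as follows. The exact affine maps are given by the omitted formulas, so the function name and the choice of z-score (zero mean, unit standard deviation) scaling below are illustrative assumptions; the key point is that the scaling parameters must be returned, since they have to be undone when mapping the fitted network weights back to the original risk-factor units.

```python
import numpy as np

def normalize_training_data(x, y):
    """Linearly rescale regression inputs x (n_samples x n_factors)
    and targets y to zero mean and unit standard deviation.

    Returns the scaled arrays plus the affine parameters needed to
    invert the scaling after the network has been fitted."""
    x_mu, x_sd = x.mean(axis=0), x.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    x_scaled = (x - x_mu) / x_sd
    y_scaled = (y - y_mu) / y_sd
    return x_scaled, y_scaled, (x_mu, x_sd, y_mu, y_sd)
```

Scaling each risk factor separately keeps the optimizer's gradients well conditioned and avoids the large-weight, exploding-output behavior mentioned above.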
Appendix C. Hyperparameter Selection
The accuracy of the neural network fitting procedure depends on the choice of several hyperparameters. For the numerical experiments reported in [reference omitted. See PDF.]:
- Hidden node count: see Figure A1;
- Size of the training set: see Figure A2;
- Learning rate: see Figure A3.
Figure A1. Impact of hidden node count: accuracy of the neural network fit per monitor date under a 2-factor model. # training points = 5000. Learning rate = 0.0002.
Figure A2. Impact of training set size: accuracy of the neural network fit per monitor date under a 2-factor model. # hidden nodes = 64. Learning rate = 0.0002.
Figure A3. Impact of learning rate: accuracy of the neural network fit per monitor date under a 2-factor model. # hidden nodes = 64. # training points = 10,000.
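The selection procedure behind Figures A1–A3 amounts to scanning each hyperparameter and comparing the resulting fit error per monitor date. A minimal sketch of such a scan; the function names, the candidate grids, and the exhaustive-search strategy are assumptions for illustration (the paper reports per-parameter sensitivity studies, not necessarily a joint grid search):

```python
import itertools

def select_hyperparameters(fit_and_score,
                           hidden_nodes=(16, 32, 64, 128),
                           train_sizes=(5000, 10000, 20000),
                           learning_rates=(0.0002, 0.001, 0.005)):
    """Scan the three hyperparameters studied in Appendix C and
    return the configuration with the lowest error reported by
    fit_and_score (e.g., mean absolute error of the network fit)."""
    best_err, best_cfg = float("inf"), None
    for h, n, lr in itertools.product(hidden_nodes, train_sizes, learning_rates):
        err = fit_and_score(h, n, lr)
        if err < best_err:
            best_err, best_cfg = err, {"hidden_nodes": h,
                                       "train_size": n,
                                       "learning_rate": lr}
    return best_cfg
```

Here `fit_and_score` would train the replication network once per configuration and return a validation error, so the scan cost grows with the product of the grid sizes.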
Appendix D. Proof of Theorem 1
We prove by induction on m. At the last exercise date of the Bermudan, i.e., [Formula omitted. See PDF.], the following case distinctions hold: [Formulas omitted. See PDF.]
Appendix E. Proof of Theorem 2
First, we fix some notation.
- Let [Formula omitted. See PDF.] denote the true price of the Bermudan swaption at [Formula omitted. See PDF.], conditioned on the fact that it is not yet exercised.
- Let [Formula omitted. See PDF.] denote the estimator of the continuation value at [Formula omitted. See PDF.].
- Let [Formula omitted. See PDF.] denote the estimator of [Formula omitted. See PDF.].
- Let [Formula omitted. See PDF.] denote the neural network approximation of [Formula omitted. See PDF.].
- Let [Formula omitted. See PDF.] denote the numéraire at [Formula omitted. See PDF.].
- Let [Formula omitted. See PDF.].

For the final step, note that if [Formula omitted. See PDF.]
Appendix F. Proof of Theorem 3
We consider the following three events: [Formulas omitted. See PDF.]
Bounding [Formula omitted. See PDF.]
Bounding [Formula omitted. See PDF.]
Appendix G. Proof of Theorem 4
The discounted true price process is a supermartingale under [Formula omitted. See PDF.].
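For completeness, the supermartingale property invoked here can be written out. The symbols below are assumptions chosen to echo the notation of Appendix E: $V_t$ for the true Bermudan price, $N_t$ for the numéraire, and $\mathcal{F}_t$ for the filtration, with the expectation taken under the pricing measure associated with $N$:

```latex
% Discounted true price process is a supermartingale:
\frac{V_t}{N_t} \;\ge\; \mathbb{E}\!\left[\left.\frac{V_s}{N_s}\,\right|\,\mathcal{F}_t\right],
\qquad t \le s.
```

This follows from the dynamic programming principle: at each exercise date, the true price dominates the continuation value, so discounting by the numéraire can only lose value in expectation.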
References
Ametrano, Ferdinando; Ballabio, Luigi. Quantlib—A Free/Open-Source Library for Quantitative Finance. 2003; Available online: https://github.com/lballabio/QuantLib (accessed on 1 March 2020).
Andersen, Leif; Broadie, Mark. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science; 2004; 50, pp. 1222-34. [DOI: https://dx.doi.org/10.1287/mnsc.1040.0258]
Andersen, Leif B. G.; Piterbarg, Vladimir V. Interest Rate Modeling, Volume I: Foundations and Vanilla Models; Atlantic Financial Press: London, 2010a.
Andersen, Leif B. G.; Piterbarg, Vladimir V. Interest Rate Modeling, Volume II: Term Structure Models; Atlantic Financial Press: London, 2010b.
Andersson, Kristoffer; Oosterlee, Cornelis W. A deep learning approach for computations of exposure profiles for high-dimensional Bermudan options. Applied Mathematics and Computation; 2021; 408, 126332. [DOI: https://dx.doi.org/10.1016/j.amc.2021.126332]
Becker, Sebastian; Cheridito, Patrick; Jentzen, Arnulf. Deep optimal stopping. Journal of Machine Learning Research; 2019; 20, 74.
Becker, Sebastian; Cheridito, Patrick; Jentzen, Arnulf. Pricing and hedging American-style options with deep learning. Journal of Risk and Financial Management; 2020; 13, 158. [DOI: https://dx.doi.org/10.3390/jrfm13070158]
Beyna, Ingo. Interest Rate Derivatives: Valuation, Calibration and Sensitivity Analysis; Springer Science & Business Media: Berlin/Heidelberg, 2013.
Bishop, Christopher M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, 1995.
Breeden, Douglas T.; Litzenberger, Robert H. Prices of state-contingent claims implicit in option prices. Journal of Business; 1978; 51, pp. 621-51. [DOI: https://dx.doi.org/10.1086/296025]
Brigo, Damiano; Mercurio, Fabio. Interest Rate Models-Theory and Practice: With Smile, Inflation and Credit; Springer: Berlin/Heidelberg, 2006; vol. 2.
Carr, Peter; Bowie, Jonathan. Static simplicity. Risk; 1994; 7, pp. 45-50.
Carr, Peter; Ellis, Katrina; Gupta, Vishal. Static hedging of exotic options. Quantitative Analysis in Financial Markets: Collected Papers of the New York University Mathematical Finance Seminar; World Scientific: Singapore, 1999; pp. 152-76.
Carr, Peter; Wu, Liuren. Static hedging of standard options. Journal of Financial Econometrics; 2014; 12, pp. 3-46. [DOI: https://dx.doi.org/10.1093/jjfinec/nbs014]
Carriere, Jacques F. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics; 1996; 19, pp. 19-30. [DOI: https://dx.doi.org/10.1016/S0167-6687(96)00004-2]
Chollet, François. Keras. 2015; Available online: https://keras.io (accessed on 1 May 2020).
Chung, San-Lin; Shih, Pai-Ta. Static hedging and pricing American options. Journal of Banking & Finance; 2009; 33, pp. 2140-49.
Dai, Qiang; Singleton, Kenneth J. Specification analysis of affine term structure models. The Journal of Finance; 2000; 55, pp. 1943-78. [DOI: https://dx.doi.org/10.1111/0022-1082.00278]
Derman, Emanuel; Ergener, Deniz; Kani, Iraj. Static options replication. Journal of Derivatives; 1995; 2, [DOI: https://dx.doi.org/10.3905/jod.1995.407927]
Duffie, Darrell; Kan, Rui. A yield-factor model of interest rates. Mathematical Finance; 1996; 6, pp. 379-406. [DOI: https://dx.doi.org/10.1111/j.1467-9965.1996.tb00123.x]
Ferguson, Ryan; Green, Andrew. Deeply learning derivatives. arXiv; 2018; arXiv: 1809.02233
Filipovic, Damir. Term-Structure Models. A Graduate Course; Springer: Berlin/Heidelberg, 2009.
Geman, Helyette; Karoui, Nicole El; Rochet, Jean-Charles. Changes of numeraire, changes of probability measure and option pricing. Journal of Applied Probability; 1995; 32, pp. 443-58. [DOI: https://dx.doi.org/10.2307/3215299]
Glasserman, Paul. Monte Carlo Methods in Financial Engineering; Springer Science & Business Media: Berlin/Heidelberg, 2013; vol. 53.
Glasserman, Paul; Yu, Bin. Simulation for American options: Regression now or regression later?. Monte Carlo and Quasi-Monte Carlo Methods 2002; Springer: Berlin/Heidelberg, 2004; pp. 213-26.
Gnoatto, Alessandro; Reisinger, Christoph; Picarelli, Athena. Deep xVA solver—A neural network based counterparty credit risk management framework. SIAM Journal on Financial Mathematics; 2023; 14, pp. 314-352. [DOI: https://dx.doi.org/10.1137/21M1457606]
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron. Deep Learning; MIT Press: Cambridge, 2016; vol. 1.
Gregory, Jon. The xVA Challenge: Counterparty Credit Risk, Funding, Collateral and Capital; John Wiley & Sons: Hoboken, 2015.
Hagan, Patrick S. Convexity conundrums: Pricing CMS swaps, caps, and floors. The Best of Wilmott; 2005; 305. [DOI: https://dx.doi.org/10.1002/wilm.42820030211]
Harrison, J. Michael; Pliska, Stanley R. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications; 1981; 11, pp. 215-60. [DOI: https://dx.doi.org/10.1016/0304-4149(81)90026-0]
Haugh, Martin B.; Kogan, Leonid. Pricing American options: A duality approach. Operations Research; 2004; 52, pp. 258-70. [DOI: https://dx.doi.org/10.1287/opre.1030.0070]
Henrard, Marc. Explicit bond option formula in Heath–Jarrow–Morton one factor model. International Journal of Theoretical and Applied Finance; 2003; 6, pp. 57-72. [DOI: https://dx.doi.org/10.1142/S0219024903001785]
Henry-Labordere, Pierre. Deep Primal-Dual Algorithm for BSDEs: Applications of Machine Learning to CVA and IM. 2017; Available online: https://ssrn.com/abstract=3071506 (accessed on 1 October 2020).
Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert. Multilayer feedforward networks are universal approximators. Neural Networks; 1989; 2, pp. 359-66. [DOI: https://dx.doi.org/10.1016/0893-6080(89)90020-8]
Hutchinson, James M.; Lo, Andrew W.; Poggio, Tomaso. A nonparametric approach to pricing and hedging derivative securities via learning networks. The Journal of Finance; 1994; 49, pp. 851-89. [DOI: https://dx.doi.org/10.1111/j.1540-6261.1994.tb00081.x]
Jain, Shashi; Oosterlee, Cornelis W. The stochastic grid bundling method: Efficient pricing of bermudan options and their greeks. Applied Mathematics and Computation; 2015; 269, pp. 412-31. [DOI: https://dx.doi.org/10.1016/j.amc.2015.07.085]
Jamshidian, Farshid. An exact bond option formula. The Journal of Finance; 1989; 44, pp. 205-209. [DOI: https://dx.doi.org/10.1111/j.1540-6261.1989.tb02413.x]
Kingma, Diederik P.; Ba, Jimmy. Adam: A method for stochastic optimization. arXiv; 2014; arXiv: 1412.6980
Kloeden, Peter E.; Platen, Eckhard. Numerical Solution of Stochastic Differential Equations; Springer Science & Business Media: Berlin/Heidelberg, 2013; vol. 23.
Kohler, Michael; Krzyżak, Adam; Todorovic, Nebojsa. Pricing of high-dimensional American options by neural networks. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics; 2010; 20, pp. 383-410. [DOI: https://dx.doi.org/10.1111/j.1467-9965.2010.00404.x]
Lapeyre, Bernard; Lelong, Jérôme. Neural network regression for Bermudan option pricing. arXiv; 2019; arXiv:1907.06474. [DOI: https://dx.doi.org/10.1515/mcma-2021-2091]
Lokeshwar, Vikranth; Bharadwaj, Vikram; Jain, Shashi. Explainable neural network for pricing and universal static hedging of contingent claims. Applied Mathematics and Computation; 2022; 417, 126775. [DOI: https://dx.doi.org/10.1016/j.amc.2021.126775]
Longstaff, Francis A.; Schwartz, Eduardo S. Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies; 2001; 14, pp. 113-47. [DOI: https://dx.doi.org/10.1093/rfs/14.1.113]
Musiela, Marek; Rutkowski, Marek. Martingale Methods in Financial Modelling; Springer Finance: Berlin/Heidelberg, 2005.
Oosterlee, Kees; Feng, Qian; Jain, Shashi; Karlsson, Patrik; Kandhai, Drona. Efficient computation of exposure profiles on real-world and risk-neutral scenarios for Bermudan swaptions. Journal of Computational Finance; 2016; 20, pp. 139-72. [DOI: https://dx.doi.org/10.21314/JCF.2017.337]
Pelsser, Antoon. Pricing and hedging guaranteed annuity options via static option replication. Insurance: Mathematics and Economics; 2003; 33, pp. 283-96. [DOI: https://dx.doi.org/10.1016/S0167-6687(03)00154-9]
Rogers, Leonard C. G. Monte Carlo valuation of American options. Mathematical Finance; 2002; 12, pp. 271-86. [DOI: https://dx.doi.org/10.1111/1467-9965.02010]
Ruf, Johannes; Wang, Weiguan. Neural networks for option pricing and hedging: A literature review. Journal of Computational Finance; 2020; in press [DOI: https://dx.doi.org/10.21314/JCF.2020.390]
Shreve, Steven E. Stochastic Calculus for Finance II: Continuous-Time Models; Springer Science & Business Media: Berlin/Heidelberg, 2004; vol. 11.
Wang, Haojie; Chen, Han; Sudjianto, Agus; Liu, Richard; Shen, Qi. Deep learning-based BSDE solver for LIBOR market model with application to Bermudan swaption pricing and hedging. arXiv; 2018; arXiv:1807.06622. [DOI: https://dx.doi.org/10.2139/ssrn.3214596]
Xiu, Dongbin. Numerical Methods for Stochastic Computations: A Spectral Method Approach; Princeton University Press: Princeton, 2010.
Zhu, Steven H.; Pykhtin, Michael. A guide to modeling counterparty credit risk. GARP Risk Review, July/August; 2007; Available online: https://ssrn.com/abstract=1032522 (accessed on 10 November 2020).
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
We present a semi-static replication algorithm for Bermudan swaptions under an affine, multi-factor term structure model. In contrast to dynamic replication, which needs to be continuously updated as the market moves, a semi-static replication needs to be rebalanced on just a finite number of instances. We show that the exotic derivative can be decomposed into a portfolio of vanilla discount bond options, which mirrors its value as the market moves and can be priced in closed form. This paves the way toward the efficient numerical simulation of xVA, market, and credit risk metrics for which forward valuation is the key ingredient. The static portfolio composition is obtained by regressing the target option’s value using an interpretable, artificial neural network. Leveraging the universal approximation power of neural networks, we prove that the replication error can be made arbitrarily small for a sufficiently large portfolio. A direct, a lower-bound, and an upper-bound estimator for the Bermudan swaption price are inferred from the replication algorithm. Additionally, closed-form error margins for the price statistics are determined. We study the accuracy and convergence of the method through several numerical experiments. The results indicate that the semi-static replication approaches the LSM benchmark with basis-point accuracy and provides tight, efficient error bounds. For in-model simulations, the semi-static replication outperforms a traditional dynamic hedge.
1 Informatics Institute, University of Amsterdam, Science Park 904, 1098XH Amsterdam, The Netherlands;
2 Indian Institute of Science, Department of Management Studies, Bangalore 560012, India;