1. Introduction
Continuous Mortality Investigation (2016b) introduced a new model for fitting to mortality data: the Age-Period-Cohort-Improvement (APCI) model. This is an extension of the Age–Period–Cohort (APC) model, but it also shares an important feature with the model from Lee and Carter (1992). The APCI model was intended to be used as a means of parameterising a deterministic targeting model for mortality forecasting. However, it is not the purpose of this paper to discuss the Continuous Mortality Investigation’s approach to deterministic targeting. Readers interested in a discussion of stochastic versus deterministic projections, in particular the use of targeting and expert judgement, should consult Booth and Tickle (2008). Rather, the purpose of this paper is to present a stochastic implementation of the APCI model for mortality projections, and to compare the performance of this model with various other models sharing similar structural features.
2. Data
The data used for this paper are the observed numbers of deaths, d x,y , aged x last birthday during each calendar year y, split by gender, together with corresponding mid-year population estimates used as central exposures, E c x,y .
We use data provided by the Office for National Statistics for the population of the United Kingdom. For illustrative purposes we will just use the data for males. As we are primarily interested in annuity and pension liabilities, we will restrict our attention to ages 50–104 over the period 1971–2015. Although data are available for earlier years, there are questions over the reliability of the population estimates prior to 1971. All death counts were based on deaths registered in the United Kingdom in a particular calendar year and the population estimates for 2002–2011 are those revised to take account of the 2011 census results. More detailed discussion of this data set, particularly regarding the current and past limitations of the estimated exposures, can be found in Cairns et al. (2015).
One consequence of only having data to age 104 is having to decide how to calculate annuity factors for comparison. One option would be to create an arbitrary extension of the projected mortality rates up to (say) age 120. Another option is simply to use temporary annuities, which avoid artefacts arising from the arbitrary extrapolation, as in Richards et al. (2014). We use the latter approach in this paper, and we therefore calculate expectations of time lived and continuously paid temporary annuity factors as follows:
\mathring{e}_{x,y:\overline{n}|} = \int_0^n {}_tp_{x,y}\,dt \quad (1)

\bar{a}_{x,y:\overline{n}|} = \int_0^n v(t)\,{}_tp_{x,y}\,dt \quad (2)

{}_tp_{x,y} = \exp\left(-\int_0^t \mu_{x+s,\,y+s}\,ds\right) \quad (3)
Restricting our calculations to temporary annuities has no meaningful consequences at the main ages of interest, as shown in Richards et al. (2014). The methodology for approximating the integrals in equations (1–3) is detailed in Appendix A.
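To make the calculation concrete, the sketch below approximates the temporary annuity factor of equation (2) from yearly survival probabilities and continuously compounded spot yields. It is an illustrative simplification of ours using only a trapezoidal approximation on a yearly grid; the calculations in this paper use the higher-order composite rules of Appendix A, and the function name and inputs are assumptions, not code from the paper.

```python
import math

def temporary_annuity(surv, yields):
    """Continuously paid temporary annuity factor as in equation (2),
    approximated with the trapezoidal rule on a yearly grid.

    surv[t] is the t-year survival probability (surv[0] = 1.0) and
    yields[t] the continuously compounded spot yield for term t, so
    the discounted survival value at term t is exp(-yields[t]*t)*surv[t].
    """
    vals = [math.exp(-y * t) * p for t, (p, y) in enumerate(zip(surv, yields))]
    # Trapezoidal rule with unit spacing: full weight to interior
    # points, half weight to the two endpoints.
    return sum(vals) - 0.5 * (vals[0] + vals[-1])
```

With zero yields and certain survival the factor reduces to the term of the annuity, which gives a quick sanity check.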
For discounting we will use UK government gilt yields, as shown in Figure 1. The broad shape of the yield curve in Figure 1 is as one would expect, with short-term yields lower than longer-term ones. However, there is one oddity: yields decline for terms above 24 years.
Figure 1
Yields on UK government gilts (coupon strips only, no index-linked gilts) as at 20 April 2017. Source: UK Debt Management Office (DMO, accessed on 21 April 2017)
[Figure omitted. See PDF]
For v(t) we will follow McCulloch (1971) and McCulloch (1975) and use a spline basis for representing the yields. Note, however, that McCulloch placed his splines with knot points at non-equal distances, whereas we will use equally spaced splines with penalisation as per Eilers and Marx (1996); the plotted points in Figure 1 are sufficiently regular that they look like a smooth curve already, so no distortion is introduced by smoothing. In this paper the P-spline smoothing is applied to the yields directly, rather than to the bond prices as in McCulloch (1971) and McCulloch (1975). The resulting P-spline-smoothed yield curve reproduces all the main features of Figure 1.
3. Model Fitting
We fit models to the data assuming a Poisson distribution for the number of deaths, i.e.

D_{x,y} \sim \text{Poisson}\left(E^c_{x,y}\,\mu_{x,y}\right) \quad (4)

where E c x,y denotes the central exposure at age x in year y.
The models we will fit are the following:

\log \mu_{x,y} = \alpha_x + \kappa_y \quad (5)

\log \mu_{x,y} = \alpha_x + \kappa_y + \gamma_{y-x} \quad (6)

\log \mu_{x,y} = \alpha_x + \beta_x \kappa_y \quad (7)

\log \mu_{x,y} = \alpha_x + \beta_x (y - \bar{y}) + \kappa_y + \gamma_{y-x} \quad (8)

for the Age–Period, APC, Lee–Carter and APCI models, respectively, where \bar{y} denotes the mean year in the data.
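For concreteness, the sketch below evaluates the APCI linear predictor of equation (8) on a small grid. The dictionary-based parameterisation is an illustrative choice of ours, and the zero default for missing cohort terms mirrors the treatment of corner cohorts in Appendix B.

```python
def apci_log_mu(alpha, beta, kappa, gamma, ages, years):
    """APCI linear predictor of equation (8):
    log mu_{x,y} = alpha_x + beta_x*(y - ybar) + kappa_y + gamma_{y-x}.

    Parameters are dicts keyed by age, year and year of birth; cohorts
    with no fitted gamma term default to zero, as for the corner
    cohorts in Appendix B.
    """
    ybar = sum(years) / len(years)
    return {(x, y): alpha[x] + beta[x] * (y - ybar) + kappa[y]
                    + gamma.get(y - x, 0.0)
            for x in ages for y in years}
```

Setting beta to zero everywhere recovers the APC predictor of equation (6), which is one way to see the nesting of the models.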
Following Brouhns et al. (2002) we estimate the parameters using the method of maximum likelihood, rather than the singular-value decomposition of Lee and Carter (1992) or a Bayesian approach. Our focus is on the practical implementation of stochastic models in industry applications, so we estimate all parameters by maximising the Poisson log-likelihood directly.
The Age–Period, APC and APCI models are all linear in the parameters to be estimated, so we will use the algorithm of Currie (2013, pages 87–92) to fit them. The Currie algorithm is a generalisation of the iteratively reweighted least-squares algorithm of Nelder and Wedderburn (1972) used to fit generalised linear models (GLMs), but extended to handle models which have both identifiability constraints and smoothing via the penalised splines of Eilers and Marx (1996); see Appendix D for an overview. The Lee–Carter model is not linear, but it can be fitted as two alternating linear models as described by Currie (2013, pages 77–80); as with the other three models, constraints and smoothing via penalised splines are applied during the fitting process. Smoothing will be applied to α x and β x , but not to κ y or γ y−x ; smoothing of α x and β x reduces the effective number of parameters and improves the quality of the forecasts by reducing the risk of mortality rates crossing over at adjacent ages; see Delwarde et al. (2007) and Currie (2013). The fitting algorithm is implemented in R (R Core Team, 2013).
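The flavour of the iteratively reweighted least-squares scheme can be seen in the following two-parameter sketch of ours, which fits a Gompertz line, log μ = a + bt, to Poisson death counts over a centred age covariate t. This is illustrative only: the algorithm of Currie (2013) used in the paper additionally handles identifiability constraints and P-spline penalties.

```python
import math

def fit_gompertz_poisson(ts, deaths, exposures, n_iter=100):
    """Fit log mu = a + b*t to Poisson death counts by iteratively
    reweighted least squares (Fisher scoring), with t a centred age
    covariate.  A two-parameter illustration of the scheme behind GLM
    fitting; not the constrained, penalised algorithm of the paper.
    """
    a = math.log(sum(deaths) / sum(exposures))
    b = 0.0
    for _ in range(n_iter):
        # Working weights w = fitted deaths, and working response
        # z = eta + (d - w)/w, for the Poisson/log-link case.
        w = [e * math.exp(a + b * t) for t, e in zip(ts, exposures)]
        z = [a + b * t + (d - wi) / wi for t, d, wi in zip(ts, deaths, w)]
        # Solve the 2x2 weighted least-squares normal equations.
        s0 = sum(w)
        s1 = sum(wi * t for wi, t in zip(w, ts))
        s2 = sum(wi * t * t for wi, t in zip(w, ts))
        r0 = sum(wi * zi for wi, zi in zip(w, z))
        r1 = sum(wi * t * zi for wi, t, zi in zip(w, ts, z))
        det = s0 * s2 - s1 * s1
        a, b = (s2 * r0 - s1 * r1) / det, (s0 * r1 - s1 * r0) / det
    return a, b
```

Because the log link is canonical for the Poisson distribution, this scoring iteration is simply Newton's method on a concave log-likelihood, which is why it converges reliably for well-conditioned data.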
4. Smoothing
An important part of modelling is choosing which parameters to smooth. This is not merely an aesthetic consideration – Delwarde et al. (2007) showed how judicious use of smoothing can improve the quality of forecasts, such as by reducing the likelihood of projected mortality rates crossing over at adjacent ages in the future. Figure 2 shows the parameter estimates for α x with and without smoothing.
Figure 2
Parameter estimates for α x
[Figure omitted. See PDF]
Figure 3 shows the corresponding parameter estimates for β x .
Figure 3
Parameter estimates for β x
[Figure omitted. See PDF]
In contrast to Figures 2 and 3, Figures 4 and 5 suggest that smoothing κ and γ is less straightforward. In particular, the estimates of κ y and γ y−x do not display the same smooth, regular progression as those of α x and β x .
Figure 4
Parameter estimates for κ y
[Figure omitted. See PDF]
Figure 5
Parameter estimates for γ y−x
[Figure omitted. See PDF]
We have omitted plots of the remaining parameter estimates.
Table 1 summarises our approach to smoothing the various parameters across the four models. The impact of the decision to smooth is shown in the contrast between Tables 2 and 3. We can see that smoothing has little impact on either the forecast time lived or the annuity factors for the Age–Period, Lee–Carter and APCI models. However, smoothing has led to a major change in the central forecast in the case of the APC model; this is due to a different autoregressive, integrated moving average (ARIMA) model being selected as optimal for the κ y terms: ARIMA(0, 1, 2) for the unsmoothed APC model, but ARIMA(3, 2, 0) for the smoothed version. This large change in forecast is an interesting, if extreme, example of the kind of issues discussed in Kleinow and Richards (2016). An ARIMA(p, 1, q) process models the differences in κ y , i.e., a model for improvements, whereas an ARIMA(p, 2, q) process models the rate of change in differences in κ y , i.e., accelerating or decelerating improvements. Smoothing has also improved the fit as measured by the Bayesian information criterion (BIC) – in each case the BIC for a given smoothed model in Table 3 is smaller than the equivalent unsmoothed model in Table 2. This is due to the reduction in the effective number of parameters from the penalisation of the spline coefficients; see equation (21) in Appendix D.
Table 1 Smoothed and Unsmoothed Parameters
| Model | Smoothed | Unsmoothed |
| Age–Period | α x | κ y |
| APC | α x | κ y , γ y−x |
| Lee–Carter | α x , β x | κ y |
| APCI | α x , β x | κ y , γ y−x |
Table 2 Expected time lived and annuity factors for unsmoothed models, together with the Bayesian Information Criterion (BIC) and Effective Dimension (ED).
| Models | Expected time lived | Annuity factor | BIC | ED |
| Age–Period | 15.739 | 13.811 | 78,667 | 99.0 |
| APC | 15.217 | 13.510 | 15,916 | 188.0 |
| Lee–Carter | 15.196 | 13.531 | 13,917 | 153.0 |
| APCI | 15.579 | 13.812 | 7,140 | 241.0 |
The yield curve used to discount future cashflows in the annuity factors is shown in Figure 1.
Table 3 Expected time lived and annuity factors for smoothed (S) models, together with the Bayesian Information Criterion (BIC) and Effective Dimension (ED).
| Models | Expected time lived | Annuity factor | BIC | ED |
| Age–Period(S) | 15.739 | 13.811 | 78,527 | 55.7 |
| APC(S) | 16.949 | 14.701 | 15,770 | 144.8 |
| Lee–Carter(S) | 15.199 | 13.534 | 13,506 | 63.8 |
| APCI(S) | 15.585 | 13.816 | 6,724 | 151.8 |
The yield curve used to discount future cashflows in the annuity factors is shown in Figure 1.
One other interesting aspect of Tables 2 and 3 is the dramatic improvement in overall fit of the APCI model compared to the others. However, it is worth repeating the caution of Currie (2016) that an “oft-overlooked caveat is that it does not follow that an improved fit to data necessarily leads to improved forecasts of mortality”. This was also noted in Kleinow and Richards (2016), where the best-fitting ARIMA process for κ in a Lee–Carter model for UK males led to the greatest parameter uncertainty in the forecast, and thus higher capital requirements under a value-at-risk (VaR) assessment. As we will see in Section 7, although the APCI model fits the data best of the four related models considered, it also produces relatively high capital requirements.
From this point on the models in this paper are smoothed as per Table 1, and the smoothed models will be denoted (S) to distinguish them from the unsmoothed versions.
5. Projections
The κ values will be treated throughout this paper as if they are known quantities, but it is worth noting that this is a simplification. In fact, the κ y values are estimates, not observations, and so carry their own uncertainty.
As in Li et al. (2009) we will adopt a two-stage approach to mortality forecasting: (i) estimation of the time index, κ y , and (ii) forecasting that time index. The practical benefits of this approach over Bayesian methods, particularly with regard to VaR calculations in life-insurance work, are discussed in Kleinow and Richards (2016). The same approach is used for the cohort terms, γ y−x .
Central projections under each of the four models are shown in Figure 6. The discontinuity between observed and forecast rates for the Age–Period(S) model arises from the lack of age-related modulation of the κ y term – at ages 50–60 there is continuity, at ages 65–75 there is a discontinuity upwards and at ages 85–90 there is a discontinuity downwards. It is for this kind of reason that the Age–Period model is not used in practice for forecasting work.
Figure 6
Observed mortality rates at age 70 and projected rates under Age–Period (smoothed) (AP(S)), Age–Period–Cohort (smoothed) (APC(S)), Lee–Carter (smoothed) (LC(S)) and APCI(S) models
[Figure omitted. See PDF]
6. Constraints and Cohort Effects
All four of the models in the main body of this paper require identifiability constraints, and the ones used in this paper are detailed in Appendix C. There is a wide choice of alternative constraint systems. For example, R's gnm() function deletes sufficient columns from the model matrix until it is of full rank and the remaining parameters are uniquely estimable and hence identifiable; see Currie (2016). Cairns et al. (2009) imposed weighted constraints on the cohort terms, γ y−x .
However, one consequence of the treatment of corner cohorts described in Appendix B is that it reduces the number of constraints required to uniquely identify parameters in the fitting of the APC and APCI models. Nevertheless, following the rationale of Cairns et al. (2009) in imposing behaviour on the γ terms, we retain the full constraint systems of Appendix C, with the result that the APC and APCI models fitted here are over-constrained.
The shape of the APC parameters is largely unaffected by over-constraining, as evidenced by Figure 7. However, it is a matter of concern that the values of some parameters are determined by the constraint system rather than by the data.
Figure 7
Parameter estimates
[Figure omitted. See PDF]
Figure 8
Parameter estimates
[Figure omitted. See PDF]
The changes in κ y and γ y−x between the minimally constrained and over-constrained fits are shown in Figures 7 and 8.
When we compare the minimal-constraint fits in Figures 7 and 8, we see that for both models the parameter estimates are heavily influenced by the constraint system applied.
7. VaR Assessment
Insurers in the United Kingdom and European Union are required to use a one-year VaR methodology to assess capital requirements for longevity trend risk. Under Solvency II a VaR99.5 value is required, i.e., insurers must hold enough capital to cover 99.5% of possible one-year changes in the financial impact of mortality forecasts. For a set, S, of possible annuity values arising over the coming year, the VaR99.5 capital requirement would be:
\text{capital requirement} = \frac{q_{99.5\%}(S) - \operatorname{median}(S)}{\operatorname{median}(S)} \quad (12)

where q_{99.5\%}(S) denotes the 99.5% quantile of S.
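In code, the capital-requirement calculation of equation (12) for a set of simulated annuity values might look as follows. This is a sketch of ours using plain empirical quantiles for simplicity; Table 4 uses the smoother Harrell–Davis quantile estimator, so the resulting figures would differ slightly.

```python
import math

def var_capital_requirement(annuity_values, level=0.995):
    """VaR-style capital requirement as in equation (12): the excess
    of the 99.5% quantile of the simulated annuity factors over their
    median, expressed as a proportion of the median.

    Uses a plain empirical (order-statistic) quantile rather than the
    Harrell-Davis estimator reported in the paper.
    """
    s = sorted(annuity_values)
    n = len(s)
    idx = max(0, math.ceil(level * n) - 1)     # empirical quantile index
    med = 0.5 * (s[(n - 1) // 2] + s[n // 2])  # sample median
    return s[idx] / med - 1.0
```

For a VaR assessment the input would be the 5,000 simulated annuity factors for a given age, one per sample path.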
Table 4 Results of Value-at-Risk Assessment
| Models | Median | VaR99.5 | Capital Requirement (%) |
| AP(S) | 13.696 | 14.263–14.317 | 4.14–4.54 |
| APC(S) | 14.993 | 15.192–15.316 | 1.33–2.15 |
| LC(S) | 13.447 | 13.740–13.832 | 2.18–2.86 |
| APCI(S) | 13.692 | 14.246–14.253 | 4.05–4.10 |
The 99.5% quantiles are estimated by applying the estimator from Harrell and Davis (1982) to 5,000 simulations. The ranges given are the 95% confidence intervals computed from the standard error for the Harrell–Davis estimate. The yield curve used to discount future cashflows is shown in Figure 1.
Table 4 shows VaR99.5 capital requirements at age 70, while Figure 9 shows results for a wide range of ages. The APCI(S) capital requirements appear less smooth and well-behaved than those for the other models, but the VaR99.5 capital requirements themselves do not appear out of line. We note, however, that the APCI VaR capital requirements exceed the APC(S) and Lee–Carter (S) values at almost every age. How a model’s capital requirements vary with age may be an important consideration for life insurers under Solvency II, such as when calculating the risk margin and particularly for closed (and therefore ageing) portfolios.
Figure 9
VaR99.5 capital-requirement percentages by age for models in Table 4.
[Figure omitted. See PDF]
To understand how the VaR99.5 capital requirements in Table 4 arise, it is instructive to consider the smoothed densities of the annuity factors at age 70 for each model in Figure 10. Here we can see the reason for the higher capital requirement under the APCI model – there is a relatively wider gap between the median and the 99.5% quantile value.
Figure 10
Densities for annuity factors for age 70 from 2015 for 5,000 simulations under the models in Table 4. The dashed vertical lines show the medians and the dotted vertical lines show the Harrell–Davis estimates for the 99.5% quantiles. The shape of the right-hand tail of the APCI(S) model, and the clustering of values far from the median, leads to the higher VaR99.5 capital requirements in Table 4.
[Figure omitted. See PDF]
Table 4 shows the impact of model risk in both the median projected annuity factor and the capital requirement. This is a reminder that it is important for practical insurance work to always use a variety of models from different families. Indeed, we note that the best estimate under the APC(S) model in Table 4 is higher than the estimated VaR99.5 reserves for the other models, a phenomenon also observed by Richards et al. (2014).
8. Conclusions
The APCI model is an interesting addition to the actuarial toolbox. It shares features with the Lee–Carter and APC models and – as with all models – it has its own peculiarities. In the case of the APCI model, the behaviour of the κ y terms depends heavily on the choice of constraint system.
In the APCI model the β x terms modulate an assumed linear trend in year, rather than modulating the period index κ y directly as in the Lee–Carter model.
The APCI model fits the data better than the other models considered in this paper, but fit to the data is no guarantee of forecast quality. Interestingly, despite having an improved fit to the data, the APCI model leads to higher capital requirements under a VaR-style assessment of longevity trend risk than most of the other models considered here. These higher requirements vary by age, emphasising that insurers must consider not only multiple models when assessing longevity trend risk, but also the distribution of liabilities by age.
Acknowledgements
The authors thank Alison Yelland, Stuart McDonald, Kevin Armstrong, Yiwei Zhao and an anonymous reviewer for helpful comments. Any errors or omissions remain the responsibility of the authors. All models were fitted using the Projections Toolkit. Graphs were done in R and tikz, while typesetting was done in LaTeX. Torsten Kleinow acknowledges financial support from the Actuarial Research Centre (ARC) of the Institute and Faculty of Actuaries through the research programme on Modelling, Measurement and Management of Longevity and Morbidity Risk.
Appendices
A. Integration
We need to evaluate the integrals in equations (1–3). There are several approaches which could be adopted when the function to be integrated can be evaluated at any point, such as adaptive quadrature; see Press et al. (2005, page 133) for details of this and other methods. However, since we only have data at integer ages, the survival probabilities {}_tp_{x,y} can only be calculated at equally spaced grid points. Since we cannot evaluate the function to be integrated at any point we like, we maximise our accuracy by using the following approximations.
For two points spaced one year apart we use the Trapezoidal Rule:

\int_0^1 f(t)\,dt \approx \tfrac{1}{2}\left(f(0) + f(1)\right) \quad (13)

For three points spaced one year apart we use Simpson’s Rule:

\int_0^2 f(t)\,dt \approx \tfrac{1}{3}\left(f(0) + 4f(1) + f(2)\right) \quad (14)

For four points spaced one year apart we use Simpson’s 3/8 Rule:

\int_0^3 f(t)\,dt \approx \tfrac{3}{8}\left(f(0) + 3f(1) + 3f(2) + f(3)\right) \quad (15)

For five points spaced one year apart we use Boole’s Rule:

\int_0^4 f(t)\,dt \approx \tfrac{2}{45}\left(7f(0) + 32f(1) + 12f(2) + 32f(3) + 7f(4)\right) \quad (16)
To integrate over n equally spaced grid points we first apply Boole’s Rule as many times as possible, then Simpson’s 3/8 Rule, then Simpson’s Rule and then the Trapezoidal Rule for any remaining points at the highest ages.
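The composite scheme just described can be sketched as follows for values tabulated at unit intervals; the function name and greedy structure are ours, but the weights are those of equations (13)–(16).

```python
def composite_integrate(y, h=1.0):
    """Integrate tabulated values y at equally spaced points, applying
    Boole's rule while at least four intervals remain, then Simpson's
    3/8 rule, Simpson's rule or the Trapezoidal rule for the remainder,
    as described in Appendix A.
    """
    total, i, n = 0.0, 0, len(y)
    while n - 1 - i >= 4:       # Boole's rule consumes 4 intervals
        total += 2 * h / 45 * (7 * y[i] + 32 * y[i + 1] + 12 * y[i + 2]
                               + 32 * y[i + 3] + 7 * y[i + 4])
        i += 4
    rem = n - 1 - i             # 0-3 intervals left at the highest ages
    if rem == 3:                # Simpson's 3/8 rule
        total += 3 * h / 8 * (y[i] + 3 * y[i + 1] + 3 * y[i + 2] + y[i + 3])
    elif rem == 2:              # Simpson's rule
        total += h / 3 * (y[i] + 4 * y[i + 1] + y[i + 2])
    elif rem == 1:              # Trapezoidal rule
        total += h / 2 * (y[i] + y[i + 1])
    return total
```

For example, five tabulated values of t³ integrate exactly to 64 because Boole's rule is exact for polynomials of degree up to five.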
B. Corner Cohorts
One issue with the APC and APCI models is that the cohort terms can have widely varying numbers of observations, as illustrated in Figure A1; at the extremes, the oldest and youngest cohorts have just a single observation each. A direct consequence of this limited data is that any estimated γ term for the corner cohorts will have a very high variance, as shown in Figure A2. Cairns et al. (2009) dealt with this by simply discarding the data in the triangles in Figure A1, i.e., where a cohort had four or fewer observations. Instead of the oldest cohort having year of birth y_min − x_max, for example, it becomes c_min = y_min − x_max + 4. Similarly, the youngest cohort has year of birth c_max = y_max − x_min − 4 instead of y_max − x_min.
Figure A1
Number of observations for each cohort in the data region
[Figure omitted. See PDF]
Figure A2
Standard errors of the estimated cohort terms, γ, by year of birth
[Figure omitted. See PDF]
There is a drawback to the approach of Cairns et al. (2009), namely that it makes it harder to compare model fits. We typically use an information criterion to compare models, such as the AIC or BIC. However, this is only valid where the data used are the same. If two models use different data, then their information criteria cannot be compared. This would be a problem for comparing the models in Tables 2, 3 and 12, for example, as the fit for an APC or APCI model could not be compared with the fits for the Age–Period and Lee–Carter models if corner cohorts were only dropped for some models. One approach would be to make the data the same by dropping the corner cohorts for the Age–Period and Lee–Carter fits, even though this is technically unnecessary. This is far from ideal, however, as it involves throwing away data and would have to be applied to every other model without cohort terms.
An alternative approach is to use all the data, but to simply not fit cohort terms in the corners of Figure A1. This preserves the easy comparability of information criteria between different model fits. To avoid fitting cohort terms where they are too volatile we simply assume a value of γ=0 where there are four or fewer observations. This means that the same data are used for models with and without cohort terms, and thus that model fits can be directly compared via the BIC. Currie (2013) noted that this had the beneficial side effect of stabilising the variance of the cohort terms which are estimated, as shown in Figure A2.
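A sketch of this rule, counting observations by year of birth in a rectangular age-by-year data region and dropping cohorts with four or fewer observations, is given below; the helper names are our own.

```python
def cohort_observation_counts(ages, years):
    """Count observations per cohort (year of birth) in a rectangular
    age-by-year data region."""
    counts = {}
    for y in years:
        for x in ages:
            c = y - x
            counts[c] = counts.get(c, 0) + 1
    return counts

def cohorts_with_gamma(ages, years, min_obs=5):
    """Cohorts for which a gamma term is fitted; corner cohorts with
    four or fewer observations get gamma = 0, as in Appendix B."""
    counts = cohort_observation_counts(ages, years)
    return sorted(c for c, n in counts.items() if n >= min_obs)
```

For the data region of this paper (ages 50–104, years 1971–2015) this retains 91 of the 99 cohorts, zeroing the four oldest and four youngest.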
For projections of γ we forecast not only for the unobserved cohorts, but also for the cohorts with too few observations, i.e., the cohorts in the dotted triangle in Figure A1.
C. Identifiability Constraints
The models in equations (5)–(8) all require identifiability constraints. For the Age–Period model we require one constraint, and we will use the following:
\sum_y \kappa_y = 0 \quad (17)
For the Lee–Carter model we require two constraints. For one of them we will use the same constraint as equation (17), together with the usual constraint on β x from Lee and Carter (1992):

\sum_x \beta_x = 1 \quad (18)
There are numerous alternative constraint systems for the Lee–Carter model – see Girosi and King (2008), Renshaw and Haberman (2006) and Richards and Currie (2009) for examples. The choice of constraint system will affect the estimated parameter values, but will not change the fitted values of log μ x,y .
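This invariance can be demonstrated directly. The sketch below, our own illustration rather than code from the paper, rescales an arbitrary Lee–Carter parameterisation so that the constraints of equations (17) and (18) hold, while leaving every fitted value of log μ unchanged.

```python
def standardise_lee_carter(alpha, beta, kappa):
    """Rescale Lee-Carter parameters so that sum(kappa) = 0 and
    sum(beta) = 1, as in equations (17) and (18), without changing
    the fitted values log mu = alpha_x + beta_x * kappa_y."""
    # Shift the mean of kappa into alpha.
    c = sum(kappa) / len(kappa)
    alpha = [a + b * c for a, b in zip(alpha, beta)]
    kappa = [k - c for k in kappa]
    # Rescale beta to sum to one, compensating in kappa.
    s = sum(beta)
    beta = [b / s for b in beta]
    kappa = [k * s for k in kappa]
    return alpha, beta, kappa
```

Each step applies one of the model's invariant transformations, which is exactly why identifiability constraints are needed in the first place.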
For the APC model we require three constraints. For the first one we will use the same constraint as equation (17), together with the following two from Cairns et al. (2009):
\sum_c \gamma_c = 0 \quad (19)

\sum_c c\,\gamma_c = 0 \quad (20)
For the APCI model we require five constraints. We will use equations (17), (19) and (20), together with the following additional two:
\sum_c c^2\,\gamma_c = 0 \quad (21)

\sum_y (y - \bar{y})\,\kappa_y = 0 \quad (22)
The number of constraints necessary for a linear model can be determined from the rank of the model matrix. Note that the approach of not fitting γ terms for cohorts with four or fewer observations, as outlined in Appendix B, makes the constraints involving γ unnecessary for identifiability. As in Continuous Mortality Investigation (2016b), this means that the APC and APCI models in this paper are over-constrained, and will thus usually produce poorer fits than would be expected if a minimal constraint system were adopted. Over-constraining has a different impact on the two models: for the APC model it leads to relatively little change in κ, as shown in Figure 7. However, for the APCI model κ is little more than a noise process in the minimally constrained model (see Figure 8), while any pattern in κ from the over-constrained model appears likely to have been caused by the constraints on γ y−x .
D. Fitting Penalised Constrained Linear Models
The Age–Period, APC and APCI models in equations (5), (6) and (8) are generalised linear models (GLMs) with identifiability constraints. We smooth the parameters as described in Table 1. We accomplish the parameter estimation, constraint application and smoothing simultaneously using the algorithm presented in Currie (2013). In this section, we outline the three development stages leading up to this algorithm.
Nelder and Wedderburn (1972) defined the concept of a GLM. At its core we have the linear predictor, η, defined as follows:
\eta = X\theta \quad (23)

where X is the model matrix and θ is the vector of parameters to be estimated.
X can also contain basis splines, which introduces the concept of smoothing and penalisation into the GLM framework; see Eilers and Marx (1996). Currie (2013) extended the IWLS algorithm to find the values which maximise the penalised log-likelihood.
We note that penalisation is applied to parameters which exhibit a smooth and continuous progression, such as the α x parameters in equations (5)–(8). If a second-order penalty is applied, as in this paper, adjacent coefficients are shrunk towards a straight line as the degree of penalisation increases.
Many linear mortality models also require identifiability constraints, i.e., the rank of the model matrix is less than the number of parameters to be estimated. The Age–Period, APC and APCI models of the main body of this paper fall into this category: they are all linear, but in each case rank(X)<length(θ). The gap between rank(X) and length(θ) determines the number of identifiability constraints required. To enable simultaneous parameter estimation, smoothing and application of constraints, Currie (2013) extended the concept of the model matrix, X, to the augmented model matrix, X aug , defined as follows:
(25)
In this paper we use a Poisson distribution and a log link for our GLMs; this is the canonical link function for the Poisson distribution. This means that the fitted number of deaths is the anti-log of the linear predictor, i.e., E c ×e η . However, Currie (2014) noted that a logit link often provides a better fit to population data. This would make the fitted number of deaths a logistic function of the linear predictor, i.e., E c ×e η /(1+e η ). If the logit link is combined with the straight-line assumption for α x in equations (5)–(8), this would simplify the models to variants of the Perks model of mortality; see Richards (2008). Currie (2016; Appendix 1) provides R code to implement the logit link for the Poisson distribution for the number of deaths in a GLM. From experience we further suggest specifying good initial parameter estimates to R’s glm() function when using the logit link, as otherwise there can be problems due to very low exposures at advanced ages. The start option in the glm() function can be used for this. In Appendix F we use a logit link to make an M5 Perks model as an alternative to the M5 Gompertz variant using the log link. As can be seen in Table A8, the M5 Perks model fits the data markedly better than the other M5 variants.
E. Projecting κ and γ
A time series is a sequence of elements ordered by the time at which they occur; stationarity is a key concept. Informally, a time series {Y(t)} is stationary if {Y(t)} looks the same at whatever point in time we begin to observe it – see Diggle (1990, page 13). Usually we make do with the simpler second-order stationarity, which involves the mean and autocovariance of the time series. Let:
\mu(t) = \mathrm{E}[Y(t)] \quad (26)

\gamma(s, t) = \operatorname{Cov}\left(Y(s), Y(t)\right) \quad (27)
be the mean and autocovariance function of the time series. Then the time series is second-order stationary if:
\mu(t) = \mu \ \text{for all}\ t \quad (28)

\gamma(s, t) = \gamma(|s - t|) \quad (29)
that is, the covariance between Y(t) and Y(s) depends only on their separation in time; see Diggle (1990, page 58). In practice, when we say a time series is stationary we mean the series is second-order stationary. The assumption of stationarity of the first two moments only is variously known as weak-sense stationarity, wide-sense stationarity or covariance stationarity.
The lag operator, L, operates on an element of a time series to produce the previous element. Thus, if we define a collection of time-indexed values {κ t }, then Lκ t =κ t−1. Powers of L mean the operator is repeatedly applied, i.e., L i κ t =κ t−i . The lag operator is also known as the backshift operator, while the difference operator, Δ, is 1−L.
A time series, κ t , is said to be integrated if the differences of order d are stationary, i.e., (1−L) d κ t is stationary.
A time series, κ t , is said to be autoregressive of order p if the current value involves a linear combination of the previous p values, i.e. \kappa_t = \sum_{i=1}^{p} \phi_i\,\kappa_{t-i} + \varepsilon_t, where ε t is an error term.
A time series, κ t , is said to be a moving average of order q if the current value can be expressed as a linear combination of the past q error terms, i.e. \kappa_t = \varepsilon_t + \sum_{j=1}^{q} \theta_j\,\varepsilon_{t-j}.
A time series, κ t , can be modelled combining these three elements as an ARIMA model (Harvey, 1981) as follows:

\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d\,\kappa_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t \quad (30)

where ε t is a white-noise error term.
An ARIMA model can be structured with or without a mean value; omitting the mean is equivalent to setting it to 0. The behaviour and interpretation of this mean value depend on the degree of differencing, i.e., the value of d in ARIMA(p, d, q).
For the Age–Period, APC and Lee–Carter models (but not the APCI model), an ARIMA model for κ with d=1 is broadly modelling mortality improvements, i.e., κ t +1−κ t . It will be appropriate where the rate of mortality improvement has been approximately constant over time, i.e., without pronounced acceleration or deceleration. An ARIMA model with d=1 but no mean will project gradually decelerating improvements. An ARIMA model with d=1 and a fitted mean will project improvements which will gradually tend to that mean value. In most applications the rate at which the long-term mean is achieved is very slow and the curvature in projected values is slight. However, there are two exceptions to this:
Pure moving-average models, i.e., ARIMA(0, d, q) models. With such models the long-term mean will be achieved quickly, i.e., after q+d years.
ARIMA models where the autoregressive component is weak. For example, an ARIMA(1, d, q) model where the ar1 parameter is close to 0 will also converge to the long-term mean relatively quickly, with the speed of convergence decreasing in the absolute value of the ar1 parameter.
For the Age–Period, APC and Lee–Carter models (but not the APCI model), an ARIMA model for κ with d=2 is broadly modelling the rate of change in mortality improvements, not the improvements themselves. Thus, with d=2 we are modelling (κ t + 2−κ t + 1)−(κ t + 1−κ t ). Such a model will be appropriate where the rate of mortality improvement has been accelerating or decelerating over time. An ARIMA model with d=2 and without a mean will project a gradual deceleration of the rate of change in mortality improvements.
To project κ and/or γ in each of the models in the paper, we fit an ARIMA model. We fit ARIMA models with a mean for κ in the Age–Period, APC and Lee–Carter models. We fit ARIMA models without a mean for γ in the APC and APCI models, and also for κ in the APCI model.
The ARIMA parameters, including the mean where required, are estimated using R’s arima(), which estimates ARIMA parameters assuming that κ y and γ y−x are known quantities, rather than the estimated quantities that they really are.
While R’s arima() function returns standard errors, for assessing parameter risk we use the methodology outlined in Kleinow and Richards (2016). The reason for this is that sometimes ARIMA parameter estimates can be borderline unstable, and this can lead to wider confidence intervals for the best-fitting model, as shown in Kleinow and Richards (2016).
To fit an ARIMA model we need to specify the autoregressive order (p), the order of differencing (d) and the order of the moving average (q). For a given level of differencing we fit an ARMA(p, q) model and choose the value of p and q by comparing an information criterion; in this paper we used Akaike’s Information Criterion (Akaike, 1987) with a small-sample correction (AICc). Choosing the order of differencing, d, is trickier, as the data used to fit the ARMA(p, q) model are different when d=1 and d=2: with n observations there are n−1 first differences, but only n−2 second differences. To decide on the ARIMA(p, d, q) model we select the best ARMA(p, q) model for a given value of d using the AICc, then we pick the ARIMA(p, d, q) model with the smallest root mean squared error as per Solo (1984).
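For reference, the small-sample correction has the simple closed form below, where loglik is the maximised log-likelihood, k the number of estimated parameters and n the number of (differenced) observations; counting conventions for k vary between implementations, so treat this as a sketch rather than the exact criterion used by any particular package.

```python
def aicc(loglik, k, n):
    """Akaike's Information Criterion with the small-sample
    correction: AIC = -2*loglik + 2k, plus the extra penalty
    2k(k+1)/(n-k-1), which vanishes as n grows."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)
```

For the short κ series in this paper (45 years of data) the correction term is material, which is why the AICc is preferred to the plain AIC.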
The choice of differencing order is thorny: with d=1 we are modelling mortality improvements, but with d=2 we are modelling the rate of change of mortality improvements. The latter can produce very different forecasts, as evidenced by comparing the life expectancy for the APC(S) model in Table 3 (with d=2) with the life expectancy for the APC model in Table 2 (with d=1).
Table A1 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for κ in smoothed Age–Period model
| Parameters | Estimate | Standard Error |
| ar1 | 0.742 | 0.262 |
| ma1 | −1.366 | 0.350 |
| ma2 | 0.846 | 0.255 |
| Mean | −0.018 | 0.004 |
Table A2 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for κ in smoothed Lee–Carter model
| Parameters | Estimate | Standard Error |
| ar1 | 0.821 | 0.215 |
| ma1 | −1.306 | 0.315 |
| ma2 | 0.719 | 0.264 |
| Mean | −0.010 | 0.002 |
Table A3 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(0,1,2) process for κ in smoothed APC model
| Parameters | Estimate | Standard Error |
| ma1 | −0.682 | 0.151 |
| ma2 | 0.353 | 0.225 |
| Mean | −0.017 | 0.002 |
Table A4 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(2,1,0) process for γ in smoothed APC model
| Parameters | Estimate | Standard Error |
| ar1 | −0.037 | 0.079 |
| ar2 | 0.438 | 0.098 |
For a VaR assessment of in-force annuities we need to simulate sample paths for κ. If we want mortality rates in the upper right triangle of Figure A1, then we also need to simulate sample paths for γ. We use the formulae given in Kleinow and Richards (2016) for bootstrapping the mean (for κ only) and then use these bootstrapped parameter values for the ARIMA process to include parameter risk in the VaR assessment (Tables 5–10).
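A minimal sketch of such a sample path for an ARIMA(1,1,2) process with a mean is given below. The parameter values in the usage note echo Table A1, but the innovation standard deviation is an assumed illustrative figure, and for simplicity the pre-forecast shocks are set to zero rather than conditioned on the fitted residuals, so this is not the full bootstrap procedure of Kleinow and Richards (2016).

```python
import random

def simulate_arima_112(kappa, ar1, ma1, ma2, mean, sigma, horizon, rng):
    """Simulate one sample path of kappa under an ARIMA(1,1,2) with a
    mean for the first differences z_t = kappa_t - kappa_{t-1}:

        z_t - mean = ar1*(z_{t-1} - mean) + e_t + ma1*e_{t-1} + ma2*e_{t-2}

    Pre-forecast shocks are set to zero for simplicity; a full
    treatment would condition on the estimated residuals.
    """
    z_prev = kappa[-1] - kappa[-2]  # last observed first difference
    e1 = e2 = 0.0
    path = list(kappa)
    for _ in range(horizon):
        e0 = rng.gauss(0.0, sigma)
        z = mean + ar1 * (z_prev - mean) + e0 + ma1 * e1 + ma2 * e2
        path.append(path[-1] + z)   # undo the differencing
        z_prev, e1, e2 = z, e0, e1
    return path
```

For example, simulate_arima_112(kappa_fitted, 0.742, -1.366, 0.846, -0.018, 0.02, 50, random.Random(1)) would give one 50-year path using the Table A1 estimates and an assumed sigma of 0.02; a VaR assessment would repeat this 5,000 times.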
F. Other Models
In their presentation of a VaR framework for longevity trend risk, Richards et al. (2014) included some other models not considered in the main body of this paper. For interest we present comparison figures for members of the Cairns–Blake–Dowd family of stochastic projection models. We first consider a model sub-family based on Cairns et al. (2006) (M5) as follows (Table A6):
g(μ_{x,y}) = κ_{0,y} + κ_{1,y} w(x)    (31)
Table A5 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for κ in smoothed APCI model
| Parameters | Estimate | Standard Error |
| ar1 | 0.754 | 0.255 |
| ma1 | −1.375 | 0.346 |
| ma2 | 0.860 | 0.278 |
Table A6 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(2,1,0) process for γ in smoothed APCI model
| Parameters | Estimate | Standard Error |
| ar1 | 0.057 | 0.056 |
| ar2 | 0.536 | 0.078 |
Table A7 Definition of M5 Family Under Equation (31)
| Models | g(μ_{x,y}) | w(x) |
| M5 Gompertz | log | |
| M5 Perks | logit | |
| M5 P-spline | log | |
for some functions g() and w(), where κ_0 and κ_1 form a bivariate random walk with drift. The three members of the M5 family used here are defined in Table A7, with the results in Tables A8 and A9. We also consider two further models from Cairns et al. (2009). First, M6:
log μ_{x,y} = κ_{0,y} + κ_{1,y}(x − x̄) + γ_{y−x}    (32)
Model M6 in equation (32) needs two identifiability constraints and we use equations (19) and (20). As with the M5 family, κ_0 and κ_1 form a bivariate random walk with drift and γ is projected using an ARIMA model (as done for the APC and APCI models). We also consider M7 from Cairns et al. (2009):
log μ_{x,y} = κ_{0,y} + κ_{1,y}(x − x̄) + κ_{2,y}((x − x̄)² − σ̂²_x) + γ_{y−x}    (33)
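For illustration, the linear predictors of the M5, M6 and M7 structures can be evaluated as below. This is a hypothetical sketch: the parameter values are invented, w(x) = x − x̄ is assumed for the M5 line (Table A7 defines the choices of g and w actually used), and σ̂²_x is taken as the mean of (x − x̄)² over the age range, following Cairns et al. (2009).

```python
# Hypothetical sketch: invented parameters, ages 50-104 as in the data.

ages = list(range(50, 105))
xbar = sum(ages) / len(ages)                          # mean age
sigma2 = sum((x - xbar) ** 2 for x in ages) / len(ages)  # age variance

def m5(k0, k1, x):
    """M5: linear-in-age predictor."""
    return k0 + k1 * (x - xbar)

def m6(k0, k1, gamma, x):
    """M6: M5 plus a cohort term."""
    return m5(k0, k1, x) + gamma

def m7(k0, k1, k2, gamma, x):
    """M7: M6 plus a centred quadratic age term."""
    return m5(k0, k1, x) + k2 * ((x - xbar) ** 2 - sigma2) + gamma

# Invented parameter values for a single period and cohort:
print(m5(-4.5, 0.09, 70))
print(m6(-4.5, 0.09, -0.02, 70))
print(m7(-4.5, 0.09, 1e-4, -0.02, 70))
```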
Table A8 Expected time lived and annuity factors for unsmoothed models, together with Bayesian Information Criterion (BIC) and Effective Dimension (ED)
| Models | Expected time lived | Annuity factor | BIC | ED |
| M5 Gompertz | 15.519 | 13.734 | 46,597 | 90.0 |
| M5 Perks | 15.500 | 13.702 | 28,950 | 90.0 |
| M5 P-spline | 15.423 | 13.646 | 33,428 | 99.7 |
| M6 | 16.960 | 14.730 | 7,969 | 179.0 |
| M7 | 15.514 | 13.763 | 7,956 | 223.0 |
Table A9 Results of Value-at-Risk assessment for models in Table A8
| Models | Median | VaR99.5 | Capital Requirement (%) |
| M5 Gompertz | 13.836 | 14.268–14.328 | 3.13–3.55 |
| M5 Perks | 13.805 | 14.278–14.345 | 3.42–3.91 |
| M5 P-spline | 13.747 | 14.192–14.250 | 3.24–3.66 |
| M6 | 14.925 | 15.418–15.481 | 3.31–3.72 |
| M7 | 13.867 | 14.277–14.336 | 2.95–3.39 |
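The capital requirement column in Table A9 is the excess of the 99.5th percentile of the simulated annuity values over the median, expressed as a percentage. A minimal sketch follows, with invented simulated values; the percentile here uses simple linear interpolation, which is not necessarily the scheme used in the paper.

```python
# Sketch: VaR99.5 capital as percentile excess over the median.

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of pre-sorted data, 0 <= p <= 1."""
    idx = p * (len(sorted_vals) - 1)
    lo = int(idx)
    frac = idx - lo
    if lo + 1 >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[lo] * (1 - frac) + sorted_vals[lo + 1] * frac

def capital_requirement(annuity_sims):
    """Excess of the 99.5th percentile over the median, in percent."""
    vals = sorted(annuity_sims)
    return 100.0 * (percentile(vals, 0.995) / percentile(vals, 0.5) - 1.0)

# Cross-check against the M5 Gompertz row of Table A9: a median of
# 13.836 and VaR99.5 of 14.268 imply a capital requirement of about 3.1%.
print(100.0 * (14.268 / 13.836 - 1.0))
```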
Comparing Table A8 with Tables 2 and 3 we can see that the stochastic version of the APCI model produces similar expected time lived and temporary annuity factors to most models, apart from the APC and M6 models. This suggests that the best-estimate forecasts under the APCI model are consistent and not extreme.
Comparing Table A9 with Table 4 we can see that, while the AP(S) and APCI(S) models produce the largest VaR99.5 capital requirements at age 70, these are not extreme outliers.
A comparison of Table 4 with the equivalent figures in Richards et al. (2014, Table 4) shows considerable differences in VaR99.5 capital at age 70. Two changes between Richards et al. (2014) and this paper drive these differences. The first is that Richards et al. (2014) discounted cashflows at a flat 3% per annum, whereas in this paper we discount cashflows using the yield curve in Figure 1. The second lies in the data: in this paper we use UK-wide data for 1971–2015, whereas Richards et al. (2014) used England and Wales data for 1961–2010. There are three important sub-sources of variation buried in this change of data: first, the population estimates for 1961–1970 are not as reliable as those from 1971 onwards; second, the data used in this paper include revisions to pre-2011 population estimates following the 2011 census; and third, mortality experience after 2010 has been unusual and not in line with trend. The combined effect of these changes to the discount function and the data is that the VaR99.5 capital requirements at age 70 for the models in Table A9 are around 0.5% lower than for the same models in Richards et al. (2014, Table 4). However, a comparison between Figures 9 and A3 shows that these results are strongly dependent on age. As in Richards et al. (2014), this means that it is insufficient to consider a few model points for a VaR assessment: insurer capital requirements not only need to be informed by different projection models, but must also take account of the age distribution of liabilities.
Figure A3
VaR99.5 capital requirements by age for models in Table A8
[Figure omitted. See PDF]
G. Differences compared to Continuous Mortality Investigation approach
In this paper we present a stochastic implementation of the APCI model proposed by Continuous Mortality Investigation (2016b). This is the central difference between the APCI model in this paper and its original implementation in Continuous Mortality Investigation (2016a, 2016b). However, there are some other differences of note and they are listed in this section as a convenient overview.
As per Cairns et al. (2009), our identifiability constraints for γ_{y−x} weight each parameter according to the number of times it appears in the data, rather than assuming equal weights as in Continuous Mortality Investigation (2016b, page 91). As with Continuous Mortality Investigation (2016b), our APC and APCI models are over-constrained (see Appendix C and section 6).
For cohorts with four or fewer observed values we do not estimate a γ term – see Appendix B. In contrast, Continuous Mortality Investigation (2016a, pages 27–28) adopts a more complex approach to corner cohorts, involving setting the cohort term to the nearest available estimated term.
For smoothing α_x and β_x we have used the penalised splines of Eilers and Marx (1996), rather than the difference penalties of Continuous Mortality Investigation (2016b). Our penalties on α_x and β_x are quadratic, whereas Continuous Mortality Investigation (2016b) uses cubic penalties. Unlike Continuous Mortality Investigation (2016b), we do not smooth κ_y or γ_{y−x}. We also determine the optimal level of smoothing by minimising the BIC, whereas Continuous Mortality Investigation (2016b) smooths by user judgement.
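The BIC comparison can be sketched as follows. This is an illustration of the selection principle rather than the paper's code: the candidate (deviance, effective dimension) pairs are invented, and the criterion shown is the common penalised form deviance + log(n)·ED.

```python
import math

# Illustration only: pick the smoothing level minimising a BIC of the
# form deviance + log(n) * effective dimension.

def bic(deviance, effective_dim, n_obs):
    return deviance + math.log(n_obs) * effective_dim

n = 55 * 45   # ages 50-104 by years 1971-2015
# Invented (deviance, ED) pairs for increasing smoothing penalties:
candidates = [(2600.0, 240.0), (2650.0, 180.0), (2800.0, 150.0)]
best = min(candidates, key=lambda c: bic(c[0], c[1], n))
print(best)   # heavier smoothing can win despite its larger deviance
```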
As described in Section 3, for parameter estimation we use the algorithm presented in Currie (2013). This means that constraints and smoothing are an integral part of the estimation, rather than the separate steps applied in Continuous Mortality Investigation (2016b, page 15).
Unlike Continuous Mortality Investigation (2016b) we make no attempt to adjust the exposure data.
For projections we use ARIMA models for both κ_y and γ_{y−x}, rather than the deterministic targeting approach of Continuous Mortality Investigation (2016b, pages 31–35). Unlike Continuous Mortality Investigation (2016b), we do not attempt to break down mortality improvements into age, period and cohort components, nor do we have a long-term rate to target, nor any concept of a “direction of travel” (Continuous Mortality Investigation, 2016b, page 14).
H. Suggestions for Further Research
There were many other avenues which could have been explored in this paper, but for which time was not available. We list some of them here in case others are interested in pursuing them:
Female lives. To illustrate our points, and to provide comparable figures to earlier papers such as Richards et al. (2014) and Kleinow and Richards (2016), we used the data for males. However, both insurers and pension schemes have material liabilities linked to female lives, and it would be interesting to explore the application of the APCI model to data on female lives.
Back-testing. It would be interesting to see how the APCI model performs against other models in back-testing, i.e., fitting the models to the first part of the data set and seeing how the resulting forecasts compare with the remainder.
Sensitivity testing. Some models are sensitive to the range of ages selected or the period covered. It would be interesting to know how sensitive the APCI model is to such changes.
Canonical correlation. Models with both period and cohort terms, such as the APC and APCI models, usually have these terms projected as if they are independent. However, such terms are usually correlated, making the assumption of independence at best a simplifying assumption for convenience. It would be interesting to compare the correlations of κ and γ for the APC and APCI models. Joint models for κ and γ could be considered.
Over-dispersion. To fit the models, both we and Continuous Mortality Investigation (2017, page 5) assume that the number of deaths follows a Poisson distribution, i.e., that the variance is equal to the mean. However, in practice death counts are usually over-dispersed, i.e., the variance is greater than the mean. It would be interesting to assess the impact of allowing for over-dispersion on the forecasts and the resulting capital requirements.
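One simple diagnostic for this is the Pearson dispersion statistic: the Pearson X² divided by the residual degrees of freedom, which is around 1 under the Poisson assumption and well above 1 when the counts are over-dispersed. A sketch with invented death counts and fitted values:

```python
# Sketch: Pearson dispersion check for Poisson death counts.

def pearson_dispersion(observed, expected, n_params):
    """Pearson X^2 divided by the residual degrees of freedom."""
    x2 = sum((d - e) ** 2 / e for d, e in zip(observed, expected))
    return x2 / (len(observed) - n_params)

deaths = [105, 98, 120, 90, 130, 88, 115, 101]   # invented counts
fitted = [100.0] * len(deaths)                   # invented fitted values
print(pearson_dispersion(deaths, fitted, n_params=1))  # well above 1 here
```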
* Correspondence to: Longevitas Ltd, 24a Ainslie Place, Edinburgh, EH3 6AJ. E-mail: [email protected]
© Institute and Faculty of Actuaries 2019. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence.
Abstract
The Age-Period-Cohort-Improvement (APCI) model is a new addition to the canon of mortality forecasting models. It was introduced by Continuous Mortality Investigation as a means of parameterising a deterministic targeting model for forecasting, but this paper shows how it can be implemented as a fully stochastic model. We demonstrate a number of interesting features about the APCI model, including which parameters to smooth and how much better the model fits to the data compared to some other, related models. However, this better fit also sometimes results in higher value-at-risk (VaR)-style capital requirements for insurers, and we explore why this is by looking at the density of the VaR simulations.