Full Text

Turn on search term navigation

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/ doi:10.5194/gmd-9-3907-2016 Author(s) 2016. CC Attribution 3.0 License.

Eric Larour1, Jean Utke2, Anton Bovin3, Mathieu Morlighem4, and Gilberto Perez5

1Jet Propulsion Laboratory, California Institute of technology, 4800 Oak Grove Drive MS 300-323, Pasadena, CA 91109-8099, USA

2Allstate Insurance Company, 2775 Sanders Rd., Northbrook, IL 60062, USA

3Department of Physics, The University of Chicago, 5720 South Ellis Avenue, Chicago, IL 60637, USA

4University of California, Irvine, Department of Earth System Science, Croul Hall, Irvine, CA 92697-3100, USA

5University of California, Irvine, School of Information and Computer Sciences, Irvine, CA 92697-3100, USA

Correspondence to: Eric Larour ([email protected])

Received: 26 April 2016 Published in Geosci. Model Dev. Discuss.: 9 May 2016 Revised: 15 September 2016 Accepted: 6 October 2016 Published: 1 November 2016

Abstract. Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar, gravity, and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient iceow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model (ISSM), written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written, but also generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of the ISSM. We present a comprehensive approach to (1) carry out type changing through the ISSM, hence facilitating operator overloading, (2) bind to external solvers such as MUMPS and GSL-LU, and (3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the mist to observed records of surface altimetry on the northeastern Greenland Ice Stream, or the mist to observed records of surface velocities on Upernavik Glacier, central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential to enable a new generation of hindcast-validated projections that make full use of the wealth of datasets cur-

An approach to computing discrete adjoints for MPI-parallelized models applied to Ice Sheet System Model 4.11

rently being collected, or already collected, in Greenland and

Antarctica.

1 Introduction

Constant monitoring of polar ice sheets through remote sensing, in particular since the advent of altimeter, radar, and gravity sensors such as ICESat-1, CryoSat, RADARSAT-1, ERS-1 and ERS-2, Envisat, and GRACE, has created a large amount of data that has yet to nd its way through ice sheet models (ISMs) and hindcast reconstructions of polar ice sheet evolution. In particular, as evidenced by the wide discrepancy between ISMs involved in the SeaRISE and Ice2Sea projects (Nowicki et al., 2013; Bindschadler et al., 2013), signicant improvements in modeled projections of the mass balance of polar ice sheets and their contribution to future sea-level rise have not resulted from the increase in availability of data, but rather from improvements in the type of physics captured in forward models. One reason for this is the lack of data assimilation capabilities embedded in the current generation of state-of-the-art ISMs. In the past 10 years, great strides have been made in improving model initialization by using steady-state model inversions of basal friction (MacAyeal, 1993; Morlighem et al., 2010; Larour et al., 2012; Price et al., 2011; Arthern and Gudmundsson, 2010), ice rheology (Rommelaere and MacAyeal, 1997; Larour et al., 2005; Khazendar et al., 2007),

Published by Copernicus Publications on behalf of the European Geosciences Union.

3908 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

and bedrock elevation (Morlighem et al., 2014), among others. However, these approaches aim at improving our knowledge of poorly constrained input parameters and boundary conditions as long as the ice-ow regime is captured in a steady-state conguration. These inversions rely on analytically derived adjoint states of forward stress balance or mass-transport models, but do not extend to transient regimes of ice ow.

Applications to transient models and long temporal time series such as the ICESat/CryoSat continuous altimetry record from 2003 to the present day have been much more rare, and, to our knowledge, are limited to a few studies such as Heimbach and Bugnion (2009), Goldberg and Sergienko (2011), Goldberg and Heimbach (2013), Larour et al. (2014), and Goldberg et al. (2015), among others. The main issue here precluding widespread application of transient data assimilation lies in the difculty in deriving temporal adjoints of transient models. In many cases, the sheer number of physical processes being represented makes the manual derivation of the adjoint state of a forward model extremely cumbersome and difcult to maintain. This state of affairs can be mitigated by adopting different approaches, such as (1) ensemble runs, as in Applegate et al. (2012), where model runs compatible with observations are selected; (2) methods similar to the ux-correction methods implemented in Aschwanden et al. (2013); Price et al. (2011), where boundary conditions are corrected in order to match time series of observations (tuning approach); (3) quasi-static approaches, where snapshot inversions are carried out in time, as in Habermann et al. (2012, 2013); and (4) sampling methods, which have the main drawback of being computationally very expensive (each sample at the cost of one forward run). Though this is not an exhaustive list of all available methods, the main advantage of adjoint-driven inversions is that it relies on the exact sensitivity of a forward model to its inputs, hence ensuring a physically meaningful inversion.

Understanding sensitivities of a forward ice-ow model, which is needed to physically constrain a temporal inversion, requires computation of derivatives of model outputs to model inputs. If such derivatives are approximated by nite-difference schemes, they are subject to the tradeoff between approximation and truncation errors for the perturbation, which is aggravated for higher-order derivatives. If the derivatives are computed using algorithmic differentiation (AD; Griewank and Walther, 2008), also known as automatic differentiation, then one can attain derivatives at a oating-point precision equivalent to that of the underlying program implementation of the numerical model, provided it is amenable to the application of an AD tool. In particular, this approach does not depend on the type of physics relied upon, and it is transparent to the model equations, provided each step of the overall software is differentiable. Indeed, the

AD method assumes a numerical model M that is a function

f : Rn ! Rm

x [mapsto]! y = f (x)

(1)

implemented as a computer program. The execution of the program implies the execution of a sequence of arithmetic operators such as +, , and and intrinsics sin,ex, and so

forth to which the chain rule can be applied. For each such elemental operation r = (a,b,...) with result r and argu

ments a,b,...1 we can write the total derivative as

r =

@a a +

@b b + .... (2)

For example, if is the multiplication operator , i.e., r =

ab, then one will get the product rule r = b a + a b. Applying

the above rule to each elemental operation in the sequence gives a method to compute

y = J x with the Jacobian, J =

[bracketleftbigg]

@fi @xj

,i = 1...m, j = 1...n,

without explicitly forming the Jacobian. This method applies the chain rule in the computation order of the values in the program and is known as forward-mode AD. The opposite order of applying the chain rule to the elemental operations, known as reverse or adjoint-mode AD, yields projections x = J T y.

This is achieved using the partial derivatives of the forward operation as weights in redistributing the adjoints of the result r to the adjoints of the respective arguments a, b following the rule

a = a +

@a r; b = b +

@b r; ...r;= 0. (3)

Here v denotes the adjoint corresponding to variable v. The nal zeroing of the result adjoint r corresponds to the notion that r had been assigned a new value overriding any previous value r might have had. Correspondingly the adjoint of r has been completely distributed, and to allow the increment formulation for the adjoint of any prior value of r, it has to be (re)set to 0.

For the multiplication example one therefore gets a =

a + br; b = b + ar; r = 0. In particular, for applications in

which m n, the reverse mode is advantageous because its

computational cost depends on m reverse sweeps instead of n forward sweeps. Typical problems in cryosphere science involve computations of diagnostics that are scalar-valued cost functions (m = 1), such as for example the spatio-temporally

averaged mist between modeled surface elevation and ob-served surface topography (Larour et al., 2014). For these

1In practice most are uni- or bi-variate.

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM 3909

cases, one can compute the gradient rf = J T y with y = 1

as a single projection (where f is the function dened in Eq. 1). Thus, for high-resolution models implying very large n, the reverse mode is an enabling and potentially very efcient technique. This signicant capability in AD is what makes its application to data assimilation so efcient. Instead of evaluating a Jacobian using one forward sweep for each one of the model inputs, which would be signicantly consuming as it scales in n, the reverse mode evaluates the gradient of one specic output of interest with respect to the model inputs in only one reverse sweep.

Using this approach, it is possible, without manual derivation, to evaluate the derivative @f@x from Eq. (1). In cryosphere models, f could typically be a transient ice-ow model, with inputs x taken for example as the basal drag coefcient at the icebed interface, and the output y as the cost function measuring the spatio-temporally averaged difference between for example surface elevation and an altimetry record.In this case, m = 1, and n is the number of degrees of free

dom of the basal drag input. Obtaining the sensitivity function @f@x is critical both in terms of enabling data assimilation using inversion methods and in terms of understanding the sensitivity of a model to its model inputs, as well as the impact of processes represented in the model itself. Approaches other than AD can be relied upon to compute such a sensitivity, such as solving for the adjoint equation directly (Morlighem et al., 2010; Larour et al., 2012; Cornford et al., 2013), but usually, a mix of forward AD or reverse AD is implemented for formulations where the manual derivation is difcult (Pawlowski et al., 2012; Goldberg and Heimbach, 2013). In Pawlowski et al. (2012) for example, AD relies on template-based generic programming to compute sensitivities, while in Goldberg and Heimbach (2013), AD is based on source-to-source transformation. Our approach will instead be based on so-called operator overloading, which is uniquely suited to a framework written in C++.

Applying AD to large-scale frameworks such as the Ice Sheet System Model (ISSM) (Larour et al., 2012), MITgm (Heimbach, 2008), SICOPOLIS (Heimbach and Bugnion, 2009), or DassFlow (Monnier, 2010, among others) is a difcult proposition but one that enables signicant improvements in the way models can be initialized (Heimbach and Bugnion, 2009), hindcast validated (Larour et al., 2012), and calibrated (Goldberg et al., 2015) towards better projections.Traditional approaches relying on source-to-source transformation have been developed, but for frameworks such as the ISSM, which are C++ oriented, and highly parallelized, this type of approach breaks down. Our goal here is to demonstrate how the so-called operator-overloading AD approach can be implemented and validated for a framework such as the ISSM, and what developments were necessary to make this capability operational. Our approach is discussed in Sect. 2 of this paper, with Sect. 3 describing the method validation as well as applications with the ISSM. We discuss and conclude in the last section the applicability of such an ap-

proach to other frameworks, and the opportunities these new developments afford for cryosphere science and data assimilation of remote sensing data in particular.

2 Methodology and validation

2.1 Source-to-source vs. overloaded operators

Distinctions between the AD implementation approaches relate to the way the derivative data in v or v, respectively, are

associated with the original program variable v (data augmentation) and how the logic for coefcient propagation is added to the original program (logic augmentation). The basic options for the former are association by address and association by name. Association by address packs the original and the derivative data into a new type called the active type, and all differentiated program variables have their type changed to that new active type. For the logic augmentation one uses source code transformation or operator overloading. Because of the complexity of the C++ syntax and semantics, there currently still is no comprehensive AD tool for source code transformation of C++ models, and thus operator overloading remains the method of choice, implying association by address for the data augmentation. In practice, the overloaded operators and intrinsics for the forward mode typically execute Eq. (2) directly for the forward mode and for the reverse mode record the sequence of operations into a trace T (f (x)), which then is read backward by an interpreter R(T ) that executes the statements Eq. (3) for the reverse mode. The invocation of R(T ) is part of the user-written logic that then uses the derivatives. The model thus enhanced by an adjoint capability computing both f and rf

is denoted by M.

2.2 Type change to enable overloaded operators

Changing to the aforementioned active type is a signicant effort to be undertaken in the model code. Among the choices to effect this type change, one should select one that is transparent to the model development process, is maintainable, and minimizes the manual effort.

The tool of choice for this paper is ADOL-C (Griewank et al., 1996; ADOL-C, 2016) and our target model M is the ISSM (Larour et al., 2012). The ADOL-C manual is quiet on the practical approach to effect the type change. For models with a large C++ code base and many contributing developers such as the ISSM, not all of them aware of M but merely interested in running M, it is important to make this process transparent and robust. The representation of M as a computer program typically means that in the set of program variables V = A[P there is a subset P of passive vari

ables that do not carry derivative information, such as variables of non-differentiable type (integers, strings) but also oating-point parameters, physical constants, or descriptors of the problem domain. The size of the trace T is a major fac-

www.geosci-model-dev.net/9/3907/2016/ Geosci. Model Dev., 9, 39073918, 2016

3910 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

tor in the computational efciency of the adjoint. Thus, one must categorize program variables into A and P with the aim

of minimizing the set A of active (i.e., type-changed) pro

gram variables. In the ISSM this is accomplished by changing double variables to a type named for brevity, e.g., TA (in the ISSM IssmDouble) for variables in A, and to, e.g.,

TP (in the ISSM IssmPDouble) for variables in P, respec

tively. This categorization must be performed by an expert familiar with the global data dependencies in M and the work that has been done for M. In addition, it can be incrementally done for code contributions supplied as double variables without breaking M. This is a key aspect of the overloaded AD approach that is amenable to easy maintenance and expansion of a complex C++ framework such as the ISSM, and one of the key reasons why this approach was selected to compute M in the ISSM. In the end, all double types should eventually be replaced with either TA or TP. Following good practices in C++ both TA and TP are typedefed in a central location switched only there via a preprocessing macro to use the active type supplied by the AD tool or hide the distinction altogether for running the plain M. This implies that one cannot explicitly overload methods or dene data structures for distinct TA and TP without ltering the declaration and denition of the TA variant by the preprocessor as well. Such code duplication introduces an undue code maintenance burden.

Instead, the recommended approach is the use of C++ template classes and methods where TA and TP are the concrete arguments for an abstract template type T. While the use of templates may imply a larger up-front effort if they were not present in the model code before, as was the case in the ISSM, the long-term benets not only for M but also the plain M are obvious when one wants to experiment with computation at different precision levels. Aside from the ISSM, an established general example of this kind of templating is the use of SACADO within Trilinos (The Trilinos Project, 2014; Phipps et al., 2008). Of particular value in the ISSM was the migration of data containers for arrays and (sparse) matrices and that of existing wrappers to malloc and free to templated wrappers of new and delete, called xNew and xDelete in the ISSM. The latter is necessary not only to properly instantiate TA variables, but also permits specializations for TA in M as explained in the following.

Overall, the use of C++ template classes introduces an additional element of complexity to the code, but it also means that the same exact framework can be used to compute both the forward run and its gradient, using the exact same code base, without the need to manage different code for the computation of M. This in turn means that the ISSM can be compiled using ADOLC and shipped to the community in only one binary version that can accommodate both M and M, which is a critical feature in terms of extending AD features to the wider community without having to dive into the intrinsic complexities of AD transformation itself.

Interpreting the trace by R(T ) means that the space holding the data for the variables r,a,b 2 V (the set of instan

tiated program variables at runtime) occurring in each operation r = (a,b) must be represented by some mapping

! : V [mapsto]!N+ to pseudo addresses. In ADOL-C these ad

dresses are called locations and represent indices in a work array held by R(T ). The pseudo addresses must be managed through the TA constructor and destructor in a fashion similar to the memory management of the actual program variables themselves; i.e., pseudo addresses are assigned from and returned to a pool [Omega1] of available addresses. However, no distinction between heap and stack variables is made and generally data locality will not be preserved. On the other hand, for special operations A with array arguments of size s, that is, calls to external solvers (see Sect. 2.3) or MPI routines (see Sect. 2.4), it would be counterproductive to record in T the pseudo addresses of each of the s array elements rather than a consecutive range. The latter, however, requires that the pool be primed by some call [Omega1](s) such that ! returns consecutive pseudo addresses when called by the constructor for each element in the array. In ADOL-C this is done by calling ensureContiguousLocations immediately before an active array is instantiated. Avoiding copying of array values in M as an efciency measure means that any given array has a good chance of being used in A, and to avoid littering the code with preprocessor-guarded calls to [Omega1](s), we decided in the ISSM to instead add a call to ensureContiguousLocations in the TA specialization of xNew:

#if defined(_HAVE_ADOL_C_) template <>adouble* xNew(unsigned int s) { ensureContiguousLocations(s);

adouble* aT_p=new adouble[s]; assert(aT_p);

return aT_p; }#endif

Thus, the use of A means frequent calls to [Omega1](s). The implementation of [Omega1](s) searches the pool to nd a sufciently large consecutive range or otherwise grow the pool by at least s addresses. In the ISSM, this search became a significant performance bottleneck. Minimizing the search effort and balancing it with pool growth has been newly implemented in ADOL-C and is controlled by setting parameters via setStoreManagerControl for the ratio of the addresses inside and outside of the pool and the maximal pool size to trigger searches.

2.3 Binding to external solvers

The strength of the AD approach lies in the automated differentiation of all the parts of the model that are changing and expanding as the model is being developed, as opposed

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM 3911

to manually programming adjoints that are error-prone, and implies a signicant effort in terms of code development and maintenance. However, because models reference libraries, for xed mathematical mappings that may have easily programmable high-level adjoints, one can and should exploit this insight as it can yield efciency gains not obtainable by an AD tool remaining unaware that calls to some collection of subroutines represent a certain mathematical mapping. A good example are linear solver libraries that may not even be written in the same programming language as the model in question. Considering the system As = b, the

solver S computing s = S(A,b) needs an adjoint counterpart

solving AT t = s, incrementing b = b + t and, if A 2 A, also

A = A bsT . Clearly, the solve with the transposed can eas

ily be implemented reusing the solver S as in t = S(AT ,s).

However, the answers to the following questions affect the efciency and ought to be considered in the context of the given model (Giles, 2008).

Q1: Is A 2 A, which therefore requires the rank-1 update to

Q2: Are the previous values of s, r, and A overwritten by S but used to compute partial derivatives for Eq. (3), and therefore must be saved and restored?

Q3: If S is a direct solver, should one save the factors or refactorize for the adjoint solve?

Q4: How does the external function interface used by R(T ) allow for efcient reuse of intermediate buffers?

External functions fe that had been supported by ADOLC had the signature fe(lx,x,ly,y) with inputs x, outputs y for the original call, lx,ly for their respective array lengths, and fe(ly,y,lx,x) for the adjoint counterpart. In the case of a linear system with A 2 A, the input x is packed with both

A and r, while y contains the solution s of the system on return. This, however, was insufcient for binding to solvers from the GNU Scientic Library (GSL; Galassi, 2009) and MUMPS (Amestoy et al., 2001) used by the ISSM. The following extensions were implemented in ADOL-C to enable the ISSM adjoints, but are generic in nature and would need to be supported in some form by other tools.

E1: expanded the fe interface by x,y to fe(ly,y,lx,x,x,y) to enable refactoring;

E2: added optional parameters to pass in the sparsity pattern of A for MUMPS as a generic integer array i of length li for both fe and fe;

E3: added controls for storing and restoring prior values of x and y;

E4: added tracking of the maximal lx,ly in sequences of fe calls.

Regarding Q1, we know that in the ISSM, A 2 A, and re

garding Q2, the parameters passed to fe have no other uses and, therefore, using the controls (E3), we avoid (re)storing their values. The direct solver from the GSL used here had no API control to back-solve with the factors for the transposed and we did not want to reverse engineer the permutation representation. Hence the refactoring was done as a matter of convenience for the sequential reference case requiring E1.The parallel and therefore practically more efcient MUMPS solver operates on sparse, distributed A, therefore requiring E2. MUMPS offers both the ability to store the factors to le and perform the back-solve for the transpose. However, the MUMPS portion of the runtime is comparatively small (see Sect. 3). Consequently the overhead for the le I/O when considering the factor data size after ll-in is not expected to yield much practical benet in this context, answering Q3.Finally, for Q4, extension E4 is exploited because transient runs of the ISSM need to account for changes in the system size, and a preallocation with the maximal buffer sizes therefore avoids some of the memory management overhead.

2.4 Handling parallelism with the AdjoinableMPI library

As is the case with the ISSM, practically relevant science problems incur a computational complexity that necessitates execution on parallel hardware, often using MPI as the paradigm of choice. Sending data with MPI from a source buffer s to a target buffer t can be interpreted as a simple assignment t = s. This implies for the adjoint an increment

s = s + t; that is, the adjoint of a communication is a rever

sal of the data ow between buffers. Calculating the adjoint of t = s means sending back the value of t and computing

s = s + t. The principles for adjoining two-sided MPI com

munication have been explored in Utke et al. (2009). The development of the AdjoinableMPI (AMPI) wrapper library AdjoinableMPI (2016) started in the fall of 2012. It is designed to provide an AD tool-independent implementation for adjoining MPI-parallelized C, C++, and Fortran models.The wrapper interfaces distinguish themselves from the original MPI by the prex AMPI and have a few additional parameters where needed to enable the adjoint functionality.AMPI also provides additional types and predened symbols. Discussing the internal design of AMPI is outside the scope of this paper. However, the application to the ISSM is the rst large-scale practical use of AMPI and in the following we will discuss the steps taken to use it in the ISSM code base.

For testing and small-scale experiments, the ISSM code, as many other models, treats MPI parallelization as a compile-time option controlled by preprocessor macros. Furthermore, turning the adjoint capability on and off as suggested in Sect. 2.2 would imply additional switching between the original MPI and AMPI with code duplication and the potential for errors. To avoid these undesirable consequences we de-

www.geosci-model-dev.net/9/3907/2016/ Geosci. Model Dev., 9, 39073918, 2016

3912 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

Table 1. Logic variants encapsulated in the ISSM MPI wrapper library.

MPI AD

1 no no emulate MPI semantic, e.g., with memcpy between buffers for MPI_Reduce

2 no yes emulate MPI semantic, but for differentiation need to use the overloaded assignments instead of memcpy

3 yes no pass-through to plain MPI

4 yes yes call AMPI routines

cided to introduce in the ISSM another wrapper layer (prexed ISSM_MPI) of calls and denitions to encapsulate completely the four functionality variants listed in Table 1.

The approach is shown for MPI_Reduce in Fig. 1.

Variant 1 is implemented by line 21, variant 2 by lines 13 18 and 21, variant 3 by line 9, and variant 4 by line 7. The example code reects some of the simplifying assumptions being made based on the MPI usage patterns in the ISSM.Examples are the absence of MPI_IN_PLACE and user-dened MPI types as well as all active data being of double precision. This permits the simple distinction in line 12, but of course the logic covers all other non-differentiated MPI types that may occur. The following lines (14, 15) indicate the pairing between the C++ type denition IssmDouble and the wrapper-dened ISSM_MPI_DOUBLE to signify active data being communicated when AD is enabled. Mirroring in MPI the approach described in Sect. 2.2 with two types TA and TP, the passive type TP (concrete IssmPDouble) has an MPI counterpart dened in the wrapper as ISSM_MPI_PDOUBLE. It too can be the value passed in via the datatype argument at line 1 and signals to AMPI passive oating-point communications. The denitions provided by the ISSM_MPI wrapper corresponding to types TA and TP are given in Table 2.

The matching between the actual type of the buffer and the corresponding MPI type must extend to the templating approach suggested for the type change in Sect. 2.2. The main reason is to avoid type errors, which become more common as the complexity of a code increases. Because the MPI standard keeps the denition of MPI_Datatype opaque and MPI types can be created at runtime, the value of MPI_Datatype cannot be used as the value template parameter itself (as it may not be a compile time constant).Therefore, a template buffer type T must be paired with the corresponding MPI type value declared as a pointer parameter, as shown in Fig. 2. Assuming many AMPI calls in foo, this connes the code locations where type errors may be introduced to the template instantiations themselves.

Finally, a practical concern for using the parallelized ad-joint is the handling of sensitivities to quantities that are uniformly initialized across ranks, such as parameters. Frequently, as was the case within the ISSM, these quantities are initialized from les or otherwise per process in the parallel case the same way as in the sequential case. In the parallel case that implies a replication of the same quantity across

ranks. However, to obtain the correct sensitivities, the quantity q in question should be unique; in other words, that quantity must be uniquely initialized at one root rank and then broadcast to the other ranks. Otherwise, for r ranks, at each rank one would then obtain only a part qi of the total q and would have to manually sum up q =

Pqi. With an initial broadcast of q, however, the corresponding adjoint provided through AMPI by using AMPI_Bcast is that exact sum reduction, and R(T ) yields the correct adjoint at the broadcast root. This notion similarly applies to any situation where a conceptually unique quantity of active type is implicitly replicated on some ranks.

2.5 Validation

The ISSM is validated in AD mode by continuously running a test suite within the Jenkins (Smart, 2011) integration and development framework (available here at http://issm.jpl.nasa.gov/developers/

Web End =http://issm.jpl. http://issm.jpl.nasa.gov/developers/

Web End =nasa.gov/developers/ ). A detailed description of the suite of benchmarks is given in Table 3. The aim is to (1) compare forward runs in the ISSM with their counterparts when overloaded operators are switched on (the results should be identical within double-precision tolerances); and to (2) compare forward and reverse runs carried out with the ISSM AD on and off, using the GSL and MUMPS solvers. Comparisons of gradients computed in AD mode with standard forward difference methods are also carried out to make sure the gradients computed (essentially in reverse scalar mode, which is the mode of predilection in the ISSM for data assimilation) in AD mode are accurate.

3 Application

3.1 Application

The ongoing use of adjoint computations includes sensitivity studies and state estimation problems for transient model runs. Because this paper concentrates on technical aspects, we show, only as exemplary evidence of the practical usefulness, some sensitivities of cost functions calibrated for two sensors commonly encountered in cryosphere science, altimeters (that measure surface elevation) and radars (that measure surface displacement or velocity). In Fig. 3, the sensitivity (gradient) of a cost function related to ICESat-1 altimetry with respect to surface mass balance (SMB) is shown

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM 3913

1 i n t ISSM_MPI_Reduce ( void * sendbuf , void * recvbuf , i n t count ,

2 ISSM_MPI_Datatype d a t a t y p e , ISSM_MPI_Op op , i n t root ,

3 ISSM_MPI_Comm comm){

4 i n t r c =0;

5 # i f d e f _HAVE_MPI_6 # i f d e f _HAVE_AMPI_7 r c =AMPI_Reduce ( sendbuf , recvbuf , count , d a t a t y p e , op , root , comm ) ; 8 # e l s e9 r c =MPI_Reduce ( sendbuf , recvbuf , count , d a t a t y p e , op , root , comm ) ; 10 # e n d i f11 # e l s e12 # i f d e f _HAVE_ADOLC_13 i f ( d a t a t y p e ==ISSM_MPI_DOUBLE) {14 IssmDouble * a c t i v e S e n d B u f =( IssmDouble *) sendbuf ;

15 IssmDouble * activeRecvBuf =( IssmDouble *) r e c v b u f ;16 for ( i n t i =0; i < count ;++ i ) activeRecvBuf [ i ]= a c t i v e S e n d B u f [ i ] ;

18 }19 e l s e20 # e n d i f21 memcpy ( recvbuf , sendbuf , s i z e H e l p e r ( d a t a t y p e )* count ) ;

22 # e n d i f23 return r c ;

24 }

Figure 1. Code for wrapping reduction.

/ / t h i s d e c l a r a t i o n comes from t h e wrapper :extern ISSM_MPI_Datatype ourISSM_MPI_DOUBLEVal , ourISSM_MPI_DOUBLEVal ;

template < c l a s s T , ISSM_MPI_Datatype *mt_p> void foo (T * t ){

ISSM_MPI_Reduce ( t , . . . , ,* mt_p , . . ) ; }

void bar ( IssmDouble *x , IssmPDouble *p ) { foo <IssmDouble ,&ourISSM_MPI_DOUBLEVal >( x ) ;

foo <IssmPDouble ,&ourISSM_MPI_PDOUBLEVal >( p ) ; }

Figure 2. Code snippets for templating with corresponding MPI data types.

for three epochs, September 2003, June 2006, and February 2009. The cost function JAltimetry is the spatio-temporal average of the mist between modeled surface elevations (from the mass-transport module of the transient solution in the ISSM) and the corresponding ICESat-1 altimetry record. This cost function was used within an inversion method to reconstruct SMB over the entire ICESat-1 time record (Larour et al., 2014). This example would correspond to the case where function f from Eq. (1) is a transient ice-ow model

(including mass transport and stress balance, not thermal, described in Larour et al., 2014), fully nonlinear in its treatment of the material rheology, and highly resolved both spatially (kilometer resolution at the coastline) and temporally (2-week time step). The nal output (corresponding to y in Eq. 1) of the model is the altimetry cost function JAltimetry itself and the model input SMB (corresponding to x in Eq. 1). The sensitivity displayed in Fig.3 is therefore the AD com-

www.geosci-model-dev.net/9/3907/2016/ Geosci. Model Dev., 9, 39073918, 2016

3914 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

Table 2. Types per variant from Table 1.

ISSM_MPI_DOUBLE IssmDouble (= TA) ISSM_MPI_PDOUBLE IssmPDouble (=TP)

#define to typedef to #define to typedef to

1 2 double 3 double

2 2 adouble1 3 double

3 MPI_DOUBLE double MPI_DOUBLE double

4 AMPI_ADOUBLE2 adouble1 MPI_DOUBLE double

1 Active type from ADOL-C. 2 Active MPI type from AMPI.

Table 3. ISSM AD validation suite integrated within Jenkins for continuous integration and delivery (Smart, 2011). Tests 3001 to 3010 and 3101 to 3110 test the repeatability of forward runs with and without ADOL-C compiled, but with no AD drivers specically called. The forward runs involved are the standard stress balance, mass transport, and thermal solutions, with 2-D SSA (shelfy-stream approximation MacAyeal, 1989), 3-D SSA, 3-D HO (higher-order, Blatter, 1995; Pattyn, 1996), 3-D full-Stokes (Stokes, 1845), and DG (discontinuous-Galerkin) formulations. Test 3015 tests the AD GSL capability by comparing the AD forward scalar mode (where we compute the gradient of ice volume with respect to ice thickness at one vertex of the mesh) against a simple forward differences computation on the same gradient. Test 3019 validates the AD GSL capability in reverse scalar mode (where we compute ice volume gradient with respect to thickness at all vertices of the mesh) vs. the forward vectorial mode. Both gradients should be identical. Finally, test3119 validates the parallel capabilities (AD MUMPS) by comparing reverse scalar computations with GSL and MUMPS.

Test Solution sequence Formulation Solver Description

3001/3101 Stress balance 2-D SSA GSL/MUMPS Equal runs with overload on and off 3002/3102 Stress balance 3-D SSA GSL/MUMPS Equal runs with overload on and off 3003/3103 Stress balance 3-D HO GSL/MUMPS Equal runs with overload on and off 3004/3104 Stress balance 3-D FS GSL/MUMPS Equal runs with overload on and off 3005/3105 Mass transport 2-D GSL/MUMPS Equal runs with overload on and off 3006/3106 Mass transport 2-D (DG) GSL/MUMPS Equal runs with overload on and off 3007/3107 Mass transport 3-D GSL/MUMPS Equal runs with overload on and off 3008/3108 Thermal steady state 3-D GSL/MUMPS Equal runs with overload on and off 3009/3109 Thermal transient 3-D GSL/MUMPS Equal runs with overload on and off 3010/3110 Transient 2-D 2-D GSL/MUMPS Equal runs with overload on and off3015 Mass transport 2-D 2-D GSL AD forward scalar vs. forward differences 3019 Thermal transient 3-D GSL AD reverse scalar vs. AD forward vectorial 3119 Thermal transient 3-D MUMPS vs. GSL AD reverse scalar (MUMPS vs. GSL) scalar

puted @JAltimetry@SMB at different time steps corresponding to the

transient model input SMB.

A second type of observation garnering considerable interest is temporally resolved time series of radar observations (using speckle-tracking or InSAR to infer surface deformation of the ice) to measure variations of surface velocity on a seasonal timescale (one observation every 2 months, as in Moon et al., 2015). Temporally inverting for such time series, trying to reconstruct for example basal friction, is a topic of interest due to its relevance in terms of understanding the dynamics of calving, basal slip, and shear softening, and associated feedback mechanisms. In Fig. 4, we demonstrate the feasibility of such inversions by computing the gradient of a cost function JVelocity related to the Moon et al. (2015) time series with respect to basal drag at epochs 2008, 2010, and 2013. The cost function is the spatio-temporal average of the mist between modeled surface velocity (from the stress balance module of the transient solution in the ISSM) and observed surface velocities. This example would correspond

to the case where function f from Eq. (1) is the same transient ice-ow model described previously, and the nal output (corresponding to y in Eq. 1) of the model is the velocity cost function JVelocity itself and the model input basal friction (corresponding to x in Eq. 1). The sensitivity displayed in

Fig. 4 is therefore the AD computed @JVelocity@ at different time

steps corresponding to the transient model input .

3.2 Benchmarking

Currently, these adjoint computations are performed on the Pleiades cluster at the NASA Advanced Supercomputing Center. Given the fact that the effort for computing the ad-joint rf relies on one reverse sweep, the practical ques

tion becomes what the actual runtime overhead of computing both f and rf when compared to computing just the

original f is. Here in particular one considers the effects of compiler optimization on the original code for f using the built-in oating-point operations on one side and in contrast

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM 3915

Figure 3. Gradient of the surface altimetry cost function (spatio-temporal average of the mist between the 6-year record of observed ICESat altimetry and modeled surface elevation) with respect to SMB (in years). The gradient is computed for every time step of a 2-week interval time series between 2003 and 2009. We only show three periods (September 2003, June 2006, and February 2009) in the entire interval. The location for the study is the northeastern Greenland Ice Stream, and the gradient is taken from an inversion study of the surface forcings necessary to best t the ICESat altimetry record (Larour et al., 2014).

2008 2010 2013

Figure 4. Gradient of the surface velocity cost function (spatio-temporal average of the mist between a 5-year time series of observed surface velocities and the modeled surface velocity) with respect to the basal drag coefcient (in yr1/2 m5/2). The gradient is computed for every time step of a 2-week interval time series between 2008 and 2013. We only show three periods (2008, 2010, and 2013) in the entire interval. The location for the study is Upernavik Glacier, central West Greenland, and the gradient is taken from an inversion study of the basal forcings required to match observed RADAR time series of surface velocities and calving front positions.

to that the overhead incurred by calling the overloaded operators as well as the creation, storage, and interpretation of the trace T (as the means to compute rf ) on the other. Aside

from the fact that the tuning of the ISSM adjoint is a work in progress, we want to highlight the overwhelming impact of the application-specic aspects on the runtime ratio. Therefore, we emphasize that exhibiting any particular overhead number as the ultimate result of scenario-specic tuning is of little use to the practitioner wanting to answer science questions. Rather, using examples from the ISSM work, we want

to show what may prevent us from achieving a satisfactory overhead.

The earliest ISSM adjoint computations took place before the MPI wrapper library was introduced and therefore were done sequentially with the GSL solvers. To evaluate the performance, a representative test case was chosen from the ISSM regression test suite, test 101. This test models the steady-state stress balance of a square ice shelf of size 50 km [notdef] 50 km and thickness 1000 m (at the grounding

line) to 300 m (at the ice front). The ice shelf is clamped

www.geosci-model-dev.net/9/3907/2016/ Geosci. Model Dev., 9, 39073918, 2016

3916 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

of mesh elements generated for a smaller-resolution parameter. The parallel runs were carried out using eight CPUs.The overhead factors are shown separately for the overloading as such and the generation of T and R(T ) as wall-clock comparisons to the unmodied model compiled with default optimization (O2) in Fig. 5 (upper frame). While this plot

indicates a small overhead factor of 4.5 in particular for

the largest mesh case (distance 12.5 km), the reason for this becomes apparent in the plot in the lower frame. It shows that the majority of the runtime is consumed by the GSL solver (libgsl), completely overshadowing any of the overhead caused by the adjoint for the largest mesh. We want to emphasize that GSL was chosen not for its efciency, but for the simplicity of the setup, which quickly enabled ad-joint computations. Introducing AMPI and thereby moving to a more appropriate solver (MUMPS) causes the adjoint overhead to become more prominent. The most consequential change necessitated by the use of MPI is the forced contiguity of the pseudo addresses (see Sect. 2.2).

In combination with the much reduced impact of the solver, the overhead factor for equivalent test problems reached temporarily up to 145, which clearly was not acceptable. Subsequent changes to the ADOL-C tool included improvements in the internal address management. Changes to the model included modications of the sparse data format, enabling the control of the I/O buffering for the trace T and the ADOL-C address manager via the ISSM conguration specic to the setup to be computed. The combination of these changes led to regaining of better performance and effective overall overheads between 10 and 30. Analyzing the details of the performance shows that the overhead factor for the trace creation and interpretation does not change signicantly (see Fig. 6, bottom) with the mesh size, in accordance with the theoretical result for the adjoint computation.The large majority of the total overhead, evidenced by the runtime portion for libadolc in Fig. 6 (top), still originates in the internal address management. While the overall overhead factors are sufciently small for the practical use of the adjoints for science problems, further improvements in the addressing scheme are clearly warranted and the subject of ongoing work.

4 Conclusions

We developed a new adjoint capability in the ISSM, based on the ADOL-C framework and the AdjoinableMPI library, which is to our knowledge the rst time this type of approach has been systematically applied to a software framework of this size and complexity. Despite the difculties encountered in rewriting the software, the overloaded approach is transparent to the user, which is critical given the size of the larger cryosphere science community that is not familiar with the adjoint work, and for which classic approaches such as source-to-source transformation may not be applica-

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

100 50 25 12.5

0.8

0.6

0.4

0.2

100 50 25 12.5

Figure 5. Test problem setup with the sequential GSL solver; plots over max mesh distance show overhead factors (upper frame) and approximate runtime portions of the model execution with adjoint computation (lower frame).

0.8

0.6

0.4

0.2

100 50 25 12.5

TTrace overhead

1.5

0.5

100 50 25 12.5

Figure 6. Test problem setup with the parallel MUMPS solver; plots over max mesh distance show approximate runtime portions of the model execution with adjoint computation (upper frame) and overhead factors for the trace creation and interpretation (lower frame).

at the grounding line, free to ow at the ice front (on one side only), with nonlinear viscosity dependent on Glens law (Glen, 1955, 1958). The formulation relies on the shelfy-stream approximation (SSA, MacAyeal, 1989), and is inherently nonlinear, relying on a Picard iteration for convergence. For more details on this test, we refer the reader to Larour et al. (2012). For benchmarking, the problem size was indirectly set by specifying a measure for the maximal resolution of the generated mesh, thus increasing the number

E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM 3917

ble or may prove more cumbersome. The exibility of this approach allows in particular for quick turn-around in developing adjoint models of new parameterizations that are not easily hand-derived. This is a major advantage in that it opens this approach to the wider community. This, given the large amount of remote sensing data currently being collected and under-utilized, could prove paramount if we are to hindcast validate projections of sea-level rise. Further work is of course required to bring in additional observations such as gravity sensors, or radar stratigraphy observations, which will involve development of new cost functions, and scalability in 3-D. Though this is complex in that it requires integrated resiliency and adjoint checkpointing schemes for long-running transient modeling scenarios, our approach has proven exible, and should lead to a brand new set of data assimilation capabilities that have already been available to other Earth science communities for a long time. Indeed, by allowing temporal data assimilation for a large number of sensors and models, such as demonstrated here with the use of altimetry and radar sensors for mass transport and stress balance models, respectively, the ISSM paves the way for wider integration between the modeling and observational cryosphere communities.

5 Code availability

The ISSM code and its AD components are available at http://issm.jpl.nasa.gov.

Web End =http: http://issm.jpl.nasa.gov.

Web End =//issm.jpl.nasa.gov. The instructions for the compilation of the ISSM in AD mode, along with test cases, are presented in the Supplement attached to this paper.

The Supplement related to this article is available online at http://dx.doi.org/10.5194/gmd-9-3907-2016-supplement

Web End =doi:10.5194/gmd-9-3907-2016-supplement .

References

AdjoinableMPI: AdjoinableMPI wiki, https://trac.mcs.anl.gov/projects/AdjoinableMPI/wiki

Web End =https://trac.mcs.anl.gov/ https://trac.mcs.anl.gov/projects/AdjoinableMPI/wiki

Web End =projects/AdjoinableMPI/wiki , last access: 19 October 2016.

ADOL-C: ADOL-C, http://www.coin-or.org/projects/ADOL-C.xml

Web End =http://www.coin-or.org/projects/ADOL-C. http://www.coin-or.org/projects/ADOL-C.xml

Web End =xml (last access: 19 October 2016), 2007.

Amestoy, P. R., Duff, I. S., Koster, J., and LExcellent, J.-Y.: A Fully

Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling, SIAM J. Matrix Anal. Appl., 23, 1541, 2001.Applegate, P. J., Kirchner, N., Stone, E. J., Keller, K., and Greve, R.:

An assessment of key model parametric uncertainties in projections of Greenland Ice Sheet behavior, The Cryosphere, 6, 589 606, doi:http://dx.doi.org/10.5194/tc-6-589-2012

Web End =10.5194/tc-6-589-2012 http://dx.doi.org/10.5194/tc-6-589-2012

Web End = , 2012.

Arthern, R. J. and Gudmundsson, G. H.: Initialization of ice-sheet forecasts viewed as an inverse Robin problem, J. Glaciol., 56, 527533, 2010.

Aschwanden, A., Aalgeirsdttir, G., and Khroulev, C.: Hindcasting to measure ice sheet model sensitivity to initial states, The Cryosphere, 7, 10831093, doi:http://dx.doi.org/10.5194/tc-7-1083-2013

Web End =10.5194/tc-7-1083-2013 http://dx.doi.org/10.5194/tc-7-1083-2013

Web End = , 2013.Bindschadler, R., Nowicki, S., Abe-Ouchi, A., Aschwanden, A.,

Choi, H., Fastook, J., Granzow, G., Greve, R., Gutowski, G., Herzfeld, U., Jackson, C., Johnson, J., Khroulev, C., Levermann,A., Lipscomb, W., Martin, M., Morlighem, M., Parizek, B., Pollard, D., Price, S., Ren, D., Saito, F.and Sato, T., Seddik, H., Seroussi, H., Takahashi, K., Walker, R., and Wang, W.: Ice-Sheet Model Sensitivities to Environmental Forcing and Their Use in Projecting Future Sea-Level (The SeaRISE Project), J. Glaciol., 59, 195224, doi:http://dx.doi.org/10.3189/2013JoG12J125

Web End =10.3189/2013JoG12J125 http://dx.doi.org/10.3189/2013JoG12J125

Web End = , 2013.

Blatter, H.: Velocity And Stress-Fields In Grounded Glaciers: A Simple Algorithm For Including Deviatoric Stress Gradients, J.

Glaciol., 41, 333344, 1995.

Cornford, S., Martin, D., Graves, D., Ranken, D. F., Le Brocq,A. M., Gladstone, R., Payne, A., Ng, E., and Lipscomb, W.: Adaptive mesh, nite volume modeling of marine ice sheets,J. Comput. Phys., 232, 529549, doi:http://dx.doi.org/10.1016/j.jcp.2012.08.037

Web End =10.1016/j.jcp.2012.08.037 http://dx.doi.org/10.1016/j.jcp.2012.08.037

Web End = , 2013.

Galassi, M. E. A.: GNU Scientic Library Reference Manual, Network Theory Ltd, 3rd Edn., http://www.gnu.org/software/gsl/manual/gsl-ref.ps.gz

Web End =http://www.gnu.org/software/gsl/ http://www.gnu.org/software/gsl/manual/gsl-ref.ps.gz

Web End =manual/gsl-ref.ps.gz (last access: 19 October 2016), 2009.Giles, M.: Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation, in: Advances in Automatic Differentiation, Springer Berling Heidelberg, Berling, Heidelberg, 3544, 2008.

Glen, J.: The creep of polycrystalline ice, Proc. R. Soc. A, 228,519538, 1955.

Glen, J.: The ow law of ice: A discussion of the assumptions made in glacier theory, their experimental foundations and consequences, IASH Publ., 47, 171183, 1958.

Goldberg, D. N. and Sergienko, O. V.: Data assimilation using a hybrid ice ow model, The Cryosphere, 5, 315327, doi:http://dx.doi.org/10.5194/tc-5-315-2011

Web End =10.5194/tc-5-315-2011 http://dx.doi.org/10.5194/tc-5-315-2011

Web End = , 2011.

Goldberg, D. N. and Heimbach, P.: Parameter and state estimation with a time-dependent adjoint marine ice sheet model, The Cryosphere, 7, 16591678, doi:http://dx.doi.org/10.5194/tc-7-1659-2013

Web End =10.5194/tc-7-1659-2013 http://dx.doi.org/10.5194/tc-7-1659-2013

Web End = , 2013.Goldberg, D. N., Heimbach, P., Joughin, I., and Smith, B.: Committed retreat of Smith, Pope, and Kohler Glaciers over the next 30 years inferred by transient model calibration, The Cryosphere, 9, 24292446, doi:http://dx.doi.org/10.5194/tc-9-2429-2015

Web End =10.5194/tc-9-2429-2015 http://dx.doi.org/10.5194/tc-9-2429-2015

Web End = , 2015.

Griewank, A. and Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Society for In-

www.geosci-model-dev.net/9/3907/2016/ Geosci. Model Dev., 9, 39073918, 2016

Acknowledgements. DE-AC02-06CH11357. Eric Larour and Jean Utke were supported by the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the NASA Cryospheric Sciences and Modeling and Analysis Programs. Gilberto Perez was supported by a subcontract from the Jet Propulsion Laboratory to the University of California at Irvine and Mathieu Morlighem was supported under a contract with the NASA IceBridge research program.

Edited by: D. GoldbergReviewed by: L. Hascoet and one anonymous referee

3918 E. Larour et al.: Discrete adjoints for MPI models applied to the ISSM

dustrial and Applied Mathematics, Philadelphia, PA, USA, second Edn., 2008.

Griewank, A., Juedes, D., Mitev, H., Utke, J., Vogel, O., and Walther, A.: ADOL-C: A Package for the Automatic Differentiation of Algorithms Written in C/C++; this is the updated version of the paper published in ACM TOMS, vol. 22, June 1996, Algor, 755, 131167, 1996.

Habermann, M., Maxwell, D., and Truffer, M.: Reconstruction of basal properties in ice sheets using iterative inverse methods, J.Glaciol., 58, 795807, doi:http://dx.doi.org/10.3189/2012JoG11J168

Web End =10.3189/2012JoG11J168 http://dx.doi.org/10.3189/2012JoG11J168

Web End = , 2012.Habermann, M., Truffer, M., and Maxwell, D.: Changing basal conditions during the speed-up of Jakobshavn Isbr, Greenland, The Cryosphere, 7, 16791692, doi:http://dx.doi.org/10.5194/tc-7-1679-2013

Web End =10.5194/tc-7-1679-2013 http://dx.doi.org/10.5194/tc-7-1679-2013

Web End = , 2013.Heimbach, P.: The MITgcm/ECCO adjoint modelling infrastructure, CLIVAR Exchanges, 44 (Volume 13, No. 1), 1317, http://scholar.google.com/scholar?q=related:M8_kkL0Y1rUJ:scholar.google.com/&hl=en&num=20&as_sdt=0,5

Web End =http://scholar.google.com/scholar?q=related:M8_kkL0Y1rUJ: http://scholar.google.com/scholar?q=related:M8_kkL0Y1rUJ:scholar.google.com/&hl=en&num=20&as_sdt=0,5

Web End =scholar.google.com/&hl=en&num=20&as_sdt=0,5 (last access:19 October 2016), 2008.

Heimbach, P. and Bugnion, V.: Greenland ice-sheet volume sensitivity to basal, surface and initial conditions derived from an adjoint model, Ann. Glaciol., 50, 6780, 2009.

Khazendar, A., Rignot, E., and Larour, E.: Larsen B Ice Shelf rheology preceding its disintegration inferred by a control method, Geophys. Res. Lett., 34, 16, doi:http://dx.doi.org/10.1029/2007GL030980

Web End =10.1029/2007GL030980 http://dx.doi.org/10.1029/2007GL030980

Web End = , 2007.Larour, E., Rignot, E., Joughin, I., and Aubry, D.: Rheology of the

Ronne Ice Shelf, Antarctica, inferred from satellite radar interferometry data using an inverse control method, Geophys. Res.Lett., 32, 14, doi:http://dx.doi.org/10.1029/2004GL021693

Web End =10.1029/2004GL021693 http://dx.doi.org/10.1029/2004GL021693

Web End = , 2005.

Larour, E., Seroussi, H., Morlighem, M., and Rignot, E.: Continental scale, high order, high spatial resolution, ice sheet modeling using the Ice Sheet System Model (ISSM), J. Geophys. Res., 117, 120, doi:http://dx.doi.org/10.1029/2011JF002140

Web End =10.1029/2011JF002140 http://dx.doi.org/10.1029/2011JF002140

Web End = , 2012.

Larour, E., Utke, J., Csatho, B., Schenk, A., Seroussi, H., Morlighem, M., Rignot, E., Schlegel, N., and Khazendar, A.: Inferred basal friction and surface mass balance of the Northeast Greenland Ice Stream using data assimilation of ICESat (Ice Cloud and land Elevation Satellite) surface altimetry and ISSM (Ice Sheet System Model), The Cryosphere, 8, 23352351, doi:http://dx.doi.org/10.5194/tc-8-2335-2014

Web End =10.5194/tc-8-2335-2014 http://dx.doi.org/10.5194/tc-8-2335-2014

Web End = , 2014.

MacAyeal, D.: Large-scale ice ow over a viscous basal sediment: Theory and application to Ice Stream B, Antarctica, J. Geophys.

Res., 94, 40714087, 1989.

MacAyeal, D.: A tutorial on the use of control methods in ice-sheet modeling, J. Glaciol., 39, 9198, 1993.

Monnier, J.: DassFlow: Data Assimilation for Free Surface Flows, http://www.math.univ-toulouse.fr/DassFlow

Web End =http://www.math.univ-toulouse.fr/DassFlow (last access: 19 October 2016), 2010.

Moon, T., Joughin, I., and Smith, B.: Seasonal to multiyear variability of glacier surface velocity, terminus position, and sea ice/ice mlange in northwest Greenland, J. Geophys. Res.-Earth, 120, 818833, 2015.

Morlighem, M., Rignot, E., Seroussi, H., Larour, E., Ben Dhia, H., and Aubry, D.: Spatial patterns of basal drag inferred using control methods from a full-Stokes and simpler models for Pine Island Glacier, West Antarctica, Geophys. Res. Lett., 37, 16, doi:http://dx.doi.org/10.1029/2010GL043853

Web End =10.1029/2010GL043853 http://dx.doi.org/10.1029/2010GL043853

Web End = , 2010.

Morlighem, M., Rignot, E., Mouginot, J., Seroussi, H., and Larour,E.: High-resolution ice thickness mapping in South Greenland, Ann. Glaciol., 55, 6470, doi:http://dx.doi.org/10.3189/2014AoG67A088

Web End =10.3189/2014AoG67A088 http://dx.doi.org/10.3189/2014AoG67A088

Web End = , 2014. Nowicki, S., Bindschadler, R., Abe-Ouchi, A., Aschwanden, A.,

Bueler, E., Choi, H., Fastook, J., Granzow, G., Greve, R., Gutowski, G., Herzfeld, U., Jackson, C., Johnson, J., Khroulev,C., Larour, E., Levermann, A., Lipscomb, W., Martin, M., Morlighem, M., Parizek, B., Pollard, D., Price, S., Ren, D., Rignot, E., Saito, F., Sato, T., Seddik, H., Seroussi, H., Takahashi,K., Walker, R., and Wang, W.: Insights into spatial sensitivities of ice mass response to environmental change from the SeaRISE ice sheet modeling project I: Antarctica, J. Geophys. Res., 118, 123, doi:http://dx.doi.org/10.1002/jgrf.20081

Web End =10.1002/jgrf.20081 http://dx.doi.org/10.1002/jgrf.20081

Web End = , 2013.

Pattyn, F.: Numerical modelling of a fast-owing outlet glacier: experiments with different basal conditions, Ann. Glaciol., 23, 237246, 1996.

Pawlowski, R. P., Phipps, E. T., and Salinger, A. G.: Automating embedded analysis, CoRR, abs/1205.0790, http://arxiv.org/abs/1205.0790

Web End =http://arxiv.org/abs/ http://arxiv.org/abs/1205.0790

Web End =1205.0790 (last access: 19 October 2016), 2012.

Phipps, E. T., Bartlett, R. A., Gay, D. M., and Hoekstra, R. J.: Large-Scale Transient Sensitivity Analysis of a Radiation-Damaged Bipolar Junction Transistor via Automatic Differentiation, in: Advances in Automatic Differentiation, Springer Berlin Heidelberg, 351362, 2008.

Price, S., Payne, A., Howat, I., and Smith, B.: Committed sea-level rise for the next century from Greenland ice sheet dynamics during the past decade, P. Natl. Acad. Sci. USA, 108, 89788983, 2011.

Rommelaere, V. and MacAyeal, D.: Large-scale rheology of the Ross Ice Shelf, Antarctica, computed by a control method, Ann. Glaciol., 24, 4348, 1997.

Smart, J. F.: Jenkins: The Denitive Guide, OReilly Media, Inc.,2011.

Stokes, G.: On the theories of internal friction of uids in motion,Trans. Cambridge Philos. Soc., 8, 287305, 1845.

The Trilinos Project: Trilinos Home Page, http://trilinos.sandia.gov/

Web End =http://trilinos.sandia.gov/ (last access: 19 October 2016), 2014.

Utke, J., Hascot, L., Heimbach, P., Hill, C., Hovland, P., and Naumann, U.: Toward adjoinable MPI, 2009 IEEE International Symposium on Parallel & Distributed Processing, 18, 2009.

Geosci. Model Dev., 9, 39073918, 2016 www.geosci-model-dev.net/9/3907/2016/

Word count: 9431

Show less

Abstract

Translate

Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar, gravity, and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient ice-flow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model (ISSM), written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written, but also generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of the ISSM. We present a comprehensive approach to (1) carry out type changing through the ISSM, hence facilitating operator overloading, (2) bind to external solvers such as MUMPS and GSL-LU, and (3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the misfit to observed records of surface altimetry on the northeastern Greenland Ice Stream, or the misfit to observed records of surface velocities on Upernavik Glacier, central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential to enable a new generation of hindcast-validated projections that make full use of the wealth of datasets currently being collected, or already collected, in Greenland and Antarctica.

Details

Title

An approach to computing discrete adjoints for MPI-parallelized models applied to Ice Sheet System Model 4.11

Author

Larour, Eric; Utke, Jean; Bovin, Anton; Morlighem, Mathieu; Perez, Gilberto

Pages

3907-3918

Publication year

2016

Publication date

2016

Publisher

Copernicus GmbH

ISSN

1991962X

e-ISSN

19919603

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/gmd-9-3907-2016

ProQuest document ID

1834543107

An approach to computing discrete adjoints for MPI-parallelized models applied to Ice Sheet System Model 4.11

Jump to:

Full Text

Abstract

Details

Suggested sources