Seon Ki Park1,* and Eugenia Kalnay1,2,3,*
1Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, OK 73019, U.S.A.
2School of Meteorology, University of Oklahoma, Norman, OK 73019, U.S.A.
3National Center for Environmental Prediction, NWS/NOAA, Washington, D.C., 20233, U.S.A.
*Current affiliation: Department of Meteorology, University of Maryland, College Park, MD 20742, U.S.A.
1. Introduction
Using a theoretical relationship (i.e., physical laws), for given values of model parameters, a `direct' (or forward) problem aims at predicting the values of some observable quantities. In contrast, for given measurements of observable quantities, an `inverse' problem aims at obtaining the values of model parameters (Tarantola 1987).
Over the past two decades, many inverse problems in meteorology have been solved using the adjoint models of corresponding meteorological prediction systems. These include the generation of singular vectors for ensemble prediction (e.g., Molteni et al. 1996); four-dimensional variational data assimilation (e.g., Lewis and Derber 1985; Le Dimet and Talagrand 1986; Courtier et al. 1994); forecast sensitivity to the initial conditions (Rabier et al. 1996; Pu et al. 1997a); and targeted observations (e.g., Rohaly et al. 1998; Pu et al. 1998).
The four-dimensional variational data assimilation (4D-Var) using the adjoint model, in which a cost function (the weighted squared distance between model solutions and observations) is minimized iteratively, is considered the most promising tool for fitting model solutions to observations and has received considerable attention in recent years (Derber 1989; Courtier et al. 1994; Zupanski 1993; and many other papers). However, a major problem with the adjoint 4D-Var is the large computational cost of obtaining an optimal initial condition, because several hundred iterations are often required for the minimization process to converge. Implementing the complete adjoint 4D-Var scheme in an operational system is therefore practically infeasible. For example, ECMWF has the most powerful supercomputer available, yet has implemented only a simplified version of a 4D-Var package in its operational system (Bouttier and Rabier 1997). In addition, the assimilation period affects the quality of the data assimilation results. Too long an assimilation period may lead to a cost function with multiple minima, and thus to convergence to a local minimum (Li 1991), while too short a period may not allow the information from the assimilated data to propagate over the whole model domain (Luong et al. 1998).
When applied to a stormscale prediction model, whose intrinsic predictability limit extends to only a few hours, the time required for data assimilation becomes even more restrictive. From our experience with the operational testing of a stormscale model, the Advanced Regional Prediction System (ARPS; Xue et al. 1995), the model output for a 7-hr forecast is available after about 3 hrs without the 4D-Var process (e.g., Droegemeier et al. 1996). When 4D-Var is included, additional computing time is required, which results in a shorter lead time. Thus the data assimilation process needs to be finished in a short time. Although the assimilation period can be set to be very short, e.g., a few minutes (Wang et al. 1997; Park and Droegemeier 1997a), the resulting optimal initial conditions do not necessarily produce an optimal forecast for a forecast time longer than the assimilation period, because stormscale models include many strongly nonlinear microphysical processes. In addition, the information from the assimilated data might not be able to propagate over the whole model domain within such a short assimilation period (Luong et al. 1998). Furthermore, the linear adjoint model involves errors in describing the correct gradient information of the original nonlinear forecast model, and those errors can be serious for stormscale models, in which strongly nonlinear and discontinuous microphysical processes are included (see Park and Droegemeier 1997b).
Recently Wang et al. (1997) suggested the use of inverse model integrations to accelerate the convergence of 4D-Var. They developed a new optimization method, called the adjoint Newton algorithm (ANA), and demonstrated its efficiency when applied to a simplified 4D-Var problem using the ARPS. In the ANA, the Newton descent direction is obtained by a backward integration of the tangent linear model (TLM), in which the sign of the timestep is changed (i.e., inverse). The sign of the dissipative terms is also changed in order to avoid computational blow-up (i.e., quasi-inverse). By adopting this ``quasi-inverse'' approach and assuming the availability of a complete set of observations at the final time, Wang et al. (1997) showed that the ANA converged in an order of magnitude fewer iterations, and to an error level an order of magnitude smaller, than the conventional adjoint approach applied to the same (simplified) 4D-Var problem. However, their application of the quasi-inverse method was limited to finding the Newton descent direction only, and it required a complete data set at the end of the assimilation period.
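The quasi-inverse idea can be illustrated with a one-dimensional tangent linear advection-diffusion model; the discretization and all parameters below are our own illustrative choices, not the ARPS implementation:

```python
import numpy as np

def tlm_step(dx, c, nu, dt, h):
    """One forward step of a tangent linear advection-diffusion model
    on a periodic grid (upwind advection, explicit diffusion)."""
    adv = -c * (dx - np.roll(dx, 1)) / h
    dif = nu * (np.roll(dx, 1) - 2 * dx + np.roll(dx, -1)) / h**2
    return dx + dt * (adv + dif)

def quasi_inverse_step(dx, c, nu, dt, h):
    """Quasi-inverse step: the sign of the timestep is reversed (backward
    integration), and the sign of the dissipative term is ALSO reversed,
    so that diffusion keeps damping (instead of amplifying) noise."""
    adv = -c * (dx - np.roll(dx, 1)) / h
    dif = nu * (np.roll(dx, 1) - 2 * dx + np.roll(dx, -1)) / h**2
    return dx + (-dt) * (adv - dif)   # -dt for dynamics; dissipation sign flipped

# Forward one step, then quasi-inverse one step: the initial perturbation
# is recovered up to O(dt^2) and the (uninverted) dissipation.
n, h = 32, 1.0 / 32
x = np.arange(n) * h
dx0 = np.sin(2 * np.pi * x)
dx1 = tlm_step(dx0, c=1.0, nu=1e-4, dt=0.01, h=h)
rec = quasi_inverse_step(dx1, c=1.0, nu=1e-4, dt=0.01, h=h)
```

Note that the backward step is only a quasi-inverse: the dissipative term is deliberately not inverted, which trades a small recovery error for numerical stability.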
The quasi-inverse technique can be applied to either the tangent linear or the full nonlinear model, each of which has advantages for different applications. Pu et al. (1997) showed that for the problem of forecast sensitivity, closely related to 4D-Var, a backward integration with the quasi-inverse TLM (Q-ILM) gave results far superior to those obtained using the adjoint model. The Q-ILM has been tested successfully at NCEP in several different applications, e.g., forecast error sensitivity analysis and data assimilation (Pu et al. 1997) and adaptive observations (Pu et al. 1998). Kalnay and Pu (1998) suggested the use of the Q-ILM in a more general way in data assimilation problems. Kalnay et al. (1999) demonstrated, using the Q-ILMs of simple models, that data assimilation with the quasi-inverse method is much faster than with the conventional adjoint 4D-Var method. This approach is dubbed ``inverse 3D-Var'' (I3D-Var), in the sense that the observations are used only once, at the end of the assimilation interval.
While the I3D-Var requires neither the adjoint model nor a minimization process, the adjoint 4D-Var requires running both the adjoint model, to provide the gradient information to the minimization algorithm, and the minimization process itself. For the minimization of a quadratic function, such as the cost function in the 4D-Var, the full-Newton method is the ideal algorithm, because it reaches the minimum of the cost function in just a single iteration. However, it has never been used in realistic large-scale optimization problems, because the computation and storage of the Hessian and its inverse are too costly to be practical (Gill 1981).
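The single-iteration property of the full-Newton method on a quadratic cost is easy to verify on a small example (a generic quadratic, not an actual 4D-Var cost function):

```python
import numpy as np

# Quadratic cost J(x) = 0.5 x^T A x - b^T x with a symmetric
# positive-definite Hessian A (illustrative random example).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)          # SPD Hessian
b = rng.standard_normal(5)

grad = lambda x: A @ x - b           # gradient of J
x0 = rng.standard_normal(5)          # arbitrary starting point

# One full-Newton step: x1 = x0 - A^{-1} grad(x0).
x1 = x0 - np.linalg.solve(A, grad(x0))

# The gradient vanishes after a single iteration: x1 is the minimizer.
residual = np.linalg.norm(grad(x1))  # numerically zero
```

The catch, as noted above, is that for a realistic model the Hessian A is far too large to form, store, or invert, which is exactly the cost the I3D-Var avoids.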
There exist several methods for large-scale unconstrained optimization, such as quasi-Newton (Gill and Murray 1972), limited-memory quasi-Newton (LBFGS; Liu and Nocedal 1989), truncated Newton (Nash 1984), adjoint truncated-Newton (Wang et al. 1995), adjoint Newton (Wang et al. 1997, 1998), and conjugate gradient methods (Navon and Legler 1987). When applied to meteorological 4D-Var problems, these algorithms mostly require a few tens to hundreds of iterations to reach a local minimum of the cost function. Reviews of various optimization algorithms are given in textbooks, e.g., Gill (1981), Tarantola (1987), and Polak (1997). In the following, we provide a brief overview of the theoretical background of the I3D-Var. For a more detailed discussion see Kalnay et al. (1999).
2. Formulation of I3D-Var
The major goal of I3D-Var is to find an increment in initial conditions that ``optimally'' corrects a perceived forecast error at the final time t. Here, the cost function includes both data and background distances. In order to maintain the ability to solve the minimization efficiently, however, the background term is estimated at the end of the interval, rather than at the beginning as in Lorenc (1986).
Assume (for the moment) that data $\mathbf{y}^o$ are available at the end of the assimilation interval $t$, with $\delta\mathbf{x} = \mathbf{x}^a - \mathbf{x}^b$ and $\delta\mathbf{y} = \mathbf{y}^o - H(\mathbf{x}^b)$. Here $\mathbf{x}^a$ and $\mathbf{x}^b$ are the analysis and first guess, respectively, and $H$ is the ``forward observation operator,'' which converts the model first guess into first-guess observations (see Ide et al. 1997 for notation). The cost function that we minimize is the 3D-Var cost function at the end of the interval. It is given by the distance of the forecast to the background at the end of the time interval (weighted by the inverse of the forecast error covariance $\mathbf{B}$), plus the distance to the observations (weighted by the inverse of the observational error covariance $\mathbf{R}$), also at the end of the interval:
$$J(\delta\mathbf{x}_0) = \frac{1}{2}\,\delta\mathbf{x}(t)^{\rm T}\mathbf{B}^{-1}\delta\mathbf{x}(t) + \frac{1}{2}\,[\mathbf{H}\,\delta\mathbf{x}(t) - \delta\mathbf{y}]^{\rm T}\mathbf{R}^{-1}[\mathbf{H}\,\delta\mathbf{x}(t) - \delta\mathbf{y}], \qquad \delta\mathbf{x}(t) = \mathbf{L}\,\delta\mathbf{x}_0 \qquad (1)$$
$$\nabla_{\delta\mathbf{x}_0} J = \mathbf{L}^{\rm T}\left\{\mathbf{B}^{-1}\delta\mathbf{x}(t) + \mathbf{H}^{\rm T}\mathbf{R}^{-1}[\mathbf{H}\,\delta\mathbf{x}(t) - \delta\mathbf{y}]\right\} \qquad (2)$$
where $\mathbf{L}$ is the TLM of the forecast model and $\mathbf{H}$ is the linearized observation operator.
From this equation we see that the gradient of the cost function is given by the backward adjoint integration of the rhs terms in (2). In the adjoint 4D-Var, an iterative algorithm (such as quasi-Newton, conjugate gradient, or steepest descent) is used to estimate an optimum perturbation:
$$\delta\mathbf{x}_0^{n+1} = \delta\mathbf{x}_0^{n} - \rho_n\,\nabla_{\delta\mathbf{x}_0} J(\delta\mathbf{x}_0^{n}) \qquad (3)$$
In the I3D-Var, however, we seek to obtain directly the ``perfect solution,'' i.e., the $\delta\mathbf{x}_0$ that makes $\nabla_{\delta\mathbf{x}_0} J = 0$ for small $\delta\mathbf{x}_0$. From (2) we can eliminate the adjoint operator $\mathbf{L}^{\rm T}$, since the gradient vanishes when the term in braces vanishes, and obtain the ``perfect'' solution given by
$$\delta\mathbf{x}(t) = \left(\mathbf{B}^{-1} + \mathbf{H}^{\rm T}\mathbf{R}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^{\rm T}\mathbf{R}^{-1}\,\delta\mathbf{y} \qquad (4)$$
$$\delta\mathbf{x}_0 = \mathbf{L}^{-1}\,\delta\mathbf{x}(t) \qquad (5)$$
This can be interpreted as starting from the 3D-Var analysis increment at the end of the interval and integrating backwards with the TLM, or with an approximation of it. If we do not include the forecast error covariance term $\mathbf{B}^{-1}$, (5) reduces to the ANA algorithm of Wang et al. (1997), except that we do not require a line minimization. We have tested the I3D-Var with the ARPS model and found that, for this reason, the I3D-Var is computationally about twice as fast as the ANA scheme of Wang et al. (1997).
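Under the (strong) simplifying assumptions of a linear toy model whose matrix $\mathbf{L}$ can be inverted explicitly and an identity observation operator, the two steps (4) and (5) can be sketched as follows; every name and parameter here is illustrative, not the ARPS implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = 0.95 * Q                          # toy linear forecast model (invertible)
H = np.eye(n)                         # identity observation operator (assumption)
Binv = np.eye(n)                      # inverse forecast error covariance
Rinv = 10.0 * np.eye(n)               # inverse observation error covariance
dy = rng.standard_normal(n)           # observation increment at the final time

# Eq. (4): 3D-Var analysis increment at the END of the interval.
dx_t = np.linalg.solve(Binv + H.T @ Rinv @ H, H.T @ Rinv @ dy)

# Eq. (5): carry the increment back to the initial time with L^{-1}
# (in practice approximated by a quasi-inverse TLM integration).
dx0 = np.linalg.solve(L, dx_t)

# The gradient (2) evaluated at this dx0 vanishes: the minimum of the
# final-time 3D-Var cost function is reached in a single step.
g = L.T @ (Binv @ (L @ dx0) + H.T @ Rinv @ (H @ (L @ dx0) - dy))
```

In a real model, of course, $\mathbf{L}$ is never formed as a matrix; step (5) is carried out by the backward quasi-inverse integration described above.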
Kalnay et al. (1999) proved that the I3D-Var is equivalent to solving the minimization problem (at each time level) using the full-Newton method. The results of Wang et al. (1997, 1998) and Pu et al. (1997) support considerable optimism for this method. For a quadratic function, the Newton algorithm (and the equivalent I3D-Var) converges in a single iteration. Since the cost functions used in the 4D-Var are close to quadratic functions, the I3D-Var can be considered equivalent to perfect preconditioning of the simplified 4D-Var problem.
3. Application to the Burgers' Equation
Figure 1 shows the performance of the I3D-Var and the 4D-Var (using the LBFGS minimization algorithm) for a simple advection-diffusion problem (Burgers' equation) including a passive scalar transport, q. For an assimilation window of 81 time steps and errors of 10% in the initial velocity u and 100% in q, using the I3D-Var with observations available only at the end of the interval, the cost function converges to $10^{-13}$ of its original value after 3 iterations. By contrast, the 4D-Var with data at the end of the interval requires 44 equivalent model integrations (both forward and adjoint) for the cost function to converge to $10^{-10}$ of its original value. If we provide the 4D-Var with observations at every time step, the cost function converges to the same order in 12 time integrations.
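For reference, a minimal forward model of the kind used in this experiment (viscous Burgers' equation plus a passive scalar on a periodic grid) might look like the following; the grid size, viscosity, timestep, and initial fields are our illustrative choices, not the exact configuration behind Fig. 1:

```python
import numpy as np

def burgers_step(u, q, nu, dt, h):
    """One explicit (forward Euler, centered differences) step of the viscous
    Burgers' equation plus a passive scalar q advected by u, periodic grid."""
    ux  = (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / h**2
    qx  = (np.roll(q, -1) - np.roll(q, 1)) / (2 * h)
    qxx = (np.roll(q, -1) - 2 * q + np.roll(q, 1)) / h**2
    return u + dt * (-u * ux + nu * uxx), q + dt * (-u * qx + nu * qxx)

n, h = 64, 1.0 / 64
x = np.arange(n) * h
u = 0.5 + 0.25 * np.sin(2 * np.pi * x)   # velocity with a mean flow
q = np.cos(2 * np.pi * x)                # passive scalar
for _ in range(81):                      # 81-step window as in the experiment
    u, q = burgers_step(u, q, nu=5e-4, dt=2e-3, h=h)
```

The assimilation experiments then compare perturbed and control trajectories of such a model over the 81-step window.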
If there are data at different time levels we can choose to bring the data increments to the same initial time level so that the increments corresponding to the different data can be averaged, with weights that may depend on the time level or the type of data. For applications in which ``knowing the future'' is allowed, such as reanalysis, the observational increments could be brought to the center of an interval, and used for the final analysis.
Figure 2 shows preliminary results with multiple time levels in the observations, using the Burgers' equation. Random errors with a maximum amplitude of 10% of the total range are included in the observations. The observational increments from the different observation times are brought backwards to the same initial time level, where they are averaged to yield the averaged initial conditions. Forecasts beyond the assimilation period (101 time steps), up to 141 time steps, are then compared for initial conditions obtained from the 4D-Var and those averaged from multiple (ensemble) I3D-Var integrations. The forecasts with one iteration of the I3D-Var were comparable to those with 20 iterations of the 4D-Var, whereas 3 iterations of the I3D-Var resulted in much better forecasts.
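The averaging procedure can be sketched for a toy linear model, in which increments from several observation times are carried back to the initial time (here exactly, by matrix inversion) and averaged with equal weights; the model, the observation times, and the weights are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = 0.98 * Q                          # one-step toy linear model (invertible)

dx0_true = rng.standard_normal(n)     # "true" initial increment
obs_times = [3, 6, 9]                 # observation time levels (hypothetical)

# Observation increment at each time k: dy_k = L^k dx0_true (perfect obs here).
increments_at_t0 = []
for k in obs_times:
    Lk = np.linalg.matrix_power(L, k)
    dy_k = Lk @ dx0_true
    dx0_k = np.linalg.solve(Lk, dy_k)           # bring increment back to t0
    increments_at_t0.append(dx0_k)

# Equal-weight average of the increments at the initial time; with noisy
# observations the weights could instead depend on time level or data type.
dx0_avg = np.mean(increments_at_t0, axis=0)
```

With perfect observations each back-propagated increment recovers the true initial increment, so the average does too; with noisy data the averaging damps the independent observation errors.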
The I3D-Var also has some potential disadvantages. One of them is the growth of noise that projects on decaying modes during the backward integration (Kalnay et al. 1999). However, those errors decay again during the subsequent forward integration. We also found that the performance of the I3D-Var depends on the magnitude of the dissipation. We believe that these problems can be overcome with further development and experimentation. At the least, the I3D-Var might serve as a preconditioner when carrying out the minimization in the framework of the 4D-Var. To test this idea, a preliminary experiment using the Burgers' equation was performed. In Fig. 3, using the initial conditions obtained from one iteration of the I3D-Var, the 4D-Var showed much better performance in minimizing the cost function. This result shows the possibility of using the I3D-Var as a preconditioner for the 4D-Var including full physics. The use of the I3D-Var may deal efficiently with the part of the spectrum of the Hessian of the cost function related to the dynamics of the model, allowing the minimization to focus only on the nonlinearities associated with the physics, with a large computational economy (Navon 1999, personal communication).
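The preconditioning idea can be illustrated on a toy quadratic cost (not the experiment of Fig. 3): an iterative minimization started from a first guess already close to the minimum, as a single I3D-Var iteration would provide, has far less work left to do than one started cold. All names and the cost function below are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)          # well-conditioned Hessian of a quadratic cost
b = rng.standard_normal(n)
grad = lambda x: A @ x - b           # gradient of J(x) = 0.5 x^T A x - b^T x
x_min = np.linalg.solve(A, b)        # exact minimizer, for reference only

def steepest_descent(x, iters):
    """Gradient descent with the exact line search for a quadratic cost."""
    for _ in range(iters):
        g = grad(x)
        x = x - (g @ g) / (g @ (A @ g)) * g
    return x

cold = steepest_descent(np.zeros(n), iters=5)                       # cold start
warm = steepest_descent(x_min + 0.01 * rng.standard_normal(n), 5)   # good first guess
warm_err = np.linalg.norm(warm - x_min)
```

Starting near the minimum, as an I3D-Var first guess would, the remaining error after a few iterations stays very small, which is the sense in which the I3D-Var could precondition a full-physics 4D-Var.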
Some approaches similar to the I3D-Var, in the sense of least-squares fitting by inverting the forward linear or nonlinear model operator, are also found in the inverse problems of other disciplines, such as seismic waves (e.g., Gersztenkorn et al. 1986), electromagnetic data (e.g., Farquharson and Oldenburg 1998), and quantum dynamics (e.g., Caudill Jr. et al. 1994). For the theoretical background on general inverse problems, see Tarantola (1989) and Anger (1993).
For a detailed discussion of the quasi-inverse method in the context of data assimilation, and some results with simple models (Burgers' equation and the Lorenz model), see Kalnay et al. (1999) and its web page at ``http://weather.ou.edu/~ekalnay/INV3DVARpap/kal_web1_rev/''.