\documentclass[12pt]{article}
\usepackage{wsh}
\setlength{\textwidth}{\paperwidth}
\addtolength{\textwidth}{1.6in}
\addtolength{\oddsidemargin}{0.8in}
%\addtolength{\textwidth}{2.in}
%\setlength{\oddsidemargin}{0.in}
\setlength{\evensidemargin}{\oddsidemargin}
\begin{document} \bibliographystyle{plain}
\markright{Acoustic parameter estimation by optimal control  W.S. Harlan}
\title{Parameter estimation of an acoustic\\
differential equation by optimal control}
\author{William S. Harlan}
\date{1984, 1986, 2002}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\def \xv {{\svector x}}
\def \wt {{~w(t)}}
\def \xs {{\xv_s}}
\def \xr {{\xv_r}}
\def \rx {{\rho(\xv)}}
\def \cx {{c(\xv)}}
\def \oor {{1 \over {\rx }}}
\def \oorr {{1 \over {\rx }^2}}
\def \ooccr {{1 \over {\cx^2 \rx }}}
\def \ooccrr {{1 \over {\cx^2 \rx^2 }}}
\def \oocccr {{1 \over {\cx^3 \rx }}}
\def \fvt {{\svector{f}(t)}}
\def \xt {{(\xv, t)}}
\def \Px {{P_s \xt }}
\def \qx {{q \xt }}
\def \mx {{\mv \xt }}
\def \sumrs {{\sum_{r,s}}}
\def \Drs {{D_{rs}(t)}}
\def \ddt {{{\partial}\over{\partial t}}}
\def \ddtt {{{\partial ^2}\over{\partial t^2}}}
\def \grad {{\svector \nabla}}
\def \div {{\grad \cdot}}
\def \Vmax {{V_{max}}}
\def \dx {{~d \xv }}
\def \dxt {{~d \xv \, d t}}
\def \dxtt {{~d \xv \, w(t) \, d t}}
\def \dxxi {{\delta (\xv  \xr )}}
\def \dxxs {{\delta (\xv  \xs )}}
\def \dP {{\delta P_s}}
\def \dPx {{\delta \Px}}
\def \dPdz {{{\partial \Px}\over{\partial z}}}
\def \dPdtt {{{\partial^2 \Px}\over{\partial t^2}}}
\def \ddz {{{\partial}\over{\partial z}}}
\def \intO {{\int_{\Omega}}}
\def \intS {{\int_{\partial \Omega}}}
\def \intT {{\int_0^T}}
\def \intint {{\intT \! \intO}}
\def \mv {{\svector m}}
\def \gradm {{\grad_{\mbox{\scriptsize \boldmath m}}}}
\def \bv {{\svector b}}
\def \nv {{\svector {\hat n} }}
\def \fs {{f_s}}
\def \ft {{\fs (t)}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle
\section {Introduction}
Usually numerical methods of inversion and
optimization begin with a discretized version of the
simulation equations.
Differential equations are often
easier to manipulate directly, so that discretization
can be delayed until last. Here is an example
of optimally perturbing acoustic parameters
to fit recorded acoustic waves. This approach
is seen most often by researchers in optimal control,
such as J.L. Lions \cite{lions}
who also publishes often in the field of geophysics.
This example derives from an explanation I received from
Patrick Lailly in 1984
at the Institut Fran\c cais du Petrole.
In 1986, I expanded on his example in my lab notebook.
After all these years, I still find it a useful example
of multidimensional seismic inversion, so I've finally typed it up.
In a separate paper \cite{ha:vsp}, I show a detailed
application with the onedimensional waveequation.
The methods in this particular paper should be much easier
to generalize to other equations and model parameters.
My preferred methods of optimization (GaussNewton)
require linearizing simulation equations and calculating
the adjoint of those linearized equations. Here is how
you can do most of that work analytically.
In the course of this derivation, I will derive what
is called reversetime migration, with an image directly related
to perturbations of physical parameters.
\section{Example linear equations and objective function}
Let us consider the experiment of a single exploration
seismic field gather with a seismic source places at a surface
point $\xs$ and with receivers placed at points
$\xr$. Each trace is recorded for a finite time $0 \leq t \leq T$.
Assume that land experiments record the derivative of pressure with depth.
(Marine experiments record the pressure directly, which is simpler.)
Let $\cx$ be the acoustic velocity, and $\rx$ the density.
Describe the pressure field $\Px$ with the following twodimensional
acoustic equation.
\begin{equation}
\label{eq:acoustic}
\left [ \ooccr \ddtt  \div \left ( \oor \grad \right )
\right ] \Px = \ft \delta (\xv\xs) \delta(t) ,
\end{equation}
where the source term
\begin{equation}
\label{eq:source}
\ft = \div [ \fvt / \rho(\xs)],
\end{equation}
and $\fvt$ contains body forces.
Use the free surface and causal time boundary conditions:
\begin{equation}
\label{eq:boundary}
\left. \Px \right_{z \leq 0} = 0 ,~
\left. \Px \right_{t \leq 0} = 0 ,~ \mbox{and}
\left. \ddtt \Px \right_{t \leq 0} = 0 ,
\end{equation}
where $z = x_1$ is the vertical dimension of $\xv$.
The surface boundary condition on $P$ is not sufficient and
must be extended before $P$ is welldetermined.
Choose a semispherical surface $\partial \Omega$ that cannot be reached by waves
traveling at the maximum velocity $\Vmax$ for the maximum recorded time $T$:
\begin{eqnarray}
\label{eq:hemisurf}
\mbox{ the surface }
\partial \Omega &=& \{ \xv ~~ z = 0 ~~ \mbox{or} ~ \parallel \xv \parallel
= T \Vmax \}, \\
\label{eq:hemiregion}
\mbox{ for the region }
\Omega &=& \{ \xv ~~ z \geq 0 ~~ \mbox{and} ~
\parallel \xv \parallel \leq T \Vmax \}.
\end{eqnarray}
The radius should be large enough so that no wave reaching this
boundary $\partial \Omega$
can be recorded during the time $0 \leq t \leq T$.
We can arbitrarily extend the free surface around this closed boundary
without affecting the modeled pressures within the recorded time:
\begin{equation}
\label{eq:bigfree}
\left. \Px \right_{\mathbf{x} \in \partial \Omega} = 0.
\end{equation}
\section{The variational form}
The acoustic differential equation (\ref{eq:acoustic})
can also be written in the following variational integral form,
also called the weak form:
\begin{eqnarray}
\label{eq:acousticint}
\intint \left \{ \left [ \ooccr \ddtt 
\div \left ( \oor \grad \right ) \right ] ~ \Px
 \ft \delta (\xv  \xs) \delta(t) \right \}
\qx \dxt = 0,
\end{eqnarray}
for all $H^2$ functions $\qx$.
This integral is true for all $q$ (whose second derivatives
are square integrable) if and only if the original equation
(\ref{eq:acoustic}) also holds.
Conveniently, finite elements \cite{hughes} also uses the weak form.
Zerovalue boundary conditions (\ref{eq:boundary}) and (\ref{eq:hemisurf})
are implied by the weak form, and are called natural boundary conditions.
\section{The objective function}
Let $\Drs$ be the pressure field recorded within the volume
at known receiver positions $\xr$ for a source located at $\xs$.
Define an objective function to match a recorded pressure field
$\Drs$ with a modeled field $\Px$ in a leastsquares sense.
The objective function is a functional of the
three unknown functions of physical parameters $\rho$, $c$, and $\ft$.
\begin{eqnarray}
\label{eq:obj1}
J [ \rho, c, \fs ] &=&
{1 \over 2} \sumrs \intT
\left [ \Drs  {1 \over \rho (\xr)}
{\partial P (\xr, t ) \over \partial z}\right ]^2 \wt dt
\nonumber \\
\label{eq:obj}
&=& {1 \over 2} \sumrs \intint
\left [ \Drs  \oor \dPdz \right ]^2
\dxxi \dxtt .
\end{eqnarray}
$\wt$ is a smooth weighting function that we will find convenient.
The optimization problem can be stated as follows: find the
model parameters $\rho$, $c$, and $s$ so that the field $\Px$
defined by (\ref{eq:acousticint}), (\ref{eq:bigfree}),
and (\ref{eq:boundary}) minimize the objective function (\ref{eq:obj}).
The objective function has a leastsquares form.
The weighting factor $w(t)$ compensates for
geometric spreading and absorption to give approximately equal
weight to all parts of the modeled data.
\section{General variational and objective functions}
To optimize the model parameters model parameters
$\rho$, $c$, and $s$ according to equations
(\ref{eq:acousticint}), (\ref{eq:bigfree}), and (\ref{eq:obj}), one must
discover how a perturbation of the model parameters
will affect the objective function (\ref{eq:obj}).
Let us abbreviate the variational form (\ref{eq:acousticint})
of our differential equations in the following form:
\begin{equation}
\label{eq:var1}
\intint \qx L_1 [ \mx ] \Px \dxt = 0, ~ \forall \qx.
\end{equation}
\begin{equation}
\label{eq:mdef}
\mbox{where } \mx = [ \cx, \rx, \fs ].
\end{equation}
$P$ is also understood to be a implicit function of $\mx$,
as determined by the acoustic equations.
The exact form of $L_1$ is unimportant for the remainder of this derivation,
except that it must be expressible as a linear functional operator on $P$.
$L_1$ is a nonlinear operator on $\mx$.
By a different choice of model parameters ($1/c^2 \rho$ , $1/\rho$, and $\fs$),
one could also linearize $L_1$ with respect to the model $\mv$, but this
is not necessary.
Similarly, we can rewrite our objective function (\ref{eq:obj}) in
a more general form as
\begin{equation}
\label{eq:obj2}
J [ \mx ] = {1 \over 2} \sumrs \intint
\left \{ \Drs  L_2 [\mx , t ] \Px \right \}^2 \dxxi \wt \dxt .
\end{equation}
%Here any weighting factors $w(t)$
%have been absorbed into the scaled data and the
%definition of the operator $L_2$.
\section{Perturbations of the general formulation}
First, let us use the variational equation (\ref{eq:var1}) to relate
perturbations of the model $\delta \mv$ to perturbations of the pressure
$\dP$:
\begin{equation}
\label{eq:dpdm}
\intint q L_1 \dP \dxt =
 \intint q \gradm ( L_1 P ) \cdot \delta \mv \dxt ,
~ \forall q.
\end{equation}
The gradient product can be understood as meaning
\begin{equation}
\gradm G(\mv) \cdot \delta \mv
= \left. {d \over d \epsilon} G ( \mv + \epsilon \delta \mv )
\right_{\epsilon = 0}, \mbox{ where } G(\mv) = L_1 P ,
\end{equation}
and both $L_1$ and $P$ are functions of $\mv$.
In this form the linear operators $L_1$ and $\gradm(L_1 P)$ both
operate on perturbed quantities. Let us replace this form
by one that uses the adjoint operators on $q$.
If $a$ and $b$ are functions, the adjoint $L^*$ of a linear operator
$L$ is defined by $_a = _b$. The brackets
indicate the scalar products in the domains of $a$ and $b$.
For example
\begin{eqnarray}
< q_1  q_2 >_q &=& \intint q_1 \xt q_2 \xt \dxt ,\\
< P_1  P_2 >_P &=& \intint P_1 \xt P_2 \xt \dxt , \mbox{ and} \\
< m_1  m_2 >_m &=& \intO m_1 (\xv) m_2 (\xv) ~ d \xv .
\end{eqnarray}
So we can rewrite our perturbation equation (\ref{eq:dpdm}) as follows:
\begin{eqnarray}
\label{eq:dpdm2}
\intint \dP L_1^* q \dxt =
 \intO \delta \mv \intT [ \gradm (L_1 P)]^* q ~ dt \dx, ~ \forall q .
\end{eqnarray}
If $L_1$ contains only linear differential operators,
then we can obtain their adjoints $L_1^*$ by integrating by parts, as I will
show in a later section.
This makes explicit the implicit dependence of $P$ on $\mv$.
\section{Gradient of the general objective function}
Next let us relate perturbations in the objective function $J$
to perturbations in the pressure $P$ and modeling parameters
used in $L_2$.
\begin{eqnarray}
 \delta J &=&
\sumrs \intint \dxxi ( \Drs  L_2 P_s) L_2 \delta P_s ~w~ \dxt
\nonumber \\
\label{eq:dJ}
&+& \sumrs \intint \dxxi ( \Drs  L_2 P_s) \gradm (L_2 P_s) \cdot \delta \mv ~w~ \dxt \\
&=&
\sumrs \intint \delta P_s L_2^* [ \dxxi ( \Drs  L_2 P_s) ~w] \dxt
\nonumber \\
\label{eq:dJ2}
&+& \sumrs \intO \delta \mv \cdot \intT [ \gradm (L_2 P_s) ]^*
[ \dxxi ( \Drs  L_2 P_s) ~w] dt \dx .
\end{eqnarray}
Now for the clever part. The perturbed integral (\ref{eq:dpdm2})
must be true for all $q$. Choose $q$ then so that
(\ref{eq:dpdm2}) equals the first term of the gradient (\ref{eq:dJ2})
for all $\delta P$:
\begin{eqnarray}
\intint \delta P L_1^* q \dxt = \sumrs \intint \delta P L_2^*
[ \dxxi (\Drs  L_2P) ~w ] \dxt .
\end{eqnarray}
For this equation to be true for all $\delta P$, $q$ must satisfy
\begin{eqnarray}
L_1^* q = \sumrs L_2^* [ \dxxi (\Drs  L_2 P) w] .
\end{eqnarray}
This adjoint system of equations is very similar to the original
set of equations for $P$, but with a new source term  the
difference between the modeled and the actual data.
If $q$ satisfies this equation, then we can combine the variational
equation (\ref{eq:dpdm2}) and the perturbation (\ref{eq:dJ2})
of the objective function to remove intermediate dependence on $\delta P$.
\begin{eqnarray}
\label{eq:dJ3}
\delta J &=&
\intO \delta \mv \intT [ \gradm (L_1 P)]^* q ~ dt ~ d \xv
\nonumber \\
&& \sumrs \intO \delta \mv \intT [ \gradm (L_2 P) ]^*
[ \dxxi ( \Drs  L_2 P) w ] ~ dt ~ d \xv .
\end{eqnarray}
We can express this perturbation (\ref{eq:dJ3}) as
a gradient of the objective function with respect
to model parameters $\mv (\xv)$ at a specific location $\xv$.
\begin{eqnarray}
\label{eq:grad}
\gradm J [\mv(\xv)] = \left. {\delta J \over \delta \mv} \right _{\mv(\xv)}
&=& \intT [ \gradm (L_1 P)]^* q ~dt
\nonumber \\
&& \sumrs \intT [ \gradm (L_2 P) ]^* [ \dxxi ( \Drs  L_2 P) w ] ~dt .
\end{eqnarray}
The first term of this gradient (\ref{eq:grad}) is the most important
because it provides information about $\mv$ at all positions.
The second term is nonzero only at points $\{\xr\}$ where the
data are recorded.
The simplest possible steepest descent optimization would perturb
a reference value of $\mv$ by a scaled version of its gradient.
That is,
\begin{eqnarray}
\min_\alpha J [ \mv_n + \alpha \gradm J (\mv_n) ], \mbox{ and }
\mv_{n+1} = \mv_n + \alpha \gradm J(\mv_n) .
\end{eqnarray}
One would also sum the perturbations over all source positions
to for a single perturbation of the model.
After one perturbation, the new reference value of $\mv_{n+1}$
could be used to calculate a new reference wavefield and a new
perturbation.
\section{Perturbation and adjoint of the acoustic equation}
Let us now leave the general formulation and see
how to handle the acoustic equation in particular.
Let us first perturb the objective function (\ref{eq:obj})
as the general perturbed form (\ref{eq:dJ2}):
\begin{eqnarray}
\label{eq:dJ4}
 \delta J [ \rho, c, \fs ] &=&
 \sumrs \intint \dxxi
\left [ \Drs  \oor \dPdz \right ] \oor \ddz \dPx \dxtt
\nonumber \\
&+& \sumrs \intint \dxxi
\left [ \Drs  \oor \dPdz \right ] \dPdz {\delta \rx \over \rx^2} \dxtt .
\end{eqnarray}
Next we integrate by parts over $z$.
\begin{eqnarray}
\label{eq:dacou}
 \delta J [ \rho, c, \fs ] = \sumrs \intint \dPx \ddz
\left \{ \dxxi \oor \left [ \Drs  \oor \dPdz \right ] \right \} \dxtt
\nonumber
\\
+ \sumrs \intint \delta \rx
\left \{ \dxxi \oorr \left [ \Drs  \oor \dPdz \right ] \dPdz
\right \} \dxtt .
\end{eqnarray}
The boundary term disappears because we require $\delta \Px$ to vanish
at the boundary, as does $\Px$.
The next step is to place the variational acoustic equation
(\ref{eq:acousticint}) as in our general perturbed form (\ref{eq:dpdm2}).
First perturb:
\begin{eqnarray}
&\hfil& \intint \left [ \ooccr \ddtt 
\div \left ( \oor \grad \right ) \right ] ~ \dPx \dxt
\nonumber\\
&=& \intint \qx \left [ \oocccr \dPdtt \right ] \delta \cx \dxt
\nonumber\\
&+& \intint \qx \left \{ \ooccrr \dPdtt
 \div \left [ \oorr \grad \Px \right ]
\right \} \delta \rx \dxt
\nonumber\\
\label{eq:genvar}
&& \intint \qx \delta (\xv) \delta \ft \dxt .
\end{eqnarray}
We can integrate the time derivatives in (\ref{eq:genvar})
by parts twice over time.
The divergence and gradient terms of (\ref{eq:genvar})
can be rewritten with the following chain rule and divergence theorem:
\begin{eqnarray}
\intO a \div \bv \dx
&=& \intO \div (a \bv) \dx  \intO \bv \cdot \grad a \dx
\nonumber \\
&=& \intS \nv \cdot \bv \, a \, d \sigma  \intO \bv \cdot \grad a \dx .
\end{eqnarray}
$a$ is a scalar and $\bv$ is a vector. $\nv$ is the unit normal
vector to the boundary $\partial \Omega$ of the volume $\Omega$.
So here is the rewritten form of the perturbed variational form
(\ref{eq:genvar}):
\begin{eqnarray}
\left. \intO \qx \ooccr \ddt \dPx \dx \right _{t=T}

\left. \intO {\partial \qx \over \partial t} \ooccr
\dPx \dx \right _{t=T}
\nonumber \\
+\intint \dPx \ooccr {\partial^2 \qx \over \partial t^2} \dxt

\intT \! \intS \qx \oor
{\partial \over \partial n} \dPx \, d \sigma \, dt
\nonumber \\
 \intint \dPx \div \left [ \oor \grad \qx \right ] \dxt
=
\nonumber \\
\intint \delta \cx \left [ \oocccr \dPdtt \right ] \qx \dxt
\nonumber \\
 \intint \delta \rx \left [ \ooccrr \dPdtt \right.

\left. \oorr \grad \Px \cdot \grad \right ] \qx \dxt
\nonumber \\
\label{eq:deriv}
+ \intT \! \intS \delta \rx
\left [ \oorr {\partial \Px \over \partial n} \right ] \qx
\, d \sigma \, dt
 \intint \delta \ft \delta (\xv) \qx \dxt .
\end{eqnarray}
The boundary terms at $t=0$ disappear because $\delta P$ must
observe the boundary conditions of $P$. The
$\partial P / \partial n = \nv \cdot \grad P$ term is
the derivative in the direction of the normal to the surface.
Such terms do not disappear. Other integrals that evaluate $\delta P$
on the surface $\partial \Omega$ do disappear.
Now set the right side of the expanded (\ref{eq:deriv})
equal to the first term of of the perturbation
of the objective function (\ref{eq:dacou}) for all perturbations
$\delta P$ of the wavefield. Then
\begin{eqnarray}
\left \{ \ooccr \ddtt  \div \left [ \oor \grad \right ] \right \} \qx
\nonumber \\
\label{eq:adj}
=  \sumrs \ddz \left \{ \dxxi { \wt \over \rx } \left [
\Drs  \oor \dPdz \right ] \right \}
\end{eqnarray}
in the interior. The boundary terms on $\partial \Omega$ and
$t=T$ must vanish so
\begin{eqnarray}
\label{eq:adjbound}
\left. \qx \right _{t=T} =
\left. \ddt \qx \right _{t=T} =
\left. \qx \right _{\partial \Omega} = 0 .
\end{eqnarray}
These adjoint equations (\ref{eq:adj}) and (\ref{eq:adjbound})
require an extrapolation on $q$ that is very similar to that on $P$,
except that extrapolation must now proceed backards in time from $t=T$.
A single application of this extrapolation is called reversetime
migration.
Errors between the modeled and recorded data act as sources through
the extrapolation at the locations of the receivers.
The first term of $\delta J$ in equation (\ref{eq:dJ4})
now equals the right side of (\ref{eq:deriv}). The boundary
conditions on $q$ make the surface $\partial \Omega$ integral
disappear. Thus
\begin{eqnarray}
\grad_c J(\xv) &=& {\delta J \over \delta \cx } =
\intT \oocccr \dPdtt \qx dt .
\\
\grad_{\rho} J(\xv) &=& {\delta J \over \delta \rx } =
\intT \left [ \ooccrr \dPdtt  \oorr \grad \Px \cdot \grad \right ] \qx dt
\nonumber \\
&\,&
 \sumrs \intT {\dxxi \over \rx^2} \left [ \Drs  \oor \dPdz \right ]
\dPdz \wt dt
\\
\grad_s J(t) &=& {\delta J \over \delta \ft } =
\left. \qx \right _{\mathbf{x} = \mathbf{0}}
\end{eqnarray}
To perturb $c$ we take the dot product of the extrapolated $q$
with a filtered version of the previous guess of the wavefield
$\frac{1}{c^3 \rho}\frac{\partial^2P}{\partial t^2}$.
The perturbation of $\rho$ requires the
application of a differential operator on $q$. The second term of
$\grad_{\rho} J_1$ (from equation [\ref{eq:dacou}]) applies only
at the receiver points and acts as an amplification of the recorded
trace. The perturbation of $s$ is simply equal to the adjoint
wavefield $q$ evaluated at the source position.
To start the iterations set $\ft = \delta(t)$ and the parameters
$c$ and $\rho$ to reasonable guesses. The first perturbation
will attempt to invert the primary reflections and is equivalent to a
socalled reversetime migration. The second iteration ostensibly would
invert secondorder scattering, but the poor choice of
background velocities would make it difficult to predict the secondorder
form from the firstorder scattering. This inversion, like all inverse
scattering methods, can only invert highfrequency parameters changes
that are likely to create reflections. Smoother changes in $c$
and $\rho$ must be known beforehand.
\bibliography{wsh}
\end{document}