\documentclass[12pt]{article}
\usepackage{wsh}

\begin{document}  
\bibliographystyle{plain}
\markright{Constrained Dix Inversion --- W.S. Harlan}

\title{Constrained Dix Inversion}

\author{William S. Harlan}

\date{Dec. 1999}

\maketitle

\section{Overview}

Dix inversion estimates interval velocities
from picked stacking velocities, usually as a
function of vertical two-way time.  The
stacking velocities are assumed to be
explained by a root-mean-square (RMS) averaging of
the interval velocities.  A conventional
method \cite{dix} uses an explicit solution that inverts
the RMS integral.  This explicit
solution easily produces wildly unrealistic interval
velocities from small variations in stacking velocities.

Constrained inversion fits stacking
velocities with a smooth, bounded interval
velocity function.  This method is slower
but almost always preferable to the fast
explicit solution.  Damped least-squares
minimizes errors in picked velocities and
also minimizes unnecessary complexities in
interval velocities.

Constrained inversion distributes errors uniformly 
when fitting the squared reciprocal of stacking velocity.
This distribution corresponds to uniform
errors in residual normal moveouts.

Interval velocities are constructed as a
sum of overlapping bell curves extending in all spatial
directions.  Coefficients of these curves are
damped to avoid unnecessary sharpness in the
estimated interval velocities.  Rough changes
in interval velocity are allowed only if
strongly required by the input data.
Finally, interval velocities are not allowed to
exceed specified minimum and maximum values.

An explicit Dix solution inverts one vertical
function at a time, whereas least-squares
finds a global solution.  Each estimated
coefficient must explain stacking velocities
over a range of spatial positions on the map.
Redundancy greatly improves, so a single bad
stacking function does not easily corrupt the
solution.  A few bad data points are largely
ignored when contradicted by many neighboring
values.

Many geophysical programmers familiar with
damped least-squares have developed similar
methods \cite{toldi85, clapp1, gee}.

\section {Smooth interval velocities}

We assume interval velocities to
be smooth in all physical directions.
This assumption is most appropriate
for ``soft'' rocks, where fluid pressure 
dominates seismic velocities.
In ``hard'' rocks, velocities tend to be homogeneous
in intervals, with abrupt discontinuities
at changes in lithology.   Soft constraints
can still accurately describe the time/depth
conversion of hard media.  

A smoothing operator with unit area (DC) does not
introduce any bias into smoothed values.
On average, values are no larger or smaller than before.
If interval velocities are sampled as a function
of vertical traveltime, then depth is just the 
integral of velocity over time.  If smoothing
does not bias the interval velocity, then it also does
not bias depth conversions.  Away from the
immediate vicinity of a large discontinuity, smoothing
has no effect on time/depth conversions.

Our convolutional smoothing operator is a
bell-shaped curve described by a third-order
polynomial.  The curve has unit area to
preserve magnitudes.  The convolution is
renormalized at boundaries to preserve unit
area when the convolution is truncated.  A
smoothing width is the ``half-width'' of the
curve, the span over which the curve drops to
half the peak value.  
The total width of the
curve is twice the smoothing interval.  Over
this interval, the third-order curve is 
$b(r) = r^2 (2 r - 3) + 1$, where $r \leq 1$
is the distance from the peak divided by the
smoothing distance.  The curve has zero slope
at the peak and endpoints.
(The half-width in the Fourier domain is 
approximately the reciprocal of the 
half-width in the untransformed domain.) 

\section {Squared stacking slownesses}

A stacking ``velocity'' is a parameter for
the hyperbolic curve that best fits the moveout of 
reflection times over source-receiver offset.
Stacking velocities are estimated from
prestack seismic data by scanning ranges of
acceptable values and examining weighted sums 
of the data over offset.  Resolution depends on the
width of seismic wavelets at the largest
recorded offsets.  Regular sampling of stacking
velocities does not correspond to regular
sampling of wavelets.

However, the squared reciprocal of stacking
velocity, which we call squared stacking
slowness (or ``sloth''), does regularly sample
wavelets at the farthest offset.
We prefer to minimize errors in squared
slowness as the best way to minimize errors in
corresponding reflection times.

Interpreters tend to pick stacking velocities
at locations where moveouts change the most.
The locations of picks are not necessarily
more significant or reliable than others. 
Interpreters also examine moveout adjustments 
at locations well away from the picks.  If the
interpolated behavior is acceptable, then no
new picks are added.  For this reason, we give
interpolated stacking velocities the same
significance as picked values.

We treat an interpolated regular grid of squared
stacking slownesses as our hard data to be inverted.
Usually input stacking velocities are interpolated
linearly between picked times, with constant
values off the ends.  Functions are then
triangulated and interpolated linearly over
spatial directions.  A regular grid of values
needs enough resolution to represent all useful
information in the original functions.

\section {Root-mean-square equations} 

For this inversion, we assume a stacking
``velocity'' to be equivalent to the
root-mean-square (RMS) average of interval
velocities.  This equivalence holds exactly
only for infinitesimal offsets in a
horizontally stratified medium.

Let a single sampled function of squared
stacking slownesses be represented by the
one-dimensional vector $\svector{s}$, and 
interval velocities by $\svector{v}$.  Vector
indices mark samples of vertical traveltime.
Index zero corresponds to zero time.
We write the RMS average of $\svector{v}$ 
in discrete form as 
%\gdef\ooj{{\textstyle \frac{1}{j+1}}}
\gdef\ooj{{\frac{1}{j+1}}}
\begin{eqnarray} 
\label{eq:forward}
1/ s_j = \ooj \sum_{k=0}^j v_k^{2} .
\end{eqnarray}

A fast, explicit inverse does exist for the RMS equation (\ref{eq:forward}):
\begin{eqnarray} 
\label{eq:inverse}
v_k = \left ( \frac{k}{s_k} - \frac{k-1}{s_{k-1}} \right )^{\frac{1}{2}} .
\end{eqnarray}
This equation (\ref{eq:inverse}) is typically referred to as
the Dix equation, although the original reference \cite{dix}
preferred more accurate variations.  This explicit
solution can easily fail when required to
take the square root of negative numbers.
Worse, statistically meaningless variations in stacking velocities
can cause interval velocities to vary wildly.

For a constrained inversion, we also find it useful to write
the linearization of this equation.  A small perturbation $\Delta v_k$
of interval velocity results in the
following perturbation $\Delta s_j$ of squared stacking slowness:
\begin{eqnarray} 
\label{eq:linear}
\Delta s_j = [ -2 \, s_j^2 / (j+1) ] \sum_{k=0}^j v_k \Delta v_k .
\end{eqnarray}
Unperturbed variables retain their reference values.

Finally, the adjoint linearized equation gives the perturbation
of interval velocity required to explain a small perturbation
of squared stacking slowness:
\begin{eqnarray} 
\label{eq:adjoint}
\Delta v_k = v_k \sum_{j=k}^\infty [-2 \, s_j^2 /(j+1)] \Delta s_j .
\end{eqnarray}
Gradient optimization methods like conjugate-gradients usually require
the adjoint.

\section{Damped least-squares}

Damped least-squares attempts to balance data errors with
minimal complexity in the model.  

Let $\stensor{B}$ be a linear smoothing
operator with unit area.  Define a smooth 
interval velocity with the convolution
\begin{eqnarray}
\label{eq:smooth}
v_k \equiv \sum_{i}  b_{k-i} \, w_i 
  \equiv (\stensor{B} \cdot \svector{w} )_k .
\end{eqnarray}
where the vector $\svector{w}$ contains the coefficients of
smooth, shifted basis functions.
Implicitly, this smoothing operator also convolves over
all spatial indices, which we suppress in our equations.

The best coefficients $\svector{w}$ should
minimize the following objective function:
\begin{eqnarray}
\label{eq:objective}
\left \{ s_j  - 
\left [ \ooj \sum_{k=0}^j 
( \stensor{B} \cdot \svector{w})_k^2
\right ]^{-1} \right \}^2
+ \epsilon \sum_k { w_k^2 } .
\end{eqnarray}
The small damping factor $\epsilon$ is the
ratio of the variance of data errors to the
variance of interval velocities.  A large
range of plausible values will give similar
results.  Damping insures that small
variations in squared stacking slowness will
not cause extreme variations in interval
velocity.   For a purely quadratic objective function, the
damping is equivalent to pre-whitening, which adds
a small constant to the diagonal of the least-squares
``normal'' equations.

\section{Optimization}

Once we have written the objective function
(\ref{eq:objective}), we have unambiguously
specified a solution, although only
implicitly.  Much has been written on the
optimization of objective functions, so we
will not cover the details here.  See
Luenberger \cite{luenberger} for more
information on the Gauss-Newton method and
conjugate-gradients.

The objective function (\ref{eq:objective})
is not a perfectly quadratic function of the interval
velocities $\svector{v}$ but behaves similarly to a quadratic.
The objective function has a clear global minimum 
and is convex far away from that minimum.
In the vicinity of the minimum, the objective function 
is indistinguishable from a quadratic.

If a suboptimum set of coefficients $\svector{w}$
produce a particular set of squared stacking slownesses $\svector{s}$,
then the actual picked slownesses may differ by
an error $\Delta \svector{s}$.  With
linearization (\ref{eq:linear}), we can
say that the best perturbation of coefficients
$\Delta \svector{w}$ should minimize the following
objective function:
\begin{eqnarray}
\label{eq:quadratic}
\left [ \Delta s_j  - 
\frac{2 s_j^2}{j+1} \sum_{k=0}^j 
( \stensor{B} \cdot \svector{w})_k
( \stensor{B} \cdot \Delta \svector{w})_k
\right ]^2
+ \epsilon \sum_k { (w_k + \Delta w_k) ^2 } .
\end{eqnarray}
This approximate objective function (\ref{eq:quadratic})
is perfectly quadratic.
The optimum solution $\Delta \svector{w}$ 
is a linear function of the data error $\Delta \svector{s}$.
Quadratic objective functions are easily optimized
by the conjugate-gradient algorithm.

In our implementation, an outer Gauss-Newton loop 
iteratively replaces the objective function by the
quadratic approximation (\ref{eq:quadratic}).  
Each Gauss-Newton iteration begins with the
best interval velocity function so far.  The
first iteration uses a constant interval
velocity function far from the correct solution.  
An inner conjugate-gradient loop minimizes the
objective function that has been approximated
as a quadratic to find a perturbation to
the reference interval velocity.  
A non-linear line-search finds the best factor
to scale this perturbation before adding to
the reference interval velocity function.
(The line-search algorithm uses a combination of a parabolic
Newton method for speed and a golden-section search for 
robustness.)
Finally, the Gauss-Newton loop begins again with a
new approximation of the objective function.  
Typically, some four to eight iterations are necessary 
for the Gauss-Newton and conjugate-gradient loops.

We apply hard constraints (minimum and maximum values)
to interval velocities immediately after updating with
a perturbation.  These constraints are honored during
the non-linear line-search, but not during the temporary
linearization for conjugate-gradients.

As a final optimization, early iterations
begin with a large smoothing operator, and
thus few degrees of freedom.  After full
optimization with an over-simplified interval
velocity, the smoothing is reduced.  Finer
details are allowed into the velocity model
only when the background velocity is known to
be near the final correct solution.  Because
of damping, rough details are introduced only
when justified to fit a sufficiently large
error in the picked data.

\bibliography{wsh}

\end{document}


