INVGN - Gauss-Newton inversion, version 1.0.

by Andrew Ganse, Applied Physics Laboratory, University of Washington, Seattle.

Contact at andy@ganse.org

Copyright (c) 2015, University of Washington, via 3-clause BSD license.

See LICENSE.txt for full license statement.

-------------------------------------------------------------------------------

INVGN: Calculate Tikhonov-regularized, Gauss-Newton nonlinear iterated inversion

to solve the damped nonlinear least squares problem:

minimize ||g(m)-d||^2_2 + lambda^2||Lm||^2_2

For appropriate choices of regularization parameter lambda, this problem is

equivalent to:

minimize ||Lm||_2 subject to ||g(m)-d||_2 <= delta, where delta is some statistically-determined noise threshold

and also to:

minimize ||g(m)-d||_2 subject to ||Lm||_2 <= epsilon, where epsilon is some model-norm threshold
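To make the objective concrete, here is a small illustrative sketch in Python/NumPy (not part of INVGN itself, which is Matlab/Octave; all names and sizes here are hypothetical) evaluating and minimizing the damped objective for a toy linear forward problem g(m) = G*m:

```python
import numpy as np

# Toy linear forward problem g(m) = G @ m; sizes and names are illustrative.
rng = np.random.default_rng(0)
G = rng.standard_normal((20, 5))                  # linear forward operator
m_true = np.ones(5)
d = G @ m_true + 0.01 * rng.standard_normal(20)   # noisy measured data
L = np.eye(5)                                     # 0th-order Tikhonov operator
lam = 0.1                                         # tradeoff parameter lambda

def objective(m, lam):
    """The damped objective ||g(m)-d||^2_2 + lambda^2 ||L m||^2_2."""
    return (np.linalg.norm(G @ m - d) ** 2
            + lam**2 * np.linalg.norm(L @ m) ** 2)

# For a linear g, the minimizer solves the regularized normal equations:
m_hat = np.linalg.solve(G.T @ G + lam**2 * (L.T @ L), G.T @ d)
```

For this linear toy problem a single solve suffices; the nonlinear case iterates Gauss-Newton steps of the same form.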

The function call is set up to allow use on both nonlinear and linear problems,

both regularized (inverse) and non-regularized (parameter estimation) problems,

and both frequentist and Bayesian problems. See EXAMPLES below for these calls.

The damped NLS regularization is accomplished with the L-curve method (see e.g.

Hansen 1987).

Code based on material from:

* Professor Ken Creager's ESS-523 inverse theory class, Univ of Washington, 2005

* RC Aster, B Borchers, & CH Thurber, "Parameter Estimation and Inverse

Problems," Elsevier Academic Press, 2004.

* A Tarantola, "Inverse Problem Theory and Methods for Model Parameter

Estimation", Society for Industrial and Applied Mathematics, 2005.

* PC Hansen, "Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects

of Linear Inversion," Society for Industrial Mathematics, 1987.

* PE Gill, W Murray, & MH Wright, "Practical Optimization," Academic Press, 1981.

INVGN is compatible with Octave (version >3), but to handle a few minor

remaining differences between Matlab and Octave, do note the "usingOctave"

flag among the options listed at top of the InvGN.m script. Perhaps in future

versions INVGN will allow these options to be set via keyword/value pairs in the

input argument list, but this is not yet implemented.

USAGE:

[m,normdm,normmisfit,normrough,lambda,dpred,G,C,R,Ddiag,i_conv] = ...

invGN(fwdfunc,derivfunc,epsr,dmeas,covd,lb,ub,m0,L,lambda,maxiters,useMpref,...

verbose,fid,vargin);

INPUTS: (note all vectors are always column vectors)

fwdfunc = Matlab/Octave function handle to forward problem function which

is of form outvector=myfunction(invector,...). (See example 1

of "help function_handle" in Matlab for how to implement)

derivfunc = Matlab/Octave function handle to derivatives function which

is of form outvector=myfunction(invector,...).

If derivfunc=[] then finitediffs are automatically used instead.

epsr = relative accuracy of forward problem computation (not the

accuracy of the modeling of the physical process, but of the

computation itself, see e.g. Gill,Murray,Wright 1981). Affects

convergence testing and is passed into finite diff function (if

used), in both cases providing stepsize h (h is about sqrt(epsr)

if the model params are all of order unity). The finite-diff error

has a minimum at the tradeoff between degrading Taylor series

expansion accuracy if h is too large, and increasing roundoff error

in the numerator G(m+h)-G(m) if h is too small.

Epsr can be scalar (ie same for all elements of model vector) or

vector of same length as model vector for 1-to-1 correspondence

to different variable "types" (eg layer bulk properties vs layer

thicknesses).
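As a sketch of the stepsize role described above (a hedged illustration; INVGN's own finite-difference routine may scale h differently), a forward-difference Jacobian with h about sqrt(epsr) per parameter looks like:

```python
import numpy as np

def fd_jacobian(fwdfunc, m, epsr):
    """Forward-difference Jacobian with per-parameter step h ~ sqrt(epsr).
    Scaling h by max(|m_j|, 1) is an illustrative choice, not necessarily
    INVGN's exact rule."""
    m = np.asarray(m, dtype=float)
    g0 = np.asarray(fwdfunc(m))
    J = np.empty((g0.size, m.size))
    for j in range(m.size):
        h = np.sqrt(epsr) * max(abs(m[j]), 1.0)
        mp = m.copy()
        mp[j] += h
        J[:, j] = (np.asarray(fwdfunc(mp)) - g0) / h
    return J
```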

dmeas = measured data vector (includes noise)

covd = if sqr matrix, then this is measurement noise covariance matrix.

if col vector, then this is diag of meas noise covariance matrix.

if neg, then = -covd^(-1/2), allowing e.g. use of zeros for a mask,

so note in that case it's the inverse of the STDEVS (sqrt(cov)).

lb = lower model bounds vector (hard bnds, no tolerance)

ub = upper model bounds vector (hard bnds, no tolerance)

These bounds are only implemented at the iteration steps after

solving for the model perturbation without them; ie functions

such as BVLS or LSQNONNEG are not used. If the bounds are seen

to be passed in a candidate model perturbation, the stepsize of

the perturbation is reduced (but retaining its direction) until

within bounds. As long as we know the final solution must be

within the bounds and not on them, this is merely like changing

the starting point of inversion and is valid (perhaps sometimes

not as efficient as one of those more complicated functions).

But if the solution from this code ends up on a bound, note you

should treat that solution with some caution and not consider it

a true GN solution. In my own inverse problems using this code

the bounds only ever engage at the tails of my Lcurve where I'm

not too worried about being rigorous, so I haven't found it

necessary to complicate things with those other functions.
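The step-shrinking bound strategy described above can be sketched as follows (hypothetical helper, not the script's actual routine):

```python
import numpy as np

def shrink_step_to_bounds(m, dm, lb, ub, shrink=0.5, max_halvings=60):
    """Reduce the stepsize of a candidate perturbation dm (retaining its
    direction) until m + alpha*dm lies within the hard bounds [lb, ub]."""
    alpha = 1.0
    for _ in range(max_halvings):
        trial = m + alpha * dm
        if np.all(trial >= lb) and np.all(trial <= ub):
            return trial
        alpha *= shrink
    return m  # could not get within bounds: keep the current model
```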

m0 = initial model vector estimate to start iterating from; if this

is instead a matrix (NxM), then it's a collection of M initial

model vectors of length N each, corresponding to M lambda values

(Typically this is done to continue iterations on previous runs

without starting over.)

Alternate use: if L (next input arg) = I (identity matrix), then

this is also the "preferred model" whose distance from the soln

model is minimized.

L = If this is a matrix, then this is the user-supplied finite

difference operator for Tikhonov regularization (function

finiteDiffOP.m can help construct it) in the frequentist

framework. If in the Bayesian framework and lambda is set to 1,

then L can be supplied as the Cholesky decomposition of the

inverse model prior covariance matrix.

If L is scalar then it is the number of profile segments (for

example if the model vector represents concatenated P & S wave

speed profiles then L=2) and 2nd order Tikhonov (smoothness)

frequentist regularization will be used.
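For the scalar-L case above, a 2nd-order Tikhonov operator per segment looks like the following NumPy sketch (finiteDiffOP.m is the real helper for this; the function names here are illustrative):

```python
import numpy as np

def second_diff_operator(n):
    """Standard 2nd-order Tikhonov (roughness) operator for one profile
    segment: an (n-2) x n matrix with stencil rows [1, -2, 1]."""
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L

def block_second_diff(seg_lengths):
    """Block-diagonal roughness operator for a model of concatenated
    segments (e.g. P & S wave speed profiles), so smoothing does not
    reach across the segment boundary."""
    blocks = [second_diff_operator(n) for n in seg_lengths]
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    Lfull = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        Lfull[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return Lfull
```

Note that a linear ramp has zero roughness under this operator, as expected for a second-derivative penalty.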

lambda = Used for the frequentist inversion framework (if using Bayesian

framework then set lambda=1 and see Bayesian note for L above).

If lambda is a negative scalar, auto-calculate an Lcurve with num

pts equal to magnitude of the negative scalar. The auto-calc'd

curve starts at lambda_mid = sqrt(max(eig(G'*G))/max(eig(L'*L)))

and decreases logarithmically from there to 3 orders of mag

less and more than lambda_mid by default. If lambda >=0, use

its value(s) directly as frequentist Tikhonov tradeoff param(s),

in which case lambda can be scalar or vector.

If lambda is a 3-element vector, lambda(1) is the pos or neg scalar

just described; and lambda(2:3) are how many orders of magnitude

greater (lambda(2)) or less (lambda(3)) than lambda_mid to sweep

in the auto-calc'd curve. (lambda(2:3) defaults are 3,-3)
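The auto-calculated sweep just described amounts to something like this sketch (illustrative only; INVGN's internals may differ in detail):

```python
import numpy as np

def lambda_sweep(G, L, npts, decades_up=3.0, decades_down=3.0):
    """Logarithmic lambda sweep centered on
    lambda_mid = sqrt(max(eig(G'G)) / max(eig(L'L))),
    running from decades_up orders of magnitude above lambda_mid
    down to decades_down orders below it."""
    lam_mid = np.sqrt(np.linalg.eigvalsh(G.T @ G).max()
                      / np.linalg.eigvalsh(L.T @ L).max())
    return np.logspace(np.log10(lam_mid) + decades_up,
                       np.log10(lam_mid) - decades_down, npts)
```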

useMpref = flag for one of three problem frameworks:

useMpref=0 is for a frequentist formulation with higher-order

Tikhonov regularization. Here the regularization doesn't

penalize distance from the preferred (initial) model m0, but

there is no restriction on L (unlike useMpref=1).

useMpref=1 is for using 0th-order-Tikh (ridge regr), ie

L==eye(size(L)), in a frequentist formulation in which m0 is

the "preferred model" the inversion minimizes distance to.

useMpref=2 is for using Bayesian formulation, in which case:

m0 is the prior model mean, lambda must =1, L=chol(inv(Cprior))

(user must set that L externally, note ironically L there must

be RIGHT triangular, the "L" notation is from frequentist case),

output C is now the Bayesian posterior covariance matrix (which

note is different from the frequentist C), outputs R and Ddiag

are empty, and normrough won't have literal meaning anymore.
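A quick numerical check of the Bayesian identity behind useMpref=2: with L = chol(inv(Cprior)) (right/upper triangular R with R'R = inv(Cprior), matching Matlab's chol convention), the term ||L(m-m0)||^2 equals the prior quadratic form. NumPy's cholesky returns the lower factor, hence the transpose in this illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Cprior = A @ A.T + 4.0 * np.eye(4)   # a positive-definite prior covariance
Cinv = np.linalg.inv(Cprior)

# Matlab's chol(Cinv) returns upper-triangular R with R'R = Cinv;
# numpy.linalg.cholesky returns the lower factor, so transpose it:
R = np.linalg.cholesky(Cinv).T

x = rng.standard_normal(4)               # stands in for m - m0
quad_chol = np.linalg.norm(R @ x) ** 2   # ||L(m - m0)||^2
quad_prior = x @ Cinv @ x                # (m - m0)' inv(Cprior) (m - m0)
```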

maxiters = Number of GN iterations to perform at each lambda on the Lcurve.

So total number of derivative estimations will be

numlambdas*maxiters.

verbose = 0-3, with 0 = run quietly, and 3 = lots of progress text

Progress text (percent done) in findiffs is useful for slow

forward problems and/or debugging - but note percent done text

is unavailable in jacfwddist and jaccendist.

fid = File handle to which progress text is output in verbose option.

Note fid=1 outputs to stdout, but an output file is useful for

long batch runs when the forward problem is slow.

vargin = the extra arguments (if any) that must be passed to the fwdprob

function are tacked onto the end here. You may have none, in

which case your last arg is the fid.

OUTPUTS: (note all vectors are always column vectors)

m = matrix of maximum likelihood model solution vectors for each

lambda. So m is MxP for M model params and P points on Lcurve,

ie each column of m is a model vector for a different lambda.

normdm = vectors of L2 norms of model perturbation steps "dm" for each

lambda. (so normdm is imax x P, with i steps to convergence

for l'th lambda, imax=max(i(l)), and P points on Lcurve). Unused

elements in normdm are set to -1, since norms can't be negative.

normmisfit = vector of L2 norms of data misfit between measured and

predicted data at solution point for each lambda.

normrough = vector of L2 norms of roughness (for which units are 2nd deriv)

at solution points for each lambda. "Roughness" refers to case

of 2nd order Tikhonov (smoothing constraint), but in general

this norm's meaning depends on what the supplied L matrix is

(eg might be other model norms than roughness). And in Bayesian

case (useMpref=2) this doesn't have same meaning anymore although

it's still part of the objective function as it relates to Cprior.

lambda = vector of tradeoff parameters used for Tikhonov regularization.

(you might already have this if you passed in vector of lambdas

as one of the input arguments, but this is useful in the case of

specifying to auto-calculate N Lcurve points).

dpred = predicted data at solution models (so there are as many vectors

in dpred as there are points on the Lcurve, so dpred is NxP, for

N data pts and P points on Lcurve).

G = Jacobian matrices of partial derivatives at solution models

(so there are as many jacobian matrices in G as there are points

on the Lcurve, so G is NxMxP, for N data pts, M model params,

and P points on Lcurve).

C = if useMpref<2, C contains the frequentist covariance matrices of

solution models (so there are as many cov matrices in C as there

are points on the Lcurve, so C is MxMxP, for M model params and

P points on Lcurve). If useMpref==2 (flagging Bayesian case),

then C is instead the Bayesian posterior model covariance matrix.

R = resolution matrices of solution models (so there are as many

resolution matrices in R as there are points on the Lcurve, so R

is MxMxP, for M model params and P points on Lcurve).

(R is empty for Bayesian case when useMpref=2.)

Ddiag = column vector of diagonal of data res matrix (length Ndata)

(Ddiag is empty for Bayesian case when useMpref=2.)

i_conv = column vector of number of iterations before convergence at each

lambda. If a given lambda didn't converge, its i_conv value

is 0. (length P, the number of Lcurve points).

For more clarity in the code, this function computes its model perturbations via

(regularized) normal equations, rather than generalized SVD or subiterations of

gradient descent, which may be more efficient and/or scalable. The assumption

in this choice was that forward function code takes much longer to compute than

the model perturbation solution itself (which can otherwise be computed faster

via GSVD). If that assumption is not true, then much improvement in compute

time of this function would result from solving for the perturbation using GSVD

methods such as those provided in Per Christian Hansen's toolkit (however note

that toolkit would take a lot of tweaking/paring to make it Octave compatible):

http://www2.imm.dtu.dk/~pch/Regutools

Aside from computational efficiency/speed, another important consequence of

solving normal equations in this code is that it constrains the size of the

problem (number of model parameters) relative to amount of computer memory due

to calculation of a matrix inverse. For reference, I have successfully run

this code for over 10,000 model parameters on a computer with 16GB of RAM, and

for over 2500 model parameters on a computer with 8GB RAM (in both those cases

the matrix inverse was the memory constraint rather than forward problem).
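One common form of the regularized normal-equations perturbation solve described above is sketched below (illustrative; INVGN's actual update also folds in the data covariance weighting, the preferred-model and Bayesian variants, and the bound handling):

```python
import numpy as np

def gn_step(Jac, L, lam, m, resid):
    """One regularized Gauss-Newton perturbation, solving
    (J'J + lam^2 L'L) dm = J'(d - g(m)) - lam^2 L'L m
    for the model perturbation dm, where resid = d - g(m)."""
    LtL = L.T @ L
    A = Jac.T @ Jac + lam**2 * LtL
    b = Jac.T @ resid - lam**2 * LtL @ m
    return np.linalg.solve(A, b)
```

For a linear forward problem this converges in a single step to the Tikhonov solution, consistent with the linear example in EXAMPLES.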

EXAMPLES:

(leaving off output part "[var1,var2,etc]=" which is same format for all):

Regularized frequentist inversion with auto-chosen lambdas (numlambdas of them):

invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,-numlambdas,0,maxiters,3,1);

Regularized frequentist inversion with prechosen lambdas:

invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,...

[1.04193e+07; 10419.3; 10.4193; 0.0104193;],0,maxiters,3,1);

Regu inv, picking up after previous run (using lambda vector from prev run):

invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,lambda,0,maxiters,3,1);

Parameter estimation (no regularization):

invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,zeros(size(L)),1,0,...

maxiters,3,1);

Frequentist linear problem (should converge after 1st iter), w/ smoothing regu:

deriv=@(x) A; fwdp=@(x) A*x; % define funcs using the A matrix inline

invGN(fwdp,deriv,epsr,datameas,covd,lb,ub,zeros(M,1),L,-numlambdas,0,2,0,1);

(Note maxiters=2 there: actually the first "iter" in saved results is for the

initial value, which admittedly isn't relevant for purely linear problems, but

that's how this nonlinear script fits purely linear problems. Admittedly

an inefficient kludge.)

Frequentist nonlinear problem with preferred model:

invGN(fwdp,[],epsr,datameas,covd,lb,ub,mpref,eye(M),-numlambdas,1,maxiters,0,1);

Bayesian nonlinear problem:

invGN(fwdp,[],epsr,datameas,covd,lb,ub,mprior,chol(inv(Cprior)),1,2,...

maxiters,0,1);