Research.ganse.org


INVGN - Gauss-Newton inversion, version 1.0.  
by Andrew Ganse, Applied Physics Laboratory, University of Washington, Seattle.  
Contact at andy@ganse.org  
Copyright (c) 2015, University of Washington, via 3-clause BSD license.  
See LICENSE.txt for full license statement.  

-------------------------------------------------------------------------------

INVGN: Calculate Tikhonov-regularized, Gauss-Newton nonlinear iterated inversion
to solve the damped nonlinear least squares problem:  
          minimize ||g(m)-d||^2_2 + lambda^2||Lm||^2_2  
For appropriate choices of regularization parameter lambda, this problem is
equivalent to:  
          minimize ||Lm||_2 subject to ||g(m)-d||_2          where delta is some statistically-determined noise threshold  
and also to:  
          minimize ||g(m)-d||_2 subject to ||Lm||_2          where epsilon is some model-norm threshold  
          
The function call is set up to allow use on both nonlinear and linear problems,
both regularized (inverse) and non-regularized (parameter estimation) problems,
and both frequentist and Bayesian problems.  See EXAMPLES below for these calls.
The dampled NLS regularization is accomplished with the L-curve method (see e.g.
Hansen 1987).

Code based on material from:  

* Professor Ken Creager's ESS-523 inverse theory class, Univ of Washington, 2005  
* RC Aster, B Borchers, & CH Thurber, "Parameter Estimation and Inverse
    Problems," Elsevier Academic Press, 2004.  
* A Tarantola, "Inverse Problem Theory and Methods for Model Parameter
    Estimation", Society for Industrial and Applied Mathematics, 2005.  
* PC Hansen, "Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects
    of Linear Inversion," Society for Industrial Mathematics, 1987.  
* PE Gill, W Murray, & MH Wright, "Practical Optimization," Academic Press, 1981.  

INVGN is compatible with Octave (version >3), but to handle a few minor
remaining differences between Matlab and Octave, do note the "usingOctave"
flag among the options listed at top of the InvGN.m script.  Perhaps in future
versions INVGN will allow these options to be set via keyword/value pairs in the
input argument list, but not implemented yet.


USAGE:  
 [m,normdm,normmisfit,normrough,lambda,dpred,G,C,R,Ddiag,i_conv] = ...
 invGN(fwdfunc,derivfunc,epsr,dmeas,covd,lb,ub,m0,L,lambda,maxiters,useMpref,...
       verbose,fid,vargin);


INPUTS: (note all vectors are always column vectors)  
    fwdfunc = Matlab/Octave function handle to forward problem function which
              is of form outvector=myfunction(invector,...).  (See example 1
              of "help function_handle" in Matlab for how to implement)  
  derivfunc = Matlab/Octave function handle to derivatives function which
              is of form outvector=myfunction(invector,...).
              If derivfunc=[] then finitediffs are automatically used instead.  
       epsr = relative accuracy of forward problem computation (not the
              accuracy of the modeling of the physical process, but of the
              computation itself, see e.g. Gill,Murray,Wright 1981).  Affects
              convergence testing and is passed into finite diff function (if
              used), in both cases providing stepsize h (h is about sqrt(epsr)
              if the model params are all of order unity).  Epsr has a minimum
              at the tradeoff between degrading Talor series expansion accuracy
              if h too large, and increasing roundoff error in the numerator
              G(m+h)-G(m) of finite diffs if h too small.
              Epsr can be scalar (ie same for all elements of model vector) or 
              vector of same length as model vector for 1-to-1 correspondence
              to different variable "types" (eg layer bulk properties vs layer
              thicknesses).  
      dmeas = measured data vector (includes noise)  
       covd = if sqr matrix, then this is measurement noise covariance matrix.
              if col vector, then this is diag of meas noise covariance matrix.
              if neg, then = -covd^(-1/2), allowing e.g. use of zeros for a mask,
              so note in that case it's the inverse of the STDEVS (sqrt(cov)).  
         lb = lower model bounds vector (hard bnds, no tolerance)  
         ub = upper model bounds vector (hard bnds, no tolerance)
              These bounds are only implemented at the iteration steps after
              solving for the model perturbation without them; ie functions
              such as BVLS or LSQNONNEG are not used.  If the bounds are seen
              to be passed in a candidate model perturbation, the stepsize of
              the perturbation is reduced (but retaining its direction) until
              within bounds.  As long as we know the final solution must be
              within the bounds and not on them, this is merely like changing
              the starting point of inversion and is valid (perhaps somtimes
              not as efficient as one of those more complicated functions).
              But if the solution from this code ends up on a bound, note you
              should treat that solution with some caution and not consider it
              a true GN solution.  In my own inverse problems using this code
              the bounds only ever engage at the tails of my Lcurve where I'm
              not too worried about being rigorous, so I haven't found it
              necessary to complicate things with those other functions.  
         m0 = initial model vector estimate to start iterating from; if this
              is instead a matrix (NxM), then it's a collection of M initial
              model vectors of length N each, corresponding to M lambda values
              (Typically this is done to continue iterations on previous runs
              without starting over.)
              Alternate use: if L (next input arg) = I (identity matrix), then
              this is also the "preferred model" whose distance from the soln
              model is minimized.  
          L = If this is a matrix, then this is the user-supplied finite
              difference operator for Tikhonov regularization (function
              finiteDiffOP.m can help construct it) in the frequentist
              framework.  If in the Bayesian framework and lambda is set to 1,
              then L can be supplied as the Cholesky decomposition of the
              inverse model prior covariance matrix.
              If L is scalar then it is the number of profile segments (for
              example if the model vector represents concatenated P & S wave
              speed profiles then Lin=2) and 2nd order Tikhonov (smoothness)
              frequentist regularization will be used.  
     lambda = Used for the frequentist inversion framework (if using Bayesian
              framework then set lambda=1 and see Bayesian note for L above).
              If lambda is a negative scalar, auto calculate Lcurve with num
              pts equal to magnitude of the negative scalar.  The auto-calc'd
              curve starts at lambda_mid = sqrt(max(eig(G'*G))/max(eig(L'*L)))
              and decreases logarithmically from there to 3 orders of mag
              less and more than lambda_mid by default.  If lambda >=0, use
              its value(s) directly as frequentist Tikhonov tradeoff param(s),
              in which case lambda can be scalar or vector.
              If lambda is a 3-element vector, lambda(1) is the pos or neg scalar
              just described; and lambda(2:3) are how many orders of magnitude
              greater (lambda(2)) or less (lambda(3)) than lambda_mid to sweep
              in the auto-calc'd curve.  (lambda(2:3) defaults are 3,-3)  
   useMpref = flag for one of three problem frameworks:
              useMpref=0 is for a frequentist formulation with higher-order
                Tikhonov regularization.  Here the regularization doesn't
                handle distance from preferred (initial) model m0, but no
                restriction on L (unlike useMpref=1).
              useMpref=1 is for using 0th-order-Tikh (ridge regr), ie
                L==eye(size(L)), in a frequentist formulation in which m0 is
                the "preferred model" the inversion minimizes distance to.
              useMpref=2 is for using Bayesian formulation, in which case:
                m0 is the prior model mean, lambda must =1, L=chol(inv(Cprior))
                (user must set that L externally, note ironically L there must
                be RIGHT triangular, the "L" notation is from frequentist case),
                output C is now the Bayesian posterior covariance matrix (which
                note is different from the frequentist C), outputs R and Ddiag
                are empty, and normrough won't have literal meaning anymore.  
   maxiters = Number of GN iterations to perform at each lambda on the Lcurve.
              So total number of derivative estimations will be
              numlambdas*maxiters.  
    verbose = 0-3, with 0 = run quietly, and 3 = lots of progress text
              Progress text (percent done) in findiffs is useful for slow
              forward problems and/or debugging - but note percent done text
              is unavailable in jacfwddist and jaccendist.  
        fid = File handle to which progress text is output in verbose option.
              Note fid=1 outputs to stdout, but an output file is useful for
              long batch runs when the forward problem is slow.  
      vargin = the extra arguments (if any) that must be passed to the fwdprob
              function are tacked onto the end here.  You may have none, in
              which case your last arg is the fid.)  

OUTPUTS: (note all vectors are always column vectors)  
          m = matrix of maximum likelihood model solution vectors for each
              lambda.  So m is MxP for M model params and P points on Lcurve,
              ie each column of m is a model vector for a different lambda.  
     normdm = vectors of L2 norms of model perturbation steps "dm" for each
              lambda.  (so normdm is imax x P, with i steps to convergence
              for l'th lambda, imax=max(i(l)), and P points on Lcurve).  Unused
              elements in normdm are set to -1, since norms can't be negative.  
 normmisfit = vector of L2 norms of data misfit between measured and 
              predicted data at solution point for each lambda.  
  normrough = vector of L2 norms of roughness (for which units are 2nd deriv)
              at solution points for each lambda.  "Roughness" refers to case
              of 2nd order Tikhonov (smoothing constraint) but note really here
              this norm's meaning depends on what the supplied L matrix is
              (eg might be other model norms than roughness).  And in Bayesian
              case (useMpref=2) this doesn't have same meaning anymore although
              it's still part of the objective function as it relates to Cprior.  
     lambda = vector of tradeoff parameters used for Tikhonov regularization.
              (you might already have this if you passed in vector of lambdas
              as one of the input arguments, but this is useful in the case of
              specifying to auto-calculate N Lcurve points).  
      dpred = predicted data at solution models (so there are as many vectors
              in dpred as there are points on the Lcurve, so dpred is NxP, for
              N data pts and P points on Lcurve).  
          G = Jacobian matrices of partial derivatives at solution models
              (so there are as many jacobian matrices in G as there are points
              on the Lcurve, so G is NxMxP, for N data pts, M model params,
              and P points on Lcurve).  
          C = if useMpref<2, C contains the frequentist covariance matrices of
              solution models (so there are as many cov matrices in C as there
              are points on the Lcurve, so C is MxMxP, for M model params and
              P points on Lcurve).  If useMpref==2 (flagging Bayesian case),
              then C is instead the Bayesian posterior model covariance matrix.  
          R = resolution matrices of solution models (so there are as many cov
              matrices in C as there are points on the Lcurve, so C is MxMxP,
              for M model params and P points on Lcurve).
              (R is empty for Bayesian case when useMpref=2.)  
      Ddiag = column vector of diagonal of data res matrix (length Ndata)
              (Ddiag is empty for Bayesian case when useMpref=2.)  
      iconv = column vector of number of iterations before convergence at each
              lambda.  if didn't converge, the iconv value for that lambda
              is 0.  (length Ndata).  

For more clarity in the code, this function computes its model perturbations via
(regularized) normal equations, rather than generalized SVD or subiterations of
gradient descent, which may be more efficient and/or scalable.  The assumption
in this choice was that forward function code takes much longer to compute than
the model perturbation solution itself (which can otherwise be computed faster
via GSVD).  If that assumption is not true, then much improvement in compute
time of this function would result from solving for the perturbation using GSVD
methods such as those provided in Per Christian Hansen's toolkit (however note
that toolkit would take a lot of tweaking/paring to make it Octave compatible):
http://www2.imm.dtu.dk/~pch/Regutools  
Aside from computational efficiency/speed, another important consequence of
solving normal equations in this code is that it constrains the size of the
problem (number of model parameters) relative to amount of computer memory due
to calculation of a matrix inverse.  If useful to note I have successfully run
this code for over 10,000 model parameters on a computer with 16GB of RAM, and
for over 2500 model parameters on a computer with 8GB RAM (in both those cases
the matrix inverse was the memory constraint rather than forward problem).


EXAMPLES:  
(leaving off output part "[var1,var2,etc]=" which is same format for all):

Regularized frequentist inversion with auto-chosen lambdas (numlambdas of them):  
 invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,-numlambdas,0,maxiters,3,1);

Regularized frequentist inversion with prechosen lambdas:  
 invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,...
           [1.04193e+07; 10419.3; 10.4193; 0.0104193;],0,maxiters,3,1);

Regu inv, picking up after previous run (using lambda vector from prev run):  
 invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,L,lambda,0,maxiters,3,1);

Parameter estimation (no regularization):  
 invGN(fwdp,[],epsr,datameas,-invcovd,lb,ub,minit,zeros(size(L)),1,0,...
           maxiters,3,1);

Frequentist linear problem (should converge after 1st iter), w/ smoothing regu:  
 deriv=@(x) A;  fwdp=@(x) A*x;  % define funcs using the A matrix inline  
 invGN(fwdp,deriv,epsr,datameas,covd,lb,ub,zeros(M,1),L,-numlambdas,0,2,0,1);  
(Note maxiters=2 there: actually the first "iter" in saved results is for the
initial value, which admittedly isn't relevant for purely linear problems, but
that's how this nonlinear script fits purely linear problems.  Yes definitely
an inefficient kludge.)

Frequentist nonlinear problem with preferred model:  
 invGN(fwdp,[],epsr,datameas,covd,lb,ub,mpref,eye(M),-numlambdas,1,maxiters,0,1);

Bayesian nonlinear problem:  
 invGN(fwdp,[],epsr,datameas,covd,lb,ub,mprior,chol(inv(Cprior)),1,2,...  
           maxiters,0,1);