% Lab #1.5 for Geophysical Inverse Theory ESS523, Fall 2005, Univ of WA
% by Andy Ganse, course TA
% course professor: Ken Creager
% 6 Oct 2005
% ---------------------------------------------------------------------
% IF YOU HAVEN'T WORKED THRU REVIEWTUT.M ALREADY YOU SHOULD DO THAT FIRST.
% THEN JUST START READING AND FOLLOW THE INSTRUCTIONS AS YOU GO...
% ---------------------------------------------------------------------
% Hey gang,
% Here's that statistics lowdown I promised - hopefully it'll lend some
% intuition about what covariance is, and also about the difference between
% sample and population statistics, which was part of the point in lecture
% this morning too.
% (Eg "why aren't mtilde and mtrue equal?")
% It's about as long as reviewtut.m, so it may take an hour or so to work thru,
% but hopefully you'll find it useful if you're not quite clear on covariance
% and other statistics concepts. As before, the approach with this file is to
% uncomment lines as you go, and run and change this file just as you did with
% reviewtut.m last week...
% Cheers! -Andy
% Note you'll want to read and uncomment as you go like in reviewtut.m -
% the first part below is uncommented already for you...
% Say I had a "true" timeseries signal (data) that I'm adding noise to:
t=(1:.1:10); % 91 time points
% and let's create some "true" data values for each of the 91 time points:
datapopmean=1+10*t-5*t.^2; % This signal is a "population mean" not "sample mean"
% of the data. In real problems we don't know the pop mean,
% or the true signal, since what we measure has noise on it.
% But to learn about this stuff we've made up our own true
% signal here, so we know the pop mean here.
% Let's add some gaussian noise that has the same characteristics for all data pts:
datanoise1=randn(1,91)*10; % ie mean=0 and stdev=10, same for all 91 data pts
% note this noise is uncorrelated - those are independent fluctuations on each data point.
datameasured=datapopmean+datanoise1;
% Plot both the true and noisy signals:
subplot(2,1,1);
plot(t,datameasured); title('datameasured, ie datapopmean + datanoise1');
subplot(2,1,2);
plot(t,datanoise1); title('just datanoise1 alone')
% We see in the plots that we have a 91 point timeseries with fluctuations on each point.
% You really must see that plot to follow the matrix/random-variables discussion below:
% To make things easy above, we obtained all 91 noise points at once from the randn()
% function, but note that's really only one sample of noise for each of the 91 random
% variables. That is, the value of the data at each of the 91 times t is a
% random variable (abbreviated "RV").
% Try recreating datanoise1 and replotting several times, to see that the instances
% of the noise - the "samples" - are different each time although they're in the
% same range since the noise statistics are the same.
% So in the above, each datanoise1(i) is an RV (datanoise1 as a whole has 91 elements).
% Similarly, datameasured(i) is an RV too because an RV plus a deterministic value
% equals an RV (datapopmean(i) was deterministic, not random).
% Okay, now let's just talk about the datanoise1 RV's to learn about covariance.
% If instead of one sample of noise for each of the 91 random variables, we had a
% bunch of samples for each of the 91 data values, I could estimate the covariances
% via the Matlab function cov(). I say "estimate" because we can only do so well
% with a finite number of samples of datanoise1(i) to try and find the statistics of
% the whole infinite population of datanoise1(i).
% So let's make 1000 samples of the 91 RVs, and keep them all in a matrix for the
% sake of tidiness. So this matrix is 1000 down by 91 across - each column contains
% a vector of 1000 samples of the one RV for a given time point.
% datanoise1lots=randn(1000,91)*10; % still have mean=0 and stdev=10 for all 91 RVs
% Plot the covariance estimate for the 91 datanoise RVs based on those 1000 samples:
% close; % gets rid of previous figure
% imagesc(cov(datanoise1lots)); colorbar;
% In this plot you should see a pretty good approximation of an identity matrix "I"
% times our constant variance of 100 ( = stdev^2 = 10^2 which we chose above).
% It's not exactly 100*I because we didn't have an infinite number of samples.
% So our computed approximation here is the sample covariance, while 100*I is the
% population covariance. Try redoing this with 100 or even 10 samples, and see if
% the covariance matrix is a better or worse approximation of 100*I.
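%
% For example, here's one way to redo it with only 10 samples (just a sketch -
% the 10 is an arbitrary choice, and norm() is one simple way to measure how
% far a sample covariance is from the population covariance 100*eye(91)):
% datanoise1few=randn(10,91)*10;          % only 10 samples of the 91 RVs
% imagesc(cov(datanoise1few)); colorbar;  % a much rougher approximation of 100*I
% norm(cov(datanoise1few)-100*eye(91))    % compare to norm(cov(datanoise1lots)-100*eye(91))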
% Okay, now let's step up in complexity a tiny bit.
% Now instead of the same stats for all 91 RVs, let's say the stdevs (sqrt of cov)
% are smaller near the first time point and larger near the 91st time point:
% datanoise2=randn(1,91).*(1:.1:10); % ie stdev on pt.1 is 1, stdev on pt.91 is 10
% Plot it to see what it looks like:
% plot(datanoise2);
% Recreate datanoise2 and its plot several times to see that the samples are different
% although they're in the same range, eg their spread is wider at the end.
% And just like before let's make 1000 samples of each of those 91 RVs to find cov:
% datanoise2lots=randn(1000,91)*diag(1:.1:10);
% And let's look at these covariances:
% imagesc(cov(datanoise2lots)); colorbar;
% This should again be very close to diagonal, but the variances increase down the diagonal.
% Last step in complexity now.
% In the above two cases, we analyzed samples of 91 *independent* RVs. So the
% covariance estimates were pretty close to diagonal. With independent RVs, knowing
% for example datanoise(14) gives us no clue whether datanoise(15) will be large or
% small, positive or negative. Whereas if there were some dependence between
% the 91 RVs, some "correlation", then knowing datanoise(14)=5.231 might tell us
% that datanoise(15) is likely pretty close to 5.231 rather than, say, -2.508. See?
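%
% A quick numerical check of that independence (a sketch, assuming you've run the
% datanoise1lots line above): the sample correlation between neighboring RVs
% should come out near zero for this unsmoothed noise:
% corrcoef(datanoise1lots(:,14),datanoise1lots(:,15)) % off-diagonal entries near 0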
%
% Turns out that if we smooth our noise, we introduce such a correlation, which makes
% sense intuitively too - if it's smoothed and we know datanoise(14)=5.231, then in a
% smooth datanoise vector we'd expect datanoise(15) to be pretty close to 5.231 as well.
%
% To do the smoothing I'll "convolve with a gaussian kernel" like in reviewtut.m.
% Look at the plot to see that it smooths the noise:
% x=-10:10;
% s=4.5;
% g=exp(-x.^2/2/s^2)/sqrt(2*pi)/s; % the gaussian kernel
% datanoise1=randn(1,91)*10;
% datanoise1filt=conv(datanoise1,g);
% subplot(2,1,1);
% plot(datanoise1); title('datanoise1');
% subplot(2,1,2);
% plot(datanoise1filt); title('datanoise1filt (ie smoothed version of above noise)');
% Try recreating that datanoise1 and datanoise1filt and their plot several times,
% to see how that works.
% Now let's do the 1000 samples again and estimate a covariance matrix to see
% how those correlations appear in there:
% datanoise1lots=randn(1000,91)*10;
% disp('sorry, this will take a minute or two to compute the 1000 convolutions...');
% for i=1:1000
% datanoise1filt(i,:)=conv(datanoise1lots(i,:),g);
% end
% close; % close divided-up plot window
% imagesc(cov(datanoise1filt)); colorbar;
% See how the covariance matrix is no longer diagonal? We see from it that
% the values of each of the 91 RVs vary in a way that is particularly
% correlated with the 10 or so RVs on either side of it. (Compare that to
% the width of the Gaussian filter kernel, by the way.)
% One catch here with my convolution approach - see how things broke down
% toward RVs 1 and 91? Ie those results are kind of screwy - this is because
% my convolution approach has edge-effect troubles, so pretend you didn't
% see that. ;)
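% (If you'd rather not pretend, here's a sketch of one simple fix: conv() of a
% 91-point signal with our 21-point kernel returns 111 points, and only columns
% 21 thru 91 come from full overlap of signal and kernel, so keep just those:)
% datanoise1filtmid=datanoise1filt(:,21:91); % the 71 fully-overlapped columns
% imagesc(cov(datanoise1filtmid)); colorbar; % no more screwy edges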
% Those of you who are statistics buffs may point out that by my filtering
% I've changed the number of degrees of freedom and so biased my estimate
% of the covariance matrix - yeah yeah yeah, but that's a small effect since
% we have 1000 samples here.
% If you have a few more minutes, you should try this stuff over again with
% larger stdev values to see what that does to the covariance matrix plots.
% And lastly, see if you can set up the filtered noise covariance example
% on the case above where the stdevs increase from the 1st to 91st RV. Just
% combine what we did farther above re the increasing stdevs with the filtering.
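%
% A sketch of one way to set that up (reusing the gaussian kernel g from above):
% datanoise2lots=randn(1000,91)*diag(1:.1:10); % increasing stdevs, as before
% for i=1:1000
%   datanoise2filt(i,:)=conv(datanoise2lots(i,:),g);
% end
% imagesc(cov(datanoise2filt)); colorbar;
% % You should see both effects at once: off-diagonal correlation from the
% % smoothing, and variances growing toward the 91st RV.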
% Okay okay, one more last thing to try yourself, to learn more about the
% difference between statistics of a sample set from an RV, vs the population
% statistics, which we can only estimate since we have a finite # of samples.
% Choose a mean and stdev with which to create 100 random samples using randn().
% Use Matlab's mean() and std() functions to find the sample mean and sample
% stdev of those 100 samples. Are they the same as your original mean and
% stdev you used to create the samples? If you have more or less than 100
% samples, do the sample mean and stdev get closer or farther from the original?
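%
% For instance (a sketch - the mean of 5 and stdev of 2 are arbitrary choices):
% samples=5+2*randn(1,100);
% mean(samples) % close to, but not exactly, 5
% std(samples)  % close to, but not exactly, 2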
% It turns out those sample means are actually Gaussian RVs themselves -
% if you create a loop to recreate these 100 samples say 500 times and thus
% get 500 means via mean(), and plot those in a histogram as we learned in
% reviewtut.m, you should see a bell curve. What is the mean of that bell
% curve? (Ie the mean of the means!) How does the stdev of that bell curve
% change with the number of samples N? (Turns out it scales as 1/sqrt(N) -
% specifically it's stdev/sqrt(N), where stdev is the one you chose above.)
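%
% That experiment might look like this (a sketch - the mean of 5, stdev of 2,
% 500 repetitions, and 100 samples per repetition are all arbitrary choices):
% for j=1:500
%   manymeans(j)=mean(5+2*randn(1,100));
% end
% hist(manymeans,30); % a bell curve centered near 5
% std(manymeans)      % should be near 2/sqrt(100) = 0.2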
% At any rate, this is why when we rerun our gravity problem and create
% new noise on our datatrue each time, and get an estimate of the 3 model
% parameters each run, those estimates will vary a bit each run.
% If we reran this 100 times, we would expect that roughly 95 of those
% times we would see the 95% confidence ellipse encompass the true
% parameter values (mtrue). The mtrue value is the same every run, but
% the mest value and the ellipse will move around a bit each run.