
10.1 Probability Distributions

This section covers two basic probability density functions (uniform and normal), random samples drawn from these distributions, Gaussian mixture models (GMMs), and fitting a GMM to data.

10.1.1 Common PDFs

Uniform probability density functions are evaluated with the function unifpdf. Given a vector of points x and a left and right endpoint, unifpdf returns the uniform density at each point: 1/(b-a) for points inside [a, b], and 0 elsewhere.

unifpdf(x, a, b) : x = vector of points at which the density is evaluated (its spacing sets the plot resolution), a = left endpoint, b = right endpoint

x = -10:10;
pdfUniform = unifpdf(x, -5, 5);
plot(x, pdfUniform);
Figure 10.1
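unifpdf belongs to the Statistics and Machine Learning Toolbox. As a cross-check that needs no toolbox, the same density can be written directly from its definition. This is a minimal sketch; the variable name pdfManual is introduced here for illustration only.

```matlab
% Uniform density on [a, b] evaluated at the points in x, without unifpdf.
x = -10:10;
a = -5;
b = 5;
% The logical mask is 1 inside [a, b] and 0 outside; dividing by the
% interval width gives the constant density 1/(b - a) inside the interval.
pdfManual = double(x >= a & x <= b) / (b - a);
plot(x, pdfManual);
```

Because plot connects neighboring points with straight lines, the edges of the rectangle look sloped at integer spacing; a finer vector such as -10:0.01:10 makes them nearly vertical.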
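Normal probability density functions are evaluated with the function normpdf. A normal distribution is characterized by its mean and standard deviation.

normpdf(x, mean, std) : x = vector of points at which the density is evaluated (its spacing sets the plot resolution), mean = mean of the distribution, std = standard deviation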

x = -15:0.1:25;
mu = 3;
sigma = 4;
pdfNormal = normpdf(x, mu, sigma);
plot(x, pdfNormal);
Figure 10.2
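The normal density also has a simple closed form, so it can be evaluated without normpdf. The sketch below uses the same x, mu, and sigma as the example above; pdfManual and areaUnderCurve are illustrative names. Integrating the curve numerically is a quick sanity check that it really is a density.

```matlab
% Normal density written out from its closed form, without normpdf.
x = -15:0.1:25;
mu = 3;
sigma = 4;
pdfManual = exp(-(x - mu).^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi));
% A density integrates to 1; trapz approximates the integral over x.
% The result is close to 1 because x spans several standard deviations
% on either side of mu.
areaUnderCurve = trapz(x, pdfManual)
```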

10.1.2 Randomly generated PDFs

unifpdf and normpdf generate "perfect" densities; however, observed data typically fit these distributions only approximately. To simulate such situations, Matlab offers random number generators for both the uniform and the normal distribution.

Function rand generates uniformly distributed random values between 0 and 1.

rand(rows, columns) : matrix with rows and columns of random values.

pdfUniform = rand(1, 10000);
subplot(2,1,1), hist(pdfUniform), title('Uniform distribution, histogram');
subplot(2,1,2), hist(pdfUniform, 100), title('Uniform distribution, 100 bin histogram');
Figure 10.3
To fit a different range of x-values, we can shift and scale the random value matrix accordingly:

pdfUniformSS = pdfUniform * 10 + 100;
hist(pdfUniformSS, 100), title('Uniform distribution Shifted and Scaled, 100 bin histogram');
Figure 10.4
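The shift and scale used above (multiply by 10, add 100) places the values between 100 and 110. More generally, a sample between two endpoints is obtained by scaling with the interval width and shifting by the left endpoint. The names lo, hi, u, and samples in this sketch are assumptions chosen for illustration.

```matlab
% Map rand's samples from (0, 1) onto the interval between lo and hi.
lo = 100;                      % left endpoint (illustrative)
hi = 110;                      % right endpoint (illustrative)
u = rand(1, 10000);            % uniform samples between 0 and 1
samples = lo + (hi - lo) * u;  % scale by the width, then shift by lo
[min(samples), max(samples)]   % both values lie between lo and hi
```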
Function randn generates normally distributed random values with mean=0 and standard deviation=1.

randn(rows, columns) : matrix with rows and columns of random values.

pdfNormal = randn(1, 10000);
subplot(2,1,1), hist(pdfNormal), title('Normal distribution, histogram');
subplot(2,1,2), hist(pdfNormal, 100), title('Normal distribution, 100 bin histogram');
Figure 10.5
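A quick way to see that randn produces values with a standard deviation of 1 is to count how many samples fall within one and two standard deviations of zero. For a standard normal distribution these fractions are roughly 68% and 95%. The variable names in this sketch are illustrative.

```matlab
% Fraction of standard-normal samples within 1 and 2 standard deviations of 0.
z = randn(1, 10000);
fracWithin1 = mean(abs(z) < 1)   % roughly 0.68 for a standard normal
fracWithin2 = mean(abs(z) < 2)   % roughly 0.95
```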
A random normal distribution with mean mu and standard deviation std is obtained by shifting and scaling:

pdfMean = 5;
pdfStd = 7;
pdfNormalSS = randn(1, 10000) * pdfStd + pdfMean;
subplot(2,1,1), hist(pdfNormalSS), title('Normal distribution with Mean=5, Std=7');
subplot(2,1,2), hist(pdfNormalSS, 100), title('Normal distribution with Mean=5, Std=7, 100 bin histogram');
Figure 10.6
Evaluating the mean and standard deviation of the random sample confirms that they are close to the requested values:

meanSS = mean(pdfNormalSS)
stdSS = std(pdfNormalSS)
Figure 10.7
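How close is "approximate"? For N independent normal samples, the sample mean typically deviates from the true mean by about std/sqrt(N), and the sample standard deviation by about std/sqrt(2*N). The sketch below repeats the check above and prints these rough scales next to the observed differences; all variable names other than pdfMean and pdfStd are introduced here for illustration.

```matlab
% Compare observed estimation errors with their typical size.
N = 10000;
pdfMean = 5;
pdfStd = 7;
samples = randn(1, N) * pdfStd + pdfMean;
meanDiff = abs(mean(samples) - pdfMean)   % observed error of the mean
stdDiff  = abs(std(samples) - pdfStd)     % observed error of the std
% Rough scale of the expected errors for normal data:
typicalMeanDiff = pdfStd / sqrt(N)        % 0.07
typicalStdDiff  = pdfStd / sqrt(2 * N)    % about 0.05
```

The observed differences vary from run to run, but they should usually be of the same order as the typical values.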

10.1.3 Fitting normal distributions

Given observed data that are approximately normally distributed, we can determine the best-fitting Gaussian by computing the sample mean and standard deviation and then evaluating normpdf with these two parameters:

xRange = floor(min(pdfNormalSS)):0.1:ceil(max(pdfNormalSS));
pdfNormalFitted = normpdf(xRange, meanSS, stdSS);
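We can verify by plotting a histogram of the original data and the fitted Gaussian. For proper comparison, both graphs must share the same x-range, so we use the function axis to read the axis limits of both plots and apply the x-limits of the histogram to the fitted curve. The y-limits of the fitted curve are kept, because the histogram shows counts whereas the curve is a density.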

h1 = subplot(2,1,1); hist(pdfNormalSS, 100);
h2 = subplot(2,1,2); plot(xRange, pdfNormalFitted);
a1 = axis(h1);
a2 = axis(h2);
axis(h2, [a1(1), a1(2), a2(3), a2(4)]);
Figure 10.8
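An alternative to two stacked plots is to draw the fitted curve on top of the histogram. Since the histogram shows counts and the curve is a density, the density has to be converted first: the expected count in a bin is roughly N * binWidth * density at the bin center. The following is a sketch under that approximation; it writes the Gaussian out by hand so that it runs without normpdf, and names such as expectedCounts are illustrative.

```matlab
% Overlay a fitted Gaussian on a histogram by converting density to counts.
data = randn(1, 10000) * 7 + 5;          % sample with Mean=5, Std=7
[counts, centers] = hist(data, 100);     % bin counts and bin centers
binWidth = centers(2) - centers(1);
mu = mean(data);
sigma = std(data);
% Expected count in a bin is roughly N * binWidth * density at its center.
density = exp(-(centers - mu).^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi));
expectedCounts = numel(data) * binWidth * density;
bar(centers, counts); hold on;
plot(centers, expectedCounts, 'r'); hold off;
```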
For a mixture of different normal distributions (a Gaussian Mixture Model), the correct mean and standard deviation of each Gaussian cannot be computed by simply taking mean and std of the entire data set. The observed data must be divided among several Gaussians, each with its own mean and standard deviation. One approach for obtaining the parameters of each Gaussian is to apply the Expectation Maximization algorithm:

em_1dim.m : Expectation Maximization - Approximation of Gaussian Mixture Models (GMM)
This requires an initial guess as to how many Gaussians are hidden in the distribution.
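To give an idea of what Expectation Maximization does, here is a minimal one-dimensional sketch. It is not the contents of em_1dim.m, whose internals are not shown here; the initialization, the fixed iteration count, and all variable names (muK, sigmaK, wK, resp) are assumptions made for illustration. Each iteration alternates between assigning every sample a soft membership in each Gaussian (E-step) and re-estimating each Gaussian from its weighted samples (M-step).

```matlab
% Minimal 1-D EM sketch for a K-component Gaussian mixture (illustrative).
data = [randn(1, 10000) * 7 + 3, randn(1, 10000) * 2 + 9];
K = 2;                                   % initial guess: number of Gaussians
N = numel(data);

% Initial guesses: means at spread-out quantiles, common std, equal weights.
sorted = sort(data);
muK = sorted(round(((1:K) - 0.5) / K * N));
sigmaK = std(data) * ones(1, K);
wK = ones(1, K) / K;

for iter = 1:200                         % fixed count; no convergence test
    % E-step: responsibility of each component for each sample (K-by-N).
    resp = zeros(K, N);
    for k = 1:K
        resp(k, :) = wK(k) * exp(-(data - muK(k)).^2 / (2 * sigmaK(k)^2)) ...
                     / (sigmaK(k) * sqrt(2 * pi));
    end
    resp = resp ./ repmat(sum(resp, 1), K, 1);

    % M-step: re-estimate weights, means, and standard deviations.
    Nk = sum(resp, 2)';                  % effective sample count per component
    wK = Nk / N;
    for k = 1:K
        muK(k) = sum(resp(k, :) .* data) / Nk(k);
        sigmaK(k) = sqrt(sum(resp(k, :) .* (data - muK(k)).^2) / Nk(k));
    end
end
muK
sigmaK
wK
```

The estimates depend on the initial guesses and on how strongly the Gaussians overlap, so a practical implementation would normally add a convergence criterion and guard against components collapsing onto very few samples.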

A GMM with 2 Gaussian distributions:

pdfGMM = [(randn(1, 10000) * 7 + 3), (randn(1, 10000) * 2 + 9)];
subplot(2,1,1), hist(pdfGMM, 100);
Figure 10.9
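The density underlying this sample is the weighted sum of the two component densities. Both components contribute 10000 samples, so each has weight 0.5. The sketch below evaluates that mixture density with a hand-written Gaussian; the function handle gauss, the range x, and the other names are illustrative.

```matlab
% Density of the two-component mixture that the sample was drawn from.
gauss = @(x, mu, sigma) exp(-(x - mu).^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi));
x = -30:0.1:40;
% Equal sample counts per component give equal weights of 0.5.
pdfMix = 0.5 * gauss(x, 3, 7) + 0.5 * gauss(x, 9, 2);
plot(x, pdfMix);
areaUnderCurve = trapz(x, pdfMix)   % close to 1
```

The narrow component (Std=2) produces the peak near 9, while the wide component (Std=7) forms the broad base, which matches the shape of the histogram.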
Estimate the means and standard deviations of the 2 Gaussians:

[em_thr,em_thr_behavior,P,meanV,stdV,pdf_x,xx,pdf_xx,cdf_xx] = em_1dim(pdfGMM, 2);
meanV
stdV
Figure 10.10
To verify, we plot a histogram of the original data and the 2 Gaussian distributions:

h1 = subplot(2,1,1); hist(pdfGMM, 100);
h2 = subplot(2,1,2); plot(xx, normpdf(xx, meanV(1), stdV(1)), 'r');
hold on;
plot(xx, normpdf(xx, meanV(2), stdV(2)), 'g');
hold off;
a1 = axis(h1);
a2 = axis(h2);
axis(h2, [a1(1), a1(2), a2(3), a2(4)]);
Figure 10.11