fitgmdist
Fit a Gaussian mixture model with k components to data. Each row of data is a data sample. Each column is a variable.
Optional parameters are:
"start"
: Initialization conditions. Possible values are:
"randSample"
(default) Takes means uniformly from rows of data.
"plus"
Use k-means++ to initialize means.
"cluster"
Performs an initial clustering with 10% of the data.
A structure with the fields mu, Sigma, and ComponentProportion.
For "randSample", "plus", and "cluster", the initial variance of each component is the variance of the entire data sample.
"Replicates"
: Number of random restarts to perform.
"RegularizationValue"
or "Regularize"
: A small number added to the diagonal entries of the covariance to prevent singular covariances.
"SharedCovariance"
or "SharedCov"
: (logical) True if all components must share the same covariance, which reduces the number of free parameters.
"CovarianceType"
or "CovType"
: (string) Possible values are:
"full"
(default) Allow arbitrary covariance matrices.
"diagonal"
Force covariances to be diagonal, to reduce the number of free parameters.
"Options"
: A structure with all of the following fields:
MaxIter
Maximum number of EM iterations (default 100).
TolFun
EM terminates when the increase in likelihood falls below this threshold (default 1e-6).
Display
Possible values are:
"off"
(default) Display nothing.
"final"
Display the total number of iterations and the likelihood once execution completes.
"iter"
Display the iteration number and the likelihood after each iteration.
"Weight"
: A column vector or matrix. The first column consists of non-negative weights given to the samples. If these are all integers, this is equivalent to specifying weight(i) copies of row i of data, but potentially faster.
If a row of data is used to represent samples that are similar but not identical, then the second column of weight indicates the variance of those original samples. Specifically, in the EM algorithm, the contribution of row i towards the variance is set to at least weight(i,2), to prevent spurious components with zero variance.
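As a rough illustration of the integer-weight equivalence described for "Weight", the following sketch compares the weighted mean of a few rows with the plain mean of the same rows physically repeated. The sample values and the names samples, w, and expanded are made up for this illustration. The sketch does not call fitgmdist; it only checks the equivalence on a simple statistic.

```octave
## Hypothetical data: three distinct rows and an integer weight for each.
samples = [0 0; 1 2; 4 1];
w = [3; 1; 2];

## Build an index that repeats row i of samples w(i) times.
idx = [];
for i = 1:rows (samples)
  idx = [idx; repmat(i, w(i), 1)];
endfor
expanded = samples(idx, :);

## Weighted mean of the distinct rows vs. plain mean of the expanded rows.
weighted_mean = sum (w .* samples) / sum (w);
expanded_mean = mean (expanded);

disp (weighted_mean)
disp (expanded_mean)
```

A weight vector such as w would be passed to fitgmdist as the value of the "Weight" parameter, alongside the distinct rows rather than the expanded ones.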
See also: gmdistribution, kmeans
Source Code: fitgmdist
## Generate a two-cluster problem
C1 = randn (100, 2) + 2;
C2 = randn (100, 2) - 2;
data = [C1; C2];

## Perform clustering
GMModel = fitgmdist (data, 2);

## Plot the result
figure
[heights, bins] = hist3 ([C1; C2]);
[xx, yy] = meshgrid (bins{1}, bins{2});
bbins = [xx(:), yy(:)];
contour (reshape (GMModel.pdf (bbins), size (heights)));
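The optional parameters are passed as name/value pairs after k. The following sketch assumes the statistics package is installed and can be loaded with pkg load statistics. It fits the same kind of two-cluster data using k-means++ initialization, several restarts, shared diagonal covariances, and an "Options" structure. The specific values (5 replicates, 200 iterations, and so on) are arbitrary choices for illustration, not recommendations.

```octave
pkg load statistics   # assumption: the statistics package is installed

## Two well-separated clusters, as in the basic demo.
data = [randn(100, 2) + 2; randn(100, 2) - 2];

## "Options" must contain all three fields: MaxIter, TolFun, Display.
opts = struct ("MaxIter", 200, "TolFun", 1e-8, "Display", "final");

GMModel = fitgmdist (data, 2, ...
                     "Start", "plus", ...              # k-means++ initial means
                     "Replicates", 5, ...              # random restarts
                     "CovarianceType", "diagonal", ... # fewer free parameters
                     "SharedCovariance", true, ...     # one covariance for all components
                     "RegularizationValue", 1e-6, ...  # guards against singular covariances
                     "Options", opts);

## mu is assumed to be a readable property of the returned
## gmdistribution object: one row of means per component.
disp (GMModel.mu)
```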
Angle_Theta = [ 30 + 10 * randn(1, 10), 60 + 10 * randn(1, 10) ]';
nbOrientations = 2;
initial_orientations = [38.0; 18.0];
initial_weights = ones (1, nbOrientations) / nbOrientations;
initial_Sigma = 10 * ones (1, 1, nbOrientations);
start = struct ("mu", initial_orientations, "Sigma", initial_Sigma, ...
                "ComponentProportion", initial_weights);
GMModel_Theta = fitgmdist (Angle_Theta, nbOrientations, "Start", start , ...
                           "RegularizationValue", 0.0001)

Gaussian mixture distribution with 2 components in 1 dimension(s)
Clust 1: weight 0.701113
        Mean: 50.5551
        Variance:135.42
Clust 2: weight 0.298887
        Mean: 19.3242
        Variance:23.764
AIC=175.832 BIC=180.811 NLogL=82.9162 Iter=10 Cged=1 Reg=0.0001
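The demo above passes "RegularizationValue", 0.0001. As a minimal sketch of why adding a small number to the covariance diagonal helps, the following plain Octave code builds a singular covariance from collinear points and checks that it becomes positive definite after regularization. The data and the names pts, S, and S_reg are hypothetical. This mirrors only the idea described for the option, not the internals of fitgmdist.

```octave
## Collinear 2-D points: the second column is exactly twice the first,
## so the sample covariance has rank 1 and is singular.
x = (1:5)';
pts = [x, 2 * x];

## Sample covariance computed by hand from the centered data.
centered = pts - mean (pts);
S = (centered' * centered) / (rows (pts) - 1);

## Add a small value to the diagonal entries (hypothetical value).
reg = 1e-4;
S_reg = S + reg * eye (columns (S));

printf ("rank before: %d, rank after: %d\n", rank (S), rank (S_reg));

## chol returns p == 0 only when the matrix is positive definite.
[~, p] = chol (S_reg);
```

A component that collapses onto a line or a single point produces exactly this kind of rank-deficient covariance, which is the situation the regularization value is meant to prevent.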