Statistics: silhouette

Function Reference: `silhouette`

statistics: silhouette (X, clust)
statistics: [si, h] = silhouette (X, clust)
statistics: [si, h] = silhouette (…, Metric, MetricArg)

Compute the silhouette values of clustered data and show them on a plot.

X is an n-by-p matrix of n data points in a p-dimensional space. Each data point is assigned to a cluster by clust, a vector of n elements holding one cluster assignment per data point.

Each value in si, a vector of n elements, measures how well a data point fits the cluster it was assigned to, compared with the nearest other cluster. Let a_i be the mean distance between the i-th point and the other points of its own cluster. Let b_i be the mean distance between the i-th point and the points of another cluster, minimized over all other clusters. The silhouette value of the i-th point is then:

$$ S_i = \frac{b_i - a_i}{\max(a_i, b_i)} $$
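The definitions below follow Rousseeuw's paper in the reference listed below. Here C(i) is the cluster containing point i, |C| is the number of points in cluster C, and d(i,j) is the distance selected by Metric:

$$ a_i = \frac{1}{|C(i)| - 1} \sum_{j \in C(i),\, j \neq i} d(i,j), \qquad b_i = \min_{C \neq C(i)} \frac{1}{|C|} \sum_{j \in C} d(i,j) $$

For instance, a point with a_i = 1 and b_i = 4 has S_i = (4 - 1)/4 = 0.75. In Rousseeuw's paper, a point that is alone in its cluster is given S_i = 0.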

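Each element of si ranges from -1 to 1:

- Values near 1 mean the point is well matched to its own cluster.
- Values near 0 mean it lies between two clusters.
- Negative values mean it is probably assigned to the wrong cluster.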
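The sketch below computes S_i directly from the formula in plain Octave. It is a minimal illustration, not the packaged implementation. The function name silhouette_sketch is hypothetical. It uses plain Euclidean distance rather than the squared-Euclidean default, and it assumes Rousseeuw's convention of S_i = 0 for a point alone in its cluster.

```octave
1;  # marks this file as a script so it can define functions

## Hypothetical helper: silhouette values from the formula above.
## Assumes Euclidean distance and S_i = 0 for singleton clusters.
function si = silhouette_sketch (X, clust)
  n = rows (X);
  ## n-by-n Euclidean distances between the rows of X (via broadcasting)
  D = sqrt (sum ((permute (X, [1 3 2]) - permute (X, [3 1 2])) .^ 2, 3));
  ids = unique (clust(:));
  si = zeros (n, 1);
  for i = 1:n
    own = (clust(:) == clust(i));
    own(i) = false;                      # exclude the point itself
    if (! any (own))
      si(i) = 0;                         # singleton cluster (assumed convention)
      continue;
    endif
    a = mean (D(i, own));                # a_i: mean distance within own cluster
    b = Inf;
    for k = ids'
      if (k != clust(i))
        ## b_i: smallest mean distance to the points of another cluster
        b = min (b, mean (D(i, clust(:) == k)));
      endif
    endfor
    si(i) = (b - a) / max (a, b);
  endfor
endfunction

## Usage: two tight, well-separated pairs of points
X = [0 0; 0 1; 5 0; 5 1];
clust = [1; 1; 2; 2];
si = silhouette_sketch (X, clust);
```

In this example every point has a_i = 1 and b_i = (5 + sqrt(26))/2. Each S_i is therefore 1 - 2/(5 + sqrt(26)), about 0.80.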

Optional input Metric is the metric used to compute the distances between data points. Because silhouette uses pdist to compute these distances, Metric works like the Distance input argument of pdist and can be:

A known distance metric defined as a string: euclidean, squaredeuclidean (default), seuclidean, mahalanobis, cityblock, minkowski, chebychev, cosine, correlation, hamming, jaccard, or spearman.
A vector of pairwise distances, as returned by pdist. In this case X is ignored.
A function handle, which is passed to pdist together with MetricArg as additional inputs.
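The last two options rely on pdist conventions. The sketch below illustrates them in plain Octave. It assumes pdist orders a distance vector as pairs (2,1), (3,1), ..., (n,1), (3,2), ..., (n,n-1), and that a distance handle receives one row u (1-by-p) plus a matrix V (m-by-p) and returns m distances. All three function names are hypothetical.

```octave
1;  # marks this file as a script so it can define functions

## Hypothetical city-block distance in the assumed handle shape:
## u is 1-by-p, V is m-by-p, d is m-by-1.
function d = cityblock_fn (u, V)
  d = sum (abs (V - u), 2);
endfunction

## Build a distance vector in the assumed pdist order:
## (2,1), (3,1), ..., (n,1), (3,2), ..., (n,n-1).
function d = condensed_distances (X, distfun)
  n = rows (X);
  d = zeros (1, n * (n - 1) / 2);
  k = 0;
  for j = 1:n-1
    m = n - j;
    d(k+1:k+m) = distfun (X(j, :), X(j+1:n, :));  # distances to later rows
    k += m;
  endfor
endfunction

## Expand such a vector into the full symmetric n-by-n matrix.
function D = condensed_to_square (d)
  n = round ((1 + sqrt (1 + 8 * numel (d))) / 2);
  D = zeros (n);
  D(tril (true (n), -1)) = d;   # column-major fill of the strict lower triangle
  D = D + D';
endfunction

## Usage: three points in the plane
X = [0 0; 1 0; 1 2];
d = condensed_distances (X, @cityblock_fn);
D = condensed_to_square (d);
```

Here d is [1 3 2], the distances for pairs (2,1), (3,1) and (3,2). Per the option list above, a vector like d can be passed directly as Metric, for example `silhouette (X, clust, d)`, and X is then ignored. Passing `@cityblock_fn` as Metric instead assumes the handle convention described in the comments.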

Optional return value h is a handle to the silhouette plot.

Reference: Peter J. Rousseeuw, "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis", Journal of Computational and Applied Mathematics, vol. 20, 1987, pp. 53-65. doi:10.1016/0377-0427(87)90125-7

See also: dendrogram, evalclusters, kmeans, linkage, pdist

Source Code: silhouette

Example: 1


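 pkg load statistics                         # provides kmeans and silhouette
 load fisheriris;                            # loads meas and species
 X = meas(:,3:4);                            # petal length and petal width
 cidcs = kmeans (X, 3, "Replicates", 5);     # cluster index for each sample
 silhouette (X, cidcs);                      # draw the silhouette plot
 ## Name each cluster after the species of rows 1, 51 and 101 (the first
 ## sample of each species); assumes they fall in three different clusters.
 y_labels(cidcs([1 51 101])) = unique (species);
 set (gca, "yticklabel", y_labels);
 title ("Fisher's iris data");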
