Function Reference: cophenet

statistics: [c, d] = cophenet (Z, y)

Compute the cophenetic correlation coefficient.

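The cophenetic correlation coefficient c of a hierarchical cluster tree Z is the linear correlation coefficient between the cophenetic distances d and the Euclidean distances y. $$ c = \frac {\sum_{i < j}(Y_{ij}-{\bar {y}})(Z_{ij}-{\bar{z}})} {\sqrt{\sum_{i < j}(Y_{ij}-{\bar {y}})^2 \sum_{i < j}(Z_{ij}-{\bar{z}})^2}} $$ Here Y_{ij} is the Euclidean distance between data points i and j (an entry of y), Z_{ij} is their cophenetic distance (an entry of d), and \bar{y} and \bar{z} are the means of y and d, respectively.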

It measures how closely the distances between the leaves, as seen in the tree, match the distances between the original data points that were used to build the tree. The closer the coefficient is to 1, the more accurately the tree represents the distances between the original data points.
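As a minimal sketch, the coefficient can be checked by computing the linear correlation between y and d by hand and comparing it with the value returned by cophenet. The data set X and the variable names yc, dc and c_manual are assumptions made for illustration only; they are not part of the package.

```octave
pkg load statistics

# Illustrative data (an assumption, not from this page): five points in the plane.
X = [0 0; 1 0; 0 2; 5 5; 6 5];
y = pdist (X);                 # pairwise Euclidean distances
Z = linkage (y, "average");    # hierarchical cluster tree
[c, d] = cophenet (Z, y);      # coefficient and cophenetic distances

# Linear (Pearson) correlation between y and d, computed term by term.
yc = y(:) - mean (y(:));
dc = d(:) - mean (d(:));
c_manual = sum (yc .* dc) / sqrt (sum (yc .^ 2) * sum (dc .^ 2));

printf ("cophenet: %.6f, by hand: %.6f\n", c, c_manual);
```

The two printed values should agree up to rounding error.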

Z is a hierarchical cluster tree, as returned by linkage. y is a vector of Euclidean distances, as returned by pdist.

The optional output d is a vector of cophenetic distances, in the same lower triangular format as y. The cophenetic distance between two data points is the height of the lowest node of the tree that contains both points, that is, the height at which they are first merged into the same cluster.
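The following sketch illustrates the cophenetic distances on a tree small enough to follow by hand. The three points in X and the choice of "single" linkage are assumptions made for the illustration; squareform is used only to display d as a symmetric matrix.

```octave
pkg load statistics

# Illustrative data (an assumption): three points on a line.
X = [0; 1; 5];
y = pdist (X);               # distances for the pairs (1,2), (1,3), (2,3)
Z = linkage (y, "single");   # points 1 and 2 merge first, then point 3 joins
[c, d] = cophenet (Z, y);

D = squareform (d(:)')       # symmetric matrix of cophenetic distances
```

With single linkage, points 1 and 2 merge at height 1, and point 3 joins them at height 4, the smaller of its distances to points 1 and 2. D(1,2) should therefore be 1, while D(1,3) and D(2,3) should both be 4, even though the original distance between points 1 and 3 is 5.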

See also: cluster, dendrogram, inconsistent, linkage, pdist, squareform

Source Code: cophenet

Example: 1

 

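 randn ("seed", 5)            # for reproducibility
 X = randn (10,2);            # 10 random points in 2 dimensions
 y = pdist (X);               # pairwise Euclidean distances
 Z = linkage (y, "average");  # average-linkage cluster tree
 cophenet (Z, y)              # cophenetic correlation coefficient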

ans = 0.8025