Function Reference: pca

statistics: coeff = pca (x)
statistics: coeff = pca (x, Name, Value)
statistics: [coeff, score, latent] = pca (…)
statistics: [coeff, score, latent, tsquared] = pca (…)
statistics: [coeff, score, latent, tsquared, explained, mu] = pca (…)

Performs a principal component analysis on a data matrix.

A principal component analysis of a data matrix of N observations in a D-dimensional space returns a D×D transformation matrix that performs a change of basis on the data. The first component of the new basis is the direction that maximizes the variance of the projected data; each subsequent component maximizes the remaining variance while being orthogonal to the preceding ones.
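
The change of basis described above can be illustrated with a small sketch. The data values are hypothetical, and the example assumes that the statistics package is installed and that default options are used (centered data, N > D).

```octave
pkg load statistics

# Hypothetical sample data: N = 6 observations in D = 2 dimensions
x = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7];

[coeff, score, latent] = pca (x);

# coeff is a D-by-D matrix whose columns form an orthonormal basis
orth_err = norm (coeff' * coeff - eye (2));

# score is the centered data expressed in that basis
xc = x - mean (x);
proj_err = norm (score - xc * coeff);

# The variance of the projected data is largest along the first component
score_var = var (score);

printf ("orthonormality error: %g\n", orth_err);
printf ("projection error:     %g\n", proj_err);
printf ("variance per component: %s\n", mat2str (score_var, 4));
```

Both error values are expected to be at the level of floating-point round-off, and the first entry of score_var should be the largest.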

Input argument:

  • x : an N×D data matrix, with one observation per row and one variable per column

The following Name, Value pair arguments can be used:

  • "Algorithm" defines the algorithm to use:
    • "svd" (default), for singular value decomposition
    • "eig" for eigenvalue decomposition
  • "Centered" is a boolean indicator for centering the observation data. It is true by default.
  • "Economy" is a boolean indicator for the economy-size output. It is true by default, in which case pca returns only the elements of latent that are not necessarily zero, together with the corresponding columns of coeff and score; that is, when N <= D, only the first N - 1.
  • "NumComponents" defines the number of components k to return. If k < D, then only the first k columns of coeff and score are returned.
  • "Rows" defines how to handle missing values:
    • "complete" (default), missing values are removed before computation.
    • "pairwise" (only valid when "Algorithm" is "eig"), the covariance of rows with missing data is computed using the available data. The resulting covariance matrix may not be positive definite, in which case pca terminates with an error.
    • "all", missing values are not allowed; pca terminates with an error if there are any.
  • "Weights" defines observation weights as a vector of positive values of length N.
  • "VariableWeights" defines variable weights:
    • a vector of positive values of length D.
    • the string "variance" to use the sample variance as weights.
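
A few of the Name, Value pairs listed above can be tried as in the following sketch. The data are hypothetical and the variable names (coeff2, score2, latent_svd, latent_eig, mu0) are chosen only for illustration; the comments state what the descriptions above lead one to expect rather than guaranteed output.

```octave
pkg load statistics

# Hypothetical sample data: N = 6 observations, D = 3 variables
x = [1 2 9; 2 1 7; 3 5 8; 4 3 4; 5 7 3; 6 6 1];

# Keep only the first k = 2 components
[coeff2, score2] = pca (x, "NumComponents", 2);
disp (size (coeff2))   # D-by-k
disp (size (score2))   # N-by-k

# Both algorithms should agree on the component variances
[~, ~, latent_svd] = pca (x, "Algorithm", "svd");
[~, ~, latent_eig] = pca (x, "Algorithm", "eig");
disp (norm (latent_svd(:) - latent_eig(:)))   # expected to be close to zero

# Without centering, the returned mean is documented to be zero
[~, ~, ~, ~, ~, mu0] = pca (x, "Centered", false);
disp (mu0)
```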

Return values:

  • coeff : the principal component coefficients, a D×D transformation matrix
  • score : the principal component scores, the representation of x in the principal component space
  • latent : the principal component variances, i.e., the eigenvalues of the covariance matrix of x
  • tsquared : Hotelling’s T-squared statistic for each observation in x
  • explained : the percentage of the variance explained by each principal component
  • mu : the estimated mean of each variable of x; it is zero if the data are not centered
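
The relationships between the return values can be checked numerically. The following is a minimal sketch on hypothetical data, assuming default options and N > D so that all D components are returned; the helper names (explained_err, rec_err, cov_err) are illustrative only.

```octave
pkg load statistics

# Hypothetical sample data: N = 6 observations, D = 3 variables
x = [1 2 9; 2 1 7; 3 5 8; 4 3 4; 5 7 3; 6 6 1];

[coeff, score, latent, ~, explained, mu] = pca (x);

# explained is latent rescaled to percentages of the total variance
explained_err = norm (explained(:) - 100 * latent(:) / sum (latent));

# With all components kept, x is recovered from score, coeff and mu
x_rec = score * coeff' + mu;
rec_err = norm (x - x_rec);

# latent matches the eigenvalues of the sample covariance matrix of x
cov_err = norm (sort (eig (cov (x)), "descend") - latent(:));

printf ("explained error:      %g\n", explained_err);
printf ("reconstruction error: %g\n", rec_err);
printf ("eigenvalue error:     %g\n", cov_err);
```

All three values are expected to be at the level of floating-point round-off.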

Matlab compatibility note: the alternating least squares method "als" and the associated options "Coeff0", "Score0", and "Options" are not yet implemented.

References

  1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition, Springer, 2002

See also: barttest, factoran, pcacov, pcares
