Class Definition: cvpartition

Class: cvpartition

Partition data for cross-validation

The cvpartition class generates a partitioning scheme on a dataset to facilitate cross-validation of statistical models utilizing training and testing subsets of the dataset.

See also: crossval

Source Code: cvpartition

Properties

A logical scalar specifying whether the cvpartition object was created using custom partitioning (true) or not (false). This property is read-only.

A logical scalar specifying whether the cvpartition object was created using grouping variables (true) or not (false). This property is read-only.

A logical scalar specifying whether the cvpartition object was created with a 'Stratify' value of true. This property is read-only.

A positive integer scalar specifying the number of observations in the dataset (including any missing data, where applicable). This property is read-only.

A positive integer scalar specifying the number of folds for partition types 'kfold' and 'leaveout'. For partition types 'holdout' and 'resubstitution', NumTestSets is 1. This property is read-only.

A positive integer scalar specifying the size of the test set for partition types 'holdout' and 'resubstitution' or a vector of positive integers specifying the size of each testing set for partition types 'kfold' and 'leaveout'. This property is read-only.

A positive integer scalar specifying the size of the training set for partition types 'holdout' and 'resubstitution' or a vector of positive integers specifying the size of each training set for partition types 'kfold' and 'leaveout'. This property is read-only.

A character vector specifying the type of the cvpartition object. It can be kfold, holdout, leaveout, or resubstitution. This property is read-only.

Methods

cvpartition: C = cvpartition (n, 'KFold')
cvpartition: C = cvpartition (n, 'KFold', k)
cvpartition: C = cvpartition (n, 'KFold', k, 'GroupingVariables', grpvars)
cvpartition: C = cvpartition (n, 'Holdout')
cvpartition: C = cvpartition (n, 'Holdout', p)
cvpartition: C = cvpartition (n, 'Leaveout')
cvpartition: C = cvpartition (n, 'Resubstitution')
cvpartition: C = cvpartition (X, 'KFold')
cvpartition: C = cvpartition (X, 'KFold', k)
cvpartition: C = cvpartition (X, 'KFold', k, 'Stratify', opt)
cvpartition: C = cvpartition (X, 'Holdout')
cvpartition: C = cvpartition (X, 'Holdout', p)
cvpartition: C = cvpartition (X, 'Holdout', p, 'Stratify', opt)
cvpartition: C = cvpartition ('CustomPartition', testSets)

Partition data for cross-validation.

C = cvpartition (n, 'KFold') creates a cvpartition object C, which defines a random nonstratified partition for k-fold cross-validation on n observations with each fold (subsample) having approximately the same number of observations. The default number of folds is 10 for n >= 10 or equal to n otherwise.

C = cvpartition (n, 'KFold', k) also creates a nonstratified random partition for k-fold cross-validation with the number of folds defined by k, which must be a positive integer scalar smaller than the number of observations n.
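As a minimal sketch of the two k-fold syntaxes above, the following uses illustrative values for n and k and relies only on the test method and the NumTestSets property described on this page. It is meant to show the shape of the result, not specific fold contents, which are random.

```octave
pkg load statistics

n = 20;                           % illustrative number of observations
C = cvpartition (n, 'KFold', 5);  % 5 folds instead of the default 10

idx = test (C, "all");            % logical matrix, one column per fold
disp (size (idx));                % rows are observations, columns are folds
disp (sum (idx, 1));              % number of test observations in each fold
disp (all (sum (idx, 2) == 1));   % each observation is tested exactly once
```

Because every observation belongs to exactly one test fold, the column sums add up to n.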

C = cvpartition (n, 'KFold', k, 'GroupingVariables', grpvars) creates a cvpartition object C that defines a random partition for k-fold cross-validation with each fold containing the same combination of group labels as defined by grpvars. The grouping variables specified in grpvars can be one of the following:

  • A numeric vector, logical vector, categorical vector, character array, string array, or cell array of character vectors containing one grouping variable.
  • A numeric matrix or cell array containing two or more grouping variables. Each column in the matrix or array must correspond to one grouping variable.
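A grouped partition can be inspected by listing the fold assigned to each observation next to its group label. The sketch below assumes a single numeric grouping variable with illustrative labels; the variable names grp and fold are hypothetical, and the output depends on the random assignment.

```octave
pkg load statistics

% 12 observations in 4 groups of 3 (labels are illustrative).
grp = [1 1 1 2 2 2 3 3 3 4 4 4]';
C = cvpartition (numel (grp), 'KFold', 2, 'GroupingVariables', grp);

% Recover the fold index of each observation from the test matrix.
idx = test (C, "all");                     % one column per fold
fold = double (idx) * (1:C.NumTestSets)';  % fold number of each observation
disp ([grp, fold]);                        % group label next to fold index
```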

C = cvpartition (n, 'Holdout') creates a cvpartition object C, which defines a random nonstratified partition for holdout validation on n observations. 90% of the observations are assigned to the training set and the remaining 10% to the test set.

C = cvpartition (n, 'Holdout', p) also creates a nonstratified random partition for holdout validation with the split between training and test sets defined by p, which can be a scalar value in the range (0,1) or a positive integer scalar in the range [1,n).
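A holdout split can be checked by counting the observations on each side. In this sketch, p = 0.2 is assumed to be the fraction held out for testing, which is consistent with the 10% test default described above but is not stated explicitly here; n is an illustrative value.

```octave
pkg load statistics

n = 50;                               % illustrative number of observations
C = cvpartition (n, 'Holdout', 0.2);  % p assumed to be the test fraction

te = test (C);                        % true for held-out observations
tr = training (C);                    % true for training observations
disp ([sum(tr), sum(te)]);            % sizes of the two subsets
disp (all (xor (tr, te)));            % every observation is in exactly one subset
```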

C = cvpartition (n, 'Leaveout') creates a cvpartition object C, which defines a random partition for leave-one-out cross-validation on n observations. This is a special case of k-fold cross-validation with the number of folds equal to the number of observations.

C = cvpartition (n, 'Resubstitution') creates a cvpartition object C that does not partition the data: both the training and test sets contain all n observations.
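The two special cases above can be contrasted in a short sketch. The value of n and the variable names Cloo and Cres are illustrative.

```octave
pkg load statistics

n = 8;                                    % illustrative number of observations

% Leave-one-out: as many folds as observations, one test observation each.
Cloo = cvpartition (n, 'Leaveout');
disp (Cloo.NumTestSets);
disp (sum (test (Cloo, "all"), 1));

% Resubstitution: every observation is in both the training and test set.
Cres = cvpartition (n, 'Resubstitution');
disp (all (test (Cres)) && all (training (Cres)));
```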

C = cvpartition (X, 'KFold') creates a cvpartition object C, which defines a stratified random partition for k-fold cross-validation according to the class proportions in X. X can be a numeric, logical, categorical, or string vector, or a character array or a cell array of character vectors. Missing values in X are discarded. The default number of folds is 10 for numel (X) >= 10 or equal to numel (X) otherwise.

C = cvpartition (X, 'KFold', k) also creates a stratified random partition for k-fold cross-validation with the number of folds defined by k, which must be a positive integer scalar smaller than the number of observations in X.

C = cvpartition (X, 'KFold', k, 'Stratify', opt) creates a random partition for k-fold cross-validation, which is stratified if opt is true, or nonstratified if opt is false.
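The effect of stratification is easiest to see with imbalanced labels. The following sketch builds an illustrative label vector and counts how many minority-class observations land in each test fold, with and without stratification. The counts for the nonstratified case vary from run to run.

```octave
pkg load statistics

% Imbalanced labels: 30 observations of class 1, 10 of class 2.
X = [ones(30, 1); 2 * ones(10, 1)];

Cs = cvpartition (X, 'KFold', 5);                     % stratified by default
Cn = cvpartition (X, 'KFold', 5, 'Stratify', false);  % plain random folds

% Count class-2 observations in each test fold (one entry per fold).
in_class2 = double (X == 2);
counts_s = double (test (Cs, "all"))' * in_class2;
counts_n = double (test (Cn, "all"))' * in_class2;
disp ([counts_s, counts_n]);
```

With stratification, the per-fold counts are expected to stay close to the overall class proportion; without it, they can drift.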

C = cvpartition (X, 'Holdout') creates a cvpartition object C, which defines a stratified random partition for holdout validation while maintaining the class proportions in X. 90% of the observations are assigned to the training set and the remaining 10% to the test set.

C = cvpartition (X, 'Holdout', p) also creates a stratified random partition for holdout validation with the split between training and test sets defined by p, which can be a scalar value in the range (0,1) or a positive integer scalar in the range [1,n), where n is the number of observations in X.

C = cvpartition (X, 'Holdout', p, 'Stratify', opt) creates a random partition for holdout validation, which is stratified if opt is true, or nonstratified if opt is false.

C = cvpartition ('CustomPartition', testSets) creates a custom partition according to testSets, which can be a positive integer vector, a logical vector, or a logical matrix according to the following options:

  • A positive integer vector of length n with values in the range [1,k], where k < n, will specify a k-fold cross-validation partition, in which each value indicates the test set of each observation. Alternatively, the same vector with values in the range [1,n] will specify a leave-one-out cross-validation.
  • A logical vector will specify a holdout validation, in which the true elements correspond to the test set and the false elements correspond to the training set.
  • A logical matrix with k columns will specify a k-fold cross-validation partition, in which each column corresponds to a fold and each row to an observation. Alternatively, an n×n logical matrix will specify a leave-one-out cross-validation, where n is the number of observations. In both cases, true elements correspond to the test set and false elements correspond to the training set.
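The integer-vector and logical-vector forms of testSets can be sketched as follows. The input values are illustrative, and the variable names (Ck, Ch, held_out) are hypothetical; the checks only confirm that the resulting test sets mirror the input as described in the list above.

```octave
pkg load statistics

% Integer vector: observation j belongs to test fold testSets(j).
testSets = [1 2 3 1 2 3 1 2 3 1]';
Ck = cvpartition ('CustomPartition', testSets);
t2 = test (Ck, 2);
disp (isequal (t2(:), testSets == 2));   % fold 2 holds the entries equal to 2

% Logical vector: true marks the held-out (test) observations.
held_out = logical ([0 0 0 0 0 0 1 1 0 1]');
Ch = cvpartition ('CustomPartition', held_out);
th = test (Ch);
disp (isequal (th(:), held_out));        % test set matches the input mask
```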

See also: cvpartition, summary, test, training

cvpartition: Cnew = repartition (C)
cvpartition: Cnew = repartition (C, sval)
cvpartition: Cnew = repartition (C, 'legacy')

Repartition data for cross-validation.

Cnew = repartition (C) creates a cvpartition object Cnew that defines a new random partition of the same type as the cvpartition C.

Cnew = repartition (C, sval) also uses the value of sval to set the state of the random generator used in repartitioning C. If sval is a vector, then the random generator is set using the "state" keyword as in rand ("state", sval). If sval is a scalar, then the "seed" keyword is used as in rand ("seed", sval) to specify that old generators should be used.

Cnew = repartition (C, 'legacy') applies only to cvpartition objects C that use k-fold partitioning. It repartitions C in the same non-random manner that was previously used by the old-style cvpartition class of the statistics package.
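The sketch below contrasts an unseeded repartition with two seeded ones. It assumes that the generator state set through sval fully determines the new partition, so that repeating the same sval reproduces it; the seed value 42 is arbitrary.

```octave
pkg load statistics

C = cvpartition (20, 'KFold', 4);

% Unseeded: same type and number of folds, new random membership.
Cnew = repartition (C);
disp (Cnew.NumTestSets);
disp (isequal (test (C, "all"), test (Cnew, "all")));  % usually 0

% Seeded: a scalar sval selects the "seed" keyword of rand.
A = repartition (C, 42);
B = repartition (C, 42);
disp (isequal (test (A, "all"), test (B, "all")));     % compare the two results
```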

See also: cvpartition, summary, test, training

cvpartition: tbl = summary (C)

Summarize cross-validation partition.

tbl = summary (C) returns a summary table tbl of the cvpartition object C as long as its type is either k-fold or holdout and it is either stratified or grouped. This function requires support for the table class, which is provided by the datatypes package.

See also: cvpartition, repartition, test, training

cvpartition: idx = test (C)
cvpartition: idx = test (C, i)
cvpartition: idx = test (C, "all")

Test indices for cross-validation.

idx = test (C) returns a logical vector idx with true values indicating the elements corresponding to the test set defined in the cvpartition object C. For k-fold and leave-one-out partitions, the indices corresponding to the first test set are returned.

idx = test (C, i) returns a logical vector or matrix with the indices of the test set indicated by i. If i is a scalar, then idx is a logical vector with the indices of the i-th set. If i is a vector, then idx is a logical matrix in which idx(:,j) specifies the observations in the test set i(j). The value(s) in i must not exceed the number of test sets in the cvpartition object C.

idx = test (C, "all") returns a logical vector or matrix for all test sets defined in the cvpartition object C. For holdout and resubstitution partition types, a vector is returned. For k-fold and leave-one-out, a matrix is returned.
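The three calling forms relate to each other as columns of the same matrix. A minimal sketch with illustrative sizes:

```octave
pkg load statistics

C = cvpartition (12, 'KFold', 4);

first = test (C);          % first test set
pair  = test (C, [1 3]);   % one column per requested test set
every = test (C, "all");   % one column per test set in C

disp (size (pair));
disp (isequal (first(:), every(:,1)));
disp (isequal (pair, every(:,[1 3])));
```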

See also: cvpartition, repartition, summary, training

cvpartition: idx = training (C)
cvpartition: idx = training (C, i)
cvpartition: idx = training (C, "all")

Training indices for cross-validation.

idx = training (C) returns a logical vector idx with true values indicating the elements corresponding to the training set defined in the cvpartition object C. For k-fold and leave-one-out partitions, the indices corresponding to the first training set are returned.

idx = training (C, i) returns a logical vector or matrix with the indices of the training set indicated by i. If i is a scalar, then idx is a logical vector with the indices of the i-th set. If i is a vector, then idx is a logical matrix in which idx(:,j) specifies the observations in the training set i(j). The value(s) in i must not exceed the number of test sets in the cvpartition object C.

idx = training (C, "all") returns a logical vector or matrix for all training sets defined in the cvpartition object C. For holdout and resubstitution partition types, a vector is returned. For k-fold and leave-one-out, a matrix is returned.
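The training and test methods are typically used together in a loop over folds. The sketch below uses a toy response vector and a deliberately trivial "model" (the training mean) purely to show the indexing pattern; the variable names y, yhat, and err are illustrative.

```octave
pkg load statistics

y = (1:20)';                               % toy response values
C = cvpartition (numel (y), 'KFold', 4);

err = zeros (C.NumTestSets, 1);
for i = 1:C.NumTestSets
  tr = training (C, i);                    % logical mask of training rows
  te = test (C, i);                        % logical mask of test rows
  yhat = mean (y(tr));                     % "fit": mean of the training data
  err(i) = mean ((y(te) - yhat) .^ 2);     % mean squared error on the test fold
end
disp (mean (err));                         % cross-validated error estimate
```

For k-fold partitions the two masks are complementary within each fold, so every observation contributes to exactly one test error.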

See also: cvpartition, repartition, summary, test