Statistics: cvpartition

Class Definition: `cvpartition`

Class: cvpartition

Partition data for cross-validation

The cvpartition class generates a partitioning scheme for a dataset, splitting it into training and test subsets to facilitate the cross-validation of statistical models.

See also: crossval

Source Code: cvpartition

Properties

IsCustom

A logical scalar specifying whether the cvpartition object was created using custom partitioning (true) or not (false). This property is read-only.

IsGrouped

A logical scalar specifying whether the cvpartition object was created using grouping variables (true) or not (false). This property is read-only.

IsStratified

A logical scalar specifying whether the cvpartition object defines a stratified partition (true) or not (false), as determined by the 'Stratify' option. This property is read-only.

NumObservations

A positive integer scalar specifying the number of observations in the dataset (including any missing data, where applicable). This property is read-only.

NumTestSets

A positive integer scalar specifying the number of folds for partition types 'kfold' and 'leaveout'. For partition types 'holdout' and 'resubstitution', NumTestSets is 1. This property is read-only.

TestSize

A positive integer scalar specifying the size of the test set for partition types 'holdout' and 'resubstitution', or a vector of positive integers specifying the size of each test set for partition types 'kfold' and 'leaveout'. This property is read-only.

TrainSize

A positive integer scalar specifying the size of the training set for partition types 'holdout' and 'resubstitution', or a vector of positive integers specifying the size of each training set for partition types 'kfold' and 'leaveout'. This property is read-only.

Type

A character vector specifying the type of the cvpartition object: 'kfold', 'holdout', 'leaveout', or 'resubstitution'. This property is read-only.
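The following sketch shows how these read-only properties relate to each other. It builds a 5-fold partition of 20 observations and inspects the result. It assumes that the statistics package is installed and can be loaded with `pkg load statistics`. Fold sizes depend on the random split, so the sketch only relies on relationships stated above.

```octave
pkg load statistics

% Nonstratified 5-fold partition of 20 observations
C = cvpartition (20, 'KFold', 5);

disp (C.Type)             % partition type
disp (C.NumObservations)  % number of observations
disp (C.NumTestSets)      % one test set per fold
disp (C.TestSize)         % size of each test set
disp (C.TrainSize)        % size of each training set

% Every observation appears in exactly one test set, and each fold's
% training set holds the remaining observations
total_test = sum (C.TestSize);
fold_total = C.TestSize(:) + C.TrainSize(:);
```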

Methods

cvpartition: C = cvpartition (n, 'KFold')
cvpartition: C = cvpartition (n, 'KFold', k)
cvpartition: C = cvpartition (n, 'KFold', k, 'GroupingVariables', grpvars)
cvpartition: C = cvpartition (n, 'Holdout')
cvpartition: C = cvpartition (n, 'Holdout', p)
cvpartition: C = cvpartition (n, 'Leaveout')
cvpartition: C = cvpartition (n, 'Resubstitution')
cvpartition: C = cvpartition (X, 'KFold')
cvpartition: C = cvpartition (X, 'KFold', k)
cvpartition: C = cvpartition (X, 'KFold', k, 'Stratify', opt)
cvpartition: C = cvpartition (X, 'Holdout')
cvpartition: C = cvpartition (X, 'Holdout', p)
cvpartition: C = cvpartition (X, 'Holdout', p, 'Stratify', opt)
cvpartition: C = cvpartition ('CustomPartition', testSets)

Partition data for cross-validation.

C = cvpartition (n, 'KFold') creates a cvpartition object C, which defines a random nonstratified partition for k-fold cross-validation on n observations with each fold (subsample) having approximately the same number of observations. The default number of folds is 10 for n >= 10 or equal to n otherwise.
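The default number of folds can be checked directly. This is a minimal sketch that assumes the statistics package is loaded.

```octave
pkg load statistics

% No k given: 10 folds when n >= 10 ...
C1 = cvpartition (25, 'KFold');
% ... and n folds when n < 10
C2 = cvpartition (6, 'KFold');

printf ("n = 25 -> %d folds\n", C1.NumTestSets);
printf ("n = 6  -> %d folds\n", C2.NumTestSets);
```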

C = cvpartition (n, 'KFold', k) also creates a nonstratified random partition for k-fold cross-validation with the number of folds defined by k, which must be a positive integer scalar smaller than the number of observations n.
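A common pattern is a loop over the folds: fit a model on `training (C, i)`, then score it on `test (C, i)`. The sketch below shows only that indexing pattern. It uses a deliberately trivial "model" that predicts the mean of the training responses, and the response vector `y` is synthetic. It assumes the statistics package is loaded.

```octave
pkg load statistics

y = (1:30)';                      % synthetic response values
C = cvpartition (numel (y), 'KFold', 3);

fold_mse = zeros (C.NumTestSets, 1);
for i = 1:C.NumTestSets
  tr = training (C, i);           % logical index of training observations
  te = test (C, i);               % logical index of test observations
  yhat = mean (y(tr));            % trivial "model": mean of training responses
  fold_mse(i) = mean ((y(te) - yhat) .^ 2);
endfor

cv_mse = mean (fold_mse);         % cross-validated mean squared error
printf ("cross-validated MSE: %g\n", cv_mse);
```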

C = cvpartition (n, 'KFold', k, 'GroupingVariables', grpvars) creates a cvpartition object C that defines a random partition for k-fold cross-validation with each fold containing the same combination of group labels as defined by grpvars. The grouping variables specified in grpvars can be one of the following:

A numeric vector, logical vector, categorical vector, character array, string array, or cell array of character vectors containing one grouping variable.
A numeric matrix or cell array containing two or more grouping variables. Each column in the matrix or array must correspond to one grouping variable.
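Here is a minimal sketch using a single numeric grouping variable. The subject IDs are hypothetical: four subjects with three observations each. The loop prints which subjects end up in each test fold, so you can inspect how the grouping affects the split. The sketch checks only the properties described on this page and assumes the statistics package is loaded.

```octave
pkg load statistics

% Hypothetical subject IDs: 4 subjects x 3 observations each
subject = repelem ((1:4)', 3);    % 12x1 numeric grouping variable
C = cvpartition (numel (subject), 'KFold', 2, 'GroupingVariables', subject);

% Show which subjects fall in each test fold
for i = 1:C.NumTestSets
  members = unique (subject(test (C, i)));
  printf ("fold %d test subjects: %s\n", i, mat2str (members(:)'));
endfor
```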

C = cvpartition (n, 'Holdout') creates a cvpartition object C, which defines a random nonstratified partition for holdout validation on n observations. 90% of the observations are assigned to the training set and the remaining 10% to the test set.

C = cvpartition (n, 'Holdout', p) also creates a nonstratified random partition for holdout validation, with the size of the test set defined by p. p can be a scalar value in the range $(0,1)$, giving the fraction of observations assigned to the test set, or a positive integer scalar in the range $[1, n)$, giving the number of test observations.
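The two forms of p are shown side by side below. The sketch assumes that an integer p means the number of test observations, as described above, and that the statistics package is loaded. It only checks that the training and test sets split the observations between them.

```octave
pkg load statistics

% p as a fraction: about 20% of 50 observations go to the test set
C1 = cvpartition (50, 'Holdout', 0.2);
% p as an integer: assumed to mean 10 test observations
C2 = cvpartition (50, 'Holdout', 10);

printf ("fraction: %d train / %d test\n", C1.TrainSize, C1.TestSize);
printf ("count:    %d train / %d test\n", C2.TrainSize, C2.TestSize);

% Training and test sets of a holdout partition should not overlap
overlap = any (test (C1) & training (C1));
```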

C = cvpartition (n, 'Leaveout') creates a cvpartition object C, which defines a random partition for leave-one-out cross-validation on n observations. This is a special case of k-fold cross-validation with the number of folds equal to the number of observations.
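With leave-one-out partitioning, `test (C, "all")` should return an n-by-n logical matrix. Each column is one fold that holds out a single observation. This is a minimal sketch that assumes the statistics package is loaded.

```octave
pkg load statistics

C = cvpartition (5, 'Leaveout');

% One test set per observation, each holding a single observation
T = test (C, "all");      % one column per fold, one row per observation
disp (T)
```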

C = cvpartition (n, 'Resubstitution') creates a cvpartition object C that does not partition the data: both the training and test sets contain all n observations.
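With resubstitution, both index vectors should select every observation. The following sketch checks this and assumes the statistics package is loaded.

```octave
pkg load statistics

C = cvpartition (8, 'Resubstitution');
tr = training (C);   % logical index of the training set
te = test (C);       % logical index of the test set
printf ("training: %d of 8, test: %d of 8\n", sum (tr), sum (te));
```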

C = cvpartition (X, 'KFold') creates a cvpartition object C, which defines a stratified random partition for k-fold cross-validation according to the class proportions in X. X can be a numeric, logical, categorical, or string vector, or a character array or a cell array of character vectors. Missing values in X are discarded. The default number of folds is 10 for numel (X) >= 10 or equal to numel (X) otherwise.

C = cvpartition (X, 'KFold', k) also creates a stratified random partition for k-fold cross-validation with the number of folds defined by k, which must be a positive integer scalar smaller than the number of observations in X.
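To see what stratification does, the sketch below counts the observations of each class in every test fold. The class labels are hypothetical: 20 observations of class "a" and 10 of class "b". With stratification, each fold should come out close to the overall 2:1 ratio. The sketch assumes the statistics package is loaded.

```octave
pkg load statistics

% Hypothetical class labels: 20 of class "a", 10 of class "b"
X = [repmat({"a"}, 20, 1); repmat({"b"}, 10, 1)];
C = cvpartition (X, 'KFold', 5);

% Class counts per test fold
for i = 1:C.NumTestSets
  idx = test (C, i);
  printf ("fold %d: a = %d, b = %d\n", i, ...
          sum (strcmp (X(idx), "a")), sum (strcmp (X(idx), "b")));
endfor
```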

C = cvpartition (X, 'KFold', k, 'Stratify', opt) creates a random partition for k-fold cross-validation, which is stratified if opt is true, or nonstratified if opt is false.

C = cvpartition (X, 'Holdout') creates a cvpartition object C, which defines a stratified random partition for holdout validation while maintaining the class proportions in X. 90% of the observations are assigned to the training set and the remaining 10% to the test set.

C = cvpartition (X, 'Holdout', p) also creates a stratified random partition for holdout validation, with the size of the test set defined by p. p can be a scalar value in the range $(0,1)$, giving the fraction of observations assigned to the test set, or a positive integer scalar in the range $[1, n)$, giving the number of test observations, where n is the number of observations in X.
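The following sketch holds out 25% of a labelled dataset and reports the class mix of the test set. The labels are hypothetical: 80 observations of class 1 and 20 of class 2. Because the partition is stratified, the test set should roughly preserve the 4:1 ratio. The sketch assumes the statistics package is loaded.

```octave
pkg load statistics

% Hypothetical labels: 80 observations of class 1 and 20 of class 2
X = [ones(80, 1); 2 * ones(20, 1)];
C = cvpartition (X, 'Holdout', 0.25);

te = test (C);
printf ("test set: %d of class 1, %d of class 2\n", ...
        sum (X(te) == 1), sum (X(te) == 2));
```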

C = cvpartition (X, 'Holdout', p, 'Stratify', opt) creates a random partition for holdout validation, which is stratified if opt is true, or nonstratified if opt is false.
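Setting 'Stratify' to false turns off the default stratification for both k-fold and holdout partitions. The IsStratified property should reflect this. The labels below are hypothetical, and the sketch assumes the statistics package is loaded.

```octave
pkg load statistics

X = [ones(12, 1); 2 * ones(8, 1)];   % hypothetical class labels

C1 = cvpartition (X, 'KFold', 4, 'Stratify', false);
C2 = cvpartition (X, 'Holdout', 0.25, 'Stratify', false);

printf ("k-fold stratified: %d, holdout stratified: %d\n", ...
        C1.IsStratified, C2.IsStratified);
```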

C = cvpartition ('CustomPartition', testSets) creates a custom partition according to testSets, which can be a positive integer vector, a logical vector, or a logical matrix according to the following options:

A positive integer vector of length $n$ with values in the range $[1,k]$, where $k < n$, specifies a k-fold cross-validation partition, in which each value indicates the test set of the corresponding observation. Alternatively, the same vector with values in the range $[1,n]$ specifies a leave-one-out cross-validation.
A logical vector specifies a holdout validation, in which the true elements correspond to the test set and the false elements correspond to the training set.
A logical matrix with $k$ columns specifies a k-fold cross-validation partition, in which each column corresponds to a fold and each row to an observation. Alternatively, an $n \times n$ logical matrix specifies a leave-one-out cross-validation, where $n$ is the number of observations. True elements correspond to the test set and false elements correspond to the training set.
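Here is a minimal sketch of two of the custom forms listed above. The first is an integer fold-assignment vector, which gives a 3-fold partition of 6 observations. The second is a logical test-set vector, which gives a holdout partition. The assignments are made up for illustration, and the sketch assumes the statistics package is loaded.

```octave
pkg load statistics

% Integer vector: observation j belongs to test set testSets(j) (3 folds)
C1 = cvpartition ('CustomPartition', [1 1 2 2 3 3]);
% Logical vector: true marks the test set (holdout)
C2 = cvpartition ('CustomPartition', logical ([0 0 0 1 1]));

% Observations in the second test set of C1
idx = find (test (C1, 2));
printf ("fold 2 tests observations %s\n", mat2str (idx(:)'));
```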

See also: repartition, summary, test, training

cvpartition: Cnew = repartition (C)

cvpartition: Cnew = repartition (C, sval)

cvpartition: Cnew = repartition (C, 'legacy')

Repartition data for cross-validation.

Cnew = repartition (C) creates a cvpartition object Cnew that defines a new random partition of the same type as the cvpartition C.

Cnew = repartition (C, sval) also uses the value of sval to set the state of the random generator used in repartitioning C. If sval is a vector, then the random generator is set using the "state" keyword as in rand ("state", sval). If sval is a scalar, then the "seed" keyword is used as in rand ("seed", sval) to specify that old generators should be used.

Cnew = repartition (C, 'legacy') only applies to cvpartition objects C that use k-fold partitioning and it will repartition C in the same non-random manner that was previously used by the old-style cvpartition class of the statistics package.
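The sketch below draws a fresh partition of the same type. It then repartitions twice with the same state vector. According to the description above, a vector sval is passed to rand ("state", sval), so the two calls should yield the same partition. The sketch prints that comparison rather than treating it as guaranteed. Note that seeding this way also changes the global rand state. The sketch assumes the statistics package is loaded.

```octave
pkg load statistics

C = cvpartition (40, 'KFold', 4);

% New random partition of the same type as C
Cnew = repartition (C);

% Same state vector twice: expected to reproduce the same partition
Ca = repartition (C, [1 2 3]);
Cb = repartition (C, [1 2 3]);
same = isequal (test (Ca, "all"), test (Cb, "all"));
printf ("same partition with same state: %d\n", same);
```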

See also: cvpartition, summary, test, training

cvpartition: tbl = summary (C)

Summarize cross-validation partition.

tbl = summary (C) returns a summary table tbl of the cvpartition object C, provided that its type is either k-fold or holdout and that it is either stratified or grouped. This function requires support for the table class, which is provided by the datatypes package.

See also: cvpartition, repartition, test, training

cvpartition: idx = test (C)
cvpartition: idx = test (C, i)
cvpartition: idx = test (C, "all")

Test indices for cross-validation.

idx = test (C) returns a logical vector idx with true values indicating the elements corresponding to the test set defined in the cvpartition object C. For k-fold and leave-one-out partitions, the indices corresponding to the first test set are returned.

idx = test (C, i) returns a logical vector or matrix with the indices of the test set indicated by i. If i is a scalar, then idx is a logical vector with the indices of the i-th test set. If i is a vector, then idx is a logical matrix in which idx(:,j) specifies the observations in the test set i(j). The value(s) in i must not exceed the number of test sets in the cvpartition object C.

idx = test (C, "all") returns a logical vector or matrix for all test sets defined in the cvpartition object C. For holdout and resubstitution partition types, a vector is returned. For k-fold and leave-one-out, a matrix is returned.
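The three calling forms of `test` can be compared on a 4-fold partition. The first test set, a selection of test sets, and all test sets should agree column by column. For a non-custom k-fold partition, each observation should appear in exactly one test set. This is a minimal sketch that assumes the statistics package is loaded.

```octave
pkg load statistics

C = cvpartition (12, 'KFold', 4);

t1   = test (C);            % first test set (logical vector)
t13  = test (C, [1 3]);     % test sets 1 and 3, one column each
tall = test (C, "all");     % all four test sets, one column each

printf ("observations per test set: %s\n", mat2str (sum (tall, 1)));
```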

See also: cvpartition, repartition, summary, training

cvpartition: idx = training (C)
cvpartition: idx = training (C, i)
cvpartition: idx = training (C, "all")

Training indices for cross-validation.

idx = training (C) returns a logical vector idx with true values indicating the elements corresponding to the training set defined in the cvpartition object C. For k-fold and leave-one-out partitions, the indices corresponding to the first training set are returned.

idx = training (C, i) returns a logical vector or matrix with the indices of the training set indicated by i. If i is a scalar, then idx is a logical vector with the indices of the i-th training set. If i is a vector, then idx is a logical matrix in which idx(:,j) specifies the observations in the training set i(j). The value(s) in i must not exceed the number of test sets in the cvpartition object C.

idx = training (C, "all") returns a logical vector or matrix for all training sets defined in the cvpartition object C. For holdout and resubstitution partition types, a vector is returned. For k-fold and leave-one-out, a matrix is returned.
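In a k-fold partition of n observations without missing data, each fold's training set should be the complement of its test set. The following sketch checks this with `training (C, "all")` and `test (C, "all")`. It also compares a single training set with the TrainSize property. It assumes the statistics package is loaded.

```octave
pkg load statistics

C = cvpartition (12, 'KFold', 4);

% Training and test sets of each fold are complementary
Tr = training (C, "all");
Te = test (C, "all");
complementary = isequal (Tr, ~Te);

% Training set for the second fold
tr2 = training (C, 2);
printf ("fold 2 trains on %d of %d observations\n", sum (tr2), C.NumObservations);
```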

See also: cvpartition, repartition, summary, test
