Class Definition: ClassificationGAM

statistics: ClassificationGAM

Generalized additive model classification

The ClassificationGAM class implements a gradient boosting algorithm for classification, using spline fitting as the weak learner. This approach allows the model to capture non-linear relationships between predictors and the binary response variable.

Generalized additive model classification is a statistical method that extends linear models by allowing non-linear relationships between each predictor and the response variable through smooth functions. It combines the interpretability of linear models with the flexibility of non-parametric methods.
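
As a sketch of the model form described above, a binary GAM can be written as follows. The logit link and all symbols here are assumptions for illustration and are not taken from the package documentation.

```latex
\log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)}
  = \beta_0 + \sum_{j=1}^{P} f_j(x_j)
  + \sum_{(j,k) \in \mathcal{I}} f_{jk}(x_j, x_k)
```

Here each f_j is a smooth (spline) function of a single predictor, and the optional second sum covers interaction terms for the predictor pairs in a set I.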

Create a ClassificationGAM object by using the fitcgam function or the class constructor.

See also: fitcgam

Source Code: ClassificationGAM

Properties

X: A numeric matrix containing the unstandardized predictor data. Each column of X represents one predictor (variable), and each row represents one observation. This property is read-only.

Y: The class labels, specified as a logical or numeric column vector, a character array, or a cell array of character vectors with the same number of rows as the predictor data. Each row in Y is the observed class label for the corresponding row in X. This property is read-only.

NumObservations: A positive integer specifying the number of observations used to train the ClassificationGAM model. This property is read-only.

RowsUsed: A logical column vector with one element per row of the original predictor data X, indicating which rows were used to fit the ClassificationGAM model. This property is read-only.

NumPredictors: A positive integer specifying the number of predictors in the dataset used to train the ClassificationGAM model. This property is read-only.

PredictorNames: A cell array of character vectors specifying the names of the predictor variables. The names are in the order in which they appear in the training dataset. This property is read-only.

ResponseName: A character vector specifying the name of the response variable Y. This property is read-only.

ClassNames: An array of the unique values of the response variable Y, with the same data type as Y. This property is read-only. ClassNames can have any of the following data types:

  • Cell array of character vectors
  • Character array
  • Logical vector
  • Numeric vector

Cost: A square matrix specifying the cost of misclassifying a point. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. By default, Cost(i,j) = 1 if i != j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

Add or change the Cost property using dot notation as in:

  • obj.Cost = costMatrix
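
The Cost layout described above can be sketched in plain Octave. The custom values in costMatrix are hypothetical, and the final assignment is shown only as a comment because no trained model is created here.

```octave
## Default misclassification cost for K classes: 0 on the diagonal, 1 elsewhere.
K = 2;                              # a binary response has two classes
defaultCost = ones (K) - eye (K);

## Hypothetical custom cost: predicting class 1 when the true class is 2
## (row 2, column 1) is five times as costly as the opposite error.
costMatrix = [0, 1; 5, 0];

## With a trained model obj (not created here), the documented assignment is:
##   obj.Cost = costMatrix;
```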

Prior: A 2-element numeric vector specifying the prior probabilities for each class. The order of the elements in Prior corresponds to the order of the classes in ClassNames. This property is read-only.

ScoreTransform: A function handle for transforming the classification scores. Add or change the ScoreTransform property using dot notation as in:

  • obj.ScoreTransform = 'function_name'
  • obj.ScoreTransform = @function_handle

When specified as a character vector, ScoreTransform can be any of the following built-in functions. Nevertheless, the property always stores the equivalent function handle.

Value               Description
"doublelogit"       1 ./ (1 + exp (-2 * x))
"invlogit"          log (x ./ (1 - x))
"ismax"             Sets the score for the class with the largest score to 1, and for all other classes to 0
"logit"             1 ./ (1 + exp (-x))
"none"              x (no transformation)
"identity"          x (no transformation)
"sign"              -1 for x < 0, 0 for x = 0, 1 for x > 0
"symmetric"         2 * x + 1
"symmetricismax"    Sets the score for the class with the largest score to 1, and for all other classes to -1
"symmetriclogit"    2 ./ (1 + exp (-x)) - 1

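A few of the built-in transforms listed above can be written as anonymous functions to see how they act on raw scores. This is a minimal sketch that follows the formulas in the table; the variable names and the sample scores are hypothetical, and the package's internal implementation may differ.

```octave
## Selected transforms from the table, written as anonymous functions.
logit          = @(x) 1 ./ (1 + exp (-x));
doublelogit    = @(x) 1 ./ (1 + exp (-2 * x));
invlogit       = @(x) log (x ./ (1 - x));
symmetriclogit = @(x) 2 ./ (1 + exp (-x)) - 1;

x = [-2, 0, 2];            # hypothetical raw scores
p = logit (x);             # values strictly between 0 and 1
xBack = invlogit (p);      # recovers x up to rounding error
```

A handle of the same form can be assigned to a trained model with obj.ScoreTransform = logit.
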
Formula: A character vector specifying the model formula in the form "Y ~ terms" where Y represents the response variable and terms specifies the predictor variables and interaction terms. This property is read-only.

Interactions: A logical matrix, positive integer scalar, or character vector "all" specifying the interaction terms between predictor variables. This property is read-only.

Knots: A scalar or row vector specifying the number of knots for each predictor variable in the spline fitting. This property is read-only.

Order: A scalar or row vector specifying the order of the spline for each predictor variable. This property is read-only.

DoF: A scalar or row vector specifying the degrees of freedom for each predictor variable in the spline fitting. This property is read-only.

LearningRate: A scalar value between 0 and 1 specifying the learning rate used in the gradient boosting algorithm. This property is read-only.

NumIterations: A positive integer specifying the maximum number of iterations for the gradient boosting algorithm. This property is read-only.

BaseModel: A structure containing the parameters of the base model without any interaction terms. The base model represents the generalized additive model with only the main effects (predictor terms) included. This property is read-only.

ModelwInt: A structure containing the parameters of the model that includes interaction terms. This model extends the base model by adding interaction terms between predictors. This property is read-only.

IntMatrix: A logical matrix or matrix of column indices describing the interaction terms applied to the predictor data. This property is read-only.

Methods

statistics: obj = ClassificationGAM (X, Y)
statistics: obj = ClassificationGAM (…, name, value)

obj = ClassificationGAM (X, Y) returns a ClassificationGAM object, with X as the predictor data and Y containing the class labels of observations in X.

  • X must be an N×P numeric matrix of predictor data, where rows correspond to observations and columns correspond to features or variables. X is used to train the GAM model.
  • Y is an N×1 numeric, logical, or character array, or a cell array of character vectors, containing the class labels of the corresponding predictor data in X. Y must have the same number of rows as X.

obj = ClassificationGAM (…, name, value) returns a ClassificationGAM object with parameters specified by the following name-value pair arguments:

Name                Value
'PredictorNames'    A cell array of character vectors specifying the names of the predictors. The length of this array must match the number of columns in X.
'ResponseName'      A character vector specifying the name of the response variable.
'ClassNames'        Names of the classes in the class labels, Y, used for fitting the GAM model. ClassNames are of the same type as the class labels in Y.
'Cost'              A square numeric matrix of misclassification costs, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows and columns follow the order of the classes in ClassNames. By default, Cost(i,j) = 1 if i != j, and Cost(i,j) = 0 if i = j. The cost matrix can be changed after training with obj.Cost = costMatrix.
'Prior'             A numeric vector specifying the prior probabilities for each class. The order of the elements in Prior corresponds to the order of the classes in ClassNames. Alternatively, you can specify "empirical" to use the empirical class probabilities or "uniform" to assume equal class probabilities.
'ScoreTransform'    A user-defined function handle, or a character vector naming a built-in function, specifying the transformation applied to predicted classification scores. Supported built-in values are 'doublelogit', 'invlogit', 'ismax', 'logit', 'none', 'identity', 'sign', 'symmetric', 'symmetricismax', and 'symmetriclogit'.
'Formula'           A character vector specifying the model formula in the form "Y ~ terms" where Y represents the response variable and terms specifies the predictor variables and interaction terms.
'Interactions'      A logical matrix, a positive integer scalar, or the character vector "all" defining the interactions between predictor variables.
'Knots'             A scalar or row vector specifying the number of knots for each predictor variable in the spline fitting.
'Order'             A scalar or row vector specifying the order of the spline for each predictor variable.
'DoF'               A scalar or row vector specifying the degrees of freedom for each predictor variable in the spline fitting.
'LearningRate'      A scalar value between 0 and 1 specifying the learning rate used in the gradient boosting algorithm.
'NumIterations'     A positive integer specifying the maximum number of iterations for the gradient boosting algorithm.
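
A constructor call with a few of the name-value pairs above might look like the following. This is a minimal sketch that assumes the statistics package is installed; the data are reused from the Examples section, and the predictor names, response name, learning rate, and iteration count are hypothetical values chosen for illustration.

```octave
pkg load statistics

## Toy training data (same values as in the Examples section)
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; 6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];

## Hypothetical names and boosting settings passed as name-value pairs
obj = ClassificationGAM (X, Y, "PredictorNames", {"x1", "x2"}, ...
                         "ResponseName", "label", ...
                         "LearningRate", 0.1, "NumIterations", 100);
```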

See also: fitcgam

ClassificationGAM: label = predict (obj, XC)
ClassificationGAM: [label, score] = predict (obj, XC)
ClassificationGAM: [label, score] = predict (…, 'IncludeInteractions', includeInteractions)

label = predict (obj, XC) returns the predicted labels for the data in XC based on the model stored in the ClassificationGAM object, obj.

[label, score] = predict (obj, XC) also returns score, which contains the predicted class scores or posterior probabilities for each observation.

[label, score] = predict (obj, XC, 'IncludeInteractions', includeInteractions) allows you to specify whether interaction terms should be included when making predictions.

  • obj must be a ClassificationGAM class object.
  • XC must be an M×P numeric matrix where each row is an observation and each column corresponds to a predictor variable.
  • includeInteractions is a logical scalar indicating whether to include interaction terms in the predictions.
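
The three call forms can be sketched as follows, assuming the statistics package is installed. The training data are reused from the Examples section, and the two query points in XC are hypothetical.

```octave
pkg load statistics

## Toy training data (same values as in the Examples section)
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; 6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];
obj = fitcgam (X, Y, "Interactions", "all");

## Two hypothetical new observations, one row each
XC = [2, 2; 9, 10];

## Predicted labels and scores from the fitted model
[label, score] = predict (obj, XC);

## Same query, using only the main-effect terms
[labelMain, scoreMain] = predict (obj, XC, "IncludeInteractions", false);
```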

See also: ClassificationGAM, fitcgam

ClassificationGAM: CVMdl = crossval (obj)
ClassificationGAM: CVMdl = crossval (…, name, value)

CVMdl = crossval (obj) returns a cross-validated model object, CVMdl, from a trained model, obj, using 10-fold cross-validation by default.

CVMdl = crossval (obj, name, value) specifies additional name-value pair arguments to customize the cross-validation process.

Name             Value
"KFold"          The number of folds to use in k-fold cross-validation: "KFold", k, where k is an integer greater than 1.
"Holdout"        The fraction of the data to hold out for testing: "Holdout", p, where p is a scalar in the range (0,1).
"Leaveout"       Whether to perform leave-one-out cross-validation: "Leaveout", Value, where Value is "on" or "off".
"CVPartition"    A cvpartition object used for cross-validation: "CVPartition", cv, where isa (cv, "cvpartition") is true.

See also: fitcgam, ClassificationGAM, cvpartition, ClassificationPartitionedModel
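
A cross-validation call using the "KFold" option might look like this. It is a sketch under stated assumptions: the statistics package is installed, and the 40-row synthetic dataset is hypothetical, chosen so that each training fold keeps enough observations for the spline fit.

```octave
pkg load statistics

## Hypothetical synthetic data: 40 observations, 2 predictors, binary labels
x1 = (1:40)';
x2 = x1 + 3 * sin (x1);
X = [x1, x2];
Y = double (x1 > 20);

obj = fitcgam (X, Y);

## 5-fold cross-validation instead of the default 10 folds
CVMdl = crossval (obj, "KFold", 5);
```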

ClassificationGAM: CVMdl = compact (obj)

CVMdl = compact (obj) returns a compact version of the ClassificationGAM object, obj, as a CompactClassificationGAM object.
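
A minimal sketch of the call, assuming the statistics package is installed. The data are reused from the Examples section and the variable name cobj is hypothetical.

```octave
pkg load statistics

## Toy training data (same values as in the Examples section)
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; 6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];
obj = fitcgam (X, Y);

## Create the compact version and show its class name
cobj = compact (obj);
disp (class (cobj))
```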

See also: fitcgam, ClassificationGAM, CompactClassificationGAM

ClassificationGAM: savemodel (obj, filename)

savemodel (obj, filename) saves each property of a ClassificationGAM object into an Octave binary file named by filename, along with an extra variable that identifies the type of classification object these variables constitute. Use loadmodel to load a saved classification object into Octave’s workspace.
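
A save-and-restore round trip could be sketched as below, assuming the statistics package is installed. The temporary file name and the variable name restored are hypothetical, and the data are reused from the Examples section.

```octave
pkg load statistics

## Toy training data (same values as in the Examples section)
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; 6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];
obj = fitcgam (X, Y);

## Save to a temporary file, load it back, then clean up
filename = [tempname(), ".mat"];
savemodel (obj, filename);
restored = loadmodel (filename);
delete (filename);
```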

See also: loadmodel, fitcgam, ClassificationGAM

Examples

 
 X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; ...
     6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
 Y = [0; 0; 0; 0; 0; ...
     1; 1; 1; 1; 1];

 ## Train the GAM model
 obj = fitcgam (X, Y, "Interactions", "all")

 ## Create a grid of values for prediction
 x1 = [min(X(:,1)):0.1:max(X(:,1))];
 x2 = [min(X(:,2)):0.1:max(X(:,2))];
 [x1G, x2G] = meshgrid (x1, x2);
 XGrid = [x1G(:), x2G(:)];
 [labels, score] = predict (obj, XGrid);
 
obj =

  ClassificationGAM

             ResponseName: 'Y'
               ClassNames: {'0' '1'}
           ScoreTransform: 'none'
          NumObservations: 10
            NumPredictors: 2
             Interactions: 'all'