ClassificationGAM
statistics: ClassificationGAM
Generalized additive model classification
The ClassificationGAM class implements a gradient boosting algorithm
for classification, using spline fitting as the weak learner. This approach
allows the model to capture non-linear relationships between predictors and
the binary response variable.
Generalized additive model classification is a statistical method that extends linear models by allowing non-linear relationships between each predictor and the response variable through smooth functions. It combines the interpretability of linear models with the flexibility of non-parametric methods.
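For a binary response, the additive structure described above is commonly written with a logit link. The form below is the standard textbook formulation, given as a sketch only; the symbols $\beta_0$, $f_j$, and $f_{jk}$ are notation introduced here, and the exact parameterization used inside ClassificationGAM is not stated in this document.

```latex
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)}
  = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j < k} f_{jk}(x_j, x_k)
```

Each $f_j$ is a smooth function of a single predictor (a main effect), and the optional $f_{jk}$ terms are smooth functions of predictor pairs (interaction terms).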
Create a ClassificationGAM object by using the fitcgam
function or the class constructor.
See also: fitcgam
Source Code: ClassificationGAM
A numeric matrix containing the unstandardized predictor data. Each column of X represents one predictor (variable), and each row represents one observation. This property is read-only.
Specified as a logical or numeric column vector, or as a character array or a cell array of character vectors with the same number of rows as the predictor data. Each row in Y is the observed class label for the corresponding row in X. This property is read-only.
A positive integer value specifying the number of observations in the training dataset used for training the ClassificationGAM model. This property is read-only.
A logical column vector with the same length as the observations in the original predictor data X specifying which rows have been used for fitting the ClassificationGAM model. This property is read-only.
A positive integer value specifying the number of predictors in the training dataset used for training the ClassificationGAM model. This property is read-only.
A cell array of character vectors specifying the names of the predictor variables. The names are in the order in which they appear in the training dataset. This property is read-only.
A character vector specifying the name of the response variable Y. This property is read-only.
An array of unique values of the response variable Y, which has the
same data types as the data in Y. This property is read-only.
ClassNames can be a logical or numeric vector, a character array, or a cell array of character vectors, matching the data types accepted for Y.
A square matrix specifying the cost of misclassification of a point.
Cost(i,j) is the cost of classifying a point into class j
if its true class is i (that is, the rows correspond to the true
class and the columns correspond to the predicted class). The order of
the rows and columns in Cost corresponds to the order of the
classes in ClassNames. The number of rows and columns in
Cost is the number of unique classes in the response. By
default, Cost(i,j) = 1 if i != j, and
Cost(i,j) = 0 if i = j. In other words, the cost is 0
for correct classification and 1 for incorrect classification.
Add or change the Cost property using dot notation as in:
obj.Cost = costMatrix
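As a minimal sketch of the cost convention described above, the default 0-1 matrix and an asymmetric alternative can be built as plain Octave matrices. The variable names and the asymmetric values are hypothetical and chosen only for illustration.

```octave
## Default 0-1 cost for K classes: 0 on the diagonal, 1 elsewhere.
K = 2;                              # binary response, so two classes
defaultCost = ones (K) - eye (K);

## Hypothetical asymmetric cost: predicting the first class when the true
## class is the second (row 2, column 1) is five times as costly.
costMatrix = [0, 1; 5, 0];

## With a trained model obj, the cost would then be assigned as:
##   obj.Cost = costMatrix;
disp (defaultCost);
disp (costMatrix);
```

Rows index the true class and columns the predicted class, in the order given by ClassNames.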
A 2-element numeric vector specifying the prior probabilities for each
class. The order of the elements in Prior corresponds to the
order of the classes in ClassNames. This property is read-only.
Specified as a function handle for transforming the classification
scores. Add or change the ScoreTransform property using dot
notation as in:
obj.ScoreTransform = 'function_name'
obj.ScoreTransform = @function_handle
When specified as a character vector, it can be any of the following
built-in functions. Nevertheless, the ScoreTransform property
always stores the equivalent function handle.
| Value | Description |
|---|---|
| "doublelogit" | 1 / (1 + exp (-2x)) |
| "invlogit" | log (x / (1 - x)) |
| "ismax" | Sets the score for the class with the largest score to 1, and for all other classes to 0 |
| "logit" | 1 / (1 + exp (-x)) |
| "none" | x (no transformation) |
| "identity" | x (no transformation) |
| "sign" | -1 for x < 0, 0 for x = 0, 1 for x > 0 |
| "symmetric" | 2x - 1 |
| "symmetricismax" | Sets the score for the class with the largest score to 1, and for all other classes to -1 |
| "symmetriclogit" | 2 / (1 + exp (-x)) - 1 |
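The built-in transformations can be pictured as ordinary Octave function handles. The sketch below uses the conventional definitions of these transforms; the struct `tf` and its handles are hypothetical stand-ins written for illustration, not the package's own code.

```octave
## Hypothetical handles named after the built-in options; illustrative only.
tf.doublelogit    = @(x) 1 ./ (1 + exp (-2 * x));
tf.invlogit       = @(x) log (x ./ (1 - x));
tf.logit          = @(x) 1 ./ (1 + exp (-x));
tf.identity       = @(x) x;
tf.sign           = @(x) sign (x);
tf.symmetric      = @(x) 2 * x - 1;
tf.symmetriclogit = @(x) 2 ./ (1 + exp (-x)) - 1;
## Row-wise maximum: each row of x holds the scores of one observation.
## Tied maxima within a row are all marked in this simplified version.
tf.ismax          = @(x) double (x == max (x, [], 2));
tf.symmetricismax = @(x) 2 * double (x == max (x, [], 2)) - 1;

scores = [0.2, 0.8; 0.9, 0.1];      # two observations, two classes
disp (tf.ismax (scores));
disp (tf.symmetric (scores));
```

A user-defined handle of the same shape, taking a score matrix and returning a matrix of equal size, is what the `@function_handle` form of the property expects.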
A character vector specifying the model formula in the form
"Y ~ terms" where Y represents the response variable and
terms specifies the predictor variables and interaction terms.
This property is read-only.
A logical matrix, positive integer scalar, or character vector
"all" specifying the interaction terms between predictor
variables. This property is read-only.
A scalar or row vector specifying the number of knots for each predictor variable in the spline fitting. This property is read-only.
A scalar or row vector specifying the order of the spline for each predictor variable. This property is read-only.
A scalar or row vector specifying the degrees of freedom for each predictor variable in the spline fitting. This property is read-only.
A scalar value between 0 and 1 specifying the learning rate used in the gradient boosting algorithm. This property is read-only.
A positive integer specifying the maximum number of iterations for the gradient boosting algorithm. This property is read-only.
A structure containing the parameters of the base model without any interaction terms. The base model represents the generalized additive model with only the main effects (predictor terms) included. This property is read-only.
A structure containing the parameters of the model that includes interaction terms. This model extends the base model by adding interaction terms between predictors. This property is read-only.
A logical matrix or matrix of column indices describing the interaction terms applied to the predictor data. This property is read-only.
statistics: obj = ClassificationGAM (X, Y)
statistics: obj = ClassificationGAM (…, name, value)
obj = ClassificationGAM (X, Y) returns
a ClassificationGAM object, with X as the predictor data
and Y containing the class labels of observations in X.
X must be a numeric matrix of input data where rows
correspond to observations and columns correspond to features or
variables. X will be used to train the GAM model.
Y is a matrix or cell array containing the class labels
of the corresponding predictor data in X. Y can contain any type
of categorical data. Y must have the same number of rows as
X.
obj = ClassificationGAM (…, name,
value) returns a ClassificationGAM object with parameters
specified by the following name, value paired input
arguments:
| Name | Value |
|---|---|
| 'PredictorNames' | A cell array of character vectors specifying the names of the predictors. The length of this array must match the number of columns in X. |
| 'ResponseName' | A character vector specifying the name of the response variable. |
| 'ClassNames' | Names of the classes in the class labels, Y, used for fitting the GAM model. ClassNames are of the same type as the class labels in Y. |
| 'Cost' | A K-by-K numeric matrix of misclassification costs, where K is the number of unique classes in Y. Cost(i,j) is the cost of classifying a point into class j if its true class is i. By default, the cost is 0 for correct classification and 1 for incorrect classification. The cost matrix can be changed after training by using obj.Cost = costMatrix. |
| 'Prior' | A numeric vector specifying the prior probabilities for each class. The order of the elements in Prior corresponds to the order of the classes in ClassNames. Alternatively, you can specify "empirical" to use the empirical class probabilities or "uniform" to assume equal class probabilities. |
| 'ScoreTransform' | A user-defined function handle, or a character vector naming one of the built-in functions, specifying the transformation applied to predicted classification scores. Supported values include 'doublelogit', 'invlogit', 'ismax', 'logit', 'none', 'identity', 'sign', 'symmetric', 'symmetricismax', and 'symmetriclogit'. |
| 'Formula' | A character vector specifying the model formula in the form "Y ~ terms" where Y represents the response variable and terms specifies the predictor variables and interaction terms. |
| 'Interactions' | A logical matrix, a positive integer scalar, or the string "all" for defining the interactions between predictor variables. |
| 'Knots' | A scalar or row vector specifying the number of knots for each predictor variable in the spline fitting. |
| 'Order' | A scalar or row vector specifying the order of the spline for each predictor variable. |
| 'DoF' | A scalar or row vector specifying the degrees of freedom for each predictor variable in the spline fitting. |
| 'LearningRate' | A scalar value between 0 and 1 specifying the learning rate used in the gradient boosting algorithm. |
| 'NumIterations' | A positive integer specifying the maximum number of iterations for the gradient boosting algorithm. |
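A call that combines several of these name-value pairs might look like the sketch below. It reuses the toy data from the example at the end of this page; the predictor names, response name, learning rate, and iteration count are hypothetical values picked for illustration, and the statistics package is assumed to be installed.

```octave
pkg load statistics

## Toy data: 10 observations, 2 predictors, binary labels.
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; ...
     6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];

## The names and numeric settings below are illustrative assumptions.
obj = ClassificationGAM (X, Y, ...
                         "PredictorNames", {"x1", "x2"}, ...
                         "ResponseName", "label", ...
                         "LearningRate", 0.2, ...
                         "NumIterations", 50);

printf ("%d observations, %d predictors\n", ...
        obj.NumObservations, obj.NumPredictors);
```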
See also: fitcgam
ClassificationGAM: label = predict (obj, XC)
ClassificationGAM: [label, score] = predict (obj, XC)
ClassificationGAM: [label, score] = predict (…, 'IncludeInteractions', includeInteractions)
label = predict (obj, XC) returns the predicted
labels for the data in XC based on the model stored in the
ClassificationGAM object, obj.
[label, score] = predict (obj, XC) also
returns score, which contains the predicted class scores or
posterior probabilities for each observation.
[label, score] = predict (obj, XC,
'IncludeInteractions', includeInteractions) allows you to specify
whether interaction terms should be included when making predictions.
obj must be a ClassificationGAM class object.
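The three calling forms can be compared side by side, as in the following sketch. It assumes the statistics package is installed and trains on the toy data used in the example at the end of this page; the query points in XC are hypothetical.

```octave
pkg load statistics

X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; ...
     6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];

## Train a model that includes interaction terms.
obj = fitcgam (X, Y, "Interactions", "all");

## Hypothetical new observations, one row per observation.
XC = [2.5, 3; 8.5, 9];

## Labels only, then labels with scores.
label = predict (obj, XC);
[label, score] = predict (obj, XC);

## Same query, using the main effects only.
[labelMain, scoreMain] = predict (obj, XC, "IncludeInteractions", false);

disp (label);
disp (score);
```

Comparing score with scoreMain gives a rough sense of how much the interaction terms contribute for a given observation.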
See also: ClassificationGAM, fitcgam
ClassificationGAM: CVMdl = crossval (obj)
ClassificationGAM: CVMdl = crossval (…, name, value)
CVMdl = crossval (obj) returns a cross-validated model
object, CVMdl, from a trained model, obj, using 10-fold
cross-validation by default.
CVMdl = crossval (obj, name, value)
specifies additional name-value pair arguments to customize the
cross-validation process.
| Name | Value |
|---|---|
| "KFold" | Specify the number of folds to use in k-fold cross-validation. "KFold", k, where k is an integer greater than 1. |
| "Holdout" | Specify the fraction of the data to hold out for testing. "Holdout", p, where p is a scalar in the range (0, 1). |
| "Leaveout" | Specify whether to perform leave-one-out cross-validation. "Leaveout", Value, where Value is "on" or "off". |
| "CVPartition" | Specify a cvpartition object used for cross-validation. "CVPartition", cv, where isa (cv, "cvpartition") = 1. |
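The sketch below shows two of these options in use, under the assumption that the statistics package is installed. The data set is synthetic and hypothetical, made larger than the toy example so that every fold has enough observations for the spline fits.

```octave
pkg load statistics

## Hypothetical, deterministic two-class data: 40 observations, 2 predictors.
x1 = (1:40)';
x2 = x1 + 3 * sin (x1);
X = [x1, x2];
Y = double (x1 > 20);

obj = fitcgam (X, Y);

## 5-fold cross-validation instead of the default 10 folds.
CVMdl = crossval (obj, "KFold", 5);

## The same idea with an explicit partition object.
cv = cvpartition (rows (X), "KFold", 5);
CVMdl2 = crossval (obj, "CVPartition", cv);

disp (class (CVMdl));
```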
See also: fitcgam, ClassificationGAM, cvpartition, ClassificationPartitionedModel
ClassificationGAM: CVMdl = compact (obj)
CVMdl = compact (obj) creates a compact version of the
ClassificationGAM object, obj.
See also: fitcgam, ClassificationGAM, CompactClassificationGAM
ClassificationGAM: savemodel (obj, filename)
savemodel (obj, filename) saves each property of a
ClassificationGAM object into an Octave binary file, the name of which is
specified in filename, along with an extra variable that defines
the type of classification object these variables constitute. Use
loadmodel to load a classification object into Octave's
workspace.
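A save and reload round trip might look like the following sketch. It assumes the statistics package is installed; the temporary file name is generated only for this illustration.

```octave
pkg load statistics

X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; ...
     6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; 1; 1; 1; 1; 1];
obj = fitcgam (X, Y);

## Hypothetical temporary file used only for this example.
fname = [tempname(), ".mat"];

savemodel (obj, fname);         # write the model properties to disk
restored = loadmodel (fname);   # rebuild the object from the file

disp (class (restored));
delete (fname);                 # clean up the temporary file
```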
See also: loadmodel, fitcgam, ClassificationGAM
X = [1, 2; 2, 3; 3, 3; 4, 5; 5, 5; ...
6, 7; 7, 8; 8, 8; 9, 9; 10, 10];
Y = [0; 0; 0; 0; 0; ...
1; 1; 1; 1; 1];
## Train the GAM model
obj = fitcgam (X, Y, "Interactions", "all")
## Create a grid of values for prediction
x1 = [min(X(:,1)):0.1:max(X(:,1))];
x2 = [min(X(:,2)):0.1:max(X(:,2))];
[x1G, x2G] = meshgrid (x1, x2);
XGrid = [x1G(:), x2G(:)];
[labels, score] = predict (obj, XGrid);
obj =
ClassificationGAM
ResponseName: 'Y'
ClassNames: {'0' '1'}
ScoreTransform: 'none'
NumObservations: 10
NumPredictors: 2
Interactions: 'all'