ClassificationKNN
Create a ClassificationKNN class object containing a k-Nearest Neighbor classification model.

obj = ClassificationKNN (X, Y) returns a ClassificationKNN object, with X as the predictor data and Y containing the class labels of observations in X.

X must be a numeric matrix of input data where rows correspond to observations and columns correspond to features or variables. X will be used to train the kNN model.

Y is a matrix or cell array containing the class labels of the corresponding predictor data in X. Y can contain any type of categorical data. Y must have the same number of rows as X.

obj = ClassificationKNN (…, name, value) returns a ClassificationKNN object with parameters specified by Name-Value pair arguments. Type help fitcknn for more info.
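For instance, a minimal training sketch: the toy data below is made up for illustration, and the name-value pairs shown are among those accepted by fitcknn.

```
## Hypothetical toy data: six observations, two features, two classes.
X = [1, 2; 2, 3; 3, 3; 6, 7; 7, 8; 8, 7];
Y = {"a"; "a"; "a"; "b"; "b"; "b"};

## Train a 3-NN model with standardized predictors.
obj = fitcknn (X, Y, "NumNeighbors", 3, "Standardize", 1);
```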
A ClassificationKNN
object, obj, stores the labelled training
data and various parameters for the k-Nearest Neighbor classification model,
which can be accessed in the following fields:
| Field | Description |
|---|---|
| X | Unstandardized predictor data, specified as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation. |
| Y | Class labels, specified as a logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X. |
| NumObservations | Number of observations used in training the ClassificationKNN model, specified as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit. |
| RowsUsed | Rows of the original training data used in fitting the ClassificationKNN model, specified as a numerical vector. If you want to use this vector for indexing the training data in X, you have to convert it to a logical vector, i.e. X = obj.X(logical (obj.RowsUsed), :); (see the sketch after this table). |
| Standardize | A boolean flag indicating whether the data in X have been standardized prior to training. |
| Sigma | Predictor standard deviations, specified as a numeric vector with one element per column of X. If the predictor variables have not been standardized, then "obj.Sigma" is empty. |
| Mu | Predictor means, specified as a numeric vector with one element per column of X. If the predictor variables have not been standardized, then "obj.Mu" is empty. |
| NumPredictors | The number of predictors (variables) in X. |
| PredictorNames | Predictor variable names, specified as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X. |
| ResponseName | Response variable name, specified as a character vector. |
| ClassNames | Names of the classes in the training data Y with duplicates removed, specified as a cell array of character vectors. |
| BreakTies | Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following character arrays: "smallest" (default), which favors the class with the smallest index among the tied groups, i.e. the one that appears first in the labelled training data; "nearest", which favors the class with the nearest neighbor among the tied groups, i.e. the class with the closest member point according to the distance metric used; "random", which randomly picks one class among the tied groups. |
| Prior | Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames. |
| Cost | Cost of the misclassification of a point, specified as a square matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. By default, Cost(i,j) = 1 if i != j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification. |
| ScoreTransform | A function_handle which is used for transforming the kNN prediction score into a posterior probability. By default, it is "none", in which case the predict and resubPredict methods return the prediction scores. |
| NumNeighbors | Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value. |
| Distance | Distance metric, specified as a character vector. The allowable distance metric names depend on the choice of the neighbor-searcher method. See the available distance metrics in knnsearch for more info. |
| DistanceWeight | Distance weighting function, specified as a function handle, which accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. |
| DistParameter | Parameter for the distance metric, specified either as a positive definite covariance matrix (when the distance metric is "mahalanobis"), or a positive scalar as the Minkowski distance exponent (when the distance metric is "minkowski"), or a vector of positive scale values with length equal to the number of columns of X (when the distance metric is "seuclidean"). For any other distance metric, the value of DistParameter is empty. |
| NSMethod | Nearest neighbor search method, specified as either "kdtree", which creates and uses a Kd-tree to find nearest neighbors, or "exhaustive", which uses the exhaustive search algorithm by computing the distance values from all points in X to find nearest neighbors. |
| IncludeTies | A boolean flag indicating whether prediction includes all the neighbors whose distance values are equal to the smallest distance. If IncludeTies is true, prediction includes all of these neighbors. Otherwise, prediction uses exactly NumNeighbors neighbors. |
| BucketSize | Maximum number of data points in the leaf node of the Kd-tree, specified as a positive integer value. This argument is meaningful only when NSMethod is "kdtree". |
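As a usage sketch, continuing the hypothetical toy model from the constructor example above, the stored fields can be read directly; note the logical conversion of RowsUsed described in the table.

```
## Inspect a few stored fields of the trained model.
obj.NumNeighbors    ## number of neighbors used during prediction
obj.ClassNames      ## unique class labels in the training data
obj.Prior           ## prior probability of each class

## RowsUsed is numeric, so convert it to logical before indexing.
Xused = obj.X(logical (obj.RowsUsed), :);
```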
See also: fitcknn, knnsearch, rangesearch, pdist2
Source Code: ClassificationKNN
crossval
Cross-validate a ClassificationKNN object.
CVMdl = crossval (obj)
returns a cross-validated model
object, CVMdl, from a trained model, obj, using 10-fold
cross-validation by default.
CVMdl = crossval (obj, name, value)
specifies additional name-value pair arguments to customize the
cross-validation process.
| Name | Value |
|---|---|
| "KFold" | Specify the number of folds to use in k-fold cross-validation. "KFold", k, where k is an integer greater than 1. |
| "Holdout" | Specify the fraction of the data to hold out for testing. "Holdout", p, where p is a scalar in the range (0, 1). |
| "Leaveout" | Specify whether to perform leave-one-out cross-validation. "Leaveout", Value, where Value is "on" or "off". |
| "CVPartition" | Specify a cvpartition object used for cross-validation. "CVPartition", cv, where isa (cv, "cvpartition") = 1. |
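As a brief sketch (reusing the hypothetical obj trained in the constructor example above), the "Holdout" option trains on a single random split rather than k folds; the fraction 0.3 is an arbitrary illustration value.

```
## Hold out 30% of the observations for testing; the remaining 70%
## train the single model stored in CVMdl.Trained{1}.
CVMdl = crossval (obj, "Holdout", 0.3);
```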
See also: fitcknn, ClassificationKNN, cvpartition, ClassificationPartitionedModel
loss
Compute loss for a trained ClassificationKNN object.
L = loss (obj, X, Y) computes the loss, L, using the default loss function "mincost".

obj is a ClassificationKNN object trained on X and Y.

X must be a numeric matrix of input data where rows correspond to observations and columns correspond to features or variables.

Y is a matrix or cell array containing the class labels of the corresponding predictor data in X. Y must have the same number of rows as X.

L = loss (…, name, value) allows additional options specified by name-value pairs:
| Name | Value |
|---|---|
| "LossFun" | Specifies the loss function to use. Can be a function handle with four input arguments (C, S, W, Cost) which returns a scalar value, or one of: "binodeviance", "classifcost", "classiferror", "exponential", "hinge", "logit", "mincost", "quadratic". |
| "Weights" | Specifies observation weights, must be a numeric vector of length equal to the number of rows in X. Default is ones (size (X, 1)). loss normalizes the weights so that observation weights in each class sum to the prior probability of that class. When you supply Weights, loss computes the weighted classification loss. |
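For instance, a small sketch with the hypothetical toy model from the constructor example above; "classiferror" simply gives the (weighted) fraction of misclassified observations.

```
## Resubstitution misclassification rate on the training data.
L = loss (obj, X, Y, "LossFun", "classiferror");
```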
See also: fitcknn, ClassificationKNN
margin
m = margin (obj, X, Y) returns the classification margins for obj with data X and classification Y. m is a numeric vector of length size (X, 1).

obj is a ClassificationKNN object trained on X and Y.

X must be a numeric matrix of input data where rows correspond to observations and columns correspond to features or variables.

Y is a matrix or cell array containing the class labels of the corresponding predictor data in X. Y must have the same number of rows as X.
The classification margin for each observation is the difference between the classification score for the true class and the maximal classification score for the false classes.
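To make this definition concrete, here is a hedged sketch that recomputes the margin of a single observation from the scores returned by predict, using the hypothetical toy model from the constructor example above.

```
## Margin of the first observation, computed by hand: score of the
## true class minus the maximal score among the remaining classes.
[~, score] = predict (obj, X(1, :));
true_idx = find (strcmp (obj.ClassNames, Y{1}));
other_scores = score;
other_scores(true_idx) = -Inf;
m1 = score(true_idx) - max (other_scores);
```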
See also: fitcknn, ClassificationKNN
partialDependence
Compute partial dependence for a trained ClassificationKNN object.
[pd, x, y] = partialDependence (obj, Vars, Labels) computes the partial dependence of the classification scores on the variables Vars for the specified class Labels.

obj is a trained ClassificationKNN object.

Vars is a vector of positive integers, character vector, string array, or cell array of character vectors representing predictor variables (it can be indices of predictor variables in obj.X).

Labels is a column vector specified as a character vector, logical vector, numeric vector, or cell array of character vectors representing class labels.

[pd, x, y] = partialDependence (…, Data) specifies new predictor data to use for computing the partial dependence.

[pd, x, y] = partialDependence (…, name, value) allows additional options specified by name-value pairs:
| Name | Value |
|---|---|
| "NumObservationsToSample" | Number of observations to sample. Must be a positive integer. Defaults to the number of observations in the training data. |
| "QueryPoints" | Points at which to evaluate the partial dependence. Must be a numeric column vector, numeric two-column matrix, or cell array of character column vectors. |
| "UseParallel" | Logical value indicating whether to perform computations in parallel. Defaults to false. |
pd: Partial dependence values.

x: Query points for the first predictor variable in Vars.

y: Query points for the second predictor variable in Vars (if applicable).
See also: fitcknn, ClassificationKNN
predict
Classify new data points into categories using the kNN algorithm from a k-Nearest Neighbor classification model.
labels = predict (obj, XC) returns the matrix of labels predicted for the corresponding instances in XC, using the predictor data in obj.X and the corresponding labels, obj.Y, stored in the k-Nearest Neighbor classification model object, obj.

[labels, scores, cost] = predict (obj, XC) also returns scores, which contains the predicted class scores or posterior probabilities for each instance of the corresponding unique classes, and cost, which is a matrix containing the expected cost of the classifications. By default, scores returns the posterior probabilities for kNN models, unless a specific ScoreTransform function has been specified. See fitcknn for more info.
Note: predict explicitly uses "exhaustive" as the nearest neighbor search method, due to the very slow implementation of "kdtree" in the knnsearch function.
See also: fitcknn, ClassificationKNN, knnsearch
savemodel
Save a ClassificationKNN object.
savemodel (obj, filename)
saves a ClassificationKNN
object into a file defined by filename.
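A small usage sketch, assuming the saved model is restored with this package's loadmodel and that the filename below is arbitrary:

```
## Save the trained model to disk and restore it in a later session
## (loadmodel usage is an assumption; see its own help for details).
savemodel (obj, "knn_model.mat");
obj2 = loadmodel ("knn_model.mat");
```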
See also: loadmodel, fitcknn, ClassificationKNN
```
## Create a k-nearest neighbor classifier for Fisher's iris data with k = 5.
## Evaluate some model predictions on new data.
load fisheriris
x = meas;
y = species;
xc = [min(x); mean(x); max(x)];
obj = fitcknn (x, y, "NumNeighbors", 5, "Standardize", 1);
[label, score, cost] = predict (obj, xc)

label =
{
  [1,1] = versicolor
  [2,1] = versicolor
  [3,1] = virginica
}
score =
   0.4000   0.6000        0
        0   1.0000        0
        0        0   1.0000
cost =
   0.6000   0.4000   1.0000
   1.0000        0   1.0000
   1.0000   1.0000        0
```
```
load fisheriris
x = meas;
y = species;
obj = fitcknn (x, y, "NumNeighbors", 5, "Standardize", 1);
## Create a cross-validated model
CVMdl = crossval (obj)

CVMdl =
  ClassificationPartitionedModel object with properties:

                 BinEdges: [0x0 double]
    CategoricalPredictors: [0x0 double]
               ClassNames: [3x1 cell]
                     Cost: [3x3 double]
      CrossValidatedModel: ClassificationKNN
                    KFold: [1x1 double]
          ModelParameters: [1x1 struct]
          NumObservations: [1x1 double]
                Partition: [1x1 cvpartition]
           PredictorNames: [1x4 cell]
                    Prior: [3x1 double]
             ResponseName: Y
           ScoreTransform: none
              Standardize: 1
                  Trained: [10x1 cell]
                        X: [150x4 double]
                        Y: [150x1 cell]
```
```
load fisheriris
x = meas;
y = species;
covMatrix = cov (x);

## Fit the k-NN model using the 'mahalanobis' distance
## and the custom covariance matrix
obj = fitcknn (x, y, 'NumNeighbors', 5, 'Distance', 'mahalanobis', ...
               'Cov', covMatrix);

## Create a partition model using cvpartition
Partition = cvpartition (size (x, 1), 'kfold', 12);

## Create cross-validated model using 'cvPartition' name-value argument
CVMdl = crossval (obj, 'cvPartition', Partition)

## Access the trained model from first fold of cross-validation
CVMdl.Trained{1}

CVMdl =
  ClassificationPartitionedModel object with properties:

                 BinEdges: [0x0 double]
    CategoricalPredictors: [0x0 double]
               ClassNames: [3x1 cell]
                     Cost: [3x3 double]
      CrossValidatedModel: ClassificationKNN
                    KFold: [1x1 double]
          ModelParameters: [1x1 struct]
          NumObservations: [1x1 double]
                Partition: [1x1 cvpartition]
           PredictorNames: [1x4 cell]
                    Prior: [3x1 double]
             ResponseName: Y
           ScoreTransform: none
              Standardize: 0
                  Trained: [12x1 cell]
                        X: [150x4 double]
                        Y: [150x1 cell]

ans =
  ClassificationKNN object with properties:

            BreakTies: smallest
           BucketSize: [1x1 double]
           ClassNames: [3x1 cell]
                 Cost: [3x3 double]
        DistParameter: [4x4 double]
             Distance: mahalanobis
       DistanceWeight: [1x1 function_handle]
          IncludeTies: 0
                   Mu: [0x0 double]
             NSMethod: exhaustive
         NumNeighbors: [1x1 double]
      NumObservations: [1x1 double]
        NumPredictors: [1x1 double]
       PredictorNames: [1x4 cell]
                Prior: [3x1 double]
         ResponseName: Y
             RowsUsed: [137x1 double]
       ScoreTransform: none
                Sigma: [0x0 double]
          Standardize: 0
                    X: [137x4 double]
                    Y: [137x1 cell]
```
```
X = [1, 2; 3, 4; 5, 6];
Y = {'A'; 'B'; 'A'};
model = fitcknn (X, Y);
customLossFun = @(C, S, W, Cost) sum (W .* sum (abs (C - S), 2));
## Calculate loss using custom loss function
L = loss (model, X, Y, 'LossFun', customLossFun)

L = 0
```
```
X = [1, 2; 3, 4; 5, 6];
Y = {'A'; 'B'; 'A'};
model = fitcknn (X, Y);
## Calculate loss using 'mincost' loss function
L = loss (model, X, Y, 'LossFun', 'mincost')

L = 0
```
```
X = [1, 2; 3, 4; 5, 6];
Y = ['1'; '2'; '3'];
model = fitcknn (X, Y);
X_test = [3, 3; 5, 7];
Y_test = ['1'; '2'];
## Specify custom Weights
W = [1; 2];
L = loss (model, X_test, Y_test, 'LossFun', 'logit', 'Weights', W);
```
```
load fisheriris
mdl = fitcknn (meas, species);
X = mean (meas);
Y = {'versicolor'};
m = margin (mdl, X, Y)

m = 1
```
```
X = [1, 2; 4, 5; 7, 8; 3, 2];
Y = [2; 1; 3; 2];
## Train the model
mdl = fitcknn (X, Y);
## Specify Vars and Labels
Vars = 1;
Labels = 2;
## Calculate partialDependence
[pd, x, y] = partialDependence (mdl, Vars, Labels);
```
```
X = [1, 2; 4, 5; 7, 8; 3, 2];
Y = [2; 1; 3; 2];
## Train the model
mdl = fitcknn (X, Y);
## Specify Vars and Labels
Vars = 1;
Labels = 1;
queryPoints = [linspace(0, 1, 3)', linspace(0, 1, 3)'];
## Calculate partialDependence using queryPoints
[pd, x, y] = partialDependence (mdl, Vars, Labels, 'QueryPoints', ...
                                queryPoints)

pd = 0.2500   0.2500   0.2500
x =
        0        0
   0.5000   0.5000
   1.0000   1.0000
y = [](0x0)
```