ClassificationKNN
statistics: ClassificationKNN
K-nearest neighbors classification
The ClassificationKNN class implements a K-nearest neighbor classifier
object, which can predict responses for new data using the predict
method. The implemented algorithm allows you to choose among a range of
distance metrics, the number of nearest neighbors, as well as the search
algorithm.
The K-nearest neighbors (k-NN) classifier is a simple, non-parametric machine learning algorithm used for classification tasks. It classifies a data point based on the majority class of its k closest neighbors in the feature space.
Create a ClassificationKNN object by using the fitcknn
function or the class constructor.
See also: fitcknn
Source Code: ClassificationKNN
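As a quick illustration, here is a minimal sketch of training and querying a k-NN classifier on the Fisher iris dataset (assuming the statistics package is loaded):
## Train a 5-nearest-neighbor classifier and predict the class
## label of the mean observation
load fisheriris
obj = fitcknn (meas, species, "NumNeighbors", 5);
label = predict (obj, mean (meas))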
A numeric matrix containing the unstandardized predictor data. Each column of X represents one predictor (variable), and each row represents one observation. This property is read-only.
A logical or numeric column vector, or a character array or cell array of character vectors, with the same number of rows as the predictor data. Each row in Y is the observed class label for the corresponding row in X. This property is read-only.
A positive integer value specifying the number of observations in the training dataset used for training the ClassificationKNN model. This property is read-only.
A logical column vector with the same length as the number of observations in the original predictor data X, specifying which rows have been used for fitting the ClassificationKNN model. This property is read-only.
A positive integer value specifying the number of predictors in the training dataset used for training the ClassificationKNN model. This property is read-only.
A cell array of character vectors specifying the names of the predictor variables. The names are in the order in which they appear in the training dataset. This property is read-only.
A character vector specifying the name of the response variable Y. This property is read-only.
An array of unique values of the response variable Y, which has the
same data type as the data in Y. This property is read-only.
ClassNames can be numeric, logical, a character array, or a cell array of character vectors, matching the type of the class labels in Y.
A square matrix specifying the cost of misclassification of a point.
Cost(i,j) is the cost of classifying a point into class j
if its true class is i (that is, the rows correspond to the true
class and the columns correspond to the predicted class). The order of
the rows and columns in Cost corresponds to the order of the
classes in ClassNames. The number of rows and columns in
Cost is the number of unique classes in the response. By
default, Cost(i,j) = 1 if i != j, and
Cost(i,j) = 0 if i = j. In other words, the cost is 0
for correct classification and 1 for incorrect classification.
Add or change the Cost property using dot notation as in:
obj.Cost = costMatrix
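For example, the following hypothetical sketch assumes a trained 3-class model obj and makes errors on true class 1 five times costlier than the default:
costMatrix = ones (3) - eye (3);  ## default: 0 on diagonal, 1 elsewhere
costMatrix(1, 2:3) = 5;           ## penalize misclassifying true class 1
obj.Cost = costMatrix;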
A numeric vector specifying the prior probabilities for each class. The
order of the elements in Prior corresponds to the order of the
classes in ClassNames.
Add or change the Prior property using dot notation as in:
obj.Prior = priorVector
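For instance, assuming a trained 3-class model obj where the first class is believed to be twice as frequent as the others:
obj.Prior = [0.5; 0.25; 0.25];  ## ordered as in obj.ClassNames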
Specified as a function handle for transforming the classification
scores. Add or change the ScoreTransform property using dot
notation as in:
obj.ScoreTransform = 'function_name'
obj.ScoreTransform = @function_handle
When specified as a character vector, it can be any of the following
built-in functions. Nevertheless, the ScoreTransform property
always stores their function handle equivalent.
| Value | Description | |
|---|---|---|
"doublelogit" | 1 ./ (1 + exp (-2 * x)) | |
"invlogit" | log (x ./ (1 - x)) | |
"ismax" | Sets the score for the class with the largest score to 1, and the scores for all other classes to 0 | |
"logit" | 1 ./ (1 + exp (-x)) | |
"none" | x (no transformation) | |
"identity" | x (no transformation) | |
"sign" | -1 for x < 0, 0 for x = 0, 1 for x > 0 | |
"symmetric" | 2 * x - 1 | |
"symmetricismax" | Sets the score for the class with the largest score to 1, and the scores for all other classes to -1 | |
"symmetriclogit" | 2 ./ (1 + exp (-x)) - 1 |
A logical scalar specifying whether the predictor data in X have been standardized prior to training. This property is read-only.
A numeric vector of the same length as the columns in X with the
standard deviations corresponding to each predictor. If the predictor
variables have not been standardized, then "obj.Sigma" is empty.
This property is read-only.
A numeric vector of the same length as the columns in X with the
mean values corresponding to each predictor. If the predictor variables
have not been standardized, then "obj.Mu" is empty. This
property is read-only.
A character vector specifying the tie-breaking algorithm used by the
predict method, when multiple classes have the same smallest cost.
It can be one of the following:
"smallest" (default), which favors the class with the
smallest index among the tied groups, i.e. the one that appears first in
the training labelled data.
"nearest", which favors the class with the nearest neighbor
among the tied groups, i.e. the class with the closest member point
according to the distance metric used.
"random", which randomly picks one class among the tied
groups.
The tie-breaking algorithm is only used when IncludeTies is
false. Change the BreakTies property using dot notation
as in:
obj.BreakTies = algorithm
A positive integer value specifying the number of nearest neighbors in X
used to classify each point during prediction. Change the
NumNeighbors property using dot notation as in:
obj.NumNeighbors = newNumNeighbors
A character vector specifying the distance metric used by the
neighbor-searcher method. See the available distance metrics in
knnsearch for more info. Change the Distance property
using dot notation as in:
obj.Distance = newDistance
A character vector or a function handle specifying the distance weighting function, which can be any of the following values:
"equal", which corresponds to @(d) d.
"inverse", which corresponds to @(d) 1/d.
"squaredinverse", which corresponds to @(d) 1/d.^2.
@fcn, which is a function handle that accepts a matrix of
nonnegative distances, and returns a matrix the same size containing
nonnegative distance weights.
Change the DistanceWeight property
using dot notation as in:
obj.DistanceWeight = newDistanceWeight
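For example, the Gaussian kernel below is an arbitrary illustrative choice of a custom weighting function, not a built-in option:
obj.DistanceWeight = "squaredinverse";     ## built-in weighting scheme
obj.DistanceWeight = @(d) exp (-d .^ 2);   ## custom kernel weighting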
A positive definite covariance matrix, a positive scalar, or a vector of positive scale values specifying the parameter for the corresponding distance metric as shown below:
"mahalanobis" accepts a positive definite covariance matrix.
"minkowski" accepts a positive scalar as the Minkowski
distance exponent.
"seuclidean" accepts a vector of positive scale values of
equal length as the number of predictors in X.
For any other distance metric, DistParameter is empty
([]). Change the DistParameter property using dot
notation as in:
obj.DistParameter = distParam
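For instance, a sketch of switching to the Minkowski metric with a cubic exponent:
obj.Distance = "minkowski";  ## this metric reads DistParameter as its exponent
obj.DistParameter = 3;       ## Minkowski exponent (the default is 2)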
A character vector specified as either "kdtree", which creates
and uses a Kd-tree to find nearest neighbors, or "exhaustive",
which uses the exhaustive search algorithm by computing the distance
values from all points in X to find nearest neighbors.
Change the NSMethod property using dot notation as in:
obj.NSMethod = newNSMethod
A logical scalar specifying whether prediction includes all the neighbors
whose distance values are equal to the smallest distance. If
IncludeTies is true, prediction includes all of these
neighbors. Otherwise, prediction uses exactly NumNeighbors neighbors.
Change the IncludeTies property using dot notation as in:
obj.IncludeTies = flag
A positive integer scalar specifying the maximum number of data points in
the leaf node of the Kd-tree. BucketSize only applies when the
NSMethod property is "kdtree".
Change the BucketSize property using dot notation as in:
obj.BucketSize = maxnum
statistics: obj = ClassificationKNN (X, Y)
statistics: obj = ClassificationKNN (…, name, value)
obj = ClassificationKNN (X, Y) returns a
ClassificationKNN object, with X as the predictor data and Y
containing the class labels of observations in X.
X must be a numeric matrix of input data where rows
correspond to observations and columns correspond to features or
variables. X will be used to train the kNN model.
Y is a matrix or cell array containing the class labels
corresponding to the predictor data in X. Y can contain any type
of categorical data. Y must have the same number of rows as X.
obj = ClassificationKNN (…, name, value)
returns a ClassificationKNN object with parameters specified by the
following name, value paired input arguments:
| Name | Value | |
|---|---|---|
'PredictorNames' | A cell array of character vectors specifying the names of the predictors. The length of this array must match the number of columns in X. | |
'ResponseName' | A character vector specifying the name of the response variable. | |
'ClassNames' | Names of the classes in the class
labels, Y, used for fitting the kNN model.
ClassNames are of the same type as the class labels in Y. | |
'Cost' | A numeric square matrix containing the
misclassification costs, with as many rows and columns as there are unique
classes in Y. Cost(i,j) is the cost of classifying a point
into class j when its true class is i. By default,
Cost(i,j) = 1 if i != j, and Cost(i,j) = 0 if
i = j, that is, the cost is 0 for correct classification and 1 for
misclassification. The cost matrix can be altered after training by using
obj.Cost = someCost. | |
'Prior' | A numeric vector specifying the prior
probabilities for each class. The order of the elements in Prior
corresponds to the order of the classes in ClassNames.
Alternatively, you can specify "empirical" to use the empirical
class probabilities or "uniform" to assume equal class
probabilities. | |
'ScoreTransform' | A user-defined function handle
or a character vector specifying one of the following built-in functions
for transforming the predicted classification scores.
Supported values include 'doublelogit', 'invlogit',
'ismax', 'logit', 'none', 'identity',
'sign', 'symmetric', 'symmetricismax', and
'symmetriclogit'. | |
"BreakTies" | A character vector specifying the
tie-breaking algorithm used by predict method, when multiple
classes have the same smallest cost. Available options are
"smallest" (default), which uses the smallest index among tied
groups, "nearest", which uses the class with the nearest neighbor
among tied groups, and "random", which randomly selects one of
the tied groups. | |
"NumNeighbors" | A positive integer value that specifies the number of nearest neighbors to be found in the kNN search algorithm for classifying each point during prediction. By default, it is 1. | |
"Distance" | Any valid distance metric supported by
the pdist2 function. Note that the allowable distrance metrics
depend on the selected nearest neighbor search method. | |
"DistanceWeight" | Either a distance weighting
function, specified either as a function handle, which accepts a matrix
of nonnegative distances and returns a matrix the same size containing
nonnegative distance weights, or a character vector with one of the
following values: "equal", which corresponds to no weighting;
"inverse", which corresponds to a weight equal to
; "squaredinverse", which corresponds to a
weight equal to . | |
"Cov" | A square matrix with the same number of
columns X specifying the covariance matrix for computing the
mahalanobis distance. This must be a positive definite matrix matching.
This argument is only valid when the selected distance metric is
"mahalanobis". | |
"Exponent" | A positive scalar (usually an integer)
specifying the Minkowski distance exponent. This argument is only valid
when the selected distance metric is "minkowski". By default,
it is 2. | |
"Scale" | A nonnegative numeric vector specifying
the scale parameters for the standardized Euclidean distance. The vector
length must be equal to the number of columns in X. This argument
is only valid when the selected distance metric is "seuclidean",
in which case each coordinate of X is scaled by the corresponding
element of "scale", as is each query point in Y. By
default, the scale parameter is the standard deviation of each coordinate
in X. If a variable in X is constant, i.e. zero variance,
this value is forced to 1 to avoid division by zero. This is the
equivalent of this variable not being standardized. | |
"NSMethod" | A character vector specifying the
nearest neighbor search method used by knnsearch, which can be
"kdtree" or "exhaustive". See knnsearch for more
information about default values and allowable distance metrics for each
search method. | |
"BucketSize" | A positive integer value specifying
the maximum number of data points in the leaf node of the Kd-tree. This
argument is meaningful only when the selected nearest neighbor search
method is "kdtree". By default, it is 50. |
See also: fitcknn, knnsearch, rangesearch, pdist2
ClassificationKNN: labels = predict (obj, XC)
ClassificationKNN: [labels, scores, cost] = predict (obj, XC)
labels = predict (obj, XC) returns the matrix of
labels predicted for the corresponding instances in XC, using the
predictor data in obj.X and the corresponding labels, obj.Y,
stored in the k-Nearest Neighbor classification model, obj.
[labels, scores, cost] = predict (obj,
XC) also returns scores, which contains the predicted class
scores or posterior probabilities for each instance of the corresponding
unique classes, and cost, which is a matrix containing the expected
cost of the classifications. By default, scores returns the
posterior probabilities for KNN models, unless a specific ScoreTransform
function has been specified. See fitcknn for more info.
Note: predict explicitly uses "exhaustive" as the
nearest neighbor search method, due to the very slow implementation of
"kdtree" in the knnsearch function.
See also: fitcknn, ClassificationKNN, knnsearch
ClassificationKNN: L = loss (obj, X, Y)
ClassificationKNN: L = loss (…, name, value)
L = loss (obj, X, Y) computes the loss,
L, using the default loss function 'mincost'.
obj must be a trained ClassificationKNN object.
X must be a numeric matrix of input data where rows
correspond to observations and columns correspond to features or
variables.
Y is a matrix or cell array containing the class labels
corresponding to the predictor data in X. Y must have the
same number of rows as X.
L = loss (…, name, value) allows
additional options specified by name-value pairs:
| Name | Value | |
|---|---|---|
"LossFun" | Specifies the loss function to use.
Can be a function handle with four input arguments (C, S, W, Cost)
which returns a scalar value or one of:
’binodeviance’, ’classifcost’, ’classiferror’, ’exponential’,
’hinge’, ’logit’,’mincost’, ’quadratic’.
| |
"Weights" | Specifies observation weights, must be
a numeric vector of length equal to the number of rows in X.
Default is ones (size (X, 1)). loss normalizes the weights so that
observation weights in each class sum to the prior probability of that
class. When you supply Weights, loss computes the weighted
classification loss. |
See also: fitcknn, ClassificationKNN
ClassificationKNN: m = margin (obj, X, Y)
m = margin (obj, X, Y) returns the classification margins
for obj with predictor data X and class labels Y.
obj must be a trained ClassificationKNN object.
X must be a numeric matrix of input data where rows
correspond to observations and columns correspond to features or
variables.
Y is a matrix or cell array containing the class labels
corresponding to the predictor data in X. Y must have the
same number of rows as X.
The classification margin for each observation is the difference between the classification score for the true class and the maximal classification score for the false classes.
See also: fitcknn, ClassificationKNN
ClassificationKNN: [pd, x, y] = partialDependence (obj, Vars, Labels)
ClassificationKNN: [pd, x, y] = partialDependence (…, Data)
ClassificationKNN: [pd, x, y] = partialDependence (…, name, value)
[pd, x, y] = partialDependence (obj, Vars,
Labels)
computes the partial dependence of the classification scores on the
variables Vars for the specified class Labels.
obj is a trained ClassificationKNN object.
Vars is a vector of positive integers, character vector,
string array, or cell array of character
vectors representing predictor variables (it can be indices of
predictor variables in obj.X).
Labels is a character vector, logical vector, numeric vector,
or cell array of character vectors representing class labels,
specified as a column vector.
[pd, x, y] = partialDependence (…, Data)
specifies new predictor data to use for computing the partial dependence.
[pd, x, y] = partialDependence (…, name,
value) allows additional options specified by name-value pairs:
| Name | Value | |
|---|---|---|
"NumObservationsToSample" | Number of observations to sample. Must be a positive integer. Defaults to the number of observations in the training data. | |
"QueryPoints" | Points at which to evaluate the partial dependence. Must be a numeric column vector, numeric two-column matrix, or cell array of character column vectors. | |
"UseParallel" | Logical value indicating
whether to perform computations in parallel.
Defaults to false. |
pd: Partial dependence values.
x: Query points for the first predictor variable in Vars.
y: Query points for the second predictor variable in
Vars (if applicable).
See also: fitcknn, ClassificationKNN
ClassificationKNN: CVMdl = crossval (obj)
ClassificationKNN: CVMdl = crossval (…, Name, Value)
CVMdl = crossval (obj) returns a cross-validated model
object, CVMdl, from a trained model, obj, using 10-fold
cross-validation by default.
CVMdl = crossval (obj, name, value)
specifies additional name-value pair arguments to customize the
cross-validation process.
| Name | Value | |
|---|---|---|
"KFold" | Specify the number of folds to use in
k-fold cross-validation. "KFold", k, where k is an
integer greater than 1. | |
"Holdout" | Specify the fraction of the data to
hold out for testing. "Holdout", p, where p is a
scalar in the range . | |
"Leaveout" | Specify whether to perform
leave-one-out cross-validation. "Leaveout", Value, where
Value is ’on’ or ’off’. | |
"CVPartition" | Specify a cvpartition
object used for cross-validation. "CVPartition", cv, where
isa (cv, "cvpartition") = 1. |
See also: fitcknn, ClassificationKNN, cvpartition, ClassificationPartitionedModel
ClassificationKNN: savemodel (obj, filename)
savemodel (obj, filename) saves each property of a
ClassificationKNN object into an Octave binary file, the name of which is
specified in filename, along with an extra variable, which defines
the type of classification object these variables constitute. Use
loadmodel in order to load a classification object into Octave's
workspace.
See also: loadmodel, fitcknn, ClassificationKNN
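A minimal sketch of saving and reloading a trained model; the filename "knn_iris.mdl" is an arbitrary choice for illustration:
load fisheriris
obj = fitcknn (meas, species, "NumNeighbors", 5);
savemodel (obj, "knn_iris.mdl")       ## write all properties to a binary file
obj2 = loadmodel ("knn_iris.mdl");    ## restore the object (see loadmodel)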
load fisheriris
x = meas;
y = species;
xc = [min(x); mean(x); max(x)];
obj = fitcknn (x, y, "NumNeighbors", 5, "Standardize", 1);
[label, score, cost] = predict (obj, xc)
label =
3x1 cell array
{'versicolor'}
{'versicolor'}
{'virginica' }
score =
0.4000 0.6000 0
0 1.0000 0
0 0 1.0000
cost =
0.6000 0.4000 1.0000
1.0000 0 1.0000
1.0000 1.0000 0
load fisheriris
x = meas;
y = species;
obj = fitcknn (x, y, "NumNeighbors", 5, "Standardize", 1);
## Create a cross-validated model
CVMdl = crossval (obj)
CVMdl =
ClassificationPartitionedModel object with properties:
BinEdges: []
CategoricalPredictors: []
X: [5.1000, 3.5000, 1.4000, 0.2000; 4.9000, 3, 1.4000, 0.2000; 4.7000, 3.2000, ...]
Y: [150x1 cell]
ClassNames: [3x1 cell]
Cost: [0, 1, 1; 1, 0, 1; 1, 1, 0]
CrossValidatedModel: 'ClassificationKNN'
KFold: 10
ModelParameters: [1x1 struct]
NumObservations: 150
Partition: [1x1 cvpartition]
PredictorNames: [1x4 cell]
Prior: [0.3333; 0.3333; 0.3333]
ResponseName: "Y"
ScoreTransform: [1x1 function_handle]
Standardize: 1
Trained: [10x1 cell]
load fisheriris
x = meas;
y = species;
covMatrix = cov (x);
## Fit the k-NN model using the 'mahalanobis' distance
## and the custom covariance matrix
obj = fitcknn(x, y, 'NumNeighbors', 5, 'Distance','mahalanobis', ...
'Cov', covMatrix);
## Create a partition model using cvpartition
Partition = cvpartition (size (x, 1), 'kfold', 12);
## Create cross-validated model using 'cvPartition' name-value argument
CVMdl = crossval (obj, 'cvPartition', Partition)
## Access the trained model from first fold of cross-validation
CVMdl.Trained{1}
CVMdl =
ClassificationPartitionedModel object with properties:
BinEdges: []
CategoricalPredictors: []
X: [5.1000, 3.5000, 1.4000, 0.2000; 4.9000, 3, 1.4000, 0.2000; 4.7000, 3.2000, ...]
Y: [150x1 cell]
ClassNames: [3x1 cell]
Cost: [0, 1, 1; 1, 0, 1; 1, 1, 0]
CrossValidatedModel: 'ClassificationKNN'
KFold: 12
ModelParameters: [1x1 struct]
NumObservations: 150
Partition: [1x1 cvpartition]
PredictorNames: [1x4 cell]
Prior: [0.3333; 0.3333; 0.3333]
ResponseName: "Y"
ScoreTransform: [1x1 function_handle]
Standardize: 0
Trained: [12x1 cell]
ans =
ClassificationKNN
ResponseName: 'Y'
ClassNames: {'setosa' 'versicolor' 'virginica'}
ScoreTransform: 'custom function handle'
NumObservations: 137
NumPredictors: 4
Distance: 'mahalanobis'
NSMethod: 'exhaustive'
NumNeighbors: 5
X = [1, 2; 3, 4; 5, 6];
Y = {'A'; 'B'; 'A'};
model = fitcknn (X, Y);
customLossFun = @(C, S, W, Cost) sum (W .* sum (abs (C - S), 2));
## Calculate loss using custom loss function
L = loss (model, X, Y, 'LossFun', customLossFun)
L = 0
X = [1, 2; 3, 4; 5, 6];
Y = {'A'; 'B'; 'A'};
model = fitcknn (X, Y);
## Calculate loss using 'mincost' loss function
L = loss (model, X, Y, 'LossFun', 'mincost')
L = 0
X = [1, 2; 3, 4; 5, 6];
Y = ['1'; '2'; '3'];
model = fitcknn (X, Y);
X_test = [3, 3; 5, 7];
Y_test = ['1'; '2'];
## Specify custom Weights
W = [1; 2];
L = loss (model, X_test, Y_test, 'LossFun', 'logit', 'Weights', W);
load fisheriris
mdl = fitcknn (meas, species);
X = mean (meas);
Y = {'versicolor'};
m = margin (mdl, X, Y)
m = 1
X = [1, 2; 4, 5; 7, 8; 3, 2];
Y = [2; 1; 3; 2];
## Train the model
mdl = fitcknn (X, Y);
## Specify Vars and Labels
Vars = 1;
Labels = 2;
## Calculate partialDependence
[pd, x, y] = partialDependence (mdl, Vars, Labels);
X = [1, 2; 4, 5; 7, 8; 3, 2];
Y = [2; 1; 3; 2];
## Train the model
mdl = fitcknn (X, Y);
## Specify Vars and Labels
Vars = 1;
Labels = 1;
queryPoints = [linspace(0, 1, 3)', linspace(0, 1, 3)'];
## Calculate partialDependence using queryPoints
[pd, x, y] = partialDependence (mdl, Vars, Labels, 'QueryPoints', ...
                                queryPoints)
pd =
0.2500 0.2500 0.2500
x =
0 0
0.5000 0.5000
1.0000 1.0000
y = [](0x0)