Categories &

Functions List

Function Reference: knnsearch

statistics: idx = knnsearch (X, Y)
statistics: [idx, D] = knnsearch (X, Y)
statistics: […] = knnsearch (…, name, value)

Find k-nearest neighbors from input data.

idx = knnsearch (X, Y) finds K nearest neighbors in X for Y. It returns idx which contains indices of K nearest neighbors of each row of Y, If not specified, K = 1. X must be an N×P numeric matrix of input data, where rows correspond to observations and columns correspond to features or variables. Y is an M×P numeric matrix with query points, which must have the same numbers of column as X.

[idx, D] = knnsearch (X, Y) also returns the the distances, D, which correspond to the K nearest neighbour in X for each Y

Additional parameters can be specified by Name-Value pair arguments.

NameValue
"K"is the number of nearest neighbors to be found in the kNN search. It must be a positive integer value and by default it is 1.
"P"is the Minkowski distance exponent and it must be a positive scalar. This argument is only valid when the selected distance metric is "minkowski". By default it is 2.
"Scale"is the scale parameter for the standardized Euclidean distance and it must be a nonnegative numeric vector of equal length to the number of columns in X. This argument is only valid when the selected distance metric is "seuclidean", in which case each coordinate of X is scaled by the corresponding element of "scale", as is each query point in Y. By default, the scale parameter is the standard deviation of each coordinate in X.
"Cov"is the covariance matrix for computing the mahalanobis distance and it must be a positive definite matrix matching the the number of columns in X. This argument is only valid when the selected distance metric is "mahalanobis".
"BucketSize"is the maximum number of data points in the leaf node of the Kd-tree and it must be a positive integer. This argument is only valid when the selected search method is "kdtree".
"SortIndices"is a boolean flag to sort the returned indices in ascending order by distance and it is true by default. When the selected search method is "exhaustive" or the "IncludeTies" flag is true, knnsearch always sorts the returned indices.
"Distance"is the distance metric used by knnsearch as specified below:
"euclidean"Euclidean distance.
"seuclidean"standardized Euclidean distance. Each coordinate difference between the rows in X and the query matrix Y is scaled by dividing by the corresponding element of the standard deviation computed from X. To specify a different scaling, use the "Scale" name-value argument.
"cityblock"City block distance.
"chebychev"Chebychev distance (maximum coordinate difference).
"minkowski"Minkowski distance. The default exponent is 2. To specify a different exponent, use the "P" name-value argument.
"mahalanobis"Mahalanobis distance, computed using a positive definite covariance matrix. To change the value of the covariance matrix, use the "Cov" name-value argument.
"cosine"Cosine distance.
"correlation"One minus the sample linear correlation between observations (treated as sequences of values).
"spearman"One minus the sample Spearman’s rank correlation between observations (treated as sequences of values).
"hamming"Hamming distance, which is the percentage of coordinates that differ.
"jaccard"One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
@distfunCustom distance function handle. A distance function of the form function D2 = distfun (XI, YI), where XI is a 1×P vector containing a single observation in P-dimensional space, YI is an N×P matrix containing an arbitrary number of observations in the same P-dimensional space, and D2 is an N×P vector of distances, where (D2k) is the distance between observations XI and (YIk,:).
"NSMethod"is the nearest neighbor search method used by knnsearch as specified below.
"kdtree"Creates and uses a Kd-tree to find nearest neighbors. "kdtree" is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is "euclidean", "cityblock", "manhattan", "chebychev", or "minkowski". Otherwise, the default value is "exhaustive". This argument is only valid when the distance metric is one of the four aforementioned metrics.
"exhaustive"Uses the exhaustive search algorithm by computing the distance values from all the points in X to each point in Y.
"IncludeTies"is a boolean flag to indicate if the returned values should contain the indices that have same distance as the K^th neighbor. When false, knnsearch chooses the observation with the smallest index among the observations that have the same distance from a query point. When true, knnsearch includes all nearest neighbors whose distances are equal to the K^th smallest distance in the output arguments. To specify K, use the "K" name-value pair argument.

See also: rangesearch, pdist2, fitcknn

Source Code: knnsearch

Example: 1

 

 ## find 10 nearest neighbour of a point using different distance metrics
 ## and compare the results by plotting
 load fisheriris
 X = meas(:,3:4);
 Y = species;
 point = [5, 1.45];

 ## calculate 10 nearest-neighbours by minkowski distance
 [id, d] = knnsearch (X, point, "K", 10);

 ## calculate 10 nearest-neighbours by minkowski distance
 [idm, dm] = knnsearch (X, point, "K", 10, "distance", "minkowski", "p", 5);

 ## calculate 10 nearest-neighbours by chebychev distance
 [idc, dc] = knnsearch (X, point, "K", 10, "distance", "chebychev");

 ## plotting the results
 gscatter (X(:,1), X(:,2), species, [.75 .75 0; 0 .75 .75; .75 0 .75], ".", 20);
 title ("Fisher's Iris Data - Nearest Neighbors with different types of distance metrics");
 xlabel("Petal length (cm)");
 ylabel("Petal width (cm)");

 line (point(1), point(2), "marker", "X", "color", "k", ...
       "linewidth", 2, "displayname", "query point")
 line (X(id,1), X(id,2), "color", [0.5 0.5 0.5], "marker", "o", ...
       "linestyle", "none", "markersize", 10, "displayname", "eulcidean")
 line (X(idm,1), X(idm,2), "color", [0.5 0.5 0.5], "marker", "d", ...
       "linestyle", "none", "markersize", 10, "displayname", "Minkowski")
 line (X(idc,1), X(idc,2), "color", [0.5 0.5 0.5], "marker", "p", ...
       "linestyle", "none", "markersize", 10, "displayname", "chebychev")
 xlim ([4.5 5.5]);
 ylim ([1 2]);
 axis square;

                    
plotted figure

Example: 2

 

 ## knnsearch on iris dataset using kdtree method
 load fisheriris
 X = meas(:,3:4);
 gscatter (X(:,1), X(:,2), species, [.75 .75 0; 0 .75 .75; .75 0 .75], ".", 20);
 title ("Fisher's iris dataset : Nearest Neighbors with kdtree search");

 ## new point to be predicted
 point = [5 1.45];

 line (point(1), point(2), "marker", "X", "color", "k", ...
       "linewidth", 2, "displayname", "query point")

 ## knnsearch using kdtree method
 [idx, d] = knnsearch (X, point, "K", 10, "NSMethod", "kdtree");

 ## plotting predicted neighbours
 line (X(idx,1), X(idx,2), "color", [0.5 0.5 0.5], "marker", "o", ...
       "linestyle", "none", "markersize", 10, ...
       "displayname", "nearest neighbour")
 xlim ([4 6])
 ylim ([1 3])
 axis square
 ## details of predicted labels
 tabulate (species(idx))

 ctr = point - d(end);
 diameter = 2 * d(end);
 ##  Draw a circle around the 10 nearest neighbors.
 h = rectangle ("position", [ctr, diameter, diameter], "curvature", [1 1]);

 ## here only 8 neighbours are plotted instead of 10 since the dataset
 ## contains duplicate values

       Value    Count    Percent
   virginica        2      20.00%
  versicolor        8      80.00%
                    
plotted figure