editDistance
Compute the edit (Levenshtein) distance between strings or documents.
d = editDistance (str)
takes a cell array of character vectors and computes the Levenshtein distance between each pair of strings in str as the lowest number of grapheme insertions, deletions, and substitutions required to convert string str{1} to string str{2}. If str is a cellstr vector with N elements, the returned distance d is a column vector of doubles with (N * (N - 1)) / 2 elements, one per pair. If str is an array (that is, all (size (str) > 1) = true), it is transformed to a column vector as in str = str(:). editDistance expects str to be a column vector; if it is a row vector, it is transformed to a column vector.
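As an illustration of the single-argument syntax, here is a minimal sketch. It assumes editDistance is on the load path; the `pkg load statistics` line reflects an assumption that the function ships with the Octave statistics package. By the definition above, the pairwise distances for these strings are 2 ("AS" and "SD"), 1 ("AS" and "AD"), and 1 ("SD" and "AD"). The order of the pairs in d is not stated in the text, so the example sorts the result before showing it.

```octave
pkg load statistics   % assumption: editDistance is provided by this package

str = {"AS"; "SD"; "AD"};   % N = 3 strings, given as a column cellstr
d = editDistance (str);

numel (d)    % one entry per pair of strings, so 3 for N = 3
sort (d)'    % the pairwise distances, independent of pair ordering
```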
d = editDistance (doc)
can also take a cell array containing cell arrays of character vectors, in which case each element of doc is regarded as a document, and each character vector in that element's cell string array is regarded as a token. editDistance computes the Levenshtein distance between each pair of cell elements in doc as the lowest number of token insertions, deletions, and substitutions required to convert document doc{1} to document doc{2}. If doc is a cell vector with N elements, the distance d is a column vector of doubles with (N * (N - 1)) / 2 elements, one per pair. If doc is an array (that is, all (size (doc) > 1) = true), it is converted to a column vector as in doc = doc(:).
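To make the token-level definition concrete, the following is a self-contained sketch of the classic dynamic-programming recurrence for the Levenshtein distance, applied to tokens instead of characters. The helper token_distance is hypothetical and written only for illustration. It is not the implementation used by editDistance.

```octave
1;  % script-file marker, so the function below can be defined in a script

% Hypothetical helper (not part of editDistance): Levenshtein distance
% between two documents given as cell arrays of character vectors.
function d = token_distance (a, b)
  m = numel (a);
  n = numel (b);
  D = zeros (m + 1, n + 1);
  D(:, 1) = (0:m)';   % cost of deleting the first i tokens of a
  D(1, :) = 0:n;      % cost of inserting the first j tokens of b
  for i = 1:m
    for j = 1:n
      cost = ! strcmp (a{i}, b{j});   % 0 if the tokens match, 1 otherwise
      del = D(i, j + 1) + 1;          % delete token a{i}
      ins = D(i + 1, j) + 1;          % insert token b{j}
      sub = D(i, j) + cost;           % substitute a{i} by b{j}, or keep it
      D(i + 1, j + 1) = min ([del, ins, sub]);
    endfor
  endfor
  d = D(m + 1, n + 1);
endfunction

doc1 = {"the", "quick", "brown", "fox"};
doc2 = {"the", "slow", "brown", "dog", "runs"};
token_distance (doc1, doc2)   % two substitutions and one insertion: 3
```

Going by the description above, passing the two documents together as `{doc1, doc2}` to editDistance should yield the same value.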
C = editDistance (…, minDist)
specifies a minimum distance, minDist, which is regarded as a similarity threshold between each pair of strings or documents, as defined in the previous syntaxes. In this case, editDistance resembles the functionality of the uniquetol function and returns the unique strings or documents that are similar up to minDist distance. C is either a cellstring array or a cell array of cellstrings, depending on the first input argument.
[C, IA, IC] = editDistance (…, minDist)
also returns index vectors IA and IC. Assuming A contains either strings str or documents doc as defined above, IA is a column vector of indices to the first occurrence of similar elements such that C = A(IA), and IC is a column vector of indices such that A ~ C(IC), where ~ means that the strings or documents are within the specified distance minDist of each other.
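A short sketch of the thresholded syntax follows. It assumes editDistance is available (the `pkg load` line is an assumption about which package provides it). "hello" and "hallo" differ by a single substitution, so with minDist set to 1 one would expect them to be treated as similar. The example checks only the relations stated above, not a specific grouping.

```octave
pkg load statistics   % assumption: editDistance is provided by this package

A = {"hello"; "hallo"; "world"};
[C, IA, IC] = editDistance (A, 1);

isequal (C, A(IA))        % documented relation C = A(IA)
numel (IC) == numel (A)   % IC holds one index into C per element of A
```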
[C, IA, IC] = editDistance (…, minDist, "OutputAllIndices", value)
specifies the type of the second output index IA. value must be a logical scalar. When set to true, IA is a cell array containing the vectors of indices for ALL elements in A that are within the specified distance minDist of each other. Each cell in IA corresponds to a value in C and the values in each cell correspond to locations in A. If value is set to false, then IA is returned as an index vector, as described in the previous syntax.
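The effect of the "OutputAllIndices" option can be sketched as follows, again assuming editDistance is available through the package named in the comment. The checks mirror the description above and do not assume any particular grouping of the inputs.

```octave
pkg load statistics   % assumption: editDistance is provided by this package

A = {"hello"; "hallo"; "world"};
[C, IA] = editDistance (A, 1, "OutputAllIndices", true);

iscell (IA)               % IA is now a cell array of index vectors
numel (IA) == numel (C)   % one cell of indices per returned element of C
```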
d = editDistance (str1, str2)
can also take two character vectors, str1 and str2, and compute the Levenshtein distance d as the lowest number of grapheme insertions, deletions, and substitutions required to convert str1 to str2. str1 and str2 may also be cellstring arrays, in which case the pairwise distance is computed between str1{n} and str2{n}. The cellstring arrays must be of the same size, or one of them must be a scalar, in which case the scalar is expanded to the size of the other cellstring input. The returned distance d is a column vector with the same number of elements as the cellstring arrays. If str1 or str2 is an array, it is transformed to a column vector. editDistance expects both str1 and str2 to be column vectors; if they are not, they are transformed into column vectors.
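The two-argument string syntax might be used as in the sketch below, which assumes editDistance is available through the package named in the comment. The distances in the comments follow from the definition: "kitten" to "sitting" takes two substitutions and one insertion, and "flaw" to "lawn" takes one deletion and one insertion.

```octave
pkg load statistics   % assumption: editDistance is provided by this package

% Two character vectors
d = editDistance ("kitten", "sitting")   % distance 3 by definition

% Two cellstr column vectors of the same size, compared element by element
str1 = {"kitten"; "flaw"};
str2 = {"sitting"; "lawn"};
d = editDistance (str1, str2)            % distances 3 and 2

% A scalar cellstr is expanded to the size of the other input
d = editDistance ({"cat"; "cut"}, {"cat"})   % distances 0 and 1
```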
d = editDistance (doc1, doc2)
can also take two cell arrays containing cell arrays of character vectors, in which case each element of doc1 and doc2 is regarded as a document, and each character vector in that element's cell string array is regarded as a token. editDistance computes the pairwise Levenshtein distance between the cell elements in doc1 and doc2 as the lowest number of token insertions, deletions, and substitutions required to convert document doc1{n} to document doc2{n}.
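For the document form, a minimal sketch under the same assumption about the providing package is shown below. Each document is itself a cell array of tokens, so even a single document has to be wrapped in an outer cell array. Otherwise it would be indistinguishable from a cellstr of plain strings.

```octave
pkg load statistics   % assumption: editDistance is provided by this package

doc1 = {{"the", "quick", "brown", "fox"}; {"hello", "world"}};
doc2 = {{"the", "slow", "brown", "dog", "runs"}; {"hello", "there", "world"}};

% Token-level distance between doc1{n} and doc2{n} for each n.
% By definition: 3 for the first pair (two substitutions, one insertion)
% and 1 for the second pair (one insertion).
d = editDistance (doc1, doc2)
```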