Function Reference: kstest

statistics: h = kstest (x)
statistics: h = kstest (x, name, value)
statistics: [h, p] = kstest (…)
statistics: [h, p, ksstat, cv] = kstest (…)

Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.

h = kstest (x) performs a Kolmogorov-Smirnov (K-S) test to determine if a random sample x could have come from a standard normal distribution. h indicates the result of the hypothesis test:

  • h = 0 => Do not reject the null hypothesis at the 5% significance level.
  • h = 1 => Reject the null hypothesis at the 5% significance level.

x is a vector representing a random sample from some unknown distribution with a cumulative distribution function F(X). Missing values declared as NaNs in x are ignored.
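A minimal usage sketch of the single-argument form follows. It assumes the statistics package is installed and loadable with pkg load; the sample is random, so the value of h can differ from run to run.

```octave
pkg load statistics           % kstest is provided by the statistics package

x = randn (100, 1);           % sample drawn from N(0,1), so the null hypothesis holds
x(5) = NaN;                   % a missing value; per the text above it is ignored
h = kstest (x);               % h = 0: do not reject, h = 1: reject (5% level)
printf ("h = %d\n", h);
```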

h = kstest (x, name, value) returns a test decision for a single-sample K-S test with additional options specified by one or more name-value pair arguments as shown below.

"alpha"   A value alpha between 0 and 1 specifying the significance level. Default is 0.05 for 5% significance.
"CDF"     The c.d.f. under the null hypothesis. It can be specified as a function handle, as the name of an existing cdf function, or as a two-column matrix. If not provided, the default is the standard normal, N(0,1).
"tail"    A string indicating the type of test:
            "unequal"   "F(X) not equal to CDF(X)" (two-sided) (Default)
            "larger"    "F(X) > CDF(X)" (one-sided)
            "smaller"   "F(X) < CDF(X)" (one-sided)
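The options can be combined in one call, as in the sketch below. The handle name expcdf1 and the exponential sample are illustrative assumptions, not part of the package. The sketch also assumes that a handle passed as "CDF" is evaluated on the whole sample vector at once.

```octave
pkg load statistics

x = -log (rand (50, 1));            % Exp(1) sample via inverse transform (base Octave)
expcdf1 = @(t) 1 - exp (-t);        % hypothetical helper: c.d.f. of Exp(1)

% Two-sided test of x against Exp(1) at the 1% significance level
h = kstest (x, "CDF", expcdf1, "alpha", 0.01, "tail", "unequal");
printf ("h = %d\n", h);
```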

Let S(X) be the empirical c.d.f. estimated from the sample vector x, F(X) be the corresponding true (but unknown) population c.d.f., and CDF be the known input c.d.f. specified under the null hypothesis. For tail = "unequal", "larger", and "smaller", the test statistics are max|S(X) - CDF(X)|, max[S(X) - CDF(X)], and max[CDF(X) - S(X)], respectively.
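The three statistics can be reproduced by hand, which may help when checking results. The sketch below uses only base Octave, a made-up five-point sample, and the standard normal as the null c.d.f. (written with erfc). It illustrates the definitions and is not the package's own code.

```octave
x = [0.7; -1.2; 1.5; -0.4; 0.1];        % small made-up sample
xs = sort (x(:));
n = numel (xs);

cdf0 = 0.5 * erfc (-xs / sqrt (2));     % N(0,1) c.d.f. at the sorted observations
S_hi = (1:n)' / n;                      % empirical c.d.f. at each observation
S_lo = (0:n-1)' / n;                    % empirical c.d.f. just below each observation

d_larger  = max (S_hi - cdf0);          % max[S(X) - CDF(X)]
d_smaller = max (cdf0 - S_lo);          % max[CDF(X) - S(X)]
d_unequal = max (d_larger, d_smaller);  % max|S(X) - CDF(X)|
printf ("%.4f %.4f %.4f\n", d_unequal, d_larger, d_smaller);
```

Because S(X) is a step function, each one-sided maximum is attained either at an observation or just below it, which is why both S_hi and S_lo are needed.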

[h, p] = kstest (…) also returns the asymptotic p-value p.

[h, p, ksstat] = kstest (…) also returns the K-S test statistic ksstat defined above for the test type indicated by the "tail" option.

In the matrix version of CDF, column 1 contains the x-axis data and column 2 the corresponding y-axis c.d.f. data. Since the K-S test statistic will occur at one of the observations in x, the calculation is most efficient when CDF is specified only at the observations in x. When column 1 of CDF represents x-axis points independent of x, CDF is linearly interpolated at the observations found in the vector x. In this case, the range of column 1 of CDF must span all observations in x for the interpolation to succeed.
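The interpolation described above can be pictured with interp1 from base Octave. The grid xg and the matrix name cdfmat are illustrative assumptions. Whether kstest calls interp1 internally is not stated here, so the sketch only mirrors the described behaviour.

```octave
x = [-0.8; 0.2; 1.1];                         % made-up observations
xg = (-3:0.5:3)';                             % column 1: x-axis points spanning x
cdfmat = [xg, 0.5 * erfc(-xg / sqrt (2))];    % column 2: N(0,1) c.d.f. at xg

% Linear interpolation of the tabulated c.d.f. at the observations
cdf_at_x = interp1 (cdfmat(:,1), cdfmat(:,2), x);

% A point outside the tabulated range cannot be interpolated
outside = interp1 (cdfmat(:,1), cdfmat(:,2), 4);
printf ("%d\n", isnan (outside));
```

A matrix such as cdfmat is what would be passed as the "CDF" value. The out-of-range case shows why column 1 has to cover every observation.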

The decision to reject the null hypothesis is based on comparing the p-value p with the "alpha" value, not on comparing the statistic ksstat with the critical value cv. cv is computed separately, using an approximate formula or by interpolation in Miller’s approximation table. The formula and table cover the range 0.01 <= "alpha" <= 0.2 for two-sided tests and 0.005 <= "alpha" <= 0.1 for one-sided tests. cv is returned as NaN if "alpha" is outside this range. Since cv is approximate, a comparison of ksstat with cv may occasionally lead to a different conclusion than a comparison of p with "alpha".
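The two decision routes can be placed side by side, as sketched below. It assumes the statistics package is loaded. The variable names lvl, by_p and by_cv are illustrative, and the exact inequality kstest applies at p equal to "alpha" is not specified here.

```octave
pkg load statistics

x = randn (40, 1);                  % random sample, results vary per run
lvl = 0.05;                         % significance level passed as "alpha"
[h, p, ksstat, cv] = kstest (x, "alpha", lvl);

by_p  = (p < lvl);                  % route based on the p-value
by_cv = (ksstat > cv);              % route based on the approximate critical value
printf ("h = %d, by_p = %d, by_cv = %d\n", h, by_p, by_cv);

% "alpha" outside the tabulated range: cv is expected to be NaN
[~, ~, ~, cv_out] = kstest (x, "alpha", 0.5);
printf ("cv_out is NaN: %d\n", isnan (cv_out));
```

In most runs by_p and by_cv agree. A mismatch near the threshold reflects the approximation in cv rather than an error in the test.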

See also: kstest2, cdfplot
