Statistics: kstest

Function Reference: `kstest`

statistics: h = kstest (x)
statistics: h = kstest (x, name, value)
statistics: [h, p] = kstest (…)
statistics: [h, p, ksstat, cv] = kstest (…)

Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.

h = kstest (x) performs a Kolmogorov-Smirnov (K-S) test to determine if a random sample x could have come from a standard normal distribution. h indicates the results of the null hypothesis test.

h = 0 => Do not reject the null hypothesis at the 5% significance level
h = 1 => Reject the null hypothesis at the 5% significance level

x is a vector representing a random sample from some unknown distribution with a cumulative distribution function F(X). Missing values declared as NaNs in x are ignored.

h = kstest (x, name, value) returns a test decision for a single-sample K-S test with additional options specified by one or more Name-Value pair arguments as shown below.

Name		Value
`"alpha"`		A numeric scalar between 0 and 1 specifying the significance level. Default is 0.05 for 5% significance.
`"CDF"`		The hypothesized CDF under the null hypothesis. It can be specified as a function handle of an existing cdf function, a character vector defining a probability distribution with default parameters, a probability distribution object, or a two-column matrix. If not provided, the default is the standard normal, $N(0,1)$. The one-sample Kolmogorov-Smirnov test is only valid for continuous cumulative distribution functions, and requires the CDF to be predetermined. The result is not accurate if the CDF is estimated from the data.
`"tail"`		A string indicating the type of test:

	`"unequal"`		"F(X) not equal to CDF(X)" (two-sided) (Default)
	`"larger"`		"F(X) > CDF(X)" (one-sided)
	`"smaller"`		"F(X) < CDF(X)" (one-sided)

Let S(X) be the empirical c.d.f. estimated from the sample vector x, F(X) be the corresponding true (but unknown) population c.d.f., and CDF be the known input c.d.f. specified under the null hypothesis. For tail = "unequal", "larger", and "smaller", the test statistics are max|S(X) - CDF(X)|, max[S(X) - CDF(X)], and max[CDF(X) - S(X)], respectively.
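The three statistics can be worked out by hand for a small sample. The following is a minimal sketch, not the internal code of kstest: the sample values are made up, the variable names d_plus, d_minus and d are illustrative, and the null CDF is assumed to be the standard normal. Because S(X) is a step function, the largest S(X) - CDF(X) occurs at a jump and the largest CDF(X) - S(X) occurs just before one.

```octave
pkg load statistics

## Illustrative sample (made-up values), sorted in ascending order.
x = sort ([-1.2; -0.4; 0.1; 0.7; 1.5]);
n = numel (x);

## Hypothesized CDF evaluated at the observations (standard normal here).
F0 = normcdf (x);

## Empirical CDF just after and just before each jump.
S_after  = (1:n)' / n;
S_before = (0:n-1)' / n;

d_plus  = max (S_after - F0);     # max[S(X) - CDF(X)]   ("larger")
d_minus = max (F0 - S_before);    # max[CDF(X) - S(X)]   ("smaller")
d       = max (d_plus, d_minus);  # max|S(X) - CDF(X)|   ("unequal")

printf ("d_plus = %g, d_minus = %g, d = %g\n", d_plus, d_minus, d);
```

The two-sided statistic is simply the larger of the two one-sided ones, so d should match the ksstat that kstest reports for the default "unequal" tail on the same data.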

[h, p] = kstest (…) also returns the asymptotic p-value p.

[h, p, ksstat] = kstest (…) returns the K-S test statistic ksstat defined above for the test type indicated by the "tail" option.

In the matrix version of CDF, column 1 contains the x-axis data and column 2 the corresponding y-axis c.d.f data. Since the K-S test statistic will occur at one of the observations in x, the calculation is most efficient when CDF is only specified at the observations in x. When column 1 of CDF represents x-axis points independent of x, CDF is linearly interpolated at the observations found in the vector x. In this case, the interval along the x-axis (the column 1 spread of CDF) must span the observations in x for successful interpolation.
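As an illustration of the matrix form, the sketch below tests a sample against a normal distribution whose mean and standard deviation are fixed in advance. The data and the parameter values mu and sigma are hypothetical, and the CDF is evaluated only at the (sorted, distinct) observations, which is the efficient case described above.

```octave
pkg load statistics

## Made-up sample, sorted and without ties.
x = sort ([2.1; 3.4; 1.9; 2.8; 3.1; 2.5; 2.2; 3.0; 2.7; 2.4]);

## Hypothesized parameters, chosen beforehand (not estimated from x).
mu = 2.5;
sigma = 0.5;

## Column 1: x-axis points; column 2: hypothesized CDF at those points.
cdf_matrix = [x, normcdf(x, mu, sigma)];

[h, p, ksstat] = kstest (x, "CDF", cdf_matrix, "alpha", 0.01)
```

If column 1 held a grid independent of x instead, that grid would need to span the range from min (x) to max (x) so that the interpolation can succeed.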

The decision to reject the null hypothesis is based on comparing the p-value p with the "alpha" value, not on comparing the statistic ksstat with the critical value cv. cv is computed separately using an approximate formula or by interpolation using Miller’s approximation table. The formula and table cover the range 0.01 <= "alpha" <= 0.2 for two-sided tests and 0.005 <= "alpha" <= 0.1 for one-sided tests. cv is returned as NaN if "alpha" is outside this range. Since cv is approximate, a comparison of ksstat with cv may occasionally lead to a different conclusion than a comparison of p with "alpha".
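The relationship between the two decision routes can be inspected directly. This is a small sketch on made-up data: reject_by_cv and cv_out are illustrative names, and the last call picks an "alpha" of 0.5 only to show the NaN critical value described above.

```octave
pkg load statistics

x = [-1.2; -0.4; 0.1; 0.7; 1.5];   # made-up sample

alpha = 0.05;
[h, p, ksstat, cv] = kstest (x, "alpha", alpha);

## The returned decision h follows from p and alpha.
printf ("h = %d, p = %g, alpha = %g\n", h, p, alpha);

## Alternative check against the approximate critical value; it may
## occasionally disagree with h.
reject_by_cv = ksstat > cv;
printf ("ksstat = %g, cv = %g, ksstat > cv: %d\n", ksstat, cv, reject_by_cv);

## An alpha outside the covered range (0.01 to 0.2 for a two-sided test).
[~, ~, ~, cv_out] = kstest (x, "alpha", 0.5);
printf ("cv is NaN at alpha = 0.5: %d\n", isnan (cv_out));
```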

See also: kstest2, cdfplot


Example: 1


 ## Use the stock return data set to test the null hypothesis that the data
 ## come from a standard normal distribution against the alternative
 ## hypothesis that the population CDF of the data is larger than the
 ## standard normal CDF.

 load stockreturns;
 x = stocks(:,2);
 [h, p, k, c] = kstest (x, "Tail", "larger")

 ## Compute the empirical CDF and plot against the standard normal CDF
 [f, x_values] = ecdf (x);
 h1 = plot (x_values, f);
 hold on;
 h2 = plot (x_values, normcdf (x_values), 'r--');
 set (h1, "LineWidth", 2);
 set (h2, "LineWidth", 2);
 legend ([h1, h2], "Empirical CDF", "Standard Normal CDF", ...
         "Location", "southeast");
 title ("Empirical CDF of stock return data against standard normal CDF")

h = 1
p = 0.015286
k = 0.1428
c = 0.1207
