Statistics: chi2gof

Function Reference: `chi2gof`

statistics: h = chi2gof (x)
statistics: [h, p] = chi2gof (x)
statistics: [h, p, stats] = chi2gof (x)
statistics: […] = chi2gof (x, Name, Value, …)

Chi-square goodness-of-fit test.

chi2gof performs a chi-square goodness-of-fit test for discrete or continuous distributions. The test is performed by grouping the data into bins, calculating the observed and expected counts for those bins, and computing the chi-square test statistic $$ \chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} $$ where O_i is the observed count and E_i is the expected count in bin i, and N is the number of bins. This test statistic has an approximate chi-square distribution when the counts are sufficiently large.
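As a minimal sketch of the formula above, the statistic can be computed directly from the observed and expected counts. The counts below are the pooled `O` and `E` values printed in the output of Example 3; the variable names are illustrative only.

```octave
% Observed and expected counts per bin (pooled values from Example 3).
O = [6 16 10 12 6];
E = [7.0429 13.8041 13.5280 8.8383 6.0284];

% Chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i.
chi2stat = sum ((O - E) .^ 2 ./ E);
printf ("chi2stat = %.4f\n", chi2stat);
```

The result should agree with the `chi2stat` reported in Example 3, up to the rounding of the printed counts.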

Bins in either tail with an expected count less than 5 are pooled with neighboring bins until the count in each extreme bin is at least 5. If bins remain in the interior with counts less than 5, chi2gof displays a warning. In that case, you should use fewer bins, or provide bin centers or bin edges, to increase the expected counts in all bins.
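The tail-pooling rule can be sketched as follows. This is an illustration under stated assumptions rather than the code chi2gof actually runs: the loop structure and variable names are hypothetical, and the data are the Poisson counts from Example 3, with the Poisson probabilities written out by hand so that only core Octave is needed.

```octave
% Observed counts for the values 0..5 (data from Example 3).
bins = 0:5;
O = [6 16 10 12 4 2];
n = sum (O);

% Expected counts under a Poisson model with the estimated mean.
lambdaHat = sum (bins .* O) / n;
E = n * exp (-lambdaHat) * lambdaHat .^ bins ./ factorial (bins);

emin = 5;   % minimum expected count allowed in an extreme bin

% Pool the leftmost bin into its neighbor until it is large enough.
while (numel (E) > 1 && E(1) < emin)
  E(2) = E(2) + E(1);
  O(2) = O(2) + O(1);
  E(1) = [];
  O(1) = [];
endwhile

% Do the same from the right tail.
while (numel (E) > 1 && E(end) < emin)
  E(end-1) = E(end-1) + E(end);
  O(end-1) = O(end-1) + O(end);
  E(end) = [];
  O(end) = [];
endwhile

disp (O);
disp (E);
```

With these data only the two rightmost bins are merged, which reproduces the five-bin `O` and `E` vectors shown in the output of Example 3.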

h = chi2gof (x) performs a chi-square goodness-of-fit test of the null hypothesis that the data in the vector x are a random sample from a normal distribution with mean and variance estimated from x. The result is h = 0 if the null hypothesis cannot be rejected at the 5% significance level, or h = 1 if it can be rejected at the 5% level. By default, chi2gof uses 10 bins ("nbins") and compares the test statistic to a chi-square distribution with nbins - 3 degrees of freedom, to account for the two estimated parameters.

[h, p] = chi2gof (x) also returns the p-value p, which is the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. If there are not enough degrees of freedom to carry out the test, p is NaN.
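To make the decision rule concrete, the sketch below recomputes the p-value and the test decision from a statistic and its degrees of freedom. It uses the values printed in Example 1 (8 bins after pooling, 2 estimated parameters) and core Octave's `gammainc` for the upper tail of the chi-square distribution. This is an equivalent hand calculation, not necessarily how chi2gof evaluates it internally.

```octave
chi2stat = 4.0212;   % statistic reported in Example 1
nbins = 8;           % number of bins left after pooling
nparams = 2;         % mean and standard deviation were estimated
alpha = 0.05;        % significance level

% Degrees of freedom: bins minus one, minus the estimated parameters.
df = nbins - 1 - nparams;

% Upper-tail chi-square probability: P(X >= chi2stat) for X ~ chi2(df).
p = gammainc (chi2stat / 2, df / 2, "upper");

% Reject the null hypothesis only when p falls below alpha.
h = double (p < alpha);
printf ("df = %d, p = %.4f, h = %d\n", df, p, h);
```

The computed `df`, `p`, and `h` should match the values shown in the Example 1 output, up to rounding of the printed statistic.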

[h, p, stats] = chi2gof (x) also returns a stats structure with the following fields:

	"chi2stat"	Chi-square statistic
	"df"	Degrees of freedom
	"edges"	Vector of bin edges after pooling
	"O"	Observed count in each bin
	"E"	Expected count in each bin

[…] = chi2gof (x, Name, Value, …) specifies optional Name/Value pair arguments chosen from the following list.

	Name	Value
	`"nbins"`	The number of bins to use. Default is 10.
	`"binctrs"`	A vector of bin centers.
	`"binedges"`	A vector of bin edges.
	`"cdf"`	A fully specified cumulative distribution function, given either as a function handle or as a cell array whose first element is a function handle and whose remaining elements are its parameter values. The function must take `x` values as its first argument, and other parameters as later arguments.
	`"expected"`	A vector with one element per bin specifying the expected counts for each bin.
	`"nparams"`	The number of estimated parameters; used to adjust the degrees of freedom to be `nbins - 1 - nparams`, where `nbins` is the number of bins.
	`"emin"`	The minimum allowed expected value for a bin; any bin in either tail having an expected value less than this amount is pooled with a neighboring bin. Use the value 0 to prevent pooling. Default is 5.
	`"frequency"`	A vector of the same length as `x` containing the frequency of the corresponding `x` values.
	`"alpha"`	An `alpha` value such that the hypothesis is rejected if `p < alpha`. Default is `alpha = 0.05`.
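One way to picture the cell-array form of `"cdf"` is that the function handle is called with the evaluation points followed by the remaining cell elements. The sketch below shows that calling convention and how bin probabilities and expected counts follow from it. The helper `my_normcdf`, the edges, and the sample size are hypothetical, and this is not a transcription of the chi2gof source.

```octave
% Hypothetical normal CDF written with core Octave's erfc.
my_normcdf = @(x, mu, sigma) 0.5 * erfc (-(x - mu) ./ (sigma * sqrt (2)));

% Cell-array specification: handle first, parameter values after it.
cdfspec = {my_normcdf, 50, 5};

n = 100;                          % assumed sample size
edges = [-Inf, 45, 50, 55, Inf];  % assumed bin edges

% Call the handle with the edges followed by the stored parameters.
f = cdfspec{1};
F = f (edges, cdfspec{2:end});

% Probability mass per bin and the resulting expected counts.
E = n * diff (F);

% Every cell element after the handle counts as one extra parameter.
nparams = numel (cdfspec) - 1;
disp (E);
```

Because the outer edges are infinite in this sketch, the bin probabilities sum to one and the expected counts sum to `n`.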

You should specify either the "cdf" or the "expected" parameter, but not both. If your "cdf" input contains extra parameters, these are accounted for automatically and there is no need to specify "nparams". If your "expected" input depends on estimated parameters, you should use the "nparams" parameter to ensure that the degrees of freedom for the test are correct.
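As a hedged illustration of the "expected" route, the sketch below tests whether a die is fair. The counts are made up for the example, and the call pattern mirrors Example 3. Since no parameter is estimated from the data, "nparams" is left at its default. The sketch assumes the statistics package is installed.

```octave
pkg load statistics

faces = 1:6;                       % possible outcomes
obsCounts = [8 12 9 11 10 10];     % made-up counts from 60 rolls
n = sum (obsCounts);

% A fair die gives every face the same expected count.
expCounts = n * ones (1, 6) / 6;

% Pass expected counts directly; do not combine with "cdf".
[h, p, stats] = chi2gof (faces, "binctrs", faces, ...
                         "frequency", obsCounts, ...
                         "expected", expCounts)

% Hand computation of the statistic from the same counts.
chi2manual = sum ((obsCounts - expCounts) .^ 2 ./ expCounts)
```

If the six faces end up in six separate bins, `stats.chi2stat` should agree with `chi2manual`.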

Source Code: chi2gof

Example: 1


 x = normrnd (50, 5, 100, 1);
 [h, p, stats] = chi2gof (x)
 [h, p, stats] = chi2gof (x, "cdf", @(x)normcdf (x, mean(x), std(x)))
 [h, p, stats] = chi2gof (x, "cdf", {@normcdf, mean(x), std(x)})

h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544


h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544


h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544

Example: 2


 x = rand (100, 1);
 n = length (x);
 binedges = linspace (0, 1, 11);
 expectedCounts = n * diff (binedges);
 [h, p, stats] = chi2gof (x, "binedges", binedges, "expected", expectedCounts)

h = 0
p = 0.9835
stats =

  scalar structure containing the fields:

    chi2stat = 2.4000
    df = 9
    edges =

     Columns 1 through 8:

       4.2756e-03   1.0230e-01   2.0032e-01   2.9835e-01   3.9637e-01   4.9439e-01   5.9242e-01   6.9044e-01

     Columns 9 through 11:

       7.8847e-01   8.8649e-01   9.8451e-01

    O =

       10    9   12    9    8   10   11    8   10   13

    E =

       10   10   10   10   10   10   10   10   10   10

Example: 3


 bins = 0:5;
 obsCounts = [6 16 10 12 4 2];
 n = sum (obsCounts);
 lambdaHat = sum (bins .* obsCounts) / n;
 expCounts = n * poisspdf (bins, lambdaHat);
 [h, p, stats] = chi2gof (bins, "binctrs", bins, "frequency", obsCounts, ...
                          "expected", expCounts, "nparams", 1)

h = 0
p = 0.4654
stats =

  scalar structure containing the fields:

    chi2stat = 2.5550
    df = 3
    edges =

       4.9407e-324    8.3333e-01    1.6667e+00    2.5000e+00    3.3333e+00    5.0000e+00

    O =

        6   16   10   12    6

    E =

        7.0429   13.8041   13.5280    8.8383    6.0284
