
Function Reference: chi2gof

statistics: h = chi2gof (x)
statistics: [h, p] = chi2gof (x)
statistics: [h, p, stats] = chi2gof (x)
statistics: […] = chi2gof (x, Name, Value, …)

Chi-square goodness-of-fit test.

chi2gof performs a chi-square goodness-of-fit test for discrete or continuous distributions. The test groups the data into bins, calculates the observed and expected counts for those bins, and computes the chi-square test statistic $$ \chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} $$ where N is the number of bins, O_i is the observed count and E_i is the expected count in bin i. This test statistic approximately follows a chi-square distribution when the expected counts are sufficiently large.
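As a minimal sketch of the formula in core Octave (no statistics package needed), the statistic can be recomputed from the pooled observed and expected counts that Example 1 reports. The variable names are illustrative only.

```octave
% Observed (O) and expected (E) counts per bin, copied from the Example 1 output
O = [9 7 9 22 17 14 10 12];
E = [6.8588 8.2228 13.0313 16.8721 17.8471 15.4236 10.8899 10.8544];

% Squared deviation in each bin, scaled by that bin's expected count, then summed
chi2stat = sum ((O - E) .^ 2 ./ E);
disp (chi2stat)
```

Because E is copied with only four decimals, the result agrees closely with, but not necessarily digit for digit with, the reported chi2stat = 4.0212.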

Bins in either tail with an expected count less than 5 are pooled with neighboring bins until the expected count in each extreme bin is at least 5. If interior bins still have expected counts less than 5, chi2gof displays a warning. In that case, use fewer bins, or provide bin centers or bin edges, to increase the expected counts in all bins.
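The tail-pooling rule can be sketched as follows. pool_tails is a hypothetical helper written for illustration, not part of chi2gof's interface, and it only pools the O and E vectors (bin edges are left out). It is applied here to the Poisson counts from Example 3, where the two rightmost bins are merged.

```octave
1;  % leading statement so Octave treats this file as a script, not a function file

% Hypothetical helper: pool tail bins whose expected count is below emin
% into their inner neighbor, as described in the paragraph above.
function [O, E] = pool_tails (O, E, emin)
  % Left tail: merge bin 1 into bin 2 while bin 1 is too small
  while (numel (E) > 1 && E(1) < emin)
    O(2) = O(2) + O(1);
    E(2) = E(2) + E(1);
    O(1) = [];
    E(1) = [];
  endwhile
  % Right tail: merge the last bin into the one before it
  while (numel (E) > 1 && E(end) < emin)
    O(end-1) = O(end-1) + O(end);
    E(end-1) = E(end-1) + E(end);
    O(end) = [];
    E(end) = [];
  endwhile
endfunction

% Data from Example 3: Poisson expected counts with an estimated rate
k = 0:5;
O = [6 16 10 12 4 2];
lambdaHat = sum (k .* O) / sum (O);
E = sum (O) * exp (-lambdaHat) * lambdaHat .^ k ./ factorial (k);

[Op, Ep] = pool_tails (O, E, 5);
disp (Op); disp (Ep);
```

The pooled vectors have five bins, matching the O and E reported in Example 3.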

h = chi2gof (x) performs a chi-square goodness-of-fit test of the null hypothesis that the data in the vector x are a random sample from a normal distribution whose mean and variance are estimated from x. The result is h = 0 if the null hypothesis cannot be rejected at the 5% significance level, or h = 1 if it can be rejected at the 5% level. By default, chi2gof uses 10 bins ("nbins") and compares the test statistic to a chi-square distribution with nbins - 3 degrees of freedom (counting the bins that remain after pooling), which accounts for the two estimated parameters.

[h, p] = chi2gof (x) also returns the p-value p, which is the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. If there are not enough degrees of freedom to carry out the test, p is NaN.
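A minimal sketch of how the degrees of freedom, p-value and decision relate, using the values reported in Example 1 (8 bins left after pooling, two estimated parameters). It relies on the identity chi2cdf (x, k) = gammainc (x/2, k/2), so core Octave's gammainc with the "upper" tail gives the p-value; this is an illustration, not chi2gof's own code.

```octave
% Statistic and bin count reported in Example 1 (10 bins pooled down to 8)
chi2stat = 4.0212;
nbins    = 8;      % bins remaining after tail pooling
nparams  = 2;      % mean and standard deviation were estimated from x
alpha    = 0.05;

df = nbins - 1 - nparams;

% Upper tail of the chi-square distribution with df degrees of freedom
if (df > 0)
  p = gammainc (chi2stat / 2, df / 2, "upper");
else
  p = NaN;         % not enough degrees of freedom to carry out the test
endif
h = p < alpha;
printf ("df = %d, p = %.4f, h = %d\n", df, p, h);
```

The computed p is about 0.546, consistent with the p = 0.5464 shown in Example 1, so h = 0.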

[h, p, stats] = chi2gof (x) also returns a stats structure with the following fields:

"chi2stat"    Chi-square statistic
"df"          Degrees of freedom
"edges"       Vector of bin edges after pooling
"O"           Observed count in each bin
"E"           Expected count in each bin

[…] = chi2gof (x, Name, Value, …) specifies optional Name/Value pair arguments chosen from the following list.

Name          Value
"nbins"       The number of bins to use. Default is 10.
"binctrs"     A vector of bin centers.
"binedges"    A vector of bin edges.
"cdf"         A fully specified cumulative distribution function, given either as a function handle or as a cell array whose first element is a function handle and whose remaining elements are its parameter values. The function must take x values as its first argument and any other parameters as later arguments.
"expected"    A vector with one element per bin specifying the expected count for each bin.
"nparams"     The number of estimated parameters, used to adjust the degrees of freedom to nbins - 1 - nparams, where nbins is the number of bins.
"emin"        The minimum allowed expected count for a bin. Any bin in either tail with an expected count less than this amount is pooled with a neighboring bin. Use the value 0 to prevent pooling. Default is 5.
"frequency"   A vector of the same length as x containing the frequency of the corresponding x values.
"alpha"       The significance level: the null hypothesis is rejected if p < alpha. Default is 0.05.

Specify either the "cdf" or the "expected" parameter, but not both. If your "cdf" input contains extra parameters, they are accounted for automatically and there is no need to specify "nparams". If your "expected" input depends on estimated parameters, use the "nparams" parameter to ensure that the degrees of freedom for the test are correct.
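To illustrate why "nparams" matters for "expected", here is a sketch of deriving expected counts from a CDF over bin edges, E_i = n (F(b_{i+1}) - F(b_i)). The data, the edges, and the choice of open-ended outer bins are all hypothetical assumptions for the example, and the normal CDF is written with core Octave's erfc so no package is needed.

```octave
% Hypothetical sample and bin edges; the outer bins are assumed open-ended
x     = [4.1 5.3 4.8 6.2 5.0 5.7 4.4 5.9 5.1 4.6];
edges = [-Inf 4.5 5.0 5.5 Inf];

% Normal CDF with parameters estimated from x (two estimated parameters)
mu    = mean (x);
sigma = std (x);
F     = @(z) 0.5 * erfc (-(z - mu) ./ (sigma * sqrt (2)));

n  = numel (x);
E  = n * diff (F (edges));      % expected count in each of the 4 bins

% mu and sigma came from the data, so they reduce the degrees of freedom
nparams = 2;
df = numel (E) - 1 - nparams;
disp (E); disp (df);
```

If these counts were passed to chi2gof through "expected", the two estimated parameters would have to be declared with "nparams", 2; otherwise the test would use too many degrees of freedom.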


Example: 1

 

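 x = normrnd (50, 5, 100, 1);
 % Default: test against a normal distribution with estimated mean and std
 [h, p, stats] = chi2gof (x)
 % Closure over the sample x: the two estimated parameters must be counted via "nparams"
 [h, p, stats] = chi2gof (x, "cdf", @(z) normcdf (z, mean (x), std (x)), "nparams", 2)
 % Cell-array form: the two parameter values are counted automatically
 [h, p, stats] = chi2gof (x, "cdf", {@normcdf, mean(x), std(x)})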

h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544


h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544


h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges =

       38.399   42.726   44.890   47.053   49.217   51.380   53.544   55.708   60.035

    O =

        9    7    9   22   17   14   10   12

    E =

        6.8588    8.2228   13.0313   16.8721   17.8471   15.4236   10.8899   10.8544


                    

Example: 2

 

 x = rand (100, 1);
 n = length (x);
 binedges = linspace (0, 1, 11);
 expectedCounts = n * diff (binedges);
 [h, p, stats] = chi2gof (x, "binedges", binedges, "expected", expectedCounts)

h = 0
p = 0.9835
stats =

  scalar structure containing the fields:

    chi2stat = 2.4000
    df = 9
    edges =

     Columns 1 through 7:

       4.2756e-03   1.0230e-01   2.0032e-01   2.9835e-01   3.9637e-01   4.9439e-01   5.9242e-01

     Columns 8 through 11:

       6.9044e-01   7.8847e-01   8.8649e-01   9.8451e-01

    O =

       10    9   12    9    8   10   11    8   10   13

    E =

       10   10   10   10   10   10   10   10   10   10


                    

Example: 3

 

 bins = 0:5;
 obsCounts = [6 16 10 12 4 2];
 n = sum (obsCounts);
 % The Poisson rate is estimated from the data, which costs one degree of freedom
 lambdaHat = sum (bins .* obsCounts) / n;
 expCounts = n * poisspdf (bins, lambdaHat);
 [h, p, stats] = chi2gof (bins, "binctrs", bins, "frequency", obsCounts, ...
                          "expected", expCounts, "nparams", 1)

h = 0
p = 0.4654
stats =

  scalar structure containing the fields:

    chi2stat = 2.5550
    df = 3
    edges =

       4.9407e-324    8.3333e-01    1.6667e+00    2.5000e+00    3.3333e+00    5.0000e+00

    O =

        6   16   10   12    6

    E =

        7.0429   13.8041   13.5280    8.8383    6.0284