chi2gof
Chi-square goodness-of-fit test.
chi2gof performs a chi-square goodness-of-fit test for discrete or continuous distributions. The test groups the data into bins, calculates the observed and expected counts for those bins, and computes the chi-square test statistic
$$ \chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} $$
where O_i and E_i are the observed and expected counts in bin i, and N is the number of bins. This test statistic has an approximate chi-square distribution when the counts are sufficiently large.
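As a rough illustration of the formula above, the following Python sketch computes the statistic from lists of observed and expected counts. Python is used here only as a neutral illustration language; this is not the Octave source, and the helper name `chi2_statistic` is hypothetical.

```python
def chi2_statistic(observed, expected):
    """Return sum((O_i - E_i)**2 / E_i) over all bins (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the rand example on this page: ten bins, 10 expected per bin.
O = [10, 9, 12, 9, 8, 10, 11, 8, 10, 13]
E = [10] * 10
print(chi2_statistic(O, E))  # approximately 2.4
```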
Bins in either tail with an expected count less than 5 (the default of the "emin" option) are pooled with neighboring bins until the expected count in each extreme bin is at least 5. If interior bins with expected counts less than 5 remain, chi2gof displays a warning. In that case, use fewer bins, or provide bin centers or bin edges, to increase the expected counts in all bins.
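One straightforward way to implement the tail-pooling rule is sketched below in Python. The greedy order (left tail first, then right tail) and the helper name `pool_tails` are assumptions of this sketch, not a description of Octave's internals. The example recomputes the unpooled expected counts of the Poisson example on this page (50 times the Poisson probabilities with λ = 1.96).

```python
import math


def pool_tails(observed, expected, emin=5.0):
    """Merge tail bins whose expected count is below emin into their neighbors.

    Returns (O, E, interior_low); interior_low is True when an interior bin
    still has an expected count below emin (the case that triggers a warning).
    """
    O = list(observed)
    E = list(expected)
    # Left tail: fold the first bin into the second until it reaches emin.
    while len(E) > 1 and E[0] < emin:
        E[1] += E[0]
        O[1] += O[0]
        del E[0]
        del O[0]
    # Right tail: fold the last bin into its left neighbor until it reaches emin.
    while len(E) > 1 and E[-1] < emin:
        E[-2] += E[-1]
        O[-2] += O[-1]
        del E[-1]
        del O[-1]
    interior_low = any(e < emin for e in E[1:-1])
    return O, E, interior_low


lam, n = 1.96, 50
E_raw = [n * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(6)]
O, E, warn = pool_tails([6, 16, 10, 12, 4, 2], E_raw)
print(O)     # [6, 16, 10, 12, 6]
print(warn)  # False
```

Passing `emin=0` disables pooling, because no expected count can be below zero.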
h = chi2gof (x)
performs a chi-square goodness-of-fit test of the null hypothesis that the data in the vector x are a random sample from a normal distribution with mean and variance estimated from x. The result is h = 0 if the null hypothesis cannot be rejected at the 5% significance level, or h = 1 if it can be rejected at the 5% level. By default, chi2gof uses 10 bins ("nbins") and compares the test statistic to a chi-square distribution with nbins - 3 degrees of freedom, where nbins is the number of bins remaining after pooling; the 3 accounts for the constraint that the counts sum to the sample size plus the two estimated parameters.
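The default normal case can be sketched in Python as follows. Assumptions, all labeled in the code as well: the standard deviation uses the n - 1 denominator, and the outermost bins are extended to -inf and +inf so that the expected counts sum to the sample size (consistent with the first example on this page, whose expected counts add up to 100). The helper names `normal_expected_counts` and `default_dof` are hypothetical.

```python
import math


def normal_expected_counts(data, edges):
    """Expected counts per bin under a normal fit to `data` (hypothetical helper).

    Assumptions: mean and standard deviation are estimated from the data
    (n - 1 denominator), and the outermost bins are extended to -inf/+inf.
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in data) / (n - 1))

    def cdf(t):
        # Normal CDF written with math.erf
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

    nbins = len(edges) - 1
    counts = []
    for i in range(nbins):
        lower = 0.0 if i == 0 else cdf(edges[i])
        upper = 1.0 if i == nbins - 1 else cdf(edges[i + 1])
        counts.append(n * (upper - lower))
    return counts


def default_dof(nbins, nparams=2):
    """Degrees of freedom nbins - 1 - nparams; nparams = 2 for mean and std."""
    return nbins - 1 - nparams


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
E = normal_expected_counts(data, [0, 3, 5.5, 8, 11])
print(sum(E))          # 10.0 up to rounding
print(default_dof(8))  # 5, as in the first example, which has 8 bins after pooling
```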
[h, p] = chi2gof (x)
also returns the p-value p, which is the probability, if the null hypothesis is true, of observing a test statistic as large as or larger than the one computed from the data. If there are not enough degrees of freedom to carry out the test, p is NaN.
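The p-value is the upper tail of the chi-square distribution. The Python sketch below computes it from the power series of the regularized lower incomplete gamma function, which is adequate for illustration but not tuned for extreme tails. Returning h as `None` when p is NaN is a choice made for this sketch only; the page does not say what h is in that case. The helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).

    Uses 1 - P(df/2, x/2), with P the regularized lower incomplete gamma
    function evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_decision(chi2stat, df, alpha=0.05):
    """Return (h, p); p is NaN and h is None when df < 1 (sketch's own choice)."""
    if df < 1:
        return None, float("nan")
    p = chi2_sf(chi2stat, df)
    return (1 if p < alpha else 0), p


print(chi2_decision(4.0212, 5))  # h = 0, p about 0.5464 (values from the first example on this page)
```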
[h, p, stats] = chi2gof (x)
also returns a structure stats with the following fields:

Field | Description
---|---
"chi2stat" | Chi-square statistic
"df" | Degrees of freedom
"edges" | Vector of bin edges after pooling
"O" | Observed count in each bin
"E" | Expected count in each bin
[…] = chi2gof (x, Name, Value, …)
specifies optional Name/Value pair arguments chosen from the following list.
Name | Value
---|---
"nbins" | The number of bins to use. Default is 10.
"binctrs" | A vector of bin centers.
"binedges" | A vector of bin edges.
"cdf" | A fully specified cumulative distribution function, or a cell array whose first element is a function handle and whose remaining elements are its parameter values. The function must take x values as its first argument and any other parameters as subsequent arguments.
"expected" | A vector with one element per bin specifying the expected count for each bin.
"nparams" | The number of estimated parameters; used to adjust the degrees of freedom to nbins - 1 - nparams, where nbins is the number of bins.
"emin" | The minimum allowed expected value for a bin; any bin in either tail with an expected value less than this amount is pooled with a neighboring bin. Use the value 0 to prevent pooling. Default is 5.
"frequency" | A vector of the same length as x containing the frequency of the corresponding x values.
"alpha" | The significance level; the null hypothesis is rejected if p < alpha. Default is alpha = 0.05.
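To illustrate how "binedges" and "frequency" combine into observed counts, here is a hypothetical Python helper. The bin convention (half-open bins, last bin closed on the right, values outside the edges ignored) is an assumption of this sketch, and the half-integer edges in the usage line are chosen only for illustration.

```python
import bisect


def observed_counts(x, edges, frequency=None):
    """Count (optionally weighted) observations falling in each bin.

    Bins are [edges[i], edges[i+1]) except the last, which also includes its
    right edge. Values outside [edges[0], edges[-1]] are ignored. Both
    conventions are assumptions of this sketch.
    """
    if frequency is None:
        frequency = [1] * len(x)
    if len(frequency) != len(x):
        raise ValueError("frequency must have the same length as x")
    counts = [0] * (len(edges) - 1)
    for value, weight in zip(x, frequency):
        if value < edges[0] or value > edges[-1]:
            continue
        i = bisect.bisect_right(edges, value) - 1
        if i == len(counts):  # value equals the last edge
            i -= 1
        counts[i] += weight
    return counts


# Poisson example data: each value 0..5 listed once, weighted by its frequency.
print(observed_counts([0, 1, 2, 3, 4, 5],
                      [-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
                      frequency=[6, 16, 10, 12, 4, 2]))
# [6, 16, 10, 12, 4, 2]
```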
You should specify either the "cdf" or the "expected" parameter, but not both. If your "cdf" input contains extra parameters, these are accounted for automatically and there is no need to specify "nparams". If your "expected" input depends on estimated parameters, use the "nparams" parameter to ensure that the degrees of freedom for the test are correct.
```octave
x = normrnd (50, 5, 100, 1);   % random data, so the results vary between runs
[h, p, stats] = chi2gof (x)
% Fully specified normal CDF: use a different argument name so that mean (x)
% and std (x) refer to the data; both were estimated, so set "nparams"
[h, p, stats] = chi2gof (x, "cdf", @(z) normcdf (z, mean (x), std (x)), "nparams", 2)
% CDF given as a cell array: its two parameters are counted automatically
[h, p, stats] = chi2gof (x, "cdf", {@normcdf, mean(x), std(x)})
```

Each of the three calls displays the same result:

```
h = 0
p = 0.5464
stats =

  scalar structure containing the fields:

    chi2stat = 4.0212
    df = 5
    edges = 38.399 42.726 44.890 47.053 49.217 51.380 53.544 55.708 60.035
    O = 9 7 9 22 17 14 10 12
    E = 6.8588 8.2228 13.0313 16.8721 17.8471 15.4236 10.8899 10.8544
```
```octave
x = rand (100, 1);
n = length (x);
binedges = linspace (0, 1, 11);
expectedCounts = n * diff (binedges);
[h, p, stats] = chi2gof (x, "binedges", binedges, "expected", expectedCounts)
```

```
h = 0
p = 0.9835
stats =

  scalar structure containing the fields:

    chi2stat = 2.4000
    df = 9
    edges = 4.2756e-03 1.0230e-01 2.0032e-01 2.9835e-01 3.9637e-01 4.9439e-01 5.9242e-01 6.9044e-01 7.8847e-01 8.8649e-01 9.8451e-01
    O = 10 9 12 9 8 10 11 8 10 13
    E = 10 10 10 10 10 10 10 10 10 10
```
```octave
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum (obsCounts);
lambdaHat = sum (bins .* obsCounts) / n;
expCounts = n * poisspdf (bins, lambdaHat);
[h, p, stats] = chi2gof (bins, "binctrs", bins, "frequency", obsCounts, ...
                         "expected", expCounts, "nparams", 1)
```

```
h = 0
p = 0.4654
stats =

  scalar structure containing the fields:

    chi2stat = 2.5550
    df = 3
    edges = 4.9407e-324 8.3333e-01 1.6667e+00 2.5000e+00 3.3333e+00 5.0000e+00
    O = 6 16 10 12 6
    E = 7.0429 13.8041 13.5280 8.8383 6.0284
```