Performs Bayesian nonparametric bootstrap and calculates posterior statistics for the mean, or regression coefficients from a linear model. -- Function File: bootbayes (Y) -- Function File: bootbayes (Y, X) -- Function File: bootbayes (Y, X, CLUSTID) -- Function File: bootbayes (Y, X, BLOCKSZ) -- Function File: bootbayes (Y, X, ..., NBOOT) -- Function File: bootbayes (Y, X, ..., NBOOT, PROB) -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR) -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED) -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L) -- Function File: STATS = bootbayes (Y, ...) -- Function File: [STATS, BOOTSTAT] = bootbayes (Y, ...) 'bootbayes (Y)' performs Bayesian nonparametric bootstrap [1] to create 1999 bootstrap statistics, each representing the weighted mean(s) of the column vector (or column-major matrix), Y, using a vector of weights randomly generated from a symmetric Dirichlet distribution. The resulting bootstrap (or posterior [1,2]) distribution(s) is/are summarised with the following statistics printed to the standard output: - original: the mean(s) of the data column(s) of Y - bias: bootstrap bias estimate(s) - median: the median of the posterior distribution(s) - stdev: the standard deviation of the posterior distribution(s) - CI_lower: lower bound(s) of the 95% credible interval - CI_upper: upper bound(s) of the 95% credible interval By default, the credible intervals are shortest probability intervals, which represent a more computationally stable version of the highest posterior density interval [3]. 'bootbayes (Y, X)' also specifies the design matrix (X) for least squares regression of Y on X. X should be a column vector or matrix the same number of rows as Y. If the X input argument is empty, the default for X is a column of ones (i.e. intercept only) and thus the statistic computed reduces to the mean (as above). The statistics calculated and returned in the output then relate to the coefficients from the regression of Y on X. Y must be a column vector (not matrix) for regression. 'bootbayes (Y, X, CLUSTID)' specifies a vector or cell array of numbers or strings respectively to be used as cluster labels or identifiers. Rows in Y (and X) with the same CLUSTID value are treated as clusters with dependent errors. Rows of Y (and X) assigned to a particular cluster will have identical weights during Bayesian bootstrap. If empty (default), no clustered resampling is performed and all errors are treated as independent. 'bootbayes (Y, X, BLOCKSZ)' specifies a scalar, which sets the block size for bootstrapping when the residuals have serial dependence. Identical weights are assigned within each (consecutive) block of length BLOCKSZ during Bayesian bootstrap. Rows of Y (and X) within the same block are treated as having dependent errors. If empty (default), no block resampling is performed and all errors are treated as independent. 'bootbayes (Y, X, ..., NBOOT)' specifies the number of bootstrap resamples, where NBOOT must be a positive integer. If empty, the default value of NBOOT is 1999. 'bootbayes (Y, X, ..., NBOOT, PROB)' where PROB is numeric and sets the lower and upper bounds of the credible interval(s). The value(s) of PROB must be between 0 and 1. PROB can either be: <> scalar: To set the central mass of shortest probability intervals (SPI) to 100*(1-PROB)% <> vector: A pair of probabilities defining the lower and upper percentiles of the credible interval(s) as 100*(PROB(1))% and 100*(PROB(2))% respectively. Credible intervals are not calculated when the value(s) of PROB is/are NaN. The default value of PROB is 0.95. 'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR)' accepts a positive real numeric scalar to parametrize the form of the symmetric Dirichlet distribution. The Dirichlet distribution is the conjugate PRIOR used to randomly generate weights for linear least squares fitting of the observed data, and subsequently to estimate the posterior for the regression coefficients by Bayesian bootstrap. If PRIOR is not provided or is empty, and the model is not intercept -only, then the default value of PRIOR is 1, which corresponds to Bayes rule: a uniform (or flat) Dirichlet distribution (over all points in its support). Otherwise, the value of PRIOR is set to 'auto'. The value 'auto' sets a value for PRIOR that effectively incorporates Bessel's correction a priori. Thus, for a sample size of N and PRIOR set to 'auto', the variance of the posterior (i.e. BOOTSTAT) becomes an unbiased estimator of the sampling variance. For example, when the PRIOR is 1, the prior is flat over the range of the data Y, approximated by the interval +/- 2 * std (Y, 1), which is 4 * std (Y, 1) wide according to the range rule of thumb for a normal distribution. Therefore, a PRIOR set to 'auto' is flat over the approximate interval +/- 2 * std (Y, 0). The calculation used for 'auto' is as follows: PRIOR = 1 - 2 / N For block or cluster bootstrap, N corresponds to the number of blocks or clusters (i.e. the number of independent sampling units). When N = 2, the PRIOR is equal to 0, which is the Haldane prior, in which case: std (BOOTSTAT, 1, 2) ~ std (Y, 1) == std (Y, 0) / sqrt (N) Note that in this particular case, intervals will be computed using the standard deviation of the posterior distribution and quantiles from a standard normal distribution. 'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED)' initialises the Mersenne Twister random number generator using an integer SEED value so that 'bootbayes' results are reproducible. 'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L)' multiplies the regression coefficients by the hypothesis matrix L. If L is not provided or is empty, it will assume the default value of 1 (i.e. no change to the design). Otherwise, L must have the same number of rows as the number of columns in X. 'STATS = bootbayes (...) returns a structure with the following fields: original, bias, median, stdev, CI_lower, CI_upper & prior. '[STATS, BOOTSTAT] = bootbayes (...) also returns the a vector (or matrix) of bootstrap statistics (BOOTSTAT) calculated over the bootstrap resamples. Bibliography: [1] Rubin (1981) The Bayesian Bootstrap. Ann. Statist. 9(1):130-134 [2] Weng (1989) On a Second-Order Asymptotic property of the Bayesian Bootstrap Mean. Ann. Statist. 17(2):705-710 [3] Liu, Gelman & Zheng (2015). Simulation-efficient shortest probability intervals. Statistics and Computing, 25(4), 809–819. bootbayes (version 2024.05.17) Author: Andrew Charles Penn https://www.researchgate.net/profile/Andrew_Penn/ Copyright 2019 Andrew Charles Penn This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset heights = [183, 192, 182, 183, 177, 185, 188, 188, 182, 185].'; % 95% credible interval for the mean bootbayes (heights); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of Bayesian bootstrap estimates of bias and precision for linear models ******************************************************************************* Bootstrap settings: Function: sum (W * Y) Resampling method: Bayesian bootstrap Prior: Symmetric Dirichlet distribution (a = 0.8) Number of resamples: 1999 Credible interval (CI) type: Shortest probability interval Credible interval: 95% Posterior Statistics: original bias median stdev CI_lower CI_upper +184.5 -0.02286 +184.5 1.240 +182.1 +186.9
The following code
% Input bivariate dataset X = [ones(43,1),... [01,02,03,04,05,06,07,08,09,10,11,... 12,13,14,15,16,17,18,19,20,21,22,... 23,25,26,27,28,29,30,31,32,33,34,... 35,36,37,38,39,40,41,42,43,44]']; y = [188.0,170.0,189.0,163.0,183.0,171.0,185.0,168.0,173.0,183.0,173.0,... 173.0,175.0,178.0,183.0,192.4,178.0,173.0,174.0,183.0,188.0,180.0,... 168.0,170.0,178.0,182.0,180.0,183.0,178.0,182.0,188.0,175.0,179.0,... 183.0,192.0,182.0,183.0,177.0,185.0,188.0,188.0,182.0,185.0]'; % 95% credible interval for the regression coefficents bootbayes (y, X); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of Bayesian bootstrap estimates of bias and precision for linear models ******************************************************************************* Bootstrap settings: Function: pinv (X' * W * X) * (X' * W * y) Resampling method: Bayesian bootstrap Prior: Symmetric Dirichlet distribution (a = 1) Number of resamples: 1999 Credible interval (CI) type: Shortest probability interval Credible interval: 95% Posterior Statistics: original bias median stdev CI_lower CI_upper +175.5 -0.05767 +175.4 2.381 +171.3 +180.3 +0.1904 +0.001144 +0.1941 0.07872 +0.03729 +0.3456
Package: statistics-resampling