bootbayes

 Performs Bayesian nonparametric bootstrap and calculates posterior statistics 
 for the mean, or regression coefficients from a linear model.


 -- Function File: bootbayes (Y)
 -- Function File: bootbayes (Y, X)
 -- Function File: bootbayes (Y, X, CLUSTID)
 -- Function File: bootbayes (Y, X, BLOCKSZ)
 -- Function File: bootbayes (Y, X, ..., NBOOT)
 -- Function File: bootbayes (Y, X, ..., NBOOT, PROB)
 -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR)
 -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED)
 -- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L)
 -- Function File: STATS = bootbayes (Y, ...)
 -- Function File: [STATS, BOOTSTAT] = bootbayes (Y, ...)

     'bootbayes (Y)' performs Bayesian nonparametric bootstrap [1] to create
     1999 bootstrap statistics, each representing the weighted mean(s) of the
     column vector (or column-major matrix), Y, using a vector of weights
     randomly generated from a symmetric Dirichlet distribution. The resulting
     bootstrap (or posterior [1,2]) distribution(s) is/are summarised with the
     following statistics printed to the standard output:
        - original: the mean(s) of the data column(s) of Y
        - bias: bootstrap bias estimate(s)
        - median: the median of the posterior distribution(s)
        - stdev: the standard deviation of the posterior distribution(s)
        - CI_lower: lower bound(s) of the 95% credible interval
        - CI_upper: upper bound(s) of the 95% credible interval
          By default, the credible intervals are shortest probability
          intervals, which represent a more computationally stable version
          of the highest posterior density interval [3].

     'bootbayes (Y, X)' also specifies the design matrix (X) for least squares
     regression of Y on X. X should be a column vector or matrix the same
     number of rows as Y. If the X input argument is empty, the default for X
     is a column of ones (i.e. intercept only) and thus the statistic computed
     reduces to the mean (as above). The statistics calculated and returned in
     the output then relate to the coefficients from the regression of Y on X.
     Y must be a column vector (not matrix) for regression.

     'bootbayes (Y, X, CLUSTID)' specifies a vector or cell array of numbers
     or strings respectively to be used as cluster labels or identifiers.
     Rows in Y (and X) with the same CLUSTID value are treated as clusters with
     dependent errors. Rows of Y (and X) assigned to a particular cluster
     will have identical weights during Bayesian bootstrap. If empty (default),
     no clustered resampling is performed and all errors are treated as
     independent.

     'bootbayes (Y, X, BLOCKSZ)' specifies a scalar, which sets the block size
     for bootstrapping when the residuals have serial dependence. Identical
     weights are assigned within each (consecutive) block of length BLOCKSZ
     during Bayesian bootstrap. Rows of Y (and X) within the same block are
     treated as having dependent errors. If empty (default), no block
     resampling is performed and all errors are treated as independent.

     'bootbayes (Y, X, ..., NBOOT)' specifies the number of bootstrap resamples,
     where NBOOT must be a positive integer. If empty, the default value of
     NBOOT is 1999.

     'bootbayes (Y, X, ..., NBOOT, PROB)' where PROB is numeric and sets the
     lower and upper bounds of the credible interval(s). The value(s) of PROB
     must be between 0 and 1. PROB can either be:
       <> scalar: To set the central mass of shortest probability
                  intervals (SPI) to 100*(1-PROB)%
       <> vector: A pair of probabilities defining the lower and upper
                  percentiles of the credible interval(s) as 100*(PROB(1))%
                  and 100*(PROB(2))% respectively. 
          Credible intervals are not calculated when the value(s) of PROB
          is/are NaN. The default value of PROB is 0.95.

     'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR)' accepts a positive real
     numeric scalar to parametrize the form of the symmetric Dirichlet
     distribution. The Dirichlet distribution is the conjugate PRIOR used to
     randomly generate weights for linear least squares fitting of the observed
     data, and subsequently to estimate the posterior for the regression
     coefficients by Bayesian bootstrap.
        If PRIOR is not provided or is empty, and the model is not intercept
     -only, then the default value of PRIOR is 1, which corresponds to Bayes
     rule: a uniform (or flat) Dirichlet distribution (over all points in its
     support). Otherwise, the value of PRIOR is set to 'auto'.
        The value 'auto' sets a value for PRIOR that effectively incorporates
     Bessel's correction a priori. Thus, for a sample size of N and PRIOR set
     to 'auto', the variance of the posterior (i.e. BOOTSTAT) becomes an
     unbiased estimator of the sampling variance. For example, when the PRIOR
     is 1, the prior is flat over the range of the data Y, approximated by the
     interval +/- 2 * std (Y, 1), which is 4 * std (Y, 1) wide according to
     the range rule of thumb for a normal distribution. Therefore, a PRIOR
     set to 'auto' is flat over the approximate interval +/- 2 * std (Y, 0).
     The calculation used for 'auto' is as follows:

          PRIOR = 1 - 2 / N

        For block or cluster bootstrap, N corresponds to the number of blocks
     or clusters (i.e. the number of independent sampling units). When N = 2,
     the PRIOR is equal to 0, which is the Haldane prior, in which case:

         std (BOOTSTAT, 1, 2) ~ std (Y, 1) == std (Y, 0) / sqrt (N) 

     Note that in this particular case, intervals will be computed using
     the standard deviation of the posterior distribution and quantiles
     from a standard normal distribution.

     'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED)' initialises the
     Mersenne Twister random number generator using an integer SEED value so
     that 'bootbayes' results are reproducible.

     'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L)' multiplies the
     regression coefficients by the hypothesis matrix L. If L is not provided
     or is empty, it will assume the default value of 1 (i.e. no change to
     the design). Otherwise, L must have the same number of rows as the number
     of columns in X.

     'STATS = bootbayes (...) returns a structure with the following fields:
     original, bias, median, stdev, CI_lower, CI_upper & prior.

     '[STATS, BOOTSTAT] = bootbayes (...)  also returns the a vector (or
     matrix) of bootstrap statistics (BOOTSTAT) calculated over the bootstrap
     resamples.

  Bibliography:
  [1] Rubin (1981) The Bayesian Bootstrap. Ann. Statist. 9(1):130-134
  [2] Weng (1989) On a Second-Order Asymptotic property of the Bayesian
        Bootstrap Mean. Ann. Statist. 17(2):705-710
  [3] Liu, Gelman & Zheng (2015). Simulation-efficient shortest probability
        intervals. Statistics and Computing, 25(4), 809–819. 

  bootbayes (version 2024.05.17)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Input univariate dataset
 heights = [183, 192, 182, 183, 177, 185, 188, 188, 182, 185].';

 % 95% credible interval for the mean 
 bootbayes (heights);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of Bayesian bootstrap estimates of bias and precision for linear models
*******************************************************************************

Bootstrap settings: 
 Function: sum (W * Y)
 Resampling method: Bayesian bootstrap
 Prior: Symmetric Dirichlet distribution (a = 0.8)
 Number of resamples: 1999 
 Credible interval (CI) type: Shortest probability interval
 Credible interval: 95%

Posterior Statistics: 
 original     bias         median       stdev       CI_lower      CI_upper
 +184.5       -0.02286     +184.5       1.240       +182.1        +186.9

Demonstration 2

The following code


 % Input bivariate dataset
 X = [ones(43,1),...
     [01,02,03,04,05,06,07,08,09,10,11,...
      12,13,14,15,16,17,18,19,20,21,22,...
      23,25,26,27,28,29,30,31,32,33,34,...
      35,36,37,38,39,40,41,42,43,44]'];
 y = [188.0,170.0,189.0,163.0,183.0,171.0,185.0,168.0,173.0,183.0,173.0,...
     173.0,175.0,178.0,183.0,192.4,178.0,173.0,174.0,183.0,188.0,180.0,...
     168.0,170.0,178.0,182.0,180.0,183.0,178.0,182.0,188.0,175.0,179.0,...
     183.0,192.0,182.0,183.0,177.0,185.0,188.0,188.0,182.0,185.0]';

 % 95% credible interval for the regression coefficents
 bootbayes (y, X);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of Bayesian bootstrap estimates of bias and precision for linear models
*******************************************************************************

Bootstrap settings: 
 Function: pinv (X' * W * X) * (X' * W * y)
 Resampling method: Bayesian bootstrap
 Prior: Symmetric Dirichlet distribution (a = 1)
 Number of resamples: 1999 
 Credible interval (CI) type: Shortest probability interval
 Credible interval: 95%

Posterior Statistics: 
 original     bias         median       stdev       CI_lower      CI_upper
 +175.5       -0.05767     +175.4       2.381       +171.3        +180.3    
 +0.1904      +0.001144    +0.1941      0.07872     +0.03729      +0.3456

Package: statistics-resampling