bootclust

 Performs balanced bootstrap (or bootknife) resampling of clusters or blocks of
 data and calculates bootstrap bias, standard errors and confidence intervals.

 -- Function File: bootclust (DATA)
 -- Function File: bootclust (DATA, NBOOT)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN)
 -- Function File: bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)
 -- Function File: bootclust (DATA, NBOOT, {BOOTFUN, ...})
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, BLOCKSZ)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED, NPROC)
 -- Function File: STATS = bootclust (...)
 -- Function File: [STATS, BOOTSTAT] = bootclust (...)

     'bootclust (DATA)' uses nonparametric balanced bootstrap resampling
     to generate 1999 resamples from clusters or contiguous blocks of rows of
     the DATA (column vector or matrix) [1]. By default, each row is it's own
     cluster/block (i.e. no clustering or blocking). The means of the resamples
     are then computed and the following statistics are displayed:
        - original: the original estimate(s) calculated by BOOTFUN and the DATA
        - bias: bootstrap estimate of the bias of the sampling distribution(s)
        - std_error: bootstrap estimate(s) of the standard error(s)
        - CI_lower: lower bound(s) of the 95% bootstrap confidence interval(s)
        - CI_upper: upper bound(s) of the 95% bootstrap confidence interval(s)

     'bootclust (DATA, NBOOT)' specifies the number of bootstrap resamples,
     where NBOOT is a scalar, positive integer corresponding to the number
     of bootstrap resamples. The default value of NBOOT is the scalar: 1999.

     'bootclust (DATA, NBOOT, BOOTFUN)' also specifies BOOTFUN: the function
     calculated on the original sample and the bootstrap resamples. BOOTFUN
     must be either a:
       <> function handle, function name or an anonymous function,
       <> string of a function name, or
       <> a cell array where the first cell is one of the above function
          definitions and the remaining cells are (additional) input arguments 
          to that function (after the data arguments).
        In all cases BOOTFUN must take DATA for the initial input argument(s).
        BOOTFUN can return a scalar or any multidimensional numeric variable,
        but the output will be reshaped as a column vector. BOOTFUN must
        calculate a statistic representative of the finite data sample; it
        should NOT be an estimate of a population parameter (unless they are
        one of the same). If BOOTFUN is @mean or 'mean', narrowness bias of
        the confidence intervals for single bootstrap are reduced by expanding
        the probabilities of the percentiles using Student's t-distribution
        [2]. By default, BOOTFUN is @mean.

     'bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)' resamples from the clusters
     or blocks of rows of the data vectors D1, D2 etc and the resamples are
     passed onto BOOTFUN as multiple data input arguments. All data vectors
     and matrices (D1, D2 etc) must have the same number of rows.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA)', where ALPHA is numeric
     and sets the lower and upper bounds of the confidence interval(s). The
     value(s) of ALPHA must be between 0 and 1. ALPHA can either be:
       <> scalar: To set the (nominal) central coverage of equal-tailed
                  percentile confidence intervals to 100*(1-ALPHA)%.
       <> vector: A pair of probabilities defining the (nominal) lower and
                  upper percentiles of the confidence interval(s) as
                  100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. The
                  percentiles are bias-corrected and accelerated (BCa) [3].
        The default value of ALPHA is the vector: [.025, .975], for a 95%
        BCa confidence interval.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)' also sets CLUSTID,
     which are identifiers that define the grouping of the DATA rows for
     cluster bootstrap resampling. CLUSTID should be a column vector or
     cell array with the same number of rows as the DATA. Rows in DATA with
     the same CLUSTID value are treated as clusters of observations that are
     resampled together.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, BLOCKSZ)' groups consecutive
     DATA rows into non-overlapping blocks of length BLOCKSZ for simple block
     bootstrap resampling [4]. Note that this variation of block bootstrap is
     a special case of resampling clustered data. By default, BLOCKSZ is 1.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO)' sets the resampling
     method. If LOO is false, the resampling method used is balanced bootstrap
     resampling. If LOO is true, the resampling method used is balanced
     bootknife resampling [5]. Where N is the number of clusters or blocks,
     bootknife cluster or block resampling involves creating leave-one-out
     jackknife samples of size N - 1, and then drawing resamples of size N with
     replacement from the jackknife samples, thereby incorporating Bessel's
     correction into the resampling procedure. LOO must be a scalar logical
     value. The default value of LOO is false.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED)' initialises
     the Mersenne Twister random number generator using an integer SEED value
     so that bootclust results are reproducible.

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED, NPROC)' also
     sets the number of parallel processes to use for jackknife computations
     and non-vectorized function evaluations during bootstrap and on multicore
     machines. This feature requires the Parallel package (in Octave), or the
     Parallel Computing Toolbox (in Matlab). This option is ignored during
     bootstrap function evaluations when BOOTFUN is vectorized.

     'STATS = bootclust (...)' returns a structure with the following fields
     (defined above): original, bias, std_error, CI_lower, CI_upper.

     '[STATS, BOOTSTAT] = bootclust (...)' returns BOOTSTAT, a vector or matrix
     of bootstrap statistics calculated over the bootstrap resamples.

     '[STATS, BOOTSTAT, BOOTDATA] = bootclust (...)' returns BOOTDATA, a 1-by-
     NBOOT cell array of datasets generated by cluster or block bootstrap
     resampling.

  BIBLIOGRAPHY:
  [1] Davison and Hinkley (1997). Bootstrap methods and their application
        (Vol. 1). New York, NY: Cambridge University Press.
  [2] Hesterberg, Tim (2014), What Teachers Should Know about the 
        Bootstrap: Resampling in the Undergraduate Statistics Curriculum, 
        http://arxiv.org/abs/1411.5279
  [3] Efron and Tibshirani (1993) An Introduction to the Bootstrap. 
        New York, NY: Chapman & Hall
  [4] Carlstein (1986) The use of subseries values for estimating the
        variance of a general statistic from a stationary sequence. 
        Ann. Statist. 14, 1171-9
  [5] Hesterberg (2004) Unbiasing the Bootstrap—Bootknife Sampling 
        vs. Smoothing; Proceedings of the Section on Statistics & the 
        Environment. Alexandria, VA: American Statistical Association.

  bootclust (version 2024.05.16)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 95% expanded BCa bootstrap confidence intervals for the mean
 bootclust (data, 1999, @mean);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric block bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mean
 Resampling method: Balanced, block bootstrap resampling 
 Number of resamples: 1999 
 Number of data rows in each block: 1 
 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (1.2%, 97.5%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +29.65       -1.066e-14   +2.620       +23.66       +34.69

Demonstration 2

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 95% expanded BCa bootstrap confidence intervals for the mean with
 % cluster resampling
 bootclust (data, 1999, @mean, [0.025,0.975], clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mean
 Resampling method: Balanced, cluster bootstrap resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (1.1%, 98.8%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +29.65       -0.02581     +2.950       +22.84       +36.04

Demonstration 3

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 90% equal-tailed percentile bootstrap confidence intervals for
 % the variance
 bootclust (data, 1999, {@var, 1}, 0.1);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric block bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, block bootstrap resampling 
 Number of resamples: 1999 
 Number of data rows in each block: 1 
 Confidence interval (CI) type: Percentile (equal-tailed)
 Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -6.543       +41.08       +98.85       +235.1

Demonstration 4

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 90% equal-tailed percentile bootstrap confidence intervals for
 % the variance
 bootclust (data, 1999, {@var, 1}, 0.1, clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, cluster bootstrap resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Percentile (equal-tailed)
 Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -9.544       +33.87       +102.0       +214.8

Demonstration 5

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 90% BCa bootstrap confidence intervals for the variance
 bootclust (data, 1999, {@var, 1}, [0.05 0.95]);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric block bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, block bootstrap resampling 
 Number of resamples: 1999 
 Number of data rows in each block: 1 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 90% (11.7%, 98.6%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -6.634       +42.13       +114.1       +258.7

Demonstration 6

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 90% BCa bootstrap confidence intervals for the variance
 bootclust (data, 1999, {@var, 1}, [0.05 0.95], clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, cluster bootstrap resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 90% (15.1%, 99.1%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -10.30       +34.38       +125.0       +235.8

Demonstration 7

The following code


 % Input dataset
 y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];

 % 90% BCa confidence interval for regression coefficients 
 bootclust ({X,y}, 1999, @mldivide, [0.05 0.95]);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric block bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mldivide
 Resampling method: Balanced, block bootstrap resampling 
 Number of resamples: 1999 
 Number of data rows in each block: 1 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage: 90%

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 -0.05462     -0.009144    +0.2292      -0.4038      +0.3533    
 +0.4724      +0.01875     +0.2151      +0.09272     +0.7899

Demonstration 8

The following code


 % Input dataset
 y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];
 clustid = [1;1;1;1;2;2;2;3;3;3;3;4;4;4;4;4;5;5;5;6];

 % 90% BCa confidence interval for regression coefficients 
 bootclust ({X,y}, 1999, @mldivide, [0.05 0.95], clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mldivide
 Resampling method: Balanced, cluster bootstrap resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage: 90%

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +0.0009749   +0.03331     +0.1387      -0.2001      +0.2117    
 -0.1078      -0.04871     +0.3960      -0.9451      +0.3961

Demonstration 9

The following code


 % Input bivariate dataset
 x = [576 635 558 578 666 580 555 661 651 605 653 575 545 572 594].';
 y = [3.39 3.3 2.81 3.03 3.44 3.07 3 3.43 ...
      3.36 3.13 3.12 2.74 2.76 2.88 2.96].';
 clustid = [1;1;3;1;1;2;2;2;2;3;1;3;3;3;2];

 % 95% BCa bootstrap confidence intervals for the correlation coefficient
 bootclust ({x, y}, 1999, @cor, [], clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: cor
 Resampling method: Balanced, cluster bootstrap resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (2.8%, 97.8%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +0.7764      -0.02442     +0.1446      +0.4355      +1.014

Demonstration 10

The following code


 % The following dataset represents lutenizing hormone levels measured in a
 % healthy women every 10 minutes over an 8 hour period. The dataset was the
 % example tabulated on page 92 of Efron and Tibshirani (1993) An Introduction
 % to the Bootstrap.
 y=[2.4;2.4;2.4;2.2;2.1;1.5;2.3;2.3; 2.5;2.0;1.9;1.7;2.2;1.8;3.2;3.2;...
    2.7;2.2;2.2;1.9;1.9;1.8;2.7;3.0;2.3;2.0;2.0;2.9;2.9;2.7;2.7;2.3;...
    2.6;2.4;1.8;1.7;1.5;1.4;2.1;3.3;3.5;3.5;3.1;2.6;2.1;3.4;3.0;2.9];

 % Calculation of the standardized lutenizing hormone levels is as follows
 z = y - mean(y);

 % Let us then calculate the coefficient for a first order autoregressive
 % model, AR(1), which can be used to make future predictions of the level
 % of lutenizing hormone from the last measurement taken. We will use block
 % bootstrap using a block size of 3 to obtain an estimate of the standard
 % error and 95% confidence intervals around the regression coefficient
 % estimate.
 betafunc = @(y) (y(1:end-1) - mean(y)) \ (y(2:end) - mean(y));
 blocksz = 3;
 seed = 2;
 bootclust(y,1999,betafunc,[0.025,0.975],blocksz,true,seed);

 % The estimate of beta here is 0.586 and the standard error is about 0.13.
 % The coefficient indicates that we can predict that standardized hormone
 % levels to change by a factor of 0.586 from the previous timepoint.

Produces the following output

Summary of nonparametric block bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: @(y) (y (1:end - 1) - mean (y)) \ (y (2:end) - mean (y))
 Resampling method: Balanced, block bootknife resampling 
 Number of resamples: 1999 
 Number of data rows in each block: 3 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (73.8%, 100.0%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +0.5858      -0.1686      +0.1319      +0.5024      +0.8876

Package: statistics-resampling