Performs balanced bootstrap (or bootknife) resampling of clusters or blocks of data and calculates bootstrap bias, standard errors and confidence intervals. -- Function File: bootclust (DATA) -- Function File: bootclust (DATA, NBOOT) -- Function File: bootclust (DATA, NBOOT, BOOTFUN) -- Function File: bootclust ({D1, D2, ...}, NBOOT, BOOTFUN) -- Function File: bootclust (DATA, NBOOT, {BOOTFUN, ...}) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, BLOCKSZ) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED, NPROC) -- Function File: STATS = bootclust (...) -- Function File: [STATS, BOOTSTAT] = bootclust (...) 'bootclust (DATA)' uses nonparametric balanced bootstrap resampling to generate 1999 resamples from clusters or contiguous blocks of rows of the DATA (column vector or matrix) [1]. By default, each row is it's own cluster/block (i.e. no clustering or blocking). The means of the resamples are then computed and the following statistics are displayed: - original: the original estimate(s) calculated by BOOTFUN and the DATA - bias: bootstrap estimate of the bias of the sampling distribution(s) - std_error: bootstrap estimate(s) of the standard error(s) - CI_lower: lower bound(s) of the 95% bootstrap confidence interval(s) - CI_upper: upper bound(s) of the 95% bootstrap confidence interval(s) 'bootclust (DATA, NBOOT)' specifies the number of bootstrap resamples, where NBOOT is a scalar, positive integer corresponding to the number of bootstrap resamples. The default value of NBOOT is the scalar: 1999. 'bootclust (DATA, NBOOT, BOOTFUN)' also specifies BOOTFUN: the function calculated on the original sample and the bootstrap resamples. BOOTFUN must be either a: <> function handle, function name or an anonymous function, <> string of a function name, or <> a cell array where the first cell is one of the above function definitions and the remaining cells are (additional) input arguments to that function (after the data arguments). In all cases BOOTFUN must take DATA for the initial input argument(s). BOOTFUN can return a scalar or any multidimensional numeric variable, but the output will be reshaped as a column vector. BOOTFUN must calculate a statistic representative of the finite data sample; it should NOT be an estimate of a population parameter (unless they are one of the same). If BOOTFUN is @mean or 'mean', narrowness bias of the confidence intervals for single bootstrap are reduced by expanding the probabilities of the percentiles using Student's t-distribution [2]. By default, BOOTFUN is @mean. 'bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)' resamples from the clusters or blocks of rows of the data vectors D1, D2 etc and the resamples are passed onto BOOTFUN as multiple data input arguments. All data vectors and matrices (D1, D2 etc) must have the same number of rows. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA)', where ALPHA is numeric and sets the lower and upper bounds of the confidence interval(s). The value(s) of ALPHA must be between 0 and 1. ALPHA can either be: <> scalar: To set the (nominal) central coverage of equal-tailed percentile confidence intervals to 100*(1-ALPHA)%. <> vector: A pair of probabilities defining the (nominal) lower and upper percentiles of the confidence interval(s) as 100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. The percentiles are bias-corrected and accelerated (BCa) [3]. The default value of ALPHA is the vector: [.025, .975], for a 95% BCa confidence interval. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)' also sets CLUSTID, which are identifiers that define the grouping of the DATA rows for cluster bootstrap resampling. CLUSTID should be a column vector or cell array with the same number of rows as the DATA. Rows in DATA with the same CLUSTID value are treated as clusters of observations that are resampled together. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, BLOCKSZ)' groups consecutive DATA rows into non-overlapping blocks of length BLOCKSZ for simple block bootstrap resampling [4]. Note that this variation of block bootstrap is a special case of resampling clustered data. By default, BLOCKSZ is 1. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO)' sets the resampling method. If LOO is false, the resampling method used is balanced bootstrap resampling. If LOO is true, the resampling method used is balanced bootknife resampling [5]. Where N is the number of clusters or blocks, bootknife cluster or block resampling involves creating leave-one-out jackknife samples of size N - 1, and then drawing resamples of size N with replacement from the jackknife samples, thereby incorporating Bessel's correction into the resampling procedure. LOO must be a scalar logical value. The default value of LOO is false. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED)' initialises the Mersenne Twister random number generator using an integer SEED value so that bootclust results are reproducible. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, ..., LOO, SEED, NPROC)' also sets the number of parallel processes to use for jackknife computations and non-vectorized function evaluations during bootstrap and on multicore machines. This feature requires the Parallel package (in Octave), or the Parallel Computing Toolbox (in Matlab). This option is ignored during bootstrap function evaluations when BOOTFUN is vectorized. 'STATS = bootclust (...)' returns a structure with the following fields (defined above): original, bias, std_error, CI_lower, CI_upper. '[STATS, BOOTSTAT] = bootclust (...)' returns BOOTSTAT, a vector or matrix of bootstrap statistics calculated over the bootstrap resamples. '[STATS, BOOTSTAT, BOOTDATA] = bootclust (...)' returns BOOTDATA, a 1-by- NBOOT cell array of datasets generated by cluster or block bootstrap resampling. BIBLIOGRAPHY: [1] Davison and Hinkley (1997). Bootstrap methods and their application (Vol. 1). New York, NY: Cambridge University Press. [2] Hesterberg, Tim (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, http://arxiv.org/abs/1411.5279 [3] Efron and Tibshirani (1993) An Introduction to the Bootstrap. New York, NY: Chapman & Hall [4] Carlstein (1986) The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Statist. 14, 1171-9 [5] Hesterberg (2004) Unbiasing the Bootstrap—Bootknife Sampling vs. Smoothing; Proceedings of the Section on Statistics & the Environment. Alexandria, VA: American Statistical Association. bootclust (version 2024.05.16) Author: Andrew Charles Penn https://www.researchgate.net/profile/Andrew_Penn/ Copyright 2019 Andrew Charles Penn This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 95% expanded BCa bootstrap confidence intervals for the mean bootclust (data, 1999, @mean); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric block bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, block bootstrap resampling Number of resamples: 1999 Number of data rows in each block: 1 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.2%, 97.5%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 -1.066e-14 +2.620 +23.66 +34.69
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 95% expanded BCa bootstrap confidence intervals for the mean with % cluster resampling bootclust (data, 1999, @mean, [0.025,0.975], clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, cluster bootstrap resampling Number of resamples: 1999 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.1%, 98.8%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 -0.02581 +2.950 +22.84 +36.04
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 90% equal-tailed percentile bootstrap confidence intervals for % the variance bootclust (data, 1999, {@var, 1}, 0.1); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric block bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, block bootstrap resampling Number of resamples: 1999 Number of data rows in each block: 1 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.543 +41.08 +98.85 +235.1
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 90% equal-tailed percentile bootstrap confidence intervals for % the variance bootclust (data, 1999, {@var, 1}, 0.1, clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, cluster bootstrap resampling Number of resamples: 1999 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -9.544 +33.87 +102.0 +214.8
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 90% BCa bootstrap confidence intervals for the variance bootclust (data, 1999, {@var, 1}, [0.05 0.95]); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric block bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, block bootstrap resampling Number of resamples: 1999 Number of data rows in each block: 1 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (11.7%, 98.6%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.634 +42.13 +114.1 +258.7
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 90% BCa bootstrap confidence intervals for the variance bootclust (data, 1999, {@var, 1}, [0.05 0.95], clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, cluster bootstrap resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (15.1%, 99.1%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -10.30 +34.38 +125.0 +235.8
The following code
% Input dataset y = randn (20,1); x = randn (20,1); X = [ones(20,1), x]; % 90% BCa confidence interval for regression coefficients bootclust ({X,y}, 1999, @mldivide, [0.05 0.95]); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric block bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mldivide Resampling method: Balanced, block bootstrap resampling Number of resamples: 1999 Number of data rows in each block: 1 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper -0.05462 -0.009144 +0.2292 -0.4038 +0.3533 +0.4724 +0.01875 +0.2151 +0.09272 +0.7899
The following code
% Input dataset y = randn (20,1); x = randn (20,1); X = [ones(20,1), x]; clustid = [1;1;1;1;2;2;2;3;3;3;3;4;4;4;4;4;5;5;5;6]; % 90% BCa confidence interval for regression coefficients bootclust ({X,y}, 1999, @mldivide, [0.05 0.95], clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mldivide Resampling method: Balanced, cluster bootstrap resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.0009749 +0.03331 +0.1387 -0.2001 +0.2117 -0.1078 -0.04871 +0.3960 -0.9451 +0.3961
The following code
% Input bivariate dataset x = [576 635 558 578 666 580 555 661 651 605 653 575 545 572 594].'; y = [3.39 3.3 2.81 3.03 3.44 3.07 3 3.43 ... 3.36 3.13 3.12 2.74 2.76 2.88 2.96].'; clustid = [1;1;3;1;1;2;2;2;2;3;1;3;3;3;2]; % 95% BCa bootstrap confidence intervals for the correlation coefficient bootclust ({x, y}, 1999, @cor, [], clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: cor Resampling method: Balanced, cluster bootstrap resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (2.8%, 97.8%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.7764 -0.02442 +0.1446 +0.4355 +1.014
The following code
% The following dataset represents lutenizing hormone levels measured in a % healthy women every 10 minutes over an 8 hour period. The dataset was the % example tabulated on page 92 of Efron and Tibshirani (1993) An Introduction % to the Bootstrap. y=[2.4;2.4;2.4;2.2;2.1;1.5;2.3;2.3; 2.5;2.0;1.9;1.7;2.2;1.8;3.2;3.2;... 2.7;2.2;2.2;1.9;1.9;1.8;2.7;3.0;2.3;2.0;2.0;2.9;2.9;2.7;2.7;2.3;... 2.6;2.4;1.8;1.7;1.5;1.4;2.1;3.3;3.5;3.5;3.1;2.6;2.1;3.4;3.0;2.9]; % Calculation of the standardized lutenizing hormone levels is as follows z = y - mean(y); % Let us then calculate the coefficient for a first order autoregressive % model, AR(1), which can be used to make future predictions of the level % of lutenizing hormone from the last measurement taken. We will use block % bootstrap using a block size of 3 to obtain an estimate of the standard % error and 95% confidence intervals around the regression coefficient % estimate. betafunc = @(y) (y(1:end-1) - mean(y)) \ (y(2:end) - mean(y)); blocksz = 3; seed = 2; bootclust(y,1999,betafunc,[0.025,0.975],blocksz,true,seed); % The estimate of beta here is 0.586 and the standard error is about 0.13. % The coefficient indicates that we can predict that standardized hormone % levels to change by a factor of 0.586 from the previous timepoint.
Produces the following output
Summary of nonparametric block bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: @(y) (y (1:end - 1) - mean (y)) \ (y (2:end) - mean (y)) Resampling method: Balanced, block bootknife resampling Number of resamples: 1999 Number of data rows in each block: 3 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (73.8%, 100.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.5858 -0.1686 +0.1319 +0.5024 +0.8876
Package: statistics-resampling