The statistics-resampling package manual

bootstrp

 Bootstrap: Resample with replacement to generate new samples and return the
 statistic(s) calculated by evaluating the specified function on each resample.


 -- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)
 -- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1, ..., DN)
 -- Function File: BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)
 -- Function File: BOOTSTAT = bootstrp (..., 'Options', PAROPT)
 -- Function File: BOOTSTAT = bootstrp (..., 'weights', WEIGHTS)
 -- Function File: BOOTSTAT = bootstrp (..., 'loo', LOO)
 -- Function File: BOOTSTAT = bootstrp (..., 'seed', SEED)
 -- Function File: [BOOTSTAT, BOOTSAM] = bootstrp (...)
 -- Function File: [BOOTSTAT, BOOTSAM, STATS] = bootstrp (...)

     'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)' draws NBOOT bootstrap resamples
     with replacement from the rows of the data D and returns the statistic
     computed by BOOTFUN in BOOTSTAT [1]. BOOTFUN is a function handle (e.g.
     specified with @), a string indicating the function name, or a
     cell array, where the first cell is one of the above function definitions
     and the remaining cells are (additional) input arguments to that function
     (after the data argument(s)). The third input argument is the data
     (column vector, matrix or cell array), which is supplied to BOOTFUN. This
     function is the only function in the statistics-resampling package to also
     accept cell arrays for the data arguments. The simulation method used by
     default is bootstrap resampling with first order balance [2-3]; see help
     for the 'boot' function for more information.
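
     To illustrate what first order balance means, the following is a minimal
     sketch in plain Octave/MATLAB code. It is not the package's own
     implementation (see 'boot' for that). It assumes one simple construction:
     pool NBOOT copies of the row indices, shuffle the pool, and cut it into
     NBOOT resamples. The variable names (n, nboot, pool, bootsam, counts) are
     illustrative only.

```octave
% Sketch of first order balanced resampling (illustrative, not package code)
n = 6;                                   % number of rows in the data
nboot = 200;                             % number of bootstrap resamples
pool = repmat ((1:n)', nboot, 1);        % every row index appears nboot times
pool = pool(randperm (n * nboot));       % shuffle the pooled indices
bootsam = reshape (pool, n, nboot);      % each column is one resample of size n
counts = accumarray (bootsam(:), 1);     % overall frequency of each row index
disp (counts');                          % every entry equals nboot
```

     Individual resamples still vary at random, but across the whole set each
     observation is used equally often.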

     'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1, ..., DN)' is as above except
     that the third and subsequent input arguments are multiple data objects
     (column vectors, matrices or cell arrays), which are used as input for
     BOOTFUN.

     'BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)' controls the
     resampling strategy when multiple data arguments are provided. When MATCH
     is true, row indices of D1 to DN are the same (i.e. matched) for each
     resample. This is the default strategy when D1 to DN all have the same
     number of rows. If MATCH is set to false, then row indices are resampled
     independently for D1 to DN in each of the resamples. When any of the data
     D1 to DN have a different number of rows, this input argument is ignored
     and MATCH is forced to false. Note that the MATLAB
     bootstrp function only operates in a mode equivalent to MATCH = true.
     One application of setting MATCH to false is to perform stratified
     bootstrap resampling.
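
     As a hedged illustration of the stratified use case, the sketch below
     passes each stratum as its own data argument so that rows are resampled
     within strata only. The data values, the stratum names A and B, and the
     pooled-mean statistic are made up for the example; only the documented
     'match' option is used. It assumes the package is installed and, in
     Octave, loaded.

```octave
% In Octave, load the package first: pkg load statistics-resampling
% Two strata with different numbers of observations (made-up values)
A = [12 15 11 14 13 16 12]';
B = [22 25 21 27 24]';

% Resample rows independently within each stratum, then pool the resampled
% strata and compute the overall mean
pooled_mean = @(a, b) mean ([a; b]);
bootstat = bootstrp (500, pooled_mean, A, B, 'match', false);
std (bootstat)
```

     Because A and B differ in their number of rows, MATCH would be false here
     even if the option were omitted; stating it makes the intent explicit.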

     'BOOTSTAT = bootstrp (..., 'Options', PAROPT)' specifies options that
     govern if and how to perform bootstrap iterations using multiple
     processors (if the Parallel Computing Toolbox or Octave Parallel package
     is available). This argument is a structure with the following recognised
     fields:
        o 'UseParallel': If true, use parallel processes to accelerate
                         bootstrap computations on multicore machines. 
                         Default is false for serial computation. In MATLAB,
                         the default is true if a parallel pool
                         has already been started. 
        o 'nproc':       nproc sets the number of parallel processes (optional)

     'BOOTSTAT = bootstrp (..., D, 'weights', WEIGHTS)' sets the resampling
     weights. WEIGHTS must be a column vector with the same number of rows as
     the data, D. If WEIGHTS is empty or not provided, the default is a vector
     of length N with uniform weighting 1/N. 

     'BOOTSTAT = bootstrp (..., D1, ..., DN, 'weights', WEIGHTS)' is as above
     if MATCH is true. If MATCH is false, a 1-by-N cell array of column vectors
     can be provided to specify independent resampling weights for D1 to DN.
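
     A minimal sketch of weighted resampling with a single data argument
     follows. The weights are invented for the example and are normalised to
     sum to 1 on the assumption that they act as resampling probabilities, in
     line with the uniform default of 1/N. It assumes the package is installed
     and, in Octave, loaded.

```octave
% In Octave, load the package first: pkg load statistics-resampling
data = [48 36 20 29 42 42 20 42 22 41]';

% Made-up weights that favour the later observations, normalised to sum to 1
w = (1:10)';
weights = w / sum (w);
bootstat = bootstrp (500, @mean, data, 'weights', weights);
mean (bootstat)
```

     With these weights, the mean of BOOTSTAT would be expected to lie nearer
     to the weighted mean of the data, sum (weights .* data), than to the
     unweighted mean.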

     'BOOTSTAT = bootstrp (..., 'loo', LOO)' sets the simulation method. If 
     LOO is false, the resampling method used is balanced bootstrap resampling.
     If LOO is true, the resampling method used is balanced bootknife
     resampling [4]. The latter involves creating leave-one-out (jackknife)
     samples of size N - 1, and then drawing resamples of size N with
     replacement from the jackknife samples, thereby incorporating Bessel's
     correction into the resampling procedure. LOO must be a scalar logical
     value. The default value of LOO is false.
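
     The two-step procedure can be sketched in plain Octave/MATLAB code as
     follows. This is a simplified, unbalanced version written for
     illustration and is not the package's implementation; the data values
     and variable names are assumptions.

```octave
% Sketch of (unbalanced) bootknife resampling of the mean
data = [48 36 20 29 42 42 20 42 22 41]';
n = numel (data);
nboot = 500;
bootstat = zeros (nboot, 1);
for b = 1:nboot
  omit = randi (n);                     % observation left out of this resample
  keep = [1:(omit - 1), (omit + 1):n];  % jackknife sample of size n - 1
  idx = keep(randi (n - 1, n, 1));      % draw n indices with replacement
  bootstat(b) = mean (data(idx));       % statistic for this resample
end
std (bootstat)                          % bootknife standard error of the mean
```

     Because each resample is drawn from only N - 1 distinct observations, the
     spread of BOOTSTAT is slightly larger than for the ordinary bootstrap,
     which is how Bessel's correction enters.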

     'BOOTSTAT = bootstrp (..., 'seed', SEED)' initialises the Mersenne Twister
     random number generator using an integer SEED value so that bootstrp
     results are reproducible.

     '[BOOTSTAT, BOOTSAM] = bootstrp (...)' also returns indices used for
     bootstrap resampling. If MATCH is true or only one data argument is
     provided, BOOTSAM is a matrix. If multiple data arguments are provided
     and MATCH is false, BOOTSAM is returned in a 1-by-N cell array of
     matrices, where each cell corresponds to the respective data argument
     D1 to DN.  To get the output samples BOOTSAM without applying a function,
     set BOOTFUN to empty (i.e. []).
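
     As a short sketch of the empty-BOOTFUN usage, assuming the package is
     installed and, in Octave, loaded (the data values and the variable name
     'resamples' are illustrative):

```octave
% In Octave, load the package first: pkg load statistics-resampling
data = [48 36 20 29 42 42 20 42 22 41]';

% Request only the resampling indices by passing an empty BOOTFUN
[bootstat, bootsam] = bootstrp (500, [], data, 'seed', 1);

% Index the data with BOOTSAM to obtain the resampled values themselves
resamples = data(bootsam);
```

     Indexing a vector with the index matrix returns an array of the same size
     as BOOTSAM, so the resampled values keep the layout of the indices.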

     '[BOOTSTAT, BOOTSAM, STATS] = bootstrp (...)' also calculates and returns
     the following basic statistics relating to each column of BOOTSTAT: 
        - original: the original estimate(s) from applying BOOTFUN to the data
        - mean: the mean of the bootstrap distribution(s)
        - bias: bootstrap estimate of the bias of the sampling distribution(s)
        - bias_corrected: original estimate(s) after subtracting the bias
        - var: bootstrap variance of the original estimate(s)
        - std_error: bootstrap estimate(s) of the standard error(s)
     If BOOTSTAT is not numeric, STATS only returns the 'original' field. If
     BOOTFUN is empty, then the value of the 'original' field is also empty.
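
     For orientation, the fields could be reproduced from BOOTSTAT roughly as
     below. This sketch assumes the conventional definitions (bias as the
     bootstrap mean minus the original estimate, and standard error as the
     square root of the variance). The exact normalisation used inside
     bootstrp is not stated above, so treat the var and std_error lines as an
     approximation. Plain resampling with randi is used to keep the sketch
     independent of the package.

```octave
% Sketch of the STATS fields computed by hand (illustrative definitions)
data = [48 36 20 29 42 42 20 42 22 41]';
bootfun = @mean;
n = numel (data);
nboot = 500;
bootsam = randi (n, n, nboot);            % plain (unbalanced) resampling
bootstat = zeros (nboot, 1);
for b = 1:nboot
  bootstat(b) = bootfun (data(bootsam(:, b)));
end
stats.original = bootfun (data);          % estimate from the original data
stats.mean = mean (bootstat);             % mean of the bootstrap distribution
stats.bias = stats.mean - stats.original;
stats.bias_corrected = stats.original - stats.bias;
stats.var = var (bootstat);               % assumed N - 1 normalisation
stats.std_error = sqrt (stats.var);
disp (stats)
```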

  Bibliography:
  [1] Efron and Tibshirani (1993) An Introduction to the
        Bootstrap. New York, NY: Chapman & Hall
  [2] Davison et al. (1986) Efficient Bootstrap Simulation.
        Biometrika, 73: 555-66
  [3] Booth, Hall and Wood (1993) Balanced Importance Resampling
        for the Bootstrap. The Annals of Statistics. 21(1):286-298
  [4] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling 
        vs. Smoothing; Proceedings of the Section on Statistics & the 
        Environment. Alexandria, VA: American Statistical Association.

  bootstrp (version 2024.05.24)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41]';

 % Compute 500 bootstrap statistics for the mean and calculate the bootstrap
 % standard error of the mean
 bootstat = bootstrp (500, @mean, data, 'seed', 1);
 % Or equivalently
 bootstat = bootstrp (500, @mean, data, 'seed', 1, 'loo', false);
 std (bootstat)

Produces the following output

ans = 2.5977

Demonstration 2

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41]';

 % Compute 500 bootknife statistics for the mean and calculate the unbiased
 % bootstrap standard error of the mean
 bootstat = bootstrp (500, @mean, data, 'seed', 1, 'loo', true);
 std (bootstat)

Produces the following output

ans = 2.6441

Demonstration 3

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41]';
 % Split data into consecutive blocks of two data observations per cell
 data_blocks = mat2cell (data, 2 * (ones (13, 1)), 1);

 % Compute 500 bootknife statistics for the mean and calculate the unbiased
 % bootstrap standard error of the mean
 bootstat = bootstrp (500, @(x) mean (cell2mat (x)), data_blocks, 'seed', 1, ...
                                                                 'loo', true);
 std (bootstat)

Produces the following output

ans = 3.045

Demonstration 4

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41]';

 % Compute 500 bootknife statistics for the variance and calculate the
 % unbiased standard error of the variance
 bootstat = bootstrp (500, {@var, 1}, data, 'loo', true);
 std (bootstat)

Produces the following output

ans = 42.137

Demonstration 5

The following code


 % Input two-sample dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 500 bootknife statistics for the mean difference between X and Y
 % and calculate the unbiased bootstrap standard error of the mean difference
 bootstat = bootstrp (500, @(x, y) mean (x - y), X, Y, 'loo', true);
 % Or equivalently
 bootstat = bootstrp (500, @(x, y) mean (x - y), X, Y, 'loo', true, ...
                                                      'match', true);
 std (bootstat)

Produces the following output

ans = 18.185

Demonstration 6

The following code


 % Input two-sample dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 500 bootknife statistics for the difference in mean between
 % independent samples X and Y and calculate the unbiased bootstrap
 % standard error of the difference in mean
 bootstat = bootstrp (500, @(x, y) mean (x) - mean (y), X, Y, 'loo', true, ...
                                                              'match', false);
 std (bootstat)

Produces the following output

ans = 31.797

Demonstration 7

The following code


 % Input bivariate dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 500 bootstrap statistics for the correlation coefficient and
 % calculate the bootstrap standard error of the correlation coefficient
 bootstat = bootstrp (500, @cor, X, Y);
 std (bootstat)

Produces the following output

ans = 0.10017

Demonstration 8

The following code


 % Input bivariate dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 500 bootstrap statistics for the coefficient of determination and
 % calculate its bootstrap standard error
 bootstat = bootstrp (500, {@cor,'squared'}, X, Y);
 std (bootstat)

Produces the following output

ans = 0.12767

Demonstration 9

The following code


 % Input bivariate dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 4999 bootstrap statistics for the coefficient of determination and
 % calculate 95% percentile confidence intervals
 bootstat = bootstrp (4999, {@cor,'squared'}, X, Y);
 bootint (bootstat)

Produces the following output

ans =

      0.25642        0.743

Demonstration 10

The following code


 % Input bivariate dataset
 X = [212 435 339 251 404 510 377 335 410 335 ...
      415 356 339 188 256 296 249 303 266 300]';
 Y = [247 461 526 302 636 593 393 409 488 381 ...
      474 329 555 282 423 323 256 431 437 240]';

 % Compute 500 bootstrap statistics for the intercept and slope of a linear
 % regression and calculate their bootstrap standard errors
 bootstat = bootstrp (500, @mldivide, cat (2, ones (20, 1), X), Y);
 std (bootstat)

Produces the following output

ans =

       63.468      0.18955

Package: statistics-resampling