bootwild

 Performs wild bootstrap and calculates bootstrap-t confidence intervals and 
 p-values for the mean, or the regression coefficients from a linear model.

 -- Function File: bootwild (y)
 -- Function File: bootwild (y, X)
 -- Function File: bootwild (y, X, CLUSTID)
 -- Function File: bootwild (y, X, BLOCKSZ)
 -- Function File: bootwild (y, X, ..., NBOOT)
 -- Function File: bootwild (y, X, ..., NBOOT, ALPHA)
 -- Function File: bootwild (y, X, ..., NBOOT, ALPHA, SEED)
 -- Function File: bootwild (y, X, ..., NBOOT, ALPHA, SEED, L)
 -- Function File: STATS = bootwild (y, ...)
 -- Function File: [STATS, BOOTSTAT] = bootwild (y, ...)
 -- Function File: [STATS, BOOTSTAT, BOOTSSE] = bootwild (y, ...)
 -- Function File: [STATS, BOOTSTAT, BOOTSSE, BOOTFIT] = bootwild (y, ...)

     'bootwild (y)' performs a null hypothesis significance test for the
     mean of y being equal to 0. This function implements wild bootstrap-t
     resampling of Webb's 6-point distribution of the residuals and computes
     confidence intervals and p-values [1-4]. The following statistics are
     printed to the standard output:
        - original: the mean of the data vector y
        - std_err: heteroscedasticity-consistent standard error(s) (HC1)
        - CI_lower: lower bound(s) of the 95% bootstrap-t confidence interval
        - CI_upper: upper bound(s) of the 95% bootstrap-t confidence interval
        - tstat: Student's t-statistic
        - pval: two-tailed p-value(s) for the parameter(s) being equal to 0
        - fpr: minimum false positive risk for the corresponding p-value
          By default, the confidence intervals are symmetric bootstrap-t
          confidence intervals. The p-values are computed following both of
          the guidelines by Hall and Wilson [5]. The minimum false positive
          risk (FPR) is computed according to the Sellke-Berger approach as
          as described in [6,7].

     'bootwild (y, X)' also specifies the design matrix (X) for least squares
     regression of y on X. X should be a column vector or matrix the same
     number of rows as y. If the X input argument is empty, the default for X
     is a column of ones (i.e. intercept only) and thus the statistic computed
     reduces to the mean (as above). The statistics calculated and returned in
     the output then relate to the coefficients from the regression of y on X.

     'bootwild (y, X, CLUSTID)' specifies a vector or cell array of numbers
     or strings respectively to be used as cluster labels or identifiers.
     Rows in y (and X) with the same CLUSTID value are treated as clusters
     with dependent errors. Rows of y (and X) assigned to a particular
     cluster will have identical resampling during wild bootstrap. If empty
     (default), no clustered resampling is performed and all errors are
     treated as independent. The standard errors computed are cluster robust.

     'bootwild (y, X, BLOCKSZ)' specifies a scalar, which sets the block size
     for bootstrapping when the residuals have serial dependence. Identical
     resampling occurs within each (consecutive) block of length BLOCKSZ
     during wild bootstrap. Rows of y (and X) within the same block are
     treated as having dependent errors. If empty (default), no block
     resampling is performed and all errors are treated as independent.
     The standard errors computed are cluster robust.

     'bootwild (y, X, ..., NBOOT)' specifies the number of bootstrap resamples,
     where NBOOT must be a positive integer. If empty, the default value of
     NBOOT is 1999.

     'bootwild (y, X, ..., NBOOT, ALPHA)' is numeric and sets the lower and
     upper bounds of the confidence interval(s). The value(s) of ALPHA must
     be between 0 and 1. ALPHA can either be:
        o scalar: To set the (nominal) central coverage of SYMMETRIC
                  bootstrap-t confidence interval(s) to 100*(1-ALPHA)%.
                  For example, 0.05 for a 95% confidence interval.
        o vector: A pair of probabilities defining the (nominal) lower and
                  upper bounds of ASYMMETRIC bootstrap-t confidence interval(s)
                  as 100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. For
                  example, [.025, .975] for a 95% confidence interval.
        The default value of ALPHA is the scalar: 0.05, for symmetric 95% 
        bootstrap-t confidence interval(s).

     'bootwild (y, X, ..., NBOOT, ALPHA, SEED)' initialises the Mersenne
     Twister random number generator using an integer SEED value so that
     'bootwild' results are reproducible.

     'bootwild (y, X, ..., NBOOT, ALPHA, SEED, L)' multiplies the regression
     coefficients by the hypothesis matrix L. If L is not provided or is empty,
     it will assume the default value of 1 (i.e. no change to the design). 

     'STATS = bootwild (...) returns a structure with the following fields:
     original, std_err, CI_lower, CI_upper, tstat, pval, fpr and the sum-of-
     squared error (sse).

     '[STATS, BOOTSTAT] = bootwild (...)  also returns a vector (or matrix) of
     bootstrap statistics (BOOTSTAT) calculated over the bootstrap resamples
     (before studentization).

     '[STATS, BOOTSTAT, BOOTSSE] = bootwild (...)  also returns a vector
     containing the sum-of-squared error for the fit on each bootstrap 
     resample.

     '[STATS, BOOTSTAT, BOOTSSE, BOOTFIT] = bootwild (...)  also returns an
     N-by-NBOOT matrix containing the N fitted values for each of the NBOOT
     bootstrap resamples.

  Bibliography:
  [1] Wu (1986). Jackknife, bootstrap and other resampling methods in
        regression analysis (with discussions). Ann Stat.. 14: 1261–1350. 
  [2] Cameron, Gelbach and Miller (2008) Bootstrap-based Improvements for
        Inference with Clustered Errors. Rev Econ Stat. 90(3), 414-427
  [3] Webb (2023) Reworking wild bootstrap-based inference for clustered
        errors. Can J Econ. https://doi.org/10.1111/caje.12661
  [4] Cameron and Miller (2015) A Practitioner’s Guide to Cluster-Robust
        Inference. J Hum Resour. 50(2):317-372
  [5] Hall and Wilson (1991) Two Guidelines for Bootstrap Hypothesis Testing.
        Biometrics, 47(2), 757-762
  [6] Colquhoun (2019) The False Positive Risk: A Proposal Concerning What
        to Do About p-Values, Am Stat. 73:sup1, 192-201
  [7] Sellke, Bayarri and Berger (2001) Calibration of p-values for Testing
        Precise Null Hypotheses. Am Stat. 55(1), 62-71

  bootwild (version 2024.05.23)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Input univariate dataset
 heights = [183, 192, 182, 183, 177, 185, 188, 188, 182, 185].';

 % Compute test statistics, confidence intervals and p-values (H0 = 0)
 bootwild (heights);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of wild bootstrap null hypothesis significance tests for linear models
*******************************************************************************

Bootstrap settings: 
 Function: pinv (X) * y
 Resampling method: Wild bootstrap-t
 Number of resamples: 1999 
 Standard error calculations: Heteroscedasticity-Consistent (HC1)
 Confidence interval (CI) type: Symmetric bootstrap-t interval
 Nominal central coverage: 95%
 Null value (H0) used for hypothesis testing (p-values): 0 

Test Statistics: 
 original     std_err      CI_lower     CI_upper     t-stat      p-val     FPR
 +184.5       1.310        +181.6       +187.4       +141.       <.001    .010

Demonstration 2

The following code


 % Input bivariate dataset
 X = [ones(43,1),...
     [01,02,03,04,05,06,07,08,09,10,11,...
      12,13,14,15,16,17,18,19,20,21,22,...
      23,25,26,27,28,29,30,31,32,33,34,...
      35,36,37,38,39,40,41,42,43,44]'];
 y = [188.0,170.0,189.0,163.0,183.0,171.0,185.0,168.0,173.0,183.0,173.0,...
     173.0,175.0,178.0,183.0,192.4,178.0,173.0,174.0,183.0,188.0,180.0,...
     168.0,170.0,178.0,182.0,180.0,183.0,178.0,182.0,188.0,175.0,179.0,...
     183.0,192.0,182.0,183.0,177.0,185.0,188.0,188.0,182.0,185.0]';

 % Compute test statistics, confidence intervals and p-values (H0 = 0)
 bootwild (y, X);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of wild bootstrap null hypothesis significance tests for linear models
*******************************************************************************

Bootstrap settings: 
 Function: pinv (X) * y
 Resampling method: Wild bootstrap-t
 Number of resamples: 1999 
 Standard error calculations: Heteroscedasticity-Consistent (HC1)
 Confidence interval (CI) type: Symmetric bootstrap-t interval
 Nominal central coverage: 95%
 Null value (H0) used for hypothesis testing (p-values): 0 

Test Statistics: 
 original     std_err      CI_lower     CI_upper     t-stat      p-val     FPR
 +175.5       2.563        +169.8       +181.2       +68.5       <.001    .010
 +0.1904      0.08460      +0.003534    +0.3773      +2.25        .047    .280

Package: statistics-resampling