Statistics: parseWilkinsonFormula

Function Reference: `parseWilkinsonFormula`

statistics: terms = parseWilkinsonFormula (formula)
statistics: result = parseWilkinsonFormula (formula, mode)
statistics: [X, y, names] = parseWilkinsonFormula (formula, "model_matrix", data)

Parse and expand statistical model formulae using the Wilkinson notation.

This function implements the recursive-descent parser and expansion logic described by Wilkinson & Rogers (1973) for factorial models. It allows the symbolic specification of analysis of variance and regression models, converting strings into computational schemas or design matrices. It also supports multi-variable response specification on the Left-Hand Side (LHS) using lists or ranges.

The first input argument, formula, is a Wilkinson notation string given either as a character vector or as a string scalar. The following symbols are valid:

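LHS (Response) Operators:
- `,` : List separator for selecting multiple responses (e.g., "Yield_A, Yield_B").
- `-` : Range operator for selecting a contiguous range of responses (e.g., "Y_Jan - Y_Mar").

RHS (Model) Operators:
- `+` : Term addition (union of terms).
- `-` : Term deletion (difference of terms).
- `*` : Crossing (A * B expands to the main effects plus the interaction: A + B + A:B).
- `/` : Nesting (hierarchical relationship; A / B expands to A + A:B).
- `:` : Direct interaction.
- `^` : Crossing expansion limit (restricts the order of the interactions produced by crossing).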
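
The crossing and nesting rules above can be sketched in plain Octave. This is only an illustration of the expansion rules, not the parser used by parseWilkinsonFormula: the helper names interact, cross_terms and nest_terms are hypothetical, terms are represented as plain strings, and duplicate terms are not merged.

```octave
1;  # marks this file as a script so the functions below can be defined inline

## Hypothetical helpers (not part of the statistics package).  A model is a
## cell array of term strings such as {"A", "B", "A:B"}.
function t = interact (a, b)
  ## Interaction of every term in a with every term in b.
  t = {};
  for i = 1:numel (a)
    for j = 1:numel (b)
      t{end+1} = [a{i} ":" b{j}];
    endfor
  endfor
endfunction

function t = cross_terms (a, b)
  ## A * B  ->  A + B + A:B
  t = [a, b, interact(a, b)];
endfunction

function t = nest_terms (a, b)
  ## A / B  ->  A + A:B, where the prefix joins every variable found in a.
  vars = {};
  for i = 1:numel (a)
    parts = strsplit (a{i}, ":");
    for k = 1:numel (parts)
      if (! any (strcmp (parts{k}, vars)))
        vars{end+1} = parts{k};
      endif
    endfor
  endfor
  prefix = strjoin (vars, ":");
  t = a;
  for j = 1:numel (b)
    t{end+1} = [prefix ":" b{j}];
  endfor
endfunction

## Prints: A + B + A:B
printf ("%s\n", strjoin (cross_terms ({"A"}, {"B"}), " + "));

## Prints: Block + Block:Plot + Block:Plot:Subplot
nested = nest_terms (nest_terms ({"Block"}, {"Plot"}), {"Subplot"});
printf ("%s\n", strjoin (nested, " + "));
```

Chaining the helpers from left to right mirrors how a formula such as "Block / Plot / Subplot" grows one term per nesting level.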

parseWilkinsonFormula (formula, mode) specifies how formula is processed. mode must be a character vector or a string scalar with one of the following values:

'expand' (default) : Returns a cell array of character vectors containing the expanded model terms (e.g., {"A", "B", "A:B"}).
'matrix' : Returns a schema structure containing a binary matrix defining term membership.
'model_matrix' : Constructs the full design matrix (X) and response matrix (y) from the provided data. Categorical variables use corner-point (reference) coding.
'parse' : Returns the raw Abstract Syntax Tree (AST).
'tokenize' : Returns the list of tokens generated by the lexer (useful only for debugging).
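
Corner-point (reference) coding, as used by the 'model_matrix' mode, can be illustrated without the package. The sketch below is an assumption about the general technique rather than the function's own code: it takes the first level in sorted order as the reference and builds one indicator column for each remaining level. The column names follow the "Variable_Level" pattern seen in the examples on this page.

```octave
## Sketch of corner-point (reference) coding; not the package's own code.
variety = {"A"; "A"; "B"; "B"; "C"; "C"};

## unique returns the sorted levels and, for each row, the index of its level.
[levels, ~, idx] = unique (variety);
idx = idx(:);

## One indicator column per level except the first, which is the reference.
n = numel (variety);
dummies = zeros (n, numel (levels) - 1);
for k = 2:numel (levels)
  dummies(:, k-1) = (idx == k);
endfor

## Prepend the intercept column and name the columns.
X = [ones(n, 1), dummies];
names = [{"(Intercept)"}; strcat("Variety_", levels(2:end))];
disp (names);
disp (X);
```

Rows belonging to the reference level have zeros in every indicator column, so the intercept estimates the reference mean and each remaining coefficient estimates a difference from it.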

[X, y, names] = parseWilkinsonFormula (formula, "model_matrix", data) additionally takes data, a structure or a table containing the data variables. data is required only when mode is "model_matrix". The following rules apply:

- Field names must match the variables in the formula.
- Response variables (LHS) must be numeric.
- Rows containing NaN are automatically removed.
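
The effect of removing rows that contain NaN can be reproduced by hand. The snippet below is a minimal sketch with hypothetical data. It does not show how the function detects missing values internally, only one way to obtain the same listwise deletion for numeric variables.

```octave
## Hypothetical data with a missing predictor value in the second row.
d.BP  = [120; 122; 128; 130];
d.Age = [25; NaN; 35; 40];

## Keep only the rows in which every variable is observed.
M = [d.BP, d.Age];
keep = ! any (isnan (M), 2);

## Response and design matrix (intercept plus Age) for the remaining rows.
y = d.BP(keep);
X = [ones(sum (keep), 1), d.Age(keep)];
disp (X);
```

Because the second row is dropped from both y and X, the two outputs stay aligned row by row.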

Outputs

terms/result

The processed model representation. Its type depends on the selected mode.

X

The numeric design matrix (observations x parameters).

y

The response matrix (observations x K, where K is the number of response variables).

names

A cell array of column names corresponding to X.

References

Wilkinson, G. N. and Rogers, C. E. (1973). Symbolic Description of Factorial Models for Analysis of Variance. Applied Statistics, 22, 392-399.


Example: 1


 ## Demo : Tokenizer Mode
 ## Inspects the raw tokens generated from a formula string.
 formula = "y ~ A * (B + c)";
 tokens = parseWilkinsonFormula (formula, "tokenize");
 display (tokens);

tokens =

  1x10 struct array containing the fields:

    type
    value
    pos

Example: 2


 ## Demo : Parser Mode (AST generation)
 ## Returns the Abstract Syntax Tree (AST) structure.
 formula = "A / B";
 tree = parseWilkinsonFormula (formula, "parse");
 display (tree);

tree =

  scalar structure containing the fields:

    type = OPERATOR
    value = /
    left =

      scalar structure containing the fields:

        type = IDENTIFIER
        value = A
        left = [](0x0)
        right = [](0x0)

    right =

      scalar structure containing the fields:

        type = IDENTIFIER
        value = B
        left = [](0x0)
        right = [](0x0)

Example: 3


 ## Demo : Expansion Mode (Crossings)
 ## Demonstrates standard Wilkinson expansion for interactions.
 formula = "A * B * C";
 terms = parseWilkinsonFormula (formula, "expand");
 disp (terms);

  1x7 cell array

    {1x1 cell}    {1x1 cell}    {1x2 cell}    {1x1 cell}    {1x2 cell}    {1x2 cell}    {1x3 cell}

Example: 4


 ## Demo : Expansion Mode (Nesting)
 ## Demonstrates hierarchical nesting logic.
 formula = "Block / Plot / Subplot";
 terms = parseWilkinsonFormula (formula, "expand");
 disp (terms);

  1x3 cell array

    {1x1 cell}    {1x2 cell}    {1x3 cell}

Example: 5


 ## Demo : Matrix Schema Mode
 ## Generates the binary terms matrix (Row = Term, Col = Variable).
 formula = "y ~ Age + Height + Age:Height";
 schema = parseWilkinsonFormula (formula, "matrix");
 disp (schema.VariableNames);
 disp (schema.Terms);

  1x3 cell array

    {'Age'}    {'Height'}    {'y'}    

   0   0   0
   0   1   0
   1   0   0
   1   1   0

Example: 6


 ## Demo : Model Matrix (Regression / Continuous)
 ## Builds the Design Matrix (X) and Response (y) for numeric data.
 d_reg.BP = [120; 122; 128; 130; 125];
 d_reg.Age = [25; 30; 35; 40; 32];
 d_reg.Weight = [70; 75; 80; 85; 78];
 [X, y, names] = parseWilkinsonFormula ("BP ~ Age * Weight", "model_matrix", d_reg);
 disp (names);
 disp (X);

  4x1 cell array

    {'(Intercept)'}    
    {'Weight'     }    
    {'Age'        }    
    {'Age:Weight' }    

      1     70     25   1750
      1     75     30   2250
      1     80     35   2800
      1     85     40   3400
      1     78     32   2496

Example: 7


 ## Demo : Model Matrix (ANOVA / Categorical)
 ## Automatically handles categorical variables (dummy coding).
 d_cat.Yield = [10; 12; 15; 14; 11; 13];
 d_cat.Variety = {"A"; "A"; "B"; "B"; "C"; "C"};
 [X, y, names] = parseWilkinsonFormula ("Yield ~ Variety", "model_matrix", d_cat);
 disp (names);
 disp (X);

  3x1 cell array

    {'(Intercept)'}    
    {'Variety_B'  }    
    {'Variety_C'  }    

   1   0   0
   1   0   0
   1   1   0
   1   1   0
   1   0   1
   1   0   1

Example: 8


 ## Demo : Model Matrix (Mixed Numeric & Categorical)
 ## Demonstrates Analysis of Covariance (ANCOVA) structures.
 d_mix.Growth = [1.2; 1.4; 1.1; 1.8];
 d_mix.Fertilizer = {"Old"; "Old"; "New"; "New"};
 d_mix.Dose = [10; 20; 10; 20];
 [X, ~, names] = parseWilkinsonFormula ("Growth ~ Fertilizer * Dose", "model_matrix", d_mix);
 disp (names);
 disp (X);

  4x1 cell array

    {'(Intercept)'        }    
    {'Fertilizer_Old'     }    
    {'Dose'               }    
    {'Dose:Fertilizer_Old'}    

    1    1   10   10
    1    1   20   20
    1    0   10    0
    1    0   20    0

Example: 9


 ## Demo : Multi-Response (List)
 ## Selects specific response variables using comma.
 d_list = struct ();
 d_list.Yield_A = [10; 12; 11; 14];
 d_list.Yield_B = [20; 22; 21; 24];
 d_list.Rain    = [100; 110; 105; 120];
 formula = "Yield_A, Yield_B ~ Rain";
 [X, y, names] = parseWilkinsonFormula (formula, "model_matrix", d_list);
 disp (names);
 disp (y);
 disp (X);

  2x1 cell array

    {'(Intercept)'}    
    {'Rain'       }    

   10   20
   12   22
   11   21
   14   24
     1   100
     1   110
     1   105
     1   120

Example: 10


 ## Demo : Multi-Response (Range)
 ## Selects a contiguous range of variables using the hyphen.
 ## The responses are drawn with rand, so the y values printed below differ between runs.
 d_rng.Y_Jan = rand (4, 1);
 d_rng.Y_Feb = rand (4, 1);
 d_rng.Y_Mar = rand (4, 1);
 d_rng.Trt   = {"A"; "B"; "A"; "B"};
 formula = "Y_Jan - Y_Mar ~ Trt";
 [X, y, names] = parseWilkinsonFormula (formula, "model_matrix", d_rng);
 disp (names);
 disp (y);
 disp (X);

  2x1 cell array

    {'(Intercept)'}    
    {'Trt_B'      }    

   0.7084   0.7574   0.1893
   0.5557   0.8113   0.8787
   0.2801   0.8206   0.8471
   0.9481   0.3679   0.4640
   1   0
   1   1
   1   0
   1   1
