Multiple linear regression in Mathematica

Continuing my slow journey through The Art of Computer Systems Performance Analysis, I've reached chapter 15, "Other Regression Models". I'll cover the first part of the chapter on multiple linear regression models. The definition of multiple linear regression from the textbook:

A multiple linear regression model allows one to predict a response variable $y$ as a function of $k$ predictor variables $x_1, x_2, \ldots, x_k$ using a linear model of the following form:

$$ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e $$

Given a sample $(x_{11}, x_{21}, \ldots, x_{k1}, y_1), \ldots, (x_{1n}, x_{2n}, \ldots, x_{kn}, y_n)$ of $n$ observations, the model consists of the following $n$ equations:

$$ \begin{aligned} y_1 &= b_0 + b_1 x_{11} + b_2 x_{21} + \cdots + b_k x_{k1} + e_1 \\ y_2 &= b_0 + b_1 x_{12} + b_2 x_{22} + \cdots + b_k x_{k2} + e_2 \\ &\;\;\vdots \\ y_n &= b_0 + b_1 x_{1n} + b_2 x_{2n} + \cdots + b_k x_{kn} + e_n \end{aligned} $$

In vector notation, we have:

$$ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} $$

or

$$ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} $$

Box 15.1 contains the formulas relevant to multiple linear regression:

  1. Model: $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i$, or in matrix notation $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where:

    • $\mathbf{b}$ is a column vector with $k+1$ elements, $b = \{b_0, b_1, \ldots, b_k\}$
    • $\mathbf{y}$ is a column vector of $n$ observed values, $y = \{y_1, y_2, \ldots, y_n\}$
    • $\mathbf{X}$ is an $n$-row by $(k+1)$-column matrix whose $(i, j+1)$th element is $X_{i,j+1} = 1$ if $j = 0$ and $x_{ji}$ otherwise
  2. Parameter estimation: $\mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{y})$ (a direct computation of this formula appears right after the model is created below)

  3. Allocation of variation:

    • $SSY = \sum_{i=1}^n y_i^2$
    • $SS0 = n \bar{y}^2$
    • $SST = SSY - SS0$
    • $SSE = \mathbf{y}^T\mathbf{y} - \mathbf{b}^T\mathbf{X}^T\mathbf{y}$
    • $SSR = SST - SSE$
  4. Coefficient of determination: $R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$

  5. Coefficient of multiple correlation: $R = \sqrt{\frac{SSR}{SST}}$

  6. Degrees of freedom:

    $$ \begin{aligned} SST &= SSY - SS0 &&= SSR + SSE \\ n-1 &= n - 1 &&= k + (n - k - 1) \end{aligned} $$

  7. Analysis of variance (ANOVA): $MSR = \frac{SSR}{k}$; $MSE = \frac{SSE}{n-k-1}$

    Regression is significant if $MSR/MSE$ is greater than $F_{[1-\alpha; k; n-k-1]}$.

  8. Standard deviation of errors: $s_e = \sqrt{MSE}$.

  9. Standard error of regression parameters: $s_{b_j} = s_e \sqrt{C_{jj}}$, where $C_{jj}$ is the $j$th diagonal term of $\mathbf{C} = (\mathbf{X}^T\mathbf{X})^{-1}$.

  10. Prediction: mean of $m$ future observations:

    $\hat{y}_p = b_0 + b_1 x_{1p} + b_2 x_{2p} + \cdots + b_k x_{kp}$

    Or in vector notation: $\hat{y}_p = \mathbf{x}^T_p \mathbf{b}$, where $\mathbf{x}^T_p = (1, x_{1p}, x_{2p}, \ldots, x_{kp})$.

  11. Standard deviation of predictions: $s_{\hat{y}_p} = s_e \sqrt{\frac{1}{m} + \mathbf{x}^T_p (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{x}_p}$

  12. All confidence intervals are computed using $t_{[1-\alpha/2; n-k-1]}$.

  13. Correlation among predictors:

    $$ R_{x_1, x_2} = \frac{\sum x_{1i} x_{2i} - n \bar{x}_1 \bar{x}_2}{\left[ \sum x_{1i}^2 - n \bar{x}_1^2 \right]^{1/2} \left[ \sum x_{2i}^2 - n \bar{x}_2^2 \right]^{1/2}} $$

  14. Model assumptions

    • Errors are independent and identically distributed normal variates with zero mean.
    • Errors have the same variance for all values of the predictors.
    • Errors are additive.
    • The $x_i$'s and $y$ are linearly related.
    • The $x_i$'s are nonstochastic (i.e., not random) and are measured without error.
  15. Visual tests (demonstrated at the end of this post):

    • Scatter plot of errors versus predicted response should not have any trend.
    • The normal quantile-quantile plot of errors should be linear.

Mathematica commands

Define our data. Each tuple in the list has the form $(x_1, x_2, y)$:

data = {
  {14, 70, 2},
  {16, 75, 5},
  {27, 144, 7},
  {42, 190, 9},
  {39, 210, 10},
  {50, 235, 13},
  {83, 400, 20}
}

Create a linear model:

lm = LinearModelFit[data, {x1, x2}, {x1, x2}]
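
Before walking through the model's properties, here is a minimal sketch (my own check, not from the book) that computes $\mathbf{b}$, $SSE$, $SST$, and $R^2$ directly from the matrix formulas in Box 15.1, items 2-4; X, y, sse, sst, and rSquared are just local names:

X = Prepend[Most[#], 1] & /@ data;    (* n x (k+1) design matrix with an intercept column *)
y = data[[All, -1]];                  (* column of observed responses *)
b = N[Inverse[Transpose[X] . X] . (Transpose[X] . y)]
(* ≈ {-0.161, 0.118, 0.0265}; compare lm["BestFitParameters"] below *)
sse = y . y - b . (Transpose[X] . y); (* SSE = y'y - b'X'y *)
sst = y . y - Length[y] Mean[y]^2;    (* SST = SSY - SS0 *)
rSquared = N[(sst - sse)/sst]         (* ≈ 0.974 *)

The same quantities are available as properties of the fitted model: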
  • ANOVA table:

     lm["ANOVATable"]
    
  • Mean of the response variable, $\bar{y}$:

     N[Mean[lm["Response"]]] == 9.42857
    
  • SSR, the sum of squares due to regression; SequentialSumOfSquares gives each predictor's sequential contribution, and their total is SSR:

     lm["SequentialSumOfSquares"] == {199.845, 0.568796}
    
  • Coefficient of determination R2R^2:

     lm["RSquard"] == 0.974236
    
  • The predicted response for each sample point, $\hat{y}_i$:

     lm["PredictedResponse"] == {3.3, 3.7, 6.8, 9.8, 10.0, 11.9, 20.2}
    
  • The residual (error) for each data point in the sample, $e_i$:

     lm["FitResiduals"] == {-1.3, 1.2, 0.15, -0.84, -0.015, 1.02, -0.25}
    
  • Regression parameters $\{b_0, b_1, \ldots, b_k\}$:

     lm["BestFitParameters"] == {-0.161, 0.118, 0.0265}
    
  • The standard error of the regression parameters:

     lm["ParameterErrors"] == {0.91, 0.19, 0.040}
    
  • $b_0 \mp t s_{b_0}$ and $b_1 \mp t s_{b_1}$, the confidence intervals for the regression parameters:

    (* The default confidence level is 0.95. *)
    lm = LinearModelFit[data, {x1, x2}, {x1, x2}, ConfidenceLevel -> 0.95]
    lm["ParameterConfidenceIntervals"]
    (* {{-2.69759, 2.37469}, {-0.416515, 0.652989}, {-0.0858019, 0.138805}} *)
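    (* Note that every interval includes zero, so none of the parameters is
       significantly different from zero at the 95% confidence level. *)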
    
  • Predictions at the 90% confidence level, both for the mean of many future observations and for a single observation:

    lm = LinearModelFit[data, {x1, x2}, {x1, x2}, ConfidenceLevel -> 0.9]
    (* for many observations *)
    lm["MeanPredictionBands"] /. {x1 -> 100, x2 -> 550}
    (* {19.5571, 32.919} *)
    
    (* for a single observation *)
    lm["SinglePredictionBands"] /. {x1 -> 100, x2 -> 550}
    (* {19.1207, 33.3554} *)
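    (* The single-observation bands are wider than the mean-prediction bands,
       since they also include the variance of an individual error term. *)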
    
  • Mean square values for each predictor and for the error:

    lm["ANOVATableMeanSquares"] == {199.845, 0.568796, 1.325}
    
  • Plot the mean regression response with the confidence intervals:

    planes = Append[lm["MeanPredictionBands"], lm["BestFit"]];
    Plot3D[planes, {x1, 1, 20}, {x2, 1, 1000}]
    
    Figure: Regression prediction plane with upper and lower confidence-band surfaces.
  • The correlation matrix of the parameter estimates (note that CorrelationMatrix correlates the fitted parameters, not the predictors):

    lm["CorrelationMatrix"] // MatrixForm