8 Regression using Matrices
Everything we’ve done so far can be written in matrix form. Though it might seem no more efficient to use matrices with simple linear regression, it will become clear that with multiple linear regression, matrices are very powerful. ALSM chapter 5 contains a lot of matrix theory; the main takeaway points from the chapter concern how that matrix theory applies in the regression setting. Please make sure that you read the sections and examples dealing with the regression applications.
8.1 Special Matrices
1 = \(\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}\)
I = \(\begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}\)
J = \(\begin{pmatrix} 1 & 1 & \cdots & 1\\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}\)
0 = \(\begin{pmatrix} 0 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}\)
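These special matrices are easy to construct in R. A minimal sketch, with the dimension n set to 4 just for illustration (the object names ones, Imat, Jmat, and Zero are arbitrary):
n <- 4                     # dimension used for illustration
ones <- rep(1, n)          # the column vector of 1s
Imat <- diag(n)            # the n x n identity matrix I
Jmat <- matrix(1, n, n)    # the n x n matrix of all 1s, J
Zero <- matrix(0, n, n)    # the n x n matrix of all 0s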
8.2 Regression Model
8.2.1 Matrix Addition
\[\begin{eqnarray*} Y_i &=& E[Y_i] + \epsilon_i\\ &&\\ \begin{pmatrix} Y_1\\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} &=& \begin{pmatrix} E[Y_1]\\ E[Y_2] \\ \vdots \\ E[Y_n] \end{pmatrix} + \begin{pmatrix} \epsilon_1\\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}\\ &&\\ \underline{Y} &=& E[\underline{Y}] + \underline{\epsilon}\\ \end{eqnarray*}\] Similarly, \[\begin{eqnarray*} Y_i &=& \hat{Y}_i + e_i\\ &&\\ \begin{pmatrix} Y_1\\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} &=& \begin{pmatrix} \hat{Y}_1\\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix} + \begin{pmatrix} e_1\\ e_2 \\ \vdots \\ e_n \end{pmatrix}\\ &&\\ \underline{Y} &=& \underline{\hat{Y}} + \underline{e}\\ \end{eqnarray*}\]
8.3 Matrix Multiplication
8.3.1 Example
Consider multiplying an \(r \times c\) matrix with a \(c \times s\) matrix. The interior dimensions must always be the same. The resulting matrix will always be \(r \times s\). The element in the \(i^{th}\) row and \(j^{th}\) column is given by: \[\begin{eqnarray*} \sum_{k=1}^c a_{ik} b_{kj}\\ AB &=& \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix}\\ &=& \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} + a_{13}b_{31} & a_{11} b_{12} + a_{12} b_{22} + a_{13}b_{32} \\ a_{21} b_{11} + a_{22} b_{21} + a_{23}b_{31} & a_{21} b_{12} + a_{22} b_{22} + a_{23}b_{32} \end{pmatrix}\\ &&\\ AB &=& \begin{pmatrix} 3 & -1 & 0 \\ 0 & 1 & 1 \\ -2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1\\ -1 & 2 \\ 0 & -1 \end{pmatrix}\\ &=& \begin{pmatrix} 4 & 1\\ -1 & 1 \\ -2 & -3 \end{pmatrix} \end{eqnarray*}\]
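As a quick check of the worked example, the same product can be computed in R with the %*% operator (matrix multiplication in R is covered again in the R section at the end of the chapter):
A <- matrix(c(3, -1, 0,
              0,  1, 1,
             -2,  0, 1), nrow = 3, byrow = TRUE)
B <- matrix(c( 1,  1,
              -1,  2,
               0, -1), nrow = 3, byrow = TRUE)
A %*% B   # the 3 x 2 product computed by hand above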
and in the linear regression context…
\[\begin{eqnarray*} E[Y_i] &=& \beta_0 + \beta_1 X_i\\ &&\\ \begin{pmatrix} E[Y_1]\\ E[Y_2] \\ \vdots \\ E[Y_n] \end{pmatrix} &=& \begin{pmatrix} 1 & X_1\\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}\\ &&\\ E[\underline{Y}] &=& X \underline{\beta}\\ \end{eqnarray*}\]
\[\begin{eqnarray*} \underline{Y}^t \underline{Y} &=& \begin{pmatrix} Y_1 & Y_2 & \cdots & Y_n \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \sum_{i=1}^n Y_i^2\\ &&\\ X^t X &=& \begin{pmatrix} 1 & 1 & \cdots & 1\\ X_1 & X_2 & \cdots & X_n \end{pmatrix} \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}= \begin{pmatrix} n & \sum_{i=1}^n X_i \\ \sum_{i=1}^n X_i & \sum_{i=1}^n X_i^2 \end{pmatrix}\\ &&\\ X^t \underline{Y} &=& \begin{pmatrix} 1 & 1 & \cdots & 1\\ X_1 & X_2 & \cdots & X_n \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n Y_i \\ \sum_{i=1}^n X_i Y_i \end{pmatrix} \end{eqnarray*}\]
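A small numerical sketch of these identities; the vectors x and y below are made up, and any two vectors of the same length would work:
x <- c(1, 2, 4, 5)                # made-up explanatory values
y <- c(2.1, 3.9, 8.2, 9.8)        # made-up responses
X <- cbind(1, x)                  # design matrix with a column of 1s
t(X) %*% X    # matches the matrix with entries n, sum(x), sum(x), sum(x^2)
t(X) %*% y    # matches the vector with entries sum(y), sum(x * y)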
8.4 Matrix Inverses in the Regression Context
8.4.1 Matrix Inverses
An \(n \times n\) matrix \(A\) is called invertible if there exists an \(n \times n\) matrix \(B\) such that \[\begin{eqnarray*} A B = B A = I_n \end{eqnarray*}\] \(B\) is called the inverse of \(A\) and is typically denoted by \(B = A^{-1}\). (Note, inverses only exist for square matrices with non-zero determinants.)
8.4.2 Example
\[A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\]
\[ A^{-1} = \begin{pmatrix} d / (ad - bc) & -b / (ad - bc) \\ -c / (ad - bc) & a / (ad - bc) \end{pmatrix}\]
where the determinant is given by \(D = ad - bc\). \[\begin{eqnarray*} A^{-1} A &=& \begin{pmatrix} d / (ad - bc) & -b / (ad - bc) \\ -c / (ad - bc) & a / (ad - bc) \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}\\ &=& \begin{pmatrix} (ad - bc) / (ad - bc) & (bd - bd) / (ad - bc) \\ (-ac + ac) / (ad - bc) & (ad - bc) / (ad - bc) \end{pmatrix}\\ &=& I_2 \end{eqnarray*}\]
\[A = \begin{pmatrix} 3 & 2 \\ 1 & 6 \end{pmatrix}\]
\[ A^{-1} = \begin{pmatrix} 6 / 16 & -2 / 16 \\ -1 / 16 & 3 / 16 \end{pmatrix} \]
\[\begin{eqnarray*} A^{-1} A &=& \begin{pmatrix} 6 / 16 & -2 / 16 \\ -1 / 16 & 3 / 16 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 1 & 6 \end{pmatrix}\\ &=& \begin{pmatrix} (18-2)/16 & (12-12)/16 \\ (-3+3)/16 & (-2+18)/16 \end{pmatrix}\\ &=& I_2 \end{eqnarray*}\]
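The same inverse can be verified numerically in R with solve() (discussed further in the R section below):
A <- matrix(c(3, 2,
              1, 6), nrow = 2, byrow = TRUE)
solve(A)         # entries 6/16, -2/16, -1/16, 3/16
solve(A) %*% A   # recovers the 2 x 2 identity matrix (up to rounding)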
8.4.3 Variance of Coefficients
What is a Variance Matrix?
Consider two random variables, \(U\) and \(W\). We’d typically have \(n\) observations of each variable, resulting in data vectors \(\underline{U}\) and \(\underline{W}\). The variance-covariance matrix describing the variability relationship between \(U\) and \(W\) is given by: \[\begin{eqnarray*} \mbox{var}\{\underline{U},\underline{W} \} &=& \begin{pmatrix} \sigma_U^2 & \sigma_{U,W} \\ \sigma_{W,U} & \sigma_W^2 \end{pmatrix}\\ \mbox{where } \sigma_U^2 &=& \mbox{ variance of } U \mbox{ (a number)}\\ \sigma_W^2 &=& \mbox{ variance of } W \mbox{ (a number)}\\ \sigma_{U,W} &=& \sigma_{W,U} = \mbox{ covariance of } U \mbox{ and } W \mbox{ (a number)} \\ \end{eqnarray*}\] In order to find the variance matrix for the regression coefficients, we use matrix notation. \[\begin{eqnarray*} \mbox{Recall: } X^t X &=& \begin{pmatrix} n & \sum_{i=1}^n X_i \\ \sum_{i=1}^n X_i & \sum_{i=1}^n X_i^2 \end{pmatrix}\\ &&\\ \mbox{So, } D &=& n \sum X_i^2 - (\sum X_i)^2 = n \sum(X_i - \overline{X})^2\\ &&\\ (X^t X)^{-1} &=& \begin{pmatrix} \frac{\sum X_i^2}{n \sum(X_i - \overline{X})^2} & \frac{-\sum_{i=1}^n X_i}{n \sum(X_i - \overline{X})^2} \\ \frac{-\sum_{i=1}^n X_i}{n \sum(X_i - \overline{X})^2} & \frac{n}{n \sum(X_i - \overline{X})^2} \end{pmatrix}\\ &&\\ &=& \begin{pmatrix} \frac{1}{n} + \frac{\overline{X}^2}{\sum(X_i - \overline{X})^2} & \frac{-\overline{X}}{\sum(X_i - \overline{X})^2} \\ \frac{-\overline{X}}{ \sum(X_i - \overline{X})^2} & \frac{1}{ \sum(X_i - \overline{X})^2} \end{pmatrix}\\ &&\\ \mbox{var}\{\underline{b}\} &=& \sigma^2 \cdot (X^t X)^{-1}\\ &=& \begin{pmatrix} \sigma^2_{b_0} & \sigma_{b_0, b_1} \\ \sigma_{b_1, b_0} & \sigma^2_{b_1} \end{pmatrix}\\ SE^2\{\underline{b}\} &=& MSE \cdot (X^t X)^{-1}\\ \end{eqnarray*}\]
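A sketch of this variance formula in R, using made-up vectors x and y; lm() is used only to get the residuals for the MSE, and vcov() returns its estimated variance matrix of the coefficients for comparison:
x <- c(1, 2, 4, 5, 7)                # made-up explanatory values
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)    # made-up responses
X <- cbind(1, x)                     # design matrix
fit <- lm(y ~ x)
MSE <- sum(resid(fit)^2) / (length(y) - 2)
MSE * solve(t(X) %*% X)              # SE^2{b} from the matrix formula
vcov(fit)                            # agrees with the line above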
Estimating Coefficients
Recall the equations that come from differentiating the sum of squared residuals with respect to both \(\beta_0\) and \(\beta_1\):
\[\begin{eqnarray*} n b_0 + b_1 \sum X_i &=& \sum Y_i\\ b_0 \sum X_i + b_1 \sum X_i^2 &=& \sum X_i Y_i\\ &&\\ \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} &=& \begin{pmatrix} \sum Y_i \\ \sum X_i Y_i \end{pmatrix}\\ &&\\ (X^t X) \underline{b} &=& X^t \underline{Y}\\ \underline{b} &=& (X^t X)^{-1} (X^t \underline{Y})\\ &&\\ \mbox{var}\{\underline{b} \} &=& (X^t X)^{-1} X^t \sigma^2 I X (X^t X)^{-1}\\ &=& \sigma^2 \cdot (X^t X)^{-1}\\ &&\\ \mbox{checking:}&&\\ (X^t X)^{-1} (X^t \underline{Y}) &=& \begin{pmatrix} \frac{1}{n} + \frac{\overline{X}^2}{\sum(X_i - \overline{X})^2} & \frac{-\overline{X}}{\sum(X_i - \overline{X})^2} \\ \frac{-\overline{X}}{ \sum(X_i - \overline{X})^2} & \frac{1}{ \sum(X_i - \overline{X})^2} \end{pmatrix} \begin{pmatrix} \sum_{i=1}^n Y_i \\ \sum_{i=1}^n X_i Y_i \end{pmatrix}\\ &&\\ &=& \begin{pmatrix} \frac{\sum Y_i}{n} + \frac{\sum Y_i \overline{X}^2}{\sum(X_i - \overline{X})^2} + \frac{-\sum X_i Y_i (\overline{X})}{\sum(X_i - \overline{X})^2} \\ \frac{-\sum Y_i (\overline{X})}{ \sum(X_i - \overline{X})^2} + \frac{\sum X_i Y_i}{ \sum(X_i - \overline{X})^2} \end{pmatrix}\\ &&\\ &=& \begin{pmatrix} \overline{Y} - b_1 \overline{X} \\ \frac{\sum Y_i (X_i - \overline{X})}{\sum(X_i - \overline{X})^2} \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \end{eqnarray*}\]
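A minimal sketch of solving the normal equations directly, with the same kind of made-up x and y; coef(lm(y ~ x)) returns the identical estimates:
x <- c(1, 2, 4, 5, 7)                    # made-up explanatory values
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)        # made-up responses
X <- cbind(1, x)                         # design matrix
b <- solve(t(X) %*% X) %*% t(X) %*% y    # b = (X^t X)^{-1} X^t Y
b
coef(lm(y ~ x))                          # same two numbers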
8.5 Fitted Values
\[\begin{eqnarray*} \hat{Y}_i &=& b_0 + b_1 X_i\\ &&\\ \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix} &=& \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}\\ &&\\ \underline{\hat{Y}} &=& X \underline{b}\\ &=& X (X^t X)^{-1} (X^t \underline{Y})\\ &=& H \underline{Y}\\ \mbox{"hat" matrix: } H &=& X (X^t X)^{-1} X^t \end{eqnarray*}\] We call \(H\) the hat matrix because it takes \(\underline{Y}\) and puts a hat on it. Note that the fitted values are simply linear combinations of the response variable (\(Y\)), with coefficients that depend only on the explanatory variables (\(X\)).
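A sketch of the hat matrix with made-up data; H %*% y reproduces the fitted values from lm():
x <- c(1, 2, 4, 5, 7)
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)
X <- cbind(1, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)    # the hat matrix
H %*% y                                  # fitted values, Y-hat = H Y
fitted(lm(y ~ x))                        # same values from lm()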
8.6 Residuals
\[\begin{eqnarray*} e_i &=& Y_i - \hat{Y}_i\\ \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} &=& \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\Y_n \end{pmatrix} - \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix}\\ \underline{e} &=& \underline{Y} - \hat{\underline{Y}}\\ &=& \underline{Y} - X \underline{b}\\ &=& \underline{Y} - H \underline{Y}\\ &=& (I - H) \underline{Y}\\ &&\\ \mbox{var}\{ \underline{e} \} &=& \mbox{var}\{ (I-H) \underline{Y} \}\\ &=& (I - H) \mbox{var}\{ \underline{Y} \} (I - H)^t\\ &=& (I - H) \cdot \sigma^2 \cdot (I - H)^t\\ &=& \sigma^2 \cdot (I - H) (I-H^t)\\ &=& \sigma^2 \cdot (I - H - H^t + HH^t)\\ &=& \sigma^2 \cdot (I-H)\\ SE^2(\underline{e}) &=& MSE \cdot (I-H) \end{eqnarray*}\] The final simplification uses the fact that \(H\) is symmetric (\(H^t = H\)) and idempotent (\(HH = H\)), so \(H H^t = H\).
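Continuing the same made-up example, the residuals come from (I - H), and the symmetry and idempotency of H can be checked numerically:
x <- c(1, 2, 4, 5, 7)
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)
X <- cbind(1, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)
(diag(length(y)) - H) %*% y              # residuals, e = (I - H) Y
resid(lm(y ~ x))                         # same values from lm()
all.equal(H %*% H, H)                    # H is idempotent
all.equal(t(H), H)                       # H is symmetric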
8.7 ANalysis Of VAriance
\[\begin{eqnarray*} SSTO &=& \underline{Y}^t \underline{Y} - \bigg(\frac{1}{n} \bigg) \underline{Y}^t J \underline{Y}\\ &=& \sum Y_i ^2 - \frac{1}{n} \begin{pmatrix} Y_1 & Y_2 & \cdots & Y_n \end{pmatrix} \begin{pmatrix} 1 & 1 & \cdots & 1\\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \\ &=& \sum Y_i ^2 - \frac{1}{n} \begin{pmatrix} Y_1 & Y_2 & \cdots & Y_n \end{pmatrix} \begin{pmatrix} \sum Y_i \\ \sum Y_i \\ \vdots \\ \sum Y_i \end{pmatrix} \\ &=& \sum Y_i ^2 - \frac{1}{n} \sum Y_i \sum Y_i\\ &=& \sum Y_i ^2 - n \overline{Y}^2\\ \mbox{note: } \sum(Y_i - \overline{Y})^2 &=& \sum (Y_i^2 - 2Y_i \overline{Y} + \overline{Y}^2)\\ &=& \sum Y_i^2 -2\overline{Y}\sum Y_i + n \overline{Y}^2\\ &=& \sum Y_i^2 - n \overline{Y}^2\\ &&\\ SSE &=& \underline{Y}^t \underline{Y} - \underline{b}^t X^t \underline{Y}\\ &&\\ SSR &=& \underline{b}^t X^t \underline{Y} - \bigg(\frac{1}{n} \bigg) \underline{Y}^t J \underline{Y} \end{eqnarray*}\]
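The sums of squares can be computed from the matrix formulas with the same made-up data; the Sum Sq column of anova(lm(y ~ x)) reports SSR and SSE for comparison:
x <- c(1, 2, 4, 5, 7)
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)
n <- length(y)
X <- cbind(1, x)
J <- matrix(1, n, n)
b <- solve(t(X) %*% X) %*% t(X) %*% y
SSTO <- t(y) %*% y - (1/n) * t(y) %*% J %*% y
SSE  <- t(y) %*% y - t(b) %*% t(X) %*% y
SSR  <- t(b) %*% t(X) %*% y - (1/n) * t(y) %*% J %*% y
c(SSTO, SSE, SSR)                        # note SSTO = SSE + SSR
anova(lm(y ~ x))                         # Sum Sq column gives SSR and SSE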
8.8 Prediction of New Observations
\(\underline{X}_h = \begin{pmatrix} 1\\ X_h \end{pmatrix}\) and \(\underline{X}_h^t = \begin{pmatrix} 1 & X_h \end{pmatrix}\)
\[\begin{eqnarray*} \hat{Y}_h &=& \underline{X}^t_h \underline{b}\\ \mbox{var}\{ \hat{Y}_h \} &=& \underline{X}_h^t \mbox{var}\{\underline{b} \} \underline{X}_h\\ &=& \sigma^2 \cdot \underline{X}_h^t (X^t X)^{-1} \underline{X}_h\\ SE^2\{\hat{Y}_h \} &=& MSE \cdot \underline{X}_h^t (X^t X)^{-1} \underline{X}_h\\ &&\\ &&\\ SE^2 \{\mbox{pred} \} = SE^2\{\hat{Y}_{h(new)} \} &=& MSE \cdot (1 + \underline{X}_h^t (X^t X)^{-1} \underline{X}_h)\\ \end{eqnarray*}\]
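A sketch of the prediction formulas with made-up data and a new value \(X_h = 3\); predict() with se.fit = TRUE returns the standard error of the estimated mean response, which matches the matrix formula:
x <- c(1, 2, 4, 5, 7)
y <- c(2.3, 4.1, 7.8, 10.2, 13.9)
X <- cbind(1, x)
fit <- lm(y ~ x)
MSE <- sum(resid(fit)^2) / (length(y) - 2)
Xh <- c(1, 3)                                               # new point, X_h = 3
t(Xh) %*% coef(fit)                                         # Y-hat_h
sqrt(MSE * t(Xh) %*% solve(t(X) %*% X) %*% Xh)              # SE{Y-hat_h}
sqrt(MSE * (1 + t(Xh) %*% solve(t(X) %*% X) %*% Xh))        # SE{pred}
predict(fit, newdata = data.frame(x = 3), se.fit = TRUE)$se.fit  # matches SE{Y-hat_h}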
8.9 Reflection Questions
- How is the normal errors linear model written in matrix form?
- Why does the X matrix have an added column of ones?
- Using matrix notation, how are the normal equations used to solve for the least squares estimates of \(\beta_0\) and \(\beta_1\)?
- What is a covariance term? What does it mean for variables to be correlated (e.g., \(b_0\) and \(b_1\))?
- Why is the “hat” matrix (\(H\)) so named?
8.10 Ethics Considerations
- Why are the sampling distributions of \(b_0\) and \(b_1\) correlated?
- In what ways does writing the linear model in matrix notation make the extension to more explanatory variables easier?
- Are there any fundamental differences between the SLR model written in matrix notation and the model written by indexing the elements in each equation?
8.11 R: Matrices
8.11.1 Addition
Adding matrices gives you what you’d likely expect.
matrix1 <- matrix(1:12, ncol = 4, byrow = TRUE)               # 3 x 4, filled row by row
matrix2 <- matrix(seq(2, 24, by = 2), ncol = 4, byrow = TRUE) # 3 x 4, the even numbers 2 to 24
matrix1
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
matrix2
## [,1] [,2] [,3] [,4]
## [1,] 2 4 6 8
## [2,] 10 12 14 16
## [3,] 18 20 22 24
matrix1 + matrix2
## [,1] [,2] [,3] [,4]
## [1,] 3 6 9 12
## [2,] 15 18 21 24
## [3,] 27 30 33 36
8.11.2 Multiplication
The * operator produces element-by-element multiplication.
matrix1 * matrix2
## [,1] [,2] [,3] [,4]
## [1,] 2 8 18 32
## [2,] 50 72 98 128
## [3,] 162 200 242 288
The %*% operator does matrix multiplication. Note the error here: you can’t multiply a \(3 \times 4\) matrix by a \(3 \times 4\) matrix because the interior dimensions don’t match.
matrix1 %*% matrix2
## Error in matrix1 %*% matrix2: non-conformable arguments
If either of the matrices is transposed, they will be conformable, and we will be able to multiply them.
matrix1 %*% t(matrix2)
## [,1] [,2] [,3]
## [1,] 60 140 220
## [2,] 140 348 556
## [3,] 220 556 892
t(matrix1) %*% matrix2
## [,1] [,2] [,3] [,4]
## [1,] 214 244 274 304
## [2,] 244 280 316 352
## [3,] 274 316 358 400
## [4,] 304 352 400 448
Note that the products are only symmetric here because matrix2 \(= 2 \cdot\) matrix1.
8.11.3 Taking Inverses
The function for inverting matrices in R is solve(). Remember that solve() only works on square matrices with non-zero determinants.
matrix3 <- matrix(c(5, 7, 1, 4, 3, 6, 2, 0, 8), ncol = 3, byrow = TRUE)
matrix3
## [,1] [,2] [,3]
## [1,] 5 7 1
## [2,] 4 3 6
## [3,] 2 0 8
solve(matrix3)
## [,1] [,2] [,3]
## [1,] -0.923 2.154 -1.5
## [2,] 0.769 -1.462 1.0
## [3,] 0.231 -0.538 0.5
Multiplying a matrix by its inverse results in the identity matrix. The off-diagonal entries below are on the order of \(10^{-16}\) rather than exactly zero because of floating-point rounding.
solve(matrix3) %*% matrix3
## [,1] [,2] [,3]
## [1,] 1.00e+00 -2.22e-16 1.11e-16
## [2,] 4.44e-16 1.00e+00 0.00e+00
## [3,] 4.44e-16 0.00e+00 1.00e+00
8.11.4 Concatenating 1s for the intercept
Let’s say we have an explanatory variable called xvar, entered here as a numeric vector.
xvar <- c(47.3, 43.1, 47.1, 48.0, 49.7, 45.2, 47.7, 47.1, 47.9, 48.9, 43.4, 44.1)
xvar
## [1] 47.3 43.1 47.1 48.0 49.7 45.2 47.7 47.1 47.9 48.9 43.4 44.1
rep(1,12) # repeats the number 1, 12 times
## [1] 1 1 1 1 1 1 1 1 1 1 1 1
cbind(int = rep(1, 12), xvar) # bind the column of 1s to xvar
## int xvar
## [1,] 1 47.3
## [2,] 1 43.1
## [3,] 1 47.1
## [4,] 1 48.0
## [5,] 1 49.7
## [6,] 1 45.2
## [7,] 1 47.7
## [8,] 1 47.1
## [9,] 1 47.9
## [10,] 1 48.9
## [11,] 1 43.4
## [12,] 1 44.1
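With the design matrix built, the coefficient estimates come straight from the normal equations. The response vector yvar below is hypothetical (simulated only to show the pattern); in practice it would be the observed responses paired with xvar:
X <- cbind(int = rep(1, 12), xvar)       # the design matrix from above
yvar <- rnorm(12, mean = 100, sd = 5)    # hypothetical response values
solve(t(X) %*% X) %*% t(X) %*% yvar      # b = (X^t X)^{-1} X^t Y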