
Estimating the Coefficients of the Linear Regression Model


As you know, the simple linear regression equation is:

$$\boxed{\hat{y}=b_o+b_1x}$$
We use the statistics from our sample, the intercept $b_o$ and the slope $b_1$, to make inferences about the corresponding parameters in the population.
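As a rough illustration of how the sample statistics are obtained, the sketch below computes the least-squares intercept and slope using the standard textbook formulas (the slope is the sum of cross-deviations divided by the sum of squared deviations of x). The function name `least_squares_fit` and the five data points are hypothetical, made up only for this example.

```python
from statistics import mean

def least_squares_fit(x, y):
    """Return (b_o, b_1) for the fitted line y_hat = b_o + b_1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations over the sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_1 = s_xy / s_xx
    # Intercept: the least-squares line passes through (x_bar, y_bar).
    b_o = y_bar - b_1 * x_bar
    return b_o, b_1

# Hypothetical sample, invented for illustration only.
x_sample = [1, 2, 3, 4, 5]
y_sample = [2, 4, 5, 4, 5]

b_o, b_1 = least_squares_fit(x_sample, y_sample)
print(f"y_hat = {b_o:.2f} + {b_1:.2f}x")
```

For this made-up sample the slope comes out at about 0.6 and the intercept at about 2.2. A different sample from the same population would give slightly different values, which is why they are only estimates.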


The least-squares regression line is an estimate of the true population regression line, which is represented by this formal model:

$$\boxed{Y=\beta_o+\beta_1X+\varepsilon}$$

$Y$ is the unknown dependent variable.
  • All $Y_i$'s are independent of one another.
  • $Y$ is assumed to be normally distributed with mean $E(Y)=\beta_o+\beta_1X_i$ and standard deviation $\sigma_Y$, which is constant regardless of the value of $X$.
What is $\varepsilon$?

The term $\varepsilon$, the residual or error, is the deviation of the actual values of $Y$ from their means $E(Y)$.
  • The error term includes everything that separates your model from actual reality. This includes:
      • Other explanatory variables that are not included in the model
      • Poor fit (e.g. a linear model doesn't fit a quadratic relationship)
      • Unpredictable effects
      • Random error
  • We assume that $\varepsilon$ is normally distributed with mean 0 and standard deviation $\sigma_{\varepsilon}$.
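To make the assumptions about the error term more concrete, the sketch below simulates observations from a population model of this form. The parameter values (`BETA_O`, `BETA_1`, `SIGMA_EPS`), the range of X, and the sample size are assumptions chosen only for illustration; in practice the true values are unknown.

```python
import random
from statistics import mean, stdev

# Assumed population values, chosen only for illustration.
BETA_O, BETA_1, SIGMA_EPS = 2.0, 0.5, 1.0

rng = random.Random(42)  # fixed seed so the run is repeatable
x_values = [rng.uniform(0, 10) for _ in range(10_000)]

# Each error is drawn from a normal distribution with mean 0 and sd SIGMA_EPS.
errors = [rng.gauss(0, SIGMA_EPS) for _ in x_values]

# Each observed Y is its mean, BETA_O + BETA_1 * x, plus its own error.
y_values = [BETA_O + BETA_1 * x + e for x, e in zip(x_values, errors)]

print("mean of errors:", round(mean(errors), 3))
print("sd of errors:  ", round(stdev(errors), 3))
```

With a large simulated sample, the mean of the errors should land close to 0 and their standard deviation close to the assumed value, although neither will match exactly in any single run.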

The regression line shows how Y changes with X:
$X$ is the known independent variable.
$\beta_o$ is the true intercept of the population regression line.
$\beta_1$ is the true slope of the population regression line.

Example
Unlike the other terms above (i.e. $\beta_o+\beta_1X_i$), which are all constants, $\varepsilon$ is a random variable.
  • The average value of all the $\varepsilon_i$'s is 0.
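
One way to see why a mean of 0 for the errors matters is to take the expected value of the model at a fixed value of $X$. The short derivation below is a sketch that assumes the model is written as $Y=\beta_o+\beta_1X+\varepsilon$, with $\beta_o$, $\beta_1$ and $X$ treated as constants:

$$
\begin{aligned}
E(Y) &= E(\beta_o+\beta_1X+\varepsilon)\\
     &= \beta_o+\beta_1X+E(\varepsilon)\\
     &= \beta_o+\beta_1X \qquad \text{because } E(\varepsilon)=0
\end{aligned}
$$

In other words, the population regression line gives the mean of $Y$ at each value of $X$, and the individual observations scatter around that mean by the amount $\varepsilon$.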