Wize University Statistics Textbook > Inference for Linear Regression
Simple Linear Regression Analysis

Estimating the Coefficients of the Linear Regression Model

As you know, the estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$
where $b_0$ and $b_1$ are the sample intercept and slope.
We use statistics computed from our sample to make inferences about the corresponding parameters of the population.
The least-squares regression line is an estimate of the true population regression line, which is represented by this formal model:
$y = \beta_0 + \beta_1 x + \varepsilon$
In this model:
- $y$ is the unknown dependent variable.
- $x$ is the known independent variable.
- $\beta_0$ is the true intercept of the population regression line.
- $\beta_1$ is the true slope of the population regression line.
- $\varepsilon$ is the random error term.
The error terms are assumed to satisfy:
- All $\varepsilon$ are independent of one another.
- $\varepsilon$ is normally distributed with mean 0 and standard deviation $\sigma$, and $\sigma$ is constant regardless of the value of $x$.
What is $\varepsilon$?
The error $\varepsilon$ is the deviation of an actual value of $y$ from the mean of $y$ at that value of $x$. Its sample counterpart is the residual, $e = y - \hat{y}$.
- The error term includes everything that separates your model from reality. This includes:
- Other explanatory variables that are not included in the model.
- Poor fit (e.g. a linear model fitted to a quadratic relationship)
- Unpredictable effects
- Random error
Unlike the other terms above (i.e. $x$, $\beta_0$, $\beta_1$), which are all constants, $\varepsilon$ is a random variable.
- The average value of all the $\varepsilon$ is 0, so the population regression line shows how the mean of $y$ changes with $x$:
$\mu_y = \beta_0 + \beta_1 x$
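To see the difference between the population parameters $\beta_0, \beta_1$ and the sample estimates $b_0, b_1$, the sketch below simulates data from the formal model and then fits the least-squares line with the standard formulas $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$), the range of $x$, and the function names are assumptions made up for illustration.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0


def simulate_sample(n, seed=0):
    """Draw n points from y = BETA0 + BETA1*x + eps, with eps ~ N(0, SIGMA)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Each error is independent and normal with mean 0 and constant SD SIGMA.
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys


def least_squares(xs, ys):
    """Return the least-squares estimates (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


xs, ys = simulate_sample(n=500)
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f} (true beta0 = {BETA0})")
print(f"b1 = {b1:.3f} (true beta1 = {BETA1})")
```

Rerunning with a different `seed` gives different values of $b_0$ and $b_1$, while $\beta_0$ and $\beta_1$ stay fixed. That sample-to-sample variability is what inference for regression has to account for.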

Measures of Variation in Regression
The coefficient of determination ($R^2$) measures how closely the data fit the regression model, that is, how much of the variation in the response variable $y$ can be explained by the explanatory variable $x$.
Example
$x$ = length of a movie (minutes)
$y$ = time it takes to edit a movie (days)
If $R^2 = 0.63$: "About 63% of the variation in the time it takes to edit a movie ($y$) can be explained by the length of the movie ($x$)."
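In simple linear regression (one explanatory variable and an intercept), $R^2$ equals the square of the correlation coefficient $r$ between $x$ and $y$. The sketch below uses that identity to compute $R^2$ and phrase it like the interpretation above. The movie data and the function name `r_squared` are invented for illustration; they are not the data behind the 63% example.

```python
def r_squared(xs, ys):
    """R^2 for simple linear regression, computed as the squared correlation r^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: movie length (minutes) and editing time (days).
lengths = [90, 100, 110, 120, 130]
edit_days = [20, 26, 24, 30, 32]

r2 = r_squared(lengths, edit_days)
# Format the proportion as a percentage, as in the interpretation sentence.
print(f"About {r2:.0%} of the variation in editing time "
      f"can be explained by movie length.")
```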
The variation of $y$ can be described by three measures:
- SSR: Sum of Squares (Regression)
- SSE: Sum of Squares (Error)
- SST: Sum of Squares (Total)
Watch Out!
This part can be confusing for some students, but it is very important and useful!
SSR: Sum of Squares (Regression)
SSR is the quantified measure of the variation that is attributed to the relationship between $x$ and $y$.
- In other words, SSR measures the explained variability in the regression model.
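- The explained variation of $y$ for an observation is the vertical distance between the predicted value $\hat{y}_i$ and the sample mean $\bar{y}$.
- Therefore, the explained variation for each observation $i$ is: $\hat{y}_i - \bar{y}$
- Squaring and summing over all $n$ observations gives: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
This is known as the Sum of Squares (Regression).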
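A minimal sketch of the SSR calculation: fit the least-squares line, then add up the squared distances between each predicted value $\hat{y}_i$ and the sample mean $\bar{y}$. The small dataset and the function names `fit_line` and `ssr` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def ssr(xs, ys):
    """Sum of Squares (Regression): sum of (y_hat_i - y_bar)^2."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    # Explained deviation of each predicted value from the sample mean.
    return sum((b0 + b1 * x - y_bar) ** 2 for x in xs)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"SSR = {ssr(xs, ys):.2f}")
```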
Wize Concept
Some textbooks use SSM (Sum of Squares Model) instead of SSR (Sum of Squares Regression). Both refer to how much of the variation the regression model explains, so they are the same thing.
SSE: Sum of Squares (Error)
SSE is the quantified measure of the variation that is not attributed to the relationship between $x$ and $y$. It may be due to:
- Other explanatory variables that are not included in the model.
- Random error
- In other words, SSE measures the unexplained variability in the regression model.
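- The unexplained variation of $y$ for an observation is the vertical distance between the actual value $y_i$ and the predicted value $\hat{y}_i$.
- Therefore, the unexplained variation for each observation $i$ is: $y_i - \hat{y}_i$
- Squaring and summing over all $n$ observations gives: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
This is known as the Sum of Squares (Error).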
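The SSE calculation can be sketched the same way: compute each residual $y_i - \hat{y}_i$, square it, and add them up. The least-squares line is, by definition, the line that makes this sum as small as possible. The dataset and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sse(xs, ys):
    """Sum of Squares (Error): sum of (y_i - y_hat_i)^2."""
    b0, b1 = fit_line(xs, ys)
    # Residual = actual value minus predicted value.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in residuals)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"SSE = {sse(xs, ys):.2f}")
```

If every point lies exactly on a line, every residual is 0 and so is SSE.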
Watch Out!
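In this textbook, SSR does not mean "Sum of Squares Residuals"! SSR is the Sum of Squares Regression. Be aware that some other textbooks (particularly in econometrics) do use SSR for the sum of squared residuals, so always check which definition your course uses.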
SST: Sum of Squares (Total)
SST is the quantified measure of the total variation in $y$: the variation that is attributed to the relationship between $x$ and $y$ plus the variation that is not attributed to that relationship.
- In other words, SST measures the explained variability in the regression model PLUS the unexplained variability in the regression model.
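- The total variation of $y$ is the sum of the squares of the differences between all actual values $y_i$ and the sample mean $\bar{y}$:
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
- Then, for each observation $i$, adding and subtracting $\hat{y}_i$ gives:
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
Notice that the $\hat{y}_i$ terms on the right side of the equation cancel each other out. What is left is $y_i - \bar{y}$. In words, each total deviation is the explained deviation plus the unexplained deviation.
Square and sum them all, and we get:
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
(Squaring also produces a cross-product term, $2\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$, but it equals 0 for a least-squares line with an intercept, so it drops out.)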
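Squaring $(\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$ and summing produces a cross-product term in addition to the two sums of squares. The derivation below sketches why that term is zero for the least-squares line. It writes $e_i = y_i - \hat{y}_i$ for the residuals and uses the two least-squares conditions $\sum e_i = 0$ and $\sum x_i e_i = 0$, which come from setting the derivatives of SSE with respect to $b_0$ and $b_1$ equal to zero.

```latex
\begin{aligned}
\sum_{i=1}^{n} (\hat{y}_i - \bar{y})\, e_i
  &= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})\, e_i \\
  &= (b_0 - \bar{y}) \sum_{i=1}^{n} e_i + b_1 \sum_{i=1}^{n} x_i e_i \\
  &= (b_0 - \bar{y}) \cdot 0 + b_1 \cdot 0 = 0
\end{aligned}
```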
[Total variation of y] = [Total explained variation of y] + [Total unexplained variation of y]
This can be rewritten as:
[Sum of Squares (Total)] = [Sum of Squares (Regression)] + [Sum of Squares (Error)]
or, in short:
$SST = SSR + SSE$
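As a numerical check of the decomposition, the sketch below computes all three sums of squares for a small hypothetical dataset and confirms that SST = SSR + SSE, up to floating-point rounding. Dividing SSR by SST also gives $R^2 = SSR/SST$, the proportion of the variation in $y$ explained by $x$. The data and the function name `sums_of_squares` are assumptions for illustration.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # unexplained
    return sst, ssr, sse


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(xs, ys)
print(f"SST = {sst:.2f}, SSR + SSE = {ssr + sse:.2f}")
print(f"R^2 = SSR/SST = {ssr / sst:.2f}")
```

The identity relies on the line being the least-squares fit with an intercept. For an arbitrary line through the same points, SSR + SSE generally differs from SST.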