
Solving for the Regression Line (r, Sy, Sx Method)



Given a set of data points $(X, Y)$, we can generate a scatterplot. Then, we determine the best-fit line to represent the relationship between two quantitative variables: the explanatory variable $X$ and the response variable $Y$.

$$\boxed{\hat{y}=b_o+b_1x}$$

The slope $b_1$ tells us how much the predicted value of $y$ changes for every one-unit increase in $x$.


Let’s prove this using calculus:

$$\boxed{\frac{d\hat{y}}{dx}=b_1}$$

“A one-unit increase in $x$ will change the predicted $y$ by $b_1$.”
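The same statement can be checked numerically. This is a minimal sketch: the helper `predict` and the coefficients 10.0 and 2.5 are hypothetical, chosen only for illustration and not taken from any data set in this lesson.

```python
def predict(x, b_o, b_1):
    """Predicted response y-hat for a given x on the line y-hat = b_o + b_1 * x."""
    return b_o + b_1 * x


# Hypothetical coefficients, for illustration only.
b_o, b_1 = 10.0, 2.5

# Increase x by exactly one unit and compare the two predictions.
change = predict(4, b_o, b_1) - predict(3, b_o, b_1)

print(change)  # the change in y-hat equals the slope b_1
```

Whatever starting value of $x$ is used, the difference between the two predictions is the slope, because the intercept cancels out.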


Wize Concept
The slope $b_1$ and the correlation coefficient $r$ always have the same sign!


The intercept $b_o$ tells us the predicted value of $y$ when $x=0$. It is the point where the line crosses the y-axis.




Example: Solving for the Regression Line (r, Sy, Sx Method)


We want to see if there is a relationship between the number of hours a student studies the day before the exam and the exam grade. We randomly sample 34 students:



The explanatory variable (X) is:
Study (hours)

The response variable (Y) is:
Grade


Scatterplot


We see a positive correlation. In fact, $r=0.54$.

This means that there is a weak, positive correlation between hours studied and exam grade.

What does $r^2$ tell us?

$$r^2=\left(0.54\right)^2=0.2916$$
About 29% of the variation in exam grade is explained by the number of hours studied.
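As a quick check of the arithmetic, the calculation can be sketched in a few lines of Python. The variable names are assumptions made for this illustration; only the value $r=0.54$ comes from the example.

```python
# Correlation coefficient reported for the study-hours example.
r = 0.54

# Coefficient of determination: the proportion of variation in the
# response that is explained by the linear relationship.
r_squared = r ** 2

print(f"r^2 = {r_squared:.4f}")
print(f"About {r_squared:.0%} of the variation in grade is explained by hours studied.")
```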


Portions of information contained in this publication/book are printed with permission of Minitab, LLC. All such material remains the exclusive property and copyright of Minitab, LLC. All rights reserved.


Suppose you are given $r,\ s_x,\ s_y,\ \overline{x},\ \overline{y}$:

$$\overline{x}=7.794,\ \overline{y}=77.824,\ s_x=3.906,\ s_y=18.235,\ r=0.54$$

Step 1: Find the slope
$$\boxed{b_1=r\left(\frac{s_y}{s_x}\right)}$$

$$b_1=0.54\left(\frac{18.235}{3.906}\right)=2.52$$



Step 2: Find the intercept
$$\boxed{b_o=\overline{y}-b_1\overline{x}}$$

$$b_o=77.824-\left(2.52\right)\left(7.794\right)=58.18$$



Step 3: Show the full linear equation

$$\boxed{\hat{y}=b_o+b_1x}$$
$$\hat{y}=58.18+2.52x$$
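The three steps above can be sketched as a small Python function. The function name `regression_from_summary` and its parameter names are assumptions introduced for this illustration; the inputs are the summary statistics given in the example.

```python
def regression_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (intercept, slope) of the regression line from summary statistics."""
    slope = r * (s_y / s_x)            # Step 1: b_1 = r * (s_y / s_x)
    intercept = y_bar - slope * x_bar  # Step 2: b_o = y_bar - b_1 * x_bar
    return intercept, slope


# Summary statistics from the study-hours example.
b_o, b_1 = regression_from_summary(
    r=0.54, s_x=3.906, s_y=18.235, x_bar=7.794, y_bar=77.824
)

# Step 3: report the full equation, rounded to two decimals.
print(f"y_hat = {b_o:.2f} + {b_1:.2f}x")
```

Note that the function carries the unrounded slope into Step 2, whereas the hand calculation rounds the slope to 2.52 first, so the two can differ slightly in later decimal places.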



Using the data set below, determine the correlation, slope, and intercept of the least squares regression line.



$\bar{x}=60$, $\bar{y}=4.8$

$s_x=38.08$, $s_y=2.59$

$r=\ ???$


Solving for the Regression Line (Least Squares Method)


Given a set of data points $(X, Y)$, we can generate a scatterplot. Then, we determine the best-fit line to represent the relationship between two quantitative variables: the explanatory variable $X$ and the response variable $Y$.

$$\boxed{\hat{y}=b_o+b_1x}$$

The slope $b_1$ tells us how much the predicted value of $y$ changes for every one-unit increase in $x$.

Let’s prove this using calculus:

$$\boxed{\frac{d\hat{y}}{dx}=b_1}$$

“A one-unit increase in $x$ will change the predicted $y$ by $b_1$.”

Wize Concept
The slope $b_1$ and the correlation coefficient $r$ always have the same sign!

The intercept $b_o$ tells us the predicted value of $y$ when $x=0$. It is the point where the line crosses the y-axis.





Example: Solving for the Regression Line (Sxy, Sxx Method)


We want to see if there is a relationship between the number of hours a student studies the day before the exam ($X$) and the exam grade ($Y$). We randomly sample 34 students:

Note: raw data has been truncated


$n=34$

Important: $\left(\sum x_i\right)\left(\sum y_i\right)\neq\sum x_iy_i$
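A tiny numeric check makes the distinction concrete. The three data points below are hypothetical toy values, not the study-hours data, and the variable names are assumptions made for this sketch.

```python
# Hypothetical toy data, for illustration only.
x = [1, 2, 3]
y = [2, 4, 6]

# Product of the sums: add up each list first, then multiply.
product_of_sums = sum(x) * sum(y)

# Sum of the products: multiply each pair first, then add up.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))

print(product_of_sums, sum_of_products)  # the two quantities differ
```

Mixing these two up is a common source of error when computing $SS_{xy}$ by hand.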


Scatterplot





Step 1: Find the slope

$$\boxed{b_1=\frac{SS_{xy}}{SS_{xx}}}$$

$$SS_{xy}=\sum x_iy_i-\frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$SS_{xy}=21,892-\frac{\left(265\right)\left(2,646\right)}{34}=1,268.76$$

$$SS_{xx}=\sum x_i^2-\frac{\left(\sum x_i\right)^2}{n}$$

$$SS_{xx}=2,569-\frac{\left(265\right)^2}{34}=503.56$$

Therefore:

$$b_1=\frac{SS_{xy}}{SS_{xx}}=\frac{1,268.76}{503.56}=2.52$$



Step 2: Find the intercept
$$\boxed{b_o=\overline{y}-b_1\overline{x}}$$

$$\overline{y}=\frac{\sum y_i}{n}$$

$$\overline{y}=\frac{2,646}{34}=77.824$$

$$\overline{x}=\frac{\sum x_i}{n}$$

$$\overline{x}=\frac{265}{34}=7.794$$

Therefore:

$$b_o=77.824-\left(2.52\right)\left(7.794\right)=58.18$$


Step 3: Show the full linear equation

$$\boxed{\hat{y}=b_o+b_1x}$$
$$\hat{y}=58.18+2.52x$$
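The steps above can be sketched in Python, working directly from the column totals. The function name `regression_from_sums` and its parameter names are assumptions introduced for this illustration; the totals are the ones used in the worked example.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (intercept, slope) of the least-squares line from column totals."""
    ss_xy = sum_xy - (sum_x * sum_y) / n  # SS_xy
    ss_xx = sum_x2 - (sum_x ** 2) / n     # SS_xx
    slope = ss_xy / ss_xx                 # Step 1: b_1 = SS_xy / SS_xx
    x_bar = sum_x / n
    y_bar = sum_y / n
    intercept = y_bar - slope * x_bar     # Step 2: b_o = y_bar - b_1 * x_bar
    return intercept, slope


# Totals from the study-hours example (n = 34 students).
b_o, b_1 = regression_from_sums(
    n=34, sum_x=265, sum_y=2646, sum_xy=21892, sum_x2=2569
)

# Step 3: report the full equation.
print(f"y_hat = {b_o:.2f} + {b_1:.2f}x")
```

Because this sketch keeps full precision throughout, its intercept can differ from the hand calculation in the last decimal place: the worked steps round the slope to 2.52 before computing $b_o$.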

Using the information provided below, solve for the slope and intercept of the linear regression equation.

$$\sum x=702,\quad \sum y=461,\quad \sum xy=19,871,\quad \sum x^2=31,396,\quad \sum y^2=13,449$$
$n=18$