Hypothesis Testing for Multiple Regression
In a multiple regression model, more than one explanatory variable ($x_1, x_2, \dots, x_k$) is used to explain or predict one response variable, $y$.
- We conduct an F-test to see if the overall model is significant in predicting $y$.
- Specifically, we are assessing how good all the explanatory variables are, collectively, at predicting $y$.
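Hypotheses for F-test:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ ("The overall model is not significant.")
$H_a$: at least one $\beta_i \neq 0$ ("The overall model is significant.")
$F = \dfrac{MSR}{MSE} = \dfrac{SSR/k}{SSE/(n-k-1)}$
"F" for "full" model
where,
- $k$ = # of explanatory variables,
- $k$ = df numerator
- $n-k-1$ = df denominator, with $n$ the sample size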
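As a minimal sketch of the arithmetic, the snippet below computes the overall F-statistic as the mean square for regression (SSR divided by k) over the mean square error (SSE divided by n - k - 1). The function name `f_statistic` and the sums of squares passed to it are hypothetical, chosen only to illustrate the calculation.

```python
def f_statistic(ssr, sse, n, k):
    """Return (F, df_numerator, df_denominator) for the overall F-test.

    ssr: regression (explained) sum of squares
    sse: error (residual) sum of squares
    n:   sample size
    k:   number of explanatory variables
    """
    df_num = k
    df_den = n - k - 1
    if df_den <= 0:
        raise ValueError("need more than k + 1 observations")
    msr = ssr / df_num  # mean square for regression
    mse = sse / df_den  # mean square error
    return msr / mse, df_num, df_den


# Hypothetical sums of squares, for illustration only.
f_stat, df_num, df_den = f_statistic(ssr=800.0, sse=200.0, n=25, k=4)
print(f"F = {f_stat:.2f} with df = ({df_num}, {df_den})")
```

There is one such F-score per fitted model, no matter how many explanatory variables it contains.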
Important
- The F-stat only tells us if the overall model is sufficient.
- It does not tell us which individual explanatory variables are significant!
- Each explanatory variable will have its own t-score so you will be able to assess the significance of each one by running t-tests.
- There is only one F-score in a regression model.
F-Distribution
The F-distribution is one-sided and skewed to the right. It starts at 0 and goes to infinity. The larger the F-score, the stronger the evidence that the overall model is significant.
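To make the shape concrete, here is a small sketch that evaluates the F density at a few points. The degrees of freedom (4, 11) are an illustrative choice, and `f_pdf` is a hypothetical helper written with the standard library only. It relies on two standard facts about the F-distribution: the mode is ((d1 - 2) / d1) * (d2 / (d2 + 2)) when d1 > 2, and the mean is d2 / (d2 - 2) when d2 > 2. A mode below the mean is one sign of right skew.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F-distribution with (d1, d2) degrees of freedom."""
    if x < 0:
        return 0.0
    if x == 0:
        # Limit at zero: 0 for d1 > 2, 1 for d1 == 2, unbounded for d1 < 2.
        return 0.0 if d1 > 2 else (1.0 if d1 == 2 else math.inf)
    log_beta = (
        math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    )
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


d1, d2 = 4, 11  # illustrative degrees of freedom
mode = ((d1 - 2) / d1) * (d2 / (d2 + 2))  # valid because d1 > 2
mean = d2 / (d2 - 2)                      # valid because d2 > 2

for x in (0.0, mode, mean, 5.0, 10.0):
    print(f"x = {x:6.3f}  density = {f_pdf(x, d1, d2):.5f}")
```

The density is zero at 0, peaks early, and then decays slowly, which is why large F-scores sit far out in the right tail.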

Example: Hypothesis Testing for Multiple Regression
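We wish to predict grade ($y$) using 4 predictor variables: hours of study ($x_1$), IQ ($x_2$), GPA ($x_3$), and hours spent playing video games ($x_4$).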
We randomly sampled 16 students. Results:

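We test if the overall model is appropriate to predict grade. Hypotheses:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: at least one $\beta_i \neq 0$
We conduct an F-test to test if the overall model is sufficient:
df numerator $= k = 4$
df denominator $= n - k - 1 = 16 - 4 - 1 = 11$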

(i) What percent of grade is explained by the model?
97.33% of the variation in grade is explained by the model ($R^2 = 0.9733$).
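The percent explained and the F-score are two views of the same fit. A rough sketch, assuming the 97.33% figure is the model's R-squared and using n = 16 students and k = 4 predictors from this example: the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)) should land close to the F-score of 100.32 reported in the output. The helper name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r_squared, n, k):
    """Overall F-statistic implied by R-squared, sample size n, and k predictors."""
    explained_per_df = r_squared / k
    unexplained_per_df = (1.0 - r_squared) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# R-squared is rounded to four decimals, so expect a small gap from 100.32.
f_stat = f_from_r_squared(r_squared=0.9733, n=16, k=4)
print(f"F implied by the rounded R-squared: {f_stat:.1f}")
```

The small difference from 100.32 comes from rounding R-squared before plugging it in.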
(ii) Based on the F-stat and its p-value, how do you conclude?
(a) The overall model is sufficient.
(b) The overall model is not sufficient.
(c) All the explanatory variables in the model are significant.
(d) None of the explanatory variables in the model are significant.
The F-score is 100.32 and its p-value is approximately 0.00, so we reject $H_0$. The answer is (a): the overall model is sufficient.
The F-stat only tells us if the overall model is sufficient. It does not tell us which individual explanatory variables are significant!
(iii) What does the coefficient $b_4$ (i.e. hours spent playing video games) tell us?
For each hour spent playing video games, your grade is increased/reduced by 1.45 percent, all else equal. A better grade arises if a student spends more/less time playing video games.
Reduced; less
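A quick sketch of the "all else equal" reading: the coefficient scales linearly with the number of hours, so the change in predicted grade is just the coefficient times the extra hours. The value -1.45 is taken from this example (negative because the grade is reduced); the names `B4_GAMES` and `grade_change` are hypothetical.

```python
B4_GAMES = -1.45  # coefficient on hours of video games, from the example


def grade_change(extra_game_hours, b4=B4_GAMES):
    """Change in predicted grade, holding the other variables fixed."""
    return b4 * extra_game_hours


for hours in (1, 3, 5):
    print(f"{hours} extra hour(s) of games -> {grade_change(hours):+.2f} on the grade")
```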
(iv) Which of the following must be true about $b_4$?
(a) It is a significant explanatory variable on its own.
(b) It is a significant explanatory variable in this multiple regression model.
(c) It is not a significant explanatory variable on its own.
(d) It is not a significant explanatory variable in this multiple regression model.
To assess only one explanatory variable in the multiple regression model, we look at its t-score:
The p-value is 0.0036, which is less than 1%, indicating that "Games (hrs)" is a very significant explanatory variable.
$b_4$ is a significant explanatory variable as a "team member" in this multiple regression model, so the answer is (b). We don't know whether it is significant by itself; to find out, we would need to fit a simple linear regression with it as the only explanatory variable. It very well could be significant, but we can't tell from the information provided.
(v) Sammy studied for 40 hours, has an IQ of 130, has a GPA of 3.50, and played 3 hours of Mario Kart. Predict his grade.
$\hat{y} = b_0 + b_1(40) + b_2(130) + b_3(3.50) + b_4(3) = 76.008$
We predict his grade will be 76%.
(vi) At the 5% significance level, test if GPA should be included in the model.
The t-score is less than 2 so we can already tell it is not significant. But let's verify that using the t-table.
CV(0.05,2-tail,11)=2.201. The t-score (1.1049) is less than the CV (2.201), so we fail to reject $H_0: \beta_3 = 0$.
Given the t-score (1.1049) and df (11), the p-value is between 0.20 and 0.30, which means the p-value is greater than the significance level (0.05), so we fail to reject $H_0$.

[Software] Exact p-value =TDIST(1.1049,11,2)=
GPA is/is not a significant variable for explaining grade; it should/should not be removed from the model.
is not; should be removed
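The two-tailed p-value that a spreadsheet's TDIST returns can also be approximated by hand. The sketch below integrates the t density with Simpson's rule using only the standard library; `t_pdf` and `two_tailed_p` are hypothetical helper names, and the inputs (t-score 1.1049, df 11, critical value 2.201, 5% level) come from this example.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - ((df + 1) / 2) * math.log(1 + t * t / df))


def two_tailed_p(t_score, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_score)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_half


T_SCORE, DF, ALPHA, CRITICAL_VALUE = 1.1049, 11, 0.05, 2.201

p_value = two_tailed_p(T_SCORE, DF)
reject = abs(T_SCORE) > CRITICAL_VALUE

print(f"two-tailed p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Both routes agree: the t-score is below the critical value and the p-value is above 0.05, so GPA is not significant at the 5% level.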
(vii) Given the t-stats for each explanatory variable, what do you recommend for the model?
Hours of study: KEEP / REMOVE
IQ: KEEP / REMOVE
GPA: KEEP / REMOVE
Hours of playing video games: KEEP / REMOVE
Keep: hours study, games
Remove: IQ, GPA

Example: Hypothesis Testing for Multiple Regression
We want to predict the earnings ($y$) of an Instagram "Influencer" based on three explanatory variables:
$x_1$ = number of followers (in thousands)
$x_2$ = hours of volunteer work
$x_3$ = grade
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ ("The overall model is not significant.")
$H_a$: at least one $\beta_i \neq 0$ ("The overall model is significant.")
PARTIAL OUTPUT
(a) Determine the F-statistic: $F = MSR/MSE = 74.43$
(b) What are the degrees of freedom? (For the F-stat, there are two df's.)
df numerator = $k = 3$
df denominator = $n - k - 1 = 15 - 3 - 1 = 11$
The sample size $n$ is not 14! The df (total) = $n - 1 = 14$, so $n = 15$.
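The degrees-of-freedom bookkeeping can be sketched in a few lines. This assumes, as the note above implies, that the 14 shown in the output is the total df, and that there are k = 3 explanatory variables; `regression_dfs` is a hypothetical helper name.

```python
def regression_dfs(df_total, k):
    """Recover the sample size and both F-test df's from an ANOVA table."""
    n = df_total + 1      # df (total) = n - 1
    df_num = k            # regression df
    df_den = n - k - 1    # residual df
    return n, df_num, df_den


n, df_num, df_den = regression_dfs(df_total=14, k=3)
print(f"n = {n}, df numerator = {df_num}, df denominator = {df_den}")
```

As a consistency check, the regression and residual df's always add up to the total df.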
(c) At the 5% significance level, what is the critical value for F? [Use F-table]
The F-score must be larger than 3.5874 to reject $H_0$.

(d) At the 1% significance level, what is the critical value for F? [Use F-table]
The F-score must be larger than 6.2167 to reject $H_0$.

(e) What is the p-value for the F-statistic? [Use F-table]
The F-score (74.43) is larger than the critical value at the 1% significance level (6.2167).
An F-score beyond CV(0.01) leaves less than 1% of the distribution in the right tail.
Therefore, the p-value is less than 0.01.
You can also see the p-value in the ANOVA table under "Significance F", which simply means the "p-value of the F-stat".
We can reject $H_0$ at the 1% significance level (and certainly at the 5% significance level).
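For readers without an F-table at hand, the tail areas can be approximated numerically. The sketch below uses the standard identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1 f), where I_x is the regularized incomplete beta function, and evaluates it with Simpson's rule. The helper `f_right_tail` is hypothetical and is only meant for d1, d2 >= 2; the inputs (df 3 and 11, critical values 3.5874 and 6.2167, F = 74.43) come from this example.

```python
import math


def f_right_tail(f_stat, d1, d2, steps=2000):
    """Approximate P(F > f_stat) for an F(d1, d2) distribution.

    Integrates the beta density on [0, x] with Simpson's rule
    (steps must be even). Intended for d1 >= 2 and d2 >= 2.
    """
    if d1 < 2 or d2 < 2:
        raise ValueError("this sketch assumes d1 >= 2 and d2 >= 2")
    if f_stat <= 0:
        return 1.0
    a, b = d2 / 2, d1 / 2
    x = d2 / (d2 + d1 * f_stat)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3) / math.exp(log_beta)


D1, D2 = 3, 11
for label, f_value in (("CV at 5%", 3.5874), ("CV at 1%", 6.2167), ("observed F", 74.43)):
    print(f"{label:>10}: F = {f_value:8.4f}  right-tail area = {f_right_tail(f_value, D1, D2):.6f}")
```

The right-tail area at each critical value should match its significance level, and the area beyond the observed F-score is the p-value that software reports as "Significance F".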
Wow! The F-score is huge! Does that mean all the explanatory variables are significant?
NO!
Recall:
- The F-stat only tells us if the overall model is sufficient. It does not tell us which individual explanatory variables are significant!
- Each explanatory variable will have its own t-score so you will be able to assess the significance of each one by running t-tests.
- There is only one F-score in a regression model.
Let's see the full ANOVA table:
- You see that only Followers is a significant explanatory variable because its p-value is low. How much time an Instagram "Influencer" volunteers and their grade are not good predictors of their earnings, based on their large p-values.
- Also notice that only the confidence interval for the Followers coefficient does not contain 0.
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ ("The overall model is not significant.")
$H_a$: at least one $\beta_i \neq 0$ ("The overall model is significant.")
It is true: at least one explanatory variable is significant. In this example, it's just Followers. That is enough to reject the null hypothesis and conclude that the overall model is significant.
Finally, notice how high $R^2$ is. This suggests that the one significant explanatory variable, Followers, is doing almost all the work in explaining earnings.
Michaela is good at statistics but is famous for her cooking website. She believes that the number of new membership subscriptions per month depends on money spent on advertising, number of new recipes posted, number of times her page is shared on social media, and number of guest appearances she makes on TV. She randomly samples 18 months and applies multiple regression:
(i) How is the model overall for explaining new membership subscriptions? Select the null hypothesis. (Check all that apply.)
An account manager's salary ($y$) is estimated using a regression model. There are 3 explanatory variables: years of experience, number of complaints, and height (cm). Salary is in $'000.
Use the partial Excel output provided below to answer the series of questions.
(i) Which is the correct regression equation?