Contingency Table
A contingency table, also known as a cross tabulation or crosstab, is a two-way table that displays the frequency counts of two categorical variables.
Example
There are two ways to treat Idiocy, an epidemic disease: medication or natural remedies. The contingency table below summarizes the success rate of each treatment. Which treatment is better?
To get a better idea about the rate of success for each treatment, we need to convert this table from counts to percentages.
Do we convert to percentages of the row totals or the column totals?
It depends on the context.
Row Totals

Column Totals

Here, percentages of the column totals are much more useful than percentages of the row totals because they let you compare success rates between treatments:
- 83.3% of those who received medication were successfully treated.
- Only 14.6% of those who received natural remedies were successfully treated.
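The conversion from counts to percentages can be sketched in a few lines of Python. The counts below are hypothetical (the original table is not reproduced in the text); they were chosen only so that the column percentages come out to the 83.3% and 14.6% quoted above. The variable names are illustrative as well.

```python
# Hypothetical counts, NOT the original table: chosen so that the column
# percentages match the 83.3% and 14.6% quoted in the text.
counts = {
    "Success": {"Medication": 50, "Natural remedies": 7},
    "Failure": {"Medication": 10, "Natural remedies": 41},
}
treatments = ["Medication", "Natural remedies"]

# Column totals: everyone who received each treatment.
column_totals = {t: sum(row[t] for row in counts.values()) for t in treatments}

# Row totals: everyone who had each outcome.
row_totals = {outcome: sum(row.values()) for outcome, row in counts.items()}

# Each cell as a percentage of its column total (compares treatments).
column_pct = {
    outcome: {t: 100 * row[t] / column_totals[t] for t in treatments}
    for outcome, row in counts.items()
}

# Each cell as a percentage of its row total (describes who is in each outcome).
row_pct = {
    outcome: {t: 100 * row[t] / row_totals[outcome] for t in treatments}
    for outcome, row in counts.items()
}

for t in treatments:
    print(f"{t}: {column_pct['Success'][t]:.1f}% successfully treated")
```

With these assumed counts, the column percentages answer "of those who received this treatment, what share recovered?", which is the comparison the question asks for. The row percentages answer a different question: "of those who recovered, what share received each treatment?"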

Simpson’s Paradox
Simpson's Paradox occurs when a trend that appears in several groups disappears or reverses when the data from those groups are combined (aggregated). This happens when there is a lurking variable: a variable that is not included in the experiment or observation but that does affect the variables of interest.
Watch Out!
Aggregating can be dangerous!
Example:
Suppose there are two instructors for COMM 291: Dr. Simpson and Dr. Griffin. Based on the grades from the previous term, 78 out of 240 students got A’s in Dr. Simpson’s classes; 36 out of 150 students got A’s in Dr. Griffin’s classes. In percentages, that is 32.5% for Dr. Simpson and 24% for Dr. Griffin.
Does this mean you have a better chance of getting an A in Dr. Simpson’s class? Is Dr. Griffin a hard or bad instructor?
Not so fast! You should be aware that they each teach three class sections, which we have aggregated.
Let’s break them down into sections:
- You see that Dr. Simpson teaches all morning classes; Dr. Griffin teaches one morning class and two evening classes.
- You also see that more students get A’s in morning classes, including the morning class that Dr. Griffin teaches.
- In fact, more than half of Dr. Griffin’s morning section got A’s, a higher share than in any other COMM 291 section.
What is the lurking variable in this example?
The sections: morning and evening.
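To see how aggregation hides the lurking variable, here is a minimal Python sketch. The per-section numbers are hypothetical: the original breakdown is not reproduced in the text, so these were chosen only to match the stated totals (78 of 240 and 36 of 150) and the observations in the list above. The helper `a_rate` is an illustrative name.

```python
# Hypothetical per-section data as (time of day, number of A's, class size).
# These are NOT the original figures; they only match the stated totals
# (78/240 for Dr. Simpson, 36/150 for Dr. Griffin) and the bullets above.
sections = {
    "Dr. Simpson": [("morning", 26, 80), ("morning", 26, 80), ("morning", 26, 80)],
    "Dr. Griffin": [("morning", 26, 50), ("evening", 5, 50), ("evening", 5, 50)],
}


def a_rate(rows):
    """Percentage of students who got an A across the given sections."""
    total_as = sum(a for _, a, _ in rows)
    total_students = sum(n for _, _, n in rows)
    return 100 * total_as / total_students


for instructor, rows in sections.items():
    # Aggregated view: all sections pooled together.
    print(f"{instructor} overall: {a_rate(rows):.1f}%")
    # Disaggregated view: split by the lurking variable (time of day).
    for time in ("morning", "evening"):
        subset = [r for r in rows if r[0] == time]
        if subset:
            print(f"  {time}: {a_rate(subset):.1f}%")
```

Under these assumed numbers, Dr. Simpson looks better overall, yet Dr. Griffin has the higher A rate when only morning sections are compared. The overall gap comes from Dr. Griffin teaching the evening sections, not from the instructor.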
Practice: Contingency Tables
We asked 240 randomly selected people in the province how satisfied they are with their premier. Results:
The survey was conducted via stratified sampling based on three age groups: 40 millennials, 100 adults, and 100 seniors.
(a) What proportion of the people surveyed are at least “Somewhat Satisfied” with the premier? (Enter answer in decimal form e.g. 0.356)
(b) What proportion of the seniors surveyed are at least “Somewhat Satisfied” with the premier? (Enter answer in decimal form e.g. 0.16)
(c) Almost all the seniors and more than half of the adults surveyed are at least “Somewhat Dissatisfied” with the premier. Why does this contradict what we observe in part (a)? [Hint: Answer is a two-word term.]