
Sampling Variability & Sampling Distribution

There are two main population parameters and sample statistics that we are very interested in:
  1. For numerical data: we want to estimate the population mean $\mu$ using the sample mean $\overline{x}$.
  • Example: We want to know the mean height of all students attending WIZE university, so we calculate the mean height of samples of students.
  2. For categorical data: we want to estimate the population proportion $p$ using the sample proportion $\hat{p}$.
  • Example: We want to know the proportion of all students attending WIZE university who are vegetarians, so we calculate the proportion of vegetarians in samples of students.

Sampling Variability

If we keep on taking samples of size $n$ from our population of size $N$, the sample statistic that we calculate from sample to sample will vary -- this is known as sampling variability.

Results are more reliable when the sampling variability is small.


Size Matters
When we draw a sample from a population and then draw another sample, the two samples' statistics will usually differ. How much they differ depends on the sample size.

Wize Concept
The larger the sample size, the smaller the sampling variability (see: Standard Error).

This is because as the sample size increases, any single extreme value (outlier) has less influence on the statistic, so the observed values of the statistic group more closely together (see: Central Limit Theorem).
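To see this effect with categorical data, here is a minimal simulation sketch. The population (2,000 students, 20% vegetarian), the sample sizes, and the helper name `sample_proportions` are all made up for illustration; only the pattern matters, not the specific numbers.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 2,000 students, 20% vegetarian (1 = vegetarian).
population = [1] * 400 + [0] * 1600

def sample_proportions(n, repeats=1000):
    """Return the sample proportion (p-hat) from `repeats` random samples of size n."""
    return [sum(rng.sample(population, n)) / n for _ in range(repeats)]

for n in (10, 50, 250):
    p_hats = sample_proportions(n)
    print(f"n={n:>3}: p-hat ranges from {min(p_hats):.2f} to {max(p_hats):.2f}, "
          f"spread (SD) = {statistics.stdev(p_hats):.3f}")
```

The spread of the sample proportions should shrink as the sample size grows, which is the sampling variability described above.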




Sampling Distribution

The distribution of the sample statistics that we calculate from all possible samples of size $n$ is known as the sampling distribution.
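For a very small population we can actually list every possible sample. The sketch below assumes a made-up population of five values and samples of size 2 drawn without replacement (order ignored); it builds the sampling distribution of the sample mean by brute force.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical tiny population, N = 5
n = 2                          # sample size

# Every possible sample of size n (without replacement, order ignored).
sample_means = [mean(sample) for sample in combinations(population, n)]

# The sampling distribution: each possible sample mean and how often it occurs.
sampling_distribution = Counter(sample_means)
for value, count in sorted(sampling_distribution.items()):
    print(f"sample mean = {value}: relative frequency {count / len(sample_means):.1f}")

print("mean of all sample means:", mean(sample_means))
print("population mean:", mean(population))
```

In this setup the mean of all the sample means equals the population mean, even though most individual samples miss it. For realistic population sizes the number of possible samples is far too large to enumerate, so in practice the sampling distribution is described theoretically or approximated by simulation.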



Standard Error

The standard error is a very important but confusing term. It is also known as:
  • The estimate of the standard deviation of the sampling distribution.
  • The standard deviation of the sample mean (or standard deviation of the sample proportion).

Wize Concept
The larger the sample size, the better the estimate. In other words, the larger the sample size, the lower the standard error. (A low standard error is good!)

The standard error is used in confidence intervals and hypothesis testing where we use our sample statistics to make inferences about the population parameters.
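The notes above do not give a formula, so as a hedged sketch, here are the usual textbook estimates: $s/\sqrt{n}$ for a sample mean (where $s$ is the sample standard deviation) and $\sqrt{\hat{p}(1-\hat{p})/n}$ for a sample proportion. The function names and the example data are hypothetical.

```python
import math
import statistics

def standard_error_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def standard_error_proportion(p_hat, n):
    """Estimated standard error of the sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample of 8 student heights in cm.
heights = [160, 172, 168, 181, 175, 166, 170, 178]
print("SE of the mean height:", standard_error_mean(heights))

# Same sample proportion (20% vegetarian), different sample sizes.
for n in (50, 200, 800):
    print(f"n={n}: SE of p-hat = {standard_error_proportion(0.2, n):.4f}")
```

Because $n$ sits under a square root in both formulas, quadrupling the sample size only halves the standard error.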



Example

We want to estimate the average midterm grade ($\mu = ???$) of all STAT 100 students ($N = 300$).
  • Hailey randomly surveyed $n = 4$ students (small sample size).
  • Logan randomly surveyed $n = 50$ students (large sample size).
  • They each repeated the sampling process 5 times and recorded the sample mean ($\overline{x}$) each time.
Results:

What do you notice?
  • The results vary a lot when the sample size is small $\rightarrow$ large standard error
  • The results vary only a little when the sample size is large $\rightarrow$ small standard error
  • Neither Hailey nor Logan correctly guessed the true population mean ($\mu = 52.16$), but we are more confident with Logan's estimates!
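The experiment above can be imitated with a short simulation. This is only a sketch: the real STAT 100 grades are not available here, so the population below is synthetic (300 randomly generated grades), and its mean will not equal 52.16. The function name `repeated_sample_means` is also just an illustrative choice.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Synthetic stand-in for the N = 300 midterm grades (NOT the real class data):
# roughly bell-shaped around 55, clipped to the 0-100 grade range.
population = [min(100.0, max(0.0, rng.gauss(55, 18))) for _ in range(300)]
mu = statistics.mean(population)

def repeated_sample_means(n, repeats=5):
    """Draw `repeats` simple random samples of size n and return each sample mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

hailey = repeated_sample_means(4)   # small samples, like Hailey's
logan = repeated_sample_means(50)   # large samples, like Logan's

print(f"population mean: {mu:.2f}")
print("Hailey (n=4): ", [round(m, 2) for m in hailey])
print("Logan  (n=50):", [round(m, 2) for m in logan])
```

With only 5 repetitions each, any single run is itself subject to chance, but the means from the samples of 50 will typically sit much closer to the population mean than the means from the samples of 4.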

Wize Concept
In statistics, the goal is not to get a “bullseye”; rather, it is to have an estimate that we are confident in. Hence, confidence intervals!