4.3 Bias and unmodeled uncertainty

  • Unbiased estimate: correct on average. “In practice, it is typically impossible to construct estimates that are truly unbiased …”

  • Unmodeled uncertainty: sources of error that are not in our statistical model.

Example

A poll of 60,000 people asks about support for some candidate, and 52.5% respond yes. Under a binomial model, the standard error is only \(\sqrt{p (1-p)/n} \approx 0.2\) percentage points. That figure covers sampling error alone; there are other sources of uncertainty and bias, for example:

  • The sample might not be representative (e.g. people who choose to answer may be more likely or less likely to be supporters)

  • Opinions change over time.

  • Survey responses might be inaccurate (e.g. a respondent checks the wrong box).
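The 0.2% sampling error quoted in the example can be checked with a few lines of Python. This is a minimal sketch under the binomial model only; the function name `binomial_se` is an illustrative choice, not something from the source, and none of the unmodeled sources listed above enter the calculation.

```python
import math

def binomial_se(p, n):
    """Standard error of a sample proportion p from n responses (binomial model)."""
    return math.sqrt(p * (1 - p) / n)

# Poll from the example: 52.5% yes out of 60,000 respondents.
se = binomial_se(0.525, 60_000)
print(f"sampling error: {100 * se:.2f} percentage points")  # about 0.20
```

The result shrinks like \(1/\sqrt{n}\), which is why a large poll looks so precise when only sampling error is counted.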

How to improve?

  • Improve data collection - e.g. perform a series of 600-person polls at different places and times.

  • Expand the model - e.g. control for demographic categories.

  • Last resort: increase the stated uncertainty to account for unmodeled error - e.g. in this example we could estimate the unmodeled error at 2.5 percentage points, so that the total error for our sample of 60,000 people is \(\sqrt{0.2^2 + 2.5^2} \approx 2.5\) percentage points. For only 600 people the total error is \(\sqrt{2^2 + 2.5^2} \approx 3.2\) percentage points. Not much is gained from increasing the sample size by a factor of 100!
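The arithmetic in the last bullet can be reproduced as follows. This is a rough sketch, assuming the sampling error and the unmodeled error are independent so that they add in quadrature; the function name `total_error` and its default arguments are illustrative assumptions, and the 2.5-point unmodeled error is the guess used in the example, not a measured quantity.

```python
import math

def total_error(n, p=0.525, unmodeled=0.025):
    """Combine binomial sampling error with an assumed unmodeled error.

    The two terms are added in quadrature, which assumes they are independent.
    """
    sampling = math.sqrt(p * (1 - p) / n)
    return math.sqrt(sampling**2 + unmodeled**2)

# Compare the small and the large poll from the example.
for n in (600, 60_000):
    print(f"n = {n:>6}: total error = {100 * total_error(n):.1f} percentage points")
```

For these inputs the totals come out at roughly 3.2 and 2.5 percentage points: once the unmodeled term dominates, a larger sample barely reduces the total.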