4.3 Bias and unmodeled uncertainty
Unbiased estimate: an estimate that is correct on average, i.e. its expected value equals the true value. “In practice, it is typically impossible to construct estimates that are truly unbiased …”
Unmodeled uncertainty: sources of error that are not in our statistical model.
Example
Poll of 60,000 people on their support for a candidate, with 52.5% responding yes. Assuming a binomial model, the standard error would be only \(\sqrt{p(1-p)/n} = \sqrt{0.525 \times 0.475 / 60{,}000} \approx 0.2\%\). That is the sampling error, but there are other sources of uncertainty and bias, for example:
The sample might not be representative (e.g. people who choose to answer may be more or less likely to be supporters).
Opinions change over time.
Survey responses might be inaccurate (e.g. a respondent checks the wrong box).
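A minimal Python sketch of the contrast above: the binomial sampling error shrinks with n, but a response bias does not. The function names and the response rates (11% for supporters, 10% for non-supporters, true support 50%) are hypothetical, chosen only for illustration.

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a sample proportion under a simple binomial model."""
    return math.sqrt(p * (1 - p) / n)

def expected_respondent_share(true_support: float, r_yes: float, r_no: float) -> float:
    """Expected 'yes' share among respondents when supporters respond with
    probability r_yes and non-supporters with probability r_no (hypothetical
    response rates). The result does not depend on the sample size n."""
    yes = true_support * r_yes
    no = (1 - true_support) * r_no
    return yes / (yes + no)

if __name__ == "__main__":
    for n in (600, 60_000):
        print(f"n = {n:>6}: sampling SE = {100 * binomial_se(0.525, n):.2f} pp")
    # Hypothetical: true support is 50%, supporters slightly more willing to respond.
    share = expected_respondent_share(0.50, r_yes=0.11, r_no=0.10)
    print(f"expected 'yes' share among respondents: {100 * share:.1f}%")
```

With these made-up response rates the respondents are expected to be about 52.4% supporters even though true support is 50%, and no increase in n removes that gap.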
How to improve?
Improve data collection, e.g. perform a series of 600-person polls at different places and times.
Expand the model, e.g. control for demographic categories.
Last resort: increase the stated uncertainty to account for unmodeled error. In this case we could estimate the unmodeled error at 2.5 percentage points, so that the total error for our sample of 60,000 people is \(\sqrt{0.2^2 + 2.5^2} \approx 2.5\) percentage points. For a sample of only 600 people (sampling error \(\approx 2\) points) the total error is \(\sqrt{2^2 + 2.5^2} \approx 3.2\) percentage points. Not much is gained from increasing the sample size by a factor of 100!
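The arithmetic in the last item can be reproduced with a short Python sketch. It assumes, as the example implicitly does, that sampling error and unmodeled error are independent, so they add in quadrature; `total_error` is a hypothetical helper name, and the 2.5-point unmodeled error is the example's own guess.

```python
import math

def total_error(sampling_se_pp: float, unmodeled_pp: float) -> float:
    """Combine sampling error and an assumed unmodeled error (both in
    percentage points) in quadrature, treating them as independent."""
    return math.hypot(sampling_se_pp, unmodeled_pp)

if __name__ == "__main__":
    unmodeled = 2.5  # assumed unmodeled error from the example, in pp
    for n in (600, 60_000):
        se = 100 * math.sqrt(0.525 * 0.475 / n)  # binomial sampling SE in pp
        print(f"n = {n:>6}: sampling {se:.1f} pp, total {total_error(se, unmodeled):.1f} pp")
```

As n grows, the total error approaches the unmodeled 2.5-point floor, which is why the hundredfold larger sample buys so little.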