11.1 Assumptions of Regression Analysis

  • Validity (“rarely meet all (if any) of these criteria”)

    • Model should include all relevant predictors
    • Outcome should accurately reflect phenomenon of interest
    • Model should generalize to cases to which it will apply
  • Representativeness (conditioned on predictors)

  • Additivity and Linearity

  • Independence of errors

  • Equal Variance of errors

  • Normality of errors (“typically barely important at all”; see Exercises 11.3 and 11.6)
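
To make the last bullet concrete, here is a minimal simulation sketch of why non-normal errors need not hurt estimation of the regression line. Everything in it is an assumption for illustration: the true intercept and slope, the sample size, the shifted-exponential error distribution, and the helper name `fit_line`. It is not taken from the book.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(11)  # fixed seed so the simulation is reproducible

TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0  # made-up values for the simulation
n = 5000
xs = [random.uniform(0, 10) for _ in range(n)]
# Skewed, clearly non-normal errors: exponential draws shifted to have mean 0.
errors = [random.expovariate(1.0) - 1.0 for _ in range(n)]
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + e for x, e in zip(xs, errors)]

intercept_hat, slope_hat = fit_line(xs, ys)
print(f"intercept: {intercept_hat:.2f}, slope: {slope_hat:.2f}")
```

The estimates should land close to the simulated truth even though the errors are strongly skewed. The sketch only concerns the fitted line; the error distribution can still matter when predicting individual data points.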

How to Deal With Failures of Assumptions

  • Extend model (e.g. measurement error models)

  • Change data or model, for example:

    • Failure of additivity: Transform the data (e.g., take logs) or add interactions

    • Failure of linearity: Transform predictors (e.g., log(x) or 1/x)

    • Failure of representativeness: Add predictors (the assumption only needs to hold conditional on the predictors)

  • Change or restrict the questions to align more closely with the data.
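
As a small illustration of how transforming the data can repair a failure of additivity, the sketch below uses a hypothetical multiplicative rule, y = a * b. On the raw scale, the change in y from moving a depends on b; on the log scale it does not, because log y = log a + log b. The function names and input values are assumptions chosen for the example.

```python
import math

def outcome(a, b):
    """Hypothetical multiplicative data-generating rule: y = a * b."""
    return a * b

def log_outcome(a, b):
    """The same rule on the log scale: log y = log a + log b."""
    return math.log(outcome(a, b))

def change_in_a(f, b):
    """Change in f when a moves from 1 to 2, with b held fixed."""
    return f(2.0, b) - f(1.0, b)

# Compare the change at two different values of b.
raw_diffs = [change_in_a(outcome, b) for b in (1.0, 3.0)]
log_diffs = [change_in_a(log_outcome, b) for b in (1.0, 3.0)]

print("raw scale:", raw_diffs)  # the change depends on b, so not additive
print("log scale:", log_diffs)  # the change is the same for both values of b
```

If the change in the outcome for a given change in one input is the same at every value of the other input, the relationship is additive on that scale.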

Causal Inference

More assumptions are needed if a regression is to be given a causal interpretation.

Example:

  • Causal: “Effect of a variable with all else held constant.” This reading would be an error for the earnings data! (The regression does not tell us the effect of increasing a person's height on their earnings.)

  • Non-causal: “Average difference in earnings comparing two people who differ by height”
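
The distinction can be sketched with simulated data. The numbers below are invented and are not the book's earnings data: sex shifts both height and earnings, while height is given no effect on earnings at all. The helper `slope` and all constants are assumptions for illustration.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

random.seed(7)  # fixed seed so the simulation is reproducible

heights, earnings, is_male = [], [], []
for _ in range(2000):
    male = random.random() < 0.5
    # Made-up numbers: sex shifts both height (inches) and earnings ...
    height = random.gauss(70.0 if male else 64.5, 3.0)
    # ... but height itself never enters the earnings equation.
    earn = 20000 + (10000 if male else 0) + random.gauss(0, 5000)
    heights.append(height)
    earnings.append(earn)
    is_male.append(male)

overall = slope(heights, earnings)
men_only = slope(
    [h for h, m in zip(heights, is_male) if m],
    [e for e, m in zip(earnings, is_male) if m],
)

print(f"slope, everyone: {overall:.0f} per inch")
print(f"slope, men only: {men_only:.0f} per inch")
```

In this simulation the overall slope is a valid description: on average, taller people earn more. Reading it as the effect of adding an inch of height would be wrong by construction, and the within-group slope, which should be near zero, shows why.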

11.1.0.1 Exercise 11.2: Descriptive and causal inference:

  1. For the model in Section 7.1 predicting presidential vote share from the economy, describe the coefficient for economic growth in purely descriptive, non-causal terms.

  2. Explain the difficulties of interpreting that coefficient as the effect of economic growth on the incumbent party’s vote share.

More on causal inference in Part 5!