4.1 Inference and Sampling Distributions

Statistical inference can be formulated as a set of operations on data that yield estimates and uncertainty statements about predictions and parameters of some underlying process or population.

Role of inference

  • Sampling model - infer characteristics of population from sample

  • Measurement error model - infer parameters for underlying model, including measurement error. E.g., $a$, $b$, and $\sigma$ in $y_i = a + b x_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma)$

  • Model error - all models are wrong.

This book sets up regression models in the measurement error framework, $y_i = a + b x_i + \epsilon_i$, with the error also interpretable as model error, and with sampling implicit in that the $\epsilon_i$ can be considered random samples from a distribution.
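
As a rough illustration of this framework, the sketch below simulates one data set from $y_i = a + b x_i + \epsilon_i$ and recovers $a$, $b$, and $\sigma$ by least squares. The numerical values of `a`, `b`, `sigma`, and `n` are hypothetical, picked only for the example, and $\sigma$ is assumed to be the standard deviation of the errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" values, chosen only for illustration
a, b, sigma = 1.0, 2.0, 0.5
n = 200

# Simulate one data set: y_i = a + b * x_i + eps_i
x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)  # sigma treated as a standard deviation (assumption)
y = a + b * x + eps

# Least-squares estimates of the intercept and slope
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = coef

# Residual standard deviation, using n - 2 degrees of freedom
resid = y - X @ coef
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))

print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

The estimates should land near the values used to simulate the data, but not exactly on them; that gap is the uncertainty that inference tries to quantify.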

Sampling distribution

  • Set of possible datasets that could have been observed if the data collection process had been re-done, along with associated probabilities.

  • In general, this distribution is not known but is estimated from the observed data. For example, in linear regression the distribution depends on the unknown $a$, $b$, and $\sigma$ (in $y_i = a + b x_i + \epsilon_i$), which are estimated from the data.

  • Generative model - represents a random process that could generate a new data set
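
One way to make the sampling distribution concrete is to treat the regression as a generative model and re-run the "data collection" many times by simulation. The sketch below does this for the slope estimate. The values of `a`, `b`, `sigma`, `n`, and `n_sims`, and the helper `simulate_dataset`, are hypothetical; in practice the true parameters are unknown, and estimates would be plugged in instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values, standing in for the unknown truth
a, b, sigma = 1.0, 2.0, 0.5
n, n_sims = 50, 2000

# Predictors are held fixed across replications (an assumption of this sketch)
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])

def simulate_dataset(rng):
    """Draw one new outcome vector from the generative model."""
    return a + b * x + rng.normal(0, sigma, size=n)

# Re-do the data collection n_sims times, refitting the regression each time
b_hats = np.empty(n_sims)
for s in range(n_sims):
    y = simulate_dataset(rng)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b_hats[s] = coef[1]

# Spread of the simulated slopes vs. the textbook formula for its standard deviation
sd_sim = b_hats.std(ddof=1)
sd_theory = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print(f"mean of b_hat: {b_hats.mean():.4f}")
print(f"sd of b_hat (simulated): {sd_sim:.4f}, (formula): {sd_theory:.4f}")
```

The collection `b_hats` approximates the sampling distribution of the slope estimate: its spread is what a standard error summarizes.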