4.1 Inference and Sampling Distributions

Statistical inference can be formulated as a set of operations on data that yield estimates and uncertainty statements about predictions and parameters of some underlying process or population.

Role of inference

  • Sampling model - infer characteristics of a population from a sample.

  • Measurement error model - infer the parameters of an underlying model, including the measurement error. E.g. \(a\), \(b\), and \(\sigma\) in \(y_i = a + b x_i + \epsilon_i\), where \(\epsilon_i \sim N(0,\sigma)\) and \(\sigma\) is the standard deviation of the errors.

  • Model error - all models are wrong.
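A minimal sketch of the sampling model, using only the Python standard library. The population here is hypothetical (simulated once and then treated as fixed), and the helper name `estimate_mean` is illustrative, not from the book. The estimate is the sample mean and the uncertainty statement is its standard error, \(s/\sqrt{n}\).

```python
import math
import random
import statistics

def estimate_mean(sample):
    """Return the sample mean and its standard error sd / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, se

# Hypothetical population: 10,000 values drawn once and treated as fixed.
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Sampling model: we observe only a random sample of 100 units.
sample = rng.sample(population, 100)
mean, se = estimate_mean(sample)
print(f"estimate {mean:.1f} +/- {se:.1f}; "
      f"true population mean {statistics.fmean(population):.1f}")
```

In a real survey the population mean is unknown. It is computed here only to show that the estimate usually lands within a couple of standard errors of it.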

This book sets up regression models in the measurement error framework, \(y_i = a + b x_i + \epsilon_i\). The error \(\epsilon_i\) can also be interpreted as model error, and sampling is implicit in that the \(\epsilon_i\) can be considered random samples from a distribution.
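A minimal sketch of the measurement error framework, assuming illustrative "true" values \(a = 0.2\), \(b = 0.3\), \(\sigma = 0.5\) (these numbers are not from the book). It simulates data with each \(\epsilon_i\) drawn from \(N(0, \sigma)\), then recovers \(a\), \(b\), and \(\sigma\) by ordinary least squares. The function name `fit_line` is hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Least-squares estimates of a, b, and the residual sd sigma."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    # Two parameters (a and b) were estimated, so divide by n - 2.
    sigma_hat = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a_hat, b_hat, sigma_hat

# Assumed "true" parameter values, chosen only for illustration.
a, b, sigma = 0.2, 0.3, 0.5
rng = random.Random(2)
x = [rng.uniform(0, 20) for _ in range(100)]
# Each eps_i is a random draw from N(0, sigma): the implicit sampling step.
y = [a + b * xi + rng.gauss(0, sigma) for xi in x]

a_hat, b_hat, sigma_hat = fit_line(x, y)
print(f"a_hat={a_hat:.3f}  b_hat={b_hat:.3f}  sigma_hat={sigma_hat:.3f}")
```

Dividing by \(n - 2\) rather than \(n\) accounts for the two fitted coefficients. With 100 points the difference is small.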

Sampling distribution

  • The set of possible datasets that could have been observed if the data collection process had been redone, along with their probabilities.

  • In general, this distribution is not known and must be estimated from the observed data. For example, for linear regression the distribution depends on the unknown \(a\), \(b\), and \(\sigma\) in \(y_i = a + b x_i + \epsilon_i\), which are estimated from the data.

  • Generative model - represents a random process that can generate a new dataset.
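The sampling distribution and the generative model can be sketched together by simulation. The code re-runs the data collection 1,000 times from the model \(y_i = a + b x_i + \epsilon_i\) and records the least-squares slope \(\hat b\) each time, so the spread of the recorded values approximates the sampling distribution of \(\hat b\). The parameter values and the fixed design points \(x_i\) are assumptions for illustration. In practice \(a\), \(b\), and \(\sigma\) are unknown, and one would plug in their estimates. The helper names `simulate_dataset` and `slope` are hypothetical.

```python
import math
import random
import statistics

def simulate_dataset(x, a, b, sigma, rng):
    """Generative model: one new dataset y_i = a + b*x_i + eps_i."""
    return [a + b * xi + rng.gauss(0, sigma) for xi in x]

def slope(x, y):
    """Least-squares estimate of b."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Assumed parameter values; in practice these would be estimates.
a, b, sigma = 0.2, 0.3, 0.5
x = [i / 5 for i in range(100)]  # fixed design points, reused in every replication
rng = random.Random(3)

# Redo the data collection 1,000 times and record each estimate of b.
b_hats = [slope(x, simulate_dataset(x, a, b, sigma, rng)) for _ in range(1000)]

sxx = sum((xi - statistics.fmean(x)) ** 2 for xi in x)
print("mean of b_hat:         ", statistics.fmean(b_hats))
print("sd of b_hat:           ", statistics.stdev(b_hats))
print("theory sigma/sqrt(Sxx):", sigma / math.sqrt(sxx))
```

The simulated standard deviation of \(\hat b\) should be close to the textbook standard error \(\sigma/\sqrt{\sum_i (x_i - \bar x)^2}\). The simulated draws can therefore stand in for the sampling distribution when no formula is available.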