4.1 Inference and Sampling Distributions
Statistical inference can be formulated as a set of operations on data that yield estimates and uncertainty statements about predictions and parameters of some underlying process or population.
Role of inference
Sampling model - infer characteristics of population from sample
Measurement error model - infer parameters for underlying model, including measurement error. E.g. \(a\), \(b\), and \(\sigma\) in \(y_i = a + b x_i + \epsilon_i\), where \(\epsilon_i \sim N(0,\sigma)\) and \(\sigma\) is the standard deviation of the errors
Model error - all models are wrong.
This book sets up regression models in the measurement error framework, \(y_i = a + b x_i + \epsilon_i\), with the error also interpretable as model error, and with sampling implicit in that the \(\epsilon_i\) can be considered random samples from a distribution.
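As a minimal sketch of the measurement error framework, the following simulates data from \(y_i = a + b x_i + \epsilon_i\) and recovers \(a\), \(b\), and \(\sigma\) by least squares. The "true" values, the sample size, and the range of \(x\) are assumptions chosen only for illustration, and NumPy least squares stands in for whatever fitting routine the book uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" values, chosen only for illustration
a_true, b_true, sigma_true = 1.0, 2.0, 0.5
n = 200

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma_true, size=n)  # sigma is the error standard deviation
y = a_true + b_true * x + eps

# Least-squares estimates of the intercept a and slope b
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = coef

# Residual standard deviation, using n - 2 degrees of freedom
resid = y - X @ coef
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))

print(a_hat, b_hat, sigma_hat)
```

The estimates should land close to, but not exactly on, the values used to generate the data; the gap is the uncertainty that inference tries to quantify.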
Sampling distribution
Set of possible datasets that could have been observed if the data collection process had been re-done, along with associated probabilities.
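The idea of re-doing the data collection can be sketched by simulation. The example below assumes the parameters are known (which is not the case in practice), keeps the \(x_i\) fixed, generates many datasets, and records the fitted slope for each. The helper name `simulate_slope` and all numeric values are hypothetical.

```python
import numpy as np

def simulate_slope(rng, x, a, b, sigma):
    """Generate one dataset from y = a + b*x + eps and return the fitted slope."""
    y = a + b * x + rng.normal(0, sigma, size=x.shape)
    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest degree first
    return slope

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)   # fixed design, reused for every replication
a, b, sigma = 1.0, 2.0, 0.5  # hypothetical true values

# Each entry is the slope estimate from one possible dataset
slopes = np.array([simulate_slope(rng, x, a, b, sigma) for _ in range(2000)])

# Standard error of the least-squares slope for fixed x and known sigma
se_theory = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print(slopes.mean(), slopes.std(ddof=1), se_theory)
```

The spread of `slopes` approximates the sampling distribution of the slope estimate, and its standard deviation should be close to `se_theory`.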
In general, this distribution is not known but is estimated from the observed data. For example, in linear regression the sampling distribution depends on the unknown \(a\), \(b\), and \(\sigma\) (in \(y_i = a + b x_i + \epsilon_i\)), which are themselves estimated from the data.
Generative model - represents a random process that could be used to generate a new data set
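One way to make this concrete is to plug the estimated \(a\), \(b\), and \(\sigma\) into the model and use it as a generative model for replicated datasets. This is a parametric-bootstrap-style sketch, not necessarily the procedure the book uses. The "observed" data here are simulated only so the example runs, and `generate_dataset` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for observed data (simulated here only so the example is runnable)
x = np.linspace(0, 10, 50)
y_obs = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)

# Step 1: estimate a, b, and sigma from the observed data
b_hat, a_hat = np.polyfit(x, y_obs, deg=1)
resid = y_obs - (a_hat + b_hat * x)
sigma_hat = np.sqrt(np.sum(resid**2) / (x.size - 2))

def generate_dataset(rng):
    """Draw a new set of y values from the fitted generative model."""
    return a_hat + b_hat * x + rng.normal(0, sigma_hat, size=x.size)

# Step 2: refit the slope on each replicated dataset
rep_slopes = np.array(
    [np.polyfit(x, generate_dataset(rng), deg=1)[0] for _ in range(1000)]
)

print(b_hat, rep_slopes.std(ddof=1))
```

The standard deviation of `rep_slopes` serves as an estimate of the uncertainty in the slope, based on the fitted rather than the true parameters.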