4.4 Statistical significance, hypothesis testing, and statistical errors
Statistically significant - the observed values could not reasonably be explained by chance alone (i.e. by the null hypothesis \(H_0\))
Hypothesis test - based on a test statistic (\(T\)) that summarizes the deviation of the data from what would be expected under the null hypothesis (\(H_0\)).
p-value - probability, under the null hypothesis, of observing a test statistic at least as extreme as the one actually observed. \(p < 0.05\) is often taken as 'statistically significant'
In the simplest case, \(H_0\) specifies a probability model \(p(y)\) for the data \(y\), from which replication data \(y^{rep}\) can be simulated. The p-value is then computed as:
\(p = Pr(T(y^{rep}) \geq T(y))\)
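The formula above can be evaluated by simulation: draw many replication datasets \(y^{rep}\) under \(H_0\) and count how often \(T(y^{rep})\) meets or exceeds \(T(y)\). A minimal sketch, where the null model, sample size, and test statistic are all illustrative assumptions (not from the text):

```python
# Hypothetical example: simulation-based p-value.
# Assumed H0: y ~ Normal(0, 1); assumed test statistic T = |sample mean|.
import numpy as np

rng = np.random.default_rng(0)

def T(y):
    # Deviation of the sample mean from the H0 mean of 0.
    return abs(np.mean(y))

n = 50
y = rng.normal(0.3, 1.0, size=n)  # "observed" data (true mean 0.3, unknown to the analyst)

# Draw replication datasets y_rep under H0 and compare T(y_rep) to T(y).
n_rep = 10_000
t_obs = T(y)
t_rep = np.array([T(rng.normal(0.0, 1.0, size=n)) for _ in range(n_rep)])

p_value = np.mean(t_rep >= t_obs)  # p = Pr(T(y_rep) >= T(y))
print(p_value)
```

The same recipe works for any null model you can simulate from and any test statistic you can compute.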
ROS authors do not recommend using statistical significance as a decision rule.
4.4.1 Type 1 and Type 2 errors vs. Type M and Type S errors
Type 1 - falsely rejecting a null hypothesis
Type 2 - not rejecting a null hypothesis that is actually false
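Both error rates can be estimated by simulation. A sketch under assumed settings (a two-sided z-test with known \(\sigma = 1\), \(n = 25\), \(\alpha = 0.05\), and a true effect of 0.4 for the Type 2 case, none of which come from the text):

```python
# Hypothetical example: Type 1 and Type 2 error rates by simulation.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, z_crit = 25, 1.0, 1.96  # 1.96 = two-sided 5% critical value

def rejection_rate(mu_true, n_sims=10_000):
    # Fraction of simulated datasets where |z| > 1.96, i.e. H0: mu = 0 is rejected.
    se = sigma / np.sqrt(n)
    means = rng.normal(mu_true, se, size=n_sims)
    z = means / se
    return np.mean(np.abs(z) > z_crit)

type1 = rejection_rate(mu_true=0.0)      # reject when H0 is true: should be near 0.05
type2 = 1 - rejection_rate(mu_true=0.4)  # fail to reject when the effect is real
print(type1, type2)
```

Note the Type 2 rate depends on the assumed true effect size: a smaller effect gives a higher Type 2 rate (lower power).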
ROS authors do not like talking about these, mainly because in many problems the null hypothesis cannot really be true. A drug will have some effect, for example. They prefer:
Type M - The magnitude of the estimated effect is much different than the true effect.
Type S - The sign of the estimated effect is opposite to the true effect.
A statistical procedure can be characterized by its Type S error rate and its expected exaggeration factor due to Type M error. See section 16.1 for a detailed example; Type M error is a particular concern because of the "statistical significance filter": only estimates large enough to reach significance tend to be reported (published), which puts a lower bound on the magnitude of reported effects.
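These two quantities are easy to see in a simulation of a low-power study, conditioning on the significance filter. The true effect and standard error below are illustrative assumptions, not values from the text:

```python
# Hypothetical example: Type S rate and Type M exaggeration factor,
# conditional on statistical significance (the "significance filter").
# Assumed true effect 0.1 with standard error 0.5: a noisy, low-power design.
import numpy as np

rng = np.random.default_rng(0)
true_effect, se, n_sims = 0.1, 0.5, 100_000

est = rng.normal(true_effect, se, size=n_sims)  # simulated effect estimates
sig = np.abs(est) > 1.96 * se                   # which estimates reach p < 0.05

# Type S: among significant estimates, how often is the sign wrong?
type_s_rate = np.mean(est[sig] < 0)

# Type M: among significant estimates, how inflated is the magnitude on average?
exaggeration = np.mean(np.abs(est[sig])) / true_effect
print(type_s_rate, exaggeration)
```

With these settings the significant estimates have the wrong sign a nontrivial fraction of the time and overstate the true magnitude severalfold, which is the point of preferring Type S / Type M thinking for noisy studies.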
The authors do not treat null hypothesis significance testing as the primary research goal, but they do use it as a tool: for example, 'non-rejection' suggests the data are not informative enough and more data are probably needed.