4.4 Statistical significance, hypothesis testing, and statistical errors

  • Statistically significant - observed values could not be reasonably explained by chance (i.e. by the null hypothesis \(H_0\))

  • Hypothesis test - based on a test statistic (\(T\)) that summarizes the deviation of the data from what would be expected under the null hypothesis (\(H_0\)).

  • p-value - probability, under the null hypothesis, of observing a test statistic at least as extreme as the one observed. \(p < 0.05\) is often taken as ‘statistically significant’

In the simplest case, \(H_0\) specifies a probability model \(p(y)\) for the data \(y\), with replicated data \(y^{rep}\). The p-value is then computed as:

\(p = Pr(T(y^{rep}) \geq T(y))\)
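This p-value can be approximated by simulation: draw many replicated datasets \(y^{rep}\) from the null model and count how often \(T(y^{rep})\) is at least as large as \(T(y)\). A minimal sketch, assuming a hypothetical null model \(y \sim \mathrm{Normal}(0, 1)\) and test statistic \(T(y) = |\bar{y}|\) (both chosen here for illustration, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: H0 says y ~ Normal(0, 1); the observed data
# were actually drawn with a small nonzero mean, so H0 is not true.
n = 100
y = rng.normal(0.3, 1.0, size=n)

def T(y):
    """Test statistic: absolute deviation of the sample mean from 0."""
    return abs(y.mean())

# Draw replicated datasets y_rep from the null model p(y)
n_rep = 10_000
T_rep = np.array([T(rng.normal(0.0, 1.0, size=n)) for _ in range(n_rep)])

# p = Pr(T(y_rep) >= T(y)), estimated as a proportion over replications
p_value = (T_rep >= T(y)).mean()
print(p_value)
```

The same simulation idea carries over to any test statistic; only the null model and `T` change.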

ROS authors do not recommend using statistical significance as a decision rule.

4.4.1 Type 1 and Type 2 errors vs. Type M and Type S errors

  • Type 1 - falsely rejecting a null hypothesis

  • Type 2 - not rejecting a null hypothesis that is actually false

ROS authors do not like talking about these, mainly because in many problems the null hypothesis cannot really be true. A drug will have some effect, for example. They prefer:

  • Type M - The magnitude of the estimated effect differs greatly from the true effect.

  • Type S - The sign of the estimated effect is opposite to the true effect.

A statistical procedure can be characterized by its Type S error rate, and its expected exaggeration factor due to Type M errors. See section 16.1 for a detailed example, but the Type M error is a concern due to the “statistical significance filter” which puts a lower bound on the magnitude for a reported (published) effect.
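The Type S error rate and exaggeration factor of a procedure can be illustrated by simulation. A minimal sketch, assuming a hypothetical underpowered study (small true effect of 0.1 with standard error 0.35; these numbers are mine, not from the text) and the conventional \(|\hat\theta| > 1.96\,\mathrm{se}\) significance filter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical numbers: a small true effect measured with a large standard error
true_effect = 0.1
se = 0.35
n_sims = 100_000

# Each simulated study yields one unbiased but noisy estimate
estimates = rng.normal(true_effect, se, size=n_sims)

# "Statistical significance filter": only estimates with |est| > 1.96 * se
# clear the publication threshold
significant = estimates[np.abs(estimates) > 1.96 * se]

# Type S error rate: among significant results, how often is the sign wrong?
type_s_rate = (np.sign(significant) != np.sign(true_effect)).mean()

# Expected exaggeration factor (Type M): average |significant estimate|
# relative to the true effect
exaggeration = np.abs(significant).mean() / true_effect

print(type_s_rate, exaggeration)
```

With these numbers the filter guarantees every reported estimate exceeds \(1.96 \times 0.35 \approx 0.69\) in magnitude, several times the true effect of 0.1, which is the lower-bound phenomenon described above.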

The authors do not treat null hypothesis significance testing as the primary research goal, but they do use it as a tool: for example, ‘non-rejection’ suggests that more information or data is probably needed.