4.4 Statistical significance, hypothesis testing, and statistical errors

  • Statistically significant - observed values could not be reasonably explained by chance (i.e. by the null hypothesis \(H_0\))

  • Hypothesis test - based on a test statistic (\(T\)) that summarizes the deviation of the data from what would be expected under the null hypothesis (\(H_0\)).

  • p-value - probability, under the null hypothesis, of observing a test statistic at least as extreme as the one observed. \(p < 0.05\) is often taken as ‘statistically significant’

In the simplest case, \(H_0\) specifies a probability model \(p(y)\) for the data \(y\), with replicated data \(y^{rep}\). The p-value is then computed as:

\(p = Pr(T(y^{rep}) \geq T(y))\)
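This p-value can be approximated by simulation: draw many replicated datasets \(y^{rep}\) from the null model and count how often \(T(y^{rep})\) is at least as large as \(T(y)\). A minimal sketch, assuming a hypothetical null model \(y \sim \mathrm{Normal}(0, 1)\) and test statistic \(T(y) = |\bar{y}|\) (both chosen here for illustration, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: H0 says y ~ Normal(0, 1); the observed data
# were actually drawn with a small nonzero mean, so H0 is not true.
n = 100
y = rng.normal(0.3, 1.0, size=n)

def T(y):
    """Test statistic: absolute deviation of the sample mean from 0."""
    return abs(y.mean())

# Draw replicated datasets y_rep from the null model p(y)
n_rep = 10_000
T_rep = np.array([T(rng.normal(0.0, 1.0, size=n)) for _ in range(n_rep)])

# p = Pr(T(y_rep) >= T(y)), estimated as a proportion over replications
p_value = (T_rep >= T(y)).mean()
print(p_value)
```

The same simulation idea carries over to any test statistic; only the null model and `T` change.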

ROS authors do not recommend using statistical significance as a decision rule.

4.4.1 Type 1 and Type 2 errors vs. Type M and Type S errors

  • Type 1 - falsely rejecting a null hypothesis

  • Type 2 - not rejecting a null hypothesis that is actually false

ROS authors do not like talking about these, mainly because in many problems the null hypothesis cannot really be true. A drug will have some effect, for example. They prefer:

  • Type M - The magnitude of the estimated effect differs greatly from the true effect.

  • Type S - The sign of the estimated effect is opposite to the true effect.

A statistical procedure can be characterized by its Type S error rate, and its expected exaggeration factor due to Type M errors. See section 16.1 for a detailed example, but the Type M error is a concern due to the “statistical significance filter” which puts a lower bound on the magnitude for a reported (published) effect.
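The Type S error rate and exaggeration factor of a procedure can be illustrated by simulation. A minimal sketch, assuming a hypothetical underpowered study (small true effect of 0.1 with standard error 0.35; these numbers are mine, not from the text) and the conventional \(|\hat\theta| > 1.96\,\mathrm{se}\) significance filter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical numbers: a small true effect measured with a large standard error
true_effect = 0.1
se = 0.35
n_sims = 100_000

# Each simulated study yields one unbiased but noisy estimate
estimates = rng.normal(true_effect, se, size=n_sims)

# "Statistical significance filter": only estimates with |est| > 1.96 * se
# clear the publication threshold
significant = estimates[np.abs(estimates) > 1.96 * se]

# Type S error rate: among significant results, how often is the sign wrong?
type_s_rate = (np.sign(significant) != np.sign(true_effect)).mean()

# Expected exaggeration factor (Type M): average |significant estimate|
# relative to the true effect
exaggeration = np.abs(significant).mean() / true_effect

print(type_s_rate, exaggeration)
```

With these numbers the filter guarantees every reported estimate exceeds \(1.96 \times 0.35 \approx 0.69\) in magnitude, several times the true effect of 0.1, which is the lower-bound phenomenon described above.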

The authors do not treat null hypothesis significance testing as the primary research goal, but they do use it as a tool: for example, ‘non-rejection’ suggests that more information or data is probably needed.