3.3 More of {base} and {stats}
R’s {base} and {stats} packages have many built-in functions for statistical analysis. For example, anova() can be used to quickly compare two nested regression models.
anova(reg_fit, poly_fit)
## Analysis of Variance Table
##
## Model 1: Volume ~ Girth + Height
## Model 2: Volume ~ Girth + I(Girth^2) + Height
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 421.92
## 2 27 186.01 1 235.91 34.243 3.13e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We observe that the second-order term for Girth does indeed add significant explanatory power to the model. (Formally, we reject the null hypothesis that the coefficient of the second-order term for Girth is zero.)
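To see where the F statistic in the table comes from, it can be recomputed by hand from the residual sums of squares. The following is a minimal sketch, assuming that reg_fit and poly_fit were fit with lm() on the built-in trees data set using the two formulas shown in the table; the names rss_1, rss_2, res_df_1, res_df_2, f_stat and p_value are introduced here for illustration only.

```r
# Assumption: both models are fit on the built-in `trees` data set
reg_fit  <- lm(Volume ~ Girth + Height, data = trees)
poly_fit <- lm(Volume ~ Girth + I(Girth^2) + Height, data = trees)

# Residual sums of squares and residual degrees of freedom of each model
rss_1    <- sum(resid(reg_fit)^2)
rss_2    <- sum(resid(poly_fit)^2)
res_df_1 <- df.residual(reg_fit)
res_df_2 <- df.residual(poly_fit)

# F statistic: reduction in RSS per added parameter,
# divided by the residual variance estimate of the larger model
f_stat  <- ((rss_1 - rss_2) / (res_df_1 - res_df_2)) / (rss_2 / res_df_2)

# Upper-tail probability under the F distribution
p_value <- pf(f_stat, res_df_1 - res_df_2, res_df_2, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

Under these assumptions, the two values should agree with the F and Pr(>F) columns that anova() reports.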
What is ANOVA?
Use base R statistical functions when someone tries to test your statistics knowledge.
Question: If \(U_1\) and \(U_2\) are i.i.d. (independent and identically distributed) \(Unif(0,1)\) random variables, what is the distribution of \(U_1 + U_2\)?
set.seed(42)
n <- 10000
u_1 <- runif(n)
u_2 <- runif(n)
.hist <- function(x, ...) {
  hist(x, probability = TRUE, ...)
  lines(density(x), col = "blue", lwd = 2, ...)
}
layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
.hist(u_1)
.hist(u_2)
.hist(u_1 + u_2)
Answer: Evidently it’s triangular: the sum is supported on \([0, 2]\), with its density peaking at 1.
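The simulated answer can be checked against the exact result. The sum of two independent \(Unif(0,1)\) variables has density \(f(x) = 1 - |x - 1|\) on \([0, 2]\), with mean 1 and variance \(1/6\). The sketch below overlays that density on a fresh simulation. The helper dtri and the variable s are illustrative names, not part of base R.

```r
set.seed(42)
n <- 10000
s <- runif(n) + runif(n)

# Triangular density on [0, 2] with mode 1 (hypothetical helper, not in base R)
dtri <- function(x) ifelse(x < 0 | x > 2, 0, 1 - abs(x - 1))

# Histogram of the simulated sums with the theoretical density on top
hist(s, probability = TRUE, breaks = 40, main = "U_1 + U_2")
curve(dtri, from = 0, to = 2, add = TRUE, col = "red", lwd = 2)

# Sample moments to compare with the theoretical mean (1) and variance (1/6)
c(mean = mean(s), var = var(s))
```

With 10,000 draws, the sample mean and variance should land close to 1 and \(1/6\), and the red curve should track the histogram.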
There are probably lots of functions that you didn’t know you even needed.
add_column <- function(data) {
  # Whoops! `df` should be `data`
  df %>% mutate(dummy = 1)
}
trees %>% add_column()
## Error in UseMethod("mutate"): no applicable method for 'mutate' applied to an object of class "function"
df() is the density function for the F distribution with df1 and df2 degrees of freedom, so R finds that function instead of reporting a missing object:
df
## function (x, df1, df2, ncp, log = FALSE)
## {
## if (missing(ncp))
## .Call(C_df, x, df1, df2, log)
## else .Call(C_dnf, x, df1, df2, ncp, log)
## }
## <bytecode: 0x55e28a6cf4f0>
## <environment: namespace:stats>
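When a name such as df behaves unexpectedly, a few base R calls can show what it actually refers to. The sketch below uses only base R; add_dummy is a hypothetical helper introduced here to illustrate a version that uses its argument and fails early on non-data-frame input. It is not the function from the example above.

```r
# `df` is a function from {stats}, not a data frame
is.function(df)                    # TRUE
environmentName(environment(df))   # "stats"
find("df")                         # where `df` is found on the search path

# Hypothetical helper: uses its `data` argument and checks the input type
add_dummy <- function(data) {
  stopifnot(is.data.frame(data))
  data$dummy <- 1
  data
}

head(add_dummy(trees), 3)
```

Choosing argument names that do not collide with existing functions (data rather than df, for instance) makes this kind of mistake easier to spot.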