Base types

Learning objectives:

  • Understand what OOP means–at the very least for R
  • Know how to discern an object’s nature–base or OO–and type

John Chambers, creator of the S programming language

Session Info
library("DiagrammeR")
utils::sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] DiagrammeR_1.0.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37      RColorBrewer_1.1-3 R6_2.6.1           fastmap_1.2.0     
#>  [5] xfun_0.52          magrittr_2.0.3     glue_1.8.0         knitr_1.50        
#>  [9] htmltools_0.5.8.1  rmarkdown_2.29     cli_3.6.5          visNetwork_2.1.2  
#> [13] compiler_4.5.1     tools_4.5.1        evaluate_1.0.4     yaml_2.3.10       
#> [17] rlang_1.1.6        jsonlite_2.0.0     htmlwidgets_1.6.4  keyring_1.4.1

Why OOP is hard in R

  • Multiple OOP systems exist: S3, R6, S4, and (now/soon) S7.
  • Multiple preferences: some users prefer one system; others, another.
  • R’s OOP systems are different enough that prior OOP experience may not transfer well.

XKCD 927

OOP: Big Ideas

  1. Polymorphism. Function has a single interface (outside), but contains (inside) several class-specific implementations.
# imagine a function with object x as an argument
# from the outside, users interact with the same function
# but inside the function, there are provisions to deal with objects of different classes
some_function <- function(x) {
  if (is.numeric(x)) {
    # implementation for numeric x
  } else if (is.character(x)) {
    # implementation for character x
  } else {
    # implementations for other classes
  }
}
Example of polymorphism
# data frame
summary(mtcars[,1:4])
#>       mpg             cyl             disp             hp       
#>  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
#>  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
#>  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
#>  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
#>  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
#>  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
# statistical model
lin_fit <- lm(mpg ~ hp, data = mtcars)
summary(lin_fit)
#> 
#> Call:
#> lm(formula = mpg ~ hp, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.7121 -2.1122 -0.8854  1.5819  8.2360 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
#> hp          -0.06823    0.01012  -6.742 1.79e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.863 on 30 degrees of freedom
#> Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
#> F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
  2. Encapsulation. An object “encapsulates” (that is, encloses in a protective capsule) both its data and the functions that act on that data. Think of a REST API: a client interacts with an API only through a set of discrete endpoints (i.e., things to get or set), and the server does not otherwise give access to its internal workings or state. As with an API, this creates a separation of concerns: the object manages its own data and behavior; users only work with the interface it exposes.
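
As a minimal sketch of encapsulation, assuming a hypothetical `make_counter()` helper (not from the original material), a closure can keep its state out of the object it hands back. The caller only sees the two "endpoints", `increment()` and `get()`:

```r
# Hypothetical example: a counter whose state lives inside a closure
make_counter <- function() {
  count <- 0  # internal state, not exposed as an element of the result
  list(
    increment = function() {
      count <<- count + 1
      invisible(count)
    },
    get = function() count
  )
}

counter <- make_counter()
counter$increment()
counter$increment()
counter$get()
#> [1] 2
# the internal variable is not part of the returned interface
"count" %in% names(counter)
#> [1] FALSE
```

This is only one way to illustrate the idea; R's encapsulated OOP systems (RC and R6) package the same pattern more formally.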

OOP: Properties

Objects have class

  • Class defines:
    • Methods (i.e., what can be done with the object)
    • Fields (i.e., data that defines an instance of the class)
  • An object is an instance of a class

Class is inherited

  • Class is defined:
    • By an object’s class (e.g., ordered factor)
    • By the parent of the object’s class (e.g., factor)
  • Inheritance matters for method dispatch
    • If a method is defined for an object’s class, use that method
    • If the object’s class doesn’t have a method, use the method of the parent class
    • The process of finding a method is called dispatch
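
The dispatch rules above can be sketched with an ordered factor. The generic `describe()` and its methods are hypothetical names made up for this illustration; only `class()`, `inherits()`, and `UseMethod()` are base R:

```r
# an ordered factor has two classes: its own, then its parent
sizes <- factor(c("small", "large"), levels = c("small", "large"), ordered = TRUE)
class(sizes)
#> [1] "ordered" "factor"
inherits(sizes, "factor")
#> [1] TRUE

# hypothetical generic with a method for "factor" but none for "ordered"
describe <- function(x, ...) UseMethod("describe")
describe.factor <- function(x, ...) "method for factor"
describe.default <- function(x, ...) "default method"

# no describe.ordered() exists, so dispatch falls back to the parent class
describe(sizes)
#> [1] "method for factor"
# an object with no matching class method falls back to the default
describe(1:3)
#> [1] "default method"
```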

OOP in R: Two Paradigms

1. Encapsulated OOP

  • Objects “encapsulate”
    • Methods (i.e., what can be done)
    • Fields (i.e., data on which things are done)
  • Calls communicate this encapsulation, since form follows function
    • Form: object.method(arg1, arg2)
    • Function: for object, apply method for object’s class with arguments arg1 and arg2

2. Functional OOP

  • Methods belong to “generic” functions
  • From the outside, look like regular functions: generic(object, arg2, arg3)
  • From the inside, components are also functions
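
A small sketch of the functional style, using `summary()` from the earlier example. It assumes that `utils::getS3method()` is an acceptable way to look up the method for class "lm"; the variable names are illustrative only:

```r
fit <- lm(mpg ~ hp, data = mtcars)

# from the outside, the generic is an ordinary function
is.function(summary)
#> [1] TRUE

# from the inside, the method it dispatches to is also a function
lm_method <- utils::getS3method("summary", "lm")
is.function(lm_method)
#> [1] TRUE

# calling the generic and calling the method directly agree
isTRUE(all.equal(summary(fit)$r.squared, lm_method(fit)$r.squared))
#> [1] TRUE
```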

Concept Map

Mermaid code
DiagrammeR::mermaid("
graph LR

OOP --> encapsulated_OOP
OOP --> functional_OOP

functional_OOP --> S3
functional_OOP --> S4

encapsulated_OOP --> R6
encapsulated_OOP --> RC
")

OOP in base R

  • S3
    • Paradigm: functional OOP
    • Noteworthy: R’s first OOP system
    • Use case: low-cost solution for common problems
    • Downsides: no guarantees
  • S4
    • Paradigm: functional OOP
    • Noteworthy: rewrite of S3, used by Bioconductor
    • Use case: “more guarantees and greater encapsulation” than S3
    • Downsides: higher setup cost than S3
  • RC
    • Paradigm: encapsulated OOP
    • Noteworthy: a special type of S4 object that is mutable, that is, one that can be modified in place (instead of following R’s usual copy-on-modify behavior)
    • Use cases: problems that are hard to tackle with functional OOP (in S3 and S4)
    • Downsides: harder to reason about (because of modify-in-place logic)
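
To make the modify-in-place point concrete, here is a rough sketch with a hypothetical `Account` RC class (the class, field, and method names are invented for illustration). It also shows the encapsulated call form, `object$method(arg)`:

```r
# Hypothetical RC class with one field and one method
Account <- methods::setRefClass(
  "Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount  # modifies the object in place
      invisible(.self)
    }
  )
)

acct <- Account$new(balance = 100)
alias <- acct       # no copy is made: both names refer to the same object
alias$deposit(50)
acct$balance
#> [1] 150

# an ordinary list follows copy-on-modify instead
lst <- list(balance = 100)
lst_copy <- lst
lst_copy$balance <- 150
lst$balance
#> [1] 100
```

The change made through `alias` shows up in `acct`, which is what makes RC objects harder to reason about than regular R objects.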

OOP in packages

  • R6
    • Paradigm: encapsulated OOP
    • Noteworthy: resolves issues with RC
  • S7 (formerly R7)
    • Paradigm: functional OOP
    • Noteworthy: designed as a successor to S3 and S4
  • R.oo
    • Paradigm: hybrid functional and encapsulated (?)
  • proto
    • Paradigm: prototype OOP
    • Noteworthy: OOP style used in early versions of ggplot2

How can you tell if an object is base or OOP?

Functions

Two functions:

  • base::is.object(), which returns TRUE or FALSE depending on whether an object is an OO object
  • sloop::otype(), which reports the object’s type: "base", "S3", etc.

A few examples:

# Example 1: a base object
is.object(1:10)
#> [1] FALSE
sloop::otype(1:10)
#> [1] "base"
# Example 2: an OO object
is.object(mtcars)
#> [1] TRUE
sloop::otype(mtcars)
#> [1] "S3"

sloop

  • S Language Object-Oriented Programming


Class

OO objects have a “class” attribute:

# base object has no class
attr(1:10, "class")
#> NULL
# OO object has one or more classes
attr(mtcars, "class")
#> [1] "data.frame"
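
As a quick check of this rule, the sketch below adds and removes a class attribute on a base object. The class name "my_class" is hypothetical and has no methods attached:

```r
x <- 1:10
is.object(x)
#> [1] FALSE

# adding a (hypothetical) class attribute makes x an OO object
class(x) <- "my_class"
is.object(x)
#> [1] TRUE
attr(x, "class")
#> [1] "my_class"

# stripping the attribute gives back a base object
is.object(unclass(x))
#> [1] FALSE
```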

What about types?

Only OO objects have a “class” attribute, but every object, whether base or OO, has a base type:
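
For instance, the OO objects used earlier are built on top of base types, which `typeof()` reveals. This is a small sketch using only base R and the `mtcars` data already shown:

```r
# OO objects: a class attribute on top of a base type
typeof(mtcars)
#> [1] "list"
typeof(lm(mpg ~ hp, data = mtcars))
#> [1] "list"
typeof(Sys.Date())
#> [1] "double"

# base object: no class attribute, but still a base type
attr(1:10, "class")
#> NULL
typeof(1:10)
#> [1] "integer"
```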

Vectors

typeof(NULL)
#> [1] "NULL"
typeof(c("a", "b", "c"))
#> [1] "character"
typeof(1L)
#> [1] "integer"
typeof(1i)
#> [1] "complex"

Functions

# "normal" function
my_fun <- function(x) { x + 1 }
typeof(my_fun)
#> [1] "closure"
# primitive function of type "special"
typeof(`[`)
#> [1] "special"
# primitive function of type "builtin"
typeof(sum)    
#> [1] "builtin"

Environments

typeof(globalenv())
#> [1] "environment"

S4

mle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)
typeof(mle_obj)
#> [1] "S4"

Language components

typeof(quote(a))
#> [1] "symbol"
typeof(quote(a + 1))
#> [1] "language"
typeof(formals(my_fun))
#> [1] "pairlist"

Concept Map

Base types in R

Sankey graph code

The graph above was made with SankeyMATIC

// toggle "Show Values"
// set Default Flow Colors from "each flow's Source"

base\ntypes [8] vectors
base\ntypes [3] functions
base\ntypes [1] environments
base\ntypes [1] S4 OOP
base\ntypes [3] language\ncomponents
base\ntypes [6] C components

vectors [1] NULL
vectors [1] logical
vectors [1] integer
vectors [1] double
vectors [1] complex
vectors [1] character
vectors [1] list
vectors [1] raw

functions [1] closure
functions [1] special
functions [1] builtin

environments [1] environment

S4 OOP [1] S4

language\ncomponents [1] symbol
language\ncomponents [1] language
language\ncomponents [1] pairlist

C components [1] externalptr
C components [1] weakref
C components [1] bytecode
C components [1] promise
C components [1] ...
C components [1] any

Be careful about the numeric type

  1. Often “numeric” is treated as synonymous with double:
# create a double and an integer object
one <- 1
oneL <- 1L
typeof(one)
#> [1] "double"
typeof(oneL)
#> [1] "integer"
# check their type after as.numeric()
one |> as.numeric() |> typeof()
#> [1] "double"
oneL |> as.numeric() |> typeof()
#> [1] "double"
  2. In S3 and S4, “numeric” is taken as either integer or double when choosing methods:
sloop::s3_class(1)
#> [1] "double"  "numeric"
sloop::s3_class(1L)
#> [1] "integer" "numeric"
  3. is.numeric() tests whether an object behaves like a number:
typeof(factor("x"))
#> [1] "integer"
is.numeric(factor("x"))
#> [1] FALSE

Despite these three meanings, Advanced R consistently uses “numeric” to mean an object of integer or double type.
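
The difference between that type-based meaning and `is.numeric()` can be sketched with a hypothetical helper, `is_numeric_type()`. It is not part of base R or Advanced R; it only tests the base type:

```r
# hypothetical helper: TRUE when the base type is integer or double
is_numeric_type <- function(x) typeof(x) %in% c("integer", "double")

is_numeric_type(1L)
#> [1] TRUE
is_numeric_type(1)
#> [1] TRUE
is_numeric_type("1")
#> [1] FALSE

# a factor has integer type but does not behave like a number
f <- factor("x")
is_numeric_type(f)
#> [1] TRUE
is.numeric(f)
#> [1] FALSE
```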