Chapter 13 S3

13.2 Basics

What’s a real use case for unclass?

I wanted to pass a paletteer palette to leaflet::colorNumeric which ends up calling the S3 method leaflet:::toPaletteFunc but that only has methods defined for character function and matrix . The paletteer output is a character vector with class color and so colorNumeric(palette, domain) failed. Simple solution: colorNumeric(unclass(palette), domain)

13.3 Classes

Can we come up with a simple example for a constructor, validator, and helper?

IPA Constructor

Create the custom class

IPA Helper

Calls both the validator and the constructor

13.0.0.1 Test it out!

Fails as expected:

Error in validate_ipa(x) : x >= 15 is not TRUE

But this works!

## [1] 50
## attr(,"class")
## [1] "ipa"

13.3.4.3 Exercises

Can we go over the factor source code, commenting it line for line?

# using the example:
x <- c("one", "two", "three", "one")

factor <- function (x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA) {
 
   # if character vector is null set to character()
  if (is.null(x)) 
      x <- character()
  # nx - names of x
  # in our case NULL
    nx <- names(x)
    # we didn't supply levels so it's missing
    if (missing(levels)) {
      # set y to the unique names found in the vector
      # nmax by default is NA but this is an argument
      # for unique setting the max number of unique values
        y <- unique(x, nmax = nmax)
        # get the order of the unique values
        # defaults to alphabetical
        ind <- order(y)
        # put our vector in order using only unique values
        levels <- unique(as.character(y)[ind])
    }
    # because the function arg is a promise 
    # we need to eval now to use it? I think?
    force(ordered) # FALSE
    # if x isn't a character string make it one
    if (!is.character(x)) 
        x <- as.character(x)
    # match the number of levels with NAs
    # then if they are NA it will resolve to TRUE
    # only return the TRUE levels
    # this is how we get rid of NAs!
    levels <- levels[is.na(match(levels, exclude))]
    # get the ordered levels
    f <- match(x, levels)
    # if the names aren't null give f those names
    if (!is.null(nx)) 
        names(f) <- nx
    # if missing levels make levels back into a character string?
    # why?
    if (missing(labels)) {
        levels(f) <- as.character(levels)
    }
    # otherwise get the legenth of your labels
    # if its the same as the levels
    else {
        nlab <- length(labels)
        if (nlab == length(levels)) {
            # xlevls is labels as character
            # then we get the unique values of that
            # and set that to xlevs
            nlevs <- unique(xlevs <- as.character(labels))
            # this is null
            at <- attributes(f)
            # attr levels
            at$levels <- nlevs
            # match the levels with ordered levels
            # and put the levels back in numeric order?
            # I think?
            f <- match(xlevs, nlevs)[f]
            # now set the attributes to the numeric levels
            attributes(f) <- at
        }
        # if you only have one lavel
        else if (nlab == 1L) 
            # add numerics to the end of the label?
            levels(f) <- paste0(labels, seq_along(levels))
            # otherwise invalid length
        else stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
            nlab, length(levels)), domain = NA)
    }
    # add ordered and factor to f's class list
    class(f) <- c(if (ordered) "ordered", "factor")
    f
}

factor(x)

13.3.4.4 Exercises

Now that we’ve gone through the source code, I don’t see where the contrasts attibute comes into play? We can look at the C() function but where is this used within factor()?

It is a little misleading to say that factors “have an … attribute”, where I really think he means that “some functions might check if that attribute has been assigned”.

Make a factor:

## [1] 1 2
## Levels: 1 2

Set contrasts attribute:

## [1] 1 2
## attr(,"contrasts")
##       unordered 
## contr.treatment 
## Levels: 1 2
## [1] 1 2
## Levels: 1 2

dang’t where did those contrasts go!

13.4.1 Generics

In the example:

The book states: “If you wonder why we have to repeat my_new_generic twice, think back to Section 6.2.3”

Can we go over why?

I think he’s referring to the fact that you have to use a string representation inside the UseMethod call.

I’m guessing part of the answer to this is baked into this warning on the UseMethod function docs:

Warning: Prior to R 2.1.0 UseMethod accepted a call with no arguments and tried to deduce the generic from the context.

Basically, you’re specifying the name of the function and then telling R to go find this generic function’s methods. You can’t self-reference the name of the generic after it’s been assigned to itself?

Maybe the name doesn’t exist!

## [1] "test"

Also because you might have assigned your generic to a new name, how would it know to use the true “generic” name.

## [1] "ugh"

13.4.4.5 Exercises

Can we explain what’s happening here in our own words I was a little confused with the solutions manual? g(x) uses 1 from the .default method but 10 from the generic? Why?

 x  y 
 1 10 

UseMethod creates a new function call with arguments matched as they came in to the generic. Any local variables defined before the call to UseMethod are retained (unlike S).

When you call g(x) even though you appear to be ignoring it in your definition of g and assigning x <- 10, when you call UseMethod it will basically match the x in the g.default formals to the x that was passed in. The y isn’t matched to any argument and is retained.

Tyler Grant Smith:m: 13 minutes ago r g <- function(x) { x <- 10 y <- 10 UseMethod(“g”) } g.default <- function(x = NULL) c(x = x, y = y) x <- 1 y <- 1 # this is interesting behavior g() #> y #> 10 g(x) #> x y #> 1 10

13.5 Object styles

Unfortunately, describing the appropriate use of each of these object styles is beyond the scope of this book.

Can we use the vctrs package in a concrete example to address some of these fringe cases and reasons why length(x) is causing an issue?

An object style is a pattern for structuring your class. You can use any structure you want but these 4 patterns get used very often. Date and factor are both vector style for example: the core data for objects of both types are stored in a single vector. And there are some other context-appropriate things denoted by attributes in each case.

I don’t think length is an issue but when you move away from the vanilla vector style object then the meaning changes. For example, under the hood, a data frame is really a list of vectors. Each of the vectors (columns) has the same length. So if you take the length of a data frame you get the length of the list i.e. the number of columns.

## [1] 11

The result of calling length on a data.frame isn’t necessarily intuitive: most people would probably say that the number of rows is a more natural value for the length of a data frame.

This is the same for dates: which are actually of length 3 when classes and 9 when unclassed.

## [1] 3
## [1] 11

13.7.1 S3 and Base Types

WHY isn’t the dispatch of mean(x1) determined by its type???

[1] "integer"
   mean.integer
   mean.numeric
=> mean.default
## [1] mean.Date     mean.default  mean.difftime mean.POSIXct  mean.POSIXlt 
## [6] mean.quosure*
## see '?methods' for accessing help and source code

mean doesn’t actually have a specific method defined in base R for int or numeric. the default method handles all base types, type checking along the way:

## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (anyNA(x)) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x7fd5257a7438>
## <environment: namespace:base>

13.7.3 Internal Generics

Defining a single group generic for your class overrides the default behaviour for all of the members of the group.

Does this mean that if I define abs for my class it overrides the behaviour of sign, sqrt and all the others in the Math group?

Essentially, these are all (loosely) math-related functions, so R understands how to do these for anything it can understand as a numeric (somehow). If you’re adding a class that isn’t handled in one of these numeric-ish ways, you’ll need to tell R how to handle the mathematics. You can’t make a group like “Math”, “Ops”, “Summary”, or “Complex” and expect R to know what it’s doing.

If you define abs for your class it won’t know how to do the other things. Suppose you defined your own min.POSIXct then when you call min(Sys.time()) it will encounter your function, evaluate, and then that is the end. Otherwise, min would call the min Primitive which is implemented in C as found here and if you trace through that you will see that is where the GroupDispatch is called.

13.7.4 Double Dispatch

What is happening here:

[1] "Called twice for some reason?"
[1] "Called twice for some reason?"

Double dispatch – where it’s calling the same method on x and zero. The inequality needs to convert both to the same class.

The book mentions that we can use vctrs::vec_arith to avoid the aboce, but does that pertain to this case? I tried: vctrs::vec_arith(">", x, 0) But I think my understanding is still off…

The vec_arith generic will first dispatch on class(x) So, in this case it tries to find vec_arith.foo but we haven’t defined that yet, so it instead finds vec_arith.default which signals the error.

[1] TRUE
Error: <foo> > <double> is not permitted
vec_arith will dispatch to this first, because it finds the "foo" generic
Error in UseMethod("vec_arith.foo", y): no applicable method for 'vec_arith.foo' applied to an object of class "c('double', 'numeric')"

Uh oh another error, and this is super confusing because none of our objects are class double. (typeof(x) is “double”). The problem is that vec_arith.foo further dispatches on the second argument and it is looking for vec_arith.numeric or vec_arith.default, but we haven’t defined that yet.

[1] TRUE
Error: <double> > <foo> is not permitted

now vec_arith will first dispatch on the 0 and scope to vec_arith.numeric which will then try to dispatch to vec_arith.numeric.foo which we havent defined yet.

[1] FALSE

What is happening here?!