Object Size

  • Use lobstr::obj_size() to measure how much memory an object occupies
  • Lists may be smaller than expected because several elements can reference the same value
  • Character vectors may be smaller than expected because R stores each unique string once, in a global string pool
  • Difficult to predict how big a group of objects will be
    • Sizes can only be added together if the objects share no references
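
The points above can be checked directly. This is a minimal sketch, assuming the lobstr package is installed; the variable names (x, y, banana) are illustrative only, and exact byte counts depend on the platform, so the comparisons are made on sizes rather than on printed values.

```r
library(lobstr)

# A list whose three elements all reference the same 8 MB vector
x <- runif(1e6)
y <- list(x, x, x)
obj_size(x)
obj_size(y)   # only slightly larger than x: the list just holds three references

# x is already contained in y, so measuring them together adds nothing
combined <- as.numeric(obj_size(x, y))
separate <- as.numeric(obj_size(x)) + as.numeric(obj_size(y))
combined < separate

# Global string pool: repeating a string 100 times costs far less than 100x
banana <- "bananas bananas bananas"
obj_size(banana)
obj_size(rep(banana, 100))
```

Summing individual sizes overestimates here because the shared vector would be counted twice.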

Alternative Representation

  • As of R 3.5.0 - ALTREP
  • Represent some vectors compactly
    • e.g., 1:1000 - not 1,000 stored values, just the endpoints 1 and 1,000
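
A quick way to see the compact representation, assuming R 3.5.0 or later and the lobstr package: if only the endpoints are stored, integer sequences of very different lengths should report the same size.

```r
library(lobstr)

# With ALTREP, each sequence is described by its endpoints, not its values
size_short <- obj_size(1:10)
size_long  <- obj_size(1:1e6)

size_short
size_long

# Both sequences take the same amount of memory
as.numeric(size_short) == as.numeric(size_long)
```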

Exercises

1. Why are the sizes so different?
y <- rep(list(runif(1e4)), 100)

object.size(y) # ~8000 kB
#> 8005648 bytes
obj_size(y)    # ~80   kB
#> 80.90 kB

From ?object.size:

“This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared, for example.”
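
The two numbers can be reconstructed by hand. The sketch below assumes the lobstr package is installed; one_vector and empty_list are names introduced here for illustration. object.size() counts the shared vector once per list element, while obj_size() counts it once.

```r
library(lobstr)

y <- rep(list(runif(1e4)), 100)

# All 100 elements point at the same underlying vector
length(unique(obj_addrs(y)))

# Building blocks: one numeric vector, and a 100-slot list with nothing in it
one_vector <- as.numeric(object.size(y[[1]]))
empty_list <- as.numeric(object.size(vector("list", 100)))

# object.size(): list overhead plus the vector counted 100 times
as.numeric(object.size(y)) == empty_list + 100 * one_vector

# obj_size(): far below even two copies of the vector plus the list overhead
as.numeric(obj_size(y)) < empty_list + 2 * one_vector
```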

2. Why is the size misleading?
funs <- list(mean, sd, var)
obj_size(funs)
#> 18.76 kB

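Because the list only references functions that live in base and stats, which are loaded in every R session. Creating funs does not use that much additional memory, so the reported size says little about the cost of the list itself. Why bother looking at the size? What use is that?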
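
To see where these functions come from, one option is to look up the environment each one is defined in. This is a small sketch using only base R; homes is a name introduced here for illustration.

```r
funs <- list(mean, sd, var)

# Name of the namespace each function belongs to
homes <- vapply(funs, function(f) environmentName(environment(f)), character(1))
homes
```

Each name refers to a package namespace that is already loaded before funs is created.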

3. Predict the sizes
a <- runif(1e6) # 8 MB
obj_size(a)
#> 8.00 MB
b <- list(a, a)
  • There is one value ~8MB
  • a and b[[1]] and b[[2]] all point to the same value.
obj_size(b)
#> 8.00 MB
obj_size(a, b)
#> 8.00 MB
b[[1]][[1]] <- 10
  • Now there are two values ~8MB each (16MB total)
  • a and b[[2]] point to the same value (8MB)
  • b[[1]] is new (8MB) because the first element (b[[1]][[1]]) has been changed
obj_size(b)     # 16 MB (two values, two element references)
#> 16.00 MB
obj_size(a, b)  # 16 MB (a & b[[2]] point to the same value)
#> 16.00 MB
b[[2]][[1]] <- 10
  • Now there are three values ~8MB each (24MB total)
  • Although b[[1]] and b[[2]] have the same contents, they are not references to the same object.
obj_size(b)     # 16 MB (b[[1]] and b[[2]] are separate copies)
#> 16.00 MB
obj_size(a, b)  # 24 MB (a, b[[1]] and b[[2]] no longer share anything)
#> 24.00 MB
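
The sharing described in this exercise can also be confirmed by comparing memory addresses rather than sizes. This is a sketch assuming the lobstr package is installed; the shared_* variables are introduced here for illustration. obj_addr() gives the address of an object and obj_addrs() gives the addresses of a list's elements.

```r
library(lobstr)

a <- runif(1e6)
b <- list(a, a)

# Step 1: both elements of b are the same object as a
shared_start <- obj_addrs(b) == obj_addr(a)
shared_start

# Step 2: modifying b[[1]] copies it; b[[2]] still points at a
b[[1]][[1]] <- 10
shared_mid <- obj_addrs(b) == obj_addr(a)
shared_mid

# Step 3: modifying b[[2]] copies it too; nothing is shared any more
b[[2]][[1]] <- 10
shared_end <- obj_addrs(b) == obj_addr(a)
shared_end

# b[[1]] and b[[2]] hold equal contents but live at different addresses
identical(b[[1]], b[[2]])
obj_addrs(b)[[1]] != obj_addrs(b)[[2]]
```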