Advanced R
Chapter 4: Subsetting

Alan Kinene @alankinene & Shel Kariuki @Shel_Kariuki
2020/08/25

Outline

Section 4.1: Introduction
Section 4.2: Selecting multiple elements
Section 4.3: Selecting a single element
Section 4.4: Subsetting and assignment
Section 4.5: Applications (Using subsetting to solve problems)

Introduction

Interrelated concepts to internalise:
- There are 3 subsetting operators: [[, [, and $
- Subsetting operators interact differently with various vector types (e.g. atomic vectors, lists, factors, matrices, and data frames)
- Subsetting and assignment can be combined ("subassignment")

Subsetting is a natural complement to str(): str() shows you all the pieces of an object, while subsetting lets you pull out only the pieces you are interested in.

For large, complex objects, it is often useful to explore them in the RStudio Viewer with View(my_object) to find the pieces you want to subset.

4.2: Selecting multiple elements

Subsetting atomic vectors

Use [ to select any number of elements from a vector.

Assume we have a simple vector: x <- c(2.1, 4.2, 3.3, 5.4)

Positive integers return elements at the specified positions:

x[c(3, 1)]

## [1] 3.3 2.1

Negative integers exclude elements at the specified positions:

x[-c(3, 1)]

## [1] 4.2 5.4

Logical vectors select elements where the corresponding logical value is TRUE

x[c(TRUE, TRUE, FALSE, FALSE)]

## [1] 2.1 4.2

x[x > 3]

## [1] 4.2 3.3 5.4

x[c(TRUE, FALSE)] is equivalent to x[c(TRUE, FALSE, TRUE, FALSE)]
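
The equivalence above comes from recycling: a logical index that is shorter than the vector is repeated until it matches the vector's length. A minimal sketch, reusing the x defined earlier:

```r
# Same vector as in the slides
x <- c(2.1, 4.2, 3.3, 5.4)

# A length-2 logical index is recycled to length 4
short_index <- x[c(TRUE, FALSE)]
full_index  <- x[c(TRUE, FALSE, TRUE, FALSE)]

short_index
## [1] 2.1 3.3
identical(short_index, full_index)
## [1] TRUE
```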

Nothing returns the original vector.

x[]

## [1] 2.1 4.2 3.3 5.4

Zero returns a zero-length vector.

x[0]

## numeric(0)

Named vector

(y <- setNames(x, letters[1:4]))

##   a   b   c   d 
## 2.1 4.2 3.3 5.4

y[c("d", "c", "a")]

##   d   c   a 
## 5.4 3.3 2.1

Subsetting lists

Subsetting a list works in the same way as subsetting an atomic vector.
Using [ always returns a list; [[ and $, as described in Section 4.3, let you pull out the elements themselves.
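
The difference can be sketched with a small list. The list lst and its element names are made up for illustration:

```r
# A small example list (names are hypothetical)
lst <- list(a = 1:3, b = "text", c = c(TRUE, FALSE))

# `[` keeps the list wrapper, even for a single element
str(lst["a"])
## List of 1
##  $ a: int [1:3] 1 2 3

# `[[` and `$` return the element itself
str(lst[["a"]])
##  int [1:3] 1 2 3
str(lst$b)
##  chr "text"
```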

Subsetting matrices and arrays

The most common way of subsetting matrices (2D) and arrays (>2D) is a simple generalisation of 1D subsetting: supply a 1D index for each dimension, separated by a comma.

Subset with multiple vectors.

a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")

a[1:2, ]

##      A B C
## [1,] 1 4 7
## [2,] 2 5 8

a[c(TRUE, FALSE, TRUE), 
  c("B", "A")]

##      B A
## [1,] 4 1
## [2,] 6 3

Consider the matrix below:

(vals = matrix(1:25, ncol = 5, byrow = TRUE))

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
## [3,]   11   12   13   14   15
## [4,]   16   17   18   19   20
## [5,]   21   22   23   24   25

Subset with a single vector

vals[c(4, 15)]

## [1] 16 23

Subset with a matrix

select <- matrix(ncol = 2, byrow = TRUE, c(
  1, 1,
  3, 1,
  2, 4
))
vals[select]

## [1]  1 11  9

Subsetting data frames and tibbles

Data frames have the characteristics of both lists and matrices.
When subsetting with a single index, they behave like lists and index the columns, so df[1:2] selects the first two columns.
When subsetting with two indices, they behave like matrices, so df[1:3, ] selects the first three rows (and all the columns).

Given df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3]) what is the output for:
df[df$x == 2, ],
df[c("x", "z")],
df[, c("x", "z")],
str(df["x"]), and
str(df[, "x"])?
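
One way to check your answers is to run the calls directly. The sketch below only uses the df from the question; the key point is that list-style and matrix-style subsetting differ when a single column is selected:

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

df[df$x == 2, ]        # rows where x is 2, all columns
##   x y z
## 2 2 2 b

df[c("x", "z")]        # list-style: columns x and z
df[, c("x", "z")]      # matrix-style: same columns when several are selected

str(df["x"])           # list-style keeps a data frame
## 'data.frame':    3 obs. of  1 variable:
##  $ x: int  1 2 3

str(df[, "x"])         # matrix-style simplifies to a vector
##  int [1:3] 1 2 3
```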

Preserving dimensionality

By default, subsetting a matrix or data frame with a single number, a single name, or a logical vector containing a single TRUE, will simplify the returned output, i.e. it will return an object with lower dimensionality.

To preserve the original dimensionality, you must use drop = FALSE

For matrices and arrays, any dimensions with length 1 will be dropped:

a <- matrix(1:4, nrow = 2)
str(a[1, ])

##  int [1:2] 1 3

str(a[1, , drop = FALSE])

##  int [1, 1:2] 1 3

Data frames with a single column will return just that column:

df <- data.frame(a = 1:2, b = 1:2)
str(df[, "a"])

##  int [1:2] 1 2

str(df[, "a", drop = FALSE])

## 'data.frame':    2 obs. of  1 variable:
##  $ a: int  1 2

4.3: Selecting a single element

There are two other subsetting operators: [[ and $.
[[ is used for extracting single items, while x$y is a useful shorthand for x[["y"]]

Use of `[[`

The primary use case for [[ is working with lists, because subsetting a list with [ always returns a smaller list.

If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6. (@RLangTip)

x <- list(1:3, "a", 4:6)

If you use a vector with [[, it will subset recursively, i.e. x[[c(1, 2)]] is equivalent to x[[1]][[2]].
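
A short sketch of recursive subsetting. The nested list below is a hypothetical example, not taken from the slides:

```r
# A nested list (hypothetical example)
nested <- list(c(10, 20, 30), list("a", "b"))

# Both forms take element 1, then element 2 of that result
nested[[c(1, 2)]]
## [1] 20
nested[[1]][[2]]
## [1] 20

# The same works through nested lists
nested[[c(2, 1)]]
## [1] "a"
```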

Use of `$`

$ is a shorthand operator: x$y is roughly equivalent to x[["y"]].
Often used to access variables in a data frame, as in mtcars$cyl or diamonds$carat.
One common mistake with $ is to use it when you have the name of a column stored in a variable:

If var <- "cyl", mtcars$var doesn't work because it is translated to mtcars[["var"]], which looks for a column literally named "var". Instead use mtcars[[var]].
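
The mistake can be reproduced with the built-in mtcars data set. This is a minimal sketch:

```r
var <- "cyl"

# `$` looks for a column literally named "var", which does not exist
is.null(mtcars$var)
## [1] TRUE

# `[[` evaluates var first, so it finds the cyl column
head(mtcars[[var]])
## [1] 6 6 4 6 8 6
```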

The one important difference between $ and [[ is that $ does (left-to-right) partial matching.

x <- list(abc = 1)
x$a

## [1] 1

x[["a"]]

## NULL

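To help avoid this behaviour, set the global option warnPartialMatchDollar to TRUE so that R warns whenever $ partially matches a name: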

options(warnPartialMatchDollar = TRUE)
x$a

## [1] 1

Remember: For data frames, you can also avoid this problem by using tibbles, which never do partial matching.

Using @ and slot()

Two additional subsetting operators, which are needed for S4 objects:
1. @ (equivalent to $)
2. slot() (equivalent to [[).

@ is more restrictive than $ in that it will return an error if the slot does not exist.
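
A minimal sketch of both operators. The Person class and its name slot are hypothetical and exist only for this illustration:

```r
library(methods)

# A hypothetical S4 class with a single slot, for illustration only
setClass("Person", slots = c(name = "character"))
p <- new("Person", name = "Shel")

p@name            # analogous to $
## [1] "Shel"
slot(p, "name")   # analogous to [[
## [1] "Shel"

# Accessing a slot that does not exist is an error, not NULL
res <- tryCatch(slot(p, "age"), error = function(e) "error")
res
## [1] "error"
```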

4.4: Subsetting and assignment

Subassignment: Combining subsetting operators with assignments to modify selected values in an input vector.

The basic form is x[i] <- value

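R recycles values if needed, but the rules are complex, so ensure that:

- length(value) is the same as length(x[i])
- i is unique

If the lengths do not match, data frames raise an error:

# wafanyikazi$new_var <- 1:10000
# Error in `$<-.data.frame`(`*tmp*`, new_var, value = 1:10000) : 
#   replacement has 10000 rows, data has 500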
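
A small sketch of why these two conditions matter, using made-up vectors. With matching lengths and unique indices the result is unambiguous; with duplicated indices the later value silently wins:

```r
x <- 1:5

# Lengths match and indices are unique: the result is unambiguous
x[c(1, 2)] <- c(10L, 20L)
x
## [1] 10 20  3  4  5

# Duplicated indices: the later value silently wins
y <- 1:5
y[c(1, 1)] <- c(100L, 200L)
y
## [1] 200   2   3   4   5
```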

To remove a component, use x[[i]] <- NULL

departments <- list("data", "operations", "finance")
departments

## [[1]]
## [1] "data"
## 
## [[2]]
## [1] "operations"
## 
## [[3]]
## [1] "finance"

departments[[3]] <- NULL
departments

## [[1]]
## [1] "data"
## 
## [[2]]
## [1] "operations"

To add a literal NULL, use x[i] <- list(NULL)

departments[3] <- list(NULL)
departments

## [[1]]
## [1] "data"
## 
## [[2]]
## [1] "operations"
## 
## [[3]]
## NULL

4.5: Applications (Using subsetting to solve problems)

Lookup tables (character subsetting)
Matching and merging by hand (integer subsetting)
Random samples and bootstraps (integer subsetting)
Ordering (integer subsetting)
Expanding aggregated counts (integer subsetting)
Removing columns from data frames (character subsetting)
Selecting rows based on a condition (logical subsetting)
Boolean algebra versus sets (logical and integer subsetting)

4.5.1 Lookup tables (character subsetting)

Character matching

x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)
## Discussion: is this the same as looking up each element of x in lookup? Is it equivalent to using ifelse()?
lookup[x]

##        m        f        u        f        f        m        m 
##   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"

We can exclude names in the results using:

unname(lookup[x])

## [1] "Male"   "Female" NA       "Female" "Female" "Male"   "Male"

4.5.2 Matching and merging by hand (integer subsetting)

grades <- c(1, 2, 2, 3, 1)
info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(FALSE, FALSE, TRUE)
)
head(info)

##   grade      desc  fail
## 1     3 Excellent FALSE
## 2     2      Good FALSE
## 3     1      Poor  TRUE

Suppose we want to duplicate rows of the info table so that we have one row for each value in grades.

match(needles, haystack) returns, for each needle, the position of its first match in the haystack.

What is the position of each needle [grades elements: (1, 2, 2, 3, 1)] in the haystack [info$grade: (3, 2, 1)]?

id <- match(grades, info$grade)
id

## [1] 3 2 2 1 3

info[id, ]

##     grade      desc  fail
## 3       1      Poor  TRUE
## 2       2      Good FALSE
## 2.1     2      Good FALSE
## 1       3 Excellent FALSE
## 3.1     1      Poor  TRUE

When matching on multiple columns, you will need to first collapse them into a single column (with e.g. interaction()).
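
A minimal sketch of matching on two columns by hand. The tables info2 and keys and their columns are hypothetical, invented for this illustration:

```r
# Hypothetical lookup table keyed on two columns
info2 <- data.frame(
  grade = c(1, 1, 2, 2),
  term  = c("a", "b", "a", "b"),
  desc  = c("Poor-a", "Poor-b", "Good-a", "Good-b"),
  stringsAsFactors = FALSE
)

# Hypothetical keys we want to look up
keys <- data.frame(
  grade = c(2, 1, 2),
  term  = c("b", "a", "a"),
  stringsAsFactors = FALSE
)

# Collapse each pair of columns into a single key, then match as before
needles  <- interaction(keys$grade, keys$term)
haystack <- interaction(info2$grade, info2$term)
id <- match(needles, haystack)
id
## [1] 4 1 3

info2[id, "desc"]
## [1] "Good-b" "Poor-a" "Good-a"
```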

In practice, the dplyr *_join() functions are usually the better tool for this.

4.5.3 Random samples and bootstraps (integer subsetting)

Using integer indices to randomly sample or bootstrap a vector or data frame.

Use sample(n) to generate a random permutation of 1:n, and then use the results to subset the values

Simulate a dataframe

df = data.frame(names = c("John", "Teresa", "Shel", "Christine", "Brenda"),
                gender = c("M", "F", "F", "F", "F"),
                rshp = c("Father", "Mother", "Self", "Sister", "Sister"))
df

##       names gender   rshp
## 1      John      M Father
## 2    Teresa      F Mother
## 3      Shel      F   Self
## 4 Christine      F Sister
## 5    Brenda      F Sister

Reorder the dataframe randomly

df[sample(nrow(df)), ]

##       names gender   rshp
## 1      John      M Father
## 5    Brenda      F Sister
## 4 Christine      F Sister
## 3      Shel      F   Self
## 2    Teresa      F Mother

Select two random rows

df[sample(nrow(df), 2), ]

##    names gender   rshp
## 5 Brenda      F Sister
## 3   Shel      F   Self

Select 7 bootstrap replicates

df[sample(nrow(df), 7, replace = TRUE), ]

##      names gender   rshp
## 5   Brenda      F Sister
## 3     Shel      F   Self
## 5.1 Brenda      F Sister
## 3.1   Shel      F   Self
## 3.2   Shel      F   Self
## 1     John      M Father
## 1.1   John      M Father

4.5.4 Ordering (integer subsetting)

order() takes a vector as its input and returns an integer vector describing how to order the subsetted vector

fam <- c("John", "Teresa", "Shel", "Christine", "Brenda")
order(fam) ## orders alphabetically (in ascending order by default)

## [1] 5 4 1 3 2

fam[order(fam)]

## [1] "Brenda"    "Christine" "John"      "Shel"      "Teresa"

## We can also order the vector in descending order
fam[order(fam, decreasing = TRUE)]

## [1] "Teresa"    "Shel"      "John"      "Christine" "Brenda"

NB: By default, any missing values will be put at the end of the vector; however, you can remove them with na.last = NA or put them at the front with na.last = FALSE.

# us <- c("Me", "You", NA)
# us[order(us)]
# us[order(us, na.last = FALSE)]

Using order() to order values in a variable, or variables themselves, in a dataframe

# Randomly reorder df
df2 <- df[sample(nrow(df)), 3:1]
df2

##     rshp gender     names
## 4 Sister      F Christine
## 2 Mother      F    Teresa
## 3   Self      F      Shel
## 1 Father      M      John
## 5 Sister      F    Brenda

# Order by one variable
df[order(df$gender), ]

##       names gender   rshp
## 2    Teresa      F Mother
## 3      Shel      F   Self
## 4 Christine      F Sister
## 5    Brenda      F Sister
## 1      John      M Father

# Order the variables themselves
df[, order(names(df))]

##   gender     names   rshp
## 1      M      John Father
## 2      F    Teresa Mother
## 3      F      Shel   Self
## 4      F Christine Sister
## 5      F    Brenda Sister

To sort a vector directly, use sort(); to sort a data frame, you can use dplyr::arrange().

4.5.5 Expanding aggregated counts (integer subsetting)

df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
df

##   x  y n
## 1 2  9 3
## 2 4 11 5
## 3 1  6 1

rep(1:nrow(df), df$n)

## [1] 1 1 1 2 2 2 2 2 3

df[rep(1:nrow(df), df$n), ]

##     x  y n
## 1   2  9 3
## 1.1 2  9 3
## 1.2 2  9 3
## 2   4 11 5
## 2.1 4 11 5
## 2.2 4 11 5
## 2.3 4 11 5
## 2.4 4 11 5
## 3   1  6 1

4.5.6 Removing columns from data frames (character subsetting)

Method 1: Set individual columns to NULL

df = data.frame(names = c("John", "Teresa", "Shel", "Christine", "Brenda"),
                gender = c("M", "F", "F", "F", "F"),
                rshp = c("Father", "Mother", "Self", "Sister", "Sister"))
df

##       names gender   rshp
## 1      John      M Father
## 2    Teresa      F Mother
## 3      Shel      F   Self
## 4 Christine      F Sister
## 5    Brenda      F Sister

## create a copy of the dataframe
df2 <- df
## drop a column
df2$gender <- NULL
df2

##       names   rshp
## 1      John Father
## 2    Teresa Mother
## 3      Shel   Self
## 4 Christine Sister
## 5    Brenda Sister

Method 2: Subset to return only the columns you want

df[c("names", "rshp")]

##       names   rshp
## 1      John Father
## 2    Teresa Mother
## 3      Shel   Self
## 4 Christine Sister
## 5    Brenda Sister

Method 3: Use set operations to work out which columns to keep. This is useful when you only know the columns that you don't want.

to_keep <- setdiff(names(df), "gender")
to_keep

## [1] "names" "rshp"

df[to_keep]

##       names   rshp
## 1      John Father
## 2    Teresa Mother
## 3      Shel   Self
## 4 Christine Sister
## 5    Brenda Sister

4.5.7 Selecting rows based on a condition (logical subsetting)

library(rChambua)
head(wafanyikazi, n=3)

##     Sid Gender Age       Department   Role Income Marital_Status  County
## 1 10715   Male  31          Finance    Mid   5991         Single  Kisumu
## 2 17041   Male  48 Research Analyst Junior   3387       Divorced   Wajir
## 3 16232   Male  35       Operations Junior   3170        Married Mombasa
##   Leave_Days Promotion
## 1         11        No
## 2          8       Yes
## 3          0        No

Select all juniors

df1 <- wafanyikazi[wafanyikazi$Role == "Junior",]
head(df1, 3)

##     Sid Gender Age       Department   Role Income Marital_Status  County
## 2 17041   Male  48 Research Analyst Junior   3387       Divorced   Wajir
## 3 16232   Male  35       Operations Junior   3170        Married Mombasa
## 5 13463 Female  43        Associate Junior   1651        Married Nairobi
##   Leave_Days Promotion
## 2          8       Yes
## 3          0        No
## 5          2       Yes

Select females who come from Nyeri county

df2 <- wafanyikazi[wafanyikazi$Gender == "Female" & 
                     wafanyikazi$County == "Nyeri",]
head(df2, 3)

##      Sid Gender Age Department   Role Income Marital_Status County Leave_Days
## 4  19576 Female  41    Finance Senior   5557        Married  Nyeri          8
## 34 18997 Female  32  Associate Senior   5340         Single  Nyeri         14
## 67 19891 Female  48 Operations Senior   9029        Married  Nyeri         14
##    Promotion
## 4         No
## 34        No
## 67        No

De Morgan’s laws:

!(X & Y) is the same as !X | !Y
!(X | Y) is the same as !X & !Y
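
Both laws can be checked directly in R. In this sketch, X and Y are made-up logical vectors that cover every TRUE/FALSE combination:

```r
# Two hypothetical logical vectors covering every TRUE/FALSE combination
X <- c(TRUE, TRUE, FALSE, FALSE)
Y <- c(TRUE, FALSE, TRUE, FALSE)

identical(!(X & Y), !X | !Y)
## [1] TRUE
identical(!(X | Y), !X & !Y)
## [1] TRUE
```

These identities are handy for simplifying negated row conditions, for example rewriting "not (female and from Nyeri)" as "not female, or not from Nyeri".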

4.5.8 Boolean algebra versus sets (logical and integer subsetting)

There are two equivalent ways to represent a selection:

- Logical subsetting (Boolean algebra)
- Integer subsetting (set operations)

Integer subsetting with set operations is more effective when:
- You want to find the first (or last) TRUE
- You have very few TRUEs and very many FALSEs; a set representation may be faster and require less storage.

which() allows you to convert a Boolean representation to an integer representation.

which(df$names %in% "John")

## [1] 1
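
Finding the first or last TRUE is one case where the integer representation is convenient. A minimal sketch, with a made-up logical vector flags:

```r
# Hypothetical logical vector with only a few TRUEs
flags <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

pos <- which(flags)   # integer (set) representation
pos
## [1] 2 5

pos[1]                # position of the first TRUE
## [1] 2
pos[length(pos)]      # position of the last TRUE
## [1] 5
```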

You can create a function that does the reverse i.e. converts an integer representation to a Boolean representation.

Do we really need to do this?

unwhich <- function(x, n) {
  out <- rep_len(FALSE, n)
  out[x] <- TRUE
  out
}
unwhich(which(df$names %in% "John"), length(df$names))

## [1]  TRUE FALSE FALSE FALSE FALSE

When we can just do this?

df$names %in% "John"

## [1]  TRUE FALSE FALSE FALSE FALSE

Relationship between Boolean and set operations.

Create two logical vectors (x1, y1) and their integer equivalents (x2, y2)

(x1 <- 1:10 %% 2 == 0)

##  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

(x2 <- which(x1))

## [1]  2  4  6  8 10

(y1 <- 1:10 %% 5 == 0)

##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

(y2 <- which(y1))

## [1]  5 10

X & Y <-> intersect(x, y)

x1 & y1

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

intersect(x2, y2)

## [1] 10

X | Y <-> union(x, y)

x1 | y1

##  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

union(x2, y2)

## [1]  2  4  6  8 10  5

X & !Y <-> setdiff(x, y)

x1 & !y1

##  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE

setdiff(x2, y2)

## [1] 2 4 6 8

xor(X, Y) <-> setdiff(union(x, y), intersect(x, y))

xor(x1, y1)

##  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE

setdiff(union(x2, y2), intersect(x2, y2))

## [1] 2 4 6 8 5

Exercises

How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?

# Read in the data
df <- rChambua::wafanyikazi
head(df, n = 3)

##     Sid Gender Age       Department   Role Income Marital_Status  County
## 1 10715   Male  31          Finance    Mid   5991         Single  Kisumu
## 2 17041   Male  48 Research Analyst Junior   3387       Divorced   Wajir
## 3 16232   Male  35       Operations Junior   3170        Married Mombasa
##   Leave_Days Promotion
## 1         11        No
## 2          8       Yes
## 3          0        No

# Permute columns
df1 <- df[, sample(names(df))]
head(df1, n=3)

##     Role Marital_Status Leave_Days Gender   Sid Promotion       Department
## 1    Mid         Single         11   Male 10715        No          Finance
## 2 Junior       Divorced          8   Male 17041       Yes Research Analyst
## 3 Junior        Married          0   Male 16232        No       Operations
##    County Age Income
## 1  Kisumu  31   5991
## 2   Wajir  48   3387
## 3 Mombasa  35   3170

# Permute rows and columns
df2 <- df[sample(nrow(df)), sample(names(df))]
head(df2, n=3)

##     Leave_Days   Role Marital_Status Age Income   Sid Promotion Gender  County
## 224         20 Junior        Married  33   7727 17704       Yes   Male Nairobi
## 405         18 Senior       Divorced  30   6425 17770        No   Male    Lamu
## 463         11 Junior       Divorced  29   3604 15549       Yes Female   Taita
##     Department
## 224    Finance
## 405       Data
## 463  Associate

How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?

# Generate a vector of the first and last row ids
first_last_ids <- c(1,nrow(df))
first_last_ids

## [1]   1 500

# Sample m (2) rows from the dataframe, excluding the first and last rows
original_ids <- 1:nrow(df)
other_ids <- sample(original_ids[!original_ids %in% first_last_ids] , 2)
other_ids

## [1] 289 488

# Combine the first, last and the rows in between
final_ids <- c(first_last_ids[1], other_ids, first_last_ids[2])
final_ids

## [1]   1 289 488 500

# Call the data, with only these specific rows
df3 <- df[final_ids,]
df3

##       Sid Gender Age Department   Role Income Marital_Status    County
## 1   10715   Male  31    Finance    Mid   5991         Single    Kisumu
## 289 14070   Male  24    Finance Junior   9680         Single Kirinyaga
## 488 19363 Female  38  Associate Junior   8378       Divorced     Nyeri
## 500 16114 Female  22    Finance Junior   2736       Divorced   Mombasa
##     Leave_Days Promotion
## 1           11        No
## 289          2       Yes
## 488          0        No
## 500         24       Yes
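
The approach above keeps the first and last rows and samples rows between them. If "contiguous" is read strictly as a block of m adjacent rows, one possible sketch is to sample only the starting row. The helper sample_contiguous() is hypothetical, and the built-in mtcars stands in for wafanyikazi so that the example runs without rChambua:

```r
# Hypothetical helper: sample m contiguous rows from a data frame
sample_contiguous <- function(data, m) {
  stopifnot(m >= 1, m <= nrow(data))
  # Pick a start row so that the block of m rows fits inside the data
  start <- sample(nrow(data) - m + 1, 1)
  data[start:(start + m - 1), , drop = FALSE]
}

set.seed(1)  # only to make the sketch reproducible
block <- sample_contiguous(mtcars, 4)
nrow(block)
## [1] 4
```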

How could you put the columns in a data frame in alphabetical order?

df4 <- df[,order(names(df))]
head(df4)

##   Age  County       Department Gender Income Leave_Days Marital_Status
## 1  31  Kisumu          Finance   Male   5991         11         Single
## 2  48   Wajir Research Analyst   Male   3387          8       Divorced
## 3  35 Mombasa       Operations   Male   3170          0        Married
## 4  41   Nyeri          Finance Female   5557          8        Married
## 5  43 Nairobi        Associate Female   1651          2        Married
## 6  30   Taita          Finance Female   6859          9         Single
##   Promotion   Role   Sid
## 1        No    Mid 10715
## 2       Yes Junior 17041
## 3        No Junior 16232
## 4        No Senior 19576
## 5       Yes Junior 13463
## 6       Yes Junior 19788

Discussion

...
