Advanced R

Chapter 4: Subsetting

Alan Kinene @alankinene

&

Shel Kariuki @Shel_Kariuki

2020/08/25

Outline

  • Section 4.1: Introduction

  • Section 4.2: Selecting multiple elements

  • Section 4.3: Selecting a single element

  • Section 4.4: Subsetting and assignment

  • Section 4.5: Applications (Using subsetting to solve problems)

Introduction

  • Interrelated concepts to internalise:

    • There are 3 subsetting operators: [[, [, and $

    • Subsetting operators interact differently with various vector types (e.g. atomic vectors, lists, factors, matrices, and data frames)

    • Subsetting and assignment can be combined ("subassignment")

Subsetting is a natural complement to str(): str() shows you all the pieces of an object, while subsetting lets you pull out only the pieces you are interested in.

It is often useful to inspect an object in the RStudio Viewer, with View(my_object), to find out which pieces you want to subset.

4.2: Selecting multiple elements

Subsetting atomic vectors

Use [ to select any number of elements from a vector.

Assume we have a simple vector: x <- c(2.1, 4.2, 3.3, 5.4)

  • Positive integers return elements at the specified positions:
x[c(3, 1)]
## [1] 3.3 2.1
  • Negative integers exclude elements at the specified positions:
x[-c(3, 1)]
## [1] 4.2 5.4
  • Logical vectors select elements where the corresponding logical value is TRUE
x[c(TRUE, TRUE, FALSE, FALSE)]
## [1] 2.1 4.2
x[x > 3]
## [1] 4.2 3.3 5.4

x[c(TRUE, FALSE)] is equivalent to x[c(TRUE, FALSE, TRUE, FALSE)], because the shorter logical vector is recycled to the length of x.

  • Nothing returns the original vector.
x[]
## [1] 2.1 4.2 3.3 5.4
  • Zero returns a zero-length vector.
x[0]
## numeric(0)
  • If the vector is named, character vectors return elements with matching names:
(y <- setNames(x, letters[1:4]))
## a b c d
## 2.1 4.2 3.3 5.4
y[c("d", "c", "a")]
## d c a
## 5.4 3.3 2.1

Subsetting lists

  • Subsetting a list works in the same way as subsetting an atomic vector.
  • Using [ always returns a list

  • [[ and $, as described in Section 4.3, let you pull out elements of a list.
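
A minimal sketch of the difference between [ and [[ on a list. The list l and its element names are hypothetical and not part of the original slides:

```r
# Hypothetical example list (not from the slides)
l <- list(a = 1:3, b = "text", c = list(TRUE, FALSE))

sub <- l[c("a", "b")]   # `[` returns a list with the selected elements
one <- l["a"]           # still a list, of length 1
elt <- l[["a"]]         # `[[` extracts the element itself (an integer vector)

str(sub)
str(one)
str(elt)
```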

Subsetting matrices and arrays

The most common way of subsetting matrices (2D) and arrays (>2D) is a simple generalisation of 1D subsetting

Subset with multiple vectors.

a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")
a[1:2, ]
## A B C
## [1,] 1 4 7
## [2,] 2 5 8
a[c(TRUE, FALSE, TRUE),
c("B", "A")]
## B A
## [1,] 4 1
## [2,] 6 3

Consider the matrix below:

(vals = matrix(1:25, ncol = 5, byrow = TRUE))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
## [5,] 21 22 23 24 25

Subset with a single vector

vals[c(4, 15)]
## [1] 16 23

Subset with a matrix

select <- matrix(ncol = 2, byrow = TRUE, c(
1, 1,
3, 1,
2, 4
))
vals[select]
## [1] 1 11 9

Subsetting data frames and tibbles

  • Data frames have the characteristics of both lists and matrices.

  • When subsetting with a single index, they behave like lists and index the columns, so df[1:2] selects the first two columns.

  • When subsetting with two indices, they behave like matrices, so df[1:3, ] selects the first three rows (and all the columns).

Given df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3]) what is the output for:
df[df$x == 2, ],
df[c("x", "z")],
df[, c("x", "z")],
str(df["x"]), and
str(df[, "x"])?
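
One way to check your answers is to run the calls directly. The sketch below only uses the df defined in the question; the comments summarise what each call is expected to return:

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

df[df$x == 2, ]      # rows where x equals 2, all columns
df[c("x", "z")]      # list-style subsetting: columns x and z
df[, c("x", "z")]    # matrix-style subsetting: the same two columns
str(df["x"])         # list-style keeps a one-column data frame
str(df[, "x"])       # matrix-style simplifies to an integer vector
```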

Preserving dimensionality

By default, subsetting a matrix or data frame with a single number, a single name, or a logical vector containing a single TRUE, will simplify the returned output, i.e. it will return an object with lower dimensionality.

To preserve the original dimensionality, you must use drop = FALSE

For matrices and arrays, any dimensions with length 1 will be dropped:

a <- matrix(1:4, nrow = 2)
str(a[1, ])
## int [1:2] 1 3
str(a[1, , drop = FALSE])
## int [1, 1:2] 1 3

Data frames with a single column will return just that column:

df <- data.frame(a = 1:2, b = 1:2)
str(df[, "a"])
## int [1:2] 1 2
str(df[, "a", drop = FALSE])
## 'data.frame': 2 obs. of 1 variable:
## $ a: int 1 2

4.3: Selecting a single element

There are two other subsetting operators: [[ and $.
[[ is used for extracting single items, while x$y is a useful shorthand for x[["y"]]

Use of [[

  • [[ is most important when working with lists, because subsetting a list with [ always returns a smaller list.

    If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6. (@RLangTip)

x <- list(1:3, "a", 4:6)
  • If you use a vector with [[, it will subset recursively, i.e. x[[c(1, 2)]] is equivalent to x[[1]][[2]].
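
The train metaphor and the recursive behaviour can be sketched with the list x defined above (the comments describe what each call returns):

```r
x <- list(1:3, "a", 4:6)

x[[1]]          # the contents of the first "car": the integer vector 1:3
x[1]            # a "train" with one car: a list of length 1
x[[c(1, 2)]]    # recursive subsetting: same as x[[1]][[2]]
```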

Use of $

  • $ is a shorthand operator: x$y is roughly equivalent to x[["y"]].
  • Often used to access variables in a data frame, as in mtcars$cyl or diamonds$carat.
  • One common mistake with $ is to use it when you have the name of a column stored in a variable:

    If var <- "cyl", mtcars$var doesn't work because it is translated to mtcars[["var"]]. Instead use mtcars[[var]]
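
A short sketch of this mistake, using the built-in mtcars data set:

```r
var <- "cyl"

is.null(mtcars$var)   # `$` looks for a column literally named "var", so this is NULL
head(mtcars[[var]])   # `[[` evaluates var first, so this returns the cyl column
```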

  • The one important difference between $ and [[ is that $ does (left-to-right) partial matching.
x <- list(abc = 1)
x$a
## [1] 1
x[["a"]]
## NULL
  • You can ask R to warn about this behaviour (the partially matched value is still returned):
options(warnPartialMatchDollar = TRUE)
x$a
## [1] 1

Remember: For data frames, you can also avoid this problem by using tibbles, which never do partial matching.

Using @ and slot()

  • Two additional subsetting operators, which are needed for S4 objects:
    1. @ (equivalent to $)
    2. slot() (equivalent to [[).

@ is more restrictive than $ in that it will return an error if the slot does not exist.
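
A minimal sketch of both operators. The S4 class Person and its slot name are hypothetical and only serve as an illustration:

```r
library(methods)

# Hypothetical S4 class for illustration only
setClass("Person", representation(name = "character"))
p <- new("Person", name = "Shel")

p@name            # slot access with @
slot(p, "name")   # the same slot via slot()

# Accessing a slot that does not exist raises an error instead of returning NULL
res <- tryCatch(p@age, error = function(e) "error")
res
```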

4.4: Subsetting and assignment

Subassignment: Combining subsetting operators with assignments to modify selected values in an input vector.

The basic form is x[i] <- value

Ensure that:

  • length(value) == length(x[i])
# wafanyikazi$new_var <- 1:10000
# Error in `$<-.data.frame`(`*tmp*`, new_var, value = 1:10000) :
#   replacement has 10000 rows, data has 500
  • i is unique
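
A small sketch of why both conditions matter. The vectors x and y here are hypothetical examples, not from the slides:

```r
# Matching lengths: two positions, two replacement values
x <- 1:5
x[c(1, 2)] <- c(101L, 102L)
x

# Duplicated index: both values are written to position 1, and the last one wins
y <- 1:5
y[c(1, 1)] <- c(10L, 20L)
y
```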

To remove a component, use x[[i]] <- NULL

departments <- list("data", "operations", "finance")
departments
## [[1]]
## [1] "data"
##
## [[2]]
## [1] "operations"
##
## [[3]]
## [1] "finance"
departments[[3]] <- NULL
departments
## [[1]]
## [1] "data"
##
## [[2]]
## [1] "operations"

To add a literal NULL, use x[i] <- list(NULL)

departments[3] <- list(NULL)
departments
## [[1]]
## [1] "data"
##
## [[2]]
## [1] "operations"
##
## [[3]]
## NULL

4.5: Applications (Using subsetting to solve problems)

  • Lookup tables (character subsetting)

  • Matching and merging by hand (integer subsetting)

  • Random samples and bootstraps (integer subsetting)

  • Ordering (integer subsetting)

  • Expanding aggregated counts (integer subsetting)

  • Removing columns from data frames (character subsetting)

  • Selecting rows based on a condition (logical subsetting)

  • Boolean algebra versus sets (logical and integer subsetting)

4.5.1 Lookup tables (character subsetting)

Character matching

x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)
## Is this the same as looking up each element of x in the vector lookup?
## Is it also the same as using ifelse()?
lookup[x]
## m f u f f m m
## "Male" "Female" NA "Female" "Female" "Male" "Male"

We can exclude names in the results using:

unname(lookup[x])
## [1] "Male" "Female" NA "Female" "Female" "Male" "Male"

4.5.2 Matching and merging by hand (integer subsetting)

grades <- c(1, 2, 2, 3, 1)
info <- data.frame(
grade = 3:1,
desc = c("Excellent", "Good", "Poor"),
fail = c(FALSE, FALSE, TRUE)
)
head(info)
## grade desc fail
## 1 3 Excellent FALSE
## 2 2 Good FALSE
## 3 1 Poor TRUE

Suppose we want to duplicate rows of the info table so that we have one row for each value in grades.

match(needles, haystack) returns, for each needle, the position of its first match in the haystack.

What is the position of each needle [the elements of grades: (1, 2, 2, 3, 1)] in the haystack [info$grade: (3, 2, 1)]?

id <- match(grades, info$grade)
id
## [1] 3 2 2 1 3
info[id, ]
## grade desc fail
## 3 1 Poor TRUE
## 2 2 Good FALSE
## 2.1 2 Good FALSE
## 1 3 Excellent FALSE
## 3.1 1 Poor TRUE

When matching on multiple columns, you will need to first collapse them into a single column (e.g. with interaction()).
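
A minimal sketch of matching on two columns, assuming hypothetical data frames scores and info2 with key columns subject and grade (none of these names come from the slides):

```r
# Hypothetical data: look up a description by subject AND grade
scores <- data.frame(
  subject = c("math", "art", "math"),
  grade   = c(1, 2, 2)
)
info2 <- data.frame(
  subject = c("math", "math", "art", "art"),
  grade   = c(1, 2, 1, 2),
  desc    = c("Poor maths", "Good maths", "Poor art", "Good art")
)

# Collapse the two key columns into a single key on each side
needles  <- interaction(scores$subject, scores$grade)
haystack <- interaction(info2$subject, info2$grade)

# match() compares the factor labels, e.g. "math.1"
id <- match(needles, haystack)
info2[id, ]
```

The same idea works with paste() in place of interaction(); either way the key has to be built identically on both sides.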

In practice, the dplyr *_join() functions are usually more convenient for this kind of task.

4.5.3 Random samples and bootstraps (integer subsetting)

Integer indices can be used to randomly sample or bootstrap a vector or data frame.

Use sample(n) to generate a random permutation of 1:n, and then use the result to subset the values.

Simulate a data frame:

df = data.frame(names = c("John", "Teresa", "Shel", "Christine", "Brenda"),
gender = c("M", "F", "F", "F", "F"),
rshp = c("Father", "Mother", "Self", "Sister", "Sister"))
df
## names gender rshp
## 1 John M Father
## 2 Teresa F Mother
## 3 Shel F Self
## 4 Christine F Sister
## 5 Brenda F Sister

Reorder the dataframe randomly

df[sample(nrow(df)), ]
## names gender rshp
## 1 John M Father
## 5 Brenda F Sister
## 4 Christine F Sister
## 3 Shel F Self
## 2 Teresa F Mother

Select two random rows

df[sample(nrow(df), 2), ]
## names gender rshp
## 5 Brenda F Sister
## 3 Shel F Self

Select 7 bootstrap replicates

df[sample(nrow(df), 7, replace = TRUE), ]
## names gender rshp
## 5 Brenda F Sister
## 3 Shel F Self
## 5.1 Brenda F Sister
## 3.1 Shel F Self
## 3.2 Shel F Self
## 1 John M Father
## 1.1 John M Father

4.5.4 Ordering (integer subsetting)

order() takes a vector as its input and returns an integer vector describing how to order the subsetted vector

fam <- c("John", "Teresa", "Shel", "Christine", "Brenda")
order(fam) ## orders alphabetically (in ascending order by default)
## [1] 5 4 1 3 2
fam[order(fam)]
## [1] "Brenda" "Christine" "John" "Shel" "Teresa"
## We can also order the vector in descending order
fam[order(fam, decreasing = TRUE)]
## [1] "Teresa" "Shel" "John" "Christine" "Brenda"

NB: By default, any missing values will be put at the end of the vector; however, you can remove them with na.last = NA or put them at the front with na.last = FALSE.

us <- c("Me", "You", NA)
us[order(us)]                    # NA is placed last (the default)
us[order(us, na.last = FALSE)]   # NA is placed first
us[order(us, na.last = NA)]      # NA is removed

Using order() to order values in a variable, or variables themselves, in a dataframe

# Randomly reorder df
df2 <- df[sample(nrow(df)), 3:1]
df2
## rshp gender names
## 4 Sister F Christine
## 2 Mother F Teresa
## 3 Self F Shel
## 1 Father M John
## 5 Sister F Brenda
# Order by one variable
df[order(df$gender), ]
## names gender rshp
## 2 Teresa F Mother
## 3 Shel F Self
## 4 Christine F Sister
## 5 Brenda F Sister
## 1 John M Father
# Order the variables themselves
df[, order(names(df))]
## gender names rshp
## 1 M John Father
## 2 F Teresa Mother
## 3 F Shel Self
## 4 F Christine Sister
## 5 F Brenda Sister

You can sort vectors directly with sort(), and sort data frames with dplyr::arrange().

4.5.5 Expanding aggregated counts (integer subsetting)

df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
df
## x y n
## 1 2 9 3
## 2 4 11 5
## 3 1 6 1
rep(1:nrow(df), df$n)
## [1] 1 1 1 2 2 2 2 2 3
df[rep(1:nrow(df), df$n), ]
## x y n
## 1 2 9 3
## 1.1 2 9 3
## 1.2 2 9 3
## 2 4 11 5
## 2.1 4 11 5
## 2.2 4 11 5
## 2.3 4 11 5
## 2.4 4 11 5
## 3 1 6 1

4.5.6 Removing columns from data frames (character subsetting)

Method 1: Set individual columns to NULL

df = data.frame(names = c("John", "Teresa", "Shel", "Christine", "Brenda"),
gender = c("M", "F", "F", "F", "F"),
rshp = c("Father", "Mother", "Self", "Sister", "Sister"))
df
## names gender rshp
## 1 John M Father
## 2 Teresa F Mother
## 3 Shel F Self
## 4 Christine F Sister
## 5 Brenda F Sister
## create a copy of the dataframe
df2 <- df
## drop a column
df2$gender <- NULL
df2
## names rshp
## 1 John Father
## 2 Teresa Mother
## 3 Shel Self
## 4 Christine Sister
## 5 Brenda Sister

Method 2: Subset to return only the columns you want

df[c("names", "rshp")]
## names rshp
## 1 John Father
## 2 Teresa Mother
## 3 Shel Self
## 4 Christine Sister
## 5 Brenda Sister

Method 3: Use set operations to work out which columns to keep. This is useful when you only know the columns that you don't want.

to_keep <- setdiff(names(df), "gender")
to_keep
## [1] "names" "rshp"
df[to_keep]
## names rshp
## 1 John Father
## 2 Teresa Mother
## 3 Shel Self
## 4 Christine Sister
## 5 Brenda Sister

4.5.7 Selecting rows based on a condition (logical subsetting)

library(rChambua)
head(wafanyikazi, n=3)
## Sid Gender Age Department Role Income Marital_Status County
## 1 10715 Male 31 Finance Mid 5991 Single Kisumu
## 2 17041 Male 48 Research Analyst Junior 3387 Divorced Wajir
## 3 16232 Male 35 Operations Junior 3170 Married Mombasa
## Leave_Days Promotion
## 1 11 No
## 2 8 Yes
## 3 0 No

Select all juniors

df1 <- wafanyikazi[wafanyikazi$Role == "Junior",]
head(df1, 3)
## Sid Gender Age Department Role Income Marital_Status County
## 2 17041 Male 48 Research Analyst Junior 3387 Divorced Wajir
## 3 16232 Male 35 Operations Junior 3170 Married Mombasa
## 5 13463 Female 43 Associate Junior 1651 Married Nairobi
## Leave_Days Promotion
## 2 8 Yes
## 3 0 No
## 5 2 Yes

Select females who come from Nyeri county

df2 <- wafanyikazi[wafanyikazi$Gender == "Female" &
wafanyikazi$County == "Nyeri",]
head(df2, 3)
## Sid Gender Age Department Role Income Marital_Status County Leave_Days
## 4 19576 Female 41 Finance Senior 5557 Married Nyeri 8
## 34 18997 Female 32 Associate Senior 5340 Single Nyeri 14
## 67 19891 Female 48 Operations Senior 9029 Married Nyeri 14
## Promotion
## 4 No
## 34 No
## 67 No

De Morgan’s laws:

  • !(X & Y) is the same as !X | !Y

  • !(X | Y) is the same as !X & !Y
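
Both laws can be checked on logical vectors. The vectors X and Y below are hypothetical and chosen to cover all four TRUE/FALSE combinations:

```r
# Hypothetical logical vectors covering every combination of TRUE and FALSE
X <- c(TRUE, TRUE, FALSE, FALSE)
Y <- c(TRUE, FALSE, TRUE, FALSE)

identical(!(X & Y), !X | !Y)   # TRUE
identical(!(X | Y), !X & !Y)   # TRUE
```

These identities are handy for simplifying a negated compound condition before using it to select rows.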

4.5.8 Boolean algebra versus sets (logical and integer subsetting)

Two ways to represent a subset:

  • integer subsetting (set operations), which is effective when:

    • You want to find the first (or last) TRUE.

    • You have very few TRUEs and very many FALSEs; a set representation may be faster and require less storage.

  • logical subsetting (Boolean algebra)

which() allows you to convert a Boolean representation to an integer representation.

which(df$names %in% "John")
## [1] 1

You can create a function that does the reverse i.e. converts an integer representation to a Boolean representation.

Do we really need to do this?

unwhich <- function(x, n) {
out <- rep_len(FALSE, n)
out[x] <- TRUE
out
}
unwhich(which(df$names %in% "John"), length(df$names))
## [1] TRUE FALSE FALSE FALSE FALSE

When we can just do this?

df$names %in% "John"
## [1] TRUE FALSE FALSE FALSE FALSE

Relationship between Boolean and set operations.

Create two logical vectors (x1, y1) and their integer equivalents (x2, y2):

(x1 <- 1:10 %% 2 == 0)
## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
(x2 <- which(x1))
## [1] 2 4 6 8 10
(y1 <- 1:10 %% 5 == 0)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
(y2 <- which(y1))
## [1] 5 10

X & Y <-> intersect(x, y)

x1 & y1
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
intersect(x2, y2)
## [1] 10

X | Y <-> union(x, y)

x1 | y1
## [1] FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
union(x2, y2)
## [1] 2 4 6 8 10 5

X & !Y <-> setdiff(x, y)

x1 & !y1
## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
setdiff(x2, y2)
## [1] 2 4 6 8

xor(X, Y) <-> setdiff(union(x, y), intersect(x, y))

xor(x1, y1)
## [1] FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
setdiff(union(x2, y2), intersect(x2, y2))
## [1] 2 4 6 8 5

Exercises

  • How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?
# Read in the data
df <- rChambua::wafanyikazi
head(df, n=3)
## Sid Gender Age Department Role Income Marital_Status County
## 1 10715 Male 31 Finance Mid 5991 Single Kisumu
## 2 17041 Male 48 Research Analyst Junior 3387 Divorced Wajir
## 3 16232 Male 35 Operations Junior 3170 Married Mombasa
## Leave_Days Promotion
## 1 11 No
## 2 8 Yes
## 3 0 No
# Permute the columns
df1 <- df[,sample(names(df))]
head(df1, n=3)
## Role Marital_Status Leave_Days Gender Sid Promotion Department
## 1 Mid Single 11 Male 10715 No Finance
## 2 Junior Divorced 8 Male 17041 Yes Research Analyst
## 3 Junior Married 0 Male 16232 No Operations
## County Age Income
## 1 Kisumu 31 5991
## 2 Wajir 48 3387
## 3 Mombasa 35 3170
# Permute rows and columns in one step
df2 <- df[sample(nrow(df)),sample(names(df))]
head(df2, n=3)
## Leave_Days Role Marital_Status Age Income Sid Promotion Gender County
## 224 20 Junior Married 33 7727 17704 Yes Male Nairobi
## 405 18 Senior Divorced 30 6425 17770 No Male Lamu
## 463 11 Junior Divorced 29 3604 15549 Yes Female Taita
## Department
## 224 Finance
## 405 Data
## 463 Associate
  • How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?
# Generate a vector of the first and last row ids
first_last_ids <- c(1,nrow(df))
first_last_ids
## [1] 1 500
# Sample m (2) rows from the dataframe, excluding the first and last rows
original_ids <- 1:nrow(df)
other_ids <- sample(original_ids[!original_ids %in% first_last_ids] , 2)
other_ids
## [1] 289 488
# Combine the first, last and the rows in between
final_ids <- c(first_last_ids[1], other_ids, first_last_ids[2])
final_ids
## [1] 1 289 488 500
# Call the data, with only these specific rows
df3 <- df[final_ids,]
df3
## Sid Gender Age Department Role Income Marital_Status County
## 1 10715 Male 31 Finance Mid 5991 Single Kisumu
## 289 14070 Male 24 Finance Junior 9680 Single Kirinyaga
## 488 19363 Female 38 Associate Junior 8378 Divorced Nyeri
## 500 16114 Female 22 Finance Junior 2736 Divorced Mombasa
## Leave_Days Promotion
## 1 11 No
## 289 2 Yes
## 488 0 No
## 500 24 Yes
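
Another reading of "contiguous" is a block of m consecutive rows. A minimal sketch under that assumption is shown below; df_demo is a hypothetical stand-in data frame, and start, block and rand_rows are illustrative names:

```r
# Stand-in data frame (hypothetical); any data frame with at least m rows works
df_demo <- data.frame(id = 1:20, value = (1:20)^2)
m <- 5

# Random sample of m rows, not necessarily contiguous
rand_rows <- df_demo[sample(nrow(df_demo), m), ]

# Contiguous block: pick a random start row that leaves room for m rows
start <- sample(nrow(df_demo) - m + 1, 1)
block <- df_demo[start:(start + m - 1), , drop = FALSE]
block
```

Restricting the start row to 1:(nrow - m + 1) keeps the block from running past the last row.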
  • How could you put the columns in a data frame in alphabetical order?
df4 <- df[,order(names(df))]
head(df4)
## Age County Department Gender Income Leave_Days Marital_Status
## 1 31 Kisumu Finance Male 5991 11 Single
## 2 48 Wajir Research Analyst Male 3387 8 Divorced
## 3 35 Mombasa Operations Male 3170 0 Married
## 4 41 Nyeri Finance Female 5557 8 Married
## 5 43 Nairobi Associate Female 1651 2 Married
## 6 30 Taita Finance Female 6859 9 Single
## Promotion Role Sid
## 1 No Mid 10715
## 2 Yes Junior 17041
## 3 No Junior 16232
## 4 No Senior 19576
## 5 Yes Junior 13463
## 6 Yes Junior 19788

Discussion

...

...
