20.5 Reading worksheets

An important feature that distinguishes spreadsheets from flat files is the notion of multiple sheets.

You can read a single worksheet from a spreadsheet with the sheet argument in read_excel(). The default, which we’ve been relying on up until now, is the first sheet.

read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
## # A tibble: 52 × 8
##    species island    bill_length_mm  bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>     <chr>           <chr>         <chr>             <chr>      
##  1 Adelie  Torgersen 39.1            18.7          181               3750       
##  2 Adelie  Torgersen 39.5            17.399999999… 186               3800       
##  3 Adelie  Torgersen 40.29999999999… 18            195               3250       
##  4 Adelie  Torgersen NA              NA            NA                NA         
##  5 Adelie  Torgersen 36.70000000000… 19.3          193               3450       
##  6 Adelie  Torgersen 39.29999999999… 20.6          190               3650       
##  7 Adelie  Torgersen 38.9            17.8          181               3625       
##  8 Adelie  Torgersen 39.20000000000… 19.600000000… 195               4675       
##  9 Adelie  Torgersen 34.1            18.100000000… 193               3475       
## 10 Adelie  Torgersen 42              20.2          190               4250       
## # ℹ 42 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>

Some variables that appear to contain numerical data are read in as characters due to the character string “NA” not being recognized as a true NA.

penguins_torgersen <- read_excel("data/penguins.xlsx", sheet = "Torgersen Island", na = "NA")

penguins_torgersen
## # A tibble: 52 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 42 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>

Alternatively, we can use excel_sheets() to get information on all worksheets in an Excel spreadsheet, and then read the one(s) you’re interested in.

excel_sheets("data/penguins.xlsx")
## [1] "Torgersen Island" "Biscoe Island"    "Dream Island"

Once you know the names of the worksheets, you can read them in individually with read_excel().

penguins_biscoe <- read_excel("data/penguins.xlsx", sheet = "Biscoe Island", na = "NA")

penguins_dream  <- read_excel("data/penguins.xlsx", sheet = "Dream Island", na = "NA")
dim(penguins_torgersen)
## [1] 52  8
dim(penguins_biscoe)
## [1] 168   8
dim(penguins_dream)
## [1] 124   8

We can put them together with bind_rows().

penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)

penguins
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
  • In Chapter 27 we’ll talk about ways of doing this sort of task without repetitive code.