R for Data Science Book Club

str_detect() related

  • str_subset(): returns a character vector containing only the strings that match the pattern.
  • str_which(): returns an integer vector giving the positions of the strings that match the pattern.
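A minimal sketch comparing the three functions on the same input. The vector `fruits` and the pattern `"a"` are made up for illustration. `str_detect()` returns one logical per string. `str_subset()` keeps the strings where that logical is `TRUE`, and `str_which()` returns their positions. Because this example has no missing values, the two helpers give the same results as indexing with `str_detect()` and applying `which()` to it.

```r
library(stringr)

# Hypothetical example vector (no missing values)
fruits <- c("apple", "banana", "cherry", "pear")

# One TRUE/FALSE per string: does it contain an "a"?
detected <- str_detect(fruits, "a")   # TRUE TRUE FALSE TRUE

# Only the matching strings
matches <- str_subset(fruits, "a")    # "apple" "banana" "pear"

# Positions of the matching strings
positions <- str_which(fruits, "a")   # 1 2 4

# Same results via str_detect() (holds here because there are no NAs)
identical(matches, fruits[detected])
identical(positions, which(detected))
```

Use `str_subset()` when you want the matching values themselves, and `str_which()` when you need their indices, for example to modify those elements in place.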