Character classes

  • character class (or set) []: matches any one character from the set.
  • construct your own sets with []:
    • [abc] matches “a”, “b”, or “c”.
    • [^abc] matches any character except “a”, “b”, or “c”.
  • two other characters that have special meaning inside of []:
    • - defines a range, e.g., [a-z] matches any lowercase letter and [0-9] matches any digit.
    • \ escapes special characters, so [\^\-\]] matches ^, -, or ].
library(stringr)
x <- "abcd ABCD 12345 -!@#%."
str_view(x, "[abc]+")
## [1] │ <abc>d ABCD 12345 -!@#%.
str_view(x, "[a-z]+")
## [1] │ <abcd> ABCD 12345 -!@#%.
str_view(x, "[^a-z0-9]+")
## [1] │ abcd< ABCD >12345< -!@#%.>
# You need an escape to match characters that are otherwise
# special inside of []
str_view("a-b-c", "[a-c]")
## [1] │ <a>-<b>-<c>
str_view("a-b-c", "[a\\-c]")
## [1] │ <a><->b<-><c>
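
The example above escapes only the hyphen. As a minimal sketch of the full [\^\-\]] set from the list, the snippet below extracts each of the three escaped characters from a string. The sample string y and the variable names are made up for illustration, and the raw-string form assumes R 4.0.0 or later.

```r
library(stringr)

# Hypothetical sample string containing ^, -, and ]
y <- "a^b-c]d"

# Inside an R string every regex backslash is doubled,
# so the regex [\^\-\]] is written as "[\\^\\-\\]]"
escaped <- str_extract_all(y, "[\\^\\-\\]]")[[1]]
escaped
# "^" "-" "]"

# A raw string (assumes R >= 4.0.0) lets you write the regex without doubling
raw_version <- str_extract_all(y, r"([\^\-\]])")[[1]]
identical(escaped, raw_version)
```

Both forms hand the same regular expression to stringr; the raw string only changes how the pattern is typed in R source code.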