This document provides a cheat sheet for regular expressions in R. It lists common patterns that match different types of characters, including digits, letters, whitespace, and punctuation. It also describes functions in base R and the stringr package for detecting, locating, extracting, replacing, and splitting on matches, including grep, grepl, regexpr, gsub, and their stringr equivalents. The document concludes with explanations of metacharacters and quantifiers for specifying repetitions, as well as lookarounds and greedy versus lazy matching.
Cheat Sheet

Example data used in the outputs shown here:

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

Character classes

[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:graph:]] plus space
[[:cntrl:]]           Control characters; \n, \r etc.

Detecting matches

grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

Locating matches

regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

Extracting matches

regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam"  [[2]] character(0)  [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

Replacing matches

sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

Splitting

strsplit(string, pattern) or stringr::str_split(string, pattern)
    split each string at the matches
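The base R functions listed above can be tried on the cheat sheet's own sample vector and pattern. The following is a minimal sketch using only base R; the variable names and the "___" replacement text are chosen here for illustration and are not part of the original sheet.

```r
# Sample data taken from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"   # a "t", any single character, then an "m"

# Detect: which elements contain a match?
hit_index <- grep(pattern, string)     # 1 3
hit_flag  <- grepl(pattern, string)    # TRUE FALSE TRUE

# Locate: start and length of the first match per element (-1 = no match)
first_pos   <- regexpr(pattern, string)
first_start <- as.integer(first_pos)            # 10 -1 1
first_len   <- attr(first_pos, "match.length")  # 3 -1 3

# Extract: first match only (non-matching elements are dropped),
# or all matches as a list with one entry per input element
first_match <- regmatches(string, regexpr(pattern, string))   # "tam" "tim"
all_matches <- regmatches(string, gregexpr(pattern, string))

# Replace: first match versus every match
first_replaced <- sub(pattern, "___", string)
all_replaced   <- gsub(pattern, "___", string)

# Split the first element at each match
pieces <- strsplit(string[1], pattern)[[1]]     # "Hiphopopo" "us"

# Character class: drop everything that is not a digit
digits_only <- gsub("[^[:digit:]]", "", "Tel: 555-0199")   # "5550199"
```

Note that regmatches() with regexpr() silently drops elements without a match, whereas the gregexpr() variant keeps a character(0) placeholder, so only the latter stays aligned with the input vector.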
Anchors

^      Start of the string
$      End of the string
\\b    Empty string at either edge of a word
\\B    NOT the edge of a word
\\<    Beginning of a word
\\>    End of a word

Special characters

\n     New line
\r     Carriage return
\t     Tab
\v     Vertical tab
\f     Form feed

Metacharacters

.      Any character except \n
|      Or, e.g. (a|b)
[]     List permitted characters, e.g. [abc]
[a-z]  Specify character ranges
[^]    List excluded characters
()     Grouping, enables back referencing using \\N where N is an integer

Quantifiers

*      Matches at least 0 times
+      Matches at least 1 time
?      Matches at most 1 time; optional string
{n}    Matches exactly n times
{n,}   Matches at least n times
{,n}   Matches at most n times
{n,m}  Matches between n and m times
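To make the anchors, metacharacters, and quantifiers above concrete, here is a small base R sketch. The test vectors (words, reps, and so on) are made up for illustration and do not come from the original sheet.

```r
# Illustrative inputs (not from the cheat sheet)
words <- c("cat", "concat", "catalog", "a cat sat")

# Anchors
starts_with_cat <- grepl("^cat", words)         # TRUE FALSE TRUE FALSE
ends_with_cat   <- grepl("cat$", words)         # TRUE TRUE FALSE FALSE
whole_word_cat  <- grepl("\\bcat\\b", words)    # TRUE FALSE FALSE TRUE

# Quantifiers applied to the letter "b" between "a" and "c"
reps <- c("ac", "abc", "abbc", "abbbc")
zero_or_more <- grepl("^ab*c$", reps)       # TRUE TRUE TRUE TRUE
one_or_more  <- grepl("^ab+c$", reps)       # FALSE TRUE TRUE TRUE
at_most_one  <- grepl("^ab?c$", reps)       # TRUE TRUE FALSE FALSE
exactly_two  <- grepl("^ab{2}c$", reps)     # FALSE FALSE TRUE FALSE
two_or_more  <- grepl("^ab{2,}c$", reps)    # FALSE FALSE TRUE TRUE
one_to_two   <- grepl("^ab{1,2}c$", reps)   # FALSE TRUE TRUE FALSE

# Alternation and character lists
a_or_b     <- grepl("^(a|b)$", c("a", "b", "c"))    # TRUE TRUE FALSE
not_vowels <- gsub("[aeiou]", "", "regular")        # "rglr"

# Grouping with a back reference: \\1 repeats whatever group 1 matched
doubled <- grepl("(.)\\1", c("book", "bok"))        # TRUE FALSE
swapped <- sub("(a)(b)", "\\2\\1", "abc")           # "bac"
```

Anchoring the quantifier patterns with ^ and $ matters here: without the anchors, grepl() would report a match as soon as any substring satisfies the pattern.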
Lookarounds and conditionals*

(?=)              Lookahead (requires perl = TRUE), e.g. (?=yx): position followed by 'yx'
(?!)              Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)             Lookbehind (perl = TRUE), e.g. (?<=yx): position following 'yx'
(?<!)             Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)       If-then condition (perl = TRUE); use lookaheads, optional characters etc. in the if-clause
(?(if)then|else)  If-then-else condition (perl = TRUE)

*see, e.g. http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/conditional.html

General modes

By default R uses POSIX extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or by wrapping patterns with perl() for stringr (older stringr versions; current releases use ICU regular expressions via regex()).

All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.

All base functions can be made case insensitive by specifying ignore.case = TRUE.

Escaping characters

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

Case conversions

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Greedy matching

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.

Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Note

Regular expressions can conveniently be created using rex::rex().
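The lookaround, escaping, case, and greedy/lazy notes above can be illustrated with a short base R sketch. Note that the argument names are lower case in R (perl, fixed, ignore.case). The input strings are invented for this example, and only base R functions are used.

```r
# Illustrative inputs (not from the cheat sheet)
price <- "cost $45 or 30 EUR"

# Lookbehind / lookahead: match digits by their context without consuming it
after_dollar <- regmatches(price, regexpr("(?<=\\$)\\d+", price, perl = TRUE))   # "45"
before_eur   <- regmatches(price, regexpr("\\d+(?= EUR)", price, perl = TRUE))   # "30"

# Negative lookahead: "foo" NOT followed by "bar"
foo_not_bar <- grepl("^foo(?!bar)", c("foobar", "foobaz"), perl = TRUE)  # FALSE TRUE

# Greedy versus lazy matching
html   <- "<b>bold</b>"
greedy <- regmatches(html, regexpr("<.*>", html, perl = TRUE))       # "<b>bold</b>"
lazy   <- regmatches(html, regexpr("<.*?>", html, perl = TRUE))      # "<b>"
lazy_u <- regmatches(html, regexpr("(?U)<.*>", html, perl = TRUE))   # "<b>"

# Literal matching: three ways to treat "." or "+" as plain characters
dot_regex   <- grepl(".", "abc")                        # TRUE, "." is a wildcard
dot_fixed   <- grepl(".", "abc", fixed = TRUE)          # FALSE, no literal dot
dot_escaped <- grepl("\\.", "a.c")                      # TRUE
quoted      <- grepl("\\Q1+1\\E", "1+1=2", perl = TRUE) # TRUE

# Case-insensitive matching
ci_arg  <- grepl("HELLO", "hello", ignore.case = TRUE)  # TRUE
ci_flag <- grepl("(?i)HELLO", "hello", perl = TRUE)     # TRUE

# Case conversion in the replacement (perl = TRUE only)
title_case <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)  # "Hello World"
```

Passing perl = TRUE throughout keeps the example on a single regex engine (PCRE), which is the one the lookaround and \\U features depend on.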