Regular Expressions Basics


What and Why


Common search versus pattern search

Common replace versus pattern replace


Regular expressions (also referred to as regexp or regex) let you search, match, extract and replace strings of text

Flexible patterns: from simple patterns to advanced expressions

Available in most languages, scripts, editing tools, shells


Originated in the UNIX editor ed and in grep

Different RegExp engines exist and are not fully compatible

Perl's is the most popular
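The contrast between common and pattern search/replace can be sketched in JavaScript as follows. The sample text and variable names are hypothetical, chosen only for illustration; the date pattern uses digit classes, repetition and groups, which are covered on later slides.

```javascript
// Hypothetical sample text, used only for illustration.
const orderText = "Order 66 shipped on 2024-03-15, order 67 on 2024-04-02.";

// Common search: looks for one fixed substring.
const plainIndex = orderText.indexOf("2024-03-15"); // 20

// Pattern search: finds anything shaped like NNNN-NN-NN.
const dates = orderText.match(/\d{4}-\d{2}-\d{2}/g); // ["2024-03-15", "2024-04-02"]

// Common replace: swaps one fixed substring.
const plainReplaced = orderText.replace("Order 66", "Order #66");

// Pattern replace: rewrites every date as DD/MM/YYYY using captured groups.
const patternReplaced = orderText.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");

console.log(plainIndex, dates, plainReplaced, patternReplaced);
```

The pattern versions handle every date in the text without knowing any of them in advance, which a fixed substring cannot do.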

Basic Syntax

Regular expressions are themselves strings: "1 word"

Regular expressions are case-sensitive unless specified otherwise

JavaScript also accepts non-string (literal) expressions: /^[a-z]+$/gi

Expressions contain symbols: \n\b([{}])+?*

Symbols must be escaped when used as plain characters: \{


Symbols can be grouped into types: position, literal, character classes, repetition, alternation and grouping, backreferences, pattern switches
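A minimal sketch of the syntax points above, assuming JavaScript. The variable names are illustrative; the g flag from the slide's literal is left out here because it makes test() stateful.

```javascript
// The same expression written as a literal and built from a string.
const fromLiteral = /^[a-z]+$/i;
const fromString = new RegExp("^[a-z]+$", "i");
const sameSource = fromLiteral.source === fromString.source; // true

// A brace used as a plain character must be escaped.
// Inside a string, the backslash itself must be doubled.
const braceLiteral = /\{/;
const braceString = new RegExp("\\{");
const findsBrace = braceLiteral.test("a{b") && braceString.test("a{b"); // true

// Case sensitivity is the default; the i switch relaxes it.
const strictMatch = /word/.test("Word");   // false
const relaxedMatch = /word/i.test("Word"); // true

console.log(sameSource, findsBrace, strictMatch, relaxedMatch);
```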

Position Symbols

^ Matches the beginning of a string
$ Matches the end of a string
\b Matches a word boundary
\B Matches a non-word boundary
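The position symbols can be tried out with a short JavaScript sketch. The sample string is hypothetical and was picked so that "cat" appears as a whole word, at the end of a longer word, and at the start of a longer word.

```javascript
// Hypothetical sample: "cat" alone, inside "concat", and at the start of "cats".
const positionSample = "cat concat cats";

const startsWithCat = /^cat/.test(positionSample); // true
const endsWithCat = /cat$/.test(positionSample);   // false, the string ends with "cats"

// \b on both sides: only the standalone word qualifies.
const wholeWords = positionSample.match(/\bcat\b/g); // ["cat"]

// \B before, \b after: "cat" must end a word but not start one ("concat").
const tailIndex = positionSample.search(/\Bcat\b/); // 7

console.log(startsWithCat, endsWithCat, wholeWords, tailIndex);
```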

Literal Symbols

alphanumeric All alphabetical and numeric characters match themselves (e.g. /2 apples/)
\n New line
\r Carriage return
\t Horizontal tab
\f Form feed
\v Vertical tab
\xdd Character with the two-digit hexadecimal code dd
\uXXXX Character with the four-digit hexadecimal Unicode code XXXX
\ Escapes special characters used as symbols: \$
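A few of the literal symbols in use, as a JavaScript sketch. The input strings are made up for illustration.

```javascript
// Plain alphanumerics (and the space) match themselves.
const hasPhrase = /2 apples/.test("I ate 2 apples"); // true

// Escape sequences for control characters.
const hasTab = /\t/.test("name\tvalue"); // true

// Hex and Unicode escapes name a character by its code.
const hexMatchesA = /\x41/.test("A");              // true, 0x41 is "A"
const unicodeMatches = /\u00e9/.test("caf\u00e9"); // true, U+00E9 is "é"

// A backslash turns a symbol such as $ back into a plain character.
const price = "Total: $5".match(/\$\d/)[0]; // "$5"

console.log(hasPhrase, hasTab, hexMatchesA, unicodeMatches, price);
```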

Character Classes

[] Delimit a character class
Accepts ranges with a dash: [a-f] [2-5]
Can be negated: [^abc]
. (any) Matches any character except line terminators
\w (word) Matches any word character, equivalent to [a-zA-Z0-9_]
\W (non-word) Matches any non-word character, equivalent to [^a-zA-Z0-9_]
\d Matches any digit, equivalent to [0-9]
\D Matches any non-digit, equivalent to [^0-9]
\s Matches any white space, equivalent to [ \t\r\n\v\f]
\S Matches any non-space, equivalent to [^ \t\r\n\v\f]
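The character classes above can be exercised with a sketch like the following, assuming JavaScript. All input strings are hypothetical. Note that in JavaScript \s also covers some Unicode white space beyond the ASCII set listed above.

```javascript
// Range class: runs of letters a through f.
const hexLetters = "beef 1234 zzz".match(/[a-f]+/g); // ["beef"]

// Negated class: everything that is not a vowel becomes "-".
const maskedConsonants = "banana".replace(/[^aeiou]/g, "-"); // "-a-a-a"

// \d picks out the digits only.
const digitsOnly = "tel: 555-0199".match(/\d/g).join(""); // "5550199"

// \w includes the underscore but not the hyphen.
const wordRuns = "snake_case, kebab-case".match(/\w+/g); // ["snake_case", "kebab", "case"]

// \s covers spaces, tabs and new lines alike.
const collapsed = "a \t b\n c".replace(/\s+/g, " "); // "a b c"

// The dot stops at a line terminator.
const firstLine = "ab\ncd".match(/.+/)[0]; // "ab"

console.log(hexLetters, maskedConsonants, digitsOnly, wordRuns, collapsed, firstLine);
```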

Repetition and Quantification


Repetition symbols follow other symbols or patterns
{x} Matches exactly x occurrences
{x,} Matches x or more occurrences
{x,y} Matches x to y (inclusive) occurrences
* Matches 0 or more occurrences, equivalent to {0,}
+ Matches 1 or more occurrences, equivalent to {1,}
? Matches 0 or 1 occurrences, equivalent to {0,1}

Examples: \w{3} \d{6,9} [a-z]+ [A-Z]* L?evi
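The example patterns can be checked with a short JavaScript sketch. The ^ and $ anchors are an addition made here so that each test applies to the whole string; the slide's patterns do not include them.

```javascript
// Anchored versions of the slide's examples (anchors added for illustration).
const exactlyThree = /^\w{3}$/;
const sixToNineDigits = /^\d{6,9}$/;
const optionalL = /^L?evi$/;

const threeOk = exactlyThree.test("abc");      // true
const fourFails = exactlyThree.test("abcd");   // false

const fiveDigits = sixToNineDigits.test("12345");     // false, too short
const sixDigits = sixToNineDigits.test("123456");     // true
const tenDigits = sixToNineDigits.test("1234567890"); // false, too long

const withL = optionalL.test("Levi");   // true
const withoutL = optionalL.test("evi"); // true

// * accepts zero occurrences, + requires at least one.
const starOnEmpty = /^[A-Z]*$/.test(""); // true
const plusOnEmpty = /^[a-z]+$/.test(""); // false

console.log(threeOk, fourFails, fiveDigits, sixDigits, tenDigits, withL, withoutL, starOnEmpty, plusOnEmpty);
```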

Alternation, Grouping and Backreferences


() are used to group characters together: (hubba\s)+

| is used as an OR operator to define an alternation: (ab)|(cd)|(de) (a|c)

\n, where n is 1 to 9, matches the text of a previous group, counted from the left: (\w+)\s+(\d+)\s+\2\1

Groups are also useful for extracting matched portions of a pattern
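Grouping, alternation, backreferences and extraction can be illustrated together in JavaScript. The input strings below are hypothetical.

```javascript
// Grouping: the quantifier applies to the whole group.
const repeatedGroup = /^(hubba\s)+$/.test("hubba hubba "); // true

// Alternation: only the group of the branch that matched is filled in.
const altResult = /(ab)|(cd)|(de)/.exec("xcdx");
const matchedBranch = altResult[2]; // "cd"
const unusedBranch = altResult[1];  // undefined

// Backreferences: \2 then \1 must repeat what groups 2 and 1 captured.
const swappedRepeat = /(\w+)\s+(\d+)\s+\2\1/;
const backrefOk = swappedRepeat.test("abc 123 123abc");   // true
const backrefFail = swappedRepeat.test("abc 123 456abc"); // false

// Extraction: index 0 is the whole match, 1 and 2 are the groups.
const extracted = "John 42".match(/(\w+)\s+(\d+)/);
const personName = extracted[1]; // "John"
const personAge = extracted[2];  // "42"

console.log(repeatedGroup, matchedBranch, unusedBranch, backrefOk, backrefFail, personName, personAge);
```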

Pattern Switches (Flags)


i : ignore case. Makes the expression case-insensitive
g : global search. Searches for all occurrences of the pattern, not just the first
m : multiline. Makes ^ and $ match the beginning and end of each line instead of the beginning and end of the string

Pattern switches are passed as a parameter when constructing a RegExp object, or appended to the end of a literal expression: new RegExp("JavaScript", "gi") /JavaScript/gi
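The effect of each switch can be seen in a small JavaScript sketch. The two-line sample text is made up for illustration.

```javascript
// Hypothetical two-line sample.
const flagSample = "JavaScript and javascript\nJAVASCRIPT again";

// i alone: case-insensitive, but only the first occurrence is returned.
const firstHit = flagSample.match(/javascript/i)[0]; // "JavaScript"

// g and i together: every occurrence, in any casing.
const allHits = flagSample.match(/javascript/gi); // 3 matches

// Without m, ^ matches only at the start of the string.
const stringStarts = flagSample.match(/^\w+/g); // ["JavaScript"]

// With m, ^ also matches at the start of each line.
const lineStarts = flagSample.match(/^\w+/gm); // ["JavaScript", "JAVASCRIPT"]

// The constructor takes the switches as its second argument.
const builtPattern = new RegExp("JavaScript", "gi");
const builtFlags = builtPattern.flags; // "gi"

console.log(firstHit, allHits, stringStarts, lineStarts, builtFlags);
```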

JavaScript Usage
String methods:
match( RegExp ): Array
replace( RegExp, String ): String
split( RegExp ): Array
search( RegExp ): Number

RegExp methods:
test( String ): Boolean
exec( String ): Array
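Each of the methods listed above is called once in the following sketch, on a hypothetical input line. One detail the signatures do not show: match and exec return null when nothing matches.

```javascript
// Hypothetical input line.
const fruitLine = "2 apples, 10 pears";

// String methods
const numbers = fruitLine.match(/\d+/g);          // ["2", "10"]
const vague = fruitLine.replace(/\d+/g, "some");  // "some apples, some pears"
const items = fruitLine.split(/,\s*/);            // ["2 apples", "10 pears"]
const pearsAt = fruitLine.search(/pears/);        // 13

// RegExp methods
const hasApples = /apples/.test(fruitLine);        // true
const firstItem = /(\d+) (\w+)/.exec(fruitLine);   // ["2 apples", "2", "apples"]
const noKiwi = /kiwi/.exec(fruitLine);             // null, no match

console.log(numbers, vague, items, pearsAt, hasApples, firstItem, noKiwi);
```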

Examples and Practice
