Regex in Your SPL: An Easy Introduction


Regex in Your SPL

An Easy Introduction

Michael Simko | Sr. Engineer, Instructor

September 2017 | Washington, DC


Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.

The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Basics of Regular Expressions
What is this Regex thing all about?

Regex in Splunk SPL
What’s in it for me?

1. Filtering. Eliminate unwanted data in your searches
2. Matching. Advanced pattern matching to find the results you need
3. Field Extraction on-the-fly


What Is Regex?
What People Say

“A regular expression is an object that describes a pattern of characters. Regular expressions are used to perform pattern-matching and ‘search-and-replace’ functions on text.”
– w3schools.com

“A regular expression is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”
– Regexbuddy.com (and others – Original source unknown)

“Regular expressions are an extremely powerful tool for manipulating text and data… If you don't use regular expressions yet, you will...”
– Mastering Regular Expressions, O’Reilly, Jeffrey E.F. Friedl
Regex Basics
The Main Elements

Control Characters:
  ^    Start of a line
  $    End of a line

Character Types:
  \s   White space
  \S   Not white space
  \d   Digit
  \D   Not a digit
  \w   Word character (letter, digit, or _)
  \W   Not a word character

Operators:
  *    Zero or more
  +    One or more
  ?    Zero or one

These elements work together to specify a pattern


Regex Basics
The Main Elements


Sample Regex: ^\d+\s\w+\s\d+\s\d+:\d+:\d+

  ^     anchors the Regex to the beginning of the line
  \d+   is one or more digits
  \s    without a + or * is a single whitespace character
  \w+   is one or more word characters
  :     is the literal colon character
Regex Basics
The Main Elements

Sample Regex: ^\d+\s\w+\s\d+\s\d+:\d+:\d+

Matching String: 22 Aug 2017 18:45:20 On this date, Michael made BBQ references
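
The sample pattern can be tried outside Splunk as well. Here is a minimal sketch using Python's `re` module; it assumes a `\s` between the month and the year, which the matching string needs, and the variable names are made up for illustration.

```python
import re

# Day, month name, year, then HH:MM:SS, anchored at the start of the line.
# Assumption: a \s separates \w+ (month) from \d+ (year).
pattern = re.compile(r"^\d+\s\w+\s\d+\s\d+:\d+:\d+")

line = "22 Aug 2017 18:45:20 On this date, Michael made BBQ references"

m = pattern.search(line)
matched_text = m.group(0) if m else None
print(matched_text)  # 22 Aug 2017 18:45:20
```

The pattern has no end anchor, so the rest of the line after the timestamp is simply left unmatched.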
Regex Basics
To Protect and Give Options
Special Characters:
  |    Alternative / “or” (the pipe character). Use it to give multiple options.

Protection Characters:
  \    The next character is a literal (the backslash, or “back-whack”). Use it to escape or protect special characters.

Protect periods, [], (), {}, etc. when you want to use the literal character.
Regex Basics
To Protect and Give Options

Regex: Indiana|Purdue
  Purdue 8w 3l .727 19w 5l .792
  Indiana 5w 4l .500 15w 8l .652

Regex: \d+\.\d+\.\d+\.\d+
  Login Failure From 192.168.12.145
  Login Success From 10.35.36.37
(we’ll do the above a different way later)
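
A rough way to see both ideas at work is a short Python sketch using the sample lines from this slide. The list and variable names are illustrative only.

```python
import re

standings = [
    "Purdue 8w 3l .727 19w 5l .792",
    "Indiana 5w 4l .500 15w 8l .652",
]
logins = [
    "Login Failure From 192.168.12.145",
    "Login Success From 10.35.36.37",
]

# | gives alternatives: either literal word may match.
teams = [re.search(r"Indiana|Purdue", s).group(0) for s in standings]

# \. protects the period so it matches a literal dot, not "any character".
ips = [re.search(r"\d+\.\d+\.\d+\.\d+", s).group(0) for s in logins]

print(teams)  # ['Purdue', 'Indiana']
print(ips)    # ['192.168.12.145', '10.35.36.37']
```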
Regex Basics
Only Some May Pass
Inclusion Characters:
  […]    Include. Example usage: [a-zA-Z0-9]
  [^…]   Exclude. Example usage: [^ ]


Regex Basics
Only Some May Pass

Regex: server:[a-z0-9]+
  Keep going so long as you hit characters that are lowercase a-z or 0-9

Regex: server:[^ ]+
  Go until you hit a space

Sample strings:
  server:253fsf2,host=23423
  server: 253fsf2,host=23423
  server:253f sf2,host=23423
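
To see where each pattern stops, here is a small Python sketch over the three sample strings. It uses the quantified form `[^ ]+` for the exclusion pattern so that it keeps going until a space; the helper `first_match` is a made-up name for illustration.

```python
import re

samples = [
    "server:253fsf2,host=23423",
    "server: 253fsf2,host=23423",
    "server:253f sf2,host=23423",
]

def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Include: keep going while characters are lowercase a-z or 0-9.
include = [first_match(r"server:[a-z0-9]+", s) for s in samples]

# Exclude: keep going until a space is hit.
exclude = [first_match(r"server:[^ ]+", s) for s in samples]

print(include)  # ['server:253fsf2', None, 'server:253f']
print(exclude)  # ['server:253fsf2,host=23423', None, 'server:253f']
```

The second string matches neither pattern because a space follows `server:` immediately, and `+` needs at least one allowed character.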
Regex Basics
Say What Again
Repetition:
  {#}     Number of repetitions
  {#,#}   Range of repetitions

Repetition is used to define the exact number of characters, or an upper and lower boundary of acceptable characters (or the exact number of repetitions of a pattern).
Regex Basics
Say What Again

Regex: IP: \d{3}\.\d{3}\.\d{3}\.\d{3}
  IP: 172.106.190.100
  IP: 10.24.255.2
  IP: 224.252.2.52

Regex: IP: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
  IP: 172.16.19.1
  IP: 10.24.255.2
  IP: 224.252.2.52
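
The difference between an exact count and a range shows up quickly when both patterns run over the sample lines. This is a sketch in Python, with illustrative variable names.

```python
import re

# {3} demands exactly three digits per octet; {1,3} allows one to three.
exact = re.compile(r"IP: \d{3}\.\d{3}\.\d{3}\.\d{3}")
ranged = re.compile(r"IP: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

lines = [
    "IP: 172.106.190.100",
    "IP: 172.16.19.1",
    "IP: 10.24.255.2",
    "IP: 224.252.2.52",
]

exact_hits = [s for s in lines if exact.fullmatch(s)]
ranged_hits = [s for s in lines if ranged.fullmatch(s)]

print(exact_hits)   # ['IP: 172.106.190.100']
print(ranged_hits)  # all four lines
```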
Regex Basics
To Protect and Give Options
Logical Groupings:
  ()   Wrap sets of the Regex

Use to specify repetition for adjacent elements in order to form patterns.
Later we’ll use these as “capture groups”.
Regex Basics
To Protect and Give Options

Revisiting the IP Matching from a couple of slides ago

Alternate Regex: IP: (\d{1,3}\.){3}\d{1,3}
  Repeats \d{1,3}\. three times, then tacks on the last \d{1,3}

  IP: 172.16.19.1
  IP: 10.24.255.2
  IP: 224.252.2.52
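
One way to convince yourself the grouped form is equivalent is to run both patterns side by side. In this Python sketch the last line, with only three octets, is a made-up negative example.

```python
import re

long_form = re.compile(r"IP: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
# The group (\d{1,3}\.) is repeated three times, then the last octet follows.
grouped = re.compile(r"IP: (\d{1,3}\.){3}\d{1,3}")

lines = [
    "IP: 172.16.19.1",
    "IP: 10.24.255.2",
    "IP: 224.252.2.52",
    "IP: 10.24.255",  # hypothetical line with only three octets
]

grouped_hits = [s for s in lines if grouped.fullmatch(s)]
agree = all(
    bool(long_form.fullmatch(s)) == bool(grouped.fullmatch(s)) for s in lines
)

print(grouped_hits)  # the first three lines
print(agree)         # True
```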
Regex Basics
The Last (Not so Basic) Element
Named Capture Groups:
  (?<CaptureGroupName>stuff)

This names the capture group (i.e., the logical grouping). Now when you return the capture, it has a name and not just “Capture Group 1”.
Regex Basics
The Last (Not so Basic) Element

Regex: user:\s(?<username>[^@]+)

  user:\s          Anchor off user:\s
  (?<username>…)   Capture as field username
  [^@]+            Go until we hit an @

Log 1: blah blah user: [email protected]
Log 2: more blah user: [email protected]
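
The same idea can be sketched in Python, with one syntax difference: Python's `re` module writes named groups as `(?P<name>...)`, while Splunk accepts `(?<name>...)`. The log lines and addresses below are hypothetical stand-ins.

```python
import re

# Anchor off "user:" plus one whitespace character, then capture up to the @.
pattern = re.compile(r"user:\s(?P<username>[^@]+)")

# Hypothetical log lines; the addresses are made up for illustration.
logs = [
    "blah blah user: alice@example.com",
    "more blah user: bob@example.org",
]

usernames = [pattern.search(line).group("username") for line in logs]
print(usernames)  # ['alice', 'bob']
```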
Regex in SPL
Using Regular Expressions to improve your SPL
Regex in Your SPL
Search Time Regex

▶ Field Extractions
• erex
• rex
• Interactive Field Extractor
• Props – Extract
• Transforms – Report
Fields are fundamental to Splunk Search

▶ Evaluation
• Regex
• match
• replace
Regex provides granularity when evaluating data

Field Extractions
On the fly (No need to work ahead)
erex Command
Field Extractions Using Examples

Use Splunk to generate regular expressions by providing a list of values from the data.

▶ Scenario: Extract the first word of each sample phrase from | windbag
• Step 1, find the samples
• Step 2, extract the field
erex Command
Field Extractions Using Examples

Erex Command: … | erex <newFieldName> examples="example1,example2"

| windbag | erex firstwords examples="Unë, ‫ﯾؤﻟﻣن‬, Կրնամ"

  | windbag     Easter egg that creates sample data
  firstwords    New field to create
  examples=     Examples from the data
erex Command
Field Extractions Using Examples

New field created: firstwords, holding the values erex generated based on the samples.
erex Command
Field Extractions Using Examples

▶ Erex is a great introduction to using regular expressions for field extraction.
• Erex provides the rex that it generated
• Going forward, use the rex in your saved searches and dashboards.
• Rex is more efficient
rex Command
Extract Fields Using Regular Expressions at Search Time

Creates a Field Extraction

… | rex field={what_field} "FrontAnchor(?<extraction>{characters}+)BackAnchor"


rex Command
Extract Fields Using Regular Expressions at Search Time

| windbag | rex field=sample "^(?<FirstWord>[\S+]*)"

  field=sample       Specify the field to rex from
  ^                  Front anchor
  (?<FirstWord>…)    Named field extraction
  [\S+]*             Grab any non-space character
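
For a feel of what this extraction does, here is a rough Python analogue. The sample phrases are hypothetical stand-ins for the windbag `sample` field, and Python spells the named group `(?P<FirstWord>...)`.

```python
import re

# Hypothetical sample phrases standing in for the windbag "sample" field.
samples = ["I can eat glass", "Unë mund"]

# \S+ is one or more non-space characters. Inside brackets, + is a literal
# plus sign, so the slide's [\S+]* behaves like \S* here.
first_word = re.compile(r"^(?P<FirstWord>\S+)")

words = [first_word.search(s).group("FirstWord") for s in samples]
print(words)  # ['I', 'Unë']
```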

rex Command
Use Rex to Perform SED Style Substitutions

SED is a stream editor. It can be used to create substitutions in data.

Splunk uses the rex command to perform Search-Time substitutions.


rex Command
Use Rex to Perform SED Style Substitutions

| windbag | search lang="*Norse"
| rex mode=sed "s/Old (Norse)/Not-so-old \1/g"

  mode=sed   Set the mode
  s          for substitute
  g          for global (more than once)
  ()         to create a capture group
  \1         to paste the capture group

Substitute the stuff between the first / and second / with the stuff between the second / and third /.
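
The substitution itself can be sketched with Python's `re.sub`, which replaces every occurrence by default, much like the `g` flag. The event text is hypothetical.

```python
import re

# Hypothetical event text containing the phrase the slide substitutes.
event = "lang=Old Norse sample=Old Norse text"

# (Norse) is capture group 1; \1 pastes it back into the replacement.
result = re.sub(r"Old (Norse)", r"Not-so-old \1", event)
print(result)  # lang=Not-so-old Norse sample=Not-so-old Norse text
```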

Evaluation
Using Regular Expressions for Pattern Matching
Regex Command
Filter Using Regular Expressions

sourcetype=fs_notification | regex chgs="^modtime"

  chgs         Field to evaluate
  "^modtime"   Regex
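
As a rough analogue of this kind of filtering, the Python sketch below keeps only the events whose `chgs` value matches the regex. The events and their `path` field are hypothetical.

```python
import re

# Hypothetical events; only the chgs field matters for this sketch.
events = [
    {"chgs": "modtime size", "path": "/tmp/a"},
    {"chgs": "size modtime", "path": "/tmp/b"},
    {"chgs": "owner", "path": "/tmp/c"},
]

# Keep only events whose chgs value starts with "modtime".
kept = [e for e in events if re.search(r"^modtime", e["chgs"])]
kept_paths = [e["path"] for e in kept]
print(kept_paths)  # ['/tmp/a']
```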


Match Function
Filter Using Regular Expressions

match(SUBJECT, "REGEX")
… | eval n = if(match(field, "^MyRegex"), 1, 2)

sourcetype=access_combined_wcookie
| eval com = if(match(referer,"http:.*\.com"),"True","False")

  match(…)          Returns true if the Regex matches, false if not
  referer           Field to evaluate
  "http:.*\.com"    The Regex
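
The if(match(…)) pattern maps neatly onto a conditional expression. A minimal Python sketch, using hypothetical referer values:

```python
import re

# Hypothetical referer values.
referers = [
    "http://www.example.com/cart",
    "https://example.org/",
    "http://shop.example.net",
]

# Mirrors if(match(referer, "http:.*\.com"), "True", "False").
com = ["True" if re.search(r"http:.*\.com", r) else "False" for r in referers]
print(com)  # ['True', 'False', 'False']
```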
Replace Command
Switch Data at Search Time

Replace field values with the values you specify


… | replace "<whoever>" WITH "<whomever>" IN <target_field>

| windbag | replace "Euro" with "Euro: How is a currency a language" in lang

  "Euro"                                  String to be replaced
  with                                    operator
  "Euro: How is a currency a language"    String to replace with
  in                                      operator
  lang                                    Field in which to make the replacement
Persistence
Regular Expressions That Exist Outside Your Search

Until this point, every one of our extractions has existed only in the search.
But what if we want them to persist, or to share them?

1. Interactive Field Extractor

2. Extractions in Props / Transforms


Persistent Field Extractions
Comparing The Persistent Field Extractions

Interactive Field Extractor
– Walk-through UI
– You may want to rewrite the generated Regex
– Does not require admin rights

Extract in Props
– Straight editing in props.conf
– Requires Admin Rights (or an admin to put in place)

Report in Transforms
– Edit directly in transforms.conf
– Invoked by props.conf
– Requires Admin Rights (or an admin to put in place)
Q&A
Michael Simko | Sr. Engineer/Instructor

Key Takeaways
Regex in your SPL

1. Use Regex to create powerful filters in your SPL
2. Use Regex to create field extractions
3. Regex doesn’t have to be hard. You can do this!

Thank You
Don't forget to rate this session in the
.conf2017 mobile app
Appendix A
Caveats
rex Command – Caveat
Use Rex to Perform SED Style Substitutions

| windbag | search lang="*Norse"
| rex mode=sed "s/Old (Norse)/Not-so-old \1/g"

Caveat: The substitution from rex comes after the lang field is extracted. So even though the event data is showing us the substitution, the field lang is showing the original value.
Appendix B
Exercises to Practice With
Regex Basics
The Main Elements


Scenario Regex: ^\d+\s\w+\s\d+\s\d+:\d+:\d+

Learn by Fire: Which of these will the sample Regex match?
A. 002421 Februari 1083 1:242525:22352
B. 07 Feb 17 12:53:36AM
C. Feb 13 2017 18:46:56
D. 14 February 2017 07:45:47Z
(answers on next slide)
Regex Basics
The Main Elements


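Scenario Regex: ^\d+\s\w+\s\d+\s\d+:\d+:\d+

Learn by Fire: Which of these will the sample Regex match?
A. 002421 Februari 1083 1:242525:22352    (match)
B. 07 Feb 17 12:53:36AM                   (match; the trailing AM is ignored)
C. Feb 13 2017 18:46:56                   (no match; the line does not start with a digit)
D. 14 February 2017 07:45:47:46           (match; the trailing :46 is ignored)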
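
If you would rather check the exercise than eyeball it, a short Python sketch can test each candidate. It assumes the scenario regex has a `\s` between the month and the year; the dictionary and variable names are illustrative.

```python
import re

# Assumption: the scenario regex includes \s between the month and the year.
pattern = re.compile(r"^\d+\s\w+\s\d+\s\d+:\d+:\d+")

candidates = {
    "A": "002421 Februari 1083 1:242525:22352",
    "B": "07 Feb 17 12:53:36AM",
    "C": "Feb 13 2017 18:46:56",
    "D": "14 February 2017 07:45:47:46",
}

# The pattern is anchored at the start only, so trailing text is ignored.
results = {label: bool(pattern.search(text)) for label, text in candidates.items()}
print(results)  # {'A': True, 'B': True, 'C': False, 'D': True}
```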
Regex Basics
The Main Elements


Practice: Create a Regex that describes all three of the following strings

06 February 2017 192.168.1.2


05 Apr 2014 10.2.1.150
31 July 2020 19..15.63
Regex Basics
The Main Elements


Scenario: Create a Regex that describes the following strings

A solution:
\d+\s\w+\s\d+\s\d*\.\d*\.\d*\.\d*

06 February 2017 192.168.1.2


05 Apr 2014 10.2.1.150
31 July 2020 19..15.63
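
The solution can be checked against all three strings with a quick Python sketch. The `stricter` pattern is a hypothetical variant, included only to show why the solution uses `\d*` rather than `\d+` for the octets.

```python
import re

solution = re.compile(r"\d+\s\w+\s\d+\s\d*\.\d*\.\d*\.\d*")
# Hypothetical variant that requires at least one digit in every octet.
stricter = re.compile(r"\d+\s\w+\s\d+\s\d+\.\d+\.\d+\.\d+")

strings = [
    "06 February 2017 192.168.1.2",
    "05 Apr 2014 10.2.1.150",
    "31 July 2020 19..15.63",
]

solution_ok = [bool(solution.fullmatch(s)) for s in strings]
stricter_ok = [bool(stricter.fullmatch(s)) for s in strings]

print(solution_ok)  # [True, True, True]
print(stricter_ok)  # [True, True, False]
```

The third string has an empty octet, which only `\d*` (zero or more digits) tolerates.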
Regex Basics
The Main Elements
1. Open up your Splunk
2. | windbag | head 20 | table _raw
3. Copy the _raw data
4. Paste the data in Regex101.com

Goals: Extract the following fields for each event:
• lang
• sample
• The Date without Time
• The Time

Perform these as “named” extractions


Replace Command
Switch Data at Search Time

Silly version to try on your own


| windbag | head 20 | replace "1" WITH "Uno" in odd

Try it, then click the down chevron to see the results
