Blue Team Handbook:
SOC, SIEM, and Threat Hunting Use Cases
Notes from the Field (V1.02)
A condensed field guide for the Security Operations team.
Don Murdoch

Blue Team Handbook Vol 2: SOC, SIEM, and Threat Hunting Use Cases
Notes from the Field
A condensed field guide for the Security Operations team. (V1.02)
By Don Murdoch, GSE #99, MBA, MSISE
Illustrated by Bonnie Murdoch, BFA.

Copyright © 2018 by Don Murdoch. All rights reserved. Except as permitted under the United States
Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by
any means, or stored in a database or retrieval system, without the prior written permission of the
publisher and author.

This book is available at special quantity discounts to use as premiums and sales promotions or for
use in academic and corporate training programs. Please contact the author through
www.blueteamhandbook.com.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after
every occurrence of a trademarked name, we use names in an editorial fashion only, with no
intention of infringement of the trademark. Where such designations appear in this book, they have
been printed with initial caps.

TERMS OF USE: This is a copyrighted work and its licensors reserve all rights in and to the work. Use
of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the
right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse
engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate,
sell, publish or sublicense the work or any part of it without prior consent from the author, secured
via paper letter with a blue ink signature. You may use the work for your own noncommercial and
personal use; any other use of the work is strictly prohibited. Your right to use the work may be
terminated if you fail to comply with these terms.

THE WORK IS PROVIDED "AS IS." The author does not warrant or guarantee that the functions
contained in the work will meet your requirements, that its operation will be uninterrupted or error
free, or that the work will qualify as an expert witness. The author shall not be liable to you or anyone
else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages
resulting therefrom. Under no circumstances shall the author be liable for any indirect, incidental,
special, punitive, consequential, or similar damages that result from the use of or inability to use the
work. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or
cause arises in contract, tort or otherwise.

Version 1.00: Initial Printing September 2018.
Version 1.01: First Update: (v25) 11/10/18 - spelling, grammar updates.
Version 1.02: Second Update: (v27b) 3/23/19 - content updates.

ISBN-13: 978-1726273985 ISBN-10: 1726273989 V1.0, V1.01
ISBN-13: 978-1091493896 V1.02

If you would like to get in contact with the author or illustrator, please use the contact form on the
website at www.blueteamhandbook.com.

Art Notes: Art in the BTHb series is designed to be humorous and punny (yes, that IS a word!). We
hope you enjoy it. If you would like to use the artist for your own books, presentations, or articles,
please use the contact form on www.blueteamhandbook.com.

Table of Contents

Preface
Foreword
Introduction
Security Operation Center Field Notes
SOC Defined
SOC Charter
Business Value Chain Tie In
Identify SOC Services
SOC Project Planning Outline and Field Notes (V1.02)
Consider, and be Prepared, for Tough Questions
Collect the Bread and Butter Data Sources
Useful MBA Concepts: SWOT and PESTL
Funding the SOC
Getting into the Hunt
SOC Directly Supports the CSIRT Function
Metrics for the SOC
SOC Training, Skills, Staffing, and Roles
SOC Layered Operating Models
SOC Maturity Curve Using the CMMI
Example SOC Turnover Shift Check List
Security Monitoring Use Cases by Data Source
The Scenario
Defining the SOC Use Case
Organizational Considerations for Use Case Development
"Top Ten" Security Operations Use Cases
AntiSpam and Email Messaging
Email and Web: Interactions with Look a Like or Doppelganger Domains
Antivirus (A/V) Systems
Application Whitelisting
Command and Control
Data Loss Prevention (DLP)
Domain Name Services (DNS)
End Point Detection and Response
Data Islands or System Snowflakes
Windows Account Life Cycle Events (ALCE)
Monitoring Jump Boxes
Network Hardware Devices and Appliances
Printing
Operating System Security, Change, and Stability
Data Leakage (USB Insertion)
Brute Force Failed Authentication Attempts
DHCP and Data Link Layer Analysis
Next Generation Layer 7 Firewalls
TOR Overlay Networks
DarkNet Unused Network Monitoring
Network Intrusion Detection / Prevention
Perimeter Security Focused Access
Top One Million Site Checks
Top Ten IP Address Use Cases
Web Application Firewalls (WAF)
Web Proxy and URL Activity (V1.02)
Webserver and Application Server Activity
Windows Firewall (V1.02)
Windows Process (Sysmon and EventID 4688) (V1.02)
Windows Process Execution Patterns and IoC's (V1.02)
Windows Presence Indicators
X-Forwarded For, NAT, and the True Source IP Topics
SOC and SIEM Use Case Template
SIEM/SOC Use Case Development Process
Template Instructions
Use Case Template
Complete SOC and SIEM Use Case Example
Monitoring Elevated Access Group Membership
Partial SOC Use Cases
Partial Use Case: Windows Network User Presence
Partial Use Case: System Not Logging/Reporting
Partial Use Case: External (VPN) and Internal (Desktop/Server) Access
Partial Use Case: IDS Stacked Events
Partial Use Case: Policy Violation Issues
A Day in the Life of a SOC Analyst
Alarm Triage Overview
Dashboard or Summary Data Review
Security State Data Review
SOC Support System(s) Component Health Review
Identify and Report IT Operational Issues
Active Threat Hunting
Review Security Intelligence Data
Alarm Investigation Process
Techniques and Analysis Methods by Data Source
Performing Well Rounded Alarm Analysis
Alarm Statistics
Applying Threat Hunting Practices to the SOC
Leverage the MITRE ATT&CK Framework
Example Threat Hunt Check List
Hunting Historical Data Based on Current Intel and Alarms
Excessive, or Multiple, Source IPs for User Logins
Web (HTTP) Transactions in Volume per Day
Command and Control Detection
Lateral Movement or Lateral Traversal
Using the Lockheed Martin Cyber Kill Chain
Indicators of Compromise and Attack Data Dependencies
SIEM Field Notes
General Principles to Run a Successful SIEM
Implement Synthetic Transactions
Severity, Priority, Urgency, and Reliability Criteria
IoC Contributions and Threat Intelligence Feeds
NIDS Deployment and Data Collection
SIEM Deployment Checklist
Understand Why SIEM Deployments Fail so It Won't Happen to You
SIEM Event Categorization and Taxonomy
Networks, Assets, and SIEM Automation
SIEM Data Collection Methods and Considerations
Summary
Timekeeping and Event Times
Daylight Saving Time
Network Time Protocol (NTP)
NTP Device Configuration
Manual Log Analysis for IR and the SOC
Log Management
Log Record Data Elements
Logging System Components
Log Filtering
Log Times
Detecting NTP Issues Use Case
Log Retention, Audit, and Compliance Considerations
Logging and SOC Program Maturity from NIST
Security Onion: Effective Network Security Monitoring
NSM Platform Advice from the Field
Continuous Monitoring
Security Architecture Considerations
Useful Reports, References, and Standards
Industry Reports and Organizations of Note
MITRE ATT&CK
InfoSec Standards of Note
Common TCP and UDP Ports
Bibliography and References
Index

List of Tables

Table 1 An Example SWOT Analysis
Table 2 Example General SOC Metrics
Table 3 Example Incident Response Metrics
Table 4 SOC Roles and Functions
Table 5 SOC Two Layer Model Roles and Responsibilities
Table 6 SOC Three Layer Model
Table 7 CMMI Five Level Maturity Model
Table 8 Windows Defender Application and Services Logs\Microsoft\Windows\Windows Defender\Operational and System Log
Table 9 Windows AppLocker: Application and Services Logs\Microsoft\Windows\AppLocker
Table 10 Security Log: Account Management Events
Table 11 Windows Events: Group Changes (Security Log) (V1.02)
Table 12 4624 Logon Types
Table 13 Other Logon Events
Table 14 Account Logon Failures Status Codes for Event ID 4625
Table 15 RDP Events from Applications and Services Logs -> Microsoft -> Windows -> TerminalServices-LocalSessionManager
Table 16 RDP Events from the Security Log
Table 17 Windows > PrintService > Operational
Table 18 Windows OS Stability Events
Table 19 Microsoft-Windows-Kernel-Power
Table 20 USB-USBHUB3 Events
Table 21 Windows > DriverFrameworks-UserMode > Operational (USB, Win10)
Table 22 Audit PNP Activity USB events
Table 23 IP Next Layer Protocol Numbers (IPv4) Likely to be in Use
Table 24 Example 4688 Event
Table 25 Example Sysmon Event
Table 26 Powershell code to list Sysmon EXE's in Long Tail Analysis order
Table 27 Microsoft-Windows-Sysmon/Operational (v 7.01 as of March 2018)
Table 28 Windows Presence and Process Indicators (Workstation focus)
Table 29 Analyst Action Examples
Table 30 Network Based C&C Detection
Table 31 Application Content Based C&C Detection
Table 32 Indicators of Compromise Forensic Data Dependencies
Table 33 Example Compliance and Regulatory In Scope Log Retention Periods
Table 34 NIST's Security Maturity Levels and SecOps

Table of Figures

Figure 1 SOC Roles and Relationships
Figure 2 Web Presence Attack Components and Attack Surface
Figure 3 Example: End User Payload Focused Attack
Figure 4 Perimeter Use Case Illustration
Figure 5 Windows Sysmon Process Long Tail Analysis
Figure 6 Maintaining Inventory of Elevated Access Groups
Figure 7 Daily Analysis Overview
Figure 8 Alarm Triage Overview
Figure 9 Decisions Driving the Opening Move
Figure 10 Review Data Sources
Figure 11 Data Analysis Processes
Figure 12 Graph Theory Illustrated
Figure 13 Lockheed Martin Cyber Kill Chain and Security Controls
Figure 14 SIEM Urgency Score Influencers
Figure 15 Time differences by time zones
Figure 16 Logging Generation, Timestamps, and Collection Components
Figure 17 NSM Schematic
Figure 18 www.osintframework.com with Legend

Preface
With the ever-advancing adversary, technology advancements, and a critical
need for more skilled security operations practitioners, it is imperative for
organizations to enhance their PDR cycle: Protection, Detection, and Response.
This book attempts to answer that call by sharing experiences gained
implementing five different SIEM technologies for more than a dozen
organizations, running an MSSP division, and building several security operations
centers.

Who this book is for: IT pros, cyber security pros, security operations staff,
security consultants, SOC staff, SIEM designers and consultants, and line
managers: those responsible for protecting information assets and teaching the
next generation of security professionals.

About the Author: Don Murdoch, GSE #99, MBA, MSISE (GISF, GSEC, GCIH,
GCIA, GPPA, GMON, GCFE, GCFA, GCPM, GPEN, GSNA, GPPA, GCWN, GCUX,
TOGAF Enterprise Architect, SABSA Chartered Architect, CISSP, ISSAP), is a
thirty-year veteran of information technology, with more than half a career
devoted to information, computer, and network security. Don started his career
with a boutique contracting firm located in eastern Virginia, writing COBOL and
FoxPro code. For the next twelve years, Don took on roles in different aspects of
IT as he grew his career: managing a small network, writing old school Perl/CGI
software, developing international billing software for a startup ISP, and
managing IT for an Application Service Provider right at the time of the Dot Com
bubble. Don transitioned into information security where he started a DRP
practice for an IT commercial spin off from a television production company.
Things started cooking when Don entered his "digital combat training" phase in
the "Wild, Wild, West" of academic computing for one of Virginia's largest
institutions for higher learning. Don wrangled bots, tangled with well-equipped
adversaries, and discovered what today are described as nation-state grade
attackers who were using the University network as a training ground. His
University was the first in Virginia to implement a SIEM, user managed
anti-spam technology, and an active countermeasures network based on Tom
Liston's LaBrea Tarpit. After that experience, Don took on managing SIEM,
conducting employee investigations, and security architecture for a Fortune 500
healthcare firm. That firm was acquired in 2012. His career took a security
hiatus for a few years when he was an Enterprise Security Architect and then
started running the Infrastructure Strategy and Planning team for a Fortune 50
corporation. In 2016, an opportunity to develop an MSSP practice came up with
a different boutique consulting firm. After two years, Don left to run the Cyber
Range at Regent University, where he is today coaching the next generation of
Cyber Defenders.

Don started working with the SANS Institute in 2002, taking courses, earning
certifications, and developing Stay Sharp courses during the mid-2000s. He
currently teaches courses at the Community level. He earned the GIAC Security
Expert (GSE #99) certification in 2014, and was later vetted as a Cyber Guardian:
Blue Team in 2016 (#38).

About the Reviewers: Each of the technical content reviewers is a seasoned
InfoSec pro with multiple certifications. The group represents a cross section of
the community ranging from security operations and management, vendor
product development and implementation, penetration testing, security
engineering, and architecture. These reviewers are directly responsible for 42
pages of additional text from the original draft and collectively provided over
700 suggestions as a testament to their skills and passion for helping me to
produce a well written book. I cannot thank them enough. In alphabetical order,
the reviewers are:

• Christopher Beiring: Lead Security Operations analyst, network penetration
tester, and all-around security engineer for a Virginia based consulting firm.
• Chris Crowley, Montance, LLC. Chris is a well-known information security
consultant, a Principal Instructor with SANS, and a course author for two
courses, including Management 517: Managing Security Operations.
• John Hubbard: John is a SANS Instructor and author, a SOC Lead for a large
pharmaceutical company, and an all-around dedicated blue-teamer.
• Seth Misenar, GSE #28: Seth is a Cyber Security Expert who serves as a
Faculty Fellow with SANS. Seth is a co-author for the bestselling SANS
course SEC511: Continuous Monitoring and Security Operations. Seth
provided the initial technical review for this book, when it was in its infancy.
• Ryan O'Connor: Ryan is an InfoSec security operation engineer for a leading
security products company.
• Phil Plantamura: Phil is the COO for Security Onion Solutions. Phil has a 20+
year distinguished career in InfoSec working for defense, IR firms, and
education.
• Chris Sanders, GSE #64, Applied Network Defense. Chris is a well-known
security analyst, author, and educator. Chris reviewed the sections titled "A
Day in the Life of a SOC Analyst" and "Alarm Investigation Process."
• Johanna Schafer, M.A.C.E.: Johanna provided a layperson read-through of
this book, checking it for readability, grammar, and punctuation during
the technical review process. My favorite error she found was the word
"bacon" instead of "beacon", which for some strange reason, was a repeat
occurrence in early drafts.
• Peter Szczepankiewicz: A long term colleague and SANS Instructor.
• Martin Tremblay, GSE #80: Martin has 20 years of combined red and blue
team experience. He works for a leading international consulting firm and is
based in Canada.

Update Notes
Major updates are indicated with (V1.#) in the section heading.

Version 1.01: Corrected grammar and spelling.

Version 1.02: Expanded the project plan section ("SOC Project Planning Outline
and Field Notes"), and in particular the EDIS discussion, based on a request
from a state InfoSec team. Reviewed and added various Windows Event IDs and
corrected a few errors. Updated the list of suspicious EXE's.

Foreword

The choirs sang and the trumpets blared as a joyous parade marched down the
avenue amid the blinding confetti thrown from the high-rise windows above.
Sweet smells of cotton candy and funnel cakes permeated the air. The feeling of
triumph flowed through us all...at least it flowed through a much younger me. I
also may have been the only one who heard the marching bands and the angelic
choirs. And, even though my excitement was palpable, my role in it all was
merely tangential. But it was a turning point. Our SOC team's first (somewhat)
successful SIEM deployment. From the ashes of web-based syslog, convoluted
database exports to spreadsheets, and tools with names like ACID and BASE,
arose the Colossus of Logs. From my little corner and my basic use case, I
frequently paid homage to this wonder of the computing world, mostly through
conducting searches that evolved over time, and crafting basic tools to help
analysts.

Those crack analysts were applying some techniques covered in this book, but
they didn't have a copy from which to work. At that time, our team spent
countless hours thinking about our data, analyzing it, and determining ways to
find evil. Our engineers built processes, tools, and dashboards so the SOC could
work more effectively and empower the junior analysts. Over time, our SOC
innovated, building more focused and custom dashboards for multiple use
cases, ingesting intel and other content, writing scripts to pivot, and more, all to
improve the analysis process. That team and toolset continually improved and
never quite lived happily ever after, but I'll always remember how much my
career outlook changed after that first experience with a good team and a
decently implemented SIEM.

But, now that I've seen more SOCs and log management implementations than I
can count, both good and bad, I have since realized that maybe the choirs I
heard were a little flat and that the trumpets might have been blaring
something other than fanfare. If our group only had a book like this at the time,
we would have used the SIEM with greater success. The most effective SOCs
with the most solid SIEM implementations got there through thoughtful
strategy and skill, with seasoned pros who had spent years in the fight. They
answered the right questions: Which information is important? Which logs
produce that information? Who needs it? Why? Encapsulated in this book is a
fifteen-year career building SOCs and implementing SIEM technology for finding
evil every day. Many of the thoughts in here come through in my work today.

At Security Onion Solutions, we routinely consult with organizations of all sizes
which use our well-known free and open source platform as a core component
of their network and enterprise monitoring solution. During many of those
deployments and in classes, we are often asked, "What should we monitor?
What makes a difference? How do I find the evil lurking in the network? Am I
logging the right things?" Security Onion is one of many technologies available
to help answer some (clearly not all) of those questions and, with hundreds of
thousands of downloads and implementations across every industry, it keeps us
pretty busy.

I met Don and many other thought leaders in our community at our annual
Security Onion Conference in 2017, where Don delivered a very well-received
talk on building security operations use cases. Don later asked me to be one of
the technical content reviewers because of my experience as a technician,
consultant, and leader. I was humbled and excited to participate. As I read
through the draft, I found myself waxing nostalgic on my first SIEM deployment
in the early- to mid-2000s. I later commented to Don that it felt like the words
on the page were things I've been saying for a long time and were topics on
many client engagements, but never written down in one place. In "BTHb: SOC,
SIEM, and Threat Hunting Use Cases", Don covers how to use security focused
data sources to their fullest, how to write a solid SOC focused Use Case, security
metrics you can actually use, and how to build engagement plans and practices
so you will be successful.

You might not hear choirs singing; but, if you're about to embark on the journey
of building a SOC and/or SIEM, whether to implement in a green field, to
validate your position, or simply to improve your security posture and
capability, you have the right book in your hands.

Phil Plantamura, COO
Security Onion Solutions LLC
[email protected]

Introduction
The idea for this book actually predates the first book in the series, BTHb
Incident Response Edition. In 2011, our team needed to replace our commercial
SIEM platform. We headed down a path that led to my fourth major SIEM
implementation. We needed an outline to develop use cases and document all of
the attributes of a use case and SOC procedures to fully use the new platform. I
wanted our chosen vendor to have the best possible chance of bidding on the
work and completing our use cases on time and on budget. After vendor
selection, we engaged a major firm and set about replacing the legacy platform.
The vendor estimated they could achieve 26 to 28 use cases. After 12 weeks, we
exceeded project expectations. They implemented 35 of 37 fully defined use
cases that totaled 497 pages in print once the paperwork was done. The vendor
liked the use case template format so much that they adopted it, and they still
use it today. We added fourteen new right click integrations. We even had a custom
UI extension that pulled in a dozen account attributes for every user account
listed in an alert when we opened an alarm. Lastly, we also went from 1.5
people monitoring the prior solution to four full time analysts.

As a result of all that work, the idea for BTHb:SOCTH was born, and I started
collecting notes that eventually became the book you now hold. Along the way I
started an MSSP practice with a good friend working with a consulting firm. We
won 78% of our POCs in year one, and had a 100% renewal rate during year
two, so several of those life lessons are incorporated herein.

This book will cover many topics related to the Security Operations Team from a
"Field Notes" perspective. It is based on a long career implementing multiple
SIEM technologies, building SOCs, conducting all manner of cyber investigation,
and developing and running an MSSP. The major topics are:

1. Building a Security Operations functional unit, including provisioning plan,
budget considerations, thought habits, analyst skills, and tiering structures.
2. Deciding how to structure your Security Operations capability and the
services it will offer.
3. An extensive discussion of security focused use cases organized by their
respective data source. This chapter describes what to monitor from a given
data source, as succinctly as possible. Many of these use cases have a threat
hunt theme to them.
4. Building Security Operations Use Cases using my own Use Case Template,
followed by a complete use case to use as a model in your own work.
5. Critical SOC analyst skills and investigation processes, which my own team
used while I managed an MSSP operational unit for two years.
6. A discussion on applying modern Threat Hunting to the Security Operations
team.
7. And a host of other topics that relate to security operations, SOC analyst
skills, and SIEM.

It is my sincere hope that this self-published book delivers on the Blue Team
Handbook motto: "a zero-fluff reference guide for the security practitioner,
written with the intention of sharing real life experience". I trust that you will
learn something useful as you read it, as many readers of BTHb:INRE have
shared with me over the years.

Thank you for your support,

Don Murdoch, GSE #99, MBA, MSISE

Security Operation Center Field Notes


SOC Defined
A Security Operations Center (SOC) means different things to different people.
Some say they "run the security platform", others say "they handle incidents",
and still others say "they monitor the security of the network". The definition
for a SOC used in BTHb:SOCTH is:

"A centralized team in a single organization that monitors the information
technology environment for vulnerabilities, unauthorized activity,
acceptable use/policy/procedure violations, intrusions into and out of the
network, and provides direct support of the cyber incident response
process."

In a nutshell, the SOC is the first line of defense. This definition incorporates
several important strategies for a successful SOC. First, a SOC must be under a
single management and reporting structure so that it has a clear line of
authority, funding, reporting, and accountability. Second, a SOC must have
awareness of all aspects of both the business and the IT environment, from the
smallest workstation to the largest supercluster in the cloud. Third, a SOC needs
to understand its area of operation (AO) and how it will support the business
and monitor business applications and infrastructure. These criteria must be
covered in the SOC charter. Fourth, the SOC budget needs to be large enough to
continually invest in people and support cross training instead of super
sophisticated software. That concept leads to the fifth strategy: train and
encourage analysts to stay calm and correctly interpret alerts and their
supporting data. This requires that SOC analysts are well trained.

One point deserves some elaboration. There are a few different ways that the
SOC team can establish its AO. The SOC can use the IT General Controls
program, corporate policy/procedure, guidance from standards like the ISO
2700X series, or follow the Center for Internet Security's 20 Critical Controls.
When designing, building, staffing, and operating a SOC, you need to develop a
charter and mission statement.

In order to achieve these various strategies, the SOC needs to know the network,
the application to server relationships, and what is happening on the network,
and be able to determine if that activity presents a significant enough risk to the
organization that the activity needs to be effectively dealt with. SOC teams
don't solve security issues with complex SIEM software. They solve them with
knowledge, skill, and ability. Complex SIEM tools help, but they are not a
technological panacea.

SOC Charter
Every security operations center needs a "charter". The SOC charter defines
how the SOC serves the business, its mandate(s), its governance and
operational rules, what its areas of operation are, and how the organization
needs to respond to the alarm conditions and monitoring the SOC performs.
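
The charter elements named above can be tracked as a simple checklist while the document is being drafted. The following is a minimal sketch only: the class name, field names, and sample values are hypothetical illustrations and are not taken from the book's own template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SocCharter:
    """Hypothetical checklist of the charter elements named in the text."""
    business_service: str = ""        # how the SOC serves the business
    mandates: list[str] = field(default_factory=list)
    governance_rules: list[str] = field(default_factory=list)
    operational_rules: list[str] = field(default_factory=list)
    areas_of_operation: list[str] = field(default_factory=list)
    response_expectations: str = ""   # how the organization responds to SOC alarms

    def missing_elements(self) -> list[str]:
        """Return the names of charter elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Sample values are assumptions made up for illustration.
draft = SocCharter(
    business_service="Monitor and triage security alarms for the business",
    areas_of_operation=["corporate network", "cloud workloads"],
)
print(draft.missing_elements())
```

A draft charter that still reports missing elements is probably not ready for sign-off; what counts as "complete" will depend on your own governance process.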

Note that the SOC charter is not the same as a project charter. The SOC Project
Implementation Charter is the formal document that authorizes the project to
develop a SOC, possibly implement a SIEM, and empowers the project manager
to apply resources and create the SOC.

The SOC charter is often developed in tandem with the SOC/SIEM project
implementation charter. The SOC charter should be scoped properly, whereas
an implementation charter is a Project Management Institute (PMI) defined
document. The term comes from the Project Management Body of Knowledge
(PMBOK) as a type of "project artifact". Don't get the two confused.

Business Value Chain Tie In


One concept that IT people don't often embrace is the business "value chain".
The value chain is the set of activities that take inputs and convert them into an
output that brings a valuable service or product to its market. Value chains
consist of: resource generators, inbound logistics, manufacturing or service
operations, marketing, outbound logistics or service delivery, and after "sale"
service and support operations. Today, there are very few aspects of the value
chain that aren't dependent on some form of information technology, which
must be monitored and fully secured following an IT General Controls Program.

In order for the SOC, and IT in general, to be relevant to and communicate with
the business, they must understand how the business speaks, and the
business's context and concept of operations¹. Formally, a value chain should
create some form of competitive advantage in the marketplace.

¹ These concepts are well defined in the Sherwood Applied Business Security Architecture
(SABSA).

Security Operation Center Field Notes

Identify SOC Services

A Security Operations Center can provide numerous services to the business and
to IT. As you consider each of these services, be sure to incorporate them into
your SOC planning process, along with the supporting skills, data sources,
response patterns, and staffing needed to realize that service over the lifetime
of the SOC. Further, as your organization considers the services it will offer
the business, be careful to take on only those services that you can
successfully deliver. The core services of a SOC operations team are listed
below. Your organization will certainly implement these services based on its
own capabilities, funding, and staffing level.

Reactive Services                      Proactive Services
-------------------------------------  -------------------------------------
Monitor Security Posture (Alerts)      Network Security Monitoring
Command Function (IR/Analysis)         Threat Hunting
Initiate & Manage Incident Response    Platform Health Monitoring & Support
Vulnerability Management               Cyber Threat Intel
Forensics/eDiscovery                   Threat Intel Integration
Reporting
Malware Analysis                       Other Services
Intrusion Detection                    -------------------------------------
Audit/Assessment                       Policy Procedure Support
Notification Refinement                Internal Training and Support

Monitor Security Posture: This is the primary role of the SOC: monitoring the
environment for security conditions, alarms, health of the security platform, and
responding through the organization's various technical solution(s).

Command Function: This may be a recurring activity, as the SOC coordinates
alarm response, incident response, and forensic processes. Incident command
can be a very intensive process. Incident command means that your SOC will
identify incidents, work with handlers, coordinate containment operations,
assist in eradication efforts, take information from the incident and use it to
improve internal systems based on newly found intelligence, and may
also support pushing out updates or other fixes.

Initiate & Manage Incident Response (identification and remediation support):
A significant portion of the activities and instrumentation of a SOC focuses on
finding and validating security incidents based on alarm and NSM work. The SOC
may be empowered to initiate specific IR support from vendors, contractors,
and secondary business units: a wide variety of staff outside of the SOC and IR
function. In these cases, an operational process needs to be defined with a set
of releasable data provided to those outside of the SOC/IR team. Don't
freelance or make up these points on the fly; plan ahead. To start planning,
review the application inventory and determine if IR support can be handled
internally or if a third party needs to be engaged. Once you have planned,
exercise your plan at least twice a year using a tabletop exercise format. Once
that is stabilized, integrate various real data or activity components into testing
the IR plan. Then graduate to engaging an external pen test team, outline an
engagement structure, and put the blue team to the test.

Vulnerability Management: The SOC manager may be asked to assist, or even
run, a vulnerability management program. The SOC manager should be very
cautious not to take on tasking the SOC may not be able to handle: developing
and deploying a round trip, full scope VA/VM program. Working through the
process of safely finding vulnerabilities, identifying and notifying the
system owner and custodian, tracking each finding, and then gaining system
custodian and data owner support for remediating vulnerabilities in a timely
manner can be a labor-intensive process. Further, an effective VA/VM program
needs to be executed within the business context and concept layer, meaning
that the focus of the program should follow a business criticality model. These
are all complexities of running a program that can really stretch a SOC.

Forensics/eDiscovery: Depending on the size of the SOC, forensic support may
be conducted in-house, or the SOC may coordinate and support forensic
examinations with a third party. eDiscovery within an organization often uses
the same or similar tools, requires chain of custody during the collection of case
specific information, and will also analyze the results of data collection. A key
difference is that eDiscovery is focused on collecting search specific information
from live, in-use data and information repositories that are generated and used by
people. Forensics goes deeper, examining system artifacts from the file system
that show users' intent to interact with files and data, malicious software
residing in memory, or data deleted from disk.

Reporting: Run reports to support compliance requirements and IT General
Controls monitoring. Run reports to support alarms, incidents, and other
reporting requirements. Respond to additional data requests.

Malware Analysis: If a SOC analyst can safely recover a malware sample, then
they may be inclined to perform some lightweight malware analysis using
services like VirusTotal, JoeSandbox, or ThreatExpert. That advice was useful
ten years ago, and is no longer considered best practice. In 2017, the better
course of action is to run samples through a local malware analysis engine built
on Cuckoo sandbox to prevent informing the attacker, who is likely monitoring
online services, that their malware was found. These tools allow a user to
upload a suspect binary and then advise if it is known bad and provide varying
levels of activity analysis such as registry changes, new services, file system
changes, IP addresses in use, or domain names looked up. If the analysis reveals
something suspicious, then the SOC analyst can take that operational
intelligence and better search security data. More complex reverse
engineering beyond this cursory level is a very specialized skill and requires
an environment set up for this purpose.

Intrusion Detection: Several detection systems can be deployed on
the network or on a host. These detection systems (Snort, Suricata, Bro,
PassiveDNS, etc.) all require care and feeding in order to make sure they are
operating properly. Winning budget to implement a NIDS platform and then not
maintaining the ruleset isn't an optimal solution.

Notification refinement and improvement: For alarm conditions that are
deemed valid, create a notification with sufficient supporting information for
the recipient(s).

Network Security Monitoring: NSM is the collection, detection, analysis, and
escalation of indications and warnings based on network level data that indicate
an intrusion.

Threat Hunting: Threat hunting is a proactive process that inherently assumes
that there is some form of intrusion or breach. Threat hunting begins with
generating a hypothesis of a compromise and then tests that hypothesis. It
includes the systematic review of flows, account activity, and events, both
from a longitudinal perspective and in the aggregate. Threat hunting seeks to
detect security threats, intrusions, misuse, and breaches by data mining.

Platform Health Monitoring: Monitor SIEM dashboards and the alert stream,
reviewing and acting on alerts following a priority basis. Monitor the SIEM platform
and other supporting data sources in order to detect issues and work with data
custodians to ensure data survivability. Update platform definitions (assets,
networks, privileged users, alarms, etc.) as the environment changes. Includes
maintaining source data availability and quality by checking to make sure that
events are parsed and creating new or refined alarms.

Cyber Threat Intelligence: This is the analysis of adversaries, their capabilities,
motivations, and goals. Cyber threat intelligence (CTI) is the analysis of how
adversaries use the cyber domain to accomplish their goals. When considering
CTI, you should use multiple sources. Not all CTI sources are the same or offer
the same degree of coverage. Also, CTI (in my opinion) includes understanding
software vulnerabilities and ready-made attacker capabilities. For example,
what are the new Metasploit exploits added this week? Metasploit makes the
process of exploiting vulnerabilities significantly easier because exploits are
encapsulated into reusable code. How quickly does a new exploit appear after a
vulnerability is announced in a technology you depend on? By keeping aware of
attack tools, vendor announcements, and postings from major vendors in the
IR community such as SANS, TrustWave, IANS, FireEye, CrowdStrike, AlienVault,
and EMC/RSA, you can build a very low-cost CTI program and then make a
purchase decision.

Threat Intelligence Integration: This is the process of carefully selecting
threat intelligence feeds and bringing them into the system to improve alerting and
better identify suspect or malicious sources, destinations, domains, and other
patterns. Threat intel sources and the information they provide should be on
the detection roadmap.
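
As a rough illustration of what feed integration does for alerting, the sketch below checks events against small indicator sets. It is a minimal example under stated assumptions: the `Event` fields, the indicator values (documentation address ranges and `.example` domains), and the `match_indicators` helper are all hypothetical, not part of any particular SIEM or feed format.

```python
from dataclasses import dataclass

# Hypothetical indicator sets; in practice these would be loaded from the
# feeds the SOC has selected. Values use documentation ranges only.
BAD_IPS = {"203.0.113.10", "198.51.100.77"}
BAD_DOMAINS = {"malware.example", "phish.example"}


@dataclass
class Event:
    src_ip: str
    dest_ip: str
    domain: str = ""


def match_indicators(event: Event) -> list[str]:
    """Return the reasons (if any) this event matches an indicator."""
    hits = []
    if event.src_ip in BAD_IPS:
        hits.append(f"source {event.src_ip} is a listed indicator")
    if event.dest_ip in BAD_IPS:
        hits.append(f"destination {event.dest_ip} is a listed indicator")
    # Match the domain itself or any parent domain on the list.
    labels = event.domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in BAD_DOMAINS:
            hits.append(f"domain {event.domain} matches {candidate}")
            break
    return hits


events = [
    Event("10.1.1.5", "203.0.113.10"),
    Event("10.1.1.6", "192.0.2.8", "cdn.malware.example"),
    Event("10.1.1.7", "192.0.2.9", "intranet.example.org"),
]
flagged = [(e, match_indicators(e)) for e in events if match_indicators(e)]
```

Set membership keeps each lookup cheap, which matters once a feed holds many thousands of indicators; the quality of the feed still decides whether the resulting alerts are worth an analyst's time.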

Policy and Procedure Support: Many of the monitoring controls and capabilities
should tie directly to established policy and procedure. As use cases are
implemented, ensure that there is a tie-in to how the SOC will support PnP
enforcement. More specifically, as this service area matures, ensure that SOPs
are written to define how the SOC will properly engage with the user,
supervisor, HR, and Legal in response to violations of PnP's.

Internal Training: Iron must sharpen iron, so the SOC management team
must ensure that as the SOC changes, the line staff are trained and kept
current. For example, as a new data source is integrated into the SOC, all
members need a briefing on the data source and how to use it properly.

SOC Project Planning Outline and Field Notes (V1.02)

"If you fail to plan, you plan to fail."
- Commonly attributed to Benjamin Franklin

Instead of repeating any of that content in BTHB:SOCTH, this section provides a
condensed outline for planning a SOC based on the PMI PMBOK2. Also, do not shy
away from using the PMI PMBOK because "project managers are annoying", "project
management is useless", or "it's just not that hard". A solid PM who
understands how to drive a project to completion on time and within budget is
a tremendous ally for anyone building a SOC or implementing a SIEM. This
section provides a no frills, just facts discussion on SOC and SIEM planning. As
you read through this section, many of the statements will become elements on
a project plan as a "plan the item, conduct the item" line item entry.

2 This section was significantly updated for BTHb:SOCTH V 1.01

Develop key business focused understanding of the organization and how the
SOC can support its goals and objectives.

1. Understand the organizational need for a SOC, which means that you need
to understand your organization's goals and objectives. By being able to
articulate how the SOC protects what the organization produces, sells, or
the services provided to others, the SOC will have more credibility, be
relevant to the business, and support your organization's mission
statement.
2. Understand the business problem(s) the SOC needs to address and the value
chain resources that the SOC needs to monitor. You may need more of a
"compliance" focused SOC, a tactical SOC, an incident focused SOC, or some
combination of these. The SOC will monitor several components of the
value chain in addition to general IT resources. The SOC that intelligently
targets the value chain for monitoring will be more successful and relevant
to the business.
3. Identify the SOC sponsor. The sponsor may have an uphill struggle to
initiate, build, and deploy the SOC. The SOC manager must be sure that the
sponsor relationship is well maintained. The "customer" should want SOC
Services, and not have them dumped in their lap. The other operational
roles will need to be well staffed. Evaluators and regulators are examples of
"external stakeholders". These roles will be staffed by auditors with varying
skill levels who are attempting to measure and report on risk and the
degree of compliance within the organization. Understanding the questions
stakeholders are likely to ask will inform use cases, reporting, and data
sources that should be implemented to report to the SIEM platform.
4. Ensure there is an actual need for a SOC and its supporting logging
infrastructure. Be ready to articulate that need, and explain how the staff
and technical capabilities meet the need. Here, you should develop a formal
business case. Be prepared to justify the staff, resources, access, and
software needed to build a SOC.
5. Develop key "Security State" understanding (the "as is" versus the "to be"
state). This understanding is technical in nature and corresponds to various
use cases and monitoring needs from the traditional IT perspective.
Wherever possible, connect a security state monitoring capability with a
value chain component and the IT General Controls program. Refer to the
most applicable standard for your industry, such as ISO 27002. (See page
245 for more information.)

Build your initial business case, charter, project plan, budget request, and
justification to support building the SOC.

This process will likely be two to eight months' worth of effort. Design the
phases, identify the key inputs and outputs per phase per the PMBOK, and who
will support each project phase.

1. Define the organizational ownership, responsibility, and SOC location.
Attempt to locate a physical space that will accommodate twice the head
count you will have in year one, so that you don't have to move in year
three.
2. Identify the key roles for SOC: "architect", "engineer", "analyst", "manager",
"customer3", "sponsor4", and "stakeholder5". Several of these are nearly
identical to the roles defined by PMI's PMBOK (definitions in footnotes,
more information on page 51).
3. Identify the relevant Policy, Procedure, and Governance (PPG): either in
place, or new PnP's that need to be written and adopted. Review existing PPG and
determine if they support the SOC. Ensure that the SOC function is
integrated into IT processes, particularly new application acquisition, server
provisioning, and the change management process. Also, the SOC will need to
consume the forward schedule of changes, maintenance window updates,
and notifications that changes were successful6. As a monitoring service, the
SOC team needs to know about changes so that they don't overreact during
change failures or other OS and app changes that may seem suspicious.
4. Document necessary staffing levels, training, and educational process(es)
(more information on page 45). Here, concretely plan for the first year.
Once that's done, develop a three-year plan and assume that you will have
above average turnover. SOC Analysts are in high demand, and incident
response tends to burn people out. Note that a SOC of one person isn't a
SOC. It is often a highly motivated person who will perform heroic acts and
will eventually burn out, or a single person running a SIEM.

3 Customers and users. Customers are the persons or organizations who will approve and manage
the project's product, Service, or result (from PMBOK V5).
4 Sponsor. A sponsor is the person or group who provides resources and support for the project
and is accountable for enabling success (from the PMBOK V5).
5 Stakeholder: an individual, group, or organization who may affect, be affected by, or perceive
itself to be affected by a decision, activity, or outcome of a project (PMBOK V5).
6 These are ITIL V3 Change Management terms.


5. Conduct a current data source survey. Identify the data sources, their
logging configuration, assets, applications, application to asset mapping, and
data or logging suitability for the SOC. You should not assume that every
candidate data source is well instrumented and has the level of auditing your
SOC will need. As you prepare your data source survey, preserve vendor and
product documentation that describes how the logs work and what their values
mean. You will need this detail later. During this process, you will need to
inventory how each data source can provide information to a future SIEM: syslog
(UDP or TCP), file write, database table, SNMP traps, etc.
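
One way to keep the data source survey from step 5 consistent is to record each source in a fixed structure. The sketch below is illustrative only: the `DataSource` fields, the `Transport` values, and the sample entries are assumptions chosen to mirror the attributes named in that step, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    """How a source can deliver data to a future SIEM."""
    SYSLOG_UDP = "syslog/udp"
    SYSLOG_TCP = "syslog/tcp"
    FILE = "file write"
    DB_TABLE = "database table"
    SNMP_TRAP = "snmp trap"


@dataclass
class DataSource:
    name: str
    asset: str
    application: str
    transport: Transport
    audit_level_reviewed: bool  # has the logging configuration been checked?
    vendor_doc: str = ""        # where the log format documentation is kept


# Hypothetical survey entries.
survey = [
    DataSource("perimeter-fw", "fw01", "firewall",
               Transport.SYSLOG_UDP, True, "docs/fw-log-guide.pdf"),
    DataSource("erp-audit", "erpdb01", "ERP", Transport.DB_TABLE, False),
]

# Sources that still need an auditing review or preserved documentation.
needs_work = [s.name for s in survey
              if not s.audit_level_reviewed or not s.vendor_doc]
```

A list like `needs_work` turns directly into project plan line items for the sources that are not yet ready to report.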

Conduct an Environmental Data Inventory Survey (EDIS). (V1.02)

Not only do you need source system data, you need metadata about the network,
organization, users, applications, and a mapping of the business processes that
depend on the organization's applications. EDIS7 begins with developing an
inventory of major business processes along with the business process owner.
From there, define the applications that enable said processes, and then the
servers that support the applications, similar to BIA, BCP, and DRP planning.
The difference with a SOC/SIEM focused EDIS is the depth of information. BIA,
BCP, and DRP are focused on bringing an application, data, and servers back
into service, whereas SOC/SIEM is focused on enabling monitoring, understanding
who to contact for an incident, establishing baselines, and being able to
rapidly investigate an incident. Both processes collect similar data sets, and
can complement one another. Data includes users and their demographics, network
maps, address ranges, applications in use, app to server mappings, app to RDBMS
(or other data storage), input/output streams, web services that the
application uses, and the overall organization chart. Many of these data
sources will provide information to the SOC and the SIEM through automation, so
be sure to get at least "read only" credentials for the systems that house this
information, such as a Configuration Management Data Base (CMDB).

7 The steps are nearly the same as those done in the Business Impact Analysis (BIA) phase of a traditional
Business Continuity Plan (BCP), and then the Disaster Recovery Plan (DRP). If your organization has
a BCP, DRP, or TOGAF7 style EA team, then consult with them for the application and server
inventory.

From a Project Management perspective, the major steps for the EDIS process
are outlined below.

1. Identify and develop an inventory of major business processes and
departments. Note that this information may be readily available from a
BCP and DRP plan.
2. Review the asset and network attributes necessary to best populate the
target SIEM in order to maximize the data collection process.
3. Identify the applications which support business process, along with the
data owner and system custodians, and from there document an application
to server model, and thus the inventory of technologies in use. In many
SIEM platforms this relationship will be implemented in an asset model,
which supports more accurate alarm rule development.
4. Develop an inventory of every security focused or IT support technology. A
sample list is shown below:
a. Network devices: Firewalls, IDS, VPN, DNS, DHCP, NAC, WiFi, WIDS,
switch logs
b. Technology Support systems: Mobile device tracking, Anti-Virus,
Endpoint Detection and Response (alerts), Vulnerability Scanner,
Password management system, web proxy, Email, virtualization
platform, database systems
c. Windows focused event logs: Application, Security, System, Sysmon
Operational log. Note that as a subproject, the SOC implementation
may need to spin up a separate project to implement WEC/WEF.
d. Application logs: these usually require a database query or some
other method of data collection
e. Other relevant tools: Email security tool, Insider threat tool, System
Backup logs
5. With the list of applications and security technologies, a line item for each
item can be created in the project plan.
6. Estimate the number of hours to incorporate the data source for the use
cases. These items will be expanded in a subsequent phase, following the
"progressive elaboration" model.
7. Include a project specific line item to develop a briefing for the SOC team
that explains each data source's field set and field values.
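
The application to server model described in step 3 above can be pictured as two small mappings. The sketch below is a simplified, hypothetical asset model (process names, owners, and host names are invented) showing how an alarm on a server can be tied back to the business processes that depend on it.

```python
# Hypothetical EDIS inventory: business process -> owner and applications,
# and application -> servers.
PROCESSES = {
    "order-to-cash": {"owner": "VP Sales", "apps": ["erp", "web-store"]},
    "payroll": {"owner": "HR Director", "apps": ["erp"]},
}
APP_SERVERS = {
    "erp": ["erpapp01", "erpdb01"],
    "web-store": ["web01", "web02"],
}


def context_for_server(server: str) -> list[dict]:
    """Return the business processes (and owners) that depend on a server."""
    results = []
    for process, info in PROCESSES.items():
        for app in info["apps"]:
            if server in APP_SERVERS.get(app, []):
                results.append(
                    {"process": process, "owner": info["owner"], "app": app}
                )
    return results


# An alarm on erpdb01 touches both order-to-cash and payroll.
impacted = context_for_server("erpdb01")
```

Real SIEM asset models carry far more attributes, but even this much lets an alarm rule answer "which process owner do we call?"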

For each of the identified data sources, you will need these planning and
implementation elements:


1. Determine how the data source will be collected. Consult SIEM Data
Collection Methods and Considerations on page 207 for more information
on SIEM data collection methods.
2. Review the current auditing and logging configuration for fit.
3. Estimate to the extent possible the volume of data. For this point, try to get
an average daily volume over at least a five-week period, which should
catch any surges that naturally occur across a month boundary.
4. Determine if data can be trimmed, meaning review the data to find out if
there are low to no value records provided by the data source that can be
pruned or dropped either at the collection point or the arrival point.
5. Inventory the data fields from the data source, and develop an internal SOC
training program so that all SOC staff understand the data source.
6. As needed, implement the necessary change control to configure the data
source to report to the SIEM.
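
For the trimming review in step 4, it helps to see which record types account for most of a source's volume. The sketch below is a minimal example: the sample is a made-up list of (event ID, size in bytes) pairs, and whether any event type is truly low value remains a judgment call for the SOC, not something the code decides.

```python
from collections import Counter

# Hypothetical sample: (event_id, size_in_bytes) pairs from one data source.
sample = [
    ("5156", 900), ("5156", 880), ("5156", 910), ("5156", 905),
    ("4624", 1500), ("4625", 1400), ("4688", 2100), ("5156", 895),
]

# Total bytes contributed by each event ID.
bytes_by_id = Counter()
for event_id, size in sample:
    bytes_by_id[event_id] += size

total = sum(bytes_by_id.values())

# Share of total volume per event ID, largest first.
shares = [(eid, round(b / total, 3)) for eid, b in bytes_by_id.most_common()]
```

An event ID that dominates `shares` but never appears in a use case is a candidate to prune at the collection point or the arrival point.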

Plan the technology provisioning process to support the SIEM and any other
identified SOC services (see p. 17). Plan for twice the data you think you will
need in year one.

1. Hardware: Including disk and disk controller architecture, as influenced by
logging requirements and SIEM platform.
2. Virtualization Layer: Modern virtualization technology makes virtualizing
your SIEM a very attractive option. When considering this option, it is
critical to articulate the data speeds in terms of IOPS necessary for
databases and/or data storage. Don't assume that this will be handled by
your infrastructure team.
3. Log storage architecture, scripting, and long-term storage requirements. For
long term storage, you really need space over speed, because you rarely go
back to data past a 90-day threshold. Reliable, safe, and large long-term
storage is more important than blazing speeds. Blazing speed is needed for
the past three days' worth of logs.
4. SIEM and supporting software. Note that most major vendors have their
own predefined project plans for implementing their software, which you
should leverage to the fullest.
5. Spend time on the budget process. A SIEM is actually a major enterprise
wide application, and it deserves the same budgetary rigor as any
enterprise project. This means building a first-year model to get started, a
3-year projection, and then a 5-year projection model. A significant component of
the budget development process is developing the Total Cost of Ownership
model. You will need to know your organization's technology refresh mode
to plan for system replacement. You should assume 50% log storage growth
year over year.


6. Application and IT resource data provisioning and possible development.
This phase is where you will design how each application and data source
will be integrated into the SOC and SIEM. It usually involves some significant
custom development efforts. Each data source brings its own capabilities
and will need some form of alert support.
7. In order to make this process work well, find the gaps in the security
posture of your organization and work to quantify risks. To get this done,
find your risk management subject matter expert (SME) or Point of Contact
(POC) and partner up with them.
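
The budget guidance above (plan for twice the year-one data, then assume 50% log storage growth year over year) can be turned into a quick projection. This is a minimal sketch: the 10 TB starting estimate is invented, and applying the growth rate to the doubled figure is an assumption about how the two rules of thumb combine.

```python
def project_storage(year_one_tb: float, growth: float = 0.50,
                    years: int = 5) -> list[float]:
    """Return the storage needed in each year, compounding annual growth."""
    return [round(year_one_tb * (1 + growth) ** n, 2) for n in range(years)]


estimate_tb = 10.0                 # hypothetical year-one estimate
provisioned_tb = estimate_tb * 2   # plan for twice the data you expect
projection = project_storage(provisioned_tb)

for year, tb in enumerate(projection, start=1):
    print(f"Year {year}: {tb} TB")
```

Compounding is what makes the 3-year and 5-year budget models diverge so sharply from the first-year figure, and it is worth lining up against your technology refresh cycle.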

Build your log architecture, source data collection delivery, and SIEM and
logging deployment plan.

There is more information in the SIEM Deployment Checklist section beginning
on page 198. Also, briefly:

1. Perform software and vendor selection based on a scoring model built from
use cases that correlate to your business model, unique data sources,
compliance requirements, and InfoSec program.
2. Review the auditing stance and build out the Events Per Second (EPS) rating
for each of the systems in the environment that will provide data. Then plan
for a 50% increase so that your solution can weather an "event storm". It's
critical to determine the EPS after the source system has had its auditing
level configured!
3. Provision the hardware and storage platform and implement.
4. Monitor your data feeds, reporting, and system response time.
5. Build your data integration plan for commodity sources, and carefully select
customized sources. For example, an ERP application is not likely to be
supported, so you will likely need to develop a database query to pull data
from the audit table, implement auditing, test, develop a method to archive
current data to a historical table, and monitor to ensure that the query
process has minimal system impact.
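
The EPS sizing rule in step 2 is simple arithmetic, sketched below. The per-source figures are invented for illustration and would in practice be measured after each source's auditing level has been configured.

```python
import math

# Hypothetical measured average EPS per source, taken after auditing
# levels were configured.
measured_eps = {
    "domain-controllers": 450,
    "perimeter-fw": 1200,
    "proxy": 600,
    "dns": 750,
}

HEADROOM = 0.50  # the 50% increase suggested for weathering an event storm


def sized_eps(per_source: dict[str, float], headroom: float = HEADROOM) -> int:
    """Total EPS the platform should be sized for, rounded up."""
    return math.ceil(sum(per_source.values()) * (1 + headroom))


target = sized_eps(measured_eps)
print(f"Size the platform for at least {target} EPS")
```

If the vendor licenses by EPS, this target rather than the measured average is the number to carry into the scoring model.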

Build out Use Cases.

1. There is an entire chapter in this book on building out use cases. Review all
of that material and compare it against the use cases in your chosen
platform.
2. Plan how to implement the vendor-defined use cases as these should
provide baseline coverage.
3. Forecast the effort required and data sources to implement your own
custom use case.
4. From there, prioritize the implementation so that you will have project
measurement that supports defining earned value.


Build your response processes.

Response processes are enabled by the variety of data arriving, applications,
business processes, and your requirements.

1. Response processes will be driven by your security program and the
applicable standard you are following.
2. This part of the planning process should answer incident resolution
questions like this: "When we get condition A from system B, what does the
analyst do and what data is necessary for the system custodian to resolve
the incident?"
3. In effect, the process of pulling data into a SIEM will provide the SOC
function with dozens of scenarios that need to be worked through. As you
build these processes, ensure that you are outcome focused: what
objective needs to be achieved based on security condition X, Y, or Z
presenting itself?
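
The "condition A from system B" question in step 2 maps naturally onto a lookup keyed by condition and system. The sketch below is illustrative: the `Playbook` structure, the sample entry, and the default escalation behavior are assumptions, not a prescribed response process.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    objective: str                  # the outcome the response must achieve
    analyst_steps: list[str]
    data_for_custodian: list[str] = field(default_factory=list)


# Hypothetical entries keyed by (condition, system).
PLAYBOOKS = {
    ("repeated-failed-logons", "domain-controller"): Playbook(
        objective="Confirm whether the account is under attack or misconfigured",
        analyst_steps=["Identify source host",
                       "Check for a later successful logon"],
        data_for_custodian=["account name", "source IP", "time range"],
    ),
}

# Fallback when no playbook has been written for the pairing yet.
DEFAULT = Playbook(
    objective="Triage and escalate to the SOC lead",
    analyst_steps=["Record the alarm", "Escalate: no playbook defined"],
)


def lookup(condition: str, system: str) -> Playbook:
    """Return the playbook for a condition/system pair, or the default."""
    return PLAYBOOKS.get((condition, system), DEFAULT)
```

Counting how often `lookup` falls through to `DEFAULT` is a cheap way to find the scenarios that still need to be worked through.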

Build your SOC Metrics, as defined in Metrics for the SOC on page 39.

Many technical platforms have reporting and measurement that supports SOC
and SIEM metrics. There are many organizational metrics that need to be
developed and collected. This aspect of developing a SOC and SIEM
implementation plan will evolve over time.

Build and implement your continuous training program.

Training is a constant. SOC skills need to advance as the attacker's skill and
determination advance. Ensure that there is budget for at least two tiers of
education. Provide premium education for the more senior tier, and then
develop OTJ training for the junior tier. OTJ should consist of knowledge
transfer, short courses, and job skills focused education. Investigate your local
community college workforce education program and capabilities in your area.

There are several open source or very low-cost options. Consider ENISA,
SecurityTube, SANS Cyber Aces, local BSides conferences, DerbyCon, and the
annual Security Onion conference as more inexpensive education options.

Consider, and be Prepared, for Tough Questions


In order to fund SecOps, SIEM, and a SOC, you will undoubtedly face many
questions. Here are a few of them that I have been asked over the years,
condensed for publication. Determine the answer when building your funding
request.


1. Nothing has happened yet. Why do we need to do this? How can you be
sure that nothing has happened yet? As a possible answer, try these out:
"It's not if, it's when."
2. How will the team detect and respond to security issues, incidents, and data
breaches? How did we do this before? Isn't that what the sysadmins do?
3. Did the organization incur any costs from an incident last year? Virus
outbreaks? What costs did our peers and competitors incur?
4. How many users at what "level" were negatively affected (as in lost
productivity) from an incident?
5. How are you going to measure yourselves and get on the IT Balanced
Scorecard? As a possible approach, ask if you can be on the business
scorecard during the SOC charter development process.
6. How will the team determine what alarm conditions are prioritized over
something else - who wins? (Hint: asset value tied to critical business
process and revenue stream protection).
7. I thought we spent X on Security last year. Why do you want more?
8. I know we don't have anything worth stealing. Why do we need to do more
and more of this security "stuff"?
9. Don't those things cost millions?
10. We are doing vulnerability assessments. Isn't that enough? If you are not
doing active and timely remediation, then no, it isn't.
11. Those security people keep saying no, so I'm going to say no to them this
time. So there.
12. We can successfully outsource that for 1/8 the cost, right? After all, that's
what the vendors say. Why do you disagree with them? Aren't they
experts?
13. How will this SOC solve business problems for us?
14. What does this SOC thing look like year 1, year 3, and year 5?
15. I don't want to buy more expensive security people only to have them quit.
What are you going to do about staff retention? Burnout? Attrition? Internal
transfer? I recently heard at a security conference that when people take a SOC
job they plan to quit in 18 months.
16. How will you know when you have had a success? What does success and
failure look like for a SOC? (or a major security purchase?)
17. Have you been talking to the auditors again? They said something about this
last year. I bought a new firewall.
18. Show me a playbook first - can you do that? Come back when that's done.
19. IT is outsourced, it is "company X's" responsibility, not ours. We have no
liability because that rests with the outsourced vendor and it's in the
cloud/vendor contract8.

8 I would encourage you not to blurt out in response to this question that this is "A Guaranteed
Orange Suit Acceptance Posture". It does not go over well.


20. What can you do with a third of that? Because that's all we have.
21. We spent $3.5M on SOX last year. No more!
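
The hint in question 6 (asset value tied to critical business process) can be made concrete with a simple scoring rule. The sketch below is one possible approach, not the author's method: the 1 to 5 asset values, the severity scale, and the multiplication are all assumptions chosen for illustration.

```python
# Hypothetical asset values (1 = low, 5 = supports a critical revenue process).
ASSET_VALUE = {"erpdb01": 5, "web01": 4, "print01": 1}

alarms = [
    {"id": 1, "asset": "print01", "severity": 3},
    {"id": 2, "asset": "erpdb01", "severity": 2},
    {"id": 3, "asset": "web01", "severity": 3},
]


def priority(alarm: dict) -> int:
    """Score = asset value x alarm severity; unknown assets get value 1."""
    return ASSET_VALUE.get(alarm["asset"], 1) * alarm["severity"]


# Highest score first: this is the order the analyst queue would present.
queue = sorted(alarms, key=priority, reverse=True)
```

Being able to show a rule like this, with asset values agreed by the business, is a direct answer to "who wins?"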

Collect the Bread and Butter Data Sources


There are many baseline systems that need to be monitored because they
represent key data sources that you need in an incident and that support
compliance. This is part of the EDIS process. At an absolute minimum, the
information systems and data sources to collect are:

1. DNS activity, with a focus on internal to external activity first (about 8% to
10% of your networks' DNS request/response traffic).
2. Windows Domain Controller security log.
3. Most, if not all, Windows member servers.
4. Account life cycle, process execution, and presence indicators from
workstations. This item is best accomplished using Windows Event
Forwarding and event subscriptions because this is a native built-in
capability in Windows, and prevents the need to deploy yet another agent.
5. Perimeter firewall. At a mínimum, any outbound 'denies', accept and deny
traffic to the DMZ, and platform changes. If you have capacity, collect
outbound accept events as well, provided you cannot get a better data
source for the communication flow. For example, if you have a proxy, you
can consider not recording firewall data to/from the proxy if you can get the
proxy logs. Proxy logs are superior to firewall logs as they are application
aware and are user attributable whereas firewall data is not usually user
attributable.
6. Database Account activity and account management.
7. For Linux, the minimum to collect is the sudo, auth, and authpriv logs.
8. Antivirus centralized console data.
9. Forward (outbound) proxy data. For the proxy, validate that the system
records the user agent, referrer, the URI query string, and the allow/deny
decision. If the proxy understands the site type, that is also useful.
10. Document editing "in the cloud", such as Google's GSuite or Office 365. This
means who touched which file and how.
11. Shared Storage file system activity, as in who touched which file and how
for user and process exposed shares.
12. VPN activity.
13. DHCP transactions.
14. Network device authentication, which usually arrives through RADIUS or
TACACS+. Further, network change detection, which usually comes from
Syslog events.


Once you add your own "must-haves" to this list, your next task is to get the
daily data volume for each source. Volume has three factors: events per day,
average event width, and the typical peak or spike times. From there, you can
estimate the capacity you will need for your platform.
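
The three volume factors above can be turned into a rough capacity figure. The sketch below is one possible way to do that, not a vendor sizing method: the source names, event counts, event widths, and the peak factor (peak rate divided by average rate) are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    name: str
    events_per_day: int    # measured daily event count
    avg_event_bytes: int   # average event width in bytes
    peak_factor: float     # assumed ratio of peak rate to average rate


def daily_gb(source: LogSource) -> float:
    """Raw daily volume in gigabytes (10**9 bytes), before any compression."""
    return source.events_per_day * source.avg_event_bytes / 1e9


def average_eps(source: LogSource) -> float:
    """Average events per second across a 24-hour day."""
    return source.events_per_day / 86_400


def peak_eps(source: LogSource) -> float:
    """Estimated events per second during the source's spike period."""
    return average_eps(source) * source.peak_factor


# Hypothetical figures; replace them with measurements from your environment.
sources = [
    LogSource("dns", 8_640_000, 250, 3.0),
    LogSource("domain_controller", 4_320_000, 500, 2.0),
]

total_gb = sum(daily_gb(s) for s in sources)
# Summing the peaks assumes every source spikes at the same time (worst case).
total_peak_eps = sum(peak_eps(s) for s in sources)
print(f"{total_gb:.2f} GB/day, peak about {total_peak_eps:.0f} EPS")
```

Keeping one record per data source makes it easy to see which source dominates the total when you discuss sizing with a vendor.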

Useful MBA Concepts: SWOT and PESTL


There are two business management concepts that help when designing,
planning, and building a SOC: SWOT and PESTL.

SWOT Analysis
SWOT is a strategic planning technique used to help an organization identify the
Strengths, Weaknesses, Opportunities, and Threats that every manager should
understand, and be prepared to use in strategic planning exercises. Building a
SOC is an internal business venture, which is affected by both internal and
external pressures. SWOT analysis will improve the business case for your
SOC, will help you plan, and if done well can help identify adversaries that
will launch attacks against the organization. Below is a very brief example to
give you an idea what a SWOT analysis for a SOC project could look like.

Table 1 An Example SWOT Analysis

|   | Traditional Business Mgmt. | Security Operations Example |
|---|---|---|
| S | Characteristics of the organization that provide advantage over others | Technical controls and monitoring capabilities; strong perimeter controls; Policy/Procedure in place and followed; valuable IP in place is protected (and targeted). |
| W | Characteristics at a disadvantage to others (competing projects) | Moderate funding; staff and skills; volume and quality of log data; some log sources or logging capability missing from critical applications. |
| O | Items in the business environment that can be exploited (does not mean technical exploit!) | Improving security software; implement awareness training; local University has Cyber program; current VA program is semi structured. |
| T | Items that can cause trouble or thwart the project | Increasing stealthiness of malware; trusted insider can defeat controls (accidental or can be disgruntled); ownership; resistance to response from alerts; management expectation to "find the bad guy" is high. |


PESTL Analysis
PESTL (Political, Economic, Socio-cultural, Technological, and Legal) analysis is a
framework of the macroeconomic and macro environmental factors pushing
against the organization. It is used by strategic management, marketing, and
business development teams who need a solid understanding of how the
organization will perform given a particular business venture. PESTL analysis will
help SOC planning along two dimensions: the change and pace of technology
that the organization consumes or produces, and the legislative or regulatory
environment where the organization operates and has a presence that requires
monitoring.

Funding the SOC


Always remember that an organization has a specific mission or goal, and it
articulates a set of objectives to achieve its goal in a mission statement.
Since the SOC team is rarely a profit center, it should ensure that funding
requests are aligned to the organization and the applications that enable the
business.

Understand the App Stack. The SOC leadership team needs to understand how
the organization is funded by its value chain, and in turn how much of the IT
spend the SOC may receive in both Capital Expense (CapEx) and Operational
Expense (OpEx). Practically, this means that your team needs an inventory of
the business applications by their criticality, and then a map of the servers,
database(s), storage, and network connections that enable and support these
applications. With this model in mind, you can then work to align your
monitoring controls, capabilities, and instrumentation to support the
availability, integrity, and security of those applications. Your Disaster Recovery
Planning / Business Continuity Planning (DRP/BCP) team can be highly valuable
as they have a criticality-based view of the application stack. If you don't have a
DRP/BCP team, then build the app inventory yourself with the intention that the
work can be used for SecOps, as it will assist IT when there is a need for a
recovery event.
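
One way to hold such an inventory is a small data model that ties each business application to its supporting assets and records whether each asset is monitored. The sketch below is illustrative only: the class names, the 1-to-N criticality scale (1 = most critical), and the sample applications and assets are assumptions, not anything prescribed here.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str        # "server", "database", "storage", or "network"
    monitored: bool  # True if the SOC already receives this asset's logs


@dataclass
class BusinessApp:
    name: str
    criticality: int  # assumed scale: 1 = most critical
    assets: list[Asset] = field(default_factory=list)


def monitoring_gaps(apps: list[BusinessApp]) -> list[tuple[str, str]]:
    """Return (app, asset) pairs without monitoring, most critical app first."""
    gaps = []
    for app in sorted(apps, key=lambda a: a.criticality):
        for asset in app.assets:
            if not asset.monitored:
                gaps.append((app.name, asset.name))
    return gaps


# Hypothetical inventory for illustration.
inventory = [
    BusinessApp("intranet", 3, [Asset("wiki-01", "server", False)]),
    BusinessApp("storefront", 1, [
        Asset("web-01", "server", True),
        Asset("orders-db", "database", False),
    ]),
]

print(monitoring_gaps(inventory))
```

Sorting by criticality puts the gaps that matter most to the business at the top of the list, which is the order a funding request would want to present them in.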

For example, your organization is likely to have an eCommerce presence. There
are several components on the eCommerce chain: digital storefront, order
processing, messaging to the customer and supplier, the WAN link(s), back-end
data storage, staff that manage all of these IT components, and protecting
credit card transactions. In that list alone, there are dozens of servers,
technologies, people, and processes. Therefore, if the security team can put
monitoring and incident response in place so it can detect violations in baselines
and provide assurance that the components are working, it is supporting the
company mission and the ability of the business to sell goods and services.

Finalize the Reasons to Fund SecOps: There are many, many reasons to fund
security operations and the logging infrastructure that SOCs and SIEM platforms
require.

1. Regulatory compliance (HIPAA/HITECH, Sarbanes Oxley, and others).
2. A prior incident may initiate a funding event.
3. Management directive out of a genuine desire or a fear response.
4. The standard you chose to define how IT is structured, and are therefore
audited against, may require a SOC and a logging platform.
5. Logs within a given system are volatile. Some systems, like a Windows
domain controller, only hold log data for a few hours and then the data rolls
and is lost forever. Some systems hold data in memory, in a buffer, and
when the power is lost, so is the data.
6. Without logs, you have no ability to go back in history and find issues -
security, operational, or change related.
7. It is, in fact, "the right thing to do".

Security Operations Centers Cost Components


There are numerous cost components to consider when building a Security
Operations Center. Below are many of the common cost components. For each
of these costs, carefully analyze your current environment as you develop a
"build vs. buy" analysis.

Direct Costs - More information on these follows the list.

1. Internal Staffing Level. A 24/7/365 operation requires at least 5 whole
people, using the barest bones staffing model. Target 9 staff people and one
SOC manager.
2. Vendor neutral Training. Some examples are SANS, Security University,
CompTIA CASP, and ISC2 SSCP.
3. Product Training: SIEM solutions have training provided by the vendor.
4. Tools: Desktop infrastructure, OS, Office, SIEM license, and investigative
tools.
5. Subscriptions: SOCs will have several subscription services, and in
particular, threat intelligence services.
6. Hardware: Server, storage, and network infrastructure. Be aware of the
typical hardware refresh cycles - usually 4 years.
7. Forensic Hardware: Forensic hardware has unique requirements because
these systems are usually isolated onto a small LAN in a locked room. For
example, a customized forensic workstation known as FRED can cost $5K
and up, storing images on central storage can take many terabytes, and
write blocker kits can easily cost $1K and up.
8. Software licensing costs, which include annual maintenance: Software
includes SOC support, additional licenses for various management consoles,
SIEM platform, ingest costs, forensic packages, PDF generation applications,
BI [9] tools, and eDiscovery capabilities.
9. Facilities: SOC Room, furniture, shared large format monitors or projectors,
and proximity card or possibly biometric door control. Also, the forensic
analysis space should have its own separate locks and proximity card
control.
10. Upgrades: Annual upgrades - often handled on a SoW basis with the
vendor.
11. Vendor assistance: Over time, you will need vendor assistance for new
content, improved reporting, more training, and possibly upgrades.

Indirect Costs:

1. Recruiting costs such as a portion of building, recruiting, and staff pay
increases.
2. SIEM selection costs for the initial project, which include labor expended to
specify, review, and select the primary SOC toolset.
3. Developing on the job training for your SOC staff. Note, this will tie up key
SMEs and the time commitment should not be underestimated.
4. Integrating new data sources, which may require customized data parsers,
alerts, and reports to be created within your platform.
5. Periodic internal or external audit support.

Staff Cost Considerations: A Security Operations Center needs to be staffed by
skilled Information Security Analysts. Period. Shiny SIEM solutions don't solve
cases; educated and seasoned people do. After participating in InfoSec since
2001, I can confirm it's just that simple. Highly skilled people can produce
more accurate and timely results with a moderate product set than novices
with a super expensive shiny toolset.

Base pay, benefits, and the inevitable staff turnover disrupt the cost model.
The US Bureau of Labor Statistics lists Information Security Analyst base pay at
$92,500 in 2016, and $95,510 in 2017. If your internal overhead load is 30% for
administrative costs, a loaded position costs $124,163/year. At this rate that is
$620,815 for five people per year in 2017 dollars. A 2016 study published on
glassdoor.com can help to understand the hiring climate: Companies spend
$4,000 to fill a position through open recruitment with a 52-day vacancy period, and
9 For example, Advizor Analytics and/or Tableau.


47% of candidates decline an initial offer. Further analysis found 50% of
employees left a position due to their manager. These numbers help to define
how much a temporary contractor would cost if you cannot find FTEs or need
to replace an FTE. To minimize this cost, concentrate on getting SOC staff
through the investment zone period and into the return zone period by
onboarding them into handling specific SOC services and IR tasks as rapidly as
possible.
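
The loaded-cost arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the figures quoted in the text (2017 base pay of $95,510 and a 30% overhead load); the function name is illustrative, and the overhead is passed as a whole-number percentage so the result stays exact.

```python
def loaded_cost(base_pay: int, overhead_pct: int) -> float:
    """Annual cost of one position once overhead is added to base pay."""
    return base_pay * (100 + overhead_pct) / 100


BASE_PAY_2017 = 95_510   # BLS figure quoted in the text
OVERHEAD_PCT = 30        # internal overhead load from the text

per_analyst = loaded_cost(BASE_PAY_2017, OVERHEAD_PCT)
team_of_five = 5 * per_analyst
print(f"${per_analyst:,.0f} per analyst, ${team_of_five:,.0f} for five")
```

Substituting your own overhead rate and head count gives a quick first-pass labor figure for the business case.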

The staffing cost is further influenced by the coverage model. If the team
operates 24/7/365, that requires 4.52 people, or 5 FTEs at a bare
minimum, to staff the SOC with a single person staffing the facility. A lonely job
indeed. This value is based on 8,760 hours per year, two weeks paid time off, and
8 holidays, which yields 48.4 work weeks, or 1,936 hours of coverage per person.
In reality, any 24-hour operations team should plan on at least nine people to
accommodate vacations, sick time, and staff turnover. Five people will cover the
shifts, and the remaining four will provide additional coverage during high
activity hours, such as 7AM to 7PM and some portion of Saturday. Missing from
this estimate is the percentage of "admin" time. Admin time consists of all other
company required tasking that detracts from conducting actual heads down job
duties.
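
The 4.52 figure follows directly from the hours involved. The sketch below reproduces it from the inputs stated in the text (two weeks of paid time off, 8 holidays); the 8-hour day and 5-day week are assumptions that the 1,936-hour figure implies, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of coverage for one seat


def coverage_hours_per_person(pto_weeks: int = 2, holidays: int = 8,
                              days_per_week: int = 5,
                              hours_per_day: int = 8) -> int:
    """Hours one analyst is actually available per year."""
    work_days = 52 * days_per_week - pto_weeks * days_per_week - holidays
    return work_days * hours_per_day


def people_per_seat() -> float:
    """Head count needed to keep a single seat staffed all year."""
    return HOURS_PER_YEAR / coverage_hours_per_person()


hours = coverage_hours_per_person()
ratio = people_per_seat()
print(f"{hours} hours per person, {ratio:.2f} people, {math.ceil(ratio)} FTEs")
```

Admin time is not modeled here; subtracting it from the available hours pushes the ratio higher.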

Incident response has a very high burn out rate compared to other technical
professions. Therefore, to compensate, rotate your SOC front line through
different SOC services so that they have variance in job duties. Also, look for
analysts that aspire to move up and not stay doing the same thing every day.
SOC managers should always evaluate opportunities to vary the job duties in
order to retain people.

Facility/Space: Most organizations have a per square foot rate for office space
that should be in the cost structure. As an example, the average 2016 cost per
square foot in Atlanta, Georgia was $20.01 for Class A space and $16.36 for
Class B space [10]. If you assume 90 square feet of shared workgroup area for
each of two workspaces available to a five-person team on rotation, the monthly
office space cost at the Class B rate is $2,944.80 (180 square feet x $16.36)
for a two-person SOC. There are other single purchase
costs, however. For example, you may want two large format monitors and PCs
to run them with everything mounted on the wall. That could easily be a $4,000
single event cost item.

10 From Offices.Net, August 2017.


Vendor Neutral Formal Education: Assuming you can find security people, you
will still likely need to train them in SOC operations. As of August 2018, one of
the best courses from SANS Institute is "SEC511: Continuous Monitoring and
Security Operations" with its corresponding GIAC GMON certification. This
course covers the practical skills needed for every SOC staff member. I can
attest that there is a measurable improvement to the quality and speed of each
analyst who completed this course and the corresponding certification. The
August 2018 cost weighs in at $6,939 USD. Also, ensure that the travel and
hotel costs are included when looking at the cost of training, and estimate
$1,800 [11]. Stay for Day Six and compete for the SEC511 coin [12]!

Product Training: Every major SIEM vendor has a series of product training
courses. These are usually included in the initial proposal and implementation.
There will be a cost for new staff as they come onboard. To defray the cost, I've
had success by asking for a training credit with an upgrade, system
enhancement activity, or when adding a new product component.

Organizational specific training: On the job training will be a continual process.
There are numerous studies that quantify the cost to develop robust and
reusable training. Data from The Association for Talent Development [13] listed the
time to develop an hour of instructional delivery (a formal class) at between 43 and
185 hours for stand-up professional instruction, with numerous factors affecting
the time. Don't count this lightly, and don't ignore it. For concise insight into the
learning organization and how valuable it is to the company, review Chapter
Seven in Paid to Think by David Goldsmith.

Desktops: Analyst hardware, monitors (the more, the merrier!), monitor arms,
client-side software, analyst licenses for dozens of applications, furniture, and
lighting. Think Quad 24" or 27" displays, and an Ergotron type quad arm. Nice!

Vendor Support: Dedicated vendor support for the security product suite, use
case implementation, and continual upgrade processes. These support
relationships are usually priced on a per hour basis with a minimum number of
hours per week. If your SIEM vendor charges $225/hour and sells this support
arrangement with a minimum of 4 hours per week, the annual support cost is
$46,800 ($225 x 4 hours x 52 weeks).

Infrastructure: Back end servers, sensor platforms, network instrumentation
such as a TAP, and multi-tiered storage. Plan for a per server hardware refresh

11 I have had good luck shaving a bit by staying in the hotel next door.
12 Coin image provided by the SEC511 course authors and is used with permission.
13 https://www.td.org/newsletters/learning-circuits/time-to-develop-one-hour-of-training-2009


at 3-4 months before your server maintenance period ends to ensure you are not
paying a premium for keeping a server online outside of its maintenance
window. If you need six servers at $8,000 each, that's an initial capital outlay of
$48,000 just for the hardware. Plan for 20% annual maintenance, and
technology refresh at the four-year mark. Most of the SIEM implementations
I've done were virtualized using VMware 5 and 6. This model can be quite
successful - but you will need to add in the cost for the hypervisor. When it
comes to actual capacity, spec 2 more CPUs and 4 more GB of memory than you
think you need and virtualize your platform on just that server. The long-run
benefits you will gain are tremendous. These include volume snapshots, copy
over to the new platform during tech refresh, and the ability to more easily mix
and match drive configuration.
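
The hardware figures above can be rolled into a cost for one refresh cycle. The sketch below uses the numbers from the text (six servers at $8,000, 20% annual maintenance, four-year refresh) and assumes, for illustration, that maintenance is paid in each of the four years. Hypervisor licensing is left out, and the function name is hypothetical.

```python
def hardware_cycle_cost(servers: int, unit_price: int,
                        maintenance_pct: int, refresh_years: int) -> dict[str, int]:
    """Capital outlay, yearly maintenance, and total for one refresh cycle."""
    capital = servers * unit_price
    annual_maintenance = capital * maintenance_pct // 100
    # Assumption: maintenance is charged every year of the cycle, year one included.
    cycle_total = capital + annual_maintenance * refresh_years
    return {
        "capital": capital,
        "annual_maintenance": annual_maintenance,
        "cycle_total": cycle_total,
    }


costs = hardware_cycle_cost(servers=6, unit_price=8_000,
                            maintenance_pct=20, refresh_years=4)
print(costs)
```

Spreading the cycle total across the refresh period gives an annualized figure that is easier to compare against an outsourced or hosted option.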

Integrating new data sources: To minimize impact as new systems are brought
online, incorporate the SOC engineering staff in the IT provisioning process. The
objective is to ensure that new systems or major system updates can provide
relevant log data to the SIEM/SOC team. This labor charge should be assigned to
the application or system, not SecOps, and is a recurring cost item for the life of
the SIEM.

Content Development: Developing new use cases and refining current ones within
the SIEM solution. Current use cases will also be updated based on
improvements from the threat hunting team. The better you define the input
data, content needed, analysis, rules, notification, SOC actions, and outcome
desired, the more accurate a cost you will have and the higher the opportunity
for success.

SIEM Software: SIEM platform licenses are most often driven by a sizing factor.
Typical factors are the "ingest" rate in events per second (EPS), GB per day, or
monitored device counts. You will find there is a class of non-security relevant
events that arrive at the SIEM along with the data that's really needed.
Depending on numerous factors (event cost, processing horsepower, log
storage, event width) you may want to develop a tiered logging method. Costs
can vary widely here, from $20,000 per year and on up. The better you define
your environment, the better an estimate you will get from a vendor.

SIEM Software Upgrade: Some upgrades for enterprise systems can be
performed through an update process, and some cannot. Experience with five
different platforms leads me to advise that a complex upgrade, such as a major
upgrade, may be better off outsourced to the SIEM vendor. A typical SoW will
be 40 to 80 hours at the vendor's rate plus travel and expense. Ensure that you
investigate this fully with your vendors and integrate at least one annual
upgrade event into your platform plan.


Audit Evidence Support: The SOC is often asked to support reporting on security
event data and incidents in direct support of an internal or external audit.
Staffing this specific role is closely related to the regulatory environment and
how often auditors make requests. To estimate this, determine how many
audits your organization responds to per year and the reporting needed to
support those audits. The SOC team should always record their time to support
audits, as this is a service to the business. If you have recurring audits tied to a
particular unit, then ask for accounting charge codes to document costs for the
SOC to support operating units.

In House vs. Outsourced vs. Hybrid SOC


Now that you have a structured outline of the costs and most of the long-term
factors involved in building a Security Operations Center, you are in a better
position to consider the pros and cons of outsourcing part or all of the SOC
function. Some empirically based observations on engaging outsourced
Managed Security Service Providers (MSSPs) are listed here:

1. Startup time will have an impact. MSSPs in effect deploy a partial to full
SIEM solution on your network. Each data source needs to be integrated
into the platform, hardware will need to be deployed, and your organization
will still need to define your own incident response process.
2. An MSSP will only be able to go just so far when investigating alerts. If you
are fortunate, the MSSP can cover 50% to 70% of the alarm conditions well
and will engage your organization on 15% of the observed alerts.
3. MSSPs will never know your network like you do, and you cannot easily
quantify this impact to their quality of service delivery. MSSPs also are
unlikely to know what changed on the network, as they rarely participate in
change control.
4. MSSPs work with you through a defined SLA and reporting relationship.
They cannot replace your own staff who can reach out directly to a system
custodian - this is an invaluable benefit to having in-house staff functioning
in a security operations role.
5. Their opinion on alarm sensitivity and configuration is not your opinion
because they tend to look at "genuine threat" conditions, and will ignore or
tune out many other conditions. Your use of the SOC should include policy
issues, threat hunting, audit reporting, and gleaning operational value from
the mountain of data the SIEM will consume.
6. There are some tasks that should be outsourced if they are infrequent, like
system forensic analysis. However, the battlespace of today tells us that we
need a memory image more than a disk image. This is nigh impossible for a
third party MSSP to collect but may be within their ability to analyze.


7. You cannot delegate responsibility to a third party for the security of the
assets under your organization's care, no matter how much someone tries to
convince you that you can. You may be able to delegate authority to
operate, but not the responsibility of system security.
8. 24/7/365 monitoring by a third party will cost you less for the labor
component than building your own 6-8-person team. There is no getting
around that fact. If you are being pressured to outsource, recognize this
argument and devise ways to respond to it.
9. MSSPs may also be able to perform system upgrades and very likely have
done more upgrades than you. Factor a week or two for an annual upgrade
into the cost of your deployment.
10. Lastly, you get out of an MSSP relationship what you put into it. If you do
not invest time, then don't assume that they will give you stellar results.

Getting into the Hunt


Historically, a SOC would monitor a variety of prevention-oriented systems and
respond if one, or many, of these platforms alerted the team. Then they would
spend time validating the alert, communicate with the system custodian, owner,
or the end user, and if the situation were an incident, they would respond.

The "reactive or detective only model or posture" from the 2000's is no longer
effective today. Today's SOC teams need to change their focus, assume that
there is a likely compromise, become detection oriented, and proactively mine
the vast amounts of data coming into their systems and actively look for
patterns of intrusion and misbehavior. Proactive threat hunting is an ideal
career and skill development path for SOC analysts. Once they understand all of
the organization's data sources, know how to handle alerts, and demonstrate
that they have established research skills, capitalize on that and get them
involved in threat hunting. Depending on how the SOC team operates, you
could have a SOC analyst perform hunting one day a week, one week per
month, or take a particular hunt pathway. Threat hunting is further defined on
page 171, with numerous use cases beginning on page 61.

SOC Directly Supports the CSIRT Function


Today, the need to develop some form of Computer Security Incident Response
Team (CSIRT) function can't be ignored. In many organizations or industries, a
CSIRT is mandated by specific regulation. The need for a CSIRT is especially true
with modern malicious software running rampant, automated ransomware,
industrial espionage, criminal elements, nation-state grade hacking teams, and
a host of other aspects of digitally based asymmetric warfare. Sounds sensational,


doesn't it? Today's cybercriminal will exploit any weakness they find to extract
untraceable digital cryptocurrency from any potential victim. Regardless of the
degree of sensationalism, there are several reasons to advocate for and build a
CSIRT function in your organization, which is in turn supported by the SOC
function.

1. The SOC provides an active detection capability that should enable early
response and limit the long-term impact from an incident. The CSIRT can
then coordinate responding to an incident with the goal of identifying,
containing, and reducing incident impact. Further, the CSIRT function can
return Indicators of Compromise (IoCs) back to the SOC so that the SOC
can perform historical data analysis, such as searching for internal systems
that communicated with a found IP or domain name. Thus, a more mature
CSIRT and SOC team can capitalize on a Threat Hunting capability in order to
seek out and find a malicious agent that was recently on the network.
2. The CSIRT influences, supports, and fully leverages the security spend. It
helps to ensure that the SOC tooling in place will support incident
response and Security Operations. Or put a different way, having a single
CSIRT should ensure that the best tool for the environment is in place, can
be leveraged, and others are decommissioned in order to maximize the
dollar investment.
3. Maintain objectivity when interacting with internal staff, classifying
incidents, and prioritizing the response process. One of the challenges that
a CSIRT will need to deal with is staff relationships and maintaining
objectivity. Like the HR function, which is charged with enforcing company
policy and procedure in a uniform and consistent manner, the CSIRT needs
to be objective, perform its work to protect the business, and avoid playing
favorites.
4. Lastly, regulation. There are numerous aspects of the business environment
that mandate an incident response capability. These include, but are not
limited to: HIPAA/HITECH, PCI DSS 3.2, and Sarbanes Oxley compliance.
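
The historical analysis described in item 1, finding internal systems that communicated with a known-bad IP or domain name, can be sketched in a few lines. This is an illustration only, not a SIEM query: the event fields, the use of RFC 1918 ranges to define "internal", and every address and domain below are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Assumption: internal systems live in the RFC 1918 private ranges.
INTERNAL_NETS = [ipaddress.ip_network(n)
                 for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass(frozen=True)
class ConnectionEvent:
    timestamp: str    # ISO 8601 string from a hypothetical log source
    src_ip: str
    dest_ip: str
    dest_domain: str  # empty string when no domain was logged


def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def hosts_contacting_iocs(events, ioc_ips, ioc_domains):
    """Return the set of internal source IPs that reached any IoC."""
    hits = set()
    for event in events:
        if not is_internal(event.src_ip):
            continue
        if event.dest_ip in ioc_ips or event.dest_domain.lower() in ioc_domains:
            hits.add(event.src_ip)
    return hits


# Hypothetical history and IoCs (documentation address ranges and domains).
history = [
    ConnectionEvent("2018-08-01T10:00:00Z", "10.1.1.5", "203.0.113.7", ""),
    ConnectionEvent("2018-08-01T10:05:00Z", "10.1.1.9", "198.51.100.20", "Bad.Example"),
    ConnectionEvent("2018-08-01T10:06:00Z", "10.1.1.12", "192.0.2.10", "good.example"),
    ConnectionEvent("2018-08-01T10:07:00Z", "8.8.8.8", "203.0.113.7", ""),
]
affected = hosts_contacting_iocs(history, {"203.0.113.7"}, {"bad.example"})
print(sorted(affected))
```

In practice the same logic would run as a search inside the SIEM over DNS, proxy, and firewall data; the value of the sketch is in showing which fields those sources need to retain.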

Metrics for the SOC

"What cannot be measured, cannot be managed."
- W. Edwards Deming.

"Not everything that counts can be counted, and not everything that can be
counted counts."
- William Bruce Cameron

Mature business operating units and enterprises utilize various methods to
measure the operating unit's effectiveness. The SOC is no exception. The
question is how do you get there and avoid toxic metrics that demotivate your
staff?


In the book Pragmatic Security Metrics, W. Krag Brotby and Gary Hinson make
several key points about metrics. Some of their key definitions are listed here
from chapter 1.6:

1. Instrument: short for "measuring instrument," that is, a device for
measuring.
2. Measure: (verb) to determine one or more parameters of something.
3. Metric: a measurement in relation to one or more points of reference.

In Section 2.6, they state "Having valid metrics enables business managers to
make rational, sensible, and, for that matter, defensible decisions about
information security. No longer must they rely entirely on advice from
information security professionals or generic good practice standards, laws, and
regulations."

In Section 3.2, they state "Metrics are primarily a decision support tool for
management. Good metrics provide useful, relevant information to help
people - mostly, but not exclusively, managers - make decisions based on a
combination of historical events (the context), what's going on right now
(including available resources and constraints), and what is anticipated to occur
in the future (the change imperative)."

There are numerous metrics and measurements that can be developed and
applied to a SOC. When it comes to designing a metric, there are several criteria.

Take these criteria into account as metrics are developed.

1. Metrics should be relevant to the goals, objectives, and the mission of your
SOC as a business unit. This means you should be able to describe what you
do numerically, and also explain what you don't measure.
2. As you evaluate your data, security use cases, and metrics, be sure to
develop a roadmap of what you will measure, how those measurements will
be captured, the tie in to the IT General Controls and security program, and
the business value chain where possible. Metrics can then provide guidance
for decision making.
3. Be sure that what you are measuring will drive towards a measurable
outcome. Don't measure for the sake of measuring. Every metric should
inform a consumer and seek either to change behavior or to demonstrate
that current behavior is operating well within established procedures. Here,
you should be clear on any action that you expect a consumer of the metric
to take based on the measurement.
4. A metric should support a "control", and therefore should match up to your
ITGC program. If you do not have one, look to standards like ISO 27002.
5. Bad data is marginally better than no data at all. If you are not collecting any
source data for what you need to measure, start there, but do not stop. Use
good data to its fullest.
6. Avoid burdening the analyst with the need to record an excessive amount of
detail using some artificial means to track what they do, like a complex
spreadsheet. Instead, develop methods to mine the SIEM platform, the
workflow system, opening investigation coding, and alarm closure codes as
they work through alerts. These methods provide an economy of
mechanism (EoM), because the analyst is using their native tool. Also,
following EoM principles pushes you to consistently leverage an internal
capability of the SIEM platform, which makes mistakes less likely.
7. Tell your story in terms of your business / organization. When telling stories,
it is very important to remember the audience and use terms and
definitions that they will understand.
8. Two key acronyms come to mind:
a. Be SMART: Specific, Measurable, Achievable, Relevant, Time-bound
b. KISS, or Keep It Simple, Sam/Susan. This isn't actually meant to be
cute. Rather, ensure that the name of the metric and the
measurement scale are obvious. A metric that requires explanation is
not likely to be an effective metric.
9. Determine how you can build a score card that measures the information
and technical security posture of your organization. Whenever possible,
build a tool to demonstrate how effectively the technical tools work. You may
not show this tool to management - but you should be ready to.
10. Work hard to avoid any toxic metrics, which are ones that can end up
punishing someone who "doesn't close alerts fast enough" or "only works
on five cases per day". Instead, focus on metrics that

Table 2 Example General SOC Metrics

| Metric [14] | Definition and Notes |
|---|---|
| # Unique Data Source Types providing SIEM data | Defines how many different information system data source types are consumed and available for analysis. This measures how many of your technical systems and applications are instrumented. There is a corresponding percentage of coverage in unique sources / total sources. |

14 In the table, MTT means "Mean Time To".



| MTT Detect Data Source Issue | How long does it take to detect that a data source is not functional, which is affected by the volume and velocity of data arriving from the data source? |
| MTT Correct for Data Source Issue | How long to resume data delivery once an error is detected in data delivery? |
| Time to sweep the enterprise | The security operations function should be able to check every host in the enterprise for IoCs, or the host's security state. Ideally the average time to interrogate the enterprise should decrease over time, and the percentage of completeness should increase (# investigated / total #). |
| % of apps under ALCE [15] monitoring (Non-AD Integrated) | Measures how many shared applications report account life cycle events. Metric assumes that the app inventory is known. Applications that defer to the central directory either through native integration or LDAP query are counted. Desktop apps are not normally included in this metric - only shared, server based, or SaaS applications. This metric has an implicit assumption: you don't need to measure applications that are AD integrated for ALCE events. You will need to validate this premise for your own organization. |
| % of SaaS under periodic ALCE monitoring | Integrating SaaS into a SOC or a SIEM can be problematic. As a compensation and at a minimum, all SaaS applications should be periodically reviewed to ensure that users defined in the application either have accounts in the central directory or can be accounted for, and that there are no disabled organizational accounts which are enabled in the SaaS application. |
| MTT Close an alarm by Close Category | Measures the decision-making process by the SOC analyst to close an alarm when it can be explained, processed, escalated, is non-reportable, or non-actionable (example close codes). WARNING: This metric and ones similar to it can easily become TOXIC to your staff. Be very careful in adopting this metric. Instead, search for alternatives to show that the staff can effectively respond to a portion of alarm conditions on a daily basis. |

15 ALCE: Account Life Cycle Event

MTT Forward an alarm up Tier: Measures the lowest level analyst response
time to identify that an alarm requires further action or resolution by
the next level analyst or application SME.

MTT Open a formal Incident: The SOC function may open up a formal incident
at any point in the alarm review process. This measures the overall SOC
and Incident Response team's capability to detect that an alarm or another
condition is indeed a "security incident".

MTT Implement a use case: Measures how long it takes to define, document,
instrument, and train on a specific SOC Use Case (see Security Monitoring
Use Cases by Data Source beginning on page 61 for more information on SOC
Use Cases).

# of Implemented Use Cases: Each organization will have a defined set of
security conditions that the SOC is able to handle with a supported use
case or a response playbook. This metric measures SOC capability and
coverage.

Cautions: do not count the number of "detections" from your SIEM platform,
or "alerts". Rather, take those into account and define your SOC playbook.
This metric also guides how much training is required for new analysts and
how well you are doing at mining your source data.

Also, the "# of new event conditions converted to Alerts" can count here,
but may not warrant a full use case.

# of Use Cases (rule) that never fire: While it is true that you can
create, and test, a notification process for a rare condition, the SOC
should keep an eye on use cases that never cause an alarm or don't prove
out over a reasonable period, say one month. Try to keep use cases that
never fire minimized.

# of Events Received: As a raw number, this metric isn't tremendously
valuable. A better number is to measure events by severity, priority, or
criticality - and don't get confused here.

# of Alerts by Severity: Based on a combination of your SIEM platforms.

# of high severity alerts not reviewed after 8 or 24 hours: Measures how
well the SOC does at putting attention on all "high severity" or "high
priority" alarms, per shift, and per day. If the frontline analysts are
not capable of keeping up with alarms, then consider adding more staff,
improving automation, and defining "close/escalate" criteria on these
alarms. Once done and under control, this metric/measurement would push
down the severity levels.

Rules Tuned to Minimize False Positives (per week/month): Every SOC should
have at least one staff member who spends some time improving the
notification rules within the platform. The better a SOC tunes the
platform, the better it is demonstrating understanding of the environment.

ATT&CK Coverage by Phase: The MITRE ATT&CK matrix (page 174) is a knowledge
base for understanding adversary behavior and the attack life cycle. This
matrix can be used to evaluate how well the SOC and current
instrumentation can identify the presence of an attacker on the network.
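
Several of the metrics in Table 2 reduce to simple arithmetic over inventory counts and alarm records. The following is a minimal Python sketch of the coverage percentage and the "MTT Close by Close Category" calculation. It is not a feature of any SIEM product: the record fields (`opened`, `closed`, `close_code`) and the sample values are hypothetical, and it assumes you can export alarm records with timestamps from your own platform.

```python
from datetime import datetime
from statistics import mean


def coverage_percent(instrumented, total):
    """Percentage of known sources (or hosts) that are instrumented."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return 100.0 * instrumented / total


def mean_time_to(records, start_key, end_key):
    """Mean elapsed minutes between two timestamps; None if no usable records."""
    minutes = [
        (r[end_key] - r[start_key]).total_seconds() / 60.0
        for r in records
        if r.get(start_key) and r.get(end_key)
    ]
    return mean(minutes) if minutes else None


def mtt_close_by_category(records):
    """Group alarm records by close code and compute MTT Close for each group."""
    by_code = {}
    for r in records:
        by_code.setdefault(r["close_code"], []).append(r)
    return {code: mean_time_to(rs, "opened", "closed") for code, rs in by_code.items()}


# Hypothetical alarm export; field names and values are illustrative only.
alarms = [
    {"opened": datetime(2018, 8, 1, 9, 0), "closed": datetime(2018, 8, 1, 9, 30),
     "close_code": "non-actionable"},
    {"opened": datetime(2018, 8, 1, 10, 0), "closed": datetime(2018, 8, 1, 11, 30),
     "close_code": "escalated"},
    {"opened": datetime(2018, 8, 1, 12, 0), "closed": datetime(2018, 8, 1, 12, 10),
     "close_code": "non-actionable"},
]

print(coverage_percent(42, 60))       # 42 of 60 known source types instrumented
print(mtt_close_by_category(alarms))  # minutes per close code
```

Reporting the result per close category for the team as a whole, rather than per analyst, is one way to keep this measurement from turning into the toxic metric the table warns about.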

Incident Response metrics are not the same as SOC metrics because, in many
cases, they pick up where alarm processing ends. In other words, when the SOC
identifies a true incident, it will turn that over to an incident response
function.

Table 3 Example Incident Response Metrics

Metric: Definition and Notes

Cost per incident: There are two dimensions to cost: One is an accumulation
of non-FTE cost contributions, such as paying for credit monitoring during
a data leakage event. The other is the number of FTE hours lost.

MTT to Detect a Security Incident: How long does it take for the SOC to
review an alarm and determine that it is, in fact, some sort of incident?

MTT for Detect to Contain: Once a security incident has been verified,
several steps are taken to determine how to "stop the bleeding". Some
issues can be easily contained, such as removing a single infected
computer and replacing it; some cannot, such as changing the codebase for
a complex application.

MTT to expel an intruder: Once a true intruder is identified, meaning a
real adversary, how long does it take for the security team as a whole to
push the intruder out of the network? Be careful to analyze and report
this correctly for your environment to ensure that consumers understand
that there are more decision makers than just the SOC involved in this
metric.

Incidents opened and closed: These should be trailing numbers, meaning as
incidents are opened there should also be incidents being closed. These
are measured per day, week, and month.

Avoidability of an Incident: Incidents should conclude with some form of
Lessons Learned function, meaning that as a result of an incident the
security posture of the organization is improved to the extent possible.
If it is determined that the incident would have been avoidable had a
common security practice been in place, report it.

Thoroughness of eradication practices: Measures whether or not the original
compromise event, or one that is substantively the same (like a remote
exploit), is observed subsequent to the first occurrence.

MTT Notify Principal, System Owner, or Custodian (Incident metric): The
recipient of an alarm condition may be a little hard to track down. There
are at least three possible recipients - a principal within the
organization, such as an operational director responsible for the affected
system(s), an actual designated system owner, or a custodian such as a
system administrator. The SOC should define its escalation points and
determine how to measure how well and quickly it communicates to the
designated recipient.
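
The "Incidents opened and closed" row in Table 3 describes trailing numbers. One possible way to produce them is sketched below in Python: it counts incidents opened and closed per ISO week and carries a running open backlog. The incident records and their `opened` / `closed` fields are hypothetical, and the sketch assumes all records fall within one calendar year so that the week labels sort correctly.

```python
from collections import Counter
from datetime import date


def iso_week(d):
    """Label a date with its ISO year and week, e.g. 2018-W32."""
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"


def opened_closed_per_week(records):
    """Return (week, opened, closed, open_backlog) rows in week order."""
    opened = Counter(iso_week(r["opened"]) for r in records)
    closed = Counter(iso_week(r["closed"]) for r in records if r.get("closed"))
    rows = []
    backlog = 0
    for week in sorted(set(opened) | set(closed)):
        backlog += opened[week] - closed[week]
        rows.append((week, opened[week], closed[week], backlog))
    return rows


# Hypothetical incident register; the third incident is still open.
incidents = [
    {"opened": date(2018, 8, 6), "closed": date(2018, 8, 8)},
    {"opened": date(2018, 8, 7), "closed": date(2018, 8, 14)},
    {"opened": date(2018, 8, 15), "closed": None},
]

for row in opened_closed_per_week(incidents):
    print(row)
```

A backlog that grows week over week suggests that incidents are being opened faster than they are closed, which is the condition the trailing numbers are meant to expose.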

SOC Training, Skills, Staffing, and Roles


Effective security operations teams require technical skills, should possess
certain personality traits, and require product training by role. Staffing your SOC
team is critical to success. It is important to understand that you cannot "make"
ninja grade incident handlers and SOC analysts who can synthesize a dozen data
sources in real time and find "the bad guy" after completing a one-week course
and passing an exam. That type of skill only comes with "time in the game".
What you can do is develop people, provide them the opportunity to grow, and
develop strategies to keep them in the Analyst seat. You can train people to
respond to specific alarm types, handle specific cases, and work specific
processes. There will always be "task driven" work that needs to be done by
some level of SOC analyst. Playbooks make this level of staff effective, which in
turn gives them success, and that leads to staff who want to do more. Those are
the people you want to identify and grow.

SOC Onboarding and Initial Training


Training for SOC will come in several flavors:

1. Product skills: This is actual vendor solution training. It is achieved
through vendor offered training on the SIEM platform itself or from a major
technology vendor, such as the Cisco CCNA Cyber Ops program, because that
course material targets the actual products in use.


2. Vendor neutral skill development: These skills should cover the job role,
tasking, and concepts that the analyst needs to do the job regardless of the
technology platform. There is no shortage of education available, such as
college courses; certification courses like EC-Council Certified Security
Analyst, ISACA's Certified Information Security Manager, CompTIA
Network+, Security+, and CompTIA Advanced Security Practitioner; SOC
focused training by SANS; CyberAces; and various Q courses offered by
Security University. For readers in Europe, look into the European
Information Technologies Certification Academy (EITCA) and the European
Union Agency for Network and Information Security (ENISA).
3. On the Job: Internal training relevant to the position and team, internally
developed and delivered.
4. Success and Failure: there's nothing quite like "getting it right" and stopping
the bad guy or missing the bad guy and finding out afterwards.
5. Cyber Range Operator training: intensive exercises offered on a dedicated
large-scale lab with a focus on hands on skill development.
6. And many more.

Your training program should include a mix of items from the list above, with
the goals of ensuring that your analysts develop all of the skills listed in the next
section. As new staff are onboarded onto the SOC team, you should follow a
well-defined structure in order to ensure that they will have the best possible
opportunity to succeed. Below is a sample onboarding model for a SOC analyst
I've used in several organizations. The primary goal of your orientation program
should be to develop the analyst so that they can be self-sufficient at their
particular level after four to five weeks through an onboarding program.


Week SOC Analyst Orientation


1 • Organization onboarding
• Operator Level product specific orientation (usually 2d)
• Side by side observation by new person with a current analyst
• New staff reviews a "good" and "bad" write up (report) of each
major alarm type that they will work at their level (OTJ)
2 • Side by side alarm review and analysis - new staff works through
alerts and is partnered with senior staff
• By the end of the week, new staff should be able to handle
several alarm types on their own
3 • New staff is introduced to more complex case types by
reviewing "good/bad" reports, starts handling more of them
• More advanced product training focused on system health,
uptime, and data flow monitoring
• Schedule time w/ each next level analyst/case lead to expand
knowledge of various SOC areas
4 • Shift and responsibility rotation - work various shifts
• Longitudinal "weekly" report contribution
5 • Skills acquisition testing, which should consist of scenarios that
support assessing how much the analyst has learned

SOC Analyst Skills


Historically, successful IR and SOC people need to have a diverse IT background,
should have a few years in the IT game, require continual training, and have
a variety of skills in order to handle the breadth of cases a SOC will face. In
particular, an analyst needs to understand how to "connect the dots".
Connecting the dots is a skill developed over time. Analysts can be educated on
understanding a given data source, but must acquire the skill to understand one
event in context of another over time.

Analysts also need to learn how to efficiently preserve case relevant information
as they work through an alarm investigation. An effective technique is to
capture relevant data while they write a ticket or draft an incident report,
instead of attempting to reconstruct data after an incident is over.

Today, in order to address the skills gap, find people who are naturally curious,
can think abstractly, often took things apart and put them back together
during their formative years, have a strong attention to detail, can "connect the
dots", and lastly can perform research.


Below is an Analyst Skill Development Recipe which came from surveying over
30 SOC and Security Managers during Q1/2017, and was then refined as I
implemented that advice. SOC Analysts need to understand:

1. The "Attack" process and phases: Recon, Scan, Initial Compromise, Establish
Persistence, Command and Control, Lateral Movement, Target Attainment,
Act on Objectives, Exfiltration, Covering Tracks, Leave without Trace. The
MITRE ATT&CK framework is an invaluable learning tool in this space, as is
the Cyber Kill Chain as described on page 184.
2. Ethics: Ethical behavior means that they work within their limits, ask for
assistance, do not overstate conclusions, never fabricate an opinion on
weak facts, keep confidential all of the data they use, and meet their other
professional responsibilities.
3. Organization specific data familiarity: Familiarity with your organization's
source data (event types and fields) so analysts can tease out fact data that
can "connect the dots" to the alarm event in the overall context of the
event stream for the suspect machine or user. This is one of the hardest
skills to develop. As a new data source is added into a given system, engage
the EDIS process: the senior level analyst must prepare an overview of the
data for the remainder of the team.
4. System: Technical system access control capabilities and how a technical
control can be applied.
5. Firewall: Firewall principles such as log types and what the log row records;
rule attributes; actions such as block, allow, reset, drop silently; interface
meaning as it relates to flow; SNAT/DNAT; bytes sent/received; and
understanding security zones.
6. Security zones: In particular, security zones are specific to each
organization. One can infer the meaning of a zone, but should not because a
zone term may not be consistently applied by the various admins or
engineers. As firewall rule sets are built, the firewall team must document
what they mean by a given zone and their underlying assumption about
traffic flow. Active Directory admins need to do the same thing for
organizational unit definitions. In this way a firewall zone named
"eComDMZ" can be connected to an AD OU named "WebSalesDMZ" when
the definitions state "servers used to support web sites used for
ecommerce".
7. PCAP collection and analysis: Network data collection and the use of
tcpdump as the data collection tool and then Wireshark/TShark as the
analysis tool.
8. Forensics: Forensic principles
a. Gathering data on a system following the order of volatility
b. Establishing and maintaining Chain of Custody


c. Documenting actions taken, data extracted, time shift, and timeline
reconstruction
d. Locard's Exchange Principle
9. Hardware: Hardware platform: PC, Server, Router, Switch, TAP, disk boot,
MBR, and identifying volumes.
10. Incident Handling: Incident Handling process: Preparation, Identification,
Containment, Eradication, Recovery, and Lessons Learned. Once someone
understands these points, they need to be able to write an executive
summary of a case.
11. Investigation: Investigative processes as described in the Alarm
Investigation chapter.
12. NAT Issues: NAT translation and the complexity of locating the true source
IP.
13. Network Application Protocols: DNS, SSH, HTTP/S, SMB, NTP, DHCP, FTP,
SSL, SMTP, POP3, IMAP, and SIP.
14. Network Protocol understanding: ICMP, QUIC [16], IP, TCP, UDP, GRE, BGP,
SCTP, and ARP.
15. NIDS: Network Intrusion Detection System (NIDS), ruleset categories and
how to read events generated by an IDS using rulesets such as the Talos
Snort rule set (formerly known as SourceFire VRT) and the Emerging Threats
Pro feed.
16. Application Portfolio: Organization Application inventory, application
behavior, and PoCs.
17. Policy: Organization Policy/Procedure and the ability to politely articulate
and enforce PnP.
18. OS proficiency: Windows and Linux: Services, startup, log files, network
connections, registry, and process identification, and how rights and
permissions are applied to both Windows and Linux.
19. Scripting: Programming, focused on admin and data reduction scripting
skills using PowerShell, Python, shell, Perl, or a similar utility focused
scripting language
20. Report writing [17], including spelling, grammar, word usage, and the ability
to write a summary that answers "Who, What, When, How, and Where",
and make a reasonable assessment of "Why".
21. HTTP, HTTPS, and Web Browsers: Understanding how the most dangerous
application, a browser, works is critical to analyst success. The application
itself is more commonly attacked than the perimeter, because attacking it is
more successful. As evidence, consider the rise in phishing tools and attention
on socially engineering the end user to entice them to click. The ability, and

16 Quick UDP Internet Connections, a protocol promoted by Google that supports multiplexed
connections between endpoints and is supported by almost 1% of webservers (August 2018).
17 Chris Sanders has the only course the author knows of on this topic.


consistent habit, of identifying the source user when a proxy server is
involved in an alarm is essential. When an alarm occurs or a proxy is
identified as participating in an intrusion, the analysts need to always
attempt to find the user, user agent, or system that generated the traffic.
a. Understand user agents, HTTP status codes, URLs, and browser
redirection.
b. Research the URL data revealed through the proxy.
c. Discretion when researching user browsing habits.
22. OWASP Top 10: Following on the need to understand the browser is the
need to understand how an Internet facing application is attacked.
23. All of the skills measured in the CompTIA Security+ certification and many
of the Network+ skill areas so that a SOC analyst is terminology compliant.
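
Two items in the list above, data reduction scripting (item 19) and identifying the source user behind a proxy (item 21), lend themselves to a short illustration. The Python sketch below reduces a proxy log to the client IP, user, user agent, and HTTP status for every request that touched a suspect URL. The log layout is invented for this example and does not match any particular vendor's format; adapt the field list to whatever your proxy actually writes.

```python
import shlex
from collections import Counter

# Hypothetical proxy log layout, invented for this sketch (not a vendor format):
# timestamp client_ip user method url status "user agent"
SAMPLE_LOG = """\
2018-08-16T10:01:02Z 10.1.1.15 jsmith GET http://example.test/login 200 "Mozilla/5.0"
2018-08-16T10:01:05Z 10.1.1.15 jsmith GET http://bad.example.test/payload 302 "Mozilla/5.0"
2018-08-16T10:02:11Z 10.1.2.40 - GET http://bad.example.test/payload 200 "curl/7.58.0"
"""

FIELDS = ("timestamp", "client_ip", "user", "method", "url", "status", "user_agent")


def parse_line(line):
    """Split one log line into named fields; shlex keeps the quoted agent whole."""
    values = shlex.split(line)
    if len(values) != len(FIELDS):
        return None  # skip lines that do not match the assumed layout
    record = dict(zip(FIELDS, values))
    record["status"] = int(record["status"])
    return record


def who_requested(log_text, url_fragment):
    """Count requests per (client IP, user, user agent, status) for a URL fragment."""
    hits = Counter()
    for line in log_text.splitlines():
        record = parse_line(line)
        if record and url_fragment in record["url"]:
            key = (record["client_ip"], record["user"],
                   record["user_agent"], record["status"])
            hits[key] += 1
    return hits


for key, count in who_requested(SAMPLE_LOG, "bad.example.test").items():
    print(count, key)
```

In this sample the second hit has no authenticated user and a non-browser user agent, which is the kind of detail an analyst would capture in the ticket while the data is still at hand.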

SOC Analyst Traits


SOC Analysts need to have specific personality traits in order to be effective at
their jobs, as listed out below.

1. Natural curiosity: SOC Analysts will be faced with an ever-changing array of
problems, situations, and new data sources. Find people who took things
apart and put them back together when they were young, especially if it
worked when they were done.
2. Organizational skills: An ability to perform "rapid research" that allows
them to separate wheat from chaff, and in particular the ability to
determine if an alarm is likely to be real, based on the alarm and data
surrounding the alarm. For example, if a NIDS rule fires from an attack that
was popular three years ago, what conditions must exist today to permit
that attack to succeed?
3. Abstract thinking: In particular, the ability to read intrusion events like
alerts from a Snort/Suricata system and a Palo Alto firewall and correlate
them in near real time to other data sources, visualizing activity patterns in
their minds.
4. Contextualize large data sets: The ability to reduce a larger volume of data
down into information, in context. More specifically, when faced with an
alarm right now, determine if that alarm is relevant to a larger context.
5. Communication: Perform data summarization and commonality detection
such that a group of original facts can produce information, and then
articulate how the information proves or disproves a hypothesis.
6. Ego: A small ego, but not small enough that they don't take pride in their
work.


SOC Roles
There are several roles that need to be staffed in a security operations center.
Depending on the size, scope, and budget, a SOC may have more roles defined. Roles
will also have defined interactions with other key roles, because a SIEM platform and
NSM platform actually have a user community - the SOC analysts - who are supported by
the engineering, architecture, and process support side of the SOC team.

Table 4 SOC Roles and Functions

Role: Duties and Responsibilities

Analysts: As defined in SOC Layered Operating Models on page 52, this role
is the primary SIEM, NSM, and log management system user. Analysts may
function as incident handlers or may directly support the CSIRT function.

SOC Developer: There will be a need to write "utility" software, such as a
log parser, a monitor script, an add-on component for a dashboard, or a
lookup tool. Many SOCs would benefit from being able to utilize
development talent and skills.

Shift Lead: A senior analyst who ensures that all shift responsibilities
are met and is a resource for SOC analysts. Shift leads also handle
communications external to the SOC, and therefore should have well
developed people and communication skills.

SIEM Engineer: Engineers install, maintain, and upgrade the SIEM platform
and its operating systems. They also implement use cases, provide
troubleshooting, and configure device support. Advanced SIEM engineers can
also build parsers for unsupported devices, which usually involves
knowledge of regular expression parsing and SQL queries. Lastly, a SIEM
engineer needs to document the EDIS process and prepare internal OTJ
training for all SOC staff so that analysts properly interpret event data.

Security Process Engineer/Analyst: A process engineer has a security
focused system analysis role. They perform analysis to develop, define,
and test use cases, train the analysts on supporting use cases, and help
to develop reports.

SOC Manager: This is a day to day managerial role for the SOC team. The
primary customer is the CISO. The SOC manager implements strategy and
process defined by the CISO.

CISO: The CISO owns the information security management program. The SOC
team, SIEM, log management, and NSM platforms support many aspects of the
program, such as incident response, continuous monitoring, and log
management. CISOs

must understand business requirements and expectations
(without this understanding, they are likely to fail).

[Figure: diagram of SOC roles and their relationships. Recoverable labels:
CISO (owns the SOC); SOC Manager (recruiting, budget, business liaison,
audit support, metric design, architecture and change management); Analyst
Roles, Tier 1 through Tier N and Shift Lead (ops, alert management,
intake); SIEM Engineer (builds use cases, troubleshooting, upgrades, new
data feeds); Security Process Engineer (research, integrates threat intel,
designs use cases and reports); Dev; and a flow labeled "Usability, UC &
Content issues".]

Figure 1 SOC Roles and Relationships
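
Table 4 notes that building a parser for an unsupported device usually involves regular expressions. As a rough illustration only, the Python sketch below parses one line from an imaginary device into named fields. The log layout, the field names, and the sample line are all assumptions made for this example; a real parser would be written against the device's documented format and in the SIEM's own parser language.

```python
import re

# Hypothetical log line from an unsupported device; the layout is an assumption.
SAMPLE = ("2018-08-16T10:15:00Z gw01 action=deny "
          "src=10.1.1.15:51514 dst=203.0.113.9:443 user=jsmith")

LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+"
    r"action=(?P<action>\w+)\s+"
    r"src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)\s+"
    r"dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)\s+"
    r"user=(?P<user>\S+)$"
)


def parse_event(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # unparsed lines should be counted and reviewed, not dropped
    event = match.groupdict()
    event["src_port"] = int(event["src_port"])
    event["dst_port"] = int(event["dst_port"])
    return event


print(parse_event(SAMPLE))
```

Returning `None` for non-matching lines, instead of raising, makes it easy to count how many events the parser misses, which is a useful check when the device firmware changes its log format.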

SOC Layered Operating Models


A small security team of just a few people rarely follows a hierarchy. The team
members just work with each other to monitor the environment and respond to
alarms. As a Security Operations Center matures and grows in size and breadth,
it usually develops into some form of a tiered analyst job structure that
reflects staff skill base, alarm conditions that are worked, pay level, SOC
service areas they support, and training commensurate with their job level.

This section describes a two and three-layer approach, with the word "layer"
being used as a placeholder. Job titles may be Tier 1 Analyst, Tier 2, and Tier 3
Senior Analyst, or titles like Junior Analyst, Analyst, Senior Analyst, Lead Analyst,
or SOC Shift Lead. Regardless of the actual title, the essential concept is that
there are layers that relate to the analysts' skill base, comfort level, and their
ability to respond to alarm and event conditions, which in turn ties to pay and
responsibility. Layered models also provide a way to measure progress and
provide job advancement by title and pay commensurate with the skill base.


Regardless of the model and title, there is almost always a formal management
layer for the Security Operations team.

What is essential to whatever stratification model your SOC uses is that each
analyst must be willing to ask for help if they find themselves outside of their
depth when working an alarm or handling a ticket.

Two Layer Model


In a two-layer model, the front line is most often staffed with seasoned
security analysts while the second layer is staffed by engineer level staff who
handle more complex cases or cases that require more complex analysis. Above
these two operational layers is the SOC management. Each of these levels will
have end user and management interactions at some point.

Table 5 SOC Two Layer Model Roles and Responsibilities

Layer Example Duties and Responsibilities


1 • Real time event and alarm monitoring that follows a standard
operating procedure for a wide variety of alarms
• Phone intake for initial case support (phone, email, webform)
• Run reports
• Monitor SIEM system health, data feed checks, and keep an eye
on the system(s) as a whole
• Gather key data, feed many case types to the ServiceDesk, handle
and process more straightforward alarm conditions, and escalate
more difficult or complex cases to the next layer after they have
collected some initial data
• Close some cases based on well-defined criteria.
• Certain analysts may be asked to perform "longitudinal analysis".

2 • In depth analysis of alerts and events escalated by Tier 1
• Perform complex analysis and research on alerts and events, such
as a previously unseen alarm condition from a new data source
• Take a longitudinal view of event patterns, searching for
longitudinal security issues
• Coordinate incident management
• Tendency to specialize for certain alarm types, systems, or areas
of the business
• Synthesize vulnerability data
• Performs daily or weekly threat hunting activities


Three Layer Model


In a three-layer model, responsibilities are more stratified as the organization
attempts to respond to an increased need for coverage, such as adding overnight
and weekend shifts, and to balance that need against staff costs and skills.
Therefore, a greater separation of duties appears, which enables junior analysts
to be effective for the SOC.

Table 6 SOC Three Layer Model

Layer Example Duties and Responsibilities


1 • Real time alarm monitoring, using a well-defined SoP or other
standardized Operational Guidance document
• Phone intake for initial case support (phone, email, webform)
• Monitor system health and data feeds
• Gather key data, escalate most cases to the next layer if they
can't resolve the case quickly, or close some cases based on
well-defined criteria.

2 • Handle escalated alarms
• In depth analysis of alerts with a determination whether to
forward or not to Tier 3
• Review the inventory of daily event types received by day,
looking for patterns that may indicate a security issue
• Take a short-term view of event patterns in support of alerts (a
higher degree of alarm awareness)
• Support incident management data requests
• Tendency to specialize for certain alarm types, systems, or areas
of the business
• Perform data feed monitoring and basic system daily checks
• Synthesize vulnerability data
• Perform daily or weekly threat hunting

3 • In depth analysis of incidents and cases, individually, by day, or
longitudinally
• Manages incidents, may function as an incident
coordinator/commander, or may be a second in command
• Take a longitudinal view of event patterns in support of alerts
and areas of the business or a client
• Has operational and longitudinal responsibility for specific areas
of the business, or a client area/business unit
• Performs specific daily threat hunting activities

• May perform memory or dead disk forensics, or may supervise
outsourced forensics
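
Both layered models rely on the same front-line decision: close a case on well-defined criteria, keep working it, or escalate it to the next layer when it cannot be resolved quickly. The Python sketch below is one hypothetical way to write that rule down so it can be reviewed and tested. The close criteria, the 30-minute time box, and the `Alarm` fields are assumptions made for illustration and are not taken from any playbook in this book.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alarm_type: str
    minutes_open: int


# Hypothetical playbook values; each SOC defines its own.
CLOSE_CRITERIA = {"scheduled_vuln_scan", "known_test_account_login"}
TIME_BOX_MINUTES = 30


def route(alarm, current_tier, max_tier=3):
    """Decide whether an analyst at current_tier closes, works, or escalates."""
    if alarm.alarm_type in CLOSE_CRITERIA:
        return "close"
    if alarm.minutes_open > TIME_BOX_MINUTES and current_tier < max_tier:
        return f"escalate to tier {current_tier + 1}"
    return "work"


print(route(Alarm("scheduled_vuln_scan", 5), current_tier=1))
print(route(Alarm("previously_unseen_condition", 45), current_tier=1))
print(route(Alarm("previously_unseen_condition", 45), current_tier=3))
```

Setting `max_tier=2` models the two-layer structure. The top layer never escalates in this sketch; in practice it would hand a confirmed incident to the incident response function.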

SOC Maturity Curve Using the CMMI


Security teams go through a variety of growth phases and stages, which often
happen very organically. SOCs usually start when management hires someone
either deliberately or as a response to an event because they need to "get a
handle on security". That first hire then hires or pulls in a few people from IT to
form the security team, and as a natural outgrowth some form of SOC team is
formed. Muddled in these organic steps is a focus on "buying and implementing
a SIEM", which gets funded as a project [18]. Then you have a SOC in someone's
shared office, or a couple of people get some space in the Network Operations
Center (NOC). All of this happens while the people involved are doing their "day
job".

These types of organically grown teams often miss a critical factor in their
formation: the SOC function wasn't deliberately created following a needs
assessment or a formal model. As a result, varying aspects of SOC Services
described on page 17 operate at different levels. The SOC staff members write
different reports in response to the same alerts so results are not predictable,
everyone has different skill levels, the operational structure is not internally
consistent, and staff may be frustrated.

How is this problem solved? One well respected method is to apply the Carnegie
Mellon Capability Maturity Model Integration (CMMI). There are five maturity
levels in the CMMI, with each level representing an evolutionary plateau
in terms of process improvement. Normally, CMMI is applied holistically in an
organization across up to twenty-four process areas, so it needs to be
adapted and focused for a Security Operations organization. It may not be
necessary, or even desirable, to push all capabilities and processes to level 5,
because each maturity level is measured by meeting a set of objectives for the
process.

18 Remember - a project is a one-time event while running a security operations team is an
ongoing business and IT function. They are very, very different.


Table 7 CMMI Five Level Maturity Model

Name Characteristics [19]
1: Initial Processes are ad hoc, chaotic; success depends on
heroics as opposed to following a defined process.
Services often work but exceed budget, time,
schedule. Success is not repeatable, and heroic action
frequently saves the day.
2: Managed Workgroups (the SOC) define processes, create work
plans, monitor their processes, and meet a set of
"contract requirements" with its customers (the
business). Configuration management is
implemented with quality assurance.
3: Defined Defined processes for managing work are used, well
understood, Services, procedures, tools in place.
Sound project management is implemented into each
process set. Difference in L2 and L3:
process/procedure can be quite different in each
instance of the process, whereas L3 procedures are
tailored for the workgroup.
4: Quantitatively Managed There are quantitative objectives for quality and
process performance, which are used to manage
processes. Measurement statistics are collected on
specific subprocesses.

The difference between L3 and L4 is focused on predictability.

5: Optimizing Focused on continuous improvement.

There is also a formalized model available to assist in assessing a SOC's maturity
level. The SOC-CMM was created by Rob van Os, MSc., as part of his Master's
thesis work [20]. Rob's methodology, tooling, and spreadsheet are set up to evaluate
the SOC across five domains and twenty-five aspects. Three of the domains,
Business, People, and Process, are evaluated for maturity. The other two,
Technology and Services, are evaluated for both maturity and capability.
You can use this tool to evaluate your current state and develop a road map of

19 These points are adapted from the SEI CMMI for Services, 1.3. Note that as this book was
written, 2.0 was just released, so you should review 2.0 material. 1.3 URL:
https://resources.sei.cmu.edu/asset_files/Webinar/2010_018_101_22253.pdf (8/16/18)
20 The SOC-CMM is available at Rob's website: https://www.SOC-cmm.com/


how to mature your SOC. Rob's site is https://www.SOC-cmm.com/. Rob's tools
are gaining quite a bit of traction.

Measuring Data Source Integration Maturity Levels

To apply the CMMI, the SOC can treat the way it integrates a data source
into the SIEM and its operational process as a process that must be well
managed and repeatable. As a primary requirement, in order to move that ad
hoc process to a mature, well-defined process, the SOC needs to systematically
accept data, mine that data, and ensure that the SIEM platform is used to
the fullest.

Data Input as an Initial Process (L1) is characterized by:

• Getting data into the SIEM can be very ad hoc. A well-meaning system
custodian sets up a syslog feed and lets the SOC know that new data is
arriving.
• Someone in the SOC works with the system custodian to make sure that the
system's data can be gathered into the SIEM.
• Someone else in the SOC works with the SIEM vendor to make sure that
data is parsed.
• A data survivability baseline is established so that if the source system
stops providing data, it can be detected "quickly enough."

Data Input as a Managed Process (L2) is characterized by:

• Data input goes through a consistent process. Source system
instrumentation is well defined, any custom work needed from the
vendor is planned, and a synthetic transaction is set up (see p. 193).
• The source system is fully exercised so that all of the security and
operationally relevant events are logged.
• Once the data arrives, SOC builds source specific alarm conditions based
on that data source.
• The SOC is trained on the breadth of event types so that the team can
fully utilize all of what the source system can provide.

Data Input as a Defined Process (L3) is characterized by:

• Organizational policy and process artifacts require that data source
input is integrated into the organization. For example, policy requires
logging to the SIEM and/or log management platform, and a
check to see that this is set up properly is integrated into the Configuration
Management process.


• The SOC manager ensures that all users have completed onboarding for
each data source and confirms that there is an equivalent understanding
of how to use the data.

Measuring Alarm Processing Management Maturity Levels


To apply the CMMI, the SOC can analyze how it handles Alarm Processing.
Alarm processing is another good example of a service that should be
consistently delivered. This service is the result of data being input
into the SIEM, processed, and then acted on by the SOC analyst. In order to
realize this service, all aspects of the SIEM and SOC are part of the service
delivery: data input, parsing, health monitoring, alerts raised in response to
events, analysis of alerts to validate that they are true positives, response to
severity based on the impact to the organization, incident response activated to
the degree needed, and tracking of the resulting incident through to resolution.

That is quite a mouthful! Below is an example of how alarm management can
move through a set of maturity steps, going from an ad hoc maturity level to a
well-defined level. The primary requirement is that the SOC read, review, and
respond to alerts as rapidly and completely as possible.

Alarm Processing as an Initial (L1) Process has these characteristics:

• SOC Analysts review alerts as they arrive, with some notion of priority
and impact. High priority alerts which are likely to be true positives
receive attention such as reporting for resolution, data owner and
system custodian notification, or routing to Tier 2 for further
investigation.
• SOC analysts may or may not be consistent with each other in the
decision-making process on alarm resolution. During times of high
stress, alerts are not consistently managed. Critical alerts may never be
evaluated in a timely manner.
• Alerts may occasionally be reviewed daily in the aggregate such that
critical alerts always receive some form of treatment as a "stop gap".

Alarm Processing as a Managed (L2) Process has these characteristics:

• Event data is reviewed in order to find other alarm conditions and
improve detection capabilities.
• Alerts are tuned so that false positives can be minimized. This part of
the process should influence the source system in a feedback loop.
• Alerts are treated consistently by all members of the SOC, with most of
them mapped into incident playbooks or other supporting processes.


• Alerts are resolved, routed, or investigated as they arrive within a
certain defined time frame - say 30 minutes for critical, one hour for
medium, and within 4 hours for low.
• The alarm board is reviewed every shift to ensure that all "Critical",
"High", and "Medium" alerts receive attention.
• Alarm analysis and processing enter a tuning process so that lessons
learned and system custodian feedback improve the alarm
management process.
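
The defined time frames above can be checked mechanically. The following is a minimal Python sketch, not a feature of any particular SIEM: the alert record layout (`id`, `severity`, `raised_at`, `handled`) is a hypothetical structure chosen for illustration, and the response windows are simply the example values from the list. The text gives no window for "High", so that severity is left out here.

```python
from datetime import datetime, timedelta

# Response windows taken from the example time frames in the text.
# "high" has no stated window in the text, so it is intentionally omitted.
RESPONSE_WINDOWS = {
    "critical": timedelta(minutes=30),
    "medium": timedelta(hours=1),
    "low": timedelta(hours=4),
}

def overdue_alerts(alerts, now):
    """Return the IDs of unhandled alerts that are past their response window."""
    overdue = []
    for alert in alerts:
        window = RESPONSE_WINDOWS.get(alert["severity"])
        if window is None or alert["handled"]:
            continue  # no defined window, or already resolved/routed
        if now - alert["raised_at"] > window:
            overdue.append(alert["id"])
    return overdue

# Hypothetical alarm board contents for illustration only.
now = datetime(2018, 8, 16, 12, 0)
alerts = [
    {"id": "A1", "severity": "critical", "raised_at": now - timedelta(minutes=45), "handled": False},
    {"id": "A2", "severity": "medium", "raised_at": now - timedelta(minutes=20), "handled": False},
    {"id": "A3", "severity": "low", "raised_at": now - timedelta(hours=5), "handled": True},
]
print(overdue_alerts(alerts, now))
```

A report like this could feed the per-shift alarm board review, so that overdue alerts are visible at turnover rather than discovered later.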

Alarm Processing as a Defined (L3) Process has these characteristics:

• As data sources are integrated into the SIEM and SOC processes, they
are mapped into the taxonomy, the review processes are defined, and
the SOC staff is fully trained on the data source and the alarm
conditions it brings to light.
• The SIEM platform is augmented with orchestration and automation in
order to improve analyst responsiveness and put better data in the
hands of the analyst.
• Alarm and Event data is used to guide a threat hunting program, and
the threat hunting program in turn guides and improves alarm
generation capabilities with the corresponding SOC processes.

Example SOC Turnover Shift Check List

Start with this list for your SOC turnover. Cover turnover at the beginning of
each shift. Each shift needs to summarize running alerts, incidents, and follow
up items for the following shift as these are critical for situational awareness.
Review this list and define what analysts do per shift, develop an organization
specific model, and then update the end of shift turnover. Some SOC teams
aren't staffed enough to provide an overnight function, so at the beginning of
the morning shift there should be an immediate review of overnight alarms.
Determine how to incorporate this point in your environment.

1. Turnover from prior shift, which should include:
a. Key events
b. Status for ongoing incidents
c. Staff outages
d. Major data issues
e. Any system stability issues
f. Relevant communication topics
2. On the normal maintenance day, review the scheduled changes so that the
SOC won't overreact to alarm conditions that can be explained by a change
running through change management.


3. Quick review of yesterday's alerts so that any recurrence or repeat
conditions which may reveal a repeat event are easily identified.
4. Review the daily briefing, which should cover topics like:
a. New alerts instrumented in the system
b. Data sources leaving the system
c. New data sources coming into the system
d. OTJ training gaps


Security Monitoring Use Cases by Data Source


As you read through this chapter you should:

1. Think about common scenarios for attacks and adversaries who target your
organization. To aid in this process, this chapter introduces a scenario, an
attacker plan, and a defense plan.
2. Ensure you know how to get the data to support the use case points.
3. Gather vendor and product documentation so analysts can easily look up
event detail as they use event data when researching an alarm.
4. Determine the risk that the use case is designed to address for the
organization.
5. Decide and document how the analyst will respond to the use case.
6. Review the enterprise data inventory survey in the SOC planning chapter.

Above all else keep looking for "evil" because it is looking for you (or, rather,
your data that makes up your organization's Crown Jewels).

The Scenario
Every organization has a wide variety of types of data that they can collect. As
the SOC considers collecting data, evaluate that data against the actual needs of
the organization, how that data will be used to detect an adverse security condition,
how close the data is to the end user focused application, and how user
attributable that data is. To illustrate these concepts, this section will walk
through an example of systematic intellectual property theft conducted against
VictimCo.

The Setup
First and foremost, VictimCo is in possession of valuable data. After surveying its
business environment and value chain, and identifying its valuable data, VictimCo
makes several key determinations:


1. There are well-known users with a public presence, ripe for targeting. Of
this user population, a small number of them frequently travel and speak at
conferences. There is also an active traveling business development group.
2. Critical Intellectual Property is spread out on the network, meaning it is not
consolidated. Further analysis here indicates that there is a mix of storage
platforms. The list includes SAN, NAS, OS based shares, and web-based
collaboration sites. A significant portion of IP is "Trade Secret" and not
necessarily well marked.
3. Even though there is a web proxy, it is not configured with an overly
restrictive policy.
4. More than 30% of the user population have elevated rights on their
workstations, which is implemented by adding the user's primary login
account to the "Administrators" group.
5. The firewall has a default-deny stance and logging is pretty good.
6. There are easy methods of data egress because outbound FTP is not
restricted from the desktop and websites that allow users to paste data and
then get a URL to that pasted data are not blocked.

The Attacker's Plan to Find Data and Exfiltrate

VictimCo's team develops an outline of how an attack would progress. An attacker
who is after valuable intellectual property will need to build and execute a plan
similar to the one outlined below.

1. Reconnaissance: Website scanning, performing Internet based "doxing"
type research, and gathering candidate email addresses for a targeted
phishing campaign.
2. Weaponization: An attacker would use a variety of tools to craft malicious
payloads, such as PDF documents with droppers, macro enabled Office
documents, or a website with malicious content.
3. Delivery: The most likely chance for success for an attacker is to send email
with a convincing pretext. The pretext is written so the message appears to
come from a professional colleague or someone with interest in working
with the recipient. For example, a phish campaign would indicate that the
sender read a book or article by the recipient, express an interest in some
collaboration, or suggest that the recipient may be interested in other articles,
books, or sites. The links for these sites would lead to malicious sites. Often
malicious sites would contain a link to a login site that looks like it belongs to
the recipient's organization, but in fact does not, with the goal of capturing
real credentials.


4. Exploitation, Part One: Once credentials are captured, an attacker would
use them on any exposed site or may even attempt to log in to a common
VPN service.
5. Exploitation, Part Two: After gathering the username pattern from email
addresses, an attacker may attempt to brute force or "password spray" any
potential account against any exposed web interface that VictimCo has.
6. Installation (Remote Access): Once the attackers have a minimal foothold,
they are capable of establishing persistence on several users'
workstations, gathering higher level credentials, moving laterally within the
network, and gaining access to shares and sites with trade secret data.
7. Command and Control: With a foothold and the ability to run services and
enable persistence, proxy service aware command and control agents that
communicate out to C2 nodes can be installed on target systems.
8. Act on Objectives: Once sensitive data is located, it can be exfiltrated in
numerous ways. Uploading to file share sites, FTP on non-standard ports,
email, and direct transfer using netcat are but a few examples.

The Defense Plan


The primary underpinning of the defense plan is to prioritize capturing user
attributable data that matches likely attack vectors. Gone are the days of
collecting millions of firewall records in hopes of analyzing that data. Modern
data collection should be user attributable, be as close to the application as
possible, and provide execution context because the modern attacker must live
off the land, which means they need to change the OS and use scripting
languages present on the system. To quote Alissa Torres21 from SANS, "Malware
Can Hide, But It Must Run." Prevention technologies are certainly valuable tools
and they do protect the network. However, we must assume that they will fail in
the digital arms race so we must instrument for post exploitation detection.
Furthermore, no advanced preventative technology can counteract a user who
knowingly, willingly, clicks on a link in an email and ignores all the warnings.

Instrumentation: Use the dnstwist algorithm to develop an inventory of domains
that look like VictimCo. These domains can be checked historically for site visits
and email communication, and blocked in the proxy with a redirect. Twist
domains can also be DNS sinkholed.
SIEM and Security Architecture: A search can be run to see if any system that
uses a FQDN from the dnstwist list is hit. Examples - a user made a DNS request,
attempted to visit a website, or sent or received email to one of these domains.

21 https://digital-forensics.sans.org/blog/2016/10/29/malware-can-hide-but-it-must-run


Instrumentation: Confirm that email activity is logged and recorded at the SMTP
conversational level, and configure these data elements to be fed into the SIEM
from message platforms: Sent, From, Return Path Domain, To, CC, Subject Line,
Attachment attributes (name, size, date), type, and Bayesian score.
SIEM and Security Architecture: Should a user succumb to a phish mail, the
SIEM/email system can be queried to see if other users also received the same
mail and then the email can be removed from their inboxes.

Instrumentation: Implement sysmon on high value users' workstations in order
to collect process invocation (parent, child, path, command line). Registry
Change - Sysmon Event ID 13.
SIEM and Security Architecture: Create alerting that will fire if an office
application or the email application opens up a scripting tool or a command
prompt, which is a high value alarm and indicator that a user clicked through and
opened a malicious email payload.

Instrumentation: Update Windows domain auditing to collect command line path
from 4688 events (Detailed Tracking).
SIEM and Security Architecture: Nearly the same functionality can be
accomplished with command line analysis, which is provided by the 4688 event.

Instrumentation: Collect same into the SIEM using Windows Event Collection and
Forwarding as a first phase approach.
SIEM and Security Architecture: The Threat Hunt team can perform long tail
analysis of software executed and over time build a whitelist of known good
applications and command lines. After that is done, they can then monitor by
exception.

Instrumentation: Update all Internet facing sites to minimally log access
attempts (success and failure) to the SIEM (custom development in many cases).
SIEM and Security Architecture: Access attempts can enable several account
management use cases - a new account successfully logged on, a brute force was
successful, brute forcing is being attempted, or a spray attack (try one password
for all possible accounts) is being attempted. If the source machine is on the
inside, it can be thoroughly investigated for C2.

Instrumentation: Always on VPN (enhance security architecture).
SIEM and Security Architecture: Traveling users can be redirected back into the
company network so that security controls and monitoring are actually effective.


Instrumentation: Update the Windows Workstation Presence Indicators as a
second phase approach, once it is proven to work and the issues are resolved.
New Service: Event ID 7045
Scheduled Task: Event ID 4698
Local Group Changes: 4731, 4732, 4733, 4734
SIEM and Security Architecture: Windows has a native capability to centrally
collect audit logs. At a minimum, several event types need to be collected which
are known as "presence indicators": login, screen lock, reboot, screen unlock.
After that: local group management. Finally, service state changes. These event
types can be used to detect when workstations are used outside of normal
business hours and for unauthorized changes, new accounts, and service
installation - all underpinnings of persistence and lateral traversal.

Instrumentation: Persistence detection: Autoruns (daily, for all workstations and
servers).
SIEM and Security Architecture: An advanced detection technique is to consume
the output of "autorunsc" into the SIEM and sort the data using Long Tail Analysis
(or stacking) in order to detect any new persistence entries.
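
Long tail analysis (stacking) amounts to counting how often each value occurs across the fleet and then reading the list from the rare end. The sketch below is illustrative Python, assuming the persistence entries have already been extracted from the SIEM as plain strings; the sample paths are invented and do not represent real autorunsc output.

```python
from collections import Counter

def long_tail(values, max_count=1):
    """Stack values by frequency and return the rare ones, rarest first.

    Returns (value, count) pairs for every value seen at most max_count times.
    """
    counts = Counter(values)
    rare = [(value, count) for value, count in counts.items() if count <= max_count]
    return sorted(rare, key=lambda item: (item[1], item[0]))

# Hypothetical persistence entries gathered from several workstations.
entries = [
    r"C:\Windows\System32\ctfmon.exe",
    r"C:\Windows\System32\ctfmon.exe",
    r"C:\Windows\System32\ctfmon.exe",
    r"C:\Program Files\Microsoft OneDrive\OneDrive.exe",
    r"C:\Program Files\Microsoft OneDrive\OneDrive.exe",
    r"C:\Users\Public\updater.exe",
]
for value, count in long_tail(entries):
    print(count, value)
```

Entries that appear on nearly every system sink to the bottom of the stack, while a path seen on a single workstation surfaces for review. The `max_count` threshold is a tuning choice that depends on fleet size.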

Once operating system data is collected, then focus on valuable network level
trace data. Network level instrumentation should focus on chokepoints, flow
data, and application support intelligence such as DNS activity, web browsing
activity, and network flows between network segments. For example,
workstation to workstation traffic is highly unlikely in most corporate networks.
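
As a rough illustration of the workstation to workstation point, the Python sketch below filters flow records whose source and destination both fall inside workstation address space. The subnet and the flow record fields (`src`, `dst`, `dport`) are assumptions made for the example; substitute the ranges and field names from your own network and flow source.

```python
import ipaddress

# Hypothetical workstation address space; replace with your own ranges.
WORKSTATION_NETS = [ipaddress.ip_network("10.10.0.0/16")]

def is_workstation(ip):
    """True if the address falls inside any workstation network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WORKSTATION_NETS)

def workstation_to_workstation(flows):
    """Return flows where both endpoints are workstations."""
    return [f for f in flows if is_workstation(f["src"]) and is_workstation(f["dst"])]

# Invented flow records for illustration.
flows = [
    {"src": "10.10.4.21", "dst": "10.10.9.87", "dport": 445},
    {"src": "10.10.4.21", "dst": "10.20.1.5", "dport": 443},
]
for flow in workstation_to_workstation(flows):
    print(flow["src"], "->", flow["dst"], "port", flow["dport"])
```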

Defining the SOC Use Case


After working through this scenario, the security team and IT work to jointly
define use cases. This is a process that, if done well, will pay huge dividends
down the road.

First, let's level set on the phrase "use case"22. A use case is "a set of actions or
steps which define the interactions between an actor, which can be a person, a
system, or a service, to a system in order to achieve a particular objective." A
full-fledged use case template, tuned for Security Operations and focused on
instrumenting the environment, is presented in this book on page 133.

22 Adapted from the Wikipedia article on Use Cases


For a SIEM and a SOC team, building a use case has several requirements
based on the definition from above:

1. You must be able to describe the observed condition that is relevant to your
security posture and realizes the use case (the objective).
2. The system providing data to the SIEM must be capable of actually auditing
the desired behavior (observe the action by person or system interaction).
3. The system must provide the event record with sufficient fidelity to the
SIEM, as defined under Log Record Data on page 223 (define the action).
4. The SIEM must be able to process and present the event at the necessary
level of granularity for the Sec Ops function (to measure or observe the
interaction for the actor).

Briefly, the security focused use case development process is:

1. Understand how the use case maps to or supports a Business Capability or a
Requirement: Uptime, brand protection, dozens of compliance
requirements, fraud prevention/detection, IP theft, or minimizing disruptions.
2. Design the question that the use case should answer. How would the
attacker gain needed access, cause damage, or exfiltrate data, and what
accounts would they need to use?
3. Determine and test the data sources and the data elements that provide the
visibility needed to answer the question.
4. Evaluate the data by establishing normal baselines and other analysis
dimensions. Characteristics to understand include volume, peaks/lulls,
outliers, averages, frequencies of types of data or specific elements,
duration of normal behavior, and how to find something "new".
5. Establish the SOC guidance and processes that will be used to filter out false
positives from the baseline data. Ensure that guidance is developed to
support identifying malicious use or operational issues.
6. Choose how to visualize the data. Various techniques exist, such as bar chart
analysis, graph analysis, simple timeline presentation, and other summarizations.
7. Build correlation rules. Many data sources naturally lend themselves to
correlation, where one data source complements another.
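
Step 4 can be prototyped outside the SIEM before any rule is built. The Python sketch below is one simple way to do it, assuming hourly event counts and a list of previously seen values have already been exported; the three standard deviation threshold is an arbitrary starting point, not a recommendation from any standard.

```python
import statistics

def build_baseline(hourly_counts):
    """Summarize normal volume as a mean and sample standard deviation."""
    return {
        "mean": statistics.mean(hourly_counts),
        "stdev": statistics.stdev(hourly_counts),
    }

def is_outlier(count, baseline, sigmas=3.0):
    """True if the count is more than `sigmas` deviations from the mean."""
    return abs(count - baseline["mean"]) > sigmas * baseline["stdev"]

def new_values(seen_before, observed):
    """Return values in the current data that were never seen in the baseline."""
    return sorted(set(observed) - set(seen_before))

# Invented sample data for illustration.
baseline = build_baseline([100, 110, 95, 105, 90, 100])
print(is_outlier(400, baseline), is_outlier(115, baseline))
print(new_values(["svchost.exe", "chrome.exe"], ["chrome.exe", "updater.exe"]))
```

A mean and standard deviation are a crude model for data with strong daily or weekly cycles, so a per-hour-of-day or per-weekday baseline may fit better once the basic idea proves useful.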

With this definition and these criteria in mind, I'll define dozens of SIEM use
cases that should be built out along with the corresponding event data that
makes up the use case. Again, as in the rest of the content of BTHb:SOCTH, most
of these use cases are based on experience and were implemented to one
degree or another. This inventory of use cases is not exhaustive, and should not
be considered "the definitive list". Hopefully as you look through these they will


assist you in implementing your own use cases based on your data sources and
value chain.

Example: Web Presence Attack


Many organizations operate a web presence. According to the January 2018
Netcraft survey, there were 213,053,157 unique domain names and 7,228,005
web-facing computers on the Internet23, all hosting 1,805,260,010 sites.

These systems can range from a basic, externally hosted public information sharing
site to a full-fledged eCommerce and customer engagement platform. The
illustration below provides a high-level view of the layers arranged horizontally,
with the Cyber Kill Chain steps and common avenues of attack cutting across the
layers vertically. At the business conceptual and contextual layer, the
organization operates a web farm with multiple web servers. Frequently there is
a custom developed application, which may or may not include packaged
software or libraries as a component. For example, a site may have a product
catalog, a blog site, and a contact page, each of which can use its own
subcomponents. On the private, or login protected, portion of the web presence,
the organization can host customer order request, acceptance, and processing,
customer engagement, and support ticketing. The organization may also host
some hybrid applications, such as a forum which requires some form of
registration that may be separate from the customer engagement account
database.

Cutting across all of these technologies, an exploitable attack surface emerges.
Some of the best guidance to understand this attack surface is the consensus
driven OWASP Top 10 Most Critical Web Application Security Risks list, which is
updated every few years. SOC and IT can leverage this list to determine what
type of monitoring may reveal an exploit and what type of security monitoring
capability needs to be deployed to protect the web presence. From a Cyber Kill
Chain perspective, the attacker needs to perform reconnaissance against the
site and its numerous components. Not shown is the weaponization step, as
attackers develop attack capabilities using other environments. Once the attack
technique is available and an exploit is developed, the attacker can successfully
exploit the site, a component, or an underlying library and install their own
persistence mechanism such as a backdoor shell. As mentioned elsewhere, this
scenario shows why it is necessary to keep informed about updates to
Metasploit.

23 https://news.netcraft.com/archives/2018/01/19/january-2018-web-server-survey.html


[Figure 2 content: Business layer (Internet presence for brand, with customer
facing web servers) and the business support processes provided by the Internet
presence. Layers shown: UI public pages (catalog, services, policies, press
announcements); UI private pages, login required; BLL: order management,
support issues; DAL: data storage, request/response; BLL: front end code behind,
software libraries, BTO codebase, EJB/ASP.Net, numerous NoSQL/RDBMS,
messaging. Security monitoring capabilities: web app firewall, native OS log, web
server access log, DB monitor, flow.]

Figure 2 Web Presence Attack Components and Attack Surface

Example: End User Payload Focused Attack


Today, the most likely avenue of attack is against the end user through some
mechanism that entices them to visit a site, download a file, or click a link in an
email and then ignore security warnings. Collectively, these attacks all fall
under the umbrella term "social engineering" and are leveraged through
email, email attachments, watering hole attacks, and manipulating text stored
in forum posts.

At the top, the attacker exercises the cyber kill chain to deliver some content by
some means that interacts with a user application. In the case of a malicious
email, the user may be enticed to open an attachment that could be a malicious
PDF file or an office document with a macro that has malicious script code in it.
Once the payload is successfully delivered and the user opens or interacts
with it, the attacker can begin to leverage a wide variety of
techniques found in the MITRE ATT&CK matrix.
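
One way to catch this hand-off from document to script is the parent/child process check described in the defense plan. The Python sketch below is a minimal illustration that assumes process creation events have been parsed into dictionaries carrying `Image` and `ParentImage` values, as Sysmon process creation events do; the two process name lists are illustrative starting points, not a complete inventory.

```python
# Illustrative lists; extend them for the applications used in your environment.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SCRIPT_CHILDREN = {"cmd.exe", "powershell.exe", "wscript.exe", "cscript.exe", "mshta.exe"}

def basename(path):
    """Return the lowercase file name from a Windows or POSIX style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()

def suspicious_spawn(event):
    """True if an office or email application launched a scripting tool or shell."""
    parent = basename(event["ParentImage"])
    child = basename(event["Image"])
    return parent in OFFICE_PARENTS and child in SCRIPT_CHILDREN

# Invented events for illustration.
events = [
    {"ParentImage": r"C:\Program Files\Microsoft Office\Office16\WINWORD.EXE",
     "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    {"ParentImage": r"C:\Windows\explorer.exe",
     "Image": r"C:\Windows\System32\cmd.exe"},
]
for event in events:
    if suspicious_spawn(event):
        print("ALERT:", basename(event["ParentImage"]), "->", basename(event["Image"]))
```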

These elements are illustrated in the next figure, along with references to the
Cyber Kill Chain and the MITRE ATT&CK Framework.


Figure 3 Example: End User Payload Focused Attack

One of the first actions is to establish persistence so that the attacker can
return. After that, attackers vary their actions. They may attempt to crack
passwords, gather hashes from the SAM database or from an inbound
authentication token on a connection, search the network for an opportunity, or
if they are lucky, pillage the system they compromised because it was just the
right one. In any event, an attacker has numerous opportunities that can be
used to take advantage of Windows and Microsoft networks. The older and more
out of date the operating system is where they establish the first
foothold, the better (for them, not the blue team).

Organizational Considerations for Use Case Development
Before we dig into dozens of data source specific technical use cases, security
operations really need to understand several key factors and answer several
questions. This chapter will lay out several models of the modern attack
landscape so that security operations can determine where to engage, what
logging sources will protect the business, and how to instrument data collection.

Questions to Answer:

1. Is there a security minded business analyst available to help develop
security focused use cases?
2. What is the business operating environment?
3. What are the application and data resources the business
depends on to achieve its mission objectives?


4. Is there logging enabled for the applications and the platforms they depend
on?
5. Who are the application and platform owners and custodians the SOC will need
to engage with?
6. What will the SOC do in response to an alarm?
7. How will you maintain your use case library?
8. At what point does a use case produce change control items?

“Top Ten” Security Operations Use Cases


This section is here to provide a starting place for your SIEM and SOC efforts.
When formulating your organization's top ten use cases, consult standards like
the Australian Signals Directorate Strategies to Mitigate Cyber Security
Incidents24. The ASDSMCSI is designed to help organizations and their security
professionals mitigate incidents. There are various other sources that can be
consulted. Below is a list of the top ten security use cases that a SOC team
should implement as early as possible.

1. Privileged Entity Monitoring.
2. Brute force Authentication failures.
3. Authentication Anomalies
a. Service Accounts used for interactive logon.
b. Service Accounts used from non-authorized source systems.
c. User logon locally (on LAN) within a short window of a VPN logon.
d. User logon more than an hour before or after normal work periods.
e. Interactive User authentication from multiple source systems.
f. Shared account usage (which should not be confused with accepting
the idea that using shared accounts is acceptable).
g. Default account usage (same caution).
4. Session Anomalies. There are numerous examples in this area.
a. The typical end user should have a session beginning and ending
within ten (or fewer) hours of each other.
b. Significant profile change in web browsing habits.
c. Spike in outbound firewall denies.
d. Workstation network to workstation network communication.
e. What is a reasonable clipping level for sessions?
5. Account Anomalies
a. Accounts used before the user's start date.
b. Accounts used after the user's end date.
6. Data Exfiltration indicators

24 https://acsc.gov.au/infosec/mitigationstrategies.htm (8/18/2018)


a. HTTP(S) Send/Receive mismatch. Data received from a site is often
many times the data sent to a site, by byte volume, as most of the time
the browser is downloading a file and rendering it for the user.
b. File transfer protocol(s) used from end user sources or systems that
don't require these services, such as outbound FTP from a print
server.
c. Use of file storage sites (Dropbox, Box, Microsoft OneDrive,
GoogleDrive, SugarSync, Leapfile, etc).
d. Use of websites that allow for 'easy information sharing or text
storage', where users can cut/paste information in an unregulated
manner.
7. Signature Matches to known Vulnerability Scan Results.
8. Any excessive 'service failures', such as A/V agents that repeatedly fail or
backups that fail. Note that outage detection also provides operational
value as well as security value.
9. Insider Threat Indications -
a. Accessing "security research" sites.
b. Use of USB drives.
c. Authentication baseline violations.
d. Authentication failures against file shares, applications, servers,
internal SharePoint sites, etc.
10. Security Log Data failure conditions.
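
As an illustration of use case 2, the Python sketch below separates two failure patterns from a stream of authentication failures: many failures against a few accounts (brute force) versus failures spread across many distinct accounts from one source (password spray). The input format, a list of `(source_ip, account)` tuples, and the threshold of five are assumptions for the example; a production rule would also apply a time window.

```python
from collections import defaultdict

def classify_auth_failures(failures, threshold=5):
    """Label each noisy source as 'password spray' or 'brute force'.

    failures: iterable of (source_ip, account) tuples for failed logons.
    """
    by_source = defaultdict(list)
    for source, account in failures:
        by_source[source].append(account)

    findings = {}
    for source, accounts in by_source.items():
        if len(set(accounts)) >= threshold:
            findings[source] = "password spray"   # many distinct accounts tried
        elif len(accounts) >= threshold:
            findings[source] = "brute force"      # repeated tries on few accounts
    return findings

# Invented failure records using documentation IP ranges.
failures = (
    [("203.0.113.5", f"user{i}") for i in range(6)]
    + [("198.51.100.7", "admin")] * 8
    + [("192.0.2.9", "alice")] * 2
)
print(classify_auth_failures(failures))
```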

AntiSpam and Email Messaging


If there is one system people use every day, it must be their email system.
Typical email is person to person or person to a small group, with a low ratio of
email with attachments to those without. This is a good starting place to define
"normal" for your organization.

Use cases based on email are easier to implement for locally hosted email
systems than cloud email systems, because it is easier to get logs from on-
premises systems. Free email systems are highly unlikely to provide meaningful
data export and SIEM integration. Paid cloud email systems may not support
SIEM integration or may only provide user activity through a downloadable
report.

1. Email with attachments that have spaces or multiple periods: Attackers
like to obscure their malicious content. Two methods are to add several
spaces and to include multiple extensions like ".docx.exe".
2. Email burst or flood: A rash of inbound email can easily be a phish, or some
other campaign you may not want.


3. Infected sent mail: Intemal users sending virus-laden messages internally or


outbound. This condition indicates a failure in the local A/V Client or
malicious software on the system.
4. Spam sent mail: Users who sent email that is identified as spam is not
normal. If the organization has an upline anti-spam system, the user's
messages were likely blocked. It may be advisable to let the user know,
assuming there isn't a further negative condition.
5. Non-authorized systems sending mail: The only systems that should be
communicating to any one of the messaging TCP ports should be well
known and understood (such as the infernal messaging systems). Below are
common messaging ports for email systems. For continuous monitoring, the
system should create an alarm for traffic outbound on these ports from a
non-authorized source. For Threat Hunting, a report that includes these
ports and the senders should be deveioped and periodically reviewed.
a. 25/TCP - Simple Mail Transfer Protocol (SMTP)
b. 110/TCP - Post Office Protocol (POP version 3). POP2 is on 109, but
the likelihood of seeing this is minimal.
c. 143/TCP - Internet Message Access Protocol (IMAP)
d. 209/TCP & UDP - Quick Mail Transfer Protocol (QMTP)
e. 220/TCP & UDP - Internet Message Access Protocol (IMAP), V 3
f. 465/TCP - Authenticated SMTP over TLS/SSL (SMTPS)
g. 587/TCP - e-mail message submission (SMTP)
h. 993/TCP - Internet Message Access Protocol over TLS/SSL (IMAPS) -
Apple systems use this port.
i. 995/TCP - Post Office Protocol 3 over TLS/SSL (POP3S) - Apple
systems use this port.
6. Messaging IoCs: Intelligence sources do list known email addresses as an
IoC. Messages to and from these email addresses are suspicious. You cannot
necessarily prevent a malicious user from sending email inbound, but you
can monitor and communicate to a user who received it. Any traffic to an
email address identified based on a feed from a threat intelligence source
should definitely be investigated, whether it was blocked or not.
7. Significant volume changes: From a threat hunting perspective, the team
should look for volume-based changes. For example, a user who rarely sends
attachments and suddenly sends a large number of attachments to a competitor
may indicate intellectual property theft or industrial espionage.
8. Autoforwarding: Users who send large amounts of data to their home email
addresses may expose the organization to an unacceptable risk.
9. Email with competitors: Most, but not all, organizations do not routinely
exchange a large portion of their email with a competitor. This pattern is also
subjective, but it may reveal an insider threat.

10. Users generating numerous Non-Delivery Reports: This condition may
indicate their account is being used to probe for valid email addresses at a
particular domain or some operational issue.
11. Constant email transmission: Users sending email every hour of the day
may indicate that something on their system is attempting to use an email
capability for covert communications.
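
The first email use case above (attachment names padded with spaces or carrying multiple extensions) can be sketched as a simple filename check. This is a minimal illustration, not a vendor rule: the extension list and the "three or more consecutive spaces" cut-off are assumptions to tune for your environment.

```python
import re

# Assumed list of extensions treated as executable; adjust for your environment.
EXECUTABLE_EXTS = {"exe", "scr", "com", "bat", "cmd", "js", "vbs", "ps1", "pif"}

def suspicious_attachment(filename):
    """Return the reasons an attachment name looks obfuscated (empty list if none)."""
    reasons = []
    # A run of spaces is often used to push the real extension out of view.
    # The threshold of three spaces is an assumption.
    if re.search(r" {3,}", filename):
        reasons.append("multiple_spaces")
    parts = [p.strip() for p in filename.lower().split(".")]
    # More than one extension, ending in an executable type, e.g. "invoice.docx.exe".
    if len(parts) >= 3 and parts[-1] in EXECUTABLE_EXTS:
        reasons.append("double_extension")
    return reasons
```

A SIEM rule or mail gateway filter would apply the same test to the attachment name field of each message log record and alarm on any non-empty result.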

Email and Web: Interactions with Look a Like or Doppelganger Domains
Phishing scams sometimes use domains that look very close to your domain and
are changed by single letter replacement, fuzzing, omitting a character that the
mind will naturally fill in, or transposing two letters. For example,
blueteamhandbook.com can easily be changed to blueteamhandbook.com,
bluetaemhandbook.com, and numerous other variations that look like the real
domain name, with the goal of attacking at least one book author. Also,
domains can be registered with Unicode characters to support foreign
languages and lead to a homograph attack. Most browsers attempt to mitigate
the exposure - but nothing is perfect.

You can either develop an inventory yourself, or use a tool like "dnstwist" by
Marcin Ulikowski25. This tool is written in Python, and does have some
dependencies required for it to work properly on the command line.

The goal in using a tool like dnstwist is to generate look-a-like domains
that a phishing campaign is likely to use against your organization. The dnstwist
tool can also be used to check and see if a twisted domain is registered.

DNSTwist Domain Enabled Use Cases:

1. Investigate Domain Name: Pull out the domain name portion of email sent
to/from the organization and compare it to a twist domain. If you have a
match, then drop the email. Even better, do not accept email from a twist
domain.
2. Check twist against browsing: Pull the FQDN portion of a domain name
from web server logs, and alarm if a user visits a twisted domain name.
3. Alarm on DNS Lookups: DNS lookups against twisted domains should also
generate an alarm. Truth be told, DNS blackholing twisted domains is
actually a solid protective measure, so that if a user attempts to visit a
suspect site it will be directed to 127.0.0.1 (there is no place like home).

25 https://github.com/elceef/dnstwist

Implementing this type of a DNS sinkhole not only protects your users, it
also allows for an alarm condition.
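
The first use case (compare the sender domain against a twist list) can be sketched as follows. This is a simplified illustration and not dnstwist's algorithm: it generates only single-character omissions and adjacent transpositions, and the function names are hypothetical. In practice the twist set would be loaded from dnstwist output.

```python
def simple_twists(domain):
    """Generate a small set of look-alike names for a domain.

    Simplified illustration only: single-character omissions and
    adjacent-letter transpositions of the left-most label."""
    name, _, tld = domain.lower().partition(".")
    variants = set()
    for i in range(len(name)):
        # Omit one character.
        variants.add(name[:i] + name[i + 1:])
        # Swap two adjacent characters.
        if i < len(name) - 1:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Never flag the legitimate domain itself.
    variants.discard(name)
    variants.discard("")
    return {v + "." + tld for v in variants}

def sender_domain(address):
    """Return the lower-cased domain portion of an email address."""
    return address.rsplit("@", 1)[-1].lower()

def is_twisted_sender(address, twists):
    """True when the sender's domain appears in the twist list."""
    return sender_domain(address) in twists
```

The same set membership test applies to the FQDN pulled from web logs and to names seen in DNS queries.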

Antivirus (A/V) Systems


Antivirus systems are a bread and butter security component, even though
traditional A/V has a tough time keeping up in the digital arms race.
Nonetheless, maintaining a desktop security suite that includes A/V and other
HIPS components should be part of your defense in depth architecture. There
are several conditions that should be monitored.

1. A/V detection: Finding and cleaning a piece of malware on a system, a USB
drive, a CD, an email attachment, download from the web, etc. should be
investigated. Just because the A/V system did its job doesn't mean that the
PC is "clean." The file name provides significant clues to the issue. For
example, if the A/V system removes a file with "drop" in the name it may mean
that a "stage two" tool successfully got onto the system. The immediate web,
email, and USB device history right before malware observance should be
checked. A ".scr" file is most often a screen saver, and these are often
malicious. Realize that a screensaver, by its very nature, captures the
username and password. Remember, even though the A/V protected the
user, the malicious file got on the system somehow, so work to answer that
question and examine quarantined files in context.
2. A/V Potentially Unwanted Program (PUP) detection. The names of
executables that are classified as PUPs may be early warning indicators,
may reveal insider threats, uncover unauthorized software installation, or
may provide a clue that an attacker tried one tool and then moved onto
another when the first tool failed.
3. Password Dumpers: In particular, a password dumper that was seen and
subsequently quarantined means that someone tried to extract the hashes
from the SAM database, extract passwords from memory on a running
system, or gather passwords through some form of network extraction.
Example names are PWDUMP#, cain.exe, wce.exe, mimikatz.exe,
mimidogs.exe, and gsecdump.exe.
4. A/V agent status and update failure conditions. A/V agents should run
successfully, should receive their updates, and not crash. Example events
can range from an inability to scan a drive, inability to quarantine a file, an
agent that will not start, the agent starts and crashes, it cannot retrieve
updates, or is shutdown unexpectedly. A truly advanced use case can be
developed around the number of active systems in a Windows domain, end
user authentication from a given machine, and successful updates to A/V.
Here, the use case would focus on detecting that an end user machine was
"recently updated" as users authenticate (by name, not IP - DHCP and
portable computing would make IP addresses less reliable). If not, then you
could have a machine out of compliance.
5. Repeat A/V offenders or reinfections: Users who routinely get an infection
notice need more attention. This use case is longitudinal in nature, meaning
that you want to know if a machine is being re-infected over time, say 14 to
60 days. You can further differentiate this by end user logged on and virus
name or family. Same user means the user is causing higher risk, different
users may mean a kiosk or loaner is the victim PC, and the wider variety of
malware means the greater likelihood that the compromise is more severe.
Here, company policy should address what happens when users cause
repeat security threats to the enterprise. If a user keeps getting the same
type of infection (virus name or class), investigate likely causes: USB drive
insertion, particular websites, or maybe even a particular network share.
6. Multiple Infections in a short time frame: A host that has multiple different
infections based on the binary and/or the virus name may warrant more
rapid attention than an asset with a single virus.
7. Escalation based on Asset Value: Assuming that your SOC team has valid
intelligence on the value of an asset, an infection on a "critical" asset such
as one that falls under the Payment Card Industry Data Security Standard
(PCI DSS) warrants more rapid attention than an asset with a "medium"
value.
8. Anti-Virus Infection notifications in close proximity to spyware, malicious
site, email attachment file open events: The A/V system may be the
indicator that gets your attention. Here is where an EDR application can be
very informative and assist in a rapid MTTD. An analyst should be able to
locate the file name and then determine how the user triggered the
malware. Common methods are opening an attachment, inserting an
infected USB drive, or visiting an infected website. If the source is email,
then make an immediate search for other users who received the same
email by subject line and sender, then seriously consider forcibly removing
this email. In the case of a particular site, block access.

9. A user with Elevated Access logging into an infected system: This use case
requires that you maintain an inventory of users with elevated access, or
that all of these users have a particular naming convention so elevated
access accounts can be more readily detected. If they login to an infected
machine, then at a minimum an automated notification advising them to
change their password is in order.
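
The repeat offender use case (item 5) is longitudinal, so it can be sketched as a sliding-window count over A/V detection records. This is a minimal sketch under stated assumptions: the tuple layout, the function name, and the default threshold of two detections are hypothetical, and the 60 day default comes from the upper bound suggested in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def repeat_offenders(detections, window_days=60, min_count=2):
    """Flag hosts with at least min_count detections inside window_days.

    detections: iterable of (host, user, virus_name, timestamp) tuples.
    Returns {host: {"count": int, "users": [...], "viruses": [...]}}."""
    by_host = defaultdict(list)
    for host, user, virus, ts in detections:
        by_host[host].append((ts, user, virus))

    window = timedelta(days=window_days)
    flagged = {}
    for host, events in by_host.items():
        events.sort(key=lambda e: e[0])
        times = [e[0] for e in events]
        # Slide over the sorted timestamps looking for min_count hits in the window.
        hit = any(
            times[i + min_count - 1] - times[i] <= window
            for i in range(len(times) - min_count + 1)
        )
        if hit:
            flagged[host] = {
                "count": len(events),
                "users": sorted({e[1] for e in events}),
                "viruses": sorted({e[2] for e in events}),
            }
    return flagged
```

One user with one virus family points at user behavior, several users suggest a kiosk or loaner, and several virus names suggest a more severe compromise, which is why the result keeps both sets.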

Windows Defender has its own set of Event IDs. In many cases, Windows
Defender also includes a suspicious file name, a unique threat ID, a severity
rating, file path, error codes, and other useful details. These details are
representative attributes to use when building dashboards, alarms, and reports.

Table 8 Windows Defender Application and Services Logs\Microsoft\Windows\Windows Defender\Operational and System Log

EventID     Name
1000        An antimalware scan started
1001        An antimalware scan finished
1002        Scan stopped (canceled) before finished
1005        Scan terminated due to error
1006        Detected Malware
1008        Action on Malware Failed
1010        Antimalware could not restore an item from quarantine.
1115,1116   Malware detection
1117        Malware remediation or action taken
1119        Remediation error
2001        Failed to update signatures
2003        Failed to update engine
2004        Reverting to last known good signatures
3002        Real time protection failed
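
For dashboards and alarms, the event IDs in Table 8 can be grouped into a handful of triage categories. The grouping below is an editorial assumption layered on the table, not a Microsoft classification, and the category names are hypothetical.

```python
# Assumed triage grouping of the Table 8 event IDs (not a Microsoft classification).
DEFENDER_EVENT_CATEGORIES = {
    "scan_lifecycle": {1000, 1001, 1002},
    "scan_error": {1005},
    "detection": {1006, 1115, 1116},
    "remediation": {1117},
    "remediation_failure": {1008, 1010, 1119},
    "update_failure": {2001, 2003, 2004},
    "protection_failure": {3002},
}

def categorize_defender_event(event_id):
    """Map a Windows Defender event ID to a triage category, or "other"."""
    for category, ids in DEFENDER_EVENT_CATEGORIES.items():
        if event_id in ids:
            return category
    return "other"
```

Detection and failure categories are natural alarm candidates, while scan lifecycle events are usually better suited to reports.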

Application Whitelisting
The security posture of the endpoint is more important than ever before.
Application whitelisting systems are designed to monitor executables running
on a system when running in detection mode. When running in prevention mode,
they will stop an executable that doesn't match an approved policy.
Application whitelisting can
identify several adverse conditions. Further, since this capability records end
user activity, it can be very useful in an employee investigation case.

1. Unauthorized Installation: Recording a setup or install process should
provide the identity of the user installing the software, whether it set up
services, and the directories where the software was installed. The installation
process can be checked against an authorized change in the change management
system to determine if it was authorized. Also, application whitelisting can
help detect if the system's integrity was violated.
2. Unauthorized drivers: When a user inserts a USB device into a system, the
OS will respond to that notification event and attempt to install the
appropriate driver software. Removable storage devices can be an avenue
for data exfiltration or can provide an avenue for malicious software
entering the environment.
3. First Observed Binary: Introduction of a new binary in the environment may
indicate an adverse condition. A user running several new binaries on their
system may also be of note. An application whitelisting suite may not
cleanly detect this specific item, unlike an EDR platform. Alternatively, Sysmon
can be deployed because it will generate a file hash as each process is
invoked. Note that first observed binary analysis needs to be done by file
hash, not file name, because while the name may change, the hash will not.
You may need to instrument some other method, such as manual review, to
realize this condition.
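
First observed binary analysis by hash can be sketched as a set membership test against a baseline. This is a minimal sketch: the event field names ("host", "image", "sha256") are hypothetical stand-ins for whatever your process creation logs provide, and the baseline set is assumed to have been built during a baselining period.

```python
def first_observed(process_events, known_hashes):
    """Yield process events whose file hash has not been seen before.

    process_events: iterable of dicts with 'host', 'image', and 'sha256'
    keys (hypothetical field names).
    known_hashes: set of lower-case hashes from baselining; updated in place."""
    for event in process_events:
        file_hash = event["sha256"].lower()
        if file_hash not in known_hashes:
            # Record the hash so a renamed copy of the same binary is not re-reported.
            known_hashes.add(file_hash)
            yield event
```

Because the comparison uses the hash, a renamed copy of a known binary is not reported, while a new binary under a familiar name is.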

Windows has two facilities of note: AppLocker for Windows 7 and above, and
Software Restriction Policies for Windows Vista and below. Both technologies
control and record binary usage. AppLocker records events in the AppLocker
event log and can be enabled using Group Policy. There are at least sixteen
different events recorded. Event IDs 8020 to 8027 are focused on package
deployment issues, so they are not listed here.

Table 9 Windows AppLocker: Application and Services Logs\Microsoft\Windows\AppLocker

EventID   Level         Name
8000      Error         Application Identity Policy conversion failed.
                        This condition indicates issues applying policy
                        to the system.
8002      Information   FileName was allowed to run.
8003      Warning       FileName was allowed to run but would have
                        been prevented if policy enforced. (EXE's).
                        Audit Only.
8004      Error         FileName was not allowed to run.
8005      Information   FileName was allowed to run.
8006      Warning       FileName was allowed to run but would have
                        been prevented if policy enforced.
                        Audit Only. (Script/MSI's).
8007      Error         FileName was not allowed to run (by policy).

Command and Control


There are several methods to detect command and control, aside from using an
IoC list, an IDS rule, or a domain block list. Rather than duplicate information
here, please refer to the Threat Hunting chapter for a discussion on command
and control as described on page 178.

Data Loss Prevention (DLP)


DLP systems come in two broad types. Data in Motion systems are deployed to
monitor traffic as it moves through a system and the network. Data movement
can be email, FTP, copied to a USB or CD, saved off to cloud storage, or copied
to a network share, and therefore DLP needs to be integrated into an OS level
service. For example, DLP software that analyzes email is plugged in to the
messaging pipeline as a Message Transfer Agent (MTA). Data at Rest systems (or
agents) find files of interest by searching locations such as a file share or a
web repository like SharePoint.

Once alerts from a DLP system reach a certain threshold they may need to be
investigated by the proper internal team. Realize that in some organizations,
DLP events can be quite normal. For example, sensitive patient data is routinely
handled at a hospital or an insurance company. If a user emails ten
spreadsheets and the DLP system intercepts them and encrypts them for
delivery to the recipient, the user may be aware this is normal and wanted that
action to occur because they are trusting the DLP system.

Before going too far down the road with DLP, the SOC team will need to
determine if there is a more applicable internal team who are a better business
fit than the SOC (there should be...). For example, an insurance company likely
has a Member Privacy team, or HR may perform investigations using the DLP
system. One argument in favor of sending DLP alarm data to the SIEM is that an
end user activity report will have a more complete picture to present during an
employee investigation. Further, since the DLP system identifies potential IP
loss, an analyst can incorporate these alerts to gain a better picture of end user
activity and notify the appropriate internal team if warranted.

Regardless, in motion DLP systems identify data exfiltration, whether intended
or not. At rest DLP systems identify where valuable data resides. Today,
attackers are interested in the data, because that's where the money is.

Domain Name Services (DNS)


Gathering DNS data presents a few data collection and data reduction
challenges that you will need to work through. DNS detection requires detecting
name queries that are outside of the norm and being able to detect the true
source IP address if at all possible. One issue that will prove to be difficult is a
lack of internal reverse DNS lookups and stale DNS entries. If you can't reliably
look up an IP to a name, there will be a small impact on alarm processing time.
The situation can be a bit worse when an IP comes back to multiple systems.

Collecting DNS: Collecting DNS from a DNS server can be problematic. For
example, Windows DNS requires that you enable "debug logging", and then
fully parse that data through either a local or remote file reader process.
Another problem with DNS is that most (90%+) of the traffic on the network is
local queries. Local queries are normal. When considering how to collect DNS,
focus on collecting internal to external queries, find where those queries are
resolved, and collect data at that point using network extraction as the
collection method. If there is a mirror port available at the perimeter, DNS
queries and responses will be logged from the internal DNS server(s), as they are
forwarding queries on behalf of the end user. If you collect DNS traffic via a
mirror port on the same switch as the DNS server, you will collect a significant
amount of normal query traffic for the internal network that will have low to no
value for identifying attackers. There are at least 30 defined record types
available for use, with the more common being A, CNAME, PTR, SPF, AAAA, NS,
and MX. TXT records are seen, but in low volume. There are at least two
well-known tools to collect DNS: PassiveDNS and Bro IDS.

DNS Monitoring Use Cases and Detection Patterns:

1. Young (< 7d old) or recently registered domains (and thus, websites):
Malware is increasingly using sophisticated DNS lookups and query types to
signal its command and control network. Attackers, and in particular
phishers, are using recently registered domains as spreader points.
Techniques vary in exactly how recently created domains are used for an
attack. Domains that are less than a week old are more likely to host
malware than established domains. If the "Created on" or "Creation Date"
field from a whois lookup is less than seven days old, look very closely at the
domain registration details. As an example, on 11/05/17, a check of
domainpunch.com found 85,794 dot-com domains registered on the prior
day. There are also several sites that provide lists of newly registered or
expired domains, every day, usually for a charge. Examples include
whoxy.com, whoisxmlapi.com, domainlists.io, domains-index.com, etc.
2. Names not in the Top 1 Million List: As described on page 112.
3. Long, misshapen, or weird second level domain names: Most second level
names should be less than 24 characters. DNS names have a maximum of
255 characters in total. In practice, some analysis should be performed on
DNS names that are 72 characters or longer. Really long names (>128
characters total) with continued query/response are most likely DNS tunneling
or a DGA. You will need to establish these two thresholds for your
environment.
4. Hexadecimal Domain Names: Domain names should be readable by people;
after all, they are designed to help people locate resources. Hex is not
usually human-readable26. Malware uses hex values as beacons, may have
Base32 encoded commands disguised as a name component, and usually
requires specific query and answer resource records set to specific values.
Examples include FrameworkPOS, FeederBot, Morto, etc. Base32 encoding
is used because the characters in a DNS name are effectively limited to 37
possible unique characters.
5. TXT Records/Lookups: DNS can provide freeform lookup information from
a domain. Historically, the most common use for TXT records is to help
validate email delivery with Sender Policy Framework (SPF). Other normal
uses are DomainKeys (DK) and DomainKeys Identified Mail (DKIM).
Query/response outside of these purposes is not normal, and further,
illegible data in a TXT query or response is suspect. In contrast to names, the
data returned from a TXT response can be Base64 encoded.
6. SRV Records: Service (SRV) resource records are used to define a network
location for a server that provides a specific service. They are actively
queried by internal Windows systems within AD for many resource types.
From the Internet, they are commonly used for communication-oriented
services like SIP, email, some games, Session Traversal Utilities for NAT
(STUN, which, in turn, supports real-time audio/video/messaging), among other
services. Again, you would want to establish a "normal" baseline and then be
advised of "new" services queried. Also, a high volume of different queries to a
particular DNS site where the request/response types are not the same type
of lookups would not be normal.
7. Private IP addresses returned: Name server queries to Internet sites should
rarely return private (RFC 1918) IP addresses. NetGear's "routerlogin.com"
is one of the few examples of a private IP returned from your local DNS.
8. TXT without A Records: A direct query for a TXT record without a preceding
A record lookup is not normal. Further, domain names that don't have A
records that support their TXT and SRV records are also not normal.
9. Long TXT record queries: Assuming that you can monitor for query types,
excessive queries or long queries returned from an Internet server may be
used for command and control. Look for Base64 encoded data. TXT records
are used for SPF, so they do occur. Tools known to use TXT records include
dns2tcp or DNScapy.

26 Note, though that you may see DE:AD:BE:EF:CA:FE on the network. And there are a few humans
who can natively read hexadecimal network traffic.

10. Look-a-Like or fuzzed domains: Review the section Email and Web:
Interactions with Look a Like or Doppelganger Domains on page 73 when
working through DNS use case development.
11. DNS queries not from authorized servers: An enterprise should only have a
small number of internal DNS servers that can forward queries to servers on
the Internet. Any DNS query outside of this boundary should be
investigated, if for no other reason than ensuring the sender is properly
configured in order to provide operational assurance.
12. Volume and volume profile changes: Establish a baseline profile for DNS
traffic. These indicators can become alarm conditions once baselines are
established. Examples are:
a. Average queries per hour during working hours/off hours.
b. First time use domain queries (new domain name seen).
c. Volume of SRV RR, TXT, and MX queries.
d. Internal failures - lookup for domain fails.
13. Name analysis: High volume queries with hostnames that are random for
the same 2nd level domain and the same length indicate a DNS tunneling
tool is sending data to the attacker's site, because the DNS server is
consuming the host name as encoded data.
14. Foreign countries: You should study your organization's communication and
operating model to determine how much communication occurs to
countries outside of your own country. For example, a University with a
varied foreign student population would consider this normal, but an
insurance company that operates in a few states in the US would consider
several queries to foreign countries abnormal. Note that if you are reading
this book and you are in a foreign country, queries to name servers in the
US and several European countries may be very common and may make this
analysis more difficult.
15. Queries to Dynamic DNS providers: There are several dozen dynamic DNS
providers operating today27 who provide nearly free or inexpensive name to
IP DNS resolution. A common model is for a home user to register their IP
address and allow certain services through, with a name unique to them
that their ISP would not provide. For example, a VPN client. Attackers can
easily use these services as an avenue for hosting malicious services such as
C2 DNS service because DDNS providers allow for rapid changes of a name
to an IP address and can be used at nearly no cost.
16. Abused Top Level Domains (TLD's): Spamhaus maintains an ever changing,
evidence-based inventory of the top ten most abused top level domains28,
which is expressed as an aptly named "badness index". Integrating this

27 Lists include: http://dnslookup.me/dynamic-dns/, GitHub: Nate Guagenti / neu5ron, and
http://mirror1.malwaredomains.com/files/dynamic_dns.txt (3/26/18)
28 The interactive list is available here: https://www.spamhaus.org/statistics/tlds/ (8/18/18)

functionality into the SIEM may not be practical, but integrating a check of
the domain TLD into the incident response process and the analyst checklist
certainly is. As of June 28, 2018, there are 1,503 TLD's.
17. Traffic to external IP without DNS query: Direct HTTP, HTTPS, FTP, SSH, and
likely other protocols directly to an IP address is suspicious. It is not
common for an end user to type in https://#.#.#.#/. With whatever method
you have, review which end systems are communicating outbound directly
to an IP without a name. A caution: a reverse lookup could be performed,
with some risk of alerting the site owner that you are trying to get a name
for an IP. It is best to use an intermediary, like a call to any site that offers an
nslookup function. (Root DNS servers don't count!)
18. Use of non-authorized DNS: There are several free DNS services available
on the Internet other than the DNS that the site's ISP provides. Queries to
these DNS servers, such as Google's at 8.8.8.8 and 8.8.4.4, may indicate a
condition that needs resolution.
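
Several of the DNS patterns above (long names, hexadecimal labels, and private addresses returned for Internet names) reduce to simple checks on the query name and the answer. The sketch below is illustrative only: the 72 and 128 character thresholds come from the text, while the 16 character hex cut-off, the flag names, and the function name are assumptions to tune for your environment.

```python
import ipaddress
import re

LONG_NAME = 72         # "some analysis" threshold from the text
VERY_LONG_NAME = 128   # likely tunneling or DGA threshold from the text
# A label made up only of hex digits; the 16 character minimum is an assumption.
HEX_LABEL = re.compile(r"^[0-9a-f]{16,}$")
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def dns_flags(qname, answers=()):
    """Return a list of flags for one DNS query name and its answers."""
    flags = []
    name = qname.rstrip(".").lower()
    if len(name) > VERY_LONG_NAME:
        flags.append("very_long_name")
    elif len(name) >= LONG_NAME:
        flags.append("long_name")
    if any(HEX_LABEL.match(label) for label in name.split(".")):
        flags.append("hex_label")
    for answer in answers:
        try:
            ip = ipaddress.ip_address(answer)
        except ValueError:
            continue  # not an IP address, e.g. a CNAME target
        if ip.version == 4 and any(ip in net for net in RFC1918):
            flags.append("private_ip_returned")
            break
    return flags
```

These flags are best treated as inputs to a review report at first. Once a baseline exists, combinations of flags (for example a very long name from an unauthorized resolver) make better alarm conditions than any single flag.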

End Point Detection and Response


Entering the desktop protection field are highly capable software platforms
focused on threat hunting for the endpoint. Vendors include FireEye Endpoint
Security, Carbon Black Cb Response, Guidance Software EnCase Endpoint
Security, Cybereason Total Enterprise Protection, Tanium, CrowdStrike Falcon
Insight, and CounterTack Endpoint Threat. The Gartner 2018 magic quadrant for
this space lists more than twenty vendors.

Here, alerts from an endpoint protection system can have a high "signal to
noise" ratio because those alerts are pre-validated. These events occur because
a binary matched an entry on a known feed list or matched a detection
criterion. By consuming and analyzing endpoint protection system alerts only
(and not all endpoint activity), you can actually perform threat intelligence and
assessment on the user population. If users are predominantly becoming
infected and then bringing their notebook back into work, or if users insert USB
drives in their computers and there is infectionware on the USB, then the SOC has a
better-defined path for security awareness training and remediation.

EMET is a free tool from Microsoft. It can perform a similar
detection and mitigation, but it logs locally and doesn't have a comprehensive
console like the vendors listed above. According to the EMET 5.5.1 User Guide,
EMET reports to the local event log, so if there isn't a method of consuming that
log then these events would stay on the system. The application whitelisting
policies, which are also configured with Group Policy, are excellent candidates
for forwarding to the SIEM.

In order to get data into the SIEM platform for SaaS EDR, some form of
encrypted syslog service or some sort of REST API needs to be configured to
send data into the SIEM. Do not send all endpoint data that the EDR platform
collects to your SIEM. Rather, send the "condition detection" data to the SIEM.

End Point Detection Use Cases:

1. IoC hit: An IoC hit occurs when the EDR system detects a connection to a
suspicious or nefarious IP address or domain name, or a file that matches
a known bad by hash value.
2. Binary first observed: A "first occurrence" of a binary, never seen
before in the environment, once baselining is done can detect
unauthorized software installs, malicious software, unauthorized
downloads, or software executing from removable media.
3. Hash Checks: A locally configured alerting list, such as a hash value of a
particular binary. These don't have to be malicious. For example, you could
have a deception system or practice in play to detect if a user opens or
copies a "Top Secret look alike" file (this is an example of a HoneyToken).
4. ASEP Registry Key: Modification of a specific registry key used to establish
persistence, such as the Run, RunOnce, or RunOnceEx keys.
5. Printer: Also, the printer key can be configured to load an arbitrary DLL:
(HKLM\SYSTEM\CurrentControlSet\Control\Print\Monitors).
6. Specific directories: Modification of a directory or file within a file system.
Data Islands or System Snowflakes


For Microsoft Windows systems, there are at least four security contexts that
need to be considered: domain, domain member server, domain member
workstation, and standalone workgroup systems. Each of these
systems needs to provide data for the SIEM. For Linux, most systems are their
own source data island, which usually provides data via syslog. For an application,
its own localized and application specific audit system is yet another data island.
These systems represent a challenge - little to no standards, they may not even
generate enough logs, and they will require greater customization. However,
this last group contains the item the attackers want most - your Crown Jewels.

Windows Account Life Cycle Events (ALCE)


These events record new accounts, account modifications, deletions, disable,
and enable changes. These events all have discrete event IDs, record the
account that was changed, and the user who made the change. Account
management should be done by a specific account management team with a
supporting request and authorization process. Also, realize that any Windows
system other than a domain controller can have local accounts defined and
therefore used. This is a common requirement for most IT General Controls
programs (ITGC).

Table 10 Security Log: Account Management Events

EventID   Name
4720      A user account was created.
4722      A user account was enabled.
4723      An attempt was made to change an account's password. **
4724      An attempt was made to reset an account's password. **
4725      A user account was disabled.
4726      A user account was deleted.
4738      A user account was changed.
4781      The name of an account was changed.
** These events should be monitored differently.
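
Because each event in Table 10 has a discrete ID and names the account that was changed, the events can be correlated over time. As one illustration, the sketch below pairs a 4720 (created) with a later 4726 (deleted) for the same account inside a short window. The tuple layout, the function name, and the 24 hour default are assumptions, not values from a specific SIEM.

```python
from datetime import datetime, timedelta

def short_cycle_accounts(events, max_age=timedelta(hours=24)):
    """Find accounts created (4720) and deleted (4726) within max_age.

    events: iterable of (timestamp, event_id, target_account) tuples.
    Returns a list of (account, created_at, deleted_at) tuples."""
    created = {}
    hits = []
    for ts, event_id, account in sorted(events, key=lambda e: e[0]):
        key = account.lower()
        if event_id == 4720:
            created[key] = ts
        elif event_id == 4726 and key in created:
            start = created.pop(key)
            if ts - start <= max_age:
                hits.append((account, start, ts))
    return hits
```

A logon by the same account between the two timestamps would be a reason to raise the severity of the resulting alarm.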

Account Lifecycle use cases:

1. Short cycle account create and account delete events: This use case
catches accounts that are created and removed within a very short time
window. As a bonus, the severity would be raised if the account was used,
such as a logon event between the create and delete event.
2. Short Cycle elevated group add and group remove events: This use case catches
accounts added to highly privileged groups like "Domain Admins" and then
quickly removed from the group. Extensive damage can be done in a short
time.
3. Accounts created/modified/disabled by staff other than designated
account managers: This condition helps identify policy violation, rogue
admins, or attackers who gain access to a domain admin level credential. In
more mature organizations there will be a few IT staff who manage user accounts
and group changes. When this situation exists and others manage accounts
there may be a policy violation, a social engineering event, or true
maliciousness.
4. Accounts managed by users other than the designated service account: A
service account is used by a mediated access application such as NetIQ DRA,
SailPoint, PeopleSoft AD integration, and CyberArk. Mediated access
applications handle account and group life cycle events by processing a
request through a workflow system and using a specific account to
implement the change on a domain and/or member system. This alarm
assumes that one service account used by a specific application is used to
manage accounts and any ALCE outside of that realm is unauthorized.
5. Accounts that do not follow an established naming convention: Detection
can be accomplished through regular expression pattern matching or
account length checking. In the weakest case, simple account name length
checks may work, or a daily human review of accounts created, enabled,
disabled, or removed from the network and the AD forest. In the more
sophisticated case, an organization will have a naming convention that
supports a regular expression check to determine if accounts follow a
pattern such as MXitmm#, SVC_*, U######/í, or DA#####t. Note that this
should also be generalized to workstation additions to the domain, where
only workstations that follow a naming convention are allowed to be added.
6. Account deletion: Some environments may choose not to delete accounts.
Rather, they change the password to something very complex, remove all
groups (including Domain Users), and disable the account permanently. This
method allows Security IDs in the NTFS file system to resolve to an
account name, but effectively prevents account usage. Also, when these
accounts are added to a tracking list, a re-enabled account that was held
by a terminated user can be identified more easily.
7. Accounts created and disabled or deleted this week for new users and
users who terminate employment: This use case can be satisfied with a daily
and a weekly report for all systems where user accounts are created, like an
Active Directory domain or a constituent application.
8. Accounts used prior to the authorized use date: It is common for
organizations to create accounts ahead of a new user's arrival. These
accounts should not be used before the user arrives. This use case implies
that the organization has a method of connecting account creation, account
names, and the relevant dates in order to conduct this analysis.
9. Accounts created outside of the domain context: There are two cases here.
One case is when accounts are created on a workstation, which should be a
very rare occurrence. The other case is when accounts are created on
member servers. Local accounts may be needed for a specific application.
These use cases look for ALCEs not from the set of domain controllers.
10. Observed default accounts/credentials: Default accounts should not be
observed, because use of a default is inherently not attributable to a
person, and a default account starts its life with a default password. There
are several websites with lists of default accounts and passwords [29].
Default account usage has been in the OWASP Top 10 for several years.
11. Local account creation and elevated access: One of the key tenets in an
Active Directory domain is centralized authentication. In reality, practices
vary widely when it comes to local accounts. For example, an organization
may grant administrative access to a workstation by creating a local account
for the workstation user to accomplish an administrative task, or by
creating a domain level account and then adding that account to the local
Administrators group. The SOC should understand how elevated access is
applied and be able to detect elevated account usage.
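
The short cycle create/delete use case (item 1 above) can be sketched as a small correlation over exported events. This is a minimal illustration, not a SIEM rule: the tuple layout, the one-hour window, and the severity labels are assumptions chosen for the example, while the event IDs (4720, 4726, 4624) come from the tables in this chapter.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified event records: (timestamp, event_id, account).
# A real SIEM export carries many more fields; these are illustrative only.
EVENTS = [
    (datetime(2018, 3, 5, 9, 0), 4720, "tmpadmin"),   # account created
    (datetime(2018, 3, 5, 9, 7), 4624, "tmpadmin"),   # account logged on
    (datetime(2018, 3, 5, 9, 20), 4726, "tmpadmin"),  # account deleted
    (datetime(2018, 3, 5, 10, 0), 4720, "jsmith"),    # created, not deleted
]


def short_cycle_accounts(events, window=timedelta(hours=1)):
    """Return {account: severity} for accounts created and deleted
    within the window. Severity is raised if the account logged on
    between the create and delete events."""
    created = {}   # account -> creation time
    logons = {}    # account -> logon count since creation
    findings = {}
    for ts, event_id, account in sorted(events):
        if event_id == 4720:
            created[account] = ts
            logons[account] = 0
        elif event_id == 4624 and account in created:
            logons[account] += 1
        elif event_id == 4726 and account in created:
            if ts - created[account] <= window:
                findings[account] = "high" if logons[account] else "medium"
            del created[account]
    return findings


print(short_cycle_accounts(EVENTS))
```

The same pattern applies to the short cycle group add/remove case by substituting the member added and member removed event IDs.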

Advanced Monitoring Rules and Alerts Which May Require External Scripting

1. Accounts created in a constituent system which do not match an account
in the primary directory. This type of rule requires that systems managed
with local accounts be cross indexed with the primary directory. For this
rule to function, some sort of lookup in AD is required and the constituent
system will need to push ALCE events to the SIEM. This use case will mature
into having an artificial identifier, such as an employee ID, added to all
constituent systems and the directory itself so that each account can be
attributed to a single account holder.
2. Post ALCE events to an account tracking database. For example, an Identity
Management (IdM) system which has more metadata about an account than a
directory holds. Since the SIEM gets the actual events as they occur, it
would be a better use of system resources to push those events to the IdM
rather than have the IdM deploy yet another monitoring agent to the domain
controller.
3. One over One [30] manager notification on creation, modification, or
disable. This notification would normally come from an IdM, because while AD
user accounts do have a manager field, that field may not be populated. If an
IdM is not in place, and the manager attribute is populated at the time of
account creation in the directory, then a notification would allow the
user's manager to know when the account was actually created.
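
The cross index in item 1 is, at its core, a set comparison between the accounts a constituent system reports and the accounts the primary directory holds. The sketch below assumes both lists have already been exported as plain account names and that names are compared case-insensitively. Both are simplifications, and the function name is hypothetical.

```python
def unmatched_accounts(constituent_accounts, directory_accounts):
    """Return constituent-system account names that have no counterpart
    in the primary directory (case-insensitive name comparison)."""
    directory = {name.lower() for name in directory_accounts}
    return sorted(a for a in constituent_accounts if a.lower() not in directory)


# Illustrative data only.
app_accounts = ["JSmith", "mbrown", "appadmin"]
ad_accounts = ["jsmith", "MBrown", "kdavis"]
print(unmatched_accounts(app_accounts, ad_accounts))
```

Once an artificial identifier such as an employee ID is available in both systems, the comparison key would switch from the account name to that identifier.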

[29] For example, the aptly named www.defaultpassword.com, or defaultpasswords.in.

[30] If you haven't heard of this term, it means that each person's direct
supervisor, the one who is responsible for annual review and pay action, is
informed. It does not mean a dotted line relationship.

Windows Group Life Cycle Events

Windows has two group types, security and distribution, and four group
scopes: local to a system, universal, global, and domain local. The scope
defines how the group is used in an AD forest, while the type defines the
intended usage. This model translates into dozens of event IDs that need to
be monitored. In practice, AD groups are often used to control access to a
resource that must be monitored, such as a directory where financially
significant data is stored in a publicly traded company [31]. Further, there
are several groups within Active Directory that provide elevated access.
Even worse than that, groups can be nested. For example, an organization may
have an "Administrative Service Accounts" group embedded within the Domain
Admins group. If you are only monitoring the Domain Admins group, you will
miss a user being added to or removed from a nested group which has Domain
Administrative privileges. When monitoring group changes with a SIEM, a
subset of changes may prompt a notification. There are many other means to
support the intent of a control, such as scripting a group membership report
or using a purpose-built application.
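
The nesting problem described above can be made concrete with a small recursive expansion. This is a sketch over an in-memory dictionary, not a directory query: the group contents and account names are made up, and a real report would pull membership from AD.

```python
def effective_members(group, members_of, _seen=None):
    """Expand nested groups and return the set of user accounts that
    effectively hold membership in the given group."""
    seen = set() if _seen is None else _seen
    if group in seen:          # guard against circular nesting
        return set()
    seen.add(group)
    users = set()
    for member in members_of.get(group, []):
        if member in members_of:   # member is itself a group
            users |= effective_members(member, members_of, seen)
        else:
            users.add(member)
    return users


# Hypothetical membership data mirroring the example in the text.
GROUPS = {
    "Domain Admins": ["alice", "Administrative Service Accounts"],
    "Administrative Service Accounts": ["svc_backup", "svc_sql"],
}
print(sorted(effective_members("Domain Admins", GROUPS)))
```

Comparing the output of such an expansion from one day to the next surfaces changes made in a nested group that a rule watching only Domain Admins would miss.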

AGDLP is an abbreviation for "Account, Global, Domain Local, Permission"
that summarizes the method recommended by Microsoft to provide Role Based
Access Control (RBAC) with any resource that can leverage Windows
authentication and Active Directory. User accounts should be members of
Global groups, which are then assigned to Domain Local groups. The DL group
should describe the access permission and be applied to the specific resource.
Depending on where the resource group is in the forest and the group's scope
within the forest, a different security event is created on a domain controller
within the forest. Given that there are so many event IDs from the same source,
the IDs are summarized below in a table rather than listed one per line.

Table 11 Windows Events: Group Changes (Security Log) (V1.02)

                 Security                     Distribution
                 Local   Global   Universal   Local   Global   Universal
Created          4731    4727     4754        4744    4749     4759
Changed          4735    4737     4755        4745    4750     4760
Deleted          4734    4730     4758        4748    4753     4763
Member Added     4732    4728     4756        4746    4751     4761
Member Removed   4733    4729     4757        4747    4752     4762
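
For alert enrichment, the table above can be turned into a lookup keyed by event ID. The sketch below only restates the table. The names `COLUMNS`, `ROWS`, `GROUP_EVENTS`, and `describe_group_event` are illustrative and not part of any SIEM product.

```python
# Column order follows Table 11: (group type, scope).
COLUMNS = [
    ("Security", "Local"), ("Security", "Global"), ("Security", "Universal"),
    ("Distribution", "Local"), ("Distribution", "Global"),
    ("Distribution", "Universal"),
]
ROWS = {
    "Created":        (4731, 4727, 4754, 4744, 4749, 4759),
    "Changed":        (4735, 4737, 4755, 4745, 4750, 4760),
    "Deleted":        (4734, 4730, 4758, 4748, 4753, 4763),
    "Member Added":   (4732, 4728, 4756, 4746, 4751, 4761),
    "Member Removed": (4733, 4729, 4757, 4747, 4752, 4762),
}

# Flatten the table into {event_id: (group type, scope, action)}.
GROUP_EVENTS = {
    event_id: (group_type, scope, action)
    for action, ids in ROWS.items()
    for (group_type, scope), event_id in zip(COLUMNS, ids)
}


def describe_group_event(event_id):
    """Return a readable label for a group change event, or None."""
    if event_id not in GROUP_EVENTS:
        return None
    group_type, scope, action = GROUP_EVENTS[event_id]
    return f"{group_type} {scope} group: {action}"


print(describe_group_event(4728))
```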

[31] In the United States, this requirement is derived from Sarbanes-Oxley controls.


Group Based Monitoring Alerts


NTFS access is normally managed by applying a domain local group to the
resource on the directory itself. Also, a group can be used at the share
level itself to apply permissions. If you compare permissions to your front
door, share permissions are like a screen door and NTFS permissions are like
a bolted security door.

Alerting on changes to a select set of NTFS and application control groups
may be in order. When you create a SOC alarm or an email notification for a
resource owner, make sure that the message explains what the group controls
access to, meaning the resource itself or the application right managed by
the group. Don't automate a notification that a user was added to
"NTFS_g45_Direc_RW". Instead, find out the purpose and path of the directory
and use that instead, for example, "Shared Drive, Monthly Financial
Summaries, path name \\Storage\NY\FinRep_Monthly" for the monthly financial
performance reports for the NY business unit.

Applications can also use Active Directory groups to control access by mapping
an AD group to an internal role within the application. Following the same
example, a notification that states a user was added to "PeopleSoft Production
Admin Members" is better than "AppCtlFinPplAdminsPRD".
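
One way to implement this advice is a small description table consulted when the notification is built. The sketch below uses the two group names from the examples above. The dictionary, the function name, and the message wording are assumptions for illustration.

```python
# Hypothetical mapping from raw group names to owner-friendly descriptions.
GROUP_DESCRIPTIONS = {
    "NTFS_g45_Direc_RW": (
        "Shared Drive, Monthly Financial Summaries, "
        r"path name \\Storage\NY\FinRep_Monthly"
    ),
    "AppCtlFinPplAdminsPRD": "PeopleSoft Production Admin Members",
}


def notification_text(user, group):
    """Build a notification that names what the group controls access to."""
    description = GROUP_DESCRIPTIONS.get(group)
    if description is None:
        # Fall back to the raw name so the change is still reported.
        return f"User {user} was added to group {group} (no description on file)"
    return f"User {user} was granted access to: {description}"


print(notification_text("jdoe", "NTFS_g45_Direc_RW"))
```

Groups that fall through to the "no description on file" branch are a useful work list for the team that maintains the table.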

Special Group Changes


Windows 2008 introduced a new monitoring capability called "Special Groups"
which records when someone who is in a set of defined groups logs into the
network. The event ID is 4964. In effect, this event ID records when an
account that is a member of any administrator-defined group logs in. This
function can be used to monitor service accounts, members of elevated access
groups, users who have put in their notice, or any other valid security
focused reason. In order to implement this, assign the group Security IDs
(SIDs) to the registry key
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Audit as a
semicolon-delimited list. The default administrative groups should be added
to this key, such as Domain Admins (S-1-5-21-domain-512) and the local
Administrators SID (S-1-5-32-544). For more information, see Microsoft
article 947223. If you couple this feature with Windows Event Forwarding on
the workstations, then there is an audit record when any user who is a
member of a group that provides some form of elevated access logs in.
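
Because the registry entry is a semicolon-delimited list of SIDs, a mistyped SID is an easy way to silently lose coverage. The helper below is a sketch that validates the general SID shape and builds the list. It does not write to the registry, the domain SID shown is a made-up placeholder, and the function name is hypothetical.

```python
import re

# General shape of a SID string: S-1-<authority>-<sub-authority>...
SID_PATTERN = re.compile(r"S-1-\d+(?:-\d+)+")


def special_groups_value(sids):
    """Validate each SID and return the semicolon-delimited list."""
    for sid in sids:
        if not SID_PATTERN.fullmatch(sid):
            raise ValueError(f"not a valid SID: {sid!r}")
    return ";".join(sids)


# Placeholder domain SID for illustration; substitute your own domain's SID.
DOMAIN_SID = "S-1-5-21-1111111111-2222222222-3333333333"
value = special_groups_value([f"{DOMAIN_SID}-512", "S-1-5-32-544"])
print(value)
```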

Account Usage Events


Standalone System: When a user logs into a standalone system, distinct
4624 Type 2 and 4776 events are written to the security event log
(assuming that the policy is set or the system is Windows 10 or Server 2016).
The "Security ID" field is set to the local system name and the local user
account. The account domain is set to the local system, meaning that the SAM
database used to authenticate the user is the one resident on the physical
system. By default, a standalone system has the default value "WORKGROUP" in
the Account Domain field unless the NetBIOS workgroup name is changed. For a
standalone system where someone logs in with a Microsoft account (a cloud
account used to connect the system to the Windows Store and other identity),
such as a home user's system, the process is similar to a standalone login.

Domain System: This process is different when a user logs onto a workstation
and the user is authenticated from the domain. A 4768 event is written to
the DC that authenticates the user, and a 4624 event is written to the local
security log with the domain name in the Security ID and the Account Domain
fields. When users terminate their session, there will be a 4647 followed by
a 4634 event. As users authenticate to Windows file shares, 4624 Type 3
events are recorded on the serving system. As users authenticate to other
Kerberos integrated services, 4769 events are registered on the DC.

Table 12 4624 Logon Types

EventID  Level          Name
4624     Informational  An account was successfully logged on. The
                        Logon Types are:
                        2: Interactive (keyboard/screen on system)
                        3: Network (shares)
                        4: Batch or Scheduled Task
                        5: Service (services applet)
                        7: Screen Unlock
                        8: NetworkCleartext
                        9: New credentials such as using RunAs
                        10: Remote Desktop or Terminal Services or
                            Remote Assistance
                        11: Cached credentials (off domain)


Table 13 Other Logon Events

EventID    Level          Name
4740       Informational  A user account was locked out
4624       Informational  An account was successfully logged on.
                          The process ID in a 4624 event can tie into
                          4688 events.
4634       Informational  An account was logged off
4625 [32]  Informational  An account failed to log on.
                          Note: your SIEM solution should supply you
                          with the underlying reason in the alert, based
                          on the subcodes listed in Table 14 Account
                          Logon Failures Status Codes for Event ID 4625.
4648       Informational  Logon attempted using explicit credentials
                          (RunAs, a scheduled task that runs as a specific
                          user, use of alternate credentials, or a user
                          running a program that requires admin rights
                          with User Account Control enabled).

Account Lockouts (Security: 4740, 4625/0xC0000234): The applicability of this
alarm will depend on the time of day and the account type. Numerous lockouts
Monday morning are "normal", while account lockouts significantly outside your
organization's normal working hours are suspicious. Also, monitoring account
lockout conditions is an example where the SOC can provide operational value.
For example, if a service account is locked out, then the application itself
is very likely down or degraded, the service capability is under attack, or
a script is misconfigured.

Account Logon Use Cases:

1. Concurrent console logons (4624, type 2) from multiple sources within a
short timeframe: This condition indicates an account is being used from
multiple systems. For most users, any count above two is out of the
ordinary. For example, an instructor in a classroom may make a change to
all the classroom PCs, but an accounting staff member is unlikely to log on
to three PCs at once.
2. Logons from internal and external sources, within a short window, not over
RDP (4624, type 10): This condition may indicate account misuse, credential
theft, account sharing, or a behavioral issue. Note that to make this use case
effective, you will need to correlate specific account types that indicate
user presence, such as a PC console logon and a VPN login, not an RDP login
and a VPN login.
3. Geographically improbable VPN logins: Modern VPN systems can provide the
country and city of the source IP for a connecting client, or the data can be
enriched with geo lookup data as it arrives at the log collection point. More
sophisticated platforms like Splunk actually have built-in functionality to
detect geographically improbable access. This means that a user logged in
from one location and soon after logged in from another location that they
could not travel to in the time elapsed. For example, from France at 10 AM
and then Canada at 10:15 AM, same day. If your SIEM doesn't have this
functionality, then as a simple check, verify that the country code matches
the country for your user population. Depending on the user population, this
check may reveal a compromised account. For systems that have a tracking
list functionality, compare the current login state or province and country
with the prior login. If they are different and insufficient time has
elapsed since the prior login to allow for travel, there may be a problem.
4. Network Switches/Routers/Devices: These devices represent the network
infrastructure beneath the network operating system and application stack.
Thus, they must be kept secure. Even in large corporations, there is often
a small group of well-known users that access the network support fabric.
Regardless of the authentication method (RADIUS, TACACS+), if a user not
from this group attempts to access network hardware, an alarm should be
raised due to an unauthorized user access attempt.

[32] For Windows 2000/2003/XP, the Event IDs are 529, 530, 531, 532, 533,
534, 535, 536, 537, and 539. Hopefully you will not need this information.
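
For a SIEM without a built-in check, the geographically improbable login test in item 3 reduces to distance divided by elapsed time. The sketch below assumes each login has already been enriched with latitude and longitude, and uses a 900 km/h ceiling (roughly airliner speed) as an assumed threshold. The coordinates are approximate city centers chosen to mirror the France to Canada example.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def is_improbable(prev_login, next_login, max_kmh=900.0):
    """Each login is (timestamp, latitude, longitude). Returns True when
    the implied travel speed exceeds max_kmh."""
    (t1, lat1, lon1), (t2, lat2, lon2) = prev_login, next_login
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        return distance > 0
    return distance / hours > max_kmh


paris = (datetime(2018, 3, 5, 10, 0), 48.8566, 2.3522)
toronto = (datetime(2018, 3, 5, 10, 15), 43.6532, -79.3832)
print(is_improbable(paris, toronto))
```

Geo lookup data for VPN egress points and mobile carriers is often coarse, so a production rule would likely need a minimum distance as well as a speed ceiling.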

Account Lockout use cases:

1. Lockouts that originate from an external source: Once lockouts reach a
certain level from an externally facing login point, such as a Citrix remote
desktop server or a VPN server, there is external password guessing in
progress.
2. Rhythmic lockouts: Multiple periodic or rhythmic account lockouts can
indicate password guessing, a service with outdated credentials, or a script
attempting to log on with outdated credentials.
3. Multiple lockouts from different sources: These events will occur when the
Workstation Name and/or the Source Network Address are different. If the
count of unique sources is greater than 2, investigate why. The various
reasons behind an account logon failure are identified by status codes.
Review the table below to create more specific alerts. For example, a few
events with a code of 0xC0000064 or 0xC000006A simply indicate user error.
However, varying unique user names from the same source workstation name
and/or the same source network address indicate that there is account
reconnaissance in progress.
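
Item 3 describes two different groupings of the same failure data: unique sources per account, and unique account names per source. The sketch below computes both from (account, source) pairs. The "greater than 2 sources" threshold comes from the text, while the threshold of five distinct user names per source is an assumption to tune locally.

```python
from collections import defaultdict


def lockout_findings(events, max_sources=2, max_users=5):
    """events: iterable of (account, source) pairs taken from 4740/4625
    records. Returns (accounts failing from too many sources,
    sources trying too many distinct accounts)."""
    sources_by_account = defaultdict(set)
    accounts_by_source = defaultdict(set)
    for account, source in events:
        sources_by_account[account].add(source)
        accounts_by_source[source].add(account)
    multi_source = sorted(
        a for a, s in sources_by_account.items() if len(s) > max_sources
    )
    recon_sources = sorted(
        s for s, a in accounts_by_source.items() if len(a) > max_users
    )
    return multi_source, recon_sources


# Illustrative data: one account hit from three hosts, and one host
# cycling through six different user names.
sample = [("jsmith", f"WKS{i}") for i in range(3)]
sample += [(f"user{i}", "10.9.8.7") for i in range(6)]
print(lockout_findings(sample))
```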


Table 14 Account Logon Failures Status Codes for Event ID 4625

Status/Sub Status  Name
0xC0000064         User name does not exist
0xC000006A         User name is correct, but the password is wrong
0xC0000234         User is currently locked out
0xC0000072         The user's account is currently disabled
0xC000006F         The user tried to log on outside time of day
                   restrictions
0xC0000070         Workstation restriction, or Authentication Policy Silo
                   violation, which needs to be correlated with Event ID
                   4820 from a DC
0xC0000193         Account expired
0xC0000071         Account has an expired password
0xC0000133         System clocks too far out of sync (DC to PC)
0xC0000224         User is required to change password at next logon
0xC000015B         User has not been granted the requested logon type
                   on the specific machine
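
If your SIEM does not translate these codes for you, the table can be applied as a lookup at alert time. The sketch below restates Table 14 and accepts the code as either a hex string or an integer. The names `FAILURE_REASONS` and `failure_reason` are illustrative.

```python
FAILURE_REASONS = {
    0xC0000064: "User name does not exist",
    0xC000006A: "User name is correct, but the password is wrong",
    0xC0000234: "User is currently locked out",
    0xC0000072: "The user's account is currently disabled",
    0xC000006F: "The user tried to log on outside time of day restrictions",
    0xC0000070: "Workstation restriction, or Authentication Policy Silo violation",
    0xC0000193: "Account expired",
    0xC0000071: "Account has an expired password",
    0xC0000133: "System clocks too far out of sync (DC to PC)",
    0xC0000224: "User is required to change password at next logon",
    0xC000015B: "User has not been granted the requested logon type "
                "on the specific machine",
}


def failure_reason(status):
    """Translate a 4625 Status/Sub Status value into a readable reason.
    Accepts '0xC000006A'-style strings in any case, or integers."""
    code = int(status, 16) if isinstance(status, str) else status
    return FAILURE_REASONS.get(code, f"Unknown status 0x{code:08X}")


print(failure_reason("0xc000006a"))
```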

Service accounts, interactive logins: Once an application that requires a
service account is stabilized, interactive login (RDP or Interactive) should
not be required. If a service account is used for Interactive or RDP logon
and the designated account holder cannot explain its use right away, assume
the account is compromised.
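
A detection for this condition can be as simple as filtering 4624 events by logon type and account name. The sketch below assumes service accounts follow the SVC_ prefix used as an example in the naming convention use case earlier. If your organization tracks service accounts in a group or list instead, substitute that lookup.

```python
# 4624 logon types that indicate a person at a keyboard or an RDP session.
INTERACTIVE_TYPES = {2, 10}


def is_service_account(account, prefix="SVC_"):
    """Assumed convention: service account names start with SVC_."""
    return account.upper().startswith(prefix)


def flag_service_logons(logons):
    """logons: iterable of (account, logon_type, host) from 4624 events.
    Returns the entries where a service account logged on interactively."""
    return [
        (account, logon_type, host)
        for account, logon_type, host in logons
        if logon_type in INTERACTIVE_TYPES and is_service_account(account)
    ]


# Illustrative records only.
logons = [
    ("svc_backup", 5, "SRV01"),    # normal service logon
    ("svc_backup", 10, "SRV01"),   # RDP logon, should be flagged
    ("jsmith", 2, "WKS44"),        # ordinary user at a console
]
print(flag_service_logons(logons))
```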

Microsoft Routing and Remote Access


Microsoft has had dial in and VPN functionality since at least NT 4.0. There are
dozens of RRAS events which provide quite a bit of logging. Today, chances are
that remote access is provided with a different VPN technology. If you are using
RRAS, be aware that the tracing level logs need to be enabled using the RRAS
console, and they are written to %windir%\tracing.

Normal logging needs to be enabled in the RRAS console. Choose "Log all
events". Normal events are written to an actual log file:
%windir%\system32\LogFiles

Monitoring Jump Boxes


One relatively inexpensive technique that can provide a high degree of security
is limiting access into the server farm using RDP or SSH so it only comes from a
jump box farm. The concept is that RDP, SSH, and other direct logon capabilities
are blocked into and out of server segments unless they originate from a specific
set of jump box resources, those resources are located on their own routable
segment, and only designated accounts can log in to the jump boxes. Once the
jump box farm is set up, deny inbound access into all other server segments for
port 3389 (RDP) for Windows and 22 (SSH) for Linux machines. Some systems
may use VNC on port 5800/5900, or X11 services on port 600X. Include any
other remote desktop equivalent not listed here.

Jump boxes are an excellent example of where an EDR application can really
shine because it provides highly granular awareness of EXEs launched, network
connections, registry activity, and so forth. All jump boxes should have
WEC/WEF enabled.

Jump Box use cases:

1. Remote connections: Any remote access management attempt into the
server segment(s) not from a jump box or the jump box network segment is,
at best, suspicious and should be disallowed if at all possible.
2. Limited exe's: A limited set of executables should run on the jump boxes.
Any executable outside of what's needed to permit management to and
from the server segment needs to be run down. Not only that, the inventory
of executables running on these systems should be very stable.
3. Limited user access population: Only a specific set of users should be
logging into the jump boxes, so login attempts outside of that group should
cause an alarm. If these user accounts are in a specific AD group, then that
group can feed a list within the SIEM. If the user accessing a jump box using
RDP or SSH isn't in this list, raise an alarm.
4. Service accounts: Service accounts should not be logging into a jump box.
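
Use cases 1 and 3 are both membership tests, one against a network segment and one against a user list. The sketch below uses Python's standard `ipaddress` module. The subnet, the user names, and the function names are hypothetical, and the X11 port range is left out for brevity.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only.
JUMP_BOX_NET = ip_network("10.20.30.0/28")
JUMP_BOX_USERS = {"adm_alice", "adm_bob"}
MGMT_PORTS = {22, 3389, 5800, 5900}   # SSH, RDP, VNC


def remote_access_alert(src_ip, dst_port):
    """True when a management connection into a server segment did not
    originate from the jump box segment (use case 1)."""
    return dst_port in MGMT_PORTS and ip_address(src_ip) not in JUMP_BOX_NET


def jump_box_login_alert(user):
    """True when a jump box login comes from a user outside the
    authorized list (use case 3)."""
    return user.lower() not in JUMP_BOX_USERS


print(remote_access_alert("192.168.1.50", 3389))
print(jump_box_login_alert("jsmith"))
```

In practice the user list would be fed from the AD group mentioned in use case 3 rather than hard coded.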

Table 15 RDP Events from Applications and Services Logs -> Microsoft -> Windows ->
TerminalServices-LocalSessionManager

EventID  Name
21       Remote Desktop Services: Session logon succeeded. Records user,
         session ID, and source address.
22       Remote Desktop Services: Shell start notification received.
         Records user, session ID, and source address.
23       Remote Desktop Services: Session logoff succeeded. Records user
         and session ID.
24       Remote Desktop Services: Session has been disconnected.
         Records user, session ID, and source address.
25       Remote Desktop Services: Session reconnection succeeded.
         Records user, session ID, and source address.
1101     Remote Desktop Services: Session logon succeeded. Records user,
         session ID, and source address.
1103     Remote Desktop Services: Session logoff succeeded. Records user
         and session ID.
1104     Remote Desktop Services: Session has been disconnected.
         Records user, session ID, and source address.
1105     Remote Desktop Services: Session reconnection succeeded.
         Records user, session ID, and source address.

Table 16 RDP Events from the Security Log

EventID        Level          Name
4624, type 10  Informational  An account was successfully logged on.
                              This is a generalized event.

Network Hardware Devices and Appliances


Network hardware such as switches, routers, access points, storage systems, IP
cameras, door controllers, acceleration servers, load balancers, and all sorts of
appliance-oriented systems are initially configured with default local accounts
like "admin" or "root". All of these devices must be configured to centrally log,
must be configured to use centralized time services, and most of them should be
configured to use a central directory such as Active Directory via LDAP for user
authentication. There may be some systems that have a characteristic that can
justify a local account database, but it is unlikely that there is a solid reason why
those systems should not centrally log.

Network Hardware Use Cases:

1. Identify Network Hardware: You can achieve this objective (or at least
make progress towards achieving it) by scanning the network with nmap,
looking for a response from ports 443/TCP, 80/TCP, and possibly 22/TCP. If
systems are responding and not generating a log record, or at a minimum
not seen as a source client IP for authentication on an AD DC, then an
appliance of some sort is identified. The next step is to identify the system
and determine if it can log, and if it should be configured to log.
2. Collect authentication and change activity: Depending on the device, you
may want logging from them. At a minimum, you want change activity, user
logins (success and failure), and system reboots.
3. Monitor for default account attempts: Logins to network hardware should
be monitored so that default accounts like admin, root, and supervisor are
not used.
4. Monitor for outbound traffic: Most network devices should generate very
little outbound traffic outside of a few specific sites, which are most often
for content or system updates. Alarm conditions will vary. For example,
create tuned alerts that fire if a piece of hardware makes DNS requests or
communicates with sites outside of a small list. Alternatively, review
outbound traffic from a piece of hardware periodically to detect anomalies.
Items of note include NTP requests, DNS requests not to the local DNS, and
software updates from vendor network names.
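
Use cases 1 and 3 can be approximated with two small comparisons once the scan results and log source inventory are exported. The sketch below reads use case 1 as "responding to the scan, but absent from both the log source list and the DC authentication client list", which is one interpretation of the text. The function names and sample addresses are made up.

```python
# Default account names called out in use case 3.
DEFAULT_ACCOUNTS = {"admin", "root", "supervisor"}


def unlogged_responders(scan_responders, log_sources, auth_clients):
    """IPs that answered the port scan but are neither sending logs nor
    seen as authentication clients on a DC: likely appliances."""
    known = set(log_sources) | set(auth_clients)
    return sorted(set(scan_responders) - known)


def default_account_logins(logins):
    """logins: iterable of (device, account) pairs from device auth logs.
    Returns the pairs where a default account name was used."""
    return [(d, a) for d, a in logins if a.lower() in DEFAULT_ACCOUNTS]


# Illustrative data only.
responders = ["10.1.1.5", "10.1.1.9", "10.1.1.20"]
logging = ["10.1.1.5"]
dc_clients = ["10.1.1.20"]
print(unlogged_responders(responders, logging, dc_clients))
print(default_account_logins([("sw-core-1", "Admin"), ("sw-core-1", "netops1")]))
```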

Printing
Print servers can be configured to record when a user prints a document, the
document name, and document size. You are more likely to need to enable print
job monitoring through event log forwarding to support long term employee
investigation. In order to support any assertion other than "User X printed a job
to printer Y at time Z", you will need to enable supplemental auditing [33] to
capture the print job name, and either conduct a forensic examination of the
workstation or have an EDR application in place, in order to fully support
this degree of attribution.

Based on empirical observation, the Windows Print event order is: 800 -> 801 ->
311 -> 842 -> 804 -> 307 -> 802. Of these, the most relevant are 311 and 307.

Table 17 Windows > PrintService > Operational

EventID   Level          Name (based on empirical observations)
307       Informational  Print Document owned by user (identifies
                         user, print server name, printer, and if
                         auditing is enabled, the file name).
800, 801  Informational  Print Job Diagnostics (spooling)
311       Informational  Printing a Document (identifies the user and
                         printer name)
824       Informational  Print Job Sandbox / Isolating print job
802       Informational  Print Job deletion
842       Informational  Print job isolation and print process tracking

[33] The GPO path is: Computer Configuration » Administrative Templates »
Printers » Allow job name in event logs.


Operating System Security, Change, and Stability


There are several conditions that affect system security and stability. By
monitoring these conditions, the SOC can help identify both system stability
issues and operational issues.

Operating system stability Use Cases:

1. Adverse events by population: The same adverse stability event occurring
across N% of your environment. Think 2%, so if you have 1,000 servers that
means an error occurs across 20 systems within a 24-hour period. As you
find and remediate systemic issues, you would set this higher. Rather than
focusing on "100 systems", though, this metric is better related to a single
digit percentage of servers, because that metric has operational and
security value. This particular condition is where the event taxonomy will
come in handy, because identifying all of the source events by specific type
would be exhausting.
2. Security service failures: This use case relates to security focused
services failing, because that can indicate the environment cannot be
properly monitored or active tampering is occurring.
3. (Un)installs outside of the change or maintenance window: For changes
(1022, 1033, 903-908), the SOC should be able to perform long tail analysis
of the installed applications on a daily basis. For centrally deployed
applications, the count of successful installations should be the size of
the target population.
4. Clearing the event log: When configuring an alarm for this condition, make
sure the alarm is for the security service and not the ADFS service. The ADFS
service also logs events to the Security log with event ID 1102. Clearing the
event log should rarely, if ever, occur on the network. There are multiple
configuration changes that can be made to compensate for reasons someone
would cite to clear a log. For example: if the event log is too large, then
its size can be reduced through group policy in increments, like dropping
the size by 10% every six hours until the log reaches 128MB [34]. As events
are written, the log will naturally trim itself. An OS can also be
configured to shut down if the log is full. Given these conditions, about the
only reason for legitimate clearing of the log is if it is truly corrupted
and a reboot didn't fix the log.
5. New services: Windows records a new service installation with Event ID
4697. These are infrequent events and should be supported with a change
control item.
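
The population threshold in use case 1 is a count of distinct hosts per event type compared against a percentage of the fleet. The sketch below assumes events have already been normalized to a taxonomy label and limited to the last 24 hours. The function name and the labels are illustrative.

```python
from collections import defaultdict


def widespread_events(events, population, threshold_pct=2.0):
    """events: iterable of (host, event_type) seen in the last 24 hours.
    Returns {event_type: host_count} for event types observed on at
    least threshold_pct percent of the server population."""
    hosts_by_event = defaultdict(set)
    for host, event_type in events:
        hosts_by_event[event_type].add(host)
    limit = population * threshold_pct / 100.0
    return {
        event_type: len(hosts)
        for event_type, hosts in hosts_by_event.items()
        if len(hosts) >= limit
    }


# With 1,000 servers and a 2% threshold, 20 affected hosts trip the alarm.
sample = [(f"srv{i:03d}", "disk_error") for i in range(20)]
sample += [(f"srv{i:03d}", "service_crash") for i in range(5)]
print(widespread_events(sample, population=1000))
```

Counting distinct hosts rather than raw events keeps one noisy server from tripping a rule meant to find systemic problems.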

[34] This number is based on using Server 2008 and 2012 in a highly
virtualized environment. The Windows admins found that was a size that
provided several days to months of record keeping and still allowed the
Event Viewer to be responsive. YMMV.
