Boot Trace Windows
This is the first in a series of blog posts I'll be writing that discuss the optimization of Windows
client boot and logon in a corporate setting. Certainly, a lot of what I'll discuss can be applied to
the home PC, while other parts of it won't matter there. Other blog writers have discussed tools and
techniques that are close to this topic but I hope to bring some simplicity and a new way of
thinking to an area that challenges a lot of IT Pros.
This content started with an idea I had several years ago. As a Premier Field Engineer for Microsoft,
I was doing a lot of desktop assessments for customers where we were examining the boot and
logon time of Windows XP clients and helping them find ways to speed things up. As customers
moved away from Windows XP, I found that while helpful, these assessments were also a little
annoying.
Think about it. You've spent time designing a new corporate image that includes the software you
want, it's been configured with the Group Policy you want and you've deployed the Anti-virus
software you want. Then I come in the door and tell you it's wrong and you need to change it
(politely of course).
As I was having these experiences with customers, I started to think about how we could turn it
around. Could we use the same tools I use for analysis and build-in optimization during the design
phase? Heck yeah we could.
So, a couple of my colleagues (Roger Southgate and Scott Duffey) and I built a customer-facing
workshop. This workshop (Windows Enterprise Boot and Logon Optimization) is available
globally to all Microsoft Premier Customers and is delivered by Premier Field Engineering. And the
deliveries we've had have been really successful.
But now I want to share these ideas with a larger audience. It's time for this blog post series.
The Sales Pitch
So first up, why do we care about the boot and logon performance of user devices in the
Enterprise?
What we've found is that client performance is often overlooked. It's not built into the design,
there's no baseline for acceptable performance and this leads to a loss of productivity.
Users make assumptions about IT based upon their user experience and often, issues go unnoticed
when employees head off for a morning coffee while their PC boots, rather than calling helpdesk.
The objective of these blog posts will be to help you prioritize the user experience, design for
performance, establish a baseline, create a repeatable process to measure the impact of change
and to understand potential causes for poor user experience.
Eventually we'll get to some deep analysis but the intent is to start slow and build a solid process
that will help you minimize the time you need to spend isolating production issues.
With Windows 10 just around the corner, it's also a good time to have this discussion.
Windows Enterprise Client Boot and Logon
Optimization – Part 2, Tools and
Instrumentation
The process that I’m going to describe in the following posts will require you to benchmark the
user experience as you design the client image. In order to do this, you’ll need some way to time
the system as it boots and as the user logs in. What you may not realise is that the instrumentation
is already built into Windows.
A nice diagram that conveys Event Tracing for Windows (ETW) is shown here (thank you to
Matthew Reynolds for this conceptualization) -
A quick summary -
- ETW is the framework built into Windows that allows the instrumentation we’re interested in
- Providers create the events captured during a trace session
- Controllers allow you to start/stop logging and to choose the providers from which to capture events
- Real-time Consumers can display information as it is generated
- ETL files can capture the event data for parsing/analysis by a Consumer after the trace is complete
Tools
This series of posts will focus on tools available with the Windows Performance Toolkit (WPT). WPT
is included with the Windows Assessment and Deployment Kit (ADK).
You can download the ADK for Windows 10 here.
The ADK may be installed on any down-level version of Windows that is still in mainstream
support. You’ll want a client system set aside for analysis that has WPT installed as follows:
1. Download and run ADKSetup.exe from the ADK link provided above
2. Follow the installation wizard though until you reach the feature selection screen
3. Clear all check boxes except Windows Performance Toolkit and then click Install
Once completed with the default installation path, WPT will reside in C:\Program Files
(x86)\Windows Kits\10\Windows Performance Toolkit
For deep analysis, which I’ll discuss later, you’ll want to proceed through this installation. This
makes sure that additional binaries are in the right folders, components are registered and your
path environment variable has been modified.
The WPT installation folder also includes a Redistributables folder. This includes WPT Windows
Installer packages for each of three architectures -
WPTarm-arm_en-us.msi
WPTx64-x86_en-us.msi
WPTx86-x86_en-us.msi
For trace capture of a test system, you can flat copy this folder across to avoid a registry/Programs
and Features installation footprint. I usually put my tools in C:\WPT.
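If you stage lab machines often, the flat copy itself can be scripted. Here’s a minimal Python sketch; the source and destination paths are assumptions for your environment -

```python
import shutil
from pathlib import Path

# Hypothetical paths - adjust for your environment.
WPT_INSTALL = Path(r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit")
DEST = Path(r"C:\WPT")

def flat_copy(src: Path, dest: Path) -> int:
    """Flat-copy the WPT folder to a test system location.

    Unlike running the MSI, this leaves no registry or Programs and
    Features footprint. Returns the number of files copied.
    """
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sum(1 for p in dest.rglob("*") if p.is_file())
```

Calling flat_copy(WPT_INSTALL, DEST) on a system that has WPT installed would reproduce the C:\WPT layout mentioned above.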
Going back to the terminology discussed above, and referencing some of the .exe’s you’ll find in
the WPT installation folder: WPR.exe and WPRUI.exe act as Controllers, WPA.exe acts as a
Consumer, and XPerf.exe can act as both Controller and Consumer. Before you capture a trace,
keep the following in mind -
1. Match the WPT architecture to the Operating System you want to benchmark (see
the previous post)
2. You must have local admin rights on the system to capture a trace
3. Consider the use of Autologon
The first two points are pretty clear but I want to expand on the use of Autologon -
During a boot trace, the instrumentation captures data throughout boot and logon. If a system is
left at the logon screen for a prolonged period of time, the trace will be longer and the timing will
be skewed. It’s certainly possible to extract information from the trace that tells you how long this
wait was but it’s an extra step and it’s not immediately accessible. In the interest of keeping things
simple and helping you create a robust process, I want to advocate the use of Autologon while the
system is being benchmarked.
N.B. The username and password are stored in clear text in the registry. It’s very important that you
remove the Autologon registry entries after you’ve completed benchmarking.
To enable Autologon, refer to knowledge base article - How to turn on automatic logon in
Windows XP (this article is still valid for supported versions of Windows).
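Editing the registry by hand works fine, but if you benchmark many images you might script it. The sketch below generates .reg files that enable Autologon and later remove it; the value names under the Winlogon key are the documented ones, while the account details are placeholders -

```python
# Sketch: generate .reg files to enable and later remove Autologon on a
# benchmark system. The Winlogon value names are the documented ones;
# user/password/domain are placeholders.
WINLOGON = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon"

def autologon_reg(user: str, password: str, domain: str) -> str:
    """Return .reg content that turns on Autologon (password is clear text!)."""
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{WINLOGON}]",
        '"AutoAdminLogon"="1"',
        f'"DefaultUserName"="{user}"',
        f'"DefaultPassword"="{password}"',
        f'"DefaultDomainName"="{domain}"',
    ]) + "\n"

def autologon_cleanup_reg() -> str:
    """Return .reg content that disables Autologon and deletes the password."""
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{WINLOGON}]",
        '"AutoAdminLogon"="0"',
        '"DefaultPassword"=-',  # a trailing '-' deletes the value
    ]) + "\n"
```

Import each generated file with reg.exe import (or a double-click) before and after the benchmarking run.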
Capturing a Timing Trace
Everything I’ve discussed to this point has been getting you ready to capture a timing trace. A
timing trace makes use of trace options that minimise performance overhead, log to a .ETL file and
establish the timing of boot phases.
To capture a timing trace, you begin by launching Windows Performance Recorder (WPRUI.exe).
This will be in your environment path if you completed a full installation of WPT. If not, just browse
to your WPT folder and launch it.
The options I’ll recommend for a timing trace are shown here -
With these options selected, simply click Start. Windows Performance Recorder will ask you to
select a trace file destination and to provide an optional description for the trace -
Once you click Save, you will be prompted to reboot. Once a reboot and logon (with or without
Autologon) completes, the trace will continue for 120 seconds.
After this time, the trace will be saved to the location specified earlier (see above).
Simple Trace Analysis
The first thing you’ll want to do after capturing a timing trace is to extract some useful information
from it. In line with keeping it simple, I’m going to show you how XPerf.exe may be used as a
Consumer to process the trace. Let’s start with -
xperf.exe -i <trace_file.etl> -a tracestats
This simple command provides some information about the trace and the system from where it
was captured. For example:
Here I can see that the trace was captured from a system with 4 processors running at 2660 MHz
and that the total trace time was a little over 3 minutes.
So let’s get down to it. How can I extract the timing information needed to benchmark the system?
Again, XPerf.exe is the answer -
xperf -i <trace_file.etl> -o Summary.xml -a boot
The “-o” in this command specifies an output file. The output of boot analysis is in an XML format
so I choose an output file with a .xml extension.
If I open the Summary.xml file with Internet Explorer (or another favoured XML viewer), there’s a
great deal of data displayed. Again, keeping it simple means that I want to collapse all of the
redundant information and just display data relevant to system boot and logon times. At its
simplest, my data looks like this -
bootDoneViaExplorer is the time since boot start at which the Windows Desktop (Explorer) launch
was complete (48.944 seconds in this example).
postBootRequiredIdleTime is the required amount of accumulated idle time (CPU and Disk 80%
idle) before the system is considered ready for use. This is always 10 seconds.
bootDoneViaPostBoot is the time at which 10 seconds of idle time has been accumulated. In
practical terms, subtract 10 seconds from this number and consider this the time at which the
system was ready for use (80.944 – 10 = 70.944 seconds in this example). This calculation should
also be performed against the PostExplorerPeriod boot phase. A value of “-1” here means that the
system did not reach an idle state by the time the trace completed.
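To make the benchmark repeatable, you can script the extraction of these two numbers from Summary.xml. The sketch below searches the XML for the attributes named above; the sample snippet’s element layout is illustrative rather than the exact schema xperf emits, and times are in milliseconds -

```python
import xml.etree.ElementTree as ET

def boot_times(xml_text: str) -> dict:
    """Return Explorer-launch and ready-for-use times in seconds."""
    root = ET.fromstring(xml_text)
    # Search the whole tree for the element carrying the timing attributes.
    for el in root.iter():
        if "bootDoneViaExplorer" in el.attrib:
            explorer_ms = float(el.attrib["bootDoneViaExplorer"])
            post_boot_ms = float(el.attrib["bootDoneViaPostBoot"])
            idle_ms = float(el.attrib.get("postBootRequiredIdleTime", 10000))
            if post_boot_ms < 0:
                # -1 means the system never reached idle during the trace.
                return {"explorer_s": explorer_ms / 1000.0, "ready_s": None}
            return {
                "explorer_s": explorer_ms / 1000.0,
                # Subtract the accumulated idle requirement to get the
                # time at which the system was actually ready for use.
                "ready_s": (post_boot_ms - idle_ms) / 1000.0,
            }
    raise ValueError("no boot timing attributes found")

# Illustrative snippet using the example values from the text.
sample = """<results>
  <timing bootDoneViaExplorer="48944" bootDoneViaPostBoot="80944"
          postBootRequiredIdleTime="10000"/>
</results>"""
```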
For the purposes of benchmarking the system, I have everything I need. I can record these
numbers and observe how they fluctuate as changes are made to the system image.
Benchmarking
The benchmarking approach I’ll recommend is just an iterative process. Start with a clean
installation of Windows on your chosen hardware and gather a timing trace. This provides you with
your best possible baseline for that system.
Add a component and benchmark it again. In this way you’ll gain insight into how much time that
component costs you.
The diagram above makes reference to optimizing ReadyBoot. For now, just be aware that you’ll need to do
this with systems using older rotational hard drives and not with systems using solid state drives (SSDs). I’ll
dedicate the entirety of my next blog post to this topic.
Another way to look at it is to benchmark each significant layer of the user experience. This
approach may look something like this -
While taking this layered approach, it’s important to apply components that have a large overall
impact last. For example, Antivirus software not only has a direct impact on CPU and disk but,
because it may be scanning binaries as they’re loaded by other processes, it can also slow those
other components down. Install it after all of your other components.
Disk encryption software is another example of a component that may have large overall impact.
Conclusion
By following this iterative process, you’ll be able to understand the time each design decision costs
you.
Consider
Time (in minutes) x Average Reboots Per Day x Average Employee Cost Per Minute x #Employees
This number might lead you to re-evaluate design decisions in your client image.
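As a sketch, with entirely hypothetical inputs and an assumed working-days multiplier added to annualise the figure -

```python
def annual_boot_cost(minutes_per_boot: float, reboots_per_day: float,
                     cost_per_minute: float, employees: int,
                     working_days: int = 230) -> float:
    """Estimate the yearly productivity cost of waiting on slow boots.

    working_days is an assumed annualisation factor, not part of the
    formula quoted in the text.
    """
    daily = minutes_per_boot * reboots_per_day * cost_per_minute * employees
    return daily * working_days

# e.g. 2 minutes per boot, 1 reboot/day, $1/minute, 5000 employees:
# 2 * 1 * 1 * 5000 = $10,000/day, or $2,300,000 over 230 working days
```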
Windows Enterprise Client Boot and Logon
Optimization – Part 4, An Aside on ReadyBoot
During my last post, I discussed a benchmarking process you might use during the development
of your Windows client system image. To recap -
ReadyBoot takes time to optimize (a number of reboots and a defrag cycle). Installing Windows
straight off distribution media and gathering a timing trace won’t accurately represent the system
at its best performance.
Additionally, large changes to the system can mean the last boot plan is suboptimal. ReadyBoot
usually tolerates small system changes but installing Antivirus software, for example, which loads
large virus definition files during boot, will invalidate the boot plan.
What Happens When the Boot Plan is Suboptimal?
A suboptimal boot plan means that the right data is not pre-fetched at the right time and boot
slows down.
Terminology that ties into the diagram above –
ReadyBoot optimization is triggered using XBootMgr.exe (included with WPT) -
xbootmgr -trace boot -prepSystem -resultPath <path>
The <path> is used to store boot traces during the optimization process.
Executing this command will trigger a series of reboots that build a boot plan, defrag the hard
drive and optimize ReadyBoot. It’s a time consuming operation and requires a logon after each
reboot. It’s not uncommon for observers to think a system has hung during the defrag interval.
For these reasons, use ReadyBoot optimization cautiously; only after large changes to the system
or when you see that ReadyBoot efficiency is dropping away.
How Do I Measure ReadyBoot Efficiency?
ReadyBoot efficiency data may be extracted from a timing trace. Just as we extracted boot timing
information and trace statistics in Part 3 of this blog series, we can extract ReadyBoot information
with
xperf -i <filename.etl> -o ReadyBootSummary.txt -a bootprefetch -summary
This time, the output is simple text and so we choose an output file with a .txt extension. Once
collected, the output looks like
As mentioned in the graphic, you should aim for Prefetch-Hit Ratio and Request-Hit Ratio to
exceed 80%.
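If you capture many traces, a small script can pull the two ratios out of the summary text. The sample output below is illustrative - check it against what your version of xperf actually emits -

```python
import re

def hit_ratios(summary_text: str) -> dict:
    """Return any '...Hit Ratio' percentages found, keyed by label."""
    ratios = {}
    for label, value in re.findall(
            r"([\w-]+[- ]Hit Ratio)\s*[:=]?\s*([\d.]+)\s*%",
            summary_text, flags=re.IGNORECASE):
        ratios[label] = float(value)
    return ratios

def readyboot_healthy(ratios: dict, threshold: float = 80.0) -> bool:
    """Both ratios should exceed ~80% on a well-optimized system."""
    return bool(ratios) and all(v > threshold for v in ratios.values())

# Illustrative sample - the real bootprefetch summary wording may differ.
sample = """Prefetch-Hit Ratio: 89.5 %
Request-Hit Ratio: 84.2 %"""
```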
Keep in mind that as more components are installed, more data may need reading from disk
during boot. The longer the boot process goes, the more variance there may be in the boot
sequence and an optimized boot plan may be harder to achieve.
How Does Hardware Affect ReadyBoot Efficiency?
CPU and disk speeds play a significant role and ReadyBoot can only achieve so much. If your
system has a very high CPU speed but a very slow disk (e.g. 5400 RPM), pre-fetched data may be
consumed before ReadyBoot can pre-fetch more, leading to a high miss ratio.
In general, more RAM offers ReadyBoot more cache and in turn, you should see better system
performance during boot and logon.
In Part 3, I mentioned that you don’t need to optimize ReadyBoot if your system uses a Solid State
Drive (SSD). SSDs are fast enough that there’s no significant benefit to using ReadyBoot and so,
Windows disables the feature.
Conclusion
The first four posts of this blog series have introduced the idea of Windows Enterprise Client Boot
and Logon Optimization. I’ve discussed tools and instrumentation you might use to trace and
benchmark a system as a Windows client image is developed.
Finally in this post, I’ve discussed ReadyBoot functionality, when it’s important to optimize it and
how to execute that optimization.
With the first four posts in this series, you should be empowered to design your Windows
Enterprise Client system image with performance in mind.
Up to this point, I’ve discussed a process you can use to benchmark your Windows client image as
it’s designed. In this way you can come to understand how choices you make influence boot and
logon performance of your image on specific system hardware.
Going forward, I’ll be talking about how you can troubleshoot and isolate boot and logon
performance issues that unexpectedly creep into production. This is the hard (or harder) stuff.
Before we get to troubleshooting, I want to give you a tour of the tool you’ll be using. In addition
to using xperf.exe for producing a boot Summary.xml (discussed in Part 3), you need to be familiar
with Windows Performance Analyzer (WPA).
Windows Performance Analyzer (WPA)
WPA.exe is in your path after you install the Windows Performance Toolkit (WPT). For analysis, I
recommend completing a full installation of WPT instead of a flat-copy (discussed in Part 2). You
can safely copy .ETL trace files to an analysis system to save you installing the tools on every client
that has an issue.
The additional benefit to a full install is that .ETL files will have a file association to WPA.
Double-clicking an ETL file will open it and the default view you’ll see is this -
At this stage, it’s not beneficial to explain every option. I want to cover just enough to get you
started. As this blog series continues, I’ll introduce new features when they’re needed.
System Configuration
The first thing I find useful in troubleshooting boot and logon performance is to see system
details. This information is available by selecting System Configuration from the Trace menu. Doing
so opens a new tab in the analysis view. Here you can see I’ve selected the Storage category where
I can retrieve details of the physical disk.
In this example, you’ll notice the system uses a virtual hard drive – it’s a virtual machine running on
Hyper-V.
For a physical system, you’ll see the manufacturer and model number that may be used to
research disk detail. You may find the hard drive to be a 5400 RPM rotational drive and further
analysis showing saturated disk utilization. This may be evidence enough for a disk upgrade.
If you’re finished with System Configuration, you can close the tab just as you would with an
Internet browser.
Graph Explorer
The left-hand pane, called Graph Explorer, provides access to a number of pre-defined analysis
graphs. These graphs are categorised in System Activity, Computation, Storage, Memory, Power and
others depending on the options selected during trace capture.
Expanding a category provides access to the graphs. Double-clicking a graph adds it to the current
analysis view -
Here, I’ve added the CPU Usage (Sampled) graph and its data table.
As more graphs are added to the analysis view, they’re stacked on top of each other and the
analysis view gains a vertical scroll bar.
Graphs
Graphs may be highly customised and their display is tied to the ordering of columns in their
attached data table. You’ll see a lot of examples of this as the blog series continues but for now,
understand this -
To gain a feel for this, I suggest you try opening a few graphs for yourself. Examples of each graph
type mentioned above are -
During the analysis I’ll show you, there are times where I’ll modify the table. You should be aware
that this alters the appearance of the graph.
In situations where I want to maintain the graph but modify the table, I’ll open two instances of the
same graph, hide the table of the first instance and hide the graph of the second instance. This is
achieved using the buttons in the top right corner of each individual graph -
You’ll often want to focus on a small region of the graph. You may zoom to that region by
highlighting it and right-clicking -
Zooming in this way not only expands the interesting region across the analysis view but also
filters the data table to display only events occurring in that region.
Analysis Views
Opening WPA provides you with the first analysis view. All graphs in the same analysis view are
subject to zooming. In other words, selecting a region in the CPU Utilization (Sampled) graph and
zooming to it, performs the same operation on every other graph residing in the same analysis
view.
New analysis views allow separate streams of investigation. To create a new analysis view, open a
new tab -
Conclusion
For now, I’ve covered WPA in enough detail to make a start on boot and logon performance
analysis. As this series continues, I’ll introduce new WPA features in the context of the task I’m
talking about.
Windows Enterprise Client Boot and Logon
Optimization – Part 6, Boot and Logon
Overview
Before I discuss boot and logon analysis in any detail, it’s important to provide an overview so that
you clearly understand the troubleshooting approach.
The boot and logon of Windows includes a number of distinct phases. When a particular phase is
slow, the investigation of delays requires an understanding of –
Windows Performance Analyzer (WPA) (see Part 5), as well as analysis summaries you can generate
with XPerf.exe (see Part 3 and Part 4), expose these activities for examination. Consider -
1. Gather a boot trace and examine the boot phases. Systematically investigate the most time
consuming phases
2. Review disk utilization during the delayed phase. Disk I/O is a common bottleneck. Identify
processes that are high contributors
3. Review CPU utilization during the delayed phase. Identify processes that are high
contributors
4. Examine the behaviour of activities that occupy and span the lengthy phase
5. Failing all else, proceed to wait analysis
The very early phases of boot will not be included in the trace. During POST and OSLoader, Event
Tracing for Windows (ETW) hasn’t started yet -
A Note on Trace Capture
For most of the analysis I’ll be discussing, a timing trace (discussed in Part 3) includes sufficient
detail. Some of the analysis however, will require a greater level of detail. For this reason, I
recommend the collection of an analysis trace any time you want to investigate boot and logon
delays – it just saves you having to do it twice.
An analysis trace uses the same options as a timing trace (shown below in green) except for the
detail level (shown below in red)
Conclusion
I’ve discussed the foundational information you need to begin boot and logon analysis. In posts to
come, I’ll discuss analysis of cross-phase activities as well as each boot phase in turn.
For each boot phase, I’ll discuss the activities that occur within the phase, measurement of the
phase and potential problems.
Windows Enterprise Client Boot and Logon
Optimization – Part 7, ReadyBoot Analysis
In my last post, I proposed that the troubleshooting approach for boot and logon performance is
to identify long running boot phases and then to investigate activities that occur during and across
those phases.
ReadyBoot is an activity that occurs across multiple phases on systems with rotational disks.
Recap
Part 4 of this series introduced ReadyBoot and I provided a method for extracting ReadyBoot
statistics from a boot trace. To revisit -
As mentioned in the graphic, you should aim for Prefetch-Hit Ratio and Request-Hit Ratio to
exceed 80%.
This data already provides us with an indication of whether there are problems with ReadyBoot.
Event Log
The Windows Event Log also contains some information about the state of ReadyBoot. Specifically,
whether the boot plan was correctly calculated. To see this, navigate to
Event Viewer –> Application and Services Logs –> Microsoft –> Windows –> ReadyBoost –> Operational
The ReadyBoost (not a typo) Operational log writes an event 1016 every time the boot plan is
calculated. When this occurs after boot, the 1016 should be logged with a Reason of System Boot
Completion and a Result of 0x0 -
You may see a Reason other than System Boot Completion but these events do not represent the
boot plan calculation done at boot time.
In cases where the Result is not 0x0, there was an error calculating the boot plan.
Windows Performance Analyzer (WPA)
Any analysis trace captured on a rotational disk system running Windows 7 or later includes
ReadyBoot events. Opening this trace with WPA provides a ReadyBoot graph under the Storage
category.
It’s possible an error or some manual intervention prevents ReadyBoot from running. Typically, this occurs
when the SuperFetch service is disabled and the following registry value is set to 0x5, causing Windows to
assume the system boots from SSD. 0x0 also disables pre-fetching, while 0x3 is the default and
recommended value.
HKLM\SYSTEM\CurrentControlSet\services\rdyboost\Parameters\ReadyBootPlanAge
I like to view the graph and manipulate the table at the same time so I open two instances
(see Part 5 for WPA usage) -
Note that in some cases, I’ve observed the need to delete the contents of %windir%\Prefetch and
to run the ReadyBoot optimization again. Deleting the contents of the prefetch folder is often
discouraged but the act of manual ReadyBoot optimization rebuilds it for you.
As I’ve mentioned a few times now, ReadyBoot still has limitations. There’s only so much that may
be done to account for poor Disk I/O and reading large volumes of data during boot and logon.
Ideally, new systems will use SSDs for their OS disk and you’ll keep the boot and logon process as
lean as possible.
If an SSD is out of the question, steer clear of 5400 RPM disks and ensure the OS disks on systems
you deploy to your Enterprise are at least 7200 RPM.
Conclusion
We’ve discussed the analysis and remediation of ReadyBoot which is the first cross-phase activity
you need to consider during boot and logon performance analysis.
When loading a driver that is not embedded-signed, Code Integrity scans all catalog files on the
system, trying to locate the signature. This operation delays validation of the driver and slows
down the associated boot phase which is particularly problematic for BOOT_START drivers.
Windows Performance Analyzer (WPA) shows catalog reload events in the System Activity –
> Generic Events table. Moving the Provider Name and Task Name columns to the very left provides
the following view
If you see ReloadCatalogs in the Task Name column after expanding the Microsoft-Windows-
CodeIntegrity provider, there’s a non-embedded-signed driver loading on the system.
I recommend installing the SDK on your analysis system and if you need to validate driver
signatures on a production system, flat-copy the entire signtool.exe directory from C:\Program Files
(x86)\Windows Kits\8.1\bin\<architecture>.
Once you have signtool.exe in place, you can check BOOT_START drivers by first parsing a boot
trace with
xperf -i trace.etl -o trace.txt -a dumper
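Once the dump exists, finding catalog reloads is a simple text search. The line format in the sample below is illustrative; real dumper output is a large comma-separated event list in which the CodeIntegrity task names still appear as plain text -

```python
def reload_catalog_events(dump_text: str) -> list:
    """Return dump lines that mention CodeIntegrity catalog reloads."""
    return [line for line in dump_text.splitlines()
            if "ReloadCatalogs" in line]

# Illustrative sample - not the exact dumper record layout.
sample_dump = """\
Microsoft-Windows-CodeIntegrity/ReloadCatalogs, 12.345, ...
DiskIo, 12.400, ...
Microsoft-Windows-CodeIntegrity/ReloadCatalogs, 13.101, ..."""
```

Any hits indicate a non-embedded-signed driver worth chasing with signtool.exe as described above.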
With these environment variables set, you can open your analysis trace and at any time,
select Load Symbols from the Trace menu -
Depending on the speed of your Internet connection, loading symbols may take some time -
Keep in mind that only symbols for Microsoft binaries will become available using the instructions
I’ve provided here. Third-parties may not make symbols available. You’d need to ask the vendor to
see if they have a public symbol repository.
The last thing to mention about symbols is that after collecting an analysis trace, you may find
an NGENPDB folder in the same location as the trace. This folder contains symbols for .Net
components which have been automatically generated during trace capture. If you are copying the
analysis trace to another computer, you should also copy the NGENPDB folder.
CPU Utilization
As I’ve mentioned above, investigation of a slow boot phase is done by selecting and zooming to
that phase (more on this in future posts), after which you’ll want to start investigation with CPU
and Disk Utilization.
I find it best to examine CPU Utilization by adding two instances of Computation –> CPU Usage
(Sampled) to the analysis view.
The data in the CPU Usage (Sampled) graph/table is collected by sampling CPU activity every 10ms. For this
reason, it’s ideal for investigation of high CPU issues.
I configure the first graph instance to display only the graph and from the menu at the top left
corner, I select Utilization by CPU -
Here, svchost.exe (1116) is the greatest contributor to the overall CPU utilization displayed in the
graph.
Analysis to this point may be enough to choose your next action. The top contributor to CPU
utilization may be a process you can remove or launch as a delayed scheduled task. I often see
instant messaging clients or mail clients automatically launched at logon which steal CPU time
away from other, more important activities.
Sometimes, you won’t be sure how to proceed. By loading symbols as I’ve discussed above, you
can expose the code path (stack of binaries and functions) spending most time on the CPU within
a process -
The information in the stack column appears as module!function. By reviewing the names of
functions, you may be able to deduce process behaviour at the time of high CPU consumption.
Ultimately, you may need the assistance of a Microsoft support engineer that has code access (if
the issue is in a Microsoft process) or a third-party support engineer if the issue is in non-Microsoft
code. This approach I’m suggesting just gives you a deeper view of system activity that may
empower you to take action.
Disk Utilization
Reviewing disk activity is quite similar to CPU Utilization. I start by adding two instances
of Storage –> Disk Usage Utilization by Disk.
I configure the first instance to display only the graph.
The second instance I display as a table, change the view to Disk Usage Utilization by Process, Path
Name, Stack and then change the columns so that they display as Process, Path Tree, <golden
bar>.
The resulting view looks like this -
The graph at the top gives me great insight into how busy the disk is overall while the table shows
me the processes responsible for the I/O. Expanding rows in the Path Tree column show me
exactly which files on disk are being accessed and from that, I can begin to deduce my next course
of action.
Conclusion
Today I’ve discussed my preferred approach to examining CPU and Disk Utilization with Windows
Performance Analyzer. These techniques provide evidence and clues regarding cause for high CPU
and Disk consumption.
Sometimes these investigations won’t lead to a clear next step. If the system has poor physical
hardware, it may be the case that you just can’t improve performance. On the other hand, fast
systems behaving poorly may reveal insightful results.
Beginning with Windows 8, ACPI 5.0 systems record POST duration (ms) in –
HKLM\System\CurrentControlSet\Control\Session Manager\Power\FwPOSTTime
Boot Phase POST – Potential Issues
BIOS/Firmware configuration may introduce a delay -
Last time, I began a breakdown of each boot phase. In this post, I’ll continue this discussion but
move on to OS Loader and Kernel Initialization.
The reason I’ll discuss OS Loader and Kernel Initialization together is because the summary XML
generated with xperf.exe (see Part 3) summarises the total time of both activities in a single metric.
Regions of Interest in WPA
Windows Performance Analyzer (WPA) provides a new feature starting with the Windows 10 ADK –
the Regions of Interest graph available in the System Activity group. By default, this graph displays
no data –
From WPA’s Trace menu, you can select Trace Properties and load a Regions of Interest definition
file –
The default location for Regions of Interest definition files is C:\Program Files (x86)\Windows
Kits\10\Windows Performance Toolkit\Catalog. By clicking the Add button, you’ll see a number of
definitions to choose from. For the purposes of Boot and Logon performance analysis, you should
choose FullBoot.Boot.Regions
The Regions of Interest graph is now populated with a Gantt chart of the boot phases we’re
interested in –
I’ll make reference to the Regions of Interest as we proceed to describe each boot phase and its
analysis.
Kernel Initialization
You’ll notice a slight discrepancy in the duration which is due to xperf.exe and Regions of
Interest calculating in different ways. These small differences are unimportant in the context of our
investigation.
Boot Phase OS Loader and Kernel Initialization – Potential Issues
OS Loader
Kernel Initialization
If ReloadCatalog events are discovered, use the techniques in the code integrity post to identify
the drivers responsible.
The next action you can take is to look for drivers that take a long time to load. As discussed
above, the drivers of interest are BOOT_START and SYSTEM_START drivers. The load times (slowest
at the top) for these are exposed towards the bottom of the summary XML file below the PnP
section -
For any drivers causing problems, contact the vendor and look for an update.
In almost all cases, you’ll see that Disk Utilization is near 100% during this phase. The system is
trying to load drivers as fast as possible – it makes sense to push the disk as hard as possible.
Slower disks will slow this phase down.
Conclusion
Some improvement may be seen by upgrading the system disk, especially old, slow, 5400 RPM
drives.
Windows Enterprise Client Boot and Logon
Optimization – Part 12, Boot Phase – Session
Initialization
In my last post, I mentioned that Kernel Initialization concluded by handing control over to the
Session Manager. This signals the start of the next phase – Session Initialization.
Check Disk (Autochk) runs during this phase in a few situations:
If you try to run Chkdsk on the boot volume, it is scheduled for the next reboot
If Chkdsk cannot gain exclusive use of the volume
If the volume is flagged as dirty
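One quick way to check whether the dirty flag is set is fsutil dirty query <drive>. A sketch that interprets its one-line output (the exact wording shown is what I’d expect – verify it on your own build):

```python
# Sketch: interpret the output of "fsutil dirty query C:".
# Typical output lines (verify on your build):
#   Volume - C: is Dirty
#   Volume - C: is NOT Dirty
def volume_is_dirty(fsutil_output):
    """Return True if fsutil reports the dirty bit as set."""
    text = fsutil_output.strip().lower()
    return text.endswith("is dirty") and "not dirty" not in text

print(volume_is_dirty("Volume - C: is Dirty"))      # True
print(volume_is_dirty("Volume - C: is NOT Dirty"))  # False
```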
The most common issue during this phase is slow driver initialization with graphics drivers being
the typical culprit.
Boot Phase Session Initialization – Remediation
The execution of Check Disk during boot should be obvious as its status is written to the screen as
the disk is examined. After completion, Check Disk typically restarts the system, so evidence of
this activity will not appear in an analysis trace.
Understanding that a dirty shutdown is a common trigger for Check Disk means that you should
look for relevant causes – possibly a hardware-related issue. Also,
examine HKLM\System\CurrentControlSet\Control\Session Manager\BootExecute, which should have
a default value as shown –
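On a live system you could read this value with winreg; the check itself is simple enough to sketch. The default REG_MULTI_SZ data is "autocheck autochk *" – anything else listed there will execute at every boot:

```python
# Sketch: flag non-default entries in Session Manager\BootExecute.
# On Windows you would read the value with winreg.QueryValueEx; here
# we just show the comparison against the documented default data.
DEFAULT_BOOT_EXECUTE = ["autocheck autochk *"]

def extra_boot_execute_entries(value):
    """Return any BootExecute entries beyond the default autochk line."""
    return [entry for entry in value if entry not in DEFAULT_BOOT_EXECUTE]

# Hypothetical example: a third-party tool has inserted itself into
# the boot sequence ("somevendor_bootscan" is a made-up name).
current = ["autocheck autochk *", "somevendor_bootscan"]
print(extra_boot_execute_entries(current))  # ['somevendor_bootscan']
```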
Slow drivers are identified in the same way as I discussed for Kernel Initialization, except that they
show up in the remainder section of the summary XML file –
Again, updates for drivers causing delays should be obtained from the vendor.
Conclusion
Session Initialization experiences similar issues to Kernel Initialization. Slow drivers should be
identified and updated where possible.
Windows Enterprise Client Boot and Logon
Optimization – Part 13, Boot Phase – Winlogon
Some of the activities that occur during the Winlogon phase have a dependency on supporting
network and server infrastructure. I’ll write a post regarding the influence of your infrastructure at a
later time.
Boot Phase Winlogon – Phase Activity
Alternatively, the Regions of Interest graph (discussed in Post 11) shows this phase duration as Boot-
Winlogon-Phase:
Boot Phase Winlogon – Potential Issues
You can envision service start as follows (again, thank you Matthew Reynolds for this conceptualization) –
In addition, consider –
Synchronous processing waits for the network, which waits for auto-start services
Triggers for synchronous processing
o Always wait for the network at computer startup and logon policy setting
o Folder Redirection
o Roaming User Profiles
o Software Installation (via policy)
o Home Drives
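This isn’t how Winlogon literally makes the decision, but as a mental model, the triggers above can be sketched as a simple predicate (the field names are mine, not actual policy identifiers):

```python
# Conceptual model only - these dictionary keys are illustrative and
# are not real Group Policy setting identifiers.
def logon_is_synchronous(env):
    """True if any common trigger forces synchronous policy processing."""
    triggers = (
        env.get("always_wait_for_network", False),  # the policy setting
        env.get("folder_redirection", False),
        env.get("roaming_profiles", False),
        env.get("software_installation_policy", False),
        env.get("home_drives", False),
    )
    return any(triggers)

print(logon_is_synchronous({"folder_redirection": True}))  # True
print(logon_is_synchronous({}))                            # False
```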
After examining CPU and Disk utilization, the first avenue of investigation is to expand Boot-Winlogon-
Phase: in the Regions of Interest graph. By doing so, we see a breakdown of activities within
the Winlogon phase –
The example above shows that the User Profile has loaded quickly but that Group Policy is a
contributing factor, both during machine policy processing and during user policy processing. With this
information, I’ll want to check whether policy processing is in synchronous mode.
To do this, I’ll select and zoom to Boot-Winlogon-Phase:, open the Generic Events table, order columns as
follows and then expand the Microsoft-Windows-GroupPolicy provider –
Now that I know synchronous processing of Group Policy is taking place, I will also examine
the Services graph and table. Once again, column ordering is important –
Now that I’ve identified a slow-starting service, I would investigate whether it may be moved to delayed
auto-start and seek an update from the vendor.
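Moving a service to delayed auto-start can be done in the Services MMC or with sc.exe. A sketch that builds the command line (note the mandatory space after start= – sc.exe is fussy about it; the service name is a made-up example):

```python
def build_delayed_autostart_command(service_name):
    """Build the sc.exe command that moves a service to delayed
    auto-start: sc.exe config <name> start= delayed-auto
    (the space after 'start=' is required by sc.exe)."""
    return ["sc.exe", "config", service_name, "start=", "delayed-auto"]

print(" ".join(build_delayed_autostart_command("SlowVendorSvc")))
# sc.exe config SlowVendorSvc start= delayed-auto
```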
If I move back to the Generic Events table (using exactly the same layout as above), scrolling further down
reveals two rows. The first row represents the Group Policy Objects that have been applied, while the second
represents those that have been filtered. These rows are a little hard to find, but Field
1 contains useful information that shows up in a tooltip when you mouse over it.
Applied GPOs (machine) –
I see that C-GPO1 and C-GPO3 are not applied due to a WMI filter. WMI filtering can be a very expensive
operation, so my next avenue of investigation would be to examine those filters and attempt to target
those Group Policy Objects by another method, such as security group filtering.
Similarly, when I scroll down further still, the Generic Events table shows applied and filtered Group Policy
Objects for the user –
Lastly, after zooming to Boot-Winlogon-Phase:, I can use the Process Lifetimes graph and table to identify
processes that may have been launched by start-up or logon scripts –
Transient processes whose life ends within GPClient phases (computer or user) may provide clues regarding
script execution delaying the user experience. Consider script optimization or moving script functionality
to Group Policy Preferences (where possible).
Conclusion
Winlogon is a complex phase with a lot going on. Windows Performance Analyzer (WPA) provides useful
insight into the phase activities and makes it easier to isolate problems. As you’ll find, it takes some time
and experience to become familiar with the workflow I’ve described but with practice, analysis with these
tools is powerful and specific.