Windows Server Patching
Table of Contents
Audience
Downtime
Scheduling
Change Management
Compliance and Reporting
Additional Notes
Microsoft zero days/Out-of-band patches
Windows Fail-over Cluster patching
Credits
Applying security and cumulative updates in a Windows environment is essential to protect it from external attack. Updates also fix bugs identified in earlier versions and improve stability and performance.
I am writing this article because many administrators struggle to carry out patching in their own organization or in a customer environment. Most of the struggle comes not from technical challenges but from operational ones, such as server downtime, scheduling, and change management.
I am taking this opportunity to share the best practices I have followed while performing Windows patching. They may help you organize and structure the patching activity in your organization.
Audience
Windows Server administrators and anyone who carries out server patching with any of the patching tools available on the market.
Important
This article covers operating system patching in a Windows Server environment, for example Windows Server 2008, 2012, and 2016. It does not cover Microsoft or third-party applications installed on the servers, such as Microsoft Exchange, SharePoint, or SQL Server. For application patching, an application specialist needs to carry out separate testing before deployment to the production environment.
Downtime
Downtime is the key factor in the entire patching activity. Many of us have trouble getting downtime from the business. Every organization handles it differently, based on business approval. A few methods that can be used are listed below.
Following one of these methods can reduce the effort spent on getting downtime approved by the business.
1. Follow the standard maintenance window. Some organizations already have a standard maintenance window for each environment, and the same window can be used for patching. Example: the standard maintenance window for the DEV and UAT environments is on weekdays after business hours (10 PM to 3 AM), and for the production and DR environments it is on the weekend. Servers can be restarted within the standard maintenance window.
2. Or use a standard four-hour patching window that is approved only for patching activity. All business units should agree on this window. Example: second Friday (DEV), second Saturday (UAT), third Saturday (Production), and fourth Saturday (DR).
3. Or schedule by week, starting with the first week after the second Tuesday of the month: (Week 1) DEV, (Week 2) UAT, (Week 3) Production, (Week 4) DR. The DR week can fall in the first week of the following month.
Scheduling
As mentioned above, getting downtime is a challenge for every administrator. There are a few points to consider when preparing a schedule that lines up with the agreed downtime.
The scenario here is based on downtime method 3 from the Downtime section above.
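To make method 3 concrete, the following is a minimal sketch of how the weekly windows could be derived from the second Tuesday of a month. The function names are hypothetical, and the sketch assumes that Week 1 starts on the day after the second Tuesday and that each environment gets one full seven-day week. Adjust both assumptions to whatever your business units have agreed.

```python
from datetime import date, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def patch_schedule(year, month, environments=("DEV", "UAT", "Production", "DR")):
    """Map each environment to a (start, end) week after the second Tuesday.

    Assumption: Week 1 begins the day after the second Tuesday.
    """
    release_day = second_tuesday(year, month)
    schedule = {}
    for week_index, environment in enumerate(environments):
        start = release_day + timedelta(days=1 + 7 * week_index)
        end = start + timedelta(days=6)
        schedule[environment] = (start, end)
    return schedule


if __name__ == "__main__":
    for environment, (start, end) in patch_schedule(2024, 1).items():
        print(f"{environment}: {start} to {end}")
```

For January 2024 the second Tuesday is 9 January, so this sketch places DEV in 10 to 16 January and DR in 31 January to 6 February, which is the spill into the following month that method 3 allows.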
Change Management
Change management is another important factor in patching. It raises awareness of upcoming changes in the environment and also helps from an audit point of view. Every organization has its own defined process based on business needs. Using a standard change template is recommended, since patching is a mandatory activity performed every month. A standard template minimizes the change initiator's work of drafting the change description, change tasks, and so on.
Compliance and Reporting
It is very important to carry out a compliance check after patching is complete. Measuring the implemented work is always beneficial to the organization from a security audit point of view.
It is recommended to run the patching compliance check immediately after patching completes, inside the same downtime window. For example, if you have four hours of downtime, run the compliance scan in the second or third hour so that you can re-patch any missed servers within the same downtime under the approved change. If you do not check compliance within the same downtime window, you may need to request new downtime from the business and raise a separate change ticket.
If your compliance mechanism returns data only after 24 to 48 hours, patch the missed servers in an upcoming downtime window. Do not carry a backlog for long, because it lowers overall compliance at the end of the monthly cycle.
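As an illustration of the compliance check described above, here is a minimal sketch that compares the updates required this cycle with what each server reports as installed. It assumes you can export scan results from your patching tool into a simple mapping. The function name, server names, and KB identifiers are placeholders, not real data.

```python
def compliance_report(required_kbs, installed_by_server):
    """Return (percent_compliant, missing) for one patch cycle.

    required_kbs: iterable of update IDs that must be installed.
    installed_by_server: dict of server name -> iterable of installed update IDs.
    missing: dict of non-compliant server -> sorted list of missing update IDs.
    """
    required = set(required_kbs)
    missing = {}
    for server, installed in installed_by_server.items():
        gap = required - set(installed)
        if gap:
            missing[server] = sorted(gap)

    total = len(installed_by_server)
    compliant = total - len(missing)
    # With no servers scanned there is nothing to report as compliant.
    percent = 100.0 * compliant / total if total else 0.0
    return percent, missing


if __name__ == "__main__":
    # Placeholder data standing in for an export from the patching tool.
    scan = {
        "srv-dev-01": ["KB0000001", "KB0000002"],
        "srv-dev-02": ["KB0000001"],
    }
    percent, missing = compliance_report(["KB0000001", "KB0000002"], scan)
    print(f"{percent:.0f}% compliant; re-patch within the window: {missing}")
```

Running a check like this in the second or third hour of the window gives you the list of servers to re-patch while the approved change is still open.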
Additional Notes
1. Make sure you perform a daily health check of the patching tool agent (the agent depends on the patching tool you use, for example Microsoft SCCM or HPSA). All agents should report as healthy, and any agent that is not healthy should be remediated immediately. An unhealthy agent may fail to patch the server, which will reduce patching compliance.
2. If any issue is encountered with an application during DEV/UAT testing, exclude the production and DR servers from patching until the issue is fixed on DEV/UAT.
3. If patches are uninstalled because of a reported issue, make sure the application team consults the application vendor for a solution and for compatibility, because servers cannot be left unpatched for long.
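Notes 1 and 2 above can be combined into a simple pre-deployment gate. The sketch below is only an illustration: the record layout (`name`, `environment`, `agent_healthy`) and the function name are assumptions, and real agent health data would come from whichever patching tool you use.

```python
# Environments that are held back while a DEV/UAT issue is still open.
HIGHER_ENVIRONMENTS = {"Production", "DR"}


def plan_deployment(servers, lower_env_issue_open):
    """Split servers into (to_patch, to_remediate, held_back) name lists.

    servers: list of dicts with keys "name", "environment", "agent_healthy".
    lower_env_issue_open: True while an issue found in DEV/UAT is unresolved.
    """
    to_patch, to_remediate, held_back = [], [], []
    for server in servers:
        if not server["agent_healthy"]:
            # An unhealthy agent is likely to fail the patch run; fix it first.
            to_remediate.append(server["name"])
        elif lower_env_issue_open and server["environment"] in HIGHER_ENVIRONMENTS:
            # Do not push patches to production or DR until DEV/UAT is fixed.
            held_back.append(server["name"])
        else:
            to_patch.append(server["name"])
    return to_patch, to_remediate, held_back
```

Keeping the remediation list separate from the held-back list makes it clear which servers need agent work and which are only waiting on the DEV/UAT fix.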
Microsoft zero days/Out-of-band patches
If the security team confirms that the patches must be deployed within the next 48 hours, define the scope by identifying the servers that run the software or product affected by the vulnerability. For example, if the vulnerability is in Internet Explorer 9, identify how many servers in the environment run IE9.
This data can be fetched from the compliance tool used in your environment. If you use Microsoft SCCM, you can create a custom report with a custom query to fetch it. If you have no such tool, use a script. The last option is to collect the information manually, but that is tedious when you have many servers.
Assume that, after the assessment, 100 of your 4000 servers run IE9. In this case, plan to patch these 100 servers as a priority. Since the timeline is short, you may need to contact the server owner or application owner to get explicit approval for a server reboot. After approval, the servers can be patched and rebooted after business hours to minimize business impact. If the standard change process cannot meet the change management requirement in time, you may need to raise an emergency change request.
The rest of the servers, apart from these 100, can be patched as per your standard patching schedule.
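If you take the scripting route for scoping, the filtering step could look like the minimal sketch below. It assumes you already have a software inventory exported as a mapping of server name to (product, version) pairs. The function name, the inventory layout, and the sample entries are hypothetical, and how the inventory is collected depends on your tooling.

```python
def servers_in_scope(inventory, product, version_prefix):
    """Return a sorted list of servers that run the affected product version.

    inventory: dict of server name -> list of (product_name, version) tuples.
    product: exact product name to match.
    version_prefix: version string prefix that marks the affected release.
    """
    in_scope = []
    for server, installed in inventory.items():
        for name, version in installed:
            if name == product and version.startswith(version_prefix):
                in_scope.append(server)
                break  # One match is enough to put the server in scope.
    return sorted(in_scope)


if __name__ == "__main__":
    # Placeholder inventory; real data would come from your compliance tool.
    inventory = {
        "srv-001": [("Internet Explorer", "9.0")],
        "srv-002": [("Internet Explorer", "11.0")],
    }
    priority = servers_in_scope(inventory, "Internet Explorer", "9.")
    print(f"{len(priority)} of {len(inventory)} servers need priority patching")
```

The returned list is the priority group for the emergency or expedited change, and everything else stays on the standard schedule.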
Sometimes the installed antivirus software can mitigate the vulnerability. In this situation, make the decision together with the security team. As long as the installed antivirus is protecting your environment, you can patch the servers in the regular patching schedule. Make sure you have confirmation from the antivirus vendor about the security coverage.
Windows Fail-over Cluster patching
Consider a two-node Windows failover cluster running the File Server role.
Microsoft recommends running all cluster nodes at the same patch level.
Patching and restarts can be automated, provided the pre-work of moving cluster resources is taken care of before the scheduled patch deployment.
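The ordering of that pre-work can be sketched as a plan generator. The code below does not talk to a cluster. It only produces the sequence of steps for patching one node at a time, under the assumption that passive nodes are patched first and roles are moved off the active node just before its turn. The function and node names are hypothetical.

```python
def rolling_patch_plan(nodes, role_owner):
    """Return an ordered list of steps that patches one cluster node at a time.

    nodes: list of node names in the cluster (at least two).
    role_owner: the node that currently owns the clustered roles.
    """
    if len(nodes) < 2 or role_owner not in nodes:
        raise ValueError("need at least two nodes, one of which owns the roles")

    # Patch nodes that hold no roles first, and the current owner last.
    order = [node for node in nodes if node != role_owner] + [role_owner]
    current_owner = role_owner
    steps = []
    for node in order:
        if node == current_owner:
            # Pre-work: move roles away before this node is patched and restarted.
            target = next(other for other in nodes if other != node)
            steps.append(f"Move clustered roles from {node} to {target}")
            current_owner = target
        steps.append(f"Install updates on {node}")
        steps.append(f"Restart {node} and wait for it to rejoin the cluster")
    steps.append("Confirm all nodes report the same patch level")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(rolling_patch_plan(["NODE1", "NODE2"], "NODE1"), 1):
        print(f"{number}. {step}")
```

For the two-node file server example, the plan patches the passive node first, then moves the File Server role onto it before the second node is patched, so the role stays online while both nodes reach the same patch level.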
I hope this article is helpful to you. Your comments and feedback are important.
Important
Consult your senior staff, technical lead, or technical manager before following any of these approaches. The best practices above are based on my own experience.