ClearPass 6.x Tech Note: ClearPass Clustering Design Guidelines














Version Date Modified By Comments

0.1 2013 David Wilson Early Draft Version

0.2 August 2014 Danny Jump Published Draft (#10119)

1.0 November 2014 Danny Jump Published Version 1

1.1 November 2015 Danny Jump Minor updates throughout the document.

1.2 April 2017 Danny Jump Minor update.



Table of Contents

Table of Contents ............................................................................................................................................ 2

Table of Figures ............................................................................................................................................... 5

Copyright ............................................................................................................................................................ 6

Open Source Code ........................................................................................................................................... 6

Introduction ................................................................................................................................................................ 7

Audience .................................................................................................................................................................. 7

Notes on this Version of this Document ..................................................................................................... 7

V1 – October 2014 .......................................................................................................................................... 7

Clustering Overview ................................................................................................................................................ 8

WAN Considerations L2/L3 ....................................................................................................................... 8

Campus Considerations L2/L3 .................................................................................................................. 8

ClearPass Databases ........................................................................................................................................... 8

Publisher/Subscriber Model ........................................................................................................................... 9

What Is Replicated? ..................................................................................................................................... 10

What Is A Large-Scale Deployment? ......................................................................................................... 10

Clustering Example 1.................................................................................................................................. 11

Clustering Example 2.................................................................................................................................. 12

Network Traffic Flows .................................................................................................................................... 13

Cluster-wide replication ................................................................................................................................ 13

Handling Authentication Requests ............................................................................................................ 13

Optimizing Authentication processing for a MSFT AD domain .................................................... 14

Internal API For Dynamic Content Creation (Guest/Onboard) .................................................... 15

Onboard Certificates And OCSP .................................................................................................................. 16


OCSP Recommendations ........................................................................................................................... 18

Load Balancing ................................................................................................................................................... 18

Auto Backup Collector aka “Data Puller” ................................................................................................ 19

Linux Installation ......................................................................................................................................... 20

Windows Installation ................................................................................................................................. 21

Update to Data Puller Feature in CPPM 6.5 [Push rather than Pull mode] ......................... 21

Failover Modes ........................................................................................................................................................ 22

Publisher Down ................................................................................................................................................. 22

Guest/Onboard ............................................................................................................................................. 22

The Standby-Publisher .............................................................................................................................. 22

Publisher Failover - L2 or L3? ................................................................................................................ 23

How the Failover Process works ........................................................................................................... 24

What do you lose when the Publisher fails? ..................................................................................... 25

Subscriber Down ............................................................................................................................................... 25

Design Guidelines .................................................................................................................................................. 26

Allow HTTP/S Between Publisher and Subscribers .......................................................................... 26

Allow Database & ‘other’ Traffic Between PUB and SUB’s .............................................................. 26

Size The Publisher Node Appropriately .................................................................................................. 26

Provide Sufficient Bandwidth Between Publisher/Subscribers .................................................. 27

Bandwidth Usage/Sizing for a CPPM Cluster ................................................................................... 27

Volumetrics of Cluster in an idle state ................................................................................................ 28

RADIUS RTT Considerations ............................................................................................................................. 30

ClearPass Cluster Bandwidth Consumption .............................................................................................. 33

Guest .................................................................................................................................................................. 33

Insight ............................................................................................................................................................... 33


Use Zones for Geographical Regions ........................................................................................................ 34

Use Nearest Subscriber Node ...................................................................................................................... 35

Use Subscriber Nodes As Workers ............................................................................................................ 35

Use Dedicated Insight Node ......................................................................................................................... 35

Insight Setup ....................................................................................................................................................... 36

Insight Resilience ......................................................................................................................................... 37

Cluster Wide Parameters config settings .......................................................................................... 38

High Capacity Guest .............................................................................................................................................. 39

Enabling HCG ................................................................................................................................................. 39

Supported number of Users in HCG ..................................................................................................... 39

HCG Cluster ..................................................................................................................................................... 39

HCG - Other related information (licensing/disabled features) .............................................. 40

Cluster Operations ................................................................................................................................................ 41

Making a node a Subscriber from the GUI ......................................................................................... 41

Timings to add a CPPM Node to a cluster – Timings .................................................................... 42

Making a node a Subscriber from the CLI ......................................................................................... 42

Cluster Administration ................................................................................................................................... 45

Cluster Upgrades ............................................................................................................................................... 47

Cluster Upgrade Tool .................................................................................................................................. 49

Scaling Limitations ................................................................................................................................................ 50

Virtual IP Considerations ................................................................................................................................... 51


Table of Figures
Figure 1 - Subscriber GUI 'read-only' banner message warning .............................................................. 9
Figure 2 - Node specific configuration sections .............................................................................................. 10
Figure 3 - Cluster Example 1 - Picture ................................................................................................................ 11
Figure 4 - Clustering Example 2 - picture ......................................................................................................... 12
Figure 5 - Subscriber 'Read Only Access' when changing a guest password ..................................... 15
Figure 6 - OCSP Recommendations Summary ................................................................................................ 18
Figure 7 - Setting OCSP Authentication Method ............................................................................................ 18
Figure 8 - Autobackup Options .............................................................................................................................. 19
Figure 9 - List of auto-backup files ...................................................................................................................... 19
Figure 10 - Setting up the Standby Publisher .................................................................................................. 23
Figure 11 - Configuring Standby over a L3 connection - WARNING ..................................................... 23
Figure 12 – Total Data in bytes between Publisher and Subscriber in 24-hour idle period ....... 28
Figure 13 - Publisher traffic to Subscriber over 24-hours ......................................................................... 28
Figure 14 – Total Data in Bytes between Subscriber and Publisher in 24-hour idle period ...... 29
Figure 15 - Subscriber traffic to Publisher over 24-hours ......................................................................... 29
Figure 16 - RADIUS RTT Testing from NAD to CPPM (EAP-PEAP Win7) ............................................ 31
Figure 17 - RADIUS RTT Testing from NAD to CPPM (EAP-PEAP Win8.1) ........................................ 32
Figure 18 - Enabling Insight on a CPPM node ................................................................................................ 36
Figure 19 - Insight resiliency across cluster nodes ....................................................................................... 37
Figure 20 - Enabling Insight on multiple nodes and defining the Backup source order .............. 38
Figure 21 - Enabling HCG mode ............................................................................................................................ 39
Figure 22 – Warning message when enabling HCG mode. ........................................................................ 40
Figure 23 - Make Subscriber on GUI .................................................................................................................... 41
Figure 24 - Sync in Progress for new cluster node ........................................................................................ 42
Figure 25 - Event Viewer success message after new cluster node added .......................................... 42
Figure 26 - Setting up cluster from CLI .............................................................................................................. 44
Figure 27 - Checking on cluster progress from CLI ....................................................................................... 44
Figure 28 - Dropping Subscriber from Publisher ........................................................................................... 45
Figure 29 - Drop Subscriber confirmation message ..................................................................................... 45
Figure 30 - Drop Subscriber confirmation options ....................................................................................... 46
Figure 31 - Resetting a CPPM node configuration ........................................................................................ 46
Figure 32 - CLI messages for adding a node .................................................................................................... 48
Figure 33 - Confirmation that an upgrade is completed ............................................................................ 48


Copyright
© 2014 Aruba Networks, Inc. Aruba Networks’ trademarks include Aruba Networks®,
Aruba The Mobile Edge Company® (stylized), Aruba Mobility-Defined Networks™, Aruba
Mobility Management System®, People Move Networks Must Follow®, Mobile Edge
Architecture®, RFProtect®, Green Island®, ETips®, ClientMatch™, Virtual Intranet
Access™, ClearPass Access Management Systems™, Aruba Instant™, ArubaOS™,
xSec™, ServiceEdge™, Aruba ClearPass Access Management System™, Airmesh™,
AirWave™, Aruba Central™, and “ARUBA@WORK”™. All rights reserved. All other
trademarks are the property of their respective owners.

Open Source Code


Certain Aruba products include Open Source software code developed by third parties,
including software code subject to the GNU General Public License (GPL), GNU Lesser
General Public License (LGPL), or other Open Source Licenses. The Open Source code used
can be found at this site: http://www.arubanetworks.com/open_source


Introduction
This TechNote describes the design guidelines that are applicable to large-scale
deployments of the ClearPass product.

The intent is to provide documentation about what can and cannot be done with the
publisher/subscriber clustering model implemented in ClearPass. These constraints will
enable proposed designs to be checked for feasibility and compliance with recommended
practices. Where it is practical, best practices will be documented, although not every
conceivable use case or deployment can be covered here.


Note: Where you see a red chili, it signifies a ‘hot’, important point that should be taken
as a best-practice recommendation.

Audience
The reader is assumed to be familiar with the ClearPass family of products, including Policy
Manager, Insight, Guest and Onboard. Basic knowledge of IP networks and wide-area
networking is also assumed.

Notes on this Version of this Document


V1 – October 2014
This document has been released early to share with the field. It contains a wealth of
valuable information covering the design, deployment, and management of a clustered
CPPM network, including the important components, such as Insight, that need special
consideration. We are actively gathering additional related and relevant information
and plan to release an updated version of this document in the future.


Clustering Overview
This section discusses the initial design of a cluster.

CPPM can be deployed either as a dedicated hardware appliance or as a virtual machine
running on top of VMware ESXi. Appliances are available in 500, 5,000, and 25,000 endpoint
capacities; the exception is when ClearPass is deployed in High Capacity Guest mode, where
a node can support 1,000, 10,000, or 50,000 guests/day respectively. When demand exceeds
the capacity of a single instance, or when a High Availability deployment is required,
multiple instances can be logically joined together to process the workload from the
network. Physical and virtual instances can be joined, as can dissimilar-sized CPPM
instances; however, careful planning is required, especially if you plan to utilize the
failover capabilities within the clustering feature.

WAN Considerations L2/L3


Where a CPPM cluster is deployed and ‘typical’ WAN technologies separate the nodes, e.g.
MPLS with low speed (sub-10Mbps) and high latency (>50ms RTT), additional consideration
must be given to the deployment and discussed with the customer, as outlined and discussed
later in this document in Provide Sufficient Bandwidth Between Publisher/Subscribers.

Campus Considerations L2/L3


No specific consideration is typically required when clustering in a Campus/LAN
environment; the placement of CPPM nodes should typically be close to the user
population, but this is not critical. If the campus network connecting buildings and
facilities is based around a MAN/VPLS, there are no special considerations around
bandwidth/latency, and the main consideration is then only the CPPM configuration
and clustering for High Availability.

ClearPass Databases
A single ClearPass server makes use of several different databases:

The configuration database contains most of the editable entries that can be seen in the
GUI. This includes, but is not limited to:

• Administrative user accounts


• Local user accounts
• Service definitions
• Role definitions
• Enforcement policies and profiles
• Network access devices


• Guest accounts
• Onboard certificates
• Most of the configuration shown within Guest and Onboard

The log database contains activity logs generated by typical usage of the system. This
includes information shown in Access Tracker and the Event Viewer.

The Insight database records historical information generated by the Netevents


framework, and is used to generate reports.

Publisher/Subscriber Model
ClearPass uses a publisher/subscriber model to provide a multiple-box clustering
capability.

Another term for this model is “hub and spoke”, where the “hub” corresponds to the
publisher, and the “spokes” correspond to the subscribers.

The publisher node has full read/write access to the configuration database. All
configuration changes MUST be made on the publisher. The publisher sends configuration
changes to each subscriber.

Each subscriber maintains a local copy of the configuration database, with read-only
access to it. A background replication process handles the task of updating this local
copy based on the configuration changes received from the publisher.

Because the subscriber has read-only access, a message is displayed to an administrator
logging in to that server, indicating that only read-only access is available and that
they should log into the publisher for full access.


Figure 1 - Subscriber GUI 'read-only' banner message warning


What Is Replicated?
Multiple items exist within a CPPM node/cluster that must be shared to ensure successful
operation of the cluster. Only the configuration database is replicated. Note that the Log
and Insight databases are not replicated across the cluster.

However, certain items are node-specific and must be configured separately for each
node; this can be done directly on the Publisher or individually on the node. The
node-specific attributes can be summarized as the configuration under the highlighted
sections below.

Figure 2 - Node specific configuration sections

Note: Finally, three other items are node-specific and need to be individually configured:
Log Configuration, Local Shared Folders, and Server Certificates (RADIUS and HTTPS).

What Is A Large-Scale Deployment?


Large-scale deployments are defined as those that would require the publisher node to
be dedicated to servicing the subscriber nodes, i.e. the Publisher is not directly processing
authentication requests.

This is the case when the volume of configuration changes generated by all subscribers in
the cluster impacts the publisher node. This limits the publisher node’s capacity to handle
other tasks and implies that it must become a dedicated node.

Design Guidance: The dedicated Publisher should be a CP-HW-25K appliance or a CP-VM-


25K that matches the minimum spec for the VM. The VM specification can be found here.

Configuration changes that SHOULD be considered in the context of a large-scale


deployment include:

• Creating, modifying or deleting a guest account


• Issuing or revoking an Onboard certificate
• Modifying Policy Manager configuration (adding a network access device, defining a
new service, updating an enforcement profile, etc.)
• Adding new endpoints (including automatically created endpoints) in Policy Manager
• Modifications made to guest account or endpoint records with a Policy Manager post-
authentication profile


Note that not every clustering scenario is a large-scale deployment. CPPM clustering may
also be performed for other reasons, for example to distribute several CPPM nodes
geographically for policy reasons, or to have an off-site disaster recovery system.

Clustering Example 1
Authenticating corporate users with Guest access. A cluster of CP-HW-5K’s has two
nodes (US East Coast and US West Coast). US-West is the publisher, and US-East is the
subscriber. Each node handles the authentication traffic for 2,000 corporate endpoints.
Each node also registers 100 guests per day. There are few configuration updates in the
network.

This fictitious customer example would not be considered a large-scale deployment:

• The additional load on the publisher due to clustering can be estimated at 100 guest
accounts created per day.
• The authentication traffic on the subscriber node does not impose any additional load
on the publisher, and the new endpoints registered (in the order of 100 per day,
assuming new guests each day) also do not add any significant load.
• This workload on the publisher is small and represents a fraction of its capacity.

In this example, each node could be used as the backup for the other node. In the event of a
node failure, the other node could handle the authentication requirements of all 4,000
endpoints plus 200 guest registrations per day.


Figure 3 - Cluster Example 1 - Picture


Clustering Example 2
Authenticating conference center users. A cluster has three CP-HW-25K nodes in the
same time zone, located in San Jose (Publisher), San Diego (Subscriber) and Seattle
(Subscriber). Each node can register up to 15,000 guests per day, often in short bursts.
There is constant authentication traffic through the day from the onsite employees and
guests. On some days a node may be idle, but there are days when all nodes are busy.

This would be considered a large-scale deployment:

• In our example, the maximum potential load on the publisher due to the guest account
creation process can be estimated at 45,000 guest accounts created per hour (peak rate),
which equates to 12.5 account creations per second against a maximum of about 15 per second.
• This is a significant load on the publisher.

In this example, a separate dedicated publisher node would be recommended: a hardware

appliance Publisher (CP-HW-25K) could theoretically handle up to 54,000 guest accounts
created per hour (15 per second), but with bursts of guest traffic being unpredictable
during the ‘hot hour’, and with the corresponding replication of these accounts to each of
the subscriber nodes, we consider this an example of a deployment warranting a
dedicated Publisher.


Figure 4 - Clustering Example 2 - picture

So even though in theory the Publisher could process and create these guest accounts, this
amount of work in the hot hour is not really feasible in addition to any other background
network authentication/replication work the Publisher is expected to perform.
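
The arithmetic behind this sizing decision is simple enough to script. The sketch below (Python, illustrative only) reproduces the figures from Clustering Example 2 – three nodes, 15,000 guests per day each, assumed to land in a single ‘hot hour’ – against the roughly 15 account creations per second quoted above for a CP-HW-25K; the helper function and its name are ours, not part of ClearPass.

def peak_guest_creation_rate(nodes: int, guests_per_node_per_day: int) -> float:
    """Accounts/second the publisher must absorb if every node's daily guest
    load arrives within a single peak hour (a worst-case assumption)."""
    return (nodes * guests_per_node_per_day) / 3600

# Clustering Example 2: 3 x CP-HW-25K, 15,000 guests/day each, all in one hour.
rate = peak_guest_creation_rate(nodes=3, guests_per_node_per_day=15000)
print(f"Peak creation rate: {rate:.1f} accounts/sec")       # 12.5 accounts/sec

PUBLISHER_MAX_PER_SEC = 15   # ~54,000 accounts/hour quoted for a CP-HW-25K
print(f"Publisher headroom: {PUBLISHER_MAX_PER_SEC - rate:.1f} accounts/sec")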


Network Traffic Flows


The table below lists the network ports that must be opened between the Publisher and the Subscribers.

Protocol   Port    Notes

UDP        123     NTP – time synchronization
TCP        80      HTTP – internal proxy
TCP        443     HTTPS – internal proxy and node-to-node communications
TCP        4231    NetWatch Post Auth module (this port is no longer in use after 6.5)
TCP        5432    PostgreSQL – database replication

All protocol/port combinations listed above should be bidirectional and should be open
between any two nodes in the cluster. The reason for this is that any subscriber node can
be promoted to the publisher node, which implies a fully connected network is necessary.

To see the complete list of ports required across a CPPM cluster to ensure all processes
beyond just the clustering process work correctly please review the document here.
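
As a quick sanity check of the rules above, the TCP ports can be probed from one node's management network toward its peers before the cluster is built. The Python sketch below is illustrative only: the host names are placeholders, and UDP/123 (NTP) cannot be verified with a simple TCP connect.

import socket

# Placeholder node addresses – replace with the real publisher/subscriber IPs.
NODES = ["cppm-pub.example.com", "cppm-sub1.example.com"]
TCP_PORTS = [80, 443, 5432]    # add 4231 for releases prior to 6.5

def tcp_port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in NODES:
    for port in TCP_PORTS:
        state = "open" if tcp_port_open(host, port) else "BLOCKED"
        print(f"{host}:{port} -> {state}")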

Cluster-wide replication
Beyond the data that is replicated by the Multi-Master Cache (which is actually zone
specific), data in the configuration database is replicated cluster-wide. Data that is NOT
replicated includes the items listed below. Note that Zones are discussed later in this document.

• Access Tracker Logs


• Session Log
• Accounting Data
• Event Viewer Data
• System Monitor

Handling Authentication Requests


The typical use case for Policy Manager is to process authentication requests using the
policy framework. The policy framework is a selection of services that work together to
determine, among other things, the authentication, authorization, posture, enforcement,
and role of the endpoint/end-user.

In this use case, authentication typically involves a read-only operation as far as the
configuration database is concerned: a cluster node receives an authentication request,
determines the appropriate policies to apply, and responds appropriately. This does not
require a configuration change, and can therefore be scaled across the entire cluster.


Note: Authentication is performed from the node itself against the configured identity store,
whether local (as synced from the Publisher, e.g. a Guest account) or external, such as Microsoft AD.

Logs relevant to each authentication request are recorded separately on each node, using
that node’s log database. Centralized reporting is handled by generating a Netevent from
the node, which is sent to all Insight nodes and recorded in the Insight database.

Optimizing Authentication processing for a MSFT AD domain


When attaching a CPPM node to an Active Directory (AD) domain (note that each CPPM
node must be separately attached/enrolled), the domain controller it joins is the node we
send the authentication request to. In CPPM 6.3 we added logic to control where CPPM sends
the authentication request when the primary node you initially connect to fails. This is
achieved via the configuration of AD Password Servers. If NO Password Servers are
configured, then where the authentication requests are sent after the primary node fails
is indeterminate.

To better understand which servers in the network could be used to process these requests,
look at the nslookup example below. It shows the servers in the network that can process
the CPPM AD authentication requests. Knowing this, you can have a discussion with the
customer about where these servers are located and whether or not you want to add a
deterministic ordering of which servers are used first.

danny-jump:Downloads djump$ nslookup
> set type=srv
> _ldap._tcp.dc._msdcs.arubanetworks.com
;; Truncated, retrying in TCP mode.
Server: 10.1.10.10
Address: 10.1.10.10#53
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 hqdc03.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 blr-dc-1.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 sjc-dc-05.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 sjc-dc-09.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 dcv1dc01.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 chn-dc-01.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 sjc-dc-10.arubanetworks.com.
_ldap._tcp.dc._msdcs.arubanetworks.com service = 0 100 389 hqdc04.arubanetworks.com..
_Etc. Etc. Etc. Etc…………

Making the processing deterministic can be achieved in the CPPM CLI with the following command:

ad passwd-server set -s <server 1> <server 2> <server 3> <etc>

To see a list of the currently configured servers:

ad passwd-server list -n <domain>

To load balance across DCs, different CPPM nodes in the cluster can be joined to different
domain controllers.
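
The SRV lookup shown above can also be scripted when preparing the ordered password-server list. The Python sketch below uses the third-party dnspython package and a placeholder domain; it simply enumerates candidate domain controllers to feed into the ad passwd-server set command and is not part of ClearPass itself.

# Requires the third-party "dnspython" package: pip install dnspython
import dns.resolver

DOMAIN = "example.com"   # placeholder – use the customer's AD domain

# The same SRV record queried with nslookup above lists the domain controllers
# that can service LDAP/authentication requests for the domain.
answers = dns.resolver.resolve(f"_ldap._tcp.dc._msdcs.{DOMAIN}", "SRV")

for record in sorted(answers, key=lambda r: (r.priority, -r.weight)):
    # Each record carries the DC host name plus its port, priority and weight.
    print(f"{record.target.to_text().rstrip('.')}  port={record.port} "
          f"priority={record.priority} weight={record.weight}")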


Internal API for Dynamic Content Creation (Guest/Onboard)


Most deployments will make relatively few policy changes after the initial deployment is
complete. This is well suited to the publisher/subscriber model, as the policy configuration
is replicated to each subscriber in real time. However, interactive use of the system to
create guest accounts or provision devices with Onboard poses a different challenge: these
use cases require configuration changes to take effect (for example, resetting a guest
account password).

Because of the publisher/subscriber model, configuration changes can only be performed


on the publisher. However, in a complex deployment it may be necessary to direct guests
and BYOD enrollment requests to a subscriber node.

Note: For some functions, such as a sponsor creating a guest account, the sponsor MUST log
in to the publisher. The same goes for MACTrac – it must be done on the publisher.

As an example, below we tried to change the password for a guest user on a Subscriber.
Notice the ‘Read Only Access’ message, and that the ‘Update Account’ button is greyed out
and not available.


Figure 5 - Subscriber 'Read Only Access' when changing a guest password

So, putting this in the context of a CPPM High Availability cluster: if I want employees to
log in and create guest accounts (and I need that in a High Availability setup), I must set
up the standby publisher and, where appropriate, use a VIP to ensure that in the event of a
failure the VIP is always available on the clustered publisher (active or standby), so the
redirects from the controllers always go to an available IP address (the VIP).

In the scenario where the standby publisher is separated by a L3 WAN boundary, the use of
a VIP address between the active and standby publisher is not an option. We recommend the
VIP in an environment where the active/standby nodes are deployed within the same L2
broadcast network, to simplify the availability of the active Publisher’s reachable IP
address.


The process that has been implemented in Guest and Onboard utilizes an internal
communications channel between the nodes to process any necessary requests that involve
database modification. This works as follows:

1. Subscriber node receives a request for a database modification, or for an operation that
could potentially lead to a database modification (e.g. guest_register.php)
2. The request is processed and internally channeled to the current publisher node
3. Publisher receives the request and handles it (performs the database modification or
generates the appropriate dynamic content)
4. The response is returned to the subscriber node
5. Subscriber node returns the response to the client

With this solution, it appears as if the change is taking place on the subscriber (all URLs will
appear to be pointing at the subscriber), but the change takes place on the publisher.
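
The flow above can be illustrated with a short conceptual sketch. This is not the actual Guest/Onboard implementation – the URL, operation names and function are invented for the example – it simply shows the ‘serve reads locally, channel writes to the publisher’ pattern in Python.

import requests   # third-party HTTP client, used purely for illustration

PUBLISHER_URL = "https://cppm-pub.example.com"    # placeholder address

# Operations that modify the configuration database must run on the publisher.
WRITE_OPERATIONS = {"guest_register", "reset_password", "onboard_enroll"}

def handle_request(operation, payload):
    """Conceptual subscriber-side handler: reads stay local, writes are
    channeled to the publisher and the response is relayed to the client."""
    if operation in WRITE_OPERATIONS:
        # Steps 2/3: channel the request to the current publisher over HTTPS,
        # which performs the database modification and builds the response.
        response = requests.post(f"{PUBLISHER_URL}/internal/{operation}",
                                 json=payload, timeout=10)
        # Steps 4/5: relay the publisher's response; to the client the URL
        # still appears to point at the subscriber.
        return response.json()
    # Read-only operations are served from the subscriber's local replica.
    return {"operation": operation, "served_by": "local subscriber copy"}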

Onboard Certificates and OCSP


A device that is provisioned using Onboard will receive a client certificate that contains the
device’s credentials for accessing the network via EAP-TLS.

One use case supported in ClearPass is for an administrator to revoke a device’s client cert
and deny it access to the network. This is implemented with the Online Certificate Status
Protocol (OCSP), which provides a real-time status check on a particular cert’s validity.

In a large publisher/subscriber deployment, consideration needs to be given to how these


OCSP checks should be handled, as there may be a significant number of authentications
that use a client certificate, and each authentication attempt will require a separate OCSP
status check.

The available OCSP options in Onboard are configured under Onboard » Certificate
Authorities (CPPM 6.3+); prior to CPPM 6.3 they were configured under Onboard » Initial
Setup » Certificate Authorities. The “Authority Info Access” option may be set to:

• Do not include OCSP Responder URL – default option; does not encode any OCSP URL
into the generated client certificate
• Include OCSP Responder URL – includes an OCSP URL in the client certificate, where
the URL is determined from the IP address of the issuing server (in the Onboard case
this will be the publisher)
• Specify an OCSP Responder URL – includes an OCSP URL in the client certificate, but
allows the URL to be specified manually

To avoid overloading the publisher with OCSP requests, the “Include OCSP Responder
URL” option must not be selected.


Note: The exception to this is when CPPM has been configured with more than one Onboard
CA. In that case “Include OCSP Responder URL” MUST be used, since each CA will have a
different OCSP URL and you cannot hard-code a single URL across the board. In this
scenario, our recommendation is to include the OCSP URL in the certificate and let the
EAP-TLS authentication method determine where to send the OCSP request.

Either of the remaining options can be selected:

• If you select “Do not include OCSP Responder URL”, then CPPM must be manually
configured with an appropriate OCSP URL.
o This may be done by modifying the EAP-TLS authentication method, setting “Verify
Certificate using OCSP” to “Required”, selecting the “Override OCSP URL from Client”
checkbox, and then providing a suitable OCSP URL.
o OCSP requests do not need to use HTTPS.
o The OCSP URL provided should be a local reference to the same Policy Manager
server, i.e. http://localhost/guest/mdps_ocsp.php/1
o This will ensure that OCSP requests are handled by the same Policy Manager server
that handles the client’s EAP-TLS authentication.

• If you select “Specify an OCSP Responder URL”, then a suitable URL can be included as
part of each client certificate, without changing the CPPM configuration. However, there
are certain requirements for this URL:

o Using the IP address of a specific Policy Manager server is not recommended, as
this IP will be embedded into each client certificate for the lifetime of that
certificate. Changing the IP address would then require reissuing (re-provisioning)
any device that has a certificate. If the server is not responding, OCSP checks will
also fail.
o Instead, the OCSP URL should use a DNS name that can be resolved from anywhere
in the cluster.
o The target of the DNS name should be a nearby Policy Manager server. All nodes
(publisher and subscribers) are able to respond to OCSP requests.
o Round-robin DNS can be used to load-balance OCSP requests in different regions.
o This approach is not recommended for two reasons: server information is
embedded into the client certificate (which is unnecessary), and this approach also
imposes additional DNS configuration requirements.
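
For troubleshooting, the OCSP responder on a Policy Manager node can be queried directly with a standard OCSP request. The Python sketch below uses the third-party cryptography and requests packages; the certificate file names are placeholders, the localhost responder URL is the one recommended above (use it from the CPPM node itself or substitute a reachable DNS name), and this is a generic OCSP check rather than a ClearPass-specific API.

# Requires the third-party "cryptography" and "requests" packages.
import requests
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.x509 import ocsp

# Placeholder files: the client certificate being checked and the Onboard CA
# certificate that issued it.
with open("client_cert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())
with open("onboard_ca.pem", "rb") as f:
    issuer = x509.load_pem_x509_certificate(f.read())

OCSP_URL = "http://localhost/guest/mdps_ocsp.php/1"

# Build a standard DER-encoded OCSP request for this certificate/issuer pair.
request = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1()).build()
der_request = request.public_bytes(serialization.Encoding.DER)

resp = requests.post(OCSP_URL, data=der_request,
                     headers={"Content-Type": "application/ocsp-request"},
                     timeout=10)
ocsp_response = ocsp.load_der_ocsp_response(resp.content)
print("Responder status:", ocsp_response.response_status)
if ocsp_response.response_status == ocsp.OCSPResponseStatus.SUCCESSFUL:
    print("Certificate status:", ocsp_response.certificate_status)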


OCSP Recommendations
The table below summarizes the recommended settings for OCSP in a publisher/subscriber
deployment:

Product          Setting                                           Value

Onboard          Provisioning Settings » Authority Info Access     Do not include OCSP Responder URL
Policy Manager   Configuration » Authentication » Methods »        Enable “Override OCSP URL from Client”;
                 EAP-TLS with OCSP Enabled                         provide the OCSP URL
                                                                   http://localhost/guest/mdps_ocsp.php/1

Figure 6 - OCSP Recommendations Summary











Figure 7 - Setting OCSP Authentication Method

Load Balancing
Considerations for using third-party load balancing, e.g. for HTTP(S) captive portal, RADIUS
auth/accounting have been well documented and are available in the CPPM + F5
Deployment TechNote. This along with other CPPM related TechNotes can be located here.


Auto Backup Collector aka “Data Puller”


When CPPM administrators make changes to the CPPM configuration, it is desirable and best
practice to take a copy of the running configuration, so that in the event of a failure a CPPM
node can be re-deployed, especially if it is the Publisher. One of CPPM’s system jobs runs
daily and produces an automated backup file. By default this backup saves the
configuration database, known as the tipsdb database. As an advanced option you can set
the backup setting to Config|SessionInfo as shown below; this then saves the configuration
data plus the Access Tracker records (the tipslogdb database) and the Insight database. To
select which files are added to this backup, go to Administration -> Server Manager ->
Server Configuration -> Cluster-Wide Parameters as shown below.


Figure 8 – Auto-backup Options

These Backup files can be extremely useful whether a customer has a single or multi-node
deployment. The auto-backup file can be used to restore a node to a known point. The
backup task runs at 01:10am each night.








Figure 9 - List of auto-backup files


These backup files are stored within the CPPM node and are never exported automatically.
CPPM tracks these files, and system cleanup jobs ensure they are purged to reduce storage
use. As of CPPM 6.4 there is no feature to allow them to be saved/exported to off-node
storage; in the 6.5 release we added this feature, as discussed below. However, we do
provide a tool “as is” that can be deployed on a Linux or Windows client to allow those
files to be extracted and, if necessary, utilized to restore a node configuration.

Download the supplied tool: it is available for Windows as a 32- or 64-bit application, and
for Linux we supply an RPM file. Download the appropriate files from our support site by
clicking here, then follow the installation instructions below for Linux or Windows.

We recommend that you use the tool to extract the daily backup from the Publisher. Note
that you will have to manually manage the disk space and rotation of these backup files on
the offline system.

Linux Installation
Install the 'rpm' file on a Linux system by issuing the following command:

rpm -Uvh <filename>

We have tested the installer with CentOS 5.3/5.4/6.x versions. The installer should be
compatible with any Linux distribution supporting RPM installations.

The configuration directory of the application is:

/usr/local/avenda/datapuller/etc

After the installation, edit the "datapuller.conf" file in the configuration directory to
provide the following details:

• CPPM Server IP Address (only one node in the cluster is required)
• Administration username (the UI username is typically 'admin')
• Administration password

If required, create a separate External Data Puller Service account with Super Administrator
privileges on the Publisher.

Restart the "avenda-ext-backup" service:

/sbin/service avenda-ext-backup restart

Once the service is up and running, it downloads the configuration backup files from the
CPPM nodes in the cluster and stores them in the following directory:

/var/avenda/datapuller/downloads/config-info/

This location can be altered by modifying the "datacollector.conf" file in the config directory.


Windows Installation
Extract the installer into a folder and run "setup.exe". Depending on the architecture of the
system, the application installs in either "C:/Program Files" or "C:/Program Files (x86)".

The configuration directory of the application is:

$INSTALL_ROOT/AvendaSystems/ExtDataPuller/etc

After the installation, edit the "datapuller.conf" file in the configuration directory to
provide the following details:

• CPPM Server IP Address (only one node in the cluster is required)
• Administration username (the UI username is typically 'admin')
• Administration password

If required, create a separate External Data Puller account with Super Administrator
privileges on the Publisher. Restart the "Avenda External Data Puller Client" service
within Windows Services.

Once the service is up and running, it downloads the configuration dumps from the CPPM
nodes in the cluster and stores them in the following directory:

C:/AvendaSystems/ExtDataPuller/var/downloads/config-info

This location can be altered by modifying the "datacollector.conf" file in the config directory.

Update to Data Puller Feature in CPPM 6.5 [Push rather than Pull mode]
With the release of CPPM 6.5 we added the ability to configure a backup destination
directly within the CPPM GUI. Go to Administration -> External Servers -> File Backup
Servers, where you can add SCP and SFTP destinations; as part of the nightly
housekeeping, CPPM will take a backup and save it securely to this remote destination.
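
Once a File Backup Server has been configured, it is worth verifying from the remote side that the nightly backups are actually arriving. The Python sketch below uses the third-party paramiko package against a placeholder SFTP destination and an assumed remote directory; it simply lists the most recent files and is not a ClearPass API.

# Requires the third-party "paramiko" package: pip install paramiko
import paramiko

HOST = "backups.example.com"        # placeholder SFTP destination
USERNAME = "cppm-backup"            # placeholder account
PASSWORD = "change-me"              # use key-based auth in production
BACKUP_DIR = "/srv/cppm-backups"    # assumed remote directory

transport = paramiko.Transport((HOST, 22))
transport.connect(username=USERNAME, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)

try:
    # Sort by modification time so the newest backup files are listed first.
    entries = sorted(sftp.listdir_attr(BACKUP_DIR),
                     key=lambda e: e.st_mtime, reverse=True)
    for entry in entries[:5]:
        print(f"{entry.filename}  {entry.st_size} bytes")
finally:
    sftp.close()
    transport.close()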


Failover Modes
What happens when something goes wrong in a publisher/subscriber deployment?

Publisher Down
Guest/Onboard
If the publisher goes down, then prior to changes introduced in CPPM 6.2 the internal proxy
request would fail and a “404 not found” error would be displayed for all Guest and Onboard
user-facing pages. This was not ideal. Starting in 6.2, CPPM moved to an API-based approach
for Guest/Onboard communication between the Subscriber and the Publisher; this change
allows the Subscriber nodes to handle failures between the SUB/PUB in a much more
friendly way.

The Standby-Publisher
Any subscriber within a cluster can be manually promoted to be the active Publisher once
the active Publisher has failed. Sometimes it is appropriate for this to be a manual
procedure, but during the time that a cluster does not have an active Publisher, some
functions across the cluster are unavailable, e.g. creation of Guest accounts; the full list
is documented later in this section under What do you lose when the Publisher fails?

Now, whilst some customers may be content with having to manually promote a
Subscriber, demand from the field and our customers required that we provide an
automated method allowing a specific node to auto-promote itself within the cluster,
thus ensuring that any service degradation is kept to an absolute minimum.

This feature was introduced in CPPM 6.1 to allow a designated Subscriber (the Standby
Publisher) to automatically promote itself to the role of active Publisher. Configuration
of the Standby Publisher is completed in the Cluster-Wide Parameters under Administration
-> Server Manager -> Server Configuration -> Cluster-Wide Parameters.

Note: Before you can designate a CPPM node as the Designated Standby Publisher, the nodes
have to be clustered. For more information covering the process of cluster operations, see
the section below on Cluster Operation Commands.

Ensure that ‘Enable Publisher Failover’ is set to TRUE, then in the ‘Designated Standby
Publisher’ drop-down select the CPPM node required to operate as the Standby node.

Note: The Standby Publisher can still perform full Subscriber duties. However, in a large
deployment, say when over 20 CPPM nodes are deployed, the Publisher and Standby
Publisher might be dedicated nodes, not performing ANY work beyond cluster
configuration, creating Guest accounts, and Onboarding users.

Note: The standby publisher cannot perform publisher functions until it completes its
promotion to that of the active publisher in the cluster.


Note: The default failover timer is set to 10 minutes; 5 minutes is the minimum value
you can select before the standby publisher begins to promote itself to an active state.













Figure 10 - Setting up the Standby Publisher

As can be seen above, we have selected node cppm182 to be the Standby Publisher. In this
test environment we have left the Failover Timer at its default of 10 minutes.

Note: When a subscriber is configured as a Standby Publisher, no additional traffic is sent
to this node compared to any of the other ‘normal’ Subscribers in the cluster.

Publisher Failover - L2 or L3?


When we initially introduced the Standby Publisher in CPPM 6.1, we enforced the rule that
the Standby and Active Publishers must be within the same IP subnet, i.e. the same L2
broadcast domain. For certain deployments it was possible to ‘overcome’ this limitation by
utilizing a GRE tunnel to provide VLAN extension, or by using some other L2 extension
technology such as VPLS to extend the L2 domain over a L3 WAN boundary. Starting with
CPPM 6.3, this restriction was relaxed. When you configure the Standby and Active
Publishers to be within separate IP subnets, you are presented with a warning message as
shown below.








Figure 11 - Configuring Standby over a L3 connection - WARNING


How the Failover Process works


The Standby Publisher health-checks the Primary every 60 seconds by making a SQL call to
the Primary Publisher’s database. If this fails, then after 10 (default) additional attempts
(one per minute) it begins the process of promoting itself to be the active Publisher.

Prior to CPPM 6.4.0, the node would ping (ICMP) its default gateway to see if the failure was
related to a network issue; if this ping failed, it would not promote itself to an active state.
If it succeeded, the node would then ping (ICMP) the remaining nodes in the cluster and
require that at least 50% of them respond, otherwise again it would not promote. This
logic tries to account for potential network-related issues. However, we found that at some
customers the default gateway was a firewall that would not respond to ICMP, and the
remote CPPM nodes were protected by firewall policy limiting ICMP over the WAN, so the
net result was that the Standby Publisher would never automatically promote.

Starting in CPPM 6.4.0, the fail-over logic was changed so that the process used to verify
the reachability of the remote CPPM nodes now uses an outbound HTTPS call; as noted in the
Network Traffic Flows section earlier, TCP/443 is already open between nodes and is a
fundamental requirement for ‘normal’ node-to-node communications. Utilizing this HTTPS
health check provides a more robust and predictable failover process.
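
The promotion logic described above can be summarized in a short sketch. The Python below only illustrates the documented behavior – 60-second SQL health checks, a default of 10 further failed attempts, then an HTTPS reachability check requiring at least 50% of the other nodes to respond – and the two check functions are placeholders, not the actual ClearPass implementation.

import time

CHECK_INTERVAL_SECONDS = 60   # standby checks the active publisher every minute
FAILOVER_ATTEMPTS = 10        # default failover timer of 10 minutes (1 check/minute)
QUORUM = 0.5                  # at least 50% of the other nodes must be reachable

def publisher_sql_alive() -> bool:
    """Placeholder: SQL health check against the active publisher's database."""
    raise NotImplementedError

def node_https_reachable(node: str) -> bool:
    """Placeholder: outbound HTTPS (TCP/443) check used from 6.4.0 onward."""
    raise NotImplementedError

def standby_publisher_loop(other_nodes):
    failures = 0
    while True:
        if publisher_sql_alive():
            failures = 0
        else:
            failures += 1
            if failures > FAILOVER_ATTEMPTS:
                # Guard against a local network problem: only promote if enough
                # of the remaining cluster nodes are still reachable over HTTPS.
                reachable = sum(node_https_reachable(n) for n in other_nodes)
                if other_nodes and reachable / len(other_nodes) >= QUORUM:
                    print("Promoting standby to active publisher")
                    return
        time.sleep(CHECK_INTERVAL_SECONDS)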

Mitigation strategies for this failure mode:

Ensure that nodes are being monitored – determine if a publisher node is no longer
reachable/providing service, e.g. via SNMP host checking or similar. When a failure is
detected, another subscriber node should be promoted either manually or via the
automated standby-publisher feature to be the active-publisher; other subscribers will
then automatically update and replicate their configuration with the new publisher,
which will resolve the issue.

Use a virtual IP for the publisher – reduces the potential for a prolonged service
outage for some functions during the time the active publisher is down or the standby is promoting.

Use the subscriber auto-promotion capability – reduces the potential impact of a failure,
but note that the VIP fails over significantly faster (i.e. 1 second) than a CPPM Standby
Publisher can promote itself (i.e. 8-9 minutes).

Set up your NADs to point to a primary node, backup node, tertiary node, etc. This only
covers you for RADIUS authentication/accounting traffic. Until the standby Publisher has
transitioned into an active state, the features detailed below will not be available.

Note: It is presumed, and good practice, that when you have a standby publisher and also
deploy a Virtual IP, the standby publisher will be ‘paired’ with the active publisher in the
VIP group.


What do you lose when the Publisher fails?


• General CPPM & CPG Configuration changes
• Guest Account creation
• Certificate Revocation List Updates
• Onboarding, Certificate creation and revocation
• AirGroup / MACTrac enrollment
• MDM endpoint Polling and ingestion
• ClearPass Exchange Outbound enforcement

Subscriber Down
If a subscriber node goes down, authentication requests, guest access, and Onboard access
will fail to this node, probably with a timeout error displayed to the client.

Mitigation strategies for this failure mode:

Ensure that nodes are being monitored – determine if a subscriber node is no longer
reachable/providing service, e.g. via SNMP host checking or similar. When a failure is
detected, another subscriber node can be used in its place

Use a virtual IP for the subscriber – reduces the potential for a prolonged service
outage during use. For this to work, all places that reference the subscriber must use its
virtual IP address, e.g. captive portal redirection, authentication server configuration,
guest registration URLs, sponsor confirmation emails, etc.

Set up your NADs to point to a primary node, backup node, tertiary node, etc. This only
covers you for RADIUS authentication/accounting traffic.

Note: Another possible option/recommendation is to use load balancing; please review the
CPPM & F5 Load-Balancing TechNote for additional guidance.


Design Guidelines
A ClearPass deployment using the publisher/subscriber model must satisfy the constraints
described in this section.

Allow HTTP/S Between Publisher and Subscribers


Ensure that any firewalls that are between publisher and subscribers are configured to
permit HTTPS traffic (and HTTP if required), in both directions. Refer to the “Network
Traffic Flows” section above for a list of all protocols and port numbers that must be open.

Allow Database & ‘other’ Traffic Between PUB and SUB’s


Replication and cluster management requires that each node must be able to reach every
other node on the HTTPS and database port (TCP 5432) on the management interface.

Design Guidance: Ensure that any firewalls that are between publisher and subscribers
are configured to permit TCP/5432 traffic, in both directions. Refer to the “Network Traffic
Flows” section above for a list of all protocols and port numbers.

Size The Publisher Node Appropriately


The publisher node should be sized appropriately, as it needs to handle database writes
from all subscribers simultaneously. It must also be capable of handling the number of
endpoints within the cluster, and of processing the remote work directed to it when Guest
account creation and Onboarding are occurring across the cluster.


If a customer has any concerns about their environment, specifically relating to heavy
workload on their Publisher/Subscribers, then they should consider only the deployment of
an appliance-based CPPM cluster.

Design Guidance: In a worldwide large-scale deployment, not all subscriber nodes will be
equally busy. If the traffic pattern (busy hours) can be estimated for each subscriber node,
these can be added together after adjusting for time zone differences to determine the
maximum request rate that must be handled by the publisher node.


Provide Sufficient Bandwidth Between Publisher/Subscribers


The traffic flows between the publisher and subscriber include:

• Basic monitoring of the cluster – is trivial traffic.


• Time synchronization for clustering – standard NTP traffic
• Policy Manager configuration changes – assumed to be infrequent and therefore
not a significant consumer of bandwidth
• Battery multi-master cache – depends on the authentication load and other
details of the deployment; cached information is metadata and is not expected to
be very large; only replicated within the Policy Manager Zone
• Guest/Onboard dynamic content proxy requests – this is a web page, essentially,
and could be reasonably expected to average 100KB
• Guest/Onboard configuration changes – changes to database configuration, sent
as deltas and are reasonably small (in the order of 10KB)

Design Guidance: In a large-scale deployment, reduced bandwidth or high latency

(>200ms) on the link will result in a lower quality user experience (due to the
bandwidth-delay product on the TCP data path, 200ms equates to roughly 2.6Mbps of
throughput based upon a 64KB window) for all users of that subscriber, even though static
content will be delivered locally and will appear near-instantaneous. For reliable operation
of each subscriber, ensure that there is sufficient bandwidth available for communications
with the publisher. For basic authentication we do not necessarily have a requirement for
high bandwidth, BUT the number of round trips to complete an EAP authentication (which
may be in excess of 10) can add up to an unpopular amount of time and delay for the
end-user.
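
As a worked example of the window-limited throughput figure quoted above, the calculation is simply the TCP window size divided by the round-trip time; the short Python snippet below reproduces it and can be reused with other RTT values.

def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """TCP throughput ceiling for a given receive window and round-trip time."""
    return (window_bytes * 8) / rtt_seconds / 1_000_000

# 64KB window at 200ms RTT – matches the ~2.6Mbps figure quoted above.
print(window_limited_throughput_mbps(64 * 1024, 0.200))   # ≈ 2.62 Mbps

# The same window at 50ms RTT (the emulated WAN latency used in the lab below).
print(window_limited_throughput_mbps(64 * 1024, 0.050))   # ≈ 10.5 Mbps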

Bandwidth Usage/Sizing for a CPPM Cluster


To understand the bandwidth usage between nodes, we undertook a study investigating
several load scenarios. For example, we wanted to understand how much traffic would be
generated across the cluster if a node received, say, 100 authentications/sec (either
MSCHAPv2 or EAP-TLS, these being the most popular), and, as another example, how much
cluster traffic would be generated if we created 5 Guest accounts per second.

Replication between nodes in a cluster is carried on three ports, tcp-80, tcp-443 and tcp-
5432. Starting in CPPM 6.5.0 we will expose some new counter with in the Graphite
reporting tool to allow this cluster traffic to be displayed and monitored.

To understand the load on a network, we first wanted to record the baseline replication
between nodes. Using the CPPM 6.4.0.66263 release we created a four-node cluster. Three
of the nodes are within the same IP subnet, whilst the fourth sits behind a 10Mb emulated
WAN with 50ms RTT. The CPPM environment has just the basic default configuration.

We recorded, via the Graphite tool, the data transmitted in a 24-hour period to establish
a bandwidth baseline. (We added the 6.5.0 code to the 6.4.0 build to facilitate this
graphing.) In the graphs below, node cppm155 is the Publisher and cppm156, 157 and 158
are the Subscribers.


Volumetrics of Cluster in an idle state


PUBLISHER -> SUBSCRIBER: The data volumes are shown in the graph below; the raw details
are as follows. We noted the same volumes from the Publisher to each of the three
Subscribers in the cluster.


Figure 12 – Total Data in bytes between Publisher and Subscriber in 24-hour idle period

This turns out to be 145MB of traffic on port 443 and 47MB of traffic on port 5432, for a
total of 192MB, at an average rate of 2,633 Bytes/second (0.0026MB/second).

You can access these statistics, to record the inter-cluster CPPM traffic on the nodes,
in Graphite at https://IP_Address/graphite, then navigate to Graphite -> basic_perf ->
[ZONE] -> [Choose the Publisher] -> nw(5432) or http(80) or https(443).


Figure 13 - Publisher traffic to Subscriber over 24-hours


SUBSCRIBER -> PUBLISHER: The data volumes are shown in the graph below; the raw details
are as follows. We noted the same volumes from all three Subscribers in the cluster to
the Publisher.


Figure 14 – Total Data in Bytes between Subscriber and Publisher in 24-hour idle period

This turns out to be 83MB of traffic on port 443 and 47MB of traffic on port 5432, for a
total of 130MB, at an average rate of 1,580 Bytes/second (0.0016MB/second).

You can access these statistics, to record the inter-cluster CPPM traffic on the nodes,
in Graphite at https://IP_Address/graphite, then navigate to Graphite -> basic_perf ->
[ZONE] -> [Choose a Subscriber] -> nw(5432) or http(80) or https(443).


Figure 15 - Subscriber traffic to Publisher over 24-hours

Note: All nodes are in the same (default) Zone for the above metrics.


RADIUS RTT Considerations


Special consideration must also be given to the RTT between the NAD/NAS and the
authenticating ClearPass node. Below we provide the results from testing we undertook to
determine the point where the RTT becomes a significant contributor to authentication
failure. The tests below were performed on CPPM 6.4, with 10 tests for each sample to
ensure a representative set of results.

Client OS: Windows 7; Authentication Protocol: EAP-PEAP / EAP-MSCHAPv2

Round Trip Time / Iteration / Test Result / Request Process Time

Round Trip Time 600 ms:
Test 1 – PASS – 10 sec
Test 2 – PASS – 6 sec
Test 3 – PASS – 6 sec
Test 4 – PASS – 8 sec
Test 5 – PASS – 6 sec
Test 6 – PASS – 8 sec
Test 7 – PASS – 7 sec
Test 8 – PASS – 7 sec
Test 9 – PASS – 7 sec
Test 10 – PASS – 6 sec

Round Trip Time 1000 ms:
Test 1 – PASS – 11 sec
Test 2 – FAIL – TIMEOUT
Test 3 – PASS – 10 sec
Test 4 – PASS – 11 sec
Test 5 – PASS – 11 sec
Test 6 – PASS – 10 sec
Test 7 – PASS – 11 sec
Test 8 – PASS – 10 sec
Test 9 – PASS – 10 sec
Test 10 – PASS – 11 sec

Round Trip Time 1500 ms:
Test 1 – FAIL – TIMEOUT
Test 2 – PASS – 16 sec
Test 3 – FAIL – TIMEOUT
Test 4 – PASS – 15 sec
Test 5 – FAIL – TIMEOUT
Test 6 – FAIL – TIMEOUT
Test 7 – PASS – 16 sec
Test 8 – FAIL – TIMEOUT
Test 9 – FAIL – TIMEOUT
Test 10 – PASS – 15 sec


Round Trip Time 2000 ms:
Test 1 – FAIL – TIMEOUT
Test 2 – FAIL – TIMEOUT
Test 3 – PASS – 18 sec
Test 4 – FAIL – TIMEOUT
Test 5 – FAIL – TIMEOUT
Test 6 – FAIL – TIMEOUT
Test 7 – FAIL – TIMEOUT
Test 8 – FAIL – TIMEOUT
Test 9 – FAIL – TIMEOUT
Test 10 – FAIL – TIMEOUT


Figure 16 - RADIUS RTT Testing from NAD to CPPM (EAP-PEAP Win7)


Client OS: Windows 8.1; Authentication Protocol: EAP-PEAP / EAP-MSCHAPv2


Round Trip Time / Iteration / Test Result / Request Process Time

Round Trip Time 600 ms:
Test 1 – PASS – 6 sec
Test 2 – PASS – 11 sec
Test 3 – PASS – 7 sec
Test 4 – PASS – 6 sec
Test 5 – PASS – 6 sec
Test 6 – PASS – 6 sec
Test 7 – PASS – 6 sec
Test 8 – PASS – 7 sec
Test 9 – PASS – 5 sec
Test 10 – PASS – 6 sec

Round Trip Time 1000 ms:
Test 1 – FAIL – TIMEOUT
Test 2 – PASS – 10 sec
Test 3 – PASS – 10 sec
Test 4 – PASS – 11 sec
Test 5 – PASS – 10 sec
Test 6 – PASS – 10 sec
Test 7 – PASS – 10 sec
Test 8 – PASS – 9 sec
Test 9 – PASS – 10 sec
Test 10 – PASS – 9 sec

Round Trip Time 1500 ms:
Test 1 – PASS – 15 sec
Test 2 – FAIL – TIMEOUT
Test 3 – PASS – 14 sec
Test 4 – FAIL – TIMEOUT
Test 5 – PASS – 15 sec
Test 6 – PASS – 17 sec
Test 7 – PASS – 13 sec
Test 8 – PASS – 15 sec
Test 9 – FAIL – TIMEOUT
Test 10 – PASS – 12 sec

Round Trip Time 2000 ms:
Test 1 – PASS – 18 sec
Test 2 – FAIL – TIMEOUT
Test 3 – FAIL – TIMEOUT
Test 4 – PASS – 18 sec
Test 5 – FAIL – TIMEOUT
Test 6 – PASS – 20 sec
Test 7 – FAIL – TIMEOUT
Test 8 – FAIL – TIMEOUT
Test 9 – FAIL – TIMEOUT
Test 10 – FAIL – TIMEOUT

Figure 17 - RADIUS RTT Testing from NAD to CPPM (EAP-PEAP Win8.1)


ClearPass Cluster Bandwidth Consumption


Guest
Measurements made against 6.2 give the following approximate traffic flows:
• Subscriber -> Publisher: 3 KB per guest registration
• Publisher -> Subscriber: 75 KB per guest registration
For database replication of a created guest account:
• Publisher -> Subscriber: ~1 KB per guest account
• Subscriber -> Publisher: ~0.6 KB per guest account

Insight
For Insight traffic (guest account creation and authentication):
• Publisher -> Insight node: ~1.6 KB per guest account
• Insight -> Publisher: ~1.4 KB per guest account
• Subscriber -> Insight node: ~0.5 KB per authentication
• Insight -> Subscriber: ~1 KB per authentication
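
As a rough planning aid, the sketch below combines the per-transaction figures quoted above into an approximate sustained WAN load for one subscriber. This is a hedged estimate only: the 5 registrations/second and 100 authentications/second inputs are hypothetical (taken from the load scenarios mentioned in the bandwidth study earlier), the flow groupings are a simplifying assumption, and KB is treated as 1,000 bytes.

# Approximate per-subscriber WAN load derived from the 6.2 per-transaction figures above.
GUEST_SUB_TO_PUB_KB = 3 + 0.6      # registration API call plus replication ack, per guest
GUEST_PUB_TO_SUB_KB = 75 + 1       # proxied page content plus replicated account, per guest
AUTH_SUB_TO_INSIGHT_KB = 0.5       # per authentication
AUTH_INSIGHT_TO_SUB_KB = 1.0       # per authentication

def wan_kbps(guest_regs_per_sec: float, auths_per_sec: float) -> dict:
    """Approximate sustained load, in kilobits per second, for one subscriber."""
    return {
        "subscriber -> publisher": guest_regs_per_sec * GUEST_SUB_TO_PUB_KB * 8,
        "publisher -> subscriber": guest_regs_per_sec * GUEST_PUB_TO_SUB_KB * 8,
        "subscriber -> insight":   auths_per_sec * AUTH_SUB_TO_INSIGHT_KB * 8,
        "insight -> subscriber":   auths_per_sec * AUTH_INSIGHT_TO_SUB_KB * 8,
    }

# Example inputs: 5 guest registrations/sec and 100 authentications/sec
for flow, kbps in wan_kbps(guest_regs_per_sec=5, auths_per_sec=100).items():
    print(f"{flow:>24}: ~{kbps:,.0f} kbps")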


Use Zones for Geographical Regions


CPPM shares a distributed cache of runtime state across all nodes in a cluster; this is
commonly referred to as the multi-master cache. If zoning has not been configured, this
traffic flows between the Publisher and Subscribers and also between Subscribers.

These runtime states include:

• Roles and postures of connected entities
• Machine authentication state
• Session information used for CoA
• Which endpoints are on which NAS

In a deployment where a cluster spans WAN boundaries and multiple geographic zones, it
is not necessary to share all of this runtime state across all nodes in the cluster. For
example, endpoints present in one geographical area are not likely to authenticate or be
present in another area. It is therefore more efficient from a network usage and processing
perspective to restrict the sharing of such runtime state to a given geographical area.

CPPM uses this runtime state information to make policy decisions across multiple
transactions.

Certain cached information is only replicated within the servers within a Policy Manager
Zone. In a large-scale deployment with multiple different geographical areas, multiple
zones should be used to reduce the amount of data that needs to be replicated over a wide-
area network.

Design Guidance: In a large-scale deployment, create one Policy Manager Zone for each
major geographical area of the deployment. To handle RADIUS authentication traffic in
each region, configure the region’s networking devices with the Policy Manager nodes in
the same Zone.

If additional authentication servers are required for backup reasons, you can specify one or
more Policy Manager servers located in a different Zone, but prefer remote servers that
have the best connection (lowest latency, highest bandwidth, highest reliability).

Note: Zones also affect the operation of the OnGuard Persistent Agent. To fully
understand the impact of CPPM Zones on OnGuard, please review the OnGuard Clustering
TechNote found here.

Note: You may have configured the RADIUS server on the Network Infrastructure to use
remote CPPM nodes that are OUTSIDE of their primary geographic area. In this scenario
the replication of the runtime state might be relevant. Consider this behavior during the
design and deployment of a distributed cluster of CPPM nodes.


Use Nearest Subscriber Node


Guests/Onboard clients should be directed to the nearest subscriber node. From the
client’s point of view, the internal API call to the publisher will be handled transparently.
The best response time for static resources will be obtained if the server is nearby.

Design Guidance: In a large-scale deployment, the publisher should not receive any
authentication or Guest/Onboard requests directly; this leaves it with the maximum
possible capacity for the API requests from subscribers and the outbound replication
traffic to subscribers.

Use Subscriber Nodes As Workers


Subscriber nodes should be used as workers that process:

• Authentication requests (e.g. RADIUS, TACACS+, Web-Auth)
• OCSP requests
• Static content delivery (images, CSS, JavaScript etc.)

Avoid sending this ‘worker’ traffic to the publisher, as it will already be servicing API
requests from subscribers, handling the resulting database writes, and generating
replication changes to send back to the subscribers.

If Onboard is used, ensure that the EAP-TLS authentication method in Policy Manager is
configured to perform “localhost” OCSP checks, as described under “Onboard Certificates
And OCSP”, above.

Design Guidance: In a large-scale deployment, isolate the publisher node to allow it to
handle the maximum amount of traffic possible.

Use Dedicated Insight Node


Collecting NetEvents and updating the Insight database generates a large number of
database writes (insert and update statements), which translates to heavy system I/O.

All ClearPass servers, whether physical or virtual, are write-limited when it comes to
database I/O, due to the need to maintain reliability. To understand why, consider that
most database tables will be cached in memory due to the large amount of RAM available,
and will not be read-limited; but database writes are performed to a journal that must be
flushed to disk for reliability reasons.

In a large-scale deployment, the publisher node should already be isolated according to the
advice under “Use Subscriber Nodes As Workers”, above. If the ‘worker traffic’ sent
from the subscriber nodes is expected to fully saturate the capacity of the publisher node,
this would be considered a very large-scale deployment. In this case, Insight should not
be placed (enabled) on the Publisher node. However, if the publisher node has spare
capacity, it can be used to support the Insight database, but the node’s capacity and
performance should be carefully monitored.

Design Guidance: In a very large-scale deployment, Insight should be placed on its own
dedicated node. This removes a lot of processing and I/O from the publisher, allowing it
to handle as much worker traffic as possible. Insight data is valuable and could be used
as part of policy evaluation; if this is the case, there should be redundant Insight nodes
enabled for fault tolerance. In addition, performance could be impacted if there is
significant latency between the authenticating CPPM node and the Insight node.

Insight Setup
Insight must be enabled on at least one node (two nodes is better) within a cluster.
Multiple functions depend on Insight in order to work, e.g. MAC caching.

By default Insight is NOT enabled on a node; you MUST enable it manually. This is
performed from Administration -> Server Manager -> [node] System -> ‘Enable
Insight’.


Figure 18 - Enabling Insight on a CPPM node

Insight can be enabled on multiple nodes within a cluster, but you need to carefully
consider where you enable it. For every node where Insight is enabled, all the other
nodes within the cluster subscribe, through a process called ‘NetEvents’, to send data to
these Insight databases. The amount of data sent can be extremely high, so guidance
from a ClearPass specialist is recommended when considering this part of a cluster
deployment.

Insight does NOT replicate data to any other nodes within the cluster; it is an entirely
standalone database.

When you configure reporting on a node, the reporting configuration is isolated to that
individual node. In the figure above you see a setting called Insight Master; this allows
other nodes where Insight has been enabled to subscribe to this node’s Insight report
configuration. In the event that this node fails, the reports will still be produced,
because the database the reports are generated against will be similar on the other nodes
in the cluster; this is not because the Insight database has been replicated, but because
the nodes in the cluster all send a copy of their ‘NetEvents’ to every node that has
Insight enabled.

Note: If you are at a remote site with a local CPPM and this node points to a remote Insight
node, you cannot authenticate users if your policy includes querying Insight as an
authorization source and the WAN link is down.

Insight Resilience
As mentioned above, Insight can be enabled on multiple nodes within your cluster; this
provides a level of Insight resiliency. If you use Insight for authorization within your
cluster, where you enable Insight is an important design consideration. Also consider
that MAC caching (an important part of a ClearPass Guest workflow) requires that Insight
is enabled on at least one node.


Figure 19 - Insight resiliency across cluster nodes

As you enable Insight on additional nodes in the cluster, CPPM automatically adds these
nodes to the Insight Database authentication source definition… and provides the ability to
set the Backup server priority when you have more than three nodes enabled for Insight as
shown above.

Whenever an Insight-enabled node is dropped from the cluster, the corresponding node entry
in the Insight repository is removed.

When an Insight-enabled node in a cluster is down or out of sync for more than 30 minutes,
that node is moved to be the last Insight node in the fall-back list. This allows for
fail-through to the other Insight nodes; if all of the other nodes have also failed, it is
likely there is a major network outage.



Figure 20 - Enabling Insight on multiple nodes and defining the Backup source order

Note: Our guidance around enabling Insight is as follows. If you are running a CPPM
network that we consider large and the worker traffic is not consuming all of the
Publisher’s resources, then Insight can be enabled on the dedicated Publisher and the
standby Publisher. If you have a CPPM network that is considered very large, where the
worker traffic will consume the Publisher’s resources, then Insight could still be enabled
on the dedicated Publisher and the standby Publisher, but these nodes should be dedicated
to cluster duties, i.e. the Publisher and standby Publisher should not be performing any
authentications.

Cluster-Wide Parameters Config Settings

• Auto backup settings – set to “None” or “Config”
• Session log details retention – 3 days
• Known endpoint cleanup interval – review and set if appropriate; depends on the nature of the deployment
• Unknown endpoint cleanup interval – we recommend that this be enabled; we suggest 7 days as a default
• Expired guest account cleanup interval – review and set a value depending on the nature of the deployment; we suggest 30 days
• Profiled Unknown endpoint cleanup interval – we suggest 7 days as the default
• Audit records cleanup interval – 7 days
• Alert Notification email/SMS – configure as appropriate
• Insight data retention – 30 days


High Capacity Guest


Starting with CPPM 6.4.0 we provide the ability to set a CPPM node to run in ‘High
Capacity Guest’ (HCG) mode, targeted at public-facing enterprise environments. This mode
allows a node to support double the number of Guest accounts, regardless of whether it is
an appliance or a virtual machine.

Enabling HCG
The option to enable this mode is performed from Administration -> Server Manager ->
Server Configuration -> Cluster-Wide Parameters -> Mode


Figure 21 - Enabling HCG mode

Supported number of Users in HCG


A single CP-xx-500 can support 1,000 Guests, a single CP-xx-5K will support up to 10,000
Guests, and a single CP-xx-25K will support up to 50,000 Guests.

HCG Cluster
When nodes are enabled for this mode they can ONLY be clustered with nodes that are also
in HCG mode. Adding an HCG node to an existing non-HCG cluster will result in a failure.


HCG - Other related information (licensing/disabled features)


For example, when enabling HCG on a 5K node, it allows that node to register up to 10K
Guest users. However, the licenses still have to be purchased and applied; we do not grant
a gratuitous 2:1 licensing for these users when HCG mode is active. An additional
consideration to remember when a ClearPass node is deployed in HCG mode is that the
ClearPass Policy AAA licensing is reset on a daily basis, bringing it in line with the
ClearPass Guest licensing. So if on a 5K node you purchase 8K Guest licenses, this would
entitle you to process 8K unique endpoints/guests per day.

In allowing double the number of licensed guest users, we have disabled some of the other
features on ClearPass. Below is a list of the restrictions for HCG mode:

• ClearPass Onboard is disabled
• ClearPass OnGuard is disabled
• You cannot perform posture checks on endpoints
• You cannot perform Audit checks on endpoints
• The Service templates to configure 802.1X for both wired and wireless are disabled
• A number of EAP methods are disabled: FAST, GTC, MSCHAPv2, PEAP, TLS, TTLS

Below is the warning message presented when you enable HCG mode; it explains the features
that will be disabled.


Figure 22 – Warning message when enabling HCG mode.

Note: EAP-PPSK is still enabled. EAP-PPSK (private or personal pre-shared key) is a new
authentication method that we also added in the CPPM 6.4.0 release, and it is particularly
well suited to a CPPM node running in HCG mode. It simplifies the deployment of a Guest
network that is ‘open’, in that the user-id and password are the same for each user, but
secure, as each Guest/endpoint uses a unique per-endpoint WPA pre-shared key.

The client doesn’t need to support anything more than WPA-PSK. The “PPSK” magic is all
on the network side.


Cluster Operations
A cluster exists when two or more CPPM nodes are logically ‘joined’ together so that they
can distribute processing of Authentications/Onboarding etc. across multiple nodes.

The process to join a node to another node to make a cluster or to join a new node to an
existing cluster can be performed in the GUI or from within the CLI. The function to change
a node from a Publisher to a Subscriber (because we only have a single active Publisher in a
cluster) is always performed on the node that is going to be changed.

Making a node a Subscriber from the GUI


The procedure from the GUI is performed from the Administration -> Server Manager ->
Server Configuration -> [Make Subscriber]








Figure 23 - Make Subscriber on GUI

In the above, we are about to make the node cppm183 a subscriber. We point it to the
cluster’s Publisher, 10.2.102.181, and have entered the Publisher’s password, which is the
same as the appadmin password. During the downgrade of a node to a Subscriber, the
messages below are representative of what you would expect to see, ending with a final
message of ‘Make subscriber complete…’.









Prior to the CPPM 6.4 release, where we optimized this process, adding nodes over the WAN
could appear to take a long time; this is explained later in the Cluster Upgrades section.
What is actually happening in the background is that the ConfigDB is being replicated. If
you look at the Dashboard on the Publisher you will see the status for the new node,
‘Sync in Progress’, shown in the Dashboard Cluster Status widget.


Figure 24 - Sync in Progress for new cluster node

You can also track this process in the Event Viewer; following a successful addition you
will see the message below.


Figure 25 - Event Viewer success message after new cluster node added

Timings to Add a CPPM Node to a Cluster


The data below is based on CPPM 5K hardware, adding a node where the Publisher has no
endpoints, i.e. a clean default configuration.

Test 1 – Local LAN 1Gb – 140-150 seconds
Test 2 – WAN 2Mb with 100ms RTT – 260-280 seconds
Test 3 – WAN 10Mb with 100ms RTT – 250-275 seconds

Note: The time for Test 3 is similar to Test 2 because of the TCP bandwidth-delay product (BDP), as the worked example below illustrates.
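
As a rough check (assuming the same 64K TCP window used in the earlier BDP example): 64KB × 8 bits ÷ 0.1s ≈ 5.2Mbps, so at 100ms RTT a single TCP stream cannot make full use of the 10Mb link, and the window/RTT combination rather than the extra link capacity governs the transfer time.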

Making a node a Subscriber from the CLI


The process to make a node a Subscriber from the CLI is also fairly simple. You need to
log in to the CLI with the appadmin user ID. Multiple cluster-related administrative
functions can be performed from here, and these provide additional functionality over what
can be accomplished from the GUI.


Use the command ‘cluster make-subscriber -i [publisher ip_address]’ (other switches are
possible, as shown below) to add a standalone Publisher to a cluster and make it a
Subscriber.

[[email protected]]# cluster make-subscriber


Usage:
make-subscriber -i <IP Address> [-l] [-b]

-i <IP Address> -- Publisher IP Address


-l -- Restore the local log database after this operation
-b -- skip generating a backup before this operation
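
For example, to join this node to the cluster whose Publisher is 10.2.102.181 (the address used in the GUI example above), you would enter cluster make-subscriber -i 10.2.102.181, substituting your own Publisher’s IP address.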

After entering the command with the IP address of the Publisher, you will see a suitable
warning message about the action you are about to perform. After confirming that you want
to continue, you have to enter the password for the Publisher; this is the cluster
password, which will be the appadmin password.

See below for a view of the process and the typical messages you will see in the CLI when
adding a node to the cluster.



Figure 26 - Setting up cluster from CLI

Then the process to downgrade the node to a Subscriber begins. This can take some time, as
the ConfigDB has to be synchronized between the nodes, especially if this is performed over
a WAN. Some timings were shown above.


Figure 27 - Checking on cluster progress from CLI


Cluster Administration
Managing the cluster is straightforward and typically requires little involvement. However,
at times problems or issues can occur with the cluster which may require some operational
involvement. In the event that a node has lost communication with the cluster for a period
greater than 24 hours, the node will be marked as down by the Publisher. To re-join this
node to the cluster, the node must be removed from the cluster on the Publisher and the
configuration on the out-of-sync node reset.

Removing the Subscriber from the cluster can be accomplished in the GUI or the CLI.

In the GUI under the Administration -> Server Manager -> Server Configuration ->
[Select_CPPM_Node] -> Drop Subscriber








Figure 28 - Dropping Subscriber from Publisher

You have to confirm the action to drop the Subscriber from the cluster.




Figure 29 - Drop Subscriber confirmation message

Following the confirmation message above, there are a couple of additional settings: you
can select whether the database on the node you are about to drop should be cleared and
reset, and also whether the database on the local node (the Publisher) should be backed up
before you begin this cluster maintenance.



Figure 30 - Drop Subscriber confirmation options

Because the CPPM node has been classified as ‘bad’ by the Publisher, which ‘owns’ the
status/health of the cluster, it is also likely you will have to perform some intervention
on the Subscriber that requires resetting it. In the CLI, use the cluster reset-database
command to reset the node’s configuration back to a default state, except for the IP
addressing and the appadmin password. Following this reset, reboot the node to keep the
process clean, then add the node back to the cluster as described previously.


Figure 31 - Resetting a CPPM node configuration


Cluster Upgrades
Following the release of a new patch or upgrade version of the CPPM software, it is highly
desirable to upgrade the CPPM nodes in the cluster. Whilst we are not going to discuss the
installation process itself, I want to discuss and guide you regarding the best way to
upgrade a cluster and the considerations to be aware of.

In short, the recommendation is to upgrade the Publisher first, ensure that this is FULLY
complete, and then upgrade the Subscribers in a serial process. Starting in the CPPM 6.4
software release, the process of adding and upgrading nodes has been significantly improved
and streamlined in several ways. When you download and install the new software, the new
version is installed on the unused partition/file-system, along with a copy of the
Configuration Database. When you reboot, the software installation is completed and the
remaining databases are copied and, if required, migrated when new database schema changes
have been introduced in the new code release. The installation time is dependent on the
size of the databases, which will be directly related to the number of endpoints etc.

Whilst this portion of the upgrade is happening, the remaining Subscribers can still
continue to process authentications etc., but no new Guest account creation or Onboarding
can occur, as documented in the section What do you lose when the Publisher fails?

Following the upgrade of the Publisher, you need to upgrade the Subscribers in a serial
process, ensuring that each upgrade has completed before starting the next. Why, you ask?
During the upgrade process the database on the Publisher is locked. This means several
things: you cannot make changes to the configuration on the Publisher, and you cannot
create new Guest accounts or Onboard devices. This ‘locking’ of the Publisher’s database
has been significantly streamlined in CPPM 6.4; we only lock the configuration database for
the time it takes to generate a dump of the Publisher’s config database. During the
Subscriber upgrade we used to copy a lot of data in a serial process; now we have optimized
this into a bulk transfer of the data from the Publisher to the Subscriber. This allows the
Publisher’s database to be released significantly quicker and allows the next Subscriber to
be added.

Below is a copy of the messages that are now posted when you add a node via the CLI; you
can see some of the improvements, specifically the locking/backup/release process explained
above.

Note: The key message below is ‘Config database lock released’; this is the point at which
you can begin to add another subscriber to the cluster.

Setting up local machine as a subscriber to 10.2.100.155


INFO - Local checks before adding subscriber passed
INFO - 10.2.100.155: - Subscriber node added successfully for
host=cppm-158.ns-tme.com
INFO - Subscriber node entry added in publisher


INFO - Backup databases for AppPlatform


INFO - Backup databases for PolicyManager
INFO - Stopping services
INFO - Dropped existing databases for Policy Manager
INFO - Create database and schema for Policy Manager
INFO - Local database setup done for Policy Manager databases
INFO - Subscriber password changed
INFO - Syncing up initial data...
INFO - Config database temporarily locked for updates
INFO - 10.2.100.155: - Backup databases for AppPlatform
INFO - 10.2.100.155: - Backup databases for PolicyManager
INFO - Config database lock released
INFO - Subscriber now replicating from publisher 10.2.100.155
INFO - Retaining local node certificate
INFO - Subscriber replication and node setup complete
INFO - Notify publisher that adding subscriber is complete
INFO - Subscriber added successfully
INFO - Restarting Policy Manager admin server

Figure 32 - CLI messages for adding a node

As we have just explained, the upgrading of the Subscribers must be completed as soon as
possible after the Publisher has been upgraded, as it is likely that, following the upgrade
of the Publisher, the Subscribers will be out of sync with it. Nodes that are out of sync
with the Publisher will not be able to receive changes made to the cluster’s configuration,
be that new or amended service policies or new Guest accounts. The best way to see if the
upgrade has completed is to ensure the message below is seen in the Event Viewer on the
Publisher, or that the messages above are observed in the CLI.


Figure 33 - Confirmation that an upgrade is completed

Depending on the type of software upgrade you are doing on the Publisher it is possible
that the Subscribers will not go out of sync. Either way the recommendation is to upgrade
the remaining nodes within the cluster ASAP.


What follows are some additional good-practice steps that are valid but not absolutely
necessary.

Stop the RADIUS server on the node before you begin the upgrade. This allows a clean
take-down of the node, so that no NAS devices will send authentications to it expecting a
response. If this is done a couple of minutes before the upgrade begins, the NAS devices
should already have marked this RADIUS server as unavailable.

Also, we recommend disabling auto-backup, and the standby publisher setting needs to be
disabled as well, prior to starting a software upgrade. The text below is taken from the
CPPM User Guide.

Select any of the following auto backup configuration options:


Off - Select this to not perform periodic backups.
Note: Select Off before upgrading ClearPass Policy Manager to avoid the interference between
Auto backup and migration process.
Config - Perform a periodic backup of the configuration database only. This is the default
auto backup configuration option.
Config|SessionInfo - Perform a backup of the configuration database and the session log
database.


Cluster Upgrade Tool
We have recently made available a cluster upgrade patch that simplifies the upgrading of
large CPPM multi-node clusters. The tool was written to take advantage of some of the
changes we made in the underlying 6.4.0 code release; some of these are discussed in the
section above relating to the processes used when adding nodes to CPPM clusters. The tool
is available for CPPM versions 6.2 and 6.3. It automates a large number of the tasks
required to upgrade nodes; e.g. it will download the upgrade image to the central Publisher
and then push the code update as required to the end nodes. The tool is released as a patch
update for the ClearPass 6.2 and 6.3 versions and can be downloaded and installed either
through CPPM’s Software Updates portal or from the Aruba Support portal. Once the tool is
installed, you can access it at https://[YOUR_PUBLISHER_IP]/upgrade

There is a special TechNote that covers the cluster upgrade tool in detail. It can be located
here


Scaling Limitations
Different components of a ClearPass deployment will scale differently, due to the design of
the publisher/subscriber model. Certain components are listed below with the limits to
scaling identified.

Authentication capacity: scales linearly in the number of subscriber nodes. Add more
nodes to provide additional capacity to service authentication requests.

Logging capacity: scales linearly in the number of subscriber nodes, as each node handles
its own logging.

Insight reports: does not scale with additional nodes as it is centralized. Use a separate
Insight node sufficient to handle the incoming Netevents traffic from all nodes in the
cluster. The publisher node should not be used as the Insight reporting node in a very
large-scale deployment.

Configuration changes (Policy Manager): these are assumed to be infrequent and


therefore are not a significant limit to scaling, as the total size of the configuration set will
be bounded.

Replication load on publisher: scales linearly in the number of subscriber nodes. The
replication is assumed to be relatively efficient as only deltas are sent.

Configuration changes (Guest/Onboard): does not scale with additional nodes as it is


centralized. Requires the publisher be scaled to support write traffic from the maximum
number of subscribers that would be active concurrently.


Virtual IP Considerations
Using a Virtual IP address allows for the deployment of a highly available pair of servers.
This is intended to reduce the amount of downtime in the event of a server failure: if one of
the servers in a HA pair fails, the other server can take over the virtual IP address and
continue providing service to clients. This is particularly useful when NAS devices are
sending basic RADIUS authentications to a CPPM node.

However, this does not eliminate the failure modes described above. Consider the case
where the publisher node that currently has the virtual IP address fails. The backup
publisher node cannot take over immediately (in the sense of creating Guest accounts etc.),
as the failure may be transient, and the minimum time it takes for a standby Publisher to
become active is about 8 minutes: this duration is made up of 5 minutes (5 attempts) to
connect to the active Publisher’s database, then about 3-4 minutes for the node to promote
itself into an active state. There will always be a delay before the virtual IP address is
back in service, in the sense that the IP address the NAS clients are communicating with is
able to process more than basic database read actions, i.e. RADIUS authentication. During
this window, requests from subscribers to write to the Publisher’s database will fail, as
there will be no publisher responding to the virtual IP address that can write to the
database.
