The Nutanix Bible
by Steven Poitras
Table Of Contents
Foreword
Introduction
2.4.2 Hypervisor Upgrade
2.4.3 Cluster Expansion (add node)
2.4.4 Capacity Planning
4.3.2 Command Reference
4.3.3 Metrics and Thresholds
4.3.4 Troubleshooting & Advanced Administration
Part V - Book of vSphere
5.1 Architecture
5.1.1 Node Architecture
5.1.2 Configuration Maximums and Scalability
5.1.3 Networking
Copyright (c) 2016: The Nutanix Bible and NutanixBible.com, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog's author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Steven Poitras and NutanixBible.com with appropriate and specific direction to the original content.
Foreword
This ever-improving artifact, beyond being authoritative,
is also enjoying wide readership across the world. Architects,
managers, and CIOs alike, have stopped me in conference hallways
to talk about how refreshingly lucid the writing style is, with some
painfully detailed illustrations, Visio diagrams, and pictorials. Steve
has taken time to tell the web-scale story, without taking shortcuts.
Democratizing our distributed architecture was not going to be
easy in a world where most IT practitioners have been buried in
dealing with the urgent. The Bible bridges the gap between IT
and DevOps, because it attempts to explain computer science and
software engineering trade-offs in very simple terms. We hope that
in the coming 3-5 years, IT will speak a language that helps them
get closer to the DevOps web-scale jargon.
Keep us honest.
Dheeraj Pandey
CEO, Nutanix
Stuart Miniman
Principal Research Contributor, Wikibon
Introduction
NOTE: What you see here is an under-the-covers look at how things work. With that said, all topics discussed are abstracted by Nutanix, and this knowledge isn't required to successfully operate a Nutanix environment!
Enjoy!
Steven Poitras
Principal Solutions Architect, Nutanix
A Brief Lesson in History
PART I
A brief look at the history of infrastructure and what has led us to where we are today.
This shift forces IT to act more as a legitimate service provider to their end-users (company
employees).
This section will present some of the core concepts behind Web-scale infrastructure and why we leverage them. Before I get started, I just wanted to clearly state that Web-scale doesn't mean you need to be "web-scale" (e.g. Google, Facebook, or Microsoft). These constructs are applicable and beneficial at any scale (3 nodes or thousands of nodes).
Historical challenges included:
Complexity, complexity, complexity
Desire for incremental-based growth
1.3.1 Hyper-Convergence
There are differing opinions on what hyper-convergence actually is. It also varies based on
the scope of components (e.g. virtualization, networking, etc.). However, the core concept
comes down to the following: natively combining two or more components into a single
unit. Natively is the key word here. In order to be the most effective, the components must
be natively integrated and not just bundled together. In the case of Nutanix, we natively
converge compute + storage to form a single node used in our appliance. For others, this
might be converging storage with the network, etc.
What it really means:
Natively integrating two or more components into a single unit which can be easily scaled
Benefits include:
Single unit to scale
Localized I/O
Eliminates traditional compute / storage silos by converging them
These distributed systems are designed to accommodate and remediate failure, to form something that is self-healing and autonomous. In the event of a component failure, the system will transparently handle and remediate the failure, continuing to operate as expected. Alerting will make the user aware, but rather than being a critical time-sensitive item, any remediation (e.g. replacing a failed node) can be done on the admin's schedule. Another way to put it is fail in-place (rebuild without replace). For items where a master is needed, an election process is utilized; in the event this master fails, a new master is elected. To distribute the processing of tasks, MapReduce concepts are leveraged.
What it really means:
Distributing roles and responsibilities to all nodes within the system
Utilizing concepts like MapReduce to perform distributed processing of tasks
Using an election process in the case where a master is needed
Benefits include:
Eliminates any single points of failure (SPOF)
Distributes workload to eliminate any bottlenecks
Book of Prism
PART II
2.2 Architecture
Prism is a distributed resource management platform which allows users to manage and
monitor objects and services across their Nutanix environment.
These capabilities are broken down into two key categories:
Interfaces
HTML5 UI, REST API, CLI, PowerShell CMDlets, etc.
Management
Policy definition and compliance, service design and status, analytics and monitoring
The figure highlights an image illustrating the conceptual nature of Prism as part of the
Nutanix platform:
Pro tip
For larger or distributed deployments (e.g. more than one cluster or multiple sites)
it is recommended to use Prism Central to simplify operations and provide a single
management UI for all clusters / sites.
Prism ports
Prism listens on ports 80 and 9440; if HTTP traffic comes in on port 80, it is redirected to HTTPS on port 9440.
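As a quick sanity check of the redirect behavior described above, you can probe the HTTP port from any machine with access to a CVM or the cluster IP (the IP below is a placeholder); you should see an HTTP redirect referencing port 9440:
# Probe port 80 and show only the response headers (placeholder IP)
curl -I http://<CVM_OR_CLUSTER_IP>
# The Location header in the response should point to https://...:9440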
When using the cluster external IP (recommended), it will always be hosted by the current
Prism Leader. In the event of a Prism Leader failure the cluster IP will be assumed by the
newly elected Prism Leader and a gratuitous ARP (gARP) will be used to clean any stale
ARP cache entries. In this scenario any time the cluster IP is used to access Prism, no
redirection is necessary as that will already be the Prism Leader.
Pro tip
You can determine the current Prism leader by running "curl localhost:2019/prism/leader" on any CVM.
2.3 Navigation
Prism is fairly straightforward and simple to use; however, we'll cover some of the main pages and basic usage.
Prism Central (if deployed) can be accessed using the IP address specified during configuration
or corresponding DNS entry. Prism Element can be accessed via Prism Central (by clicking on a
specific cluster) or by navigating to any Nutanix CVM or cluster IP (preferred).
Once the page has been loaded you will be greeted with the Login page where you will use
your Prism or Active Directory credentials to login.
Upon successful login you will be sent to the dashboard page which will provide overview
information for managed cluster(s) in Prism Central or the local cluster in Prism Element.
Prism Central and Prism Element will be covered in more detail in the following sections.
Analysis Page
Detailed performance analysis for cluster and managed objects with event correlation
Alerts
Environment wide alerts
The figure shows a sample Prism Central dashboard where multiple clusters can be
monitored / managed:
Pro tip
If everything is green, go back to doing something else :)
The home page will provide detailed information on alerts, service status, capacity,
performance, tasks, and much more. To get further information on any of them you can click
on the item of interest.
The figure shows a sample Prism Element dashboard where local cluster details are displayed:
Keyboard Shortcuts
Accessibility and ease of use are critical constructs in Prism. To simplify things for the end-user, a set of shortcuts has been added to allow users to do everything from their keyboard.
The following characterizes some of the key shortcuts:
Change view (page context aware):
O - Overview View
D - Diagram View
T - Table View
Activities and Events:
A - Alerts
P - Tasks
Drop downs and Menus (navigate selection using arrow keys):
M - Menu drop-down
S - Settings (gear icon)
F - Search bar
U - User drop down
H - Help
After the software is loaded, click on Upgrade to start the upgrade process:
Note
Your Prism session will briefly disconnect during the upgrade when the current Prism
Leader is upgraded. All VMs and services running remain unaffected.
Similar to the rolling nature of the Nutanix software upgrades, each host will be upgraded in
a rolling manner with zero impact to running VMs. VMs will be live-migrated off the current
host, the host will be upgraded, and then rebooted. This process will iterate through each
host until all hosts in the cluster are upgraded.
Pro tip
You can also get cluster wide upgrade status from any Nutanix CVM by running "host_upgrade --status". The detailed per-host status is logged to ~/data/logs/host_upgrade.out on each CVM.
Once the upgrade is complete you'll see an updated status and have access to all of the new features:
After the upload is completed you can click on Expand Cluster to begin the imaging and expansion process:
After the imaging and add node process has been completed you'll see the updated cluster size and resources:
This view provides detailed information on cluster runway and identifies the most
constrained resource (limiting resource). You can also get detailed information on what the
top consumers are as well as some potential options to clean up additional capacity or ideal
node types for cluster expansion.
2.5.1 ACLI
The Acropolis CLI (ACLI) is the CLI for managing the Acropolis portion of the Nutanix
product. These capabilities were enabled in releases after 4.1.2.
NOTE: All of these actions can be performed via the HTML5 GUI and REST API.
I just use these commands as part of my scripting to automate tasks.
Enter ACLI shell
Description: Enter the interactive ACLI shell (run from any CVM)
acli
OR
Description: Execute an ACLI command directly via the Linux shell
acli <Command>
Output ACLI response in JSON format
acli -o json
List hosts
Description: Lists Acropolis nodes in the cluster.
host.list
Create network
Description: Create network based on VLAN
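The command itself appears to have been dropped from this copy; as a rough sketch (exact syntax can vary by ACLI release, and the network name, VLAN ID, and IP range are placeholders), VLAN-backed network creation looks roughly like:
# Create a VLAN-backed network (ip_config is optional, used for Acropolis IPAM)
net.create <NETWORK NAME> vlan=<VLAN ID> ip_config=<A.B.C.D>/<NN>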
List network(s)
Description: List networks
net.list
Note: .254 is reserved and used by the Acropolis DHCP server if an address for the Acropolis DHCP server wasn't set during network creation
vm.disk_create <VM NAME> create_size=<Size and qualifier, e.g. 500G> container=<CONTAINER NAME>
Add NIC to VM
Description: Create and add NIC
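The command for this item is missing here; based on the vm.nic_create usage shown later in the OVM deployment steps, it follows this pattern:
vm.nic_create <VM NAME> network=<NETWORK NAME>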
Steps:
1. Upload ISOs to container
2. Enable whitelist for client IPs
3. Upload ISOs to share
Power on VM(s)
Description: Power on VM(s)
Example: vm.on *
2.5.2 NCLI
NOTE: All of these actions can be performed via the HTML5 GUI and REST API.
I just use these commands as part of my scripting to automate tasks.
ncli sp ls
List containers
Description: Displays the existing containers
ncli ctr ls
Create container
Description: Creates a new container
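The command line for container creation is missing in this copy; a hedged sketch (the storage pool parameter name may differ slightly by NCLI version):
ncli ctr create name=<CONTAINER NAME> sp-name=<STORAGE POOL NAME>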
List VMs
Description: Displays the existing VMs
ncli vm ls
# Node status
ncli cluster get-domain-fault-tolerance-status type=node
# Block status
ncli cluster get-domain-fault-tolerance-status type=rackable_unit
Below we will cover the Nutanix PowerShell CMDlets, how to use them, and some general background on Windows PowerShell.
Basics
Windows PowerShell is a powerful shell (hence the name ;P) and scripting language built on the .NET framework. It is a very simple-to-use language and is built to be intuitive and interactive. Within PowerShell there are a few key constructs/items:
CMDlets
CMDlets are commands or .NET classes which perform a particular operation. They usually conform to the Getter/Setter methodology and typically use a <Verb>-<Noun> based structure. For example: Get-Process, Set-Partition, etc.
Piping or Pipelining
Piping is an important construct in PowerShell (similar to its use in Linux) and can greatly simplify things when used correctly. With piping you're essentially taking the output of one section of the pipeline and using that as input to the next section of the pipeline. The pipeline can be as long as required (assuming there remains output which is being fed to the next section of the pipe). A very simple example could be getting the current processes, finding those that match a particular trait or filter and then sorting them:
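The example that followed appears to have been lost in this copy; a minimal sketch matching the description above (the CPU filter value is arbitrary):
Get-Process | Where-Object {$_.CPU -gt 100} | Sort-Object CPU -Descending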
Piping can also be used in place of a for-each loop, executing a script block against each item in the pipeline:
$myArray | %{
  # Do something
}
Variable
$myVariable = "foo"
Note: You can also set a variable to the output of a series or pipeline of commands:
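The example itself was lost in this copy; a minimal sketch (the cmdlets used here are arbitrary):
$myVariable = (Get-Process | Sort-Object CPU -Descending | Select-Object -First 5)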
In this example, the commands inside the parentheses will be evaluated first and the variable will be set to the outcome.
Array
$myArray = @("Value","Value")
Note: You can also have an array of arrays, hash tables or custom objects
Hash Table
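The hash table example is missing here; a minimal sketch with placeholder keys and values:
$myHash = @{"Key1" = "Value1"; "Key2" = "Value2"}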
Useful commands
Get the help content for a particular CMDlet (similar to a man page in Linux)
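The command line is missing in this copy; the standard PowerShell help cmdlet is:
Get-Help <CMDlet Name>
# Example: Get-Help Get-Process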
Interactive
Get-NTNXVDisk
Interactive
Get-NTNXContainer
Interactive
Get-NTNXProtectionDomain
Interactive
Get-NTNXProtectionDomainConsistencyGroup
2.6 Integrations
2.6.1 OpenStack
OpenStack is an open source platform for managing and building clouds. It is primarily
broken into the front-end (dashboard and API) and infrastructure services (compute,
storage, etc.).
The OpenStack and Nutanix solution is composed of a few main components:
OpenStack Controller (OSC)
An existing, or newly provisioned VM or host hosting the OpenStack UI, API and
services. Handles all OpenStack API calls. In an Acropolis OVM deployment this can
be co-located with the Acropolis OpenStack Drivers.
The table shows the core OpenStack components and role mapping:
The figure shows a more detailed view of the OpenStack components and communication:
In the following sections we will go through some of the main OpenStack components and
how they are integrated into the Nutanix platform.
Nova
Nova is the compute engine and scheduler for the OpenStack platform. In the Nutanix
OpenStack solution each Acropolis OVM acts as a compute host and every Acropolis
Cluster will act as a single hypervisor host eligible for scheduling OpenStack instances. The
Acropolis OVM runs the Nova-compute service.
You can view the Nova services using the OpenStack portal under Admin->System->System Information->Compute Services.
As you can see from the previous image the full cluster resources are seen in a single
hypervisor host.
Swift
Swift is an object store used to store and retrieve files. This is currently only leveraged for backup / restore of snapshots and images.
Cinder
Cinder is OpenStack's volume component for exposing iSCSI targets. Cinder leverages the Acropolis Volumes API in the Nutanix solution. These volumes are attached to the instance(s) directly as block devices (as compared to in-guest).
You can view the Cinder services using the OpenStack portal under Admin->System->System Information->Block Storage Services.
hosted on the Nutanix platform, they will be published to the OpenStack controller via
Glance on the OVM. In cases where the Image Repo exists only on an external source, Glance
will be hosted by the OpenStack Controller and the Image Cache will be leveraged on the
Acropolis Cluster(s).
Glance is enabled on a per-cluster basis and will always exist with the Image Repo. When Glance is enabled on multiple clusters, the Image Repo will span those clusters and images created via the OpenStack Portal will be propagated to all clusters running Glance. Those clusters not hosting Glance will cache the images locally using the Image Cache.
Pro tip
For larger deployments Glance should run on at least two Acropolis Clusters per
site. This will provide Image Repo HA in the case of a cluster outage and ensure the
images will always be available when not in the Image Cache.
When external sources host the Image Repo / Glance, Nova will be responsible for handling data movement from the external source to the target Acropolis Cluster(s). In this case the Image Cache will be leveraged on the target Acropolis Cluster(s) to cache the image locally for any subsequent provisioning requests for the image.
Neutron
Neutron is the networking component of OpenStack and responsible for network
configuration. The Acropolis OVM allows network CRUD operations to be performed by the
OpenStack portal and will then make the required changes in Acropolis.
You can view the Neutron services using the OpenStack portal under Admin->System->System Information->Network Agents.
Neutron will assign IP addresses to instances when they are booted. In this case Acropolis
will receive a desired IP address for the VM which will be allocated. When the VM performs a
DHCP request the Acropolis Master will respond to the DHCP request on a private VXLAN as
usual with AHV.
The Keystone and Horizon components run in an OpenStack Controller which interfaces
with the Acropolis OVM. The OVM(s) have an OpenStack Driver which is responsible for
translating the OpenStack API calls into native Acropolis API calls.
Region
A geographic landmass or area where multiple Availability Zones (sites) are located.
These can include regions like US-Northwest or US-West.
Availability Zone (AZ)
A specific site or datacenter location where cloud services are hosted. These can
include sites like US-Northwest-1 or US-West-1.
Host Aggregate
A group of compute hosts, can be a row, aisle or equivalent to the site / AZ.
Compute Host
An Acropolis OVM which is running the nova-compute service.
Hypervisor Host
An Acropolis Cluster (seen as a single host).
The figure shows the high-level relationship of the constructs:
For environments spanning multiple sites the OpenStack Controller will talk to multiple
Acropolis OVMs across sites.
2.6.1.4 Deployment
The OVM can be deployed as a standalone RPM on a CentOS / Redhat distro or as a full VM.
The Acropolis OVM can be deployed on any platform (Nutanix or non-Nutanix) as long as it
has network connectivity to the OpenStack Controller and Nutanix Cluster(s).
The VM(s) for the Acropolis OVM can be deployed on a Nutanix AHV cluster using the
following steps. If the OVM is already deployed you can skip past the VM creation steps. You
can use the full OVM image or use an existing CentOS / Redhat VM image.
First we will import the provided Acropolis OVM disk image to the Acropolis cluster. This can be done by copying the disk image over using SCP or by specifying a URL to copy the file from. We will cover importing this using the Images API. Note: It is possible to deploy this VM anywhere, not necessarily on an Acropolis cluster.
To import the disk image using Images API, run the following command:
image.create <IMAGE_NAME> source_url=<SOURCE_URL> container=<CONTAINER_NAME>
Next create the Acropolis VM for the OVM by running the following ACLI commands on any CVM:
vm.create <VM_NAME> num_vcpus=2 memory=16G
vm.disk_create <VM_NAME> clone_from_image=<IMAGE_NAME>
vm.nic_create <VM_NAME> network=<NETWORK_NAME>
vm.on <VM_NAME>
Once the VM(s) have been created and powered on, SSH to the OVM(s) using the provided
credentials.
OVMCTL Help
Help text can be displayed by running the following command on the OVM:
ovmctl --help
OVM-allinone
The following steps cover the OVM-allinone deployment. Start by SSHing to the OVM(s) to
run the following commands.
OVM-services
The following steps cover the OVM-services deployment. Start by SSHing to the OVM(s) to
run the following commands.
# Register OpenStack Driver service
ovmctl --add ovm --name <OVM_NAME> --ip <OVM_IP>
If non-default passwords were used for the OpenStack controller deployment, we'll need to update those:
Now that the OVM has been configured, we'll configure the OpenStack Controller to know about the Glance and Neutron endpoints.
Log in to the OpenStack controller and source the keystonerc_admin file:
# enter keystonerc_admin
source ./keystonerc_admin
First we will delete the existing endpoint for Glance that is pointing to the controller:
# Find old Glance endpoint id (port 9292)
keystone endpoint-list
# Remove old keystone endpoint for Glance
keystone endpoint-delete <GLANCE_ENDPOINT_ID>
Next we will create the new Glance endpoint that will point to the OVM:
# Find Glance service id
keystone service-list | grep glance
# Will look similar to the following:
| 9e539e8dee264dd9a086677427434982 | glance | image |
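The endpoint creation command itself is not present in this copy; a hedged sketch using the legacy keystone v2 CLI (the region name and URLs are placeholders):
# Create new Glance endpoint pointing to the OVM
keystone endpoint-create --region <REGION_NAME> --service-id <GLANCE_SERVICE_ID> --publicurl http://<OVM_IP>:9292 --internalurl http://<OVM_IP>:9292 --adminurl http://<OVM_IP>:9292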
Next we will delete the existing endpoint for Neutron that is pointing to the controller:
# Find old Neutron endpoint id (port 9696)
keystone endpoint-list
# Remove old keystone endpoint for Neutron
keystone endpoint-delete <NEUTRON_ENDPOINT_ID>
Next we will create the new Neutron endpoint that will point to the OVM:
# Find Neutron service id
keystone service-list | grep neutron
# Will look similar to the following:
| f4c4266142c742a78b330f8bafe5e49e | neutron | network |
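As with Glance, the endpoint creation command is missing here; a hedged sketch using the legacy keystone v2 CLI (region name and URLs are placeholders):
# Create new Neutron endpoint pointing to the OVM
keystone endpoint-create --region <REGION_NAME> --service-id <NEUTRON_SERVICE_ID> --publicurl http://<OVM_IP>:9696 --internalurl http://<OVM_IP>:9696 --adminurl http://<OVM_IP>:9696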
After the endpoints have been created, we will update the Nova and Cinder configuration files with the new Acropolis OVM IP as the Glance host.
First we will edit nova.conf, which is located at /etc/nova/nova.conf, and update the following lines:
[glance]
...
# Default glance hostname or IP address (string value)
host=<OVM_IP>
Now we will disable nova-compute on the OpenStack controller (if not already):
systemctl disable openstack-nova-compute.service
systemctl stop openstack-nova-compute.service
service openstack-nova-compute stop
Next we will edit cinder.conf, which is located at /etc/cinder/cinder.conf, and update the following items:
# Default glance host name or IP (string value)
glance_host=<OVM_IP>
# Default glance port (integer value)
glance_port=9292
# A list of the glance API servers available to cinder
# ([hostname|ip]:port) (list value)
glance_api_servers=$glance_host:$glance_port
We will also comment out lvm enabled backends as those will not be leveraged:
# Comment out the following lines in cinder.conf
#enabled_backends=lvm
#[lvm]
#iscsi_helper=lioadm
#volume_group=cinder-volumes
#iscsi_ip_address=
#volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
#volumes_dir=/var/lib/cinder/volumes
#iscsi_protocol=iscsi
#volume_backend_name=lvm
Now we will disable cinder volume on the OpenStack controller (if not already):
systemctl disable openstack-cinder-volume.service
systemctl stop openstack-cinder-volume.service
service openstack-cinder-volume stop
Now we will disable glance-image on the OpenStack controller (if not already):
systemctl disable openstack-glance-api.service
systemctl disable openstack-glance-registry.service
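The corresponding stop commands appear to have been lost at a page break in this copy; following the same pattern used for Nova and Cinder above, they would be:
systemctl stop openstack-glance-api.service
systemctl stop openstack-glance-registry.service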
After the files have been edited we will restart the Nova and Cinder services to take the new
configuration settings. The services can be restarted with the following commands below or
by running the scripts which are available for download.
# Restart Nova services
service openstack-nova-api restart
service openstack-nova-consoleauth restart
service openstack-nova-scheduler restart
service openstack-nova-conductor restart
service openstack-nova-cert restart
service openstack-nova-novncproxy restart
# OR you can also use the script which can be downloaded as part of the helper tools:
~/openstack/commands/nova-restart
# Restart Cinder
service openstack-cinder-api restart
service openstack-cinder-scheduler restart
service openstack-cinder-backup restart
# OR you can also use the script which can be downloaded as part of the helper tools:
~/openstack/commands/cinder-restart
Pro tip
Check NTP if a service is seen as state "down" in the OpenStack Manager (Admin UI or CLI) even though the service is running in the OVM. Many services have a requirement for time to be in sync between the OpenStack Controller and Acropolis OVM.
Book of Acropolis
PART III
3.1 Architecture
Acropolis is a distributed multi-resource manager, orchestration platform and data plane.
It is broken down into three main components:
Distributed Storage Fabric (DSF)
This is at the core and birth of the Nutanix platform and expands upon the Nutanix
Distributed Filesystem (NDFS). NDFS has now evolved from a distributed system
pooling storage resources into a much larger and capable storage platform.
App Mobility Fabric (AMF)
Hypervisors abstracted the OS from hardware, and the AMF abstracts workloads
(VMs, Storage, Containers, etc.) from the hypervisor. This will provide the ability to
dynamically move the workloads between hypervisors, clouds, as well as provide the
ability for Nutanix nodes to change hypervisors.
Hypervisor
A multi-purpose hypervisor based upon the CentOS KVM hypervisor.
Building upon the distributed nature of everything Nutanix does, we're expanding this into the virtualization and resource management space. Acropolis is a back-end service that allows for workload and resource management, provisioning, and operations. Its goal is to abstract the facilitating resource (e.g., hypervisor, on-premise, cloud, etc.) from the workloads running, while providing a single platform to operate.
This gives workloads the ability to seamlessly move between hypervisors, cloud providers,
and platforms.
The figure highlights an image illustrating the conceptual nature of Acropolis at various layers:
3.1.2 Software-Defined
As mentioned above (likely numerous times), the Nutanix platform is a software-based
solution which ships as a bundled software + hardware appliance. The controller VM is
where the vast majority of the Nutanix software and logic sits and was designed from the
beginning to be an extensible and pluggable architecture. A key benefit to being software-
defined and not relying upon any hardware offloads or constructs is around extensibility. As
with any product life cycle, advancements and new features will always be introduced.
By not relying on any custom ASIC/FPGA or hardware capabilities, Nutanix can develop
and deploy these new features through a simple software update. This means that a new feature (e.g., deduplication) can be deployed by upgrading the current version of the Nutanix software. This also allows newer generation features to be deployed on legacy hardware models. For example, say you're running a workload on an older version of Nutanix software on a prior generation hardware platform (e.g., 2400). The running software version doesn't provide deduplication capabilities, which your workload could benefit greatly from. To get these features, you perform a rolling upgrade of the Nutanix software version while the workload is running, and you now have deduplication. It's really that easy.
Similar to features, the ability to create new adapters or interfaces into DSF is another
key capability. When the product first shipped, it solely supported iSCSI for I/O from the
hypervisor; this has now grown to include NFS and SMB. In the future, there is the ability to
create new adapters for various workloads and hypervisors (HDFS, etc.). And again, all of
this can be deployed via a software update. This is contrary to most legacy infrastructures,
where a hardware upgrade or software purchase is normally required to get the latest and
greatest features. With Nutanix, it's different. Since all features are deployed in software,
they can run on any hardware platform, any hypervisor, and be deployed through simple
software upgrades.
The following figure shows a logical representation of what this software-defined controller
framework looks like:
Cassandra
Key Role: Distributed metadata store
Description: Cassandra stores and manages all of the cluster metadata in a distributed
ring-like manner based upon a heavily modified Apache Cassandra. The Paxos
algorithm is utilized to enforce strict consistency. This service runs on every node in the
cluster. Cassandra is accessed via an interface called Medusa.
Zookeeper
Key Role: Cluster configuration manager
Description: Zookeeper stores all of the cluster configuration including hosts, IPs, state,
etc. and is based upon Apache Zookeeper. This service runs on three nodes in the
cluster, one of which is elected as a leader. The leader receives all requests and forwards
them to its peers. If the leader fails to respond, a new leader is automatically
elected. Zookeeper is accessed via an interface called Zeus.
Stargate
Key Role: Data I/O manager
Description: Stargate is responsible for all data management and I/O operations and is
the main interface from the hypervisor (via NFS, iSCSI, or SMB). This service runs on
every node in the cluster in order to serve localized I/O.
Curator
Key Role: Map reduce cluster management and cleanup
Description: Curator is responsible for managing and distributing tasks throughout the
cluster, including disk balancing, proactive scrubbing, and many more items. Curator
runs on every node and is controlled by an elected Curator Master who is responsible for
the task and job delegation. There are two scan types for Curator, a full scan which
occurs around every 6 hours and a partial scan which occurs every hour.
Prism
Key Role: UI and API
Description: Prism is the management gateway for components and administrators to
configure and monitor the Nutanix cluster. This includes Ncli, the HTML5 UI, and REST
API. Prism runs on every node in the cluster and uses an elected leader like all
components in the cluster.
Genesis
Key Role: Cluster component & service manager
Description: Genesis is a process which runs on each node and is responsible for any
services interactions (start/stop/etc.) as well as for the initial configuration. Genesis is
a process which runs independently of the cluster and does not require the cluster to be
configured/running. The only requirement for Genesis to be running is that Zookeeper
is up and running. The cluster_init and cluster_status pages are displayed by the Genesis
process.
Chronos
Key Role: Job and task scheduler
Description: Chronos is responsible for taking the jobs and tasks resulting from a
Curator scan and scheduling/throttling tasks among nodes. Chronos runs on every
node and is controlled by an elected Chronos Master that is responsible for the task and
job delegation and runs on the same node as the Curator Master.
Cerebro
Key Role: Replication/DR manager
Description: Cerebro is responsible for the replication and DR capabilities of DSF. This
includes the scheduling of snapshots, the replication to remote sites, and the
migration/failover. Cerebro runs on every node in the Nutanix cluster and all nodes
participate in replication to remote clusters/sites.
Pithos
Key Role: vDisk configuration manager
Description: Pithos is responsible for vDisk (DSF file) configuration data. Pithos runs on
every node and is built on top of Cassandra.
3.1.4 Acropolis Services
An Acropolis Slave runs on every CVM with an elected Acropolis Master which is responsible
for task scheduling, execution, IPAM, etc. Similar to other components which have a Master,
if the Acropolis Master fails, a new one will be elected.
The role breakdown for each can be seen below:
Acropolis Master
Task scheduling & execution
Stat collection / publishing
Network Controller (for hypervisor)
VNC proxy (for hypervisor)
HA (for hypervisor)
Acropolis Slave
Stat collection / publishing
VNC proxy (for hypervisor)
SSD Devices
SSD devices store a few key items which are explained in greater detail above:
Nutanix Home (CVM core)
Cassandra (metadata storage)
OpLog (persistent write buffer)
Unified Cache (SSD cache portion)
Extent Store (persistent storage)
The following figure shows an example of the storage breakdown for a Nutanix node's SSD(s):
NOTE: The sizing for the OpLog is done dynamically as of release 4.0.1, which allows the extent store portion to grow dynamically. The values used assume a completely utilized OpLog. Graphics and proportions aren't drawn to scale. When evaluating the Remaining GiB capacities, do so from the top down. For example, the Remaining GiB to be used for the OpLog calculation would be after Nutanix Home and Cassandra have been subtracted from the formatted SSD capacity.
Nutanix Home is mirrored across the first two SSDs to ensure availability. Cassandra is on the
first SSD by default, and if that SSD fails the CVM will be restarted and Cassandra storage
will then be on the 2nd.
Most models ship with 1 or 2 SSDs, however the same construct applies for models shipping
with more SSD devices. For example, if we apply this to an example 3060 or 6060 node
which has 2 x 400GB SSDs, this would give us 100GiB of OpLog, 40GiB of Unified Cache, and
~440GiB of Extent Store SSD capacity per node.
HDD Devices
Since HDD devices are primarily used for bulk storage, their breakdown is much simpler:
Curator Reservation (Curator storage)
Extent Store (persistent storage)
Storage Pool
Key Role: Group of physical devices
Description: A storage pool is a group of physical storage devices including PCIe SSD,
SSD, and HDD devices for the cluster. The storage pool can span multiple Nutanix nodes
and is expanded as the cluster scales. In most configurations, only a single storage pool
is leveraged.
Container
Key Role: Group of VMs/files
Description: A container is a logical segmentation of the Storage Pool and contains
a group of VM or files (vDisks). Some configuration options (e.g., RF) are configured
at the container level, however are applied at the individual VM/file level. Containers
typically have a 1 to 1 mapping with a datastore (in the case of NFS/SMB).
vDisk
Key Role: vDisk
Description: A vDisk is any file over 512KB on DSF including .vmdks and VM hard disks.
vDisks are composed of extents which are grouped and stored on disk as an extent group.
The following figure shows how these map between DSF and the hypervisor:
Extent
Key Role: Logically contiguous data
Description: An extent is a 1MB piece of logically contiguous data which consists of n
number of contiguous blocks (varies depending on guest OS block size). Extents are
written/read/modified on a sub-extent basis (aka slice) for granularity and efficiency. An
extent's slice may be trimmed when moving into the cache depending on the amount of
data being read/cached.
Extent Group
Key Role: Physically contiguous stored data
Description: An extent group is a 1MB or 4MB piece of physically contiguous stored
data. This data is stored as a file on the storage device owned by the CVM. Extents are
dynamically distributed among extent groups to provide data striping across nodes/
disks to improve performance. NOTE: as of 4.0, extent groups can now be either 1MB or
4MB depending on dedupe.
The following figure shows how these structs relate between the various file systems:
OpLog
Key Role: Persistent write buffer
Description: The OpLog is similar to a filesystem journal and is built as a staging area
to handle bursts of random writes, coalesce them, and then sequentially drain the data
to the extent store. Upon a write, the OpLog is synchronously replicated to another
n number of CVMs OpLog before the write is acknowledged for data availability
purposes. All CVM OpLogs partake in the replication and are dynamically chosen based
upon load. The OpLog is stored on the SSD tier on the CVM to provide extremely fast
write I/O performance, especially for random I/O workloads. For sequential workloads,
the OpLog is bypassed and the writes go directly to the extent store. If data is currently
sitting in the OpLog and has not been drained, all read requests will be directly fulfilled
from the OpLog until they have been drained, where they would then be served by the
extent store/unified cache. For containers where fingerprinting (aka Dedupe) has been
enabled, all write I/Os will be fingerprinted using a hashing scheme allowing them to be
deduplicated based upon fingerprint in the unified cache.
The per-vDisk OpLog limit is currently 6GB (as of 4.6), up from 2GB in prior versions.
Extent Store
Key Role: Persistent data storage
Description: The Extent Store is the persistent bulk storage of DSF and spans SSD and
HDD and is extensible to facilitate additional devices/tiers. Data entering the extent
store is either being A) drained from the OpLog or B) is sequential in nature and has
bypassed the OpLog directly. Nutanix ILM will determine tier placement dynamically
based upon I/O patterns and will move data between tiers.
All other IOs, including those which can be large (e.g. >64K) will still be handled by the OpLog.
Unified Cache
Key Role: Dynamic read cache
Description: The Unified Cache is a deduplicated read cache which spans both the
CVMs memory and SSD. Upon a read request of data not in the cache (or based upon
a particular fingerprint), the data will be placed into the single-touch pool of the Unified
Cache which completely sits in memory, where it will use LRU until it is evicted from the
cache. Any subsequent read request will move (no data is actually moved, just cache
metadata) the data into the memory portion of the multi-touch pool, which consists
of both memory and SSD. From here there are two LRU cycles, one for the in-memory
piece upon which eviction will move the data to the SSD section of the multi-touch pool
where a new LRU counter is assigned. Any read request for data in the multi-touch pool
will cause the data to go to the peak of the multi-touch pool where it will be given a new
LRU counter.
Extent Cache
Key Role: In-memory read cache
Description: The Extent Cache is an in-memory read cache that is completely in
the CVM's memory. This will store non-fingerprinted extents for containers where
fingerprinting and deduplication are disabled. As of version 3.5, this is separate from the
Content Cache, however these are merged in 4.5 with the unified cache.
Performance at scale is another important construct for DSF metadata. Contrary to traditional dual-controller or master models, each Nutanix node is responsible for a subset of the overall platform's metadata. This eliminates the traditional bottlenecks by allowing metadata to be served and manipulated by all nodes in the cluster. A consistent hashing scheme is utilized to minimize the redistribution of keys during cluster size modifications (also known as add/remove node). When the cluster scales (e.g., from 4 to 8 nodes), the nodes are inserted throughout the ring between nodes for block awareness and reliability.
The following figure shows an example of the metadata ring and how it scales:
Data is also consistently monitored to ensure integrity even when active I/O isn't occurring. Stargate's scrubber operation will consistently scan through extent groups and perform checksum validation when disks aren't heavily utilized. This protects against things like bit rot or corrupted sectors.
The following figure shows an example of what this logically looks like:
Data
With DSF, data replicas will be written to other blocks in the cluster to ensure that in the
case of a block failure or planned downtime, the data remains available. This is true for both
RF2 and RF3 scenarios, as well as in the case of a block failure. An easy comparison would
be node awareness, where a replica would need to be replicated to another node which
will provide protection in the case of a node failure. Block awareness further enhances this
by providing data availability assurances in the case of block outages.
The following figure shows how the replica placement would work in a 3-block deployment:
As of Acropolis base software version 4.5 and later, block awareness is best effort and doesn't have strict requirements for enabling. This was done to ensure clusters with skewed storage resources (e.g. storage heavy nodes) don't disable the feature. With that stated, it is however still a best practice to have uniform blocks to minimize any storage skew.
Prior to 4.5 the following conditions must be met:
If SSD or HDD tier variance between blocks is > max variance: NODE awareness
If SSD and HDD tier variance between blocks is < max variance: BLOCK + NODE
awareness
Metadata
As mentioned in the Scalable Metadata section above, Nutanix leverages a heavily modified
Cassandra platform to store metadata and other essential information. Cassandra leverages
a ring-like structure and replicates to n number of peers within the ring to ensure data
consistency and availability.
The following figure shows an example of the Cassandra ring for a 12-node cluster:
Cassandra peer replication iterates through nodes in a clockwise manner throughout the
ring. With block awareness, the peers are distributed among the blocks to ensure no two
peers are on the same block.
The following figure shows an example node layout translating the ring above into the block
based layout:
Configuration Data
Nutanix leverages Zookeeper to store essential configuration data for the cluster. This role is
also distributed in a block-aware manner to ensure availability in the case of a block failure.
The following figure shows an example layout showing 3 Zookeeper nodes distributed in a
block-aware manner:
In the event of a block outage, the Zookeeper role would be transferred to another node in the cluster as shown below:
NOTE: Prior to 4.5, this migration was not automatic and had to be done manually.
VM impact:
HA event: No
Failed I/Os: No
Latency: No impact
In the event of a disk failure, a Curator scan (MapReduce Framework) will occur immediately.
It will scan the metadata (Cassandra) to find the data previously hosted on the failed disk
and the nodes / disks hosting the replicas.
Once it has found that data that needs to be re-replicated, it will distribute the replication
tasks to the nodes throughout the cluster.
An important thing to highlight here is given how Nutanix distributes data and replicas
across all nodes / CVMs / disks; all nodes / CVMs / disks will participate in the re-replication.
This substantially reduces the time required for re-protection, as the power of the full cluster
can be utilized; the larger the cluster, the faster the re-protection.
CVM Failure
A CVM failure can be characterized as a CVM power action causing the CVM to be
temporarily unavailable. The system is designed to transparently handle these gracefully.
In the event of a failure, I/Os will be re-directed to other CVMs within the cluster. The
mechanism for this will vary by hypervisor.
The rolling upgrade process actually leverages this capability as it will upgrade one CVM at a
time, iterating through the cluster.
VM impact:
HA event: No
Failed I/Os: No
Latency: Potentially higher given I/Os over the network
In the event of a CVM failure the I/O which was previously being served from the down
CVM, will be forwarded to other CVMs throughout the cluster. ESXi and Hyper-V handle this
via a process called CVM Autopathing, which leverages HA.py (like happy), where it will
modify the routes to forward traffic going to the internal address (192.168.5.2) to the external
IP of other CVMs throughout the cluster. This enables the datastore to remain intact, just the
CVM responsible for serving the I/Os is remote.
Once the local CVM comes back up and is stable, the route would be removed and the local
CVM would take over all new I/Os.
In the case of KVM, iSCSI multi-pathing is leveraged where the primary path is the local CVM
and the two other paths would be remote. In the event where the primary path fails, one of
the other paths will become active.
Similar to Autopathing with ESXi and Hyper-V, when the local CVM comes back online, it'll take over as the primary path.
Node Failure
VM Impact:
HA event: Yes
Failed I/Os: No
Latency: No impact
In the event of a node failure, a VM HA event will occur restarting the VMs on other nodes
throughout the virtualization cluster. Once restarted, the VMs will continue to perform I/Os
as usual which will be handled by their local CVMs.
Similar to the case of a disk failure above, a Curator scan will find the data previously hosted
on the node and its respective replicas.
Similar to the disk failure scenario above, the same process will take place to re-protect the
data, just for the full node (all associated disks).
In the event where the node remains down for a prolonged period of time, the down CVM
will be removed from the metadata ring. It will be joined back into the ring after it has been
up and stable for a duration of time.
Pro tip
Data resiliency state will be shown in Prism on the dashboard page.
You can also check data resiliency state via the cli:
# Node status
ncli cluster get-domain-fault-tolerance-status type=node
# Block status
ncli cluster get-domain-fault-tolerance-status type=rackable_unit
These should always be up to date, however to refresh the data you can kick off a
Curator partial scan.
Perf Tier Dedup
Use cases: P2V/V2V, Hyper-V (ODX), cross-container clones
Benefit: Greater cache efficiency for data which wasn't cloned or created using efficient Acropolis clones.
Capacity Tier Dedup
Use cases: Same as Perf Tier Dedup
Benefit: Benefits of the above with reduced overhead on disk.
Pro tip
You can override the default strip size (4/1 for RF2-like or 4/2 for RF3-like) via NCLI: ctr [create / edit] erasure-code=<N>/<K>, where N is the number of data blocks and K is the number of parity blocks.
The expected overhead can be calculated as <# parity blocks> / <# data blocks>. For
example, a 4/1 strip has a 25% overhead or 1.25X compared to the 2X of RF2. A 4/2 strip has
a 50% overhead or 1.5X compared to the 3X of RF3.
The following table characterizes the encoded strip sizes and example overheads:
Pro tip
It is always recommended to have a cluster size which has at least 1 more node than
the combined strip size (data + parity) to allow for rebuilding of the strips in the event
of a node failure. This eliminates any computation overhead on reads once the strips
have been rebuilt (automated via Curator). For example, a 4/1 strip should have at
least 6 nodes in the cluster. The previous table follows this best practice.
The encoding is done post-process and leverages the Curator MapReduce framework for
task distribution. Since this is a post-process framework, the traditional write I/O path is
unaffected.
A normal environment using RF would look like the following:
In this scenario, we have a mix of both RF2 and RF3 data whose primary copies are local and
replicas are distributed to other nodes throughout the cluster.
When a Curator full scan runs, it will find eligible extent groups which are available to become encoded. Eligible extent groups must be write-cold, meaning they haven't been written to for more than 1 hour. After the eligible candidates are found, the encoding tasks will be distributed and throttled via Chronos.
The following figure shows an example 4/1 and 3/2 strip:
Once the data has been successfully encoded (strips and parity calculation), the replica
extent groups are then removed.
The following figure shows the environment after EC has run with the storage savings:
Pro tip
Erasure Coding pairs perfectly with inline compression which will add to the storage
savings. I leverage inline compression + EC in my environments.
3.2.7.2 Compression
The Nutanix Capacity Optimization Engine (COE) is responsible for performing data transformations to increase data efficiency on disk. Currently compression is one of the key features of the COE to perform data optimization. DSF provides both inline and offline flavors of compression to best suit the customer's needs and type of data.
Inline compression will compress sequential streams of data or large I/O sizes in memory before they are written to disk, while offline compression will initially write the data as normal (in an uncompressed state) and then leverage the Curator framework to compress the data cluster wide. When inline compression is enabled but the I/Os are random in nature, the data will be written uncompressed in the OpLog, coalesced, and then compressed in memory before being written to the Extent Store. The Google Snappy compression library is leveraged, which provides good compression ratios with minimal computational overhead and extremely fast compression / decompression rates.
For a video explanation you can watch the following video: https://www.youtube.com/watch?v=ERDqOCzDcQY&feature=youtu.be
The following figure shows an example of how inline compression interacts with the DSF
write I/O path:
Pro tip
Almost always use inline compression (compression delay = 0) as it will only compress
larger / sequential writes and not impact random write performance.
Inline compression also pairs perfectly with erasure coding.
For offline compression, all new write I/O is written in an un-compressed state and follows
the normal DSF I/O path. After the compression delay (configurable) is met and the data
has become cold (down-migrated to the HDD tier via ILM), the data is eligible to become
compressed. Offline compression uses the Curator MapReduce framework and all nodes will
perform compression tasks. Compression tasks will be throttled by Chronos.
The following figure shows an example of how offline compression interacts with the DSF
write I/O path:
The following figure shows an example of how decompression interacts with the DSF I/O
path during read:
Pro tip
Use performance tier deduplication on your base images (you can manually fingerprint
them using vdisk_manipulator) to take advantage of the content cache.
Use capacity tier deduplication for P2V / V2V, when using Hyper-V since ODX does a
full data copy, or when doing cross-container clones (not usually recommended as a
single container is preferred).
In most other cases compression will yield the highest capacity savings and should
be used instead.
The following figure shows an example of how the Elastic Dedupe Engine scales and handles
local VM I/O requests:
The following figure shows an example of how the Elastic Dedupe Engine interacts with the
DSF I/O path:
Dedup + Compression
As of 4.5 both deduplication and compression can be enabled on the same container.
However, unless the data is dedupable (conditions explained earlier in section), stick
with compression.
Specific types of resources (e.g. SSD, HDD, etc.) are pooled together and form a cluster wide
storage tier. This means that any node within the cluster can leverage the full tier capacity,
regardless of whether it is local or not.
The following figure shows a high level example of what this pooled tiering looks like:
For example, if SSD tier utilization reaches 95%, 20% of the data in the SSD tier will be moved to the HDD tier (95% -> 75%). However, if the utilization was 80%, only 15% of the data would be moved to the HDD tier using the minimum tier free-up amount.
Disk balancing leverages the DSF Curator framework and is run as a scheduled process as
well as when a threshold has been breached (e.g., local node capacity utilization > n %). In
the case where the data is not balanced, Curator will determine which data needs to be
moved and will distribute the tasks to nodes in the cluster. In the case where the node types
are homogeneous (e.g., 3050), utilization should be fairly uniform. However, if there are
certain VMs running on a node which are writing much more data than others, there can
become a skew in the per node capacity utilization. In this case, disk balancing would run
and move the coldest data on that node to other nodes in the cluster. In the case where the
node types are heterogeneous (e.g., 3050 + 6020/50/70), or where a node may be used in a
storage only mode (not running any VMs), there will likely be a requirement to move data.
The following figure shows an example of the mixed cluster after disk balancing has been run in a balanced state:
The same methods are used for both snapshots and/or clones of a VM or vDisk(s). When
a VM or vDisk is cloned, the current block map is locked and the clones are created. These
updates are metadata only, so no I/O actually takes place. The same method applies for
clones of clones; essentially the previously cloned VM acts as the Base vDisk and upon
cloning, that block map is locked and two clones are created: one for the VM being cloned
and another for the new clone.
They both inherit the prior block map and any new writes/updates would take place on their
individual block maps.
occur locally right away. DSF will detect the I/Os are occurring from a different node and
will migrate the data locally in the background, allowing for all read I/Os to now be served
locally. The data will only be migrated on a read as to not flood the network.
Data locality occurs in two main flavors:
Cache Locality
vDisk data stored locally in the Unified Cache. vDisk extent(s) may be
remote to the node.
Extent Locality
vDisk extents local on the same node as the VM
The following figure shows an example of how data will follow the VM as it moves between
hypervisor nodes:
With Shadow Clones, DSF will monitor vDisk access trends similar to what it does for data
locality. However, in the case there are requests occurring from more than two remote CVMs
(as well as the local CVM), and all of the requests are read I/O, the vDisk will be marked as
immutable. Once the disk has been marked as immutable, the vDisk can then be cached
locally by each CVM making read requests to it (aka Shadow Clones of the base vDisk).
This will allow VMs on each node to read the Base VM's vDisk locally. In the case of VDI, this means the replica disk can be cached by each node and all read requests for the base will be served locally. NOTE: The data will only be migrated on a read so as to not flood the network and allow for efficient cache utilization. In the case where the Base VM is modified, the Shadow Clones will be dropped and the process will start over. Shadow clones are enabled by default (as of 4.0.2) and can be enabled/disabled using the following NCLI command: ncli cluster edit-params enable-shadow-clones=<true/false>.
The following figure shows an example of how Shadow Clones work and allow for
distributed caching:
Hypervisor Layer
Key Role: Metrics reported by the Hypervisor(s)
Description: Hypervisor level metrics are pulled directly from the hypervisor and
represent the most accurate metrics the hypervisor(s) are seeing. This data can be
viewed for one or more hypervisor node(s) or the aggregate cluster. This layer will
provide the most accurate data in terms of what performance the platform is seeing and
should be leveraged in most cases. In certain scenarios the hypervisor may combine or
split operations coming from VMs which can show the difference in metrics reported
by the VM and hypervisor. These numbers will also include cache hits served by the
Nutanix CVMs.
When to use: Most common cases as this will provide the most detailed and valuable
metrics.
Controller Layer
Key Role: Metrics reported by the Nutanix Controller(s)
Description: Controller-level metrics are pulled directly from the Nutanix Controller VMs (e.g., the Stargate 2009 page) and represent what the Nutanix front-end is seeing from NFS/SMB/iSCSI, plus any back-end operations (e.g., ILM, disk balancing, etc.). This data can be viewed for one or more Controller VM(s) or the aggregate cluster. The metrics seen by the Controller Layer should normally match those seen by the hypervisor layer; however, they will also include back-end operations (e.g., ILM, disk balancing). These numbers will also include cache hits served by memory. In certain cases, metrics like IOPS might not match, as the NFS / SMB / iSCSI client might split a large I/O into multiple smaller I/Os. However, metrics like bandwidth should match.
When to use: Similar to the hypervisor layer; can be used to show how much back-end operation is taking place.
Disk Layer
Key Role: Metrics reported by the Disk Device(s)
Description: Disk-level metrics are pulled directly from the physical disk devices (via the CVM) and represent what the back-end is seeing. This includes data hitting the OpLog or Extent Store where an I/O is performed on the disk. This data can be viewed for one or more disk(s), the disk(s) for a particular node, or the aggregate disks in the cluster. In common cases, it is expected that the disk ops should match the number of incoming writes as well as reads not served from the memory portion of the cache. Any reads being served by the memory portion of the cache will not be counted here, as the op is not hitting the disk device.
When to use: When looking to see how many ops are served from cache or are hitting the disks.
3.3 Services
3.3.1 Nutanix Guest Tools (NGT)
Nutanix Guest Tools (NGT) is a software-based in-guest agent framework which enables advanced VM management functionality through the Nutanix Platform.
The solution is composed of the NGT installer which is installed on the VMs and the Guest
Tools Framework which is used for coordination between the agent and Nutanix platform.
The NGT installer contains the following components:
Guest Agent Service
Self-service Restore (SSR) aka File-level Restore (FLR) CLI
VM Mobility Drivers (VirtIO drivers for AHV)
VSS Agent and Hardware Provider for Windows VMs
App-consistent snapshot support for Linux VMs (via scripts to quiesce)
NGT Proxy
Runs on every CVM and will forward requests to the NGT Master to perform the desired activity. The CVM currently acting as the Prism Leader (hosting the VIP) will be the active CVM handling communication from the Guest Agent. It listens externally on port 2074.
nutanix_guest_tools_cli get_master_location
The Guest Tools Service acts as a Certificate Authority (CA) and is responsible for generating certificate pairs for each NGT-enabled UVM. This certificate is embedded into the ISO which is configured for the UVM and used as part of the NGT deployment process. These certificates are installed inside the UVM as part of the installation process.
The VM must have a CD-ROM drive, as the generated installer containing the software and unique certificate will be mounted there, as shown:
As part of the installation process, Python, PyWin and the Nutanix Mobility (cross-hypervisor compatibility) drivers will also be installed.
After the installation has been completed, a reboot will be required.
After successful installation and reboot, you will see the following items visible in Programs
and Features:
3.3.2 OS Customization
Nutanix provides native OS customization capabilities leveraging CloudInit and Sysprep. CloudInit is a package which handles the bootstrapping of Linux cloud servers. This allows for the early initialization and customization of a Linux instance. Sysprep is the equivalent OS customization utility for Windows.
Some typical uses include:
Setting Hostname
Installing packages
Adding users / key management
Custom scripts
Supported Configurations
The solution is applicable to the guests and hypervisors below (list may be incomplete; refer to documentation for the fully supported list):
Hypervisors:
AHV
Operating Systems:
Linux - most modern distributions
Windows - most modern distributions
Pre-Requisites
In order for CloudInit to be used, the following is necessary:
The CloudInit package must be installed in the Linux image.
Sysprep is available by default in Windows installations.
Package Installation
CloudInit can be installed (if not already) using the following commands:
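The exact package name and package manager vary by distribution; a minimal sketch for common distros (assuming the guest has repository access):
# RHEL / CentOS
yum -y install cloud-init
# Debian / Ubuntu
apt-get -y install cloud-init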
Image Customization
To leverage a custom script for OS customization, a check box and input fields are available in Prism and via the REST API. This option is specified during the VM creation or cloning process:
ADSF Path: Use a file which has been previously uploaded to ADSF.
Upload a file: Upload a file which will be used.
Type or paste script: CloudInit script or Unattend.xml text.
Nutanix passes the user data script to the CloudInit or Sysprep process during first boot by creating a CD-ROM which contains the script. Once the process is complete, the CD-ROM is removed.
Input formatting
The platform supports a variety of user data input formats; I've identified a few of the key ones below:
User-Data Script (CloudInit - Linux)
A user-data script is a simple shell script that will be executed very late in the boot process (e.g., rc.local-like).
The script will begin like any bash script, with: #!.
Below we show an example user-data script:
#!/bin/bash
touch /tmp/fooTest
mkdir /tmp/barFolder
The include file contains a list of URLs (one per line). Each of the URLs will be read and processed like any other script.
The scripts will begin with: #include.
Below we show an example include script:
#include
http://s3.amazonaws.com/path/to/script/1
http://s3.amazonaws.com/path/to/script/2
The cloud-config input type is the most common and is specific to CloudInit.
The scripts will begin with: #cloud-config.
Below we show an example cloud-config data script:
#cloud-config
# Set hostname
hostname: foobar
# Add user(s)
users:
  - name: nutanix
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa <PUB KEY>
    lock-passwd: false
    passwd: <PASSWORD>
        </ModifyPartition>
      </ModifyPartitions>
    </Disk>
  </DiskConfiguration>
</component>
<component name="Microsoft-Windows-International-Core-WinPE" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" processorArchitecture="x86">
  <SetupUILanguage>
    <WillShowUI>OnError</WillShowUI>
    <UILanguage>en-US</UILanguage>
  </SetupUILanguage>
  <UILanguage>en-US</UILanguage>
</component>
</settings>
</unattend>
Pre-Requisites
Before we get to configuration, we need to configure the Data Services IP, which will act as our central discovery / login portal.
We'll set this on the Cluster Details page (Gear Icon -> Cluster Details):
Target Creation
To use Block Services, the first thing we'll do is create a Volume Group, which is the iSCSI target.
From the Storage page, click on + Volume Group in the right-hand corner:
Next, we'll click on + Add new disk to add any disk(s) to the target (visible as LUNs):
A menu will appear allowing us to select the target container and the size of the disk:
# Add disk(s) to VG
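The same flow can also be scripted; a hedged ACLI sketch (the vg.* command names and parameters below are assumptions based on the ACLI volume group namespace, so verify the exact syntax on your AOS version):
# Create the Volume Group (iSCSI target)
vg.create MyVG
# Add a disk to the VG from a given container
vg.disk_create MyVG container=MyContainer create_size=500G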
Given this mechanism, client-side multipathing (MPIO) is no longer necessary for path HA. When connecting to a target, there's now no need to check Enable multi-path (which enables MPIO):
Multi-Pathing
The iSCSI protocol spec mandates a single iSCSI session (TCP connection) per target between initiator and target. This means there is a 1:1 relationship between a Stargate and a target.
As of 4.7, 32 (default) virtual targets will be automatically created per attached initiator and
assigned to each disk device added to the volume group (VG). This provides an iSCSI target
per disk device. Previously this would have been handled by creating multiple VGs with a
single disk each.
When looking at the VG details in ACLI/API you can see the 32 virtual targets created for
each attachment:
attachment_list {
  external_initiator_name: iqn.1991-05.com.microsoft:desktop-foo
  target_params {
    num_virtual_targets: 32
  }
}
Here we've created a sample VG with 3 disk devices added to it. When performing a discovery on the client, I can see an individual target for each disk device (with a suffix in the format of -tgt[int]):
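A hedged example of what that discovery might look like from a Linux initiator (iscsiadm is part of the standard open-iscsi package; the IQNs shown are purely illustrative):
# Discover targets via the Data Services IP
iscsiadm -m discovery -t sendtargets -p <DATA SERVICES IP>:3260
# Illustrative output: one virtual target per disk device, suffixed -tgt[int]
# <DATA SERVICES IP>:3260,1 iqn.2010-06.com.nutanix:myvg-...-tgt0
# <DATA SERVICES IP>:3260,1 iqn.2010-06.com.nutanix:myvg-...-tgt1
# <DATA SERVICES IP>:3260,1 iqn.2010-06.com.nutanix:myvg-...-tgt2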
Supported Protocols
As of 4.6, SMB (up to version 2.1) is the only supported protocol for client
communication with file services.
The File Services VMs run as agent VMs on the platform and are transparently deployed as
part of the configuration process.
High-Availability (HA)
Each FSVM leverages the Acropolis Volumes API for its data storage, which is accessed via in-guest iSCSI. This allows any FSVM to connect to any iSCSI target in the event of an FSVM outage.
The figure shows a high-level overview of the FSVM storage:
To provide for path availability, DM-MPIO is leveraged within the FSVM, which will have the active path set to the local CVM by default:
In the event of an FSVM failure (e.g. maintenance, power off, etc.), the VG and IP of the failed FSVM will be taken over by another FSVM to ensure client availability.
The figure shows the transfer of the failed FSVM's IP and VG:
Supported Configurations
The solution is applicable to the configurations below (list may be incomplete, refer to
documentation for a fully supported list):
Hypervisor(s):
AHV
Container System(s)*:
Docker 1.11
*As of 4.7, the solution only supports storage integration with Docker-based containers. However, any other container system can run as a VM on the Nutanix platform.
The following entities compose Docker (note: not all are required):
Docker Image: The basis and image for a container
Docker Registry: Holding space for Docker Images
Docker Hub: Online container marketplace (public Docker Registry)
Dockerfile: Text file describing how to construct the Docker image
Docker Container: Running instantiation of a Docker Image
Docker Engine: Creates, ships and runs Docker containers
Docker Swarm: Docker host clustering / scheduling platform
Docker Daemon: Handles requests from Docker Client and does heavy lifting of building,
running and distributing containers
Architecture
The Nutanix solution currently leverages Docker Engine running in VMs which are created
using Docker Machine. These machines can run in conjunction with normal VMs on the
platform.
Data persistence is achieved by using the Nutanix Volume Driver which will leverage
Acropolis Block Services to attach a volume to the host / container:
The next step is to SSH into the newly provisioned Docker Host(s) via docker-machine ssh:
Before we start the volume driver, we'll make sure we have the latest driver; to pull the latest version, run:
Now that we have the latest version, we'll start the Nutanix Docker Volume Driver:
~/start-volume-plugin.sh
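Once the plugin is running, a hedged usage sketch (the driver name and volume name below are assumptions for illustration):
# Create a volume backed by the Nutanix volume driver
docker volume create --driver nutanix --name mydatavol
# Run a container using that volume for persistent data
docker run -d -v mydatavol:/data <IMAGE>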
NOTE: Though Nutanix provides native options for backup and DR, traditional solutions (e.g. Commvault, Rubrik, etc.) can also be used, leveraging some of the native features the platform provides (VSS, snapshots, etc.).
Pro tip
Create multiple PDs for various service tiers driven by a desired RPO/RTO. For file distribution (e.g. golden images, ISOs, etc.) you can create a PD with the files to be replicated.
Pro tip
Group dependent application or service VMs in a consistency group to ensure they
are recovered in a consistent state (e.g. App and DB)
Snapshot Schedule
Key Role: Snapshot and replication schedule
Description: Snapshot and replication schedule for VMs in a particular PD and CG
Pro tip
The snapshot schedule should be equal to your desired RPO
Retention Policy
Key Role: Number of local and remote snapshots to keep
Description: The retention policy defines the number of local and remote snapshots to
retain. NOTE: A remote site must be configured for a remote retention/replication policy to
be configured.
Pro tip
The retention policy should equal the number of restore points required per VM/file
Remote Site
Key Role: A remote Nutanix cluster
Description: A remote Nutanix cluster which can be leveraged as a target for backup or
DR purposes.
Pro tip
Ensure the target site has ample capacity (compute/storage) to handle a full site failure.
In certain cases replication/DR between racks within a single site can also make sense.
The following figure shows a logical representation of the relationship between a PD, CG,
and VM/Files for a single site:
Click Next, then click Next Schedule to create a snapshot and replication schedule
Enter the desired snapshot frequency, retention and any remote sites for replication
Multiple Schedules
It is possible to create multiple snapshot / replication schedules. For example, you may want a local backup schedule occurring hourly and another schedule which replicates to a remote site daily.
It's important to mention that a full container can be protected for simplicity. However, the platform provides the ability to protect down to the granularity of a single VM and/or file level.
From the Data Protection Page, you can see the protection domains (PD) previously created
in the Protecting Entities section.
Supported Configurations
The solution is applicable to both Windows and Linux guests, including versions below (list
may be incomplete, refer to documentation for a fully supported list):
Hypervisors:
ESX
AHV
Windows
2008R2, 2012, 2012R2
Linux
CentOS 6.5/7.0
RHEL 6.5/7.0
OEL 6.5/7.0
Ubuntu 14.04+
SLES11SP3+
Pre-Requisites
In order for Nutanix VSS snapshots to be used the following are necessary:
Nutanix Platform
Cluster Virtual IP (VIP) must be configured
Guest OS / UVM
NGT must be installed
CVM VIP must be reachable on port 2074 (a quick reachability check is sketched after this list)
Disaster Recovery Configuration
UVM must be in PD with Use application consistent snapshots enabled
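To verify the VIP is reachable on port 2074 from the guest, a quick hedged check (assumes a Linux UVM with netcat installed; the VIP shown is a placeholder):
nc -zv <CLUSTER VIP> 2074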
Architecture
As of 4.6 this is achieved using the native Nutanix Hardware VSS provider which is installed
as part of the Nutanix Guest Tools package. You can read more on the guest tools in the
Nutanix Guest Tools section.
The following image shows a high-level view of the VSS architecture:
You can perform an application consistent snapshot by following the normal data protection
workflow and selecting Use application consistent snapshots when protecting the VM.
The Nutanix VSS solution is integrated with the Windows VSS framework. The following
shows a high-level view of the architecture:
The pre-freeze and post-thaw scripts are located in the following directories:
Pre-freeze: /sbin/pre_freeze
Post-thaw: /sbin/post-thaw
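The contents of these scripts are application-specific and user-provided; a minimal, hypothetical /sbin/pre_freeze sketch (assumes a service named myapp and is only meant to illustrate the quiesce idea):
#!/bin/bash
# Flush filesystem buffers and pause the application before the snapshot is taken
sync
service myapp stop
A matching /sbin/post-thaw script would typically just start the service again.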
Replication Topologies
Traditionally, there are a few key replication topologies: site to site, hub and spoke, and full and/or partial mesh. Contrary to traditional solutions which only allow for site-to-site or hub-and-spoke, Nutanix provides a full-mesh or flexible many-to-many model.
Essentially, this allows the admin to determine a replication capability that meets their company's needs.
Replication Lifecycle
Nutanix replication leverages the Cerebro service mentioned above. The Cerebro service
is broken into a Cerebro Master, which is a dynamically elected CVM, and Cerebro Slaves,
which run on every CVM. In the event where the CVM acting as the Cerebro Master fails, a
new Master is elected.
The Cerebro Master is responsible for managing task delegation to the local Cerebro Slaves
as well as coordinating with remote Cerebro Master(s) when remote replication is occurring.
During a replication, the Cerebro Master will figure out which data needs to be replicated,
and delegate the replication tasks to the Cerebro Slaves which will then tell Stargate which
data to replicate and to where.
Replicated data is protected at multiple layers throughout the process. Extent reads on the
source are checksummed to ensure consistency for source data (similar to how any DSF
read occurs) and the new extent(s) are checksummed at the target (similar to any DSF
write). TCP provides consistency on the network layer.
Pro tip
When using a remote site configured with a proxy, always utilize the cluster IP as
that will always be hosted by the Prism Leader and available, even if CVM(s) go
down.
The following figure shows a representation of the replication architecture using a proxy:
Note
This should only be used for non-production scenarios and the cluster IPs should be
used to ensure availability.
The following figure shows a representation of the replication architecture using an SSH tunnel:
The following figure shows an example three-site deployment where each site contains one or more protection domains (PD):
Note
Fingerprinting must be enabled on the source and target container / vstore for
replication deduplication to occur.
Since a cloud-based remote site is similar to any other Nutanix remote site, a cluster can replicate to multiple regions if higher availability is required (e.g., data availability in the case of a full region outage):
This expands the VM HA domain from a single site to between two sites, providing a near-zero RTO and an RPO of 0.
In this deployment, each site has its own Nutanix cluster; however, the containers are stretched by synchronously replicating to the remote site before acknowledging writes.
The following figure shows a high-level design of what this architecture looks like:
3.6 Administration
3.6.1 Important Pages
These are advanced Nutanix pages, beyond the standard user interface, that allow you to monitor detailed stats and metrics. The URLs are formatted in the following way: http://<Nutanix CVM IP/DNS>:<Port/path (mentioned below)> Example: http://MyCVM-A:2009 NOTE: if you're on a different subnet, iptables will need to be disabled on the CVM to access the pages.
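A hedged example of temporarily stopping iptables on a CVM so the pages can be reached from another subnet (these are standard CentOS service commands, not Nutanix-specific ones; re-enable when finished):
# Stop iptables on the CVM (temporary)
sudo service iptables stop
# Re-enable when done
sudo service iptables start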
2009 Page
This is a Stargate page used to monitor the back-end storage system and should only be used by advanced users. I'll have a post that explains the 2009 pages and things to look for.
2009/latency Page
This is a Stargate page used to monitor back-end latency.
2009/vdisk_stats Page
This is a Stargate page used to show various vDisk stats including histograms of I/O sizes,
latency, write hits (e.g., OpLog, eStore), read hits (cache, SSD, HDD, etc.) and more.
2009/h/traces Page
This is the Stargate page used to monitor activity traces for operations.
2009/h/vars Page
This is the Stargate page used to monitor various counters.
2010 Page
This is the Curator page which is used for monitoring Curator runs.
2010/master/control Page
This is the Curator control page which is used to manually start Curator jobs.
2011 Page
This is the Chronos page which monitors jobs and tasks scheduled by Curator.
2020 Page
This is the Cerebro page which monitors the protection domains, replication status and DR.
2020/h/traces Page
This is the Cerebro page used to monitor activity traces for PD operations and replication.
2030 Page
This is the main Acropolis page and shows details about the environment, hosts, any currently running tasks, and networking details.
2030/sched Page
This is an Acropolis page used to show information about VM and resource scheduling used
for placement decisions. This page shows the available host resources and VMs running on
each host.
2030/tasks Page
This is an Acropolis page used to show information about Acropolis tasks and their state.
You can click on the task UUID to get detailed JSON about the task.
2030/vms Page
This is an Acropolis page used to show information about Acropolis VMs and details about
them. You can click on the VM Name to connect to the console.
Untar package
tar xzvf ~/tmp/nutanix*
Perform upgrade
~/tmp/install/bin/cluster -i ~/tmp/install upgrade
Check status
upgrade_status
Node(s) upgrade
Description: Perform upgrade of specified node(s) to the current cluster's version
From any CVM running the desired version run the following command:
cluster -u <NODE_IP(s)> upgrade_node
OR
Start Service
cluster start
Find cluster id
Description: Find the cluster ID for the current cluster
zeus_config_printer | grep cluster_id
Open port
Description: Enable port through IPtables
sudo vi /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport <PORT> -j ACCEPT
sudo service iptables restart
# Partial Scan
allssh wget -O - http://localhost:2010/master/api/client/StartCuratorTasks?task_type=3;
# Refresh Usage
allssh wget -O - http://localhost:2010/master/api/client/RefreshStats;
Compact ring
Description: Compact the metadata ring
allssh nodetool -h localhost compact
Create links
source ~/ncc/ncc_completion.bash
echo source ~/ncc/ncc_completion.bash >> ~/.bashrc
progress_monitor_cli -fetchall
At the top of the page is the overview details which show various details about the cluster:
In this section there are two key areas I look out for, the first being the I/O queues, which show the number of admitted / outstanding operations.
The figure shows the queues portion of the overview section:
Pro tip
In ideal cases, the hit rates should be above 80-90% if the workload is read heavy, for the best possible read performance.
The next section is the Cluster State which shows details on the various Stargates in the
cluster and their disk usages.
The figure shows the Stargates and disk utilization (available/total):
The next section is the NFS Slave section which will show various details and stats per
vDisk.
The figure shows the vDisks and various I/O details:
Pro tip
When looking at any potential performance issues I always look at the following:
1- Avg. latency
2- Avg. op size
3- Avg. outstanding
For more specific details, the vdisk_stats page holds a plethora of information.
This will bring you to the vdisk_stats page which will give you the detailed vDisk stats. NOTE: These values are real-time and can be updated by refreshing the page.
The first key area is the Ops and Randomness section which will show a breakdown of
whether the I/O patterns are random or sequential in nature.
The figure shows the Ops and Randomness section:
The next area shows a histogram of the frontend read and write I/O latency (aka the latency
the VM / OS sees).
The figure shows the Frontend Read Latency histogram:
The next key area is the I/O size distribution which shows a histogram of the read and write
I/O sizes.
The figure shows the Read Size Distribution histogram:
The next key area is the Working Set Size section which provides insight on working set
sizes for the last 2 minutes and 1 hour. This is broken down for both read and write I/O.
The figure shows the Working Set Sizes table:
Pro tip
If you're seeing high read latency, take a look at the read source for the vDisk and check where the I/Os are being served from. In most cases, high latency is caused by reads coming from HDD (Estore HDD).
The Write Destination section will show where the new write I/Os are being written to.
The figure shows the Write Destination table:
Pro tip
Random or smaller I/Os (<64K) will be written to the Oplog. Larger or sequential I/Os
will bypass the Oplog and be directly written to the Extent Store (Estore).
Another interesting data point is what data is being up-migrated from HDD to SSD via ILM.
The Extent Group Up-Migration table shows data that has been up-migrated in the last 300,
3,600 and 86,400 seconds.
The figure shows the Extent Group Up-Migration table:
Clicking on the Execution id will bring you to the job details page which displays various
job stats as well as generated tasks.
The table at the top of the page will show various details on the job including the type,
reason, tasks and duration.
The next section is the Background Task Stats table which displays various details on the
type of tasks, quantity generated and priority.
The figure shows the job details table:
The next section is the MapReduce Jobs table which shows the actual MapReduce jobs started by each Curator job. Partial scans will have a single MapReduce job, while full scans will have four MapReduce jobs.
Book of
AHV
PART IV
4.1 Architecture
4.1.1 Node Architecture
In AHV deployments, the Controller VM (CVM) runs as a VM and disks are presented using PCI passthrough. This allows the full PCI controller (and attached devices) to be passed through directly to the CVM and bypass the hypervisor. AHV is built upon the CentOS KVM foundation and extends that base functionality to include features like HA, live migration, etc.
AHV is validated as part of the Microsoft Server Virtualization Validation Program and is validated to run Microsoft OS and applications.
4.1.4 Networking
AHV leverages Open vSwitch (OVS) for all VM networking. VM networking is configured through Prism / ACLI, and each VM NIC is connected to a tap interface.
The following figure shows a conceptual diagram of the OVS architecture:
Example:
Once the local Stargate comes back up (and begins responding to the NOP OUT commands),
the iSCSI redirector will perform a TCP kill to kill all connections to remote Stargates. QEMU
will then attempt an iSCSI login again and will be redirected to the local Stargate.
If the Acropolis Master is running remotely, the same VXLAN tunnel will be leveraged to
handle the request over the network.
Traditional DHCP / IPAM solutions can also be leveraged in an unmanaged network scenario.
4.2.3 VM High Availability (HA)
AHV VM HA is a feature built to ensure VM availability in the event of a host or block outage.
In the event of a host failure the VMs previously running on that host will be restarted
on other healthy nodes throughout the cluster. The Acropolis Master is responsible for
restarting the VM(s) on the healthy host(s).
The Acropolis Master tracks host health by monitoring its connections to libvirt on all cluster hosts:
Pro tip
Use reserve hosts when:
You have homogeneous clusters (all hosts DO have the same amount of RAM)
Consolidation ratio is higher priority than performance
Use reserve segments when:
You have heterogeneous clusters (all hosts DO NOT have the same amount of RAM)
Performance is higher priority than consolidation ratio
Pro tip
You can override or manually set the number of reserved failover hosts with the
following ACLI command:
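A hedged sketch of that command (the parameter name is an assumption; check acli ha.update help on your AOS version for the exact syntax):
acli ha.update num_reserved_hosts=<NUMBER OF HOSTS>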
In the event of a host failure VM(s) will be restarted on the reserved host(s):
If the failed host comes back online, the VM(s) will be live migrated back to the original host to minimize data movement and preserve data locality:
Pro tip
Keep your hosts balanced when using segment-based reservation. This will give the highest utilization and ensure not too many segments are reserved.
In the event of a host failure VM(s) will be restarted throughout the cluster on the remaining
healthy hosts:
expect to have 0.25 extra overhead for the common case in future versions. Today,
the fragmentation overhead varies between 0.5 and 1 giving a total overhead of 1.5-2
per configured host failure.
When using a segment-based reservation, there are a few key constructs that come into play:
Segment size = Largest running VM's memory footprint (GB)
Most loaded host = Host running the VMs with the most memory (GB)
Fragmentation overhead = 0.5 - 1
Based upon these inputs you can calculate the expected number of reserved segments:
Reserved segments = (Most loaded host / Segment size) x (1 + Fragmentation
overhead)
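For example (hypothetical numbers): if the most loaded host is running 100 GB of VM memory and the largest running VM has a 16 GB footprint, then with a fragmentation overhead of 0.5 the calculation is (100 / 16) x (1 + 0.5) = 6.25 x 1.5 ≈ 9.4, so roughly 10 segments of 16 GB each would be reserved per configured host failure.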
4.3 Administration
More coming soon!
Example:
ovs-appctl bond/show bond0
Go to the Networking page:
Book of
vSphere
PART V
5.1 Architecture
5.1.1 Node Architecture
In ESXi deployments, the Controller VM (CVM) runs as a VM and disks are presented using
VMDirectPath I/O. This allows the full PCI controller (and attached devices) to be passed
through directly to the CVM and bypass the hypervisor.
5.1.3 Networking
Each ESXi host has a local vSwitch which is used for intra-host communication between the
Nutanix CVM and host. For external communication and VMs a standard vSwitch (default) or
dvSwitch is leveraged.
The local vSwitch (vSwitchNutanix) is for local communication between the Nutanix CVM and ESXi host. The host has a vmkernel interface on this vSwitch (vmk1 - 192.168.5.1) and the CVM has an interface bound to a port group on this internal switch (svm-iscsi-pg - 192.168.5.2). This is the primary storage communication path.
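A hedged way to confirm this layout from the ESXi shell (standard esxcli commands; output will vary by environment):
# List the standard vSwitches (vSwitchNutanix should appear alongside the external vSwitch)
esxcli network vswitch standard list
# Show the vmkernel interfaces and their IPs (vmk1 should be 192.168.5.1)
esxcli network ip interface ipv4 get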
The external vSwitch can be a standard vSwitch or a dvSwitch. This will host the external
interfaces for the ESXi host and CVM as well as the port groups leveraged by VMs on the
host. The external vmkernel interface is leveraged for host management, vMotion, etc. The
external CVM interface is used for communication to other Nutanix CVMs. As many port
groups can be created as required assuming the VLANs are enabled on the trunk.
The following figure shows a conceptual diagram of the vSwitch architecture:
For both the full and fast file clones, a DSF fast clone is done, meaning a writable snapshot (using redirect-on-write) is created for each clone. Each of these clones has its own block map, meaning that chain depth isn't anything to worry about. The following will determine whether or not VAAI will be used for specific scenarios:
Clone VM with Snapshot > VAAI will NOT be used
Clone VM without Snapshot which is Powered Off > VAAI WILL be used
Clone VM to a different Datastore/Container > VAAI will NOT be used
Clone VM which is Powered On > VAAI will NOT be used
5.3 Administration
5.3.1 Important Pages
More coming soon!
# Example
for i in `hostips`;do echo $i && ssh root@$i esxcli software vib install
-d /vmfs/volumes/NTNX-upgrade/update-from-esxi5.1-5.1_update01.zip;done
Performing a rolling reboot of ESXi hosts: use PowerCLI for automated host reboots
Install VIB
Description: Install a vib without checking the signature
esxcli software vib install --viburl=/<VIB directory>/<VIB name> --no-sig-check
# OR
esxcli software vib install --depot=/<VIB directory>/<VIB name> --no-sig-check
Book of
Hyper-V
PART VI
6.1 Architecture
6.1.1 Node Architecture
In Hyper-V deployments, the Controller VM (CVM) runs as a VM and disks are presented
using disk passthrough.
6.1.3 Networking
Each Hyper-V host has an internal-only virtual switch which is used for intra-host communication between the Nutanix CVM and host. For external communication and VMs, an external virtual switch (default) or logical switch is leveraged.
The internal switch (InternalSwitch) is for local communication between the Nutanix CVM
and Hyper-V host. The host has a virtual ethernet interface (vEth) on this internal switch
(192.168.5.1) and the CVM has a vEth on this internal switch (192.168.5.2). This is the primary
storage communication path.
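A hedged way to confirm this layout from the Hyper-V host (standard Hyper-V PowerShell cmdlets; output will vary by environment):
# List the virtual switches (InternalSwitch plus the external switch should appear)
Get-VMSwitch | Select-Object Name, SwitchType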
The external vSwitch can be a standard virtual switch or a logical switch. This will host
the external interfaces for the Hyper-V host and CVM as well as the logical and VM
networks leveraged by VMs on the host. The external vEth interface is leveraged for host
management, live migration, etc. The external CVM interface is used for communication to
other Nutanix CVMs. As many logical and VM networks can be created as required assuming
the VLANs are enabled on the trunk.
The following figure shows a conceptual diagram of the virtual switch architecture:
6.3 Administration
6.3.1 Important Pages
More coming soon!
Afterword
Thank you for reading The Nutanix Bible!
Stay tuned for many more upcoming updates and enjoy the Nutanix platform!