Architectural Overview of the Oracle ZFS Storage Appliance
January 2014
Contents

Introduction
Overview
Architectural Principles and Design Goals
The DRAM-Centric Hybrid Storage Pool
Storage Object Structures and Hierarchy
ZFS Data Protection
ZFS Data Reduction
Snapshot and Related Data Services
Other Major Data Services
  File Protocols
  Shadow Migration
  Block Protocols
  NDMP for Backup
Management, Analytics, and Diagnostic Tools
Conclusion
Related Links
Introduction
The Oracle ZFS Storage Appliance is a multiprotocol enterprise storage system designed to accelerate
application performance and simplify management efficiency in a budget-friendly manner. The Oracle
ZFS Storage Appliance is suitable for a wide variety of workloads in heterogeneous vendor environments
and is best-of-breed multiprotocol storage in any environment. Additionally, because of collaborative co-engineering within Oracle, the Oracle ZFS Storage Appliance offers further unique storage
performance and efficiency advantages in Oracle-on-Oracle environments. See "Realizing the Superior
Value of the Oracle ZFS Storage Appliance" to learn more about the business value of the Oracle ZFS
Storage Appliance. The purpose of this white paper, however, is to explore the architectural details of the
Oracle ZFS Storage Appliance, examine how it works from a high level, and explain why this approach
was taken to develop a unique enterprise storage product that is able to drive extreme performance and
efficiency advantages at an affordable cost.
Overview
To deliver high performance and advanced data services, the Oracle ZFS Storage Appliance uses a
combination of standard enterprise-grade hardware and a unique, storage-optimized operating system
based on the Oracle Solaris kernel with Oracle's ZFS file system at its core. The storage controllers are
based upon powerful Sun x86 Servers that can deliver the exceptional compute power required to
concurrently run multiple modern storage workloads along with advanced data services. Each Oracle ZFS
Storage Appliance system can be configured as a single controller or as a dual-controller system. In the
case of a dual-controller system, two identical storage controllers work as a cluster, monitoring one
another so that a single controller can take over the storage resources managed by the other controller in
the event of a controller failure. Dual-controller systems are required when implementing a high
availability (HA) environment.
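The takeover behavior can be pictured with a simple model. The sketch below is illustrative only (the class names and the simplified ownership logic are assumptions, not the appliance's implementation); it shows how each pool has an owning controller and how the surviving peer imports the failed controller's pools:

```python
# Illustrative model of dual-controller takeover (not appliance code).
class Controller:
    def __init__(self, name):
        self.name = name
        self.pools = set()      # pools this controller currently owns
        self.alive = True

class Cluster:
    def __init__(self, a, b):
        self.controllers = (a, b)

    def heartbeat_lost(self, failed):
        # The surviving controller imports every pool the failed peer owned,
        # so clients regain access to all persistently stored data.
        survivor = next(c for c in self.controllers if c is not failed)
        failed.alive = False
        survivor.pools |= failed.pools
        failed.pools = set()
        return survivor

a, b = Controller("head-A"), Controller("head-B")
a.pools, b.pools = {"pool-0"}, {"pool-1"}   # active/active: one pool each
cluster = Cluster(a, b)
survivor = cluster.heartbeat_lost(a)
print(survivor.name, "now owns", sorted(survivor.pools))  # head-B owns both
```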
Each controller ingests and sends the traffic from and to the storage clients via a high-performance
network. Ethernet, Fibre Channel, and InfiniBand connectivity options are
supported for this front-end traffic. The controller then handles the computations required to implement
the selected data protection (i.e. mirroring, RAIDZ), data reduction (i.e. inline compression, deduplication),
and any other relevant data services (i.e. remote replication). The controllers also handle the caching of
stored data in both DRAM and flash. The appliance's unique caching algorithm is key to the spectacular performance
that can be obtained from an Oracle ZFS Storage Appliance. As it processes this traffic, the storage
controller then sends the data to or receives the data from the storage media. A SAS fabric is used for this
back-end controller connectivity.
The disk/flash pools reside in enterprise-grade SAS drive enclosures. These drive enclosures contain
either high-speed or high-capacity SAS spinning disks, along with SAS SLC SSDs, which are used to
stage random writes so that they can be transferred sequentially to the spinning disks, thus accelerating
performance.
Both the controllers and drive enclosures have been designed with availability as the foremost consideration.
Redundancy is built in to all systems, with features like dual power supplies, SAS loops, and redundant
OS boot drives.
Figure 1: Highlights of some key hardware features of an Oracle ZFS Storage ZS3-2 system
Current hardware models are the Oracle ZFS Storage ZS3-2 (for midrange enterprise storage workloads)
and the Oracle ZFS Storage ZS3-4 (for high-end enterprise storage workloads). For specific details of the
current Oracle ZFS Storage Appliance systems' hardware and configuration options, see the Oracle
storage product documentation pages.
All Oracle ZFS Storage Appliance systems run the same enterprise storage OS. This storage OS offers
multiple data protection layouts, end-to-end checksumming to prevent silent data corruption, and an
advanced set of data services, including compression, snapshot and cloning, remote replication, and
many others. Analytics is one of the most compelling and unique features of the Oracle ZFS Storage
Appliance and is a rich user interface to DTrace, a technology that runs within Oracle Solaris. This
analytics feature can probe anywhere along the data pipeline, giving unique end-to-end visibility of the
process with the ability to drill down on attributes of interest.
Figure 2: A sampling of important data services available on the Oracle ZFS Storage Appliance.
Finally, all of these data services can be managed by an advanced management framework, available as
either a command line interface (CLI) or a browser user interface (BUI). The BUI
incorporates an advanced analytics environment based on DTrace, which runs within the OS on the
storage controller itself, offering unparalleled end-to-end visibility of key metrics.
For information on the current Oracle ZFS Storage Appliance hardware specifications and options, as well
as the latest listing of data services, please review the product data sheet.
The controllers' substantial compute power allows them to run advanced data services (such as inline compression), and manage the appliance's advanced automatic data tiering, all while simultaneously maintaining excellent throughput and transactional performance characteristics. This is also a reason the Oracle ZFS Storage Appliance delivers high performance in high-burst, random I/O environments such as virtualization.
To use hardware most effectively and cost-efficiently in transactional workloads, the
ZFS file system is employed to manage traffic to and from clients in a way that isolates
it from the latency penalties associated with spinning disks. This caching, or auto-tiering, approach is referred to as the Hybrid Storage Pool architecture. Hybrid
Storage Pool is an exclusive feature of the Oracle ZFS Storage Appliance.
It is important to note that, for the read path, all data resides on spinning disk,
whether cached or uncached. That is to say, the automated caching done by the Hybrid
Storage Pool puts duplicate sets of blocks in ARC or L2ARC; the data is also safely
stored on disk. This is important because, in the event of a controller failure, all data is
protected because it is persistently stored on spinning disk. In a dual-controller
system, upon failure of a pool's primary controller, the second controller can take over
the pool's disk resources, access all data, and serve reads just as the first
controller would have done. Any cached reads are checksummed against the persistent
storage to ensure that any changed blocks are updated before serving the read to the
client.
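As a rough illustration of this read path, consider the following sketch. It is a toy model under assumed names such as `arc`, `l2arc`, and `disk` (the real caching logic is far more sophisticated): a read is served from DRAM if possible, then from flash cache, and otherwise from the pool disks, where the on-disk copy is always authoritative.

```python
import zlib

# Toy model of a Hybrid Storage Pool read (not the actual ZFS implementation).
arc = {}      # DRAM cache: block id -> data
l2arc = {}    # flash read cache: block id -> data
disk = {}     # persistent storage: block id -> (data, checksum)

def checksum(data: bytes) -> int:
    return zlib.crc32(data)   # stand-in; ZFS uses stronger checksums

def read_block(blk: int) -> bytes:
    if blk in arc:                       # fastest: DRAM hit
        data = arc[blk]
    elif blk in l2arc:                   # next: flash cache hit
        data = l2arc[blk]
        arc[blk] = data                  # promote to DRAM
    else:                                # miss: read from spinning disk
        data, _ = disk[blk]
        arc[blk] = data
    # Cached data is validated against the persistent copy's checksum;
    # a stale or damaged cached block is refreshed from disk.
    if checksum(data) != disk[blk][1]:
        data, _ = disk[blk]
        arc[blk] = data
    return data

disk[7] = (b"hello", checksum(b"hello"))
print(read_block(7))   # miss -> disk; subsequent reads hit DRAM
```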
The write path is handled differently. Incoming writes to the appliance initially land in
DRAM. (Clients can potentially issue writes to storage either synchronously or
asynchronously, although for most enterprise client OSs, hypervisors, and applications,
writes are generally requested synchronously for data protection and consistency
reasons.) Asynchronous writes are acknowledged as complete to the client
immediately upon landing in DRAM. While this results in extremely low latency, it is
risky from a consistency and data integrity standpoint because, should something fail
after acknowledgement of a write-complete but before destaging the write to
persistent storage, there is the opportunity for that write to be lost and, potentially, a
loss of consistency of the data set. For this reason, most writes are requested
synchronously. Synchronous writes to the Oracle ZFS Storage Appliance are not
acknowledged immediately upon landing in DRAM. Instead, they are acknowledged
only once they are persistently stored on disk or flash. The ZFS file system has a
mechanism for reducing write latency known as the ZFS Intent Log (ZIL). On the
Oracle ZFS Storage Appliance, the storage device that is used to hold the ZIL is a low-latency, high-durability SLC SSD. A logbias property can be set under the share
properties to either latency or throughput, depending on the particular workload
that an administrator expects a particular share to experience.
For latency workloads, such as redo log shares for transactional databases and certain
VM environments, the logbias = latency share setting is selected so that the ZIL is
enabled and writes are immediately copied from the system DRAM buffer into the SLC
SSD. Once stored on the SSD, the write is persistent and therefore is acknowledged as
write-complete to the client. The SSD accumulates these random writes and every five
seconds the contents of the system DRAM buffer are flushed to spinning disk and the
space used by the copy on SSD is freed. A failure of an SSD containing write data not
yet flushed to disk does not impact data integrity since the data is still in DRAM. The
copy of the write on the SSD is there so that if the DRAM contents should be lost due
to power outage or component failure, then the contents can be retrieved from the
persistent SSD upon redundant controller takeover or primary controller restoration.
This content is then placed back into the controller's DRAM (a process known as ZIL
replay) and ultimately migrated back to spinning disk. Additionally, SSDs can be
mirrored for additional data protection. Inside of this specialized write-
optimized SSD are three primary components: the NAND flash used to persistently
store the ZIL data, a DRAM buffer to stage data entering the SSD before transferring
to the NAND flash, and a super capacitor that is designed to provide power to allow
flushing the DRAM buffer to the NAND flash in the event of power supply loss to the
SSD before the buffer is cleared to flash. The SSD contains the embedded circuitry
required to ensure persistent storage of the acknowledged data and handle any flash
error corrections needed to complete the restore upon power resumption. In this
manner, the SSD serves as a mechanism to persistently and safely stage writes and
accelerate write-complete acknowledgement, dramatically reducing write latency
without risking the integrity of the data set.
For throughput or streaming workloads, such as query-intensive database workloads
or media streaming, latency is not as critical as data transfer rates. In these workloads,
the logbias = throughput share setting should be used so that writes bypass the ZIL,
thus skipping the SSD and going straight from the controller DRAM buffer to spinning
disk. (Contrary to common misperception, groups of spinning disks are actually faster
than SSDs for throughput workloads.) In this case, once stored persistently on
spinning disk, the write-complete acknowledgement is delivered to the client.
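The contrast between the two write paths can be summarized in a short sketch. This is a simplified model (the function names and the fixed five-second flush applied here are illustrative assumptions; actual transaction group timing and ZIL behavior are more involved):

```python
# Simplified model of the appliance write paths (illustrative only).
dram_buffer = []   # incoming writes staged in controller DRAM
slc_ssd_log = []   # low-latency SLC SSD holding ZIL records
spinning_disk = [] # final persistent location of the data

def sync_write(data: bytes, logbias: str) -> str:
    dram_buffer.append(data)           # all writes land in DRAM first
    if logbias == "latency":
        slc_ssd_log.append(data)       # persist to the SLC SSD ZIL device
        return "ack"                   # ack once persistent on flash
    else:  # logbias == "throughput": bypass the ZIL SSD
        spinning_disk.append(data)     # stream straight to spinning disk
        dram_buffer.remove(data)
        return "ack"                   # ack once persistent on disk

def flush_txg():
    # Roughly every five seconds, DRAM contents are flushed to disk
    # and the corresponding ZIL space on the SSD is freed.
    # (Async writes would be acked on DRAM landing, before this flush.)
    while dram_buffer:
        spinning_disk.append(dram_buffer.pop(0))
    slc_ssd_log.clear()

sync_write(b"redo-log-record", logbias="latency")        # low-latency ack
sync_write(b"large-sequential-chunk", logbias="throughput")
flush_txg()
```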
With this architecture, reads are mostly cached in DRAM for optimal read
performance, and writes are handled either to optimize latency performance or
throughput performance, all while ensuring data integrity and persistency. The
performance benefits of the Oracle ZFS Storage Appliance are well documented and
independently verified. Oracle periodically publishes the Storage Performance
Council's SPC-1 and SPC-2 benchmark results, as well as the Standard Performance
Evaluation Corporation's SPECsfs benchmark results, to demonstrate performance results for the
Oracle ZFS Storage Appliance. Visit the Storage Performance Council's website
(www.storageperformance.org) and the Standard Performance Evaluation Corporation's
website (www.spec.org) for the latest independently audited, standardized storage
benchmark results for the Oracle ZFS Storage Appliance and for results of many
competitors.
Storage Object Structures and Hierarchy
The linkage between physical hardware resources and the rest of the logical objects
that comprise ZFS is referred to as a pool. Disks, write flash-accelerating SSDs, and
read-accelerating SSDs are physically grouped together in these physical pools of
resources. Within a given pool, the storage devices all are subject to the same layout
(i.e., stripe, mirror, RAID) and are managed by the same assigned storage controller.
(Thus, in a dual-controller cluster, for an active/active setup, users must provision the
physical devices in at least two distinct pools so that each active controller can manage
at least one pool.) Each pool can contain multiple ZFS virtual devices (vdevs). For
example, if a single parity RAID layout is selected for the pool, a 3+1 stripe width is
used based upon the built-in best practice, and each vdev will contain four disks. The
Oracle ZFS Storage Appliance OS effectively masks the vdevs from the user, however,
and hot spares (unused disks that are preinstalled and ready for use as a
replacement in case of failure of an active data disk) are also provisioned automatically
based upon built-in best practices. Therefore, all the administrator need be concerned
with in terms of physical components is the pool level. (Note that ARC is shared across
all pools; the Hybrid Storage Pool read tiering algorithm is global and not associated
with any particular pool for the ARC, but L2ARC is assigned per pool.)
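A small sketch can make the pool/vdev relationship concrete. Under the single-parity example above (3+1 stripe width), a pool built from a list of disks would be carved into four-disk vdevs, with leftovers available as hot spares. The grouping function here is an assumption for illustration, not the appliance's provisioning logic, which applies its own built-in best practices:

```python
# Illustrative grouping of disks into RAIDZ1 (3+1) vdevs (not appliance code).
def build_pool(disks, vdev_width=4):
    vdevs = []
    for i in range(0, len(disks) - len(disks) % vdev_width, vdev_width):
        vdevs.append(disks[i:i + vdev_width])   # 3 data + 1 parity per vdev
    spares = disks[len(vdevs) * vdev_width:]    # leftovers become hot spares
    return {"layout": "raidz1", "vdevs": vdevs, "spares": spares}

pool = build_pool([f"disk{n}" for n in range(10)])
print(len(pool["vdevs"]), "vdevs,", len(pool["spares"]), "spares")
# -> 2 vdevs, 2 spares; the admin only ever deals with the pool as a whole
```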
Once a pool has been created with the admin-specified devices and layout, that pool is
the basic physical resource that the system and the administrator have to work with.
Everything within the pool is essentially thin provisioned up to the limitations of that
pool. For example, each file system and each LUN associated with that pool will have a
maximum available capacity equal to the remaining formatted capacity of the whole
pool, nominally. File systems and LUNs have various share setting options, however, so
a share quota can be established so that the capacity within that share cannot exceed a
threshold and can therefore not consume more than its quota of the pool. A share
reservation also can be set such that the other shares in the pool cannot occupy more
of the pool than is possible without infringing on the reserved capacity. However, if a
share reservation is set to a number and the share's quota is set to that same number,
then thin provisioning is, in effect, defeated; this is how an admin would create thick-provisioned shares, if desired.
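The interplay of quotas and reservations reduces to simple arithmetic, sketched below. The helper names and units are assumptions, and the real accounting also considers snapshots and metadata:

```python
# Illustrative quota/reservation arithmetic for thin-provisioned shares.
POOL_CAPACITY = 100_000  # formatted pool capacity, e.g. in GB

def available_to(share, shares):
    # Capacity reserved by *other* shares is off limits to this one.
    reserved_elsewhere = sum(s["reservation"] for s in shares if s is not share)
    ceiling = POOL_CAPACITY - reserved_elsewhere
    if share["quota"] is not None:
        ceiling = min(ceiling, share["quota"])   # a quota caps this share
    return ceiling

thin  = {"name": "vmstore", "quota": None, "reservation": 0}
thick = {"name": "redo", "quota": 5000, "reservation": 5000}  # quota == reservation
shares = [thin, thick]
print(available_to(thin, shares))    # 95000: pool minus redo's reservation
print(available_to(thick, shares))   # 5000: a fixed, thick-provisioned size
```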
After the pool, the next hierarchical data structure is known as the project. Think of a
project as a sort of a template for creating shares. Shares have many settings that can
be optimized for different use cases, and projects provide a way to make a template so
that shares can be easily provisioned for same or similar use cases repeatedly over
time without undue effort. For example, a typical Oracle ZFS Storage Appliance might
have three concurrently running workloads: user home directories in SMB shares, VM
files in NFS shares, and some LUNs associated with different e-mail or collaboration
software. A project could be created to group the shares separately for each of these
use cases. As new VM servers are deployed, or as new user home directories are
added, or as new e-mail or collaboration accounts are created, an admin might want to
create a new share that is totally new and empty, but has the raw properties that the
other similar shares utilize. To do so, the user will simply create a new share under the
correct project. (Once a share has been created based upon a project, the settings of
the individual share can simply be edited to be unique within that project, if needed, without
changing the project. In
other words, just because the user started with the template does not imply that the
user has to conform to it later. If the user changes the settings at the share level, that
share will have a setting that is distinct from the rest of the shares in that project. If,
however, the user changes a setting at the project level, then all shares in that project
will inherit the new setting.) Note that projects are completely thin provisioned in
that they have no notion of size; a project is simply a set of attributes that apply to its
associated shares. Thus, if physical capacity is added to a pool, the projects in that
pool are not impacted in any way. It simply increases the total available capacity for all
files, all shares, and all projects associated with the pool.
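Property inheritance between projects and shares behaves much like the following sketch. These are illustrative classes with assumed names; the appliance's actual property model has many more settings:

```python
# Illustrative project/share property inheritance (not appliance code).
class Project:
    def __init__(self, **props):
        self.props = props          # template settings, e.g. compression

class Share:
    def __init__(self, project):
        self.project = project
        self.overrides = {}         # share-level settings trump the project

    def get(self, key):
        return self.overrides.get(key, self.project.props.get(key))

vm_project = Project(compression="lzjb", logbias="latency")
s1, s2 = Share(vm_project), Share(vm_project)

s1.overrides["logbias"] = "throughput"       # edit one share only
vm_project.props["compression"] = "gzip-2"   # a project-level change...

print(s1.get("logbias"))      # 'throughput' (the share override is kept)
print(s2.get("compression"))  # 'gzip-2' (...is inherited by all shares)
```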
Figure 7: The ZFS approach to checksums can detect more types of errors than the traditional approach.
This robust protection from data corruption has yielded a surprising track record: the
Oracle ZFS Storage Appliance has surpassed 100 million hours in production with no
known data corruption errors to date.
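The idea behind Figure 7 is that ZFS stores each block's checksum in its parent block pointer rather than next to the data itself, so a misdirected or phantom write cannot validate its own corruption. A minimal sketch of that distinction follows (toy structures and assumed names, not the on-disk format):

```python
import hashlib

def cksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # ZFS supports SHA-256 checksums

# Traditional approach: the checksum is stored alongside the block itself,
# so a misdirected write that lands as a *complete* old block still
# self-validates and the error goes undetected.

# ZFS approach: the parent block pointer carries the child's checksum.
parent = {"child_ptr": 0, "child_cksum": None}
blocks = {0: b"current data"}
parent["child_cksum"] = cksum(blocks[0])

blocks[0] = b"stale data from a misdirected write"   # silent corruption

def verified_read(parent, blocks):
    data = blocks[parent["child_ptr"]]
    if cksum(data) != parent["child_cksum"]:
        raise IOError("checksum mismatch: repair from a redundant copy")
    return data

try:
    verified_read(parent, blocks)
except IOError as e:
    print(e)   # the stale block is caught because the parent disagrees
```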
The Oracle ZFS Storage Appliance also provides robust protection from hardware
failures. The most typical hardware failure in enterprise storage is, of course, disk
failure. ZFS provides multiple options for protection from disk failures. An Oracle ZFS
Storage Appliance administrator will form pools of physical disks when provisioning
storage. Each pool has a layout assigned to it at creation that defines how data will
be protected. Available layout options are stripe (no media failure protection), mirror
(single disk failure of a paired-set protection), triple-mirror (dual disk failure of a
triple-set protection), RAIDZ1 single-parity RAID (single disk failure protection within a
four-disk set), RAIDZ2 dual-parity RAID (dual disk failure protection within a 9-, 10-, or
12-disk set, depending on pool drive count), and RAIDZ3 (triple-disk failure protection
within a multiple disk set, where stripe width varies depending on pool disk count).
Mirroring tends to offer the highest performance for latency-sensitive workloads.
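To make these trade-offs concrete, the following sketch computes approximate usable capacity and failure tolerance for each layout over an example 12-disk pool. The arithmetic is simplified; real usable capacity also depends on spares, the stripe widths the appliance chooses, and metadata overhead:

```python
# Approximate usable capacity and disk-failure tolerance per layout
# for an example 12-disk pool (illustrative arithmetic only).
DISKS, DISK_TB = 12, 4

layouts = {
    # name: (data disks per group, disks per group, failures tolerated per group)
    "stripe":        (1, 1, 0),
    "mirror":        (1, 2, 1),
    "triple-mirror": (1, 3, 2),
    "raidz1 (3+1)":  (3, 4, 1),
    "raidz2 (10+2)": (10, 12, 2),
}

for name, (data, total, failures) in layouts.items():
    groups = DISKS // total
    usable = groups * data * DISK_TB
    print(f"{name:14} usable ~{usable:3d} TB, "
          f"tolerates {failures} failure(s) per {total}-disk group")
```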
LZJB compression is attractive because it incurs relatively little
compute overhead, and because implementing inline compression means that the
traffic across the SAS fabric between the controller and the disks is compressed. The
net effect is more efficient bandwidth utilization and, often, a net increase in performance
versus using no compression at all (though this is not always the case).
GZIP-2, GZIP, and GZIP-9 are standard compression algorithms that are commonly
used industry wide in a variety of IT implementations (www.gzip.org provides more
information from the developers of these algorithms). These options have been
implemented as options on the Oracle ZFS Storage Appliance and can provide higher
compression than LZJB, but typically with greater compute resource utilization. In
systems that have significant excess compute capacity relative to the workload, or in
environments where data reduction is more important than maximum performance, it
is often satisfactory to implement GZIP or even GZIP-9. GZIP-2 strikes a compromise
between LZJB and GZIP for many customers.
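The trade-off between compression ratio and CPU cost is easy to demonstrate. The sketch below uses Python's zlib at increasing effort levels as a stand-in; it is not LZJB or the appliance's GZIP implementations, just an illustration of the general curve:

```python
import time
import zlib

# Illustrate the ratio-vs-CPU trade-off with zlib levels (a stand-in for
# the appliance's LZJB/GZIP-2/GZIP/GZIP-9 options, not the same algorithms).
data = b"transaction record 000123, status=OK, amount=42.00\n" * 20000

for level in (1, 2, 6, 9):          # low effort ... maximum effort
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(out)
    print(f"level {level}: ratio {ratio:5.1f}x, {elapsed * 1000:6.1f} ms")
# Higher levels squeeze harder but burn more CPU per byte, which is the
# same compromise GZIP-2 strikes between LZJB and GZIP-9 on the appliance.
```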
ZFS deduplication is another data reduction option available on the Oracle ZFS Storage
Appliance. This option must be used with caution. It uses a hash table to reference
duplicate ZFS blocks, meaning that each write that generates unique ZFS blocks
produces another entry in the table, and each read requires a lookup on the table. This
can become very computationally intensive in some workloads as the hash table grows
over time. Streaming workloads are particularly ill suited for ZFS deduplication. Thus,
ZFS deduplication should never be used for Oracle Recovery Manager (Oracle RMAN)
backups, for example. Oracle RMAN is a feature of Oracle Database. (The Oracle ZFS
Storage Appliance is an excellent option for Oracle RMAN backup workloads, but
deduplication is not recommended on the Oracle ZFS Storage Appliance for this
particular use case. Other superior data reduction options are advised, such as the
ZFS compression options; Oracle RMAN compression; Hybrid Columnar
Compression; and others.) ZFS deduplication can be a good option, however, in
workloads that do not cause the hash table to continuously grow, such as certain boot
image deduplication situations. If using ZFS deduplication, it is advisable to segregate
workloads by project, using deduplication only on the workloads for which it works
well and putting the other workloads on other shares that use compression instead.
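The mechanism, and its cost, can be sketched in a few lines. This is a toy model with a SHA-256-keyed table; the real deduplication table is an on-disk structure whose growth is exactly the problem described above:

```python
import hashlib

# Toy model of block-level deduplication (not the ZFS implementation).
dedup_table = {}     # block hash -> (block id, reference count)
stored_blocks = {}

def write_block(blk_id: int, data: bytes):
    h = hashlib.sha256(data).hexdigest()
    if h in dedup_table:                       # duplicate: bump refcount,
        existing, refs = dedup_table[h]        # store nothing new
        dedup_table[h] = (existing, refs + 1)
        return existing
    dedup_table[h] = (blk_id, 1)               # unique: the table grows by one
    stored_blocks[blk_id] = data
    return blk_id

write_block(1, b"golden boot image block")
write_block(2, b"golden boot image block")     # dedup hit: no new entry
write_block(3, b"unique streaming data")       # table grows: this is why
                                               # streaming workloads fit poorly
print(len(dedup_table), "table entries,", len(stored_blocks), "blocks stored")
```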
With proper usage of the data reduction options, in concert with the variety of host-based data reduction options available, excellent data reduction rates can be achieved
along with optimized performance. Oracle has many documented solutions and best
practices for a variety of environments, particularly Oracle Database environments, to
optimize overall system performance and data reduction. Additional information is
available on the Oracle Technology Network page, the Oracle Optimized Solutions page, or
from an Oracle Sales Consultant.
The management framework allows most administrative tasks to be accomplished in a few
mouse clicks. The BUI also offers a built-in, industry-leading visual analytics package
based on DTrace that runs in the storage controller OS itself. This analytics package
gives unparalleled visibility into the entire storage stack, all the way down to disk and
all the way up to client network interfaces, including cache statistics, CPU metrics, and
many other parameters. This analytics package is extremely useful to identify any
bottlenecks and tune the overall system for optimal performance. It is also very helpful
in troubleshooting situations along with a client system administrator, as the
information can help the storage administrator clearly distinguish upstream issues.
This is particularly useful, for example, in large-scale server virtualization
environments where one VM out of thousands is experiencing or causing a performance issue
that needs to be discretely addressed. For further information on analytics, please see
the Analytics Guide on the Oracle ZFS Storage Appliance Product Documentation
website.
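Conceptually, the drill-down works like the aggregation sketched below. This is a toy model, not DTrace: events carry dimensions such as client and VM, and any dimension can become the breakdown key:

```python
from collections import defaultdict

# Toy drill-down aggregation in the spirit of the analytics feature.
events = [
    {"op": "read",  "client": "esx-01", "vm": "vm-0007", "latency_us": 180},
    {"op": "read",  "client": "esx-01", "vm": "vm-0042", "latency_us": 95},
    {"op": "write", "client": "esx-02", "vm": "vm-0042", "latency_us": 12000},
    {"op": "write", "client": "esx-02", "vm": "vm-0042", "latency_us": 9500},
]

def drill_down(events, dimension):
    # Break down average latency by any recorded dimension.
    buckets = defaultdict(list)
    for e in events:
        buckets[e[dimension]].append(e["latency_us"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

print(drill_down(events, "client"))  # which host is slow?
print(drill_down(events, "vm"))      # drill further: which VM is the culprit?
```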
The Oracle ZFS Storage Appliance also offers an advanced scripting language,
ECMAScript, which is based on JavaScript. Workflows can be used to store scripts in
the appliance itself; unlike many competitive frameworks that run scripts on external machines,
the Oracle ZFS Storage Appliance can take inputs from users or other systems when
workflows are executed. Workflows can be invoked via either the BUI or CLI, or by
system alerts and timers. Scripting is a powerful way for administrators to automate
complex but repetitive tasks while allowing for customization. For more information on
scripting, please see the white paper on this topic.
Conclusion
The Oracle ZFS Storage Appliance is designed to extract maximum storage
performance from standard enterprise-grade hardware while providing robust data
protection, management simplicity, and compelling economics. The unique
architecture, based upon Hybrid Storage Pool, and the wide variety of advanced data
services make the Oracle ZFS Storage Appliance an excellent choice for a wide variety
of enterprise storage workloads that demand high performance.
Related Links
Main Oracle ZFS Storage Appliance website: http://www.oracle.com/zfsstorage
Oracle Technology Network Oracle ZFS Storage Appliance page:
http://www.oracle.com/technetwork/server-storage/sun-unified-storage/overview/index.html
Product Data Sheet: http://www.oracle.com/us/products/servers-storage/storage/nas/zs3-ds-2008069.pdf
Business Value White Paper: http://www.oracle.com/us/products/servers-storage/storage/nas/resources/zfs-sa-businessvaluewp-final-1845658.pdf
Analyst White Paper on Oracle Integration: http://www.oracle.com/us/products/servers-storage/storage/nas/esg-brief-analyst-paper-2008430.pdf

January 2014, Author: Bryce Cracco, Product Manager

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.