VVR Concepts
February 2005
TABLE OF CONTENTS

Overview
Understanding the Need for Replication
Replication Concepts
   Synchronous Replication
   Asynchronous Replication
   Consistent vs. Current
   Write Order Fidelity
OVERVIEW
VERITAS Volume Replicator can effectively and efficiently provide protection from disaster scenarios by replicating data to one or more other locations. VERITAS Volume Replicator allows organizations to replicate their data between any storage devices, over a standard IP connection, and across any distance for the ultimate in disaster recovery protection. This document provides current and potential VERITAS Storage Foundation users with a solid understanding of the VERITAS Volume Replicator (VVR) product. It is intended to provide an architectural and technical overview, as well as an understanding of the product's capabilities.
UNDERSTANDING THE NEED FOR REPLICATION

The two sides of a disaster present distinct problems. The Recovery Point Objective (RPO) represents lost data: all record of any business transaction during this window is gone. Imagine the case of selling a car in which the lost data identifies customers that have not paid for vehicles they have already received. Good for the customer, bad for the business. Equally, the lost data could be a customer's payment record, which is bad for the customer and for future business. The Recovery Time Objective (RTO) is a separate problem because it represents the time required to get back in business. RTO represents lost opportunity revenue, as well as a potential major hit to customer confidence and brand acceptance.
[Figure: Recovery Point vs. Recovery Time. On the recovery point side, the scale runs from weeks and days (tape backup) down to hours, minutes, and seconds (periodic replication). On the recovery time side, the scale runs from seconds and minutes (clustering) through hours (manual migration) to days and weeks (tape restore).]
Ideally, we need to shrink the RPO and RTO windows as much as possible within budgetary constraints. In a perfect world, RPO and RTO are zero. Using replication, we can maintain a current data copy at a remote site that is not hours or days behind, but minutes, seconds or, best of all, no time behind. Replication can effectively remove the RPO portion of the problem. Using VVR and VERITAS Cluster Server together to create a wide area failover solution, we can reduce RTO to the absolute minimum: the time required to start the application at the remote site. This is no longer a case of Disaster Recovery, but of Disaster Tolerance. Disaster Tolerance effectively extends High Availability all the way out to the data center level, creating a Wide Area High Availability scheme.
REPLICATION CONCEPTS
Replication is a technology designed to maintain a duplicate data set on a completely independent storage system, possibly at a different geographical location. In most cases, this duplicate data set is updated automatically as the master, or primary, data set is updated. This differs from a tape backup and restore method, as replication is typically completely automatic and far less labor intensive. Replication is typically run in one of two modes. Synchronous replication ensures updates are posted at the primary and all configured secondaries prior to returning the write response to the application. Asynchronous replication immediately returns write completions to the calling application and queues the write data for later transmission to the secondaries.
Different technologies from various vendors address issues such as persistently queuing write data for asynchronous writes and maintaining the ordering of those writes in different ways. The following section explains various replication concepts in more detail.
SYNCHRONOUS REPLICATION
Synchronous replication ensures that a write update has been posted to the secondary node(s) and the primary before the write operation completes at the application level. This way, in the event of a disaster at the primary location, data recovered from any surviving secondary server is completely up-to-date because all servers share the exact same data state. Synchronous replication produces full data currency, but may impact application performance in high latency or limited bandwidth situations. Synchronous replication is most effective in application environments with low update rates, but can be deployed in write-intensive environments where high bandwidth, low latency network connections are available.
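To make the synchronous write path concrete, here is a minimal sketch in Python (illustrative only; the class and method names are hypothetical, not VVR interfaces). The application's write does not return until the local log write and every secondary acknowledgement have completed.

    # Minimal sketch of a synchronous write path (hypothetical names, not VVR code).
    class Secondary:
        def __init__(self):
            self.volume = {}                      # remote data volume stand-in

        def apply(self, block_no, data):
            self.volume[block_no] = data          # update posted at the secondary
            return "ack"                          # acknowledgement sent back over the link

    class SynchronousPrimary:
        def __init__(self, secondaries):
            self.log = []                         # stand-in for the SRL
            self.volume = {}
            self.secondaries = secondaries

        def write(self, block_no, data):
            self.log.append((block_no, data))     # persisted locally first
            self.volume[block_no] = data
            for s in self.secondaries:            # one network round trip per secondary
                s.apply(block_no, data)
            return "ok"                           # only now does the application's write complete

    primary = SynchronousPrimary([Secondary(), Secondary()])
    primary.write(7, b"committed transaction")    # blocks for the full round trip(s)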
ASYNCHRONOUS REPLICATION
During asynchronous replication, application updates are written at the primary and queued for forwarding to each secondary host as network bandwidth allows. When the writing application experiences temporary surges in update rate, this queue may grow. Unlike synchronous replication, the writing application does not suffer the response time degradation caused by each update incurring the cost of a network round trip. During periods when the update rate is less than the available network bandwidth, this queue drains faster than it grows, allowing the secondary data state to catch up rapidly with that of the primary. Asynchronous replication with adequate available bandwidth can remove latency from a replication solution while still providing near real-time updates. Application write operations are immediately acknowledged, and updated data is sent to the remote site at nearly the same time. During an actual primary site outage, only those transactions on the wire are lost.
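The asynchronous path can be sketched the same way (again illustrative Python with hypothetical names). Writes are acknowledged as soon as they are queued; a separate drain step models the rlink carrying queued writes to the secondary as bandwidth allows.

    from collections import deque

    # Minimal sketch of an asynchronous write path (hypothetical names).
    class AsynchronousPrimary:
        def __init__(self):
            self.queue = deque()                  # stand-in for pending writes in the SRL
            self.secondary_volume = {}            # remote data volume stand-in

        def write(self, block_no, data):
            self.queue.append((block_no, data))   # logged in arrival order
            return "ok"                           # acknowledged immediately, no round trip

        def drain(self, budget):
            # 'budget' models how many writes the rlink can carry this interval.
            while self.queue and budget > 0:
                block_no, data = self.queue.popleft()
                self.secondary_volume[block_no] = data
                budget -= 1

    primary = AsynchronousPrimary()
    for i in range(100):                          # a surge: the queue grows
        primary.write(i, b"update")
    primary.drain(budget=60)                      # the queue drains once writes slow down
    print(len(primary.queue), "writes still pending")   # 40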
CONSISTENT VS. CURRENT

Data is consistent if the system or application using it can be successfully restarted to a known, usable state. For example, if the data being replicated is used by a database, the data is consistent if the database can be started and recovered to a usable state with zero data corruption. If the data contains a file system, the data is consistent if the file system check utility can be run and recover with no file system corruption. Ideally, we want the secondary to be an exact duplicate of the primary. If this is not possible, we need the secondary to be an exact duplicate minus some known amount of data updates. For example, in a database environment, one or more of the most recently committed transactions on the primary might be missing, but the secondary is in the exact state the primary was in before those transactions were applied. This means data must be applied to the secondary in exactly the same order it is applied at the primary. Data blocks cannot be applied out of order, nor can partial updates be allowed. In order to provide a consistent data set at a secondary, the replication solution must provide write order fidelity, explained below.
WRITE ORDER FIDELITY

Write order fidelity is maintained by grouping data volumes so the update order in the group is preserved within and among all secondary volume copies. Hardware-level replication solutions typically lack the ability to maintain write order fidelity when running in asynchronous mode, because they lack a persistent queue of writes that have not yet been sent to the secondary. If a replication solution does not log each write in the order it is received, it is impossible to maintain write order fidelity in asynchronous mode. For this reason, VVR logs each write in the order it is received. Approaches that do not do this, and instead only track which blocks have changed, cannot genuinely provide asynchronous support: if the user removes the latency penalties imposed by synchronous replication, they lose recoverability at the remote site, rendering the DR copy essentially useless.
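The sketch below (illustrative Python; the explicit sequence-number scheme is a simplification of the ordered logging the text describes) shows why an ordered log matters: the secondary refuses to apply an update out of sequence, so its copy is always a state the primary actually passed through.

    # Minimal sketch of write order fidelity (hypothetical names).
    class OrderedSecondary:
        def __init__(self):
            self.volumes = {}                     # (volume, block) -> data
            self.next_seq = 0                     # next sequence number expected

        def apply(self, seq, volume, block_no, data):
            if seq != self.next_seq:
                # Applying this would create a state the primary never had.
                raise ValueError(f"out of order: expected {self.next_seq}, got {seq}")
            self.volumes[(volume, block_no)] = data
            self.next_seq += 1

    secondary = OrderedSecondary()
    # One sequence spans ALL volumes in the group, so cross-volume ordering
    # (e.g. database log before table space) is preserved too.
    secondary.apply(0, "db_log", 10, b"begin txn")
    secondary.apply(1, "db_data", 55, b"row update")
    secondary.apply(2, "db_log", 11, b"commit")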
REPLICATED VOLUME GROUPS

VVR extends the concept of a Disk Group with the concept of a Replicated Volume Group (RVG). An RVG is a volume subset within a given VxVM Disk Group, configured for replication to one or more secondary systems (up to 32 secondary systems can be configured to replicate one primary). Volumes that are associated with an RVG and contain application data are called data volumes. Data volumes are replicated Volume Manager volumes and are distinct from Storage Replicator Log (SRL) volumes, which are described later. The RVG data volumes are under the control of an application, such as a database management system, that requires write order fidelity among the volume updates. Write ordering is strictly maintained within an RVG during replication to ensure that each remote volume is always consistent, both internally and with all other volumes of the group. Each RVG can have a maximum of 1023 data volumes. An RVG can be a primary or a secondary. A primary RVG receives write data from applications on the host it is running on and forwards that data to configured secondary RVGs over rlinks (described below). A secondary RVG receives data from its configured primary and writes it to the proper volumes. A secondary RVG does not imply a secondary system: a system can be a primary for some RVGs and a secondary for others. This allows one site to be used as a hot standby for another, and vice versa.
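A rough model of these objects and their limits (illustrative Python; the field names are hypothetical and do not reflect VVR's actual interfaces):

    from dataclasses import dataclass, field

    MAX_RLINKS = 32            # a primary can replicate to up to 32 secondaries
    MAX_DATA_VOLUMES = 1023    # maximum data volumes per RVG

    @dataclass
    class RVG:
        name: str
        role: str                                   # "primary" or "secondary"
        srl: str                                    # the Storage Replicator Log volume
        data_volumes: list = field(default_factory=list)
        rlinks: list = field(default_factory=list)  # links to peer RVGs

        def add_data_volume(self, volume_name):
            if len(self.data_volumes) >= MAX_DATA_VOLUMES:
                raise ValueError("an RVG holds at most 1023 data volumes")
            self.data_volumes.append(volume_name)

        def add_rlink(self, rlink_name):
            if self.role == "primary" and len(self.rlinks) >= MAX_RLINKS:
                raise ValueError("at most 32 secondaries per primary RVG")
            self.rlinks.append(rlink_name)

    # One host can be primary for one RVG and secondary for another:
    sales = RVG("sales_rvg", role="primary", srl="sales_srl")
    hr = RVG("hr_rvg", role="secondary", srl="hr_srl")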
RLINKS
An RLINK is a VVR Replication Link to a secondary RVG. Each RLINK on a Primary RVG represents the communication link from the Primary RVG to a corresponding Secondary RVG. An RLINK on a Secondary RVG represents the communication link from the Secondary RVG to the corresponding Primary RVG.
STORAGE REPLICATOR LOG

The Storage Replicator Log (SRL) is a sequentially accessed log. Every application write to an RVG is first written to the SRL; the corresponding data volume write is a normal write, done in a lazy fashion, when it affects performance the least. Should the primary crash at any point, SRL data is fully recoverable.

Asynchronous
Application writes in asynchronous mode are first placed in the SRL, then immediately acknowledged to the calling application. Data is then sent as soon as possible to the secondary RVGs, based on available bandwidth and the number of outstanding writes in the SRL. VVR's asynchronous replication is therefore an ideal choice for long-haul data mobility needs.
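A minimal sketch of the log-first write path (illustrative Python, hypothetical names). The point is that the application waits only for one sequential log write; the data volume is updated later, and after a crash the log alone is enough to finish any writes that had not reached the data volume.

    # Minimal sketch of the SRL log-first write path (hypothetical names).
    class LogFirstVolume:
        def __init__(self):
            self.srl = []              # sequentially accessed log, persisted first
            self.data_volume = {}      # updated lazily, when it costs the least

        def app_write(self, block_no, data):
            self.srl.append((block_no, data))   # one fast sequential log write
            return "ok"                         # the application is acknowledged here

        def lazy_flush(self):
            # Runs in the background; applies logged writes to the data volume.
            for block_no, data in self.srl:
                self.data_volume[block_no] = data

        def crash_recover(self):
            # After a primary crash, replaying the SRL recovers every
            # acknowledged write, including those not yet on the data volume.
            self.lazy_flush()

    vol = LogFirstVolume()
    vol.app_write(3, b"fast ack, logged only")
    vol.crash_recover()                # the write survives a crash via the SRL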
INITIALIZING THE SECONDARY

Empty
The simplest method starts with completely empty systems on each side. This can be done if VVR is installed while initially constructing a data center, prior to production. For VVR to use an empty data set, both sides must be identically empty. This can be achieved by creating volumes with the Initialize Zero setting from VMSA, or by copying /dev/zero into the volumes at each side.

Over the wire (Autosync)
Over-the-wire initialization essentially uses VVR to move all data from the primary to the secondary over the rlink. Overall this is a very simple process; however, with larger data sets it can take a long time, especially if the primary is active while attempting to initialize the secondary.

Local mirroring
Local mirroring is an option for very large data sets. In this method, the data storage array for the secondary site is initially placed at the primary site and attached as a VxVM plex. Once the mirror is complete, the plex is split off and the array is shipped to the secondary site. This method allows large amounts of data to be initialized at SAN speeds, while data written during the shipping period is spooled to the primary SRL.

Hot backup and Checkpoint
Checkpoint initialization is a truly unique VVR feature. It allows huge data sets to be synchronized using tape technology, with replication initiated immediately afterward. This capability is unique to VVR and is not possible with hardware-based replication solutions. Checkpoint initialization is essentially a primary-side hot backup, with the SRL providing the map of what changed during the backup. When a checkpoint initialization commences, a check-start pointer is placed in the SRL. A full block-level backup is then taken of the primary volumes. When complete, a check-end pointer is placed into the SRL. The data written between the check-start and check-end pointers represents data that was modified while the backup was taking place; this constitutes a hot backup. The tapes are then transported to the secondary site and loaded (remember the old saying: never underestimate the bandwidth of a station wagon full of tapes). When the tape load is complete, the secondary site is connected to the primary site with a checkpoint attach of the rlink. The primary then forwards any data that was written during the backup (the data between the check-start and check-end pointers). Once this data is written to the secondary, the secondary is an exact duplicate of the primary as of the time the backup completed. At this point the secondary is consistent, and simply out of date. The SRL is then replayed to bring the secondary up to date. This example again surfaces the difference between consistent and up-to-date: once the data between the checkpoints is replayed, the secondary is considered consistent and is available as a recovery site; once the SRL is replayed from the check-end pointer to the end of the SRL, the secondary is also up-to-date.
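The checkpoint mechanism can be illustrated with a small sketch (Python, hypothetical and simplified to a single volume of numbered blocks). It walks through the check-start and check-end pointers and shows the consistent versus up-to-date distinction directly:

    # Simplified sketch of checkpoint initialization (hypothetical names).
    srl = []                                       # (block, data) entries in write order
    primary = {i: b"v0" for i in range(8)}         # the primary data volume

    def app_write(block_no, data):
        srl.append((block_no, data))               # every write is logged in order
        primary[block_no] = data

    check_start = len(srl)                         # check-start pointer placed in the SRL
    backup = dict(primary)                         # block-level hot backup is taken...
    app_write(3, b"written during backup")         # ...while the application keeps running
    check_end = len(srl)                           # check-end pointer placed in the SRL

    app_write(5, b"written while tapes ship")      # writes continue to spool to the SRL

    # At the secondary: load the tapes, then attach with a checkpoint.
    secondary = dict(backup)
    for block_no, data in srl[check_start:check_end]:
        secondary[block_no] = data                 # CONSISTENT: primary's state as of check-end
    for block_no, data in srl[check_end:]:
        secondary[block_no] = data                 # now UP TO DATE as well

    assert secondary == primary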
HANDLING OUTAGES AND FAILURES

Secondary/Network outage
A secondary system outage and an outage of the network to the secondary are identical as far as VVR is concerned. When the secondary is no longer available, the primary simply spools to the SRL. When the secondary is repaired or the network problems are resolved, the SRL is drained to the secondary. The ability of VVR to tolerate extended outages is controlled entirely by the size of the SRL; the ability to rapidly recover is governed by the size of the rlink pipe. For example, if the customer has a 3-day outage during which 30 gigabytes of data are written, the SRL must be able to accommodate 30 gigabytes, plus the additional data written while the SRL drains. How rapidly the 30 gigabytes move to the secondary is set by the rlink network capacity.

Secondary failure
A secondary failure is better defined as a failure of the secondary storage, resulting in data loss on the secondary side. There are several methods to recover from a secondary failure. The first is to rebuild the secondary storage and re-initialize using one of the methods discussed above. The second is to take regular backups of the secondary environment using a VVR feature called Secondary Checkpoints. Secondary Checkpoints allow a pointer to be placed in the primary SRL to designate the location where a backup was last taken on the secondary. Assuming the primary has a large enough SRL and secondary backups are routinely taken, a failure of the secondary is repairable by reloading the last backup and rolling the SRL forward from the last secondary checkpoint.

Primary failure
A failure of the primary can be broken into several possible problems. A complete failure of the primary site is handled by promoting a secondary to primary, effecting a disaster recovery takeover. This is exactly the scenario VVR is built for. For primary outages such as a server failure or server panic, the customer can choose to wait for the primary to recover, or shift operations to a secondary. For situations involving actual data loss at the primary, the customer can shift operations to a secondary, or restore data on the primary.
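The 3-day outage example above can be worked through numerically. The sketch below is back-of-the-envelope Python; the 1.4 gigabyte/hour link and the steady write rate are assumptions for illustration, not measured figures.

    # SRL sizing arithmetic for the outage example (illustrative assumptions).
    outage_backlog_gb = 30.0            # data written during the 3-day outage
    write_rate = 30.0 / 72              # ~0.42 GB/hour, assuming a steady rate
    link_rate = 1.4                     # GB/hour of rlink capacity (assumed)

    # While draining, new writes keep arriving, so the SRL empties only at
    # the difference between link capacity and the ongoing write rate.
    drain_rate = link_rate - write_rate
    hours_to_catch_up = outage_backlog_gb / drain_rate
    extra_during_drain = write_rate * hours_to_catch_up

    print(f"catch-up time: {hours_to_catch_up:.1f} hours")       # ~30.5 hours
    print(f"extra SRL space needed beyond the 30 GB backlog: "
          f"{extra_during_drain:.1f} GB")                        # ~12.7 GB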
Following a secondary takeover, the original primary must be synchronized to become an exact duplicate of the new primary. Data present on the old primary that was not sent to the new primary is lost. When a secondary comes up in takeover mode, it maps all writes to all data volumes in Data Change Maps (DCM). The DCM allows the original primary to be synchronized with the new primary with a minimum amount of data transfer.
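A Data Change Map can be sketched as a coarse bitmap over the volume (illustrative Python; the region size is an assumption made for the example, not VVR's actual granularity). Failback then copies only the marked regions rather than the whole volume:

    # Sketch of a Data Change Map as a region bitmap (hypothetical granularity).
    REGION_SIZE = 64 * 1024                        # assumed region size, for illustration

    class DCM:
        def __init__(self, volume_bytes):
            self.total_regions = volume_bytes // REGION_SIZE
            self.dirty = set()                     # dirty region numbers (bitmap stand-in)

        def mark(self, offset, length):
            # Called for every write on the new primary after takeover.
            first = offset // REGION_SIZE
            last = (offset + length - 1) // REGION_SIZE
            self.dirty.update(range(first, last + 1))

        def resync_bytes(self):
            # Failback copies only dirty regions to the returning primary.
            return len(self.dirty) * REGION_SIZE

    dcm = DCM(volume_bytes=100 * 2**30)            # a 100 GB volume
    dcm.mark(offset=4096, length=8192)             # one small write after takeover
    print(dcm.resync_bytes(), "bytes to copy instead of the full 100 GB")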
BANDWIDTH CONSIDERATIONS

[Chart: SRL, pipe, and writes in gigabytes over the time of day.]
In the next example, we see a case where data update rates average more than available rlink bandwidth, resulting in an SRL that continues to fill and eventually overflows.
[Chart: SRL, pipe, and writes over the time of day; the SRL continues to fill and eventually overflows.]
In the final example, we have a case where rlink bandwidth exceeds write rate for all but a small portion of the day. This results in a very small window where the secondary is actually behind any significant amount.
[Chart: Dual T1 example (1.4 gigabytes/hour of link capacity), with data rates averaging more than 1 gigabyte/hour. SRL, pipe, and writes in gigabytes plotted over the time of day.]
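An hour-by-hour simulation makes the fill-and-drain behavior easy to see (illustrative Python; the daily write profile below is invented for the example, while the 1.4 gigabyte/hour figure matches the dual T1 case above).

    # Hour-by-hour SRL backlog simulation (illustrative write profile).
    link = 1.4                                    # GB/hour of rlink capacity (dual T1)
    # Assumed daily profile: quiet overnight, heavy business hours, moderate evening.
    hourly_writes = [0.2] * 8 + [2.0] * 8 + [0.5] * 8

    srl_backlog = 0.0
    peak = 0.0
    for hour, written in enumerate(hourly_writes):
        srl_backlog = max(0.0, srl_backlog + written - link)
        peak = max(peak, srl_backlog)
        print(f"{hour:02d}:00  wrote {written:.1f} GB, SRL backlog {srl_backlog:.1f} GB")

    # Total writes (21.6 GB) are under daily link capacity (33.6 GB), so the
    # backlog peaks at 4.8 GB during business hours and drains to zero after.
    print(f"peak backlog: {peak:.1f} GB")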
LATENCY CONSIDERATIONS
Probably more important than bandwidth is latency. In the VVR context, this refers to the amount of time needed to send a packet to the secondary and receive an acknowledgement, discounting VVR overhead. In a synchronous environment, each application write must spool to the SRL, be sent to the secondary, and be acknowledged by the secondary before the write completion is returned to the calling application. As the distance between sites increases, so does latency. However, distance is not the real issue; rather, it is the infrastructure between sites that induces latency. As a common example, consider replicating from New York to New Jersey. A direct path from Manhattan to Jersey City is no more than 5 miles, well within range of single mode fiber. But if one purchases SONET services from a major provider, the actual distance traveled could be much greater, as the path follows whatever infrastructure the provider uses. (Traditional multi-megabit service has focused on available bandwidth, not latency, so a New York to New Jersey link could easily route through available fiber via Boston.) In the same setup, satellite services travel 52,000 miles! This example shows that the primary/secondary geographic distance is not the governing factor, but rather the intermediate infrastructure. Data traveling at near the speed of light over fiber has very low latency, but intermediate switches, repeaters, routers, and similar equipment can add significant latency.
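The propagation floor is easy to estimate (illustrative Python; the routed path length is an assumption for the example, and real links add switch, router, and protocol overhead on top).

    # Lower bound on synchronous write latency from path length alone.
    LIGHT_IN_FIBER_KM_PER_MS = 200.0    # roughly two-thirds of c in glass

    def min_round_trip_ms(path_km):
        # Send plus acknowledge: twice the one-way propagation time.
        return 2 * path_km / LIGHT_IN_FIBER_KM_PER_MS

    paths = [
        ("direct Manhattan to Jersey City (~8 km)", 8),
        ("routed via Boston (assumed ~650 km each way)", 650),
    ]
    for label, km in paths:
        print(f"{label}: at least {min_round_trip_ms(km):.2f} ms per synchronous write")
    # ~0.08 ms direct vs. ~6.5 ms via Boston: the route, not the map
    # distance, sets the floor.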
One of the most compelling real-world VVR features is its ability to maintain full consistency at secondary sites while operating in asynchronous mode. Maintaining write order fidelity in asynchronous mode allows VVR to truly exploit potential asynchronous replication performance benefits. By providing a high bandwidth connection, enterprises can completely remove the replication latency penalty and still maintain near up-to-the-second data at the remote site. At the primary site, applications are acknowledged as soon as data is placed in the SRL. Hence, application processing continues normally. The data is then sent out almost immediately over the rlink to the secondary site(s). With adequate bandwidth, the SRL does not fill, so the actual data outstanding between primary and secondary is realistically whatever data is currently on the wire. This means enterprises can have near up-to-the-second replication, at arbitrary distances, without application penalties.
© 2005, Jim Senicka. All rights reserved. Used with permission of Jim Senicka.
For additional information about VERITAS Software, its products, VERITAS Architect Network, or the location of an office near you, please call our corporate headquarters or visit our Web site at www.veritas.com.
© 2005 VERITAS Software Corporation. All rights reserved. VERITAS, the VERITAS Logo, VERITAS Volume Replicator, VERITAS Storage Foundation, and FlashSnap are trademarks or registered trademarks of VERITAS Software Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.