
Understanding VERITAS Volume Replicator

By Jim Senicka, VERITAS Technical Product Management, VERITAS Software Corporation

February 2005


TABLE OF CONTENTS

Overview
Understanding the Need for Replication
Replication Concepts
    Synchronous Replication
    Asynchronous Replication
    Consistent vs. Current
    Write Order Fidelity
Technical Requirements of a Replication Solution
VVR Overview and Components
    Replicated Volume Groups
    Storage Replicator Log
    Rlinks
VVR Technical Details
    Operational Modes and Data Flow
        Synchronous
        Asynchronous
    Initializing Secondary Systems
        Empty
        Over the Wire (Autosync)
        Local Mirroring
        Hot Backup and Checkpoint
    Recovery After Problems
        Secondary/Network Outage
        Secondary Failure
        Primary Failure
    VVR Role Changes
        Primary Migration
        Secondary Takeover
    Using the Secondary System
VVR in the Customer Environment
    Understanding Bandwidth Needs
    Latency Considerations
    Effects of VVR on Application Performance
    Using Asynchronous Replication to Decouple Application Latency
About the Author


OVERVIEW
VERITAS Volume Replicator can effectively and efficiently provide protection from disaster scenarios by replicating data to one or more other locations. VERITAS Volume Replicator allows organizations to replicate their data between any storage devices, over a standard IP connection, and across any distance for the ultimate in disaster recovery protection. This document provides current and potential VERITAS Storage Foundation users with a solid understanding of the VERITAS Volume Replicator (VVR) product. It is intended to provide an architectural and technical overview, as well as an understanding of the product's capabilities.

UNDERSTANDING THE NEED FOR REPLICATION


In today's business environment, reliance on information processing systems continues to grow on an almost daily basis. Information systems, which once aided a company in doing business, have become the business itself. As companies become more reliant on critical information systems, the potential business disruption caused by the loss of an information system increases.

Over the past several years, numerous technologies have emerged to provide solid protection against individual component and system failures within data centers. State-of-the-art products providing storage management, data protection (backup and recovery) and application clustering provide an excellent defense against component or system failures at the local level.

Just as the level of protection available at the local level has grown, so has the reliance on the information systems being protected. Many companies now realize that local protection is no longer adequate. Loss of a complete data center or information processing facility would so greatly affect business capability that protection must be established at the data center level. Such a loss can stem from simple issues such as power or cooling failure, from natural disasters such as fire and flooding, or from acts of terrorism or war.

Many companies have implemented significant Disaster Recovery (DR) schemes to protect against the complete loss of a facility. Complete plans exist to recover voice and data capability in a remote location. One common element among many DR plans is the information processing recovery plan: for many years, companies have been taking regular data backups at the primary data center, then duplicating these tapes on a regular basis for shipment offsite to the DR facility.

There is a significant problem with DR solutions based on offsite backup tape copies. This approach, considered more than adequate a few years ago, does not account for the sheer reliance on today's information systems. Companies not only need information systems online, but also need accurate, up-to-the-minute information.

Imagine a typical DR scheme built around offsite protection using duplicate tapes. At regular intervals, complete data backups are created at the local site. When the backups complete, copies of these tapes are made and sent to the DR site. This typically occurs, at best, every day and, at worst, up to a week between shipments. This means that following a primary data center disaster, data processing can resume at the DR facility with data a minimum of 24 hours out of date. Companies then face an issue where information systems are restored after some number of hours or days, depending on the time required to load all backup data, and, at that time, can begin processing based on data that was out of date when the disaster occurred. In many businesses, such as electronic commerce and online banking, this is categorically unacceptable.

To examine this problem more closely, refer to the diagram below. Recovery Point Objective (RPO) represents the data currency at the backup site; it details the amount of data that is lost if a disaster occurs. At the center, we have the disaster (this can be any type of disaster, including system loss due to power, flooding, cooling or any other problem). Recovery Time Objective (RTO) represents the time it takes to restore online services using the remote site data.


The two sides of the disaster represent distinct problems. RPO represents lost data: all record of any business transaction during this window is lost. Imagine a car dealership where the lost data identifies customers who have received vehicles but not yet paid for them. Good for the customer, bad for the business. The same loss could instead be a customer's payment record, which is bad for the customer and for future business. RTO is a separate problem because it represents the time required to get back in business. RTO represents lost revenue opportunity, as well as a potential major hit to customer confidence and brand acceptance.

[Figure: timeline centered on a disaster event. The Recovery Point side spans weeks, days, hours, minutes and seconds of data loss, with tape backup at the far end, periodic replication closer in, asynchronous replication at seconds and synchronous replication/mirroring nearest zero. The Recovery Time side spans seconds through weeks of downtime, with clustering the fastest recovery, followed by manual migration, and tape restore the slowest.]

Ideally, we need to shrink the RPO and RTO windows as much as possible within budgetary constraints. In a perfect world, RPO and RTO are zero. Using replication, we can maintain a current data copy at a remote site that is not hours or days behind, but minutes, seconds or, best of all, no time behind. Replication can effectively remove the RPO portion of the problem. Using VVR and VERITAS Cluster Server together to create a wide-area failover solution, we can reduce RTO to the minimum time needed to start the application at the remote site. This is no longer a case of Disaster Recovery, but of Disaster Tolerance: effectively extending high availability all the way out to the data center level, creating a wide-area high availability scheme.

REPLICATION CONCEPTS
Replication is a technology designed to maintain a duplicate data set on a completely independent storage system, possibly at a different geographical location. In most cases, this duplicate data set is updated automatically as the master, or primary, data set is updated. This differs from a tape backup and restore method, as replication is typically completely automatic and far less labor intensive. Replication is typically run in one of two modes. Synchronous replication ensures updates are posted at the primary and all configured secondaries prior to returning the write response to the application. Asynchronous replication immediately returns write completions to the calling application and queues the write data for later transmission to the secondaries.

Different technologies from various vendors address issues such as persistently queuing write data for asynchronous writes and maintaining the ordering of those writes in different ways. The following section explains various replication concepts in more detail.

SYNCHRONOUS REPLICATION
Synchronous replication ensures that a write update has been posted to the secondary node(s) and the primary before the write operation completes at the application level. This way, in the event of a disaster at the primary location, data recovered from any surviving secondary server is completely up-to-date, because all servers share the exact same data state. Synchronous replication provides full data currency, but may impact application performance in high-latency or limited-bandwidth situations. Synchronous replication is most effective in application environments with low update rates, but can be deployed in write-intensive environments where high-bandwidth, low-latency network connections are available.
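
To make the latency trade-off concrete, the sketch below models a synchronous write as a local log write plus a network round trip, and an asynchronous write (described in the next section) as the local write alone. The timings are assumed figures for illustration, not measured VVR numbers.

```python
# Illustrative comparison of write latency in synchronous vs. asynchronous
# mode (timings are assumed figures, not measured VVR numbers).

LOCAL_LOG_WRITE_MS = 0.5   # assumed time to persist a write locally
NETWORK_RTT_MS = 20.0      # assumed round trip to the secondary site

def sync_write_latency_ms() -> float:
    # A synchronous write completes only after the secondary acknowledges
    # receipt, so the application sees the log write plus a full round trip.
    return LOCAL_LOG_WRITE_MS + NETWORK_RTT_MS

def async_write_latency_ms() -> float:
    # An asynchronous write is acknowledged once it is queued locally;
    # the network round trip is hidden from the application.
    return LOCAL_LOG_WRITE_MS

print(f"synchronous:  {sync_write_latency_ms():.1f} ms per write")
print(f"asynchronous: {async_write_latency_ms():.1f} ms per write")
```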

ASYNCHRONOUS REPLICATION
During asynchronous replication, application updates are written at the primary, and queued for forwarding to each secondary host as network bandwidth allows. When the writing application experiences temporary surges in update rate, this queue may grow. Unlike synchronous replication, the writing application does not suffer from the response time degradation caused by each update incurring the cost of a network round trip. During periods when the update rate is less than the available network bandwidth, this queue drains faster than it grows, allowing the secondary data state to catch up rapidly with that of the primary. Asynchronous replication with adequate available bandwidth can help remove latency from a replication solution, but still provide near real-time updates. Application write operations are immediately acknowledged and updated data is sent to the remote site at nearly the same time. During an actual primary site outage, only those transactions on the wire are lost.
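
The queue behavior described above can be illustrated with a toy simulation; the hourly write rates and link capacity below are invented figures, not guidance for any particular deployment.

```python
# Toy simulation of an asynchronous replication queue (invented figures;
# the queue grows during write surges and drains when traffic is light).

write_rate_gb_per_hr = [1, 1, 6, 6, 6, 2, 1, 1]  # assumed hourly update rates
link_capacity_gb_per_hr = 3.0                     # assumed link throughput

queue_gb = 0.0
for hour, written in enumerate(write_rate_gb_per_hr):
    # The queue grows when writes exceed link capacity and drains otherwise.
    queue_gb = max(0.0, queue_gb + written - link_capacity_gb_per_hr)
    print(f"hour {hour}: wrote {written} GB, queue now {queue_gb:.1f} GB")
```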

CONSISTENT VS. CURRENT


Data that is current, or up-to-date, contains all the latest changes made at the primary processing site. For example, if you are replicating a database, the most recently committed transaction is available on the secondary. Whether the data on the secondary must always be current is a business decision, and it follows from the choice between synchronous and asynchronous replication. Synchronous mode guarantees that the data on the secondary is up-to-date, at the cost of application performance. Asynchronous mode does not guarantee data currency, but does provide maximum application availability and the ability to use more cost-effective telecommunications.

Data is consistent if the system or application using it can be successfully restarted to a known, useable state. For example, if the data being replicated is used by a database, the data is consistent if the database can be started and recovered to a useable state with zero data corruption. If the data contains a file system, the data is consistent if the file system check utility can be run and recover with no file system corruption. Ideally we want the secondary to be an exact duplicate of the primary. If this is not possible, we need the secondary to be an exact duplicate, minus some known amount of data updates. For example, in a database environment, one or more of the most recently committed transactions on the primary might be missing, but the secondary is in the exact state the primary was in prior to those transactions being applied. This means data must be applied to the secondary in exactly the same order it is applied at the primary. Data blocks cannot be applied out of order, nor can partial updates be allowed. In order to provide a consistent data set at a secondary, the replication solution must provide write order fidelity, explained below.

WRITE ORDER FIDELITY


To use the secondary in a disaster recovery scenario, data written at the secondary must faithfully track the primary. It can be behind in time, but it must be a consistent image of the primary at some point in the past. This property is called write order fidelity. Without write order fidelity, there is no guarantee a secondary will have consistent, recoverable data. In a database environment, the log and data spaces of a database management system update in a fixed sequence. The log and data space are usually in different volumes, and the data itself can span several additional volumes. A well-designed replication solution must safeguard write order fidelity consistently. This may be accomplished by logically grouping data volumes so the update order in the group is preserved within and among all secondary volume copies.

Hardware-level replication solutions typically lack the ability to maintain write order fidelity when running in asynchronous mode, because they lack a persistent queue of writes that have not yet been sent to the secondary. If a replication solution does not log each write in the order it is received, it cannot maintain write order fidelity in asynchronous mode. For this reason, VVR logs each write in the order it is received. Approaches that only track which blocks have changed, rather than the order of writes, cannot genuinely provide asynchronous support: if the user removes the latency penalty imposed by synchronous replication, they lose recoverability at the remote site, rendering the DR copy essentially useless.
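
A minimal sketch of why ordering matters, using a hypothetical two-volume database layout: replaying a prefix of an ordered log always yields a state the primary once had, while applying an unordered set of changed blocks can surface a commit record without its data.

```python
# Minimal sketch of write order fidelity (hypothetical volumes and writes).

writes = [
    ("log",  "begin txn 1"),   # write 1: transaction start
    ("data", "update row A"),  # write 2: data page
    ("log",  "commit txn 1"),  # write 3: commit record
]

def replay_in_order(writes, crash_after):
    """Replay writes in primary order, stopping early to simulate an outage.
    Every prefix is a state the primary once had, so recovery always works."""
    return writes[:crash_after]

def replay_unordered(writes, crash_after):
    """Apply the same writes in arbitrary order, as a changed-block tracker
    might. A crash can leave a commit record without its data write."""
    reordered = [writes[2], writes[0], writes[1]]  # commit arrives first
    return reordered[:crash_after]

print("ordered:  ", replay_in_order(writes, crash_after=2))
print("unordered:", replay_unordered(writes, crash_after=2))
# The unordered prefix contains "commit txn 1" but not "update row A":
# no state of the primary ever looked like this, so it is unrecoverable.
```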

TECHNICAL REQUIREMENTS OF A REPLICATION SOLUTION


Architecturally, a complete solution must provide an exact copy of all data at the primary site, including database files as well as any other necessary binary and control files. This data must be accurate and recoverable.

The replication solution must provide an up-to-the-second copy, or as near as possible, without unduly affecting primary application operations. This means the replication solution must not inject unacceptable latency into primary application I/O traffic.

The replication solution must be capable of being configured to support one or more secondary sites in an acceptable location, because different businesses, operating in different areas, have different geographical requirements for disaster protection. The replication solution must operate at distances of hundreds or thousands of miles without adding undue cost or complexity. This means the replication solution MUST provide asynchronous support, over a long distance, without additional high-cost items such as communication converters or additional disk space for staging data.

Finally, the replication solution must provide a consistent image of the primary at the secondary, regardless of whether synchronous or asynchronous replication is chosen.

VVR OVERVIEW AND COMPONENTS


VVR is a logical extension of VERITAS Volume Manager (VxVM), found in VERITAS Storage Foundation. It allows VxVM volumes on one system to be exactly replicated to identically-sized volumes on another system over an IP network. For example, an Oracle database may use several different volumes for various tablespaces, redo logs, archived redo logs, indexes and other storage. Each component is typically stored in a VxVM volume, or multiple volumes. The database may also use data stored in VERITAS VxFS file systems, created in these volumes for better manageability. VVR can provide an exact duplicate of these volumes on another system, at another site.

The data in the secondary volumes is always kept consistent, meaning the application can use the data as it is written. This requires that the secondary copy faithfully reflect which volumes are written, and in what order. This is very important in a DR scenario, as the copy must be capable of being brought up as soon as possible.

VVR is architecture-independent, which means it can replicate data between storage platforms from any manufacturers. For example, expensive enterprise storage arrays using RAID 1+0 can be replicated to lower-cost arrays at the remote site using more economical RAID 5. The only requirement is matching VxVM volume sizes on each side. Replication occurs over a standard IP link, removing the need for expensive proprietary network hardware.

REPLICATED VOLUME GROUPS



VVR extends the concept of a Disk Group with the concept of a Replicated Volume Group (RVG). An RVG is a subset of volumes within a given VxVM Disk Group, configured for replication to one or more secondary systems (up to 32 secondaries can be configured to replicate one primary). Volumes that are associated with an RVG and contain application data are called data volumes. Data volumes are replicated Volume Manager volumes and are distinct from Storage Replicator Log (SRL) volumes, which are described later.

The RVG data volumes are under the control of an application, such as a database management system, that requires write-order fidelity among the volume updates. Write ordering is strictly maintained within an RVG during replication to ensure that each remote volume is always consistent, both internally and with all other volumes of the group. Each RVG can have a maximum of 1023 data volumes.

An RVG can be a primary or a secondary. A primary RVG receives write data from applications on the host where it resides and forwards that data to the configured secondary RVGs over rlinks (described below). A secondary RVG receives data from its configured primary and writes it to the proper volumes. A secondary RVG does not imply a secondary system: a system can be a primary for some RVGs and a secondary for others. This allows one site to be used as a hot standby for another, and vice versa.
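
The terminology above can be summarized in a small, purely conceptual data model; the class and field names below are invented for illustration and do not reflect VVR's internal structures.

```python
# Conceptual sketch of VVR terminology (not VVR's actual internals).
from dataclasses import dataclass, field

@dataclass
class RVG:
    """A Replicated Volume Group: a set of data volumes plus an SRL,
    acting as primary or secondary for replication."""
    name: str
    role: str                                          # "primary" or "secondary"
    data_volumes: list = field(default_factory=list)   # max 1023 per RVG
    srl_volume: str = ""                               # Storage Replicator Log
    rlinks: list = field(default_factory=list)         # links to peer RVGs

# One host can hold a primary RVG for one application and a secondary RVG
# for another, letting two sites act as hot standbys for each other.
site_a = [
    RVG("sales_rvg", "primary",   ["vol01", "vol02"], "sales_srl", ["to_site_b"]),
    RVG("hr_rvg",    "secondary", ["vol10"],          "hr_srl",    ["to_site_b"]),
]
print(site_a[0])
```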

STORAGE REPLICATOR LOG


All data writes destined for volumes configured for replication are first persistently queued in a log called the Storage Replicator Log (SRL). VVR implements the SRL on the primary side to store all changes for transmission to the secondaries. The SRL is a VxVM volume configured as part of an RVG. The SRL gives VVR the ability to associate writes with specific volumes within the replicated configuration, in a specific order, maintaining write order fidelity at the secondary. All writes sent to the VxVM volume layer, whether from an application such as a database writing directly to storage or from an application accessing storage via a file system, are faithfully replicated in application write order to the secondary.
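
A rough model of the SRL's role, treating it as an append-only, first-in-first-out queue. The real SRL is a preallocated VxVM volume written sequentially, but the ordering behavior is the essential point.

```python
# Rough model of the SRL as an append-only, first-in-first-out log
# (the real SRL is a preallocated volume written sequentially).
from collections import deque

class StorageReplicatorLog:
    def __init__(self):
        self._log = deque()

    def append(self, volume: str, offset: int, data: bytes) -> None:
        """Every application write is logged first, in arrival order."""
        self._log.append((volume, offset, data))

    def drain(self, batch: int):
        """Send the oldest queued writes to the secondary, preserving order."""
        return [self._log.popleft() for _ in range(min(batch, len(self._log)))]

srl = StorageReplicatorLog()
srl.append("vol01", 4096, b"row A")
srl.append("vol02", 0, b"commit")
print(srl.drain(batch=10))  # writes leave in exactly the order they arrived
```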

RLINKS
An RLINK is a VVR replication link to a secondary RVG. Each RLINK on a primary RVG represents the communication link from that primary RVG to a corresponding secondary RVG; each RLINK on a secondary RVG represents the link from the secondary RVG back to its corresponding primary RVG.

VVR TECHNICAL DETAILS


OPERATIONAL MODES AND DATA FLOW
Synchronous
In synchronous mode, all data writes are first posted to the SRL and are then sent, in parallel, out all configured synchronous rlinks. When all synchronous rlinks receive an acknowledgement from their secondaries, the write is acknowledged to the application. The secondary acknowledges receipt as soon as the full write transaction (as sent by the application on the primary) is received into VVR kernel memory; this removes the actual secondary data volume write from the application latency. The primary tracks these writes in its SRL until a second acknowledgement is received from the secondary, signalling that the data has been written to physical storage. Both acknowledgements have an associated timeout; if a packet is not acknowledged, VVR resends it.

Note: It is incorrect to assert that VVR waits for the secondary data to be written to disk. In reality, VVR waits only for the data to be received at the secondary, not written, which improves application performance. VVR tracks all acknowledged, uncommitted transactions and can replay any necessary transactions if the secondary crashes before actually writing the data.

On the primary side, data is written as two independent writes: one to the SRL and one to the actual data volumes. The data volume write happens asynchronously, in the background, to maximize application write performance. Here again, it is incorrect to regard VVR writing twice as a performance issue. VVR does write twice, but only one of those writes affects application latency. The SRL write is a very fast write to a sequentially-accessed log. The data volume write is a normal write, done in a lazy fashion, when it affects performance the least. Should the primary crash at any point, the SRL data is fully recoverable.

Asynchronous
Application writes in asynchronous mode are first placed in the SRL, then immediately acknowledged to the calling application. Data is then sent as soon as possible to the secondary RVGs, based on available bandwidth and the number of outstanding writes in the SRL. This makes VVR's asynchronous replication an ideal choice for long-haul data mobility needs.
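
The synchronous data flow described above can be sketched as follows; the functions are illustrative stubs invented for this sketch, not VVR APIs.

```python
# Sketch of the synchronous data flow described above.
# The functions are illustrative stubs, not VVR APIs.

srl = []  # stand-in for the Storage Replicator Log

def write_to_srl(data):
    srl.append(data)                         # fast sequential log write

def send_and_wait_for_network_ack(data):
    # The secondary acknowledges once the write reaches its kernel memory,
    # before writing its own data volumes.
    print(f"secondary received {data!r} (network ack)")

def synchronous_write(data):
    write_to_srl(data)                       # 1. log locally
    send_and_wait_for_network_ack(data)      # 2. one round trip to the secondary
    print(f"application ack for {data!r}")   # 3. app write completes HERE
    # 4. the primary data volume write happens lazily, in the background
    # 5. a later "data ack" (secondary wrote to disk) frees the SRL entry
    srl.remove(data)

synchronous_write(b"update row A")
```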

INITIALIZING SECONDARY SYSTEMS


Before changed-block replication can begin, a replication solution must first establish a known duplicate data set at each site. VVR offers several ways to initialize secondary systems.

Empty
The simplest method starts with completely empty volumes on each side. This can be done if VVR is installed while initially constructing a data center, prior to production. For VVR to use an empty data set, both sides must be identically empty. This can be achieved by creating volumes with the Initialize Zero setting from VMSA, or by copying /dev/zero into the volumes on each side.

Over the wire (Autosync)
Over-the-wire initialization essentially uses VVR to move all data from the primary to the secondary over the rlink. Overall this is a very simple process; however, with larger data sets it can take a long time, especially if the primary is active while the secondary is being initialized.

Local mirroring
Local mirroring is an option for very large data sets. In this method, the storage array destined for the secondary site is initially placed at the primary site and attached as a VxVM plex. Once the mirror is complete, the plex is split off and the array is shipped to the secondary site. This method allows large amounts of data to be initialized at SAN speeds, while data written during the shipping period is spooled to the primary SRL.

Hot backup and Checkpoint
Checkpoint initialization is a truly unique VVR feature. It allows huge data sets to be synchronized using tape technology, with replication initiated immediately afterwards. This capability is unique to VVR and is not possible with hardware-based replication solutions. Checkpoint initialization is essentially a primary-side hot backup, with the SRL providing the map of what changed during the backup. When a checkpoint initialization commences, a check-start pointer is placed in the SRL. A full block-level backup is then taken of the primary volumes. When it completes, a check-end pointer is placed in the SRL. The data written between the check-start and check-end pointers represents data that was modified while the backup was taking place; this is what makes it a hot backup. The tapes are then transported to the secondary site and loaded (remember the old saying: never underestimate the bandwidth of a station wagon full of tapes). When the tape load is complete, the secondary site is connected to the primary site with a checkpoint attach of the rlink. The primary then forwards any data written during the backup (the data between check-start and check-end). Once this data is written to the secondary, the secondary is an exact duplicate of the primary as of the time the backup completed. At this point the secondary is consistent, and simply out of date. The SRL is then replayed to bring the secondary up to date. This example again surfaces the difference between consistent and up-to-date: once the data between the checkpoints is replayed, the secondary is considered consistent and is available as a recovery site; once the SRL is replayed from the check-end to the end of the SRL, the secondary is up to date.
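
A minimal sketch of the checkpoint bookkeeping, treating the SRL as a simple list of logged writes (a simplification of the real on-disk log):

```python
# Simplified sketch of checkpoint initialization positions in the SRL.

srl = []                      # writes logged in order while the backup runs

srl.append("write 1")
check_start = len(srl)        # check-start pointer: backup begins here
srl.append("write 2")         # modified while the backup is in progress
srl.append("write 3")
check_end = len(srl)          # check-end pointer: backup finished here
srl.append("write 4")         # written after the backup, before attach

# After the tapes are loaded at the secondary:
replay_for_consistency = srl[check_start:check_end]  # -> secondary consistent
replay_to_catch_up = srl[check_end:]                 # -> secondary up to date
print(replay_for_consistency, replay_to_catch_up)
```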


RECOVERY AFTER PROBLEMS


VVR is very robust in terms of tolerating network connectivity outages to the secondary. The SRL provides the key to recovering from outages, while maintaining a consistent secondary.

Secondary/Network outage
A secondary system outage and an outage of the network to the secondary are identical as far as VVR is concerned. When the secondary is no longer available, the primary simply spools writes to the SRL. When the secondary is repaired or the network problem is resolved, the SRL is drained to the secondary. VVR's ability to tolerate extended outages is controlled entirely by the size of the SRL; the ability to recover rapidly is governed by the capacity of the rlink pipe. For example, if the customer has a 3-day outage during which 30 gigabytes of data are written, the SRL must be able to accommodate those 30 gigabytes, plus the additional data written while the SRL drains. How quickly the 30 gigabytes can be moved to the secondary is set by the rlink network capacity.

Secondary Failure
A secondary failure is better defined as a failure of the secondary storage, resulting in data loss on the secondary side. There are several ways to recover from a secondary failure. The first is to rebuild the secondary storage and re-initialize using one of the methods discussed above. The second is to take regular backups of the secondary environment using a VVR feature called Secondary Checkpoints. Secondary Checkpoints allow a pointer to be placed in the primary SRL to designate the location where a backup was last taken on the secondary. Assuming the primary has a large enough SRL and secondary backups are taken routinely, a failure of the secondary is repairable by reloading the last backup and rolling the SRL forward from the last secondary checkpoint.

Primary Failure
A failure of the primary can be broken into several possible problems. A complete failure of the primary site is handled by promoting a secondary to primary, effecting a disaster recovery takeover. This is exactly the scenario VVR is built for. For primary outages such as a server failure or server panic, the customer can choose to wait for the primary to recover, or shift operations to a secondary. For situations involving actual data loss at the primary, the customer can shift operations to a secondary, or restore data on the primary.
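
To make the outage example concrete, here is a rough drain-time calculation; the link throughput and ongoing write rate are assumed figures, not VVR defaults.

```python
# Drain-time arithmetic for the 3-day outage example above
# (the link speed and ongoing write rate are assumed figures).

outage_backlog_gb = 30.0        # spooled to the SRL during the outage
ongoing_write_gb_per_hr = 0.3   # assumed steady write rate after repair
link_gb_per_hr = 0.66           # assumed rlink throughput (~1.5 Mbit/s T1)

# While draining, new writes keep arriving, so the backlog shrinks only at
# the link rate minus the ongoing write rate.
net_drain = link_gb_per_hr - ongoing_write_gb_per_hr
drain_hours = outage_backlog_gb / net_drain

print(f"time to catch up: {drain_hours:.0f} hours")
# The SRL must hold the 30 GB backlog plus margin for further surges while
# it drains; if the write rate ever exceeds the link rate, the SRL never
# drains and eventually overflows.
```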

VVR ROLE CHANGES


Role changes are actions that swap the primary role for a replicated volume group to a system that was previously a secondary. This can be due to an outage disabling or destroying the primary, where a secondary is used to continue operations, or simply a role reversal to allow a secondary site to take over operations.

Primary Migration
A migration of the primary role is a controlled shift of primary responsibility for an RVG. Data is flushed from the existing primary SRL if necessary, and then control is handed to the existing secondary. The original primary is demoted to a secondary, and the original secondary is promoted to a primary. This very simple operation is accomplished with a single vradmin command, initiated by the operator or by VERITAS Cluster Server, and allows a rapid shift of the replication primary between sites. There is zero chance of data loss, as all data outstanding at the primary site is sent to the secondary before the migration is allowed to take place.

Secondary Takeover
A secondary takeover is a somewhat more violent event, in that the secondary is promoted to a primary without a corresponding demotion of the original primary. When a takeover is performed, the secondary is brought up in read-write mode in the exact state it was in at the time of takeover. Any data written by the primary in asynchronous mode and not yet sent to the secondary is not available. After a secondary takeover, the original primary must be synchronized to become an exact duplicate of the new primary. Data present on the old primary that was not sent to the new primary is lost. When a secondary comes up in takeover mode, it maps all writes to all data volumes in Data Change Maps (DCMs). The use of DCMs allows the original primary to be synchronized with the new primary with a minimum amount of data transfer.

USING THE SECONDARY SYSTEM


One of the more common VVR questions is: "Can I use the replicated volumes at the secondary site?" The answer is no, not while the volumes are part of an ongoing replication process. Most people understand that the volumes at the secondary site cannot be written to, as data corruption would result. What is less well understood is why the volumes may not be used even in a read-only mode.

VVR forwards all writes from the primary site to the secondary. At the secondary site, these writes are written directly to the VxVM volumes by VVR, without interaction with any other software layer. For example, at the primary side, a VxFS file system may sit on top of a volume in an RVG. Writes to this volume happen through VxFS, so the file system is aware that changes are occurring on the underlying volume. At the secondary site, VVR changes the volumes directly, based on writes received from the primary. If someone mounted the file system there in read-only mode, it would mount successfully, but it would be unusable: the file system would not see changes to the underlying data in the volumes and would present inconsistent data to the requesting application.

Imagine an application requesting data about a specific customer. Part of the data may be in memory from a previous read request, so the file system uses the cached copy. It then performs a read from disk for the remainder of the data, which may be newer data written by VVR to the volumes. In a database environment, the database itself caches blocks in memory; if a request involved both cached blocks and newly read blocks, the query would return an inconsistent response. The overall issue is one of cache consistency. Applications (including VxFS) cache data in memory to improve performance. VVR replicates below the applications, at the volume level, and has no way to invalidate an application's cache when it writes new data to the volumes. For this reason, VVR volumes on a secondary system may not be used in any way while replication is active. This limitation applies equally to any hardware-based replication solution.

In order to use the data in a secondary RVG, a point-in-time copy must be made to provide a static view of the storage. VERITAS offers the capability to create full mirror break-offs, which can be used on the secondary host or any other SAN-attached host at the secondary site. Customers can also choose to use Space-Optimized Snapshots (SOS). A Space-Optimized Snapshot uses copy-on-write capability to keep a point-in-time view on a small cache volume. For more information on point-in-time solutions for use with VVR data, please contact your VERITAS sales team.

VVR IN THE CUSTOMER ENVIRONMENT


UNDERSTANDING BANDWIDTH NEEDS
A very common question is: "How much bandwidth do I need?" The answer is: enough. Adequate bandwidth must be available to move all write traffic to each secondary site within any given time period. For example, if 10 gigabytes of data are written in a 24-hour period, then enough bandwidth must be provided to move 10 gigabytes of data in 24 hours. The SRL can be used to spool data during periods when write traffic exceeds replication bandwidth, but this data must drain at some future point, or the SRL eventually overflows. In the first chart below, we see a case where the update rate heavily exceeds the rlink pipe size during the business day. This results in the SRL filling over the course of the day and draining at night. This may be acceptable, with the understanding that an outage during peak hours will result in a fairly substantial data loss.


[Chart: SRL fill example 1. SRL depth in gigabytes (0-100) over the time of day (01:00-23:00), plotting data written in GB/hour, rlink pipe size and SRL fill depth. The SRL fills during business hours and drains overnight.]

In the next example, we see a case where data update rates average more than available rlink bandwidth, resulting in an SRL that continues to fill and eventually overflows.

[Chart: T1 example, with data rates averaging more than 1 GB/hour. SRL depth in gigabytes (0-5) over the time of day (06:00-20:00), plotting data written in GB/hour, rlink pipe size and SRL fill depth. The SRL fills continuously and eventually overflows.]
In the final example, rlink bandwidth exceeds the write rate for all but a small portion of the day. This results in a very small window during which the secondary is behind by any significant amount.


[Chart: Dual T1 example (1.4 GB/hour), with data rates averaging more than 1 GB/hour. SRL, pipe and writes in gigabytes (0-2.5) over the time of day (06:00-20:00), plotting data written in GB/hour, rlink pipe size and SRL fill depth. The SRL holds a significant backlog only during a brief daily peak.]
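
The arithmetic behind these examples is straightforward; the sketch below converts a write volume into a minimum sustained link rate, using the 10 GB/day figure from earlier in this section and binary gigabytes.

```python
# Bandwidth arithmetic for the examples above (unit conversion only).

def required_mbit_per_sec(gb_written: float, hours: float) -> float:
    """Minimum sustained link rate to move gb_written within the window."""
    bits = gb_written * 8 * 1024**3          # gigabytes -> bits
    return bits / (hours * 3600) / 1e6       # -> megabits per second

# 10 GB written per 24 hours needs roughly a T1's worth of bandwidth:
print(f"{required_mbit_per_sec(10, 24):.2f} Mbit/s")   # ~0.99 Mbit/s

# A sustained 1.4 GB/hour update rate needs far more, about a dual T1:
print(f"{required_mbit_per_sec(1.4, 1):.2f} Mbit/s")   # ~3.34 Mbit/s
```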

LATENCY CONSIDERATIONS
Probably more important than bandwidth is latency. In the VVR context, this refers to the amount of time needed to send a packet to the secondary and receive an acknowledgement, discounting VVR overhead. In a synchronous environment, each application write must be spooled to the SRL, sent to the secondary and acknowledged by the secondary before the write completion is returned to the calling application. As the distance between sites increases, so does latency. Distance itself is not the real issue, but rather the infrastructure between sites that induces latency. As a common example, consider replication from New York to New Jersey. A direct path from Manhattan to Jersey City is no more than 5 miles, well within the range of single-mode fiber. If one purchases SONET services from a major provider, the actual distance traveled could be much greater, as the circuit follows whatever infrastructure the provider uses. (Traditional multi-megabit service has focused on available bandwidth, not latency, so a New York to New Jersey circuit could easily be routed through available fiber via Boston.) In the same setup, satellite services travel 52,000 miles! This example shows that the geographic distance between primary and secondary is not the governing factor, but rather the intermediate infrastructure. Data traveling at near the speed of light over fiber has very low latency, but intermediate switches, repeaters, routers and so on can add significant latency.
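
A rough propagation-delay calculation illustrates the point; it assumes light in fiber travels at about two-thirds of its vacuum speed, ignores switching and routing delays, and the 400-mile Boston detour is a hypothetical path length.

```python
# Rough propagation delay (light: ~186,000 miles/s in vacuum, roughly
# 2/3 of that in fiber; switching and routing delays excluded).

C_MILES_PER_MS = 186.0
FIBER_MILES_PER_MS = C_MILES_PER_MS * 2 / 3

def round_trip_ms(path_miles: float, speed_miles_per_ms: float) -> float:
    return 2 * path_miles / speed_miles_per_ms

print(f"direct fiber, 5 mi:            "
      f"~{round_trip_ms(5, FIBER_MILES_PER_MS):.2f} ms")
print(f"fiber via Boston (~400 mi, assumed): "
      f"~{round_trip_ms(400, FIBER_MILES_PER_MS):.1f} ms")
print(f"satellite path, 52,000 mi:     "
      f"~{round_trip_ms(52_000, C_MILES_PER_MS):.0f} ms")
```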

EFFECTS OF VVR ON APPLICATION PERFORMANCE


VVR typically has very little effect on application performance (on the order of 3-5%), and in many cases can actually result in a net performance increase for an application. All data written by an application goes to a sequential log, with very little disk head movement, so the apparent write performance for a given application may actually improve: the log method used by the SRL can be more efficient than writing to the actual data locations in the underlying volumes. As we have seen, there is no significant performance hit from VVR writing twice, as evidenced by real-world installations exhibiting little to no performance degradation. A second write does occur, but it has no effect on the application because it is performed in the background.

USING ASYNCHRONOUS REPLICATION TO DECOUPLE APPLICATION LATENCY


One of the most compelling real-world VVR features is its ability to maintain full consistency at secondary sites while operating in asynchronous mode. Maintaining write order fidelity in asynchronous mode allows VVR to truly exploit the potential performance benefits of asynchronous replication. By providing a high-bandwidth connection, enterprises can completely remove the replication latency penalty and still maintain near up-to-the-second data at the remote site. At the primary site, applications are acknowledged as soon as data is placed in the SRL, so application processing continues normally. The data is then sent out almost immediately over the rlink to the secondary site(s). With adequate bandwidth, the SRL does not fill, so the actual data outstanding between primary and secondary is realistically whatever data is currently on the wire. This means enterprises can have near up-to-the-second replication, at arbitrary distances, without application penalties.
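
The data "on the wire" at any instant is the link's bandwidth-delay product; a quick calculation with assumed figures shows how small the worst-case exposure can be.

```python
# Data "on the wire" is the bandwidth-delay product of the link
# (both figures below are assumed for illustration).

link_mbit_per_sec = 45.0      # assumed T3-class rlink
one_way_delay_ms = 25.0       # assumed primary-to-secondary delay

in_flight_mb = link_mbit_per_sec * (one_way_delay_ms / 1000) / 8
print(f"worst-case loss on primary failure: ~{in_flight_mb:.2f} MB in flight")
# ~0.14 MB: with a drained SRL, a disaster loses only a fraction of a
# megabyte of acknowledged-but-unreplicated data.
```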


About the Author


Jim Senicka has been with VERITAS Software for over 5 years and is presently the Director of Technical Product Management for VERITAS clustering products on UNIX platforms. His primary role is future product feature definition and product interoperability. He has also served as an enterprise architect and high availability practice lead in VERITAS Software Enterprise Consulting Services. In that prior role, Jim provided vision and direction for creating and managing programs targeted at increasing VERITAS client availability and operational efficiency. Before joining VERITAS Software, Jim was a high availability product expert at Auspex Systems and Silicon Graphics, working with Auspex ServerGuard and SGI FailSafe. In his spare time, among other passions such as flying radio-controlled airplanes, Jim can usually be found racing motorcycles on closed-circuit road courses with his track day organization, the Northeast Sport Bike Association.

[Photo: Jim pulling away from yet another hapless competitor.]

© 2005 Jim Senicka. All rights reserved. Used with permission of Jim Senicka.


VERITAS Software Corporation


Corporate Headquarters
350 Ellis Street
Mountain View, CA 94043
650-527-8000 or 866-837-4827

For additional information about VERITAS Software, its products, VERITAS Architect Network, or the location of an office near you, please call our corporate headquarters or visit our Web site at www.veritas.com.

© 2005 VERITAS Software Corporation. All rights reserved. VERITAS, the VERITAS Logo, VERITAS Volume Replicator, VERITAS Storage Foundation, and FlashSnap are trademarks or registered trademarks of VERITAS Software Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.
