VERITAS File System Performance
VERITAS Software Corporation
1600 Plymouth Street, Mountain View, CA 94043
1.800.258.8649 in the USA / 415.335.8000 / FAX 415.335.8050
E-mail: [email protected]
World Wide Web: http://www.veritas.com/

VERITAS, the VERITAS logo, VxFS, VxVM, FirstWatch and VERITAS FirstWatch are registered trademarks of VERITAS Software Corporation. VxServerSuite and VxSmartSync are trademarks of VERITAS Software Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

Copyright 1996 VERITAS Software Corporation. All rights reserved. 11/96
Table of Contents
Introduction
    Benchmark Testing Platform
    Benchmark Program - VERITAS vxbench
    Benchmarks Used in this Report
VERITAS File System
    Journaling
    Performance Components
Extent Based Allocation
    Buffered File System Benchmark
    Buffered Aged File System Benchmark
Cache Policies
    File System Alignment
    Direct I/O
    Discovered Direct I/O
    Controller Limited Benchmark - 4 way Stripe
    Disk Limited Benchmark - 8 way Stripe
    Tuning UFS File Systems
        Tuning UFS
    Passing the GB/sec Barrier
Quick I/O for Database
    File Systems and Databases
    UNIX raw I/O
        raw I/O Limitations
    Quick I/O for Database
    Multiple Thread Random I/O Benchmarks
Direct I/O vs. Quick I/O for Database
    Direct I/O vs. Quick I/O for Database Benchmarks
Conclusion
Chapter
Introduction
The VERITAS storage management product line has been designed in response to the needs of commercial computing environments. These are the systems that are now supporting mission critical applications and playing a major role in corporate computing services. Typical environments include online transaction processing systems, both inter- and intra-networked database servers, and high performance file services. VERITAS specializes in systems storage management technology, encompassing a product line that offers high performance, availability, data integrity, and integrated, online administration.

VERITAS provides three complementary products: VERITAS FirstWatch, VERITAS Volume Manager, and the VERITAS File System.

VERITAS FirstWatch is a system and application failure monitoring and management system that provides high availability for mission-critical applications. FirstWatch dramatically increases server application availability through the use of duplicate cluster monitoring processes on each node in a FirstWatch system pair. These monitoring processes communicate with each other over dedicated, duplicated heartbeat links.

VERITAS Volume Manager is a virtual disk management system providing features such as mirroring, striping, disk spanning, hot relocation and I/O analysis. The VERITAS Visual Administrator provides a graphical interface to VERITAS Volume Manager, offering visual representation and management of the virtual disk subsystem, including drag and drop features and simple or complex device creation.

The VERITAS File System is a high availability, high performance, commercial grade file system providing features such as transaction based journaling, fast recovery, extent-based allocation, and on-line administrative operations such as backup, resizing and defragmentation of the file system.

This report describes the performance mechanics of the 2.3 version of the VERITAS File System. It discusses the key performance components in the VERITAS File System and presents a series of benchmark tests comparing the throughput and CPU utilization of different VERITAS File System component technologies, as well as tests comparing the VERITAS File System (VxFS) with the Solaris UNIX File System (UFS).

VERITAS Software developed a series of benchmark tests for the express purpose of testing the performance throughput of the installable file system, the software component of the
UNIX OS. Since there are a number of components involved in a computer file system, it was deemed necessary to develop a methodology, and later a program, to allow the configuration and running of a number of different I/O streams, in an effort to understand the role that file systems play in the overall file I/O model.

The testing included assembling a hardware test platform that would give the testers the ability to induce hardware bottlenecks at specific points in order to see the overall effect on file system throughput. For this testing the hardware bottleneck areas of interest were:

CPU - For most tests the goal was not to create a CPU bottleneck, but rather to leave enough system CPU cycles, and system RAM, available so that throughput could be analyzed elsewhere.

I/O Controller - The controller bus used in the tests was a Fast/Wide SCSI bus with a theoretical throughput of 20 MB/sec. In some tests the controller was saturated in order to determine the overall effect on file system I/O performance.

Disk - The disks used in the tests were all Fast Wide Differential SCSI hard drives with a throughput of approximately 6.5 MB/sec each. In other tests the disks were saturated in order to determine their effect on file system I/O performance.
Benchmark Program - VERITAS vxbench
-w workload    selects a type of I/O workload; valid workloads are:
    read         sequential read of the test files
    write        sequential write of the test files
    rand_read    random read of the test files
    rand_write   random write of the test files
    rand_mixed   mix of random reads and writes
    mmap_read    use mmap to read the test files
    mmap_write   use mmap to overwrite the test files

-i subopts     specify sub options describing the test; valid sub options are:
    nrep=n       repeat the I/O loop in the test n times
    nthreads=n   number of threads accessing each file
    iosize=n     size of each I/O
    fsync        do an fsync on the file after writing it
    remove       remove each file after the test
    iocount=n    number of I/Os
    reserveonly  reserve space for the file but don't do I/O
    maxfilesize  maximum offset in KB for random I/O tests
    randseed     seed value for random number generator
    truncup      set an initial file size for random I/O
    rdpct=n      set read percentage of job mix for mixed tests
-o opentype    specify flags for opening the file; valid opentypes are:
    append       use appending writes
    sync         set the O_SYNC flag for synchronous file I/O
    trunc        truncate the test files on open

-c cacheopts   specify VxFS caching advisories; valid cache options are:
    direct       use direct I/O to bypass the kernel cache
    dsync        use data synchronous I/O
    noreuse      set the VX_NOREUSE cache advisory
    random       set the VX_RANDOM cache advisory
    seq          set the VX_SEQ cache advisory

-e extsize     specify a fixed extent size

-r reservation specify space reservation

-f flags       specify flags for reservation and fixed extents; valid flags are:
    align        require aligned extents
    chgsize      set the file size to the reservation size
    contig       require contiguous allocation
    noextend     don't allow writes to extend the file
    noreserve    allocate space but don't set file reservation
    trim         trim reservation to file size on last close
Specifying multiple filenames will run tests in parallel to each file, thus simulating multiple simultaneous users. If multiple threads are also specified, then each simulated user will run multiple threads, so the total number of I/O threads will be 'users * nthreads'. An example usage of vxbench would be as follows. To measure the I/O throughput of sequentially writing a 1024 MB file in 8 KB blocks, you would invoke vxbench as:

./vxbench -w write -i iosize=8k,iocount=128k /dev/vx/dsk/perfvol1

There is also a built-in help screen that can be invoked with: ./vxbench -h
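To simulate several simultaneous users, the same options are combined with multiple file names. The following sketch uses only the flags documented above and runs four simulated users with two threads each (eight I/O threads in total); the file paths are hypothetical:

# Four simulated users, two threads per user, 64 KB sequential writes
./vxbench -w write -i nthreads=2,iosize=64k,iocount=4k \
    /mnt/vxfs/test1 /mnt/vxfs/test2 /mnt/vxfs/test3 /mnt/vxfs/test4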
CPU Measurements
A note regarding the CPU measurements reported in this paper. The vxbench program measures two types of CPU time:
1. time spent in the operating system (system time)
2. time spent in the application (user time)
In these tests both times were combined into a single measurement of CPU impact. For example, if the application time was reported as 10.5 seconds and the system time was 189.5 seconds, the final measurement would be reported as 200 CPU seconds. CPU utilization is, strictly speaking, this time divided by the elapsed time (which is not reported). CPU seconds are used here because they allow a comparison of the relative CPU cost of each file system option when transferring the same amount of data.
Chapter
VERITAS File System

Journaling
The VERITAS File System employs a variation on the general file system logging or journaling technique: a circular intent log. All file system structure changes, or metadata changes, are written to this intent log in a synchronous manner. The file system then periodically flushes these changes out to their actual disk blocks. This increases performance by allowing metadata writes to be moved to their permanent disk blocks in an ordered manner out of the intent log.

Because the journal is written synchronously, it may also be used to accelerate small (less than or equal to 8KB) synchronous write requests, such as those used for database logging. Writes of this class may be written to the journal, a localized sequential block store, before they are moved to their places in the larger file system; this can reduce head movement and decrease the latency of database writes.

By using this intent log, the VERITAS File System can recover from system downtime in a fraction of the time required by a traditional full file system check. When the VERITAS File System is restarted after a failure, the system simply scans the intent log, noting which file system changes had completed and which had not, and proceeds accordingly. In some cases, the VERITAS File System can roll forward changes to the metadata structures, because the changes were saved in the intent log. This adds availability and integrity to the overall file system.
Performance Components
The VERITAS File System has been developed with many of the latest industry file system performance improvements in mind. These improvements can be divided into the following feature categories:

Extent Based Allocation
Unlike traditional UNIX file systems, which assign space to files one block at a time, the VERITAS File System allocates blocks in contiguous segments called extents. Extent sizes are chosen based on the I/O pattern of the file, or may be explicitly selected to suit the application. Extent-based allocation can accelerate sequential I/O by reducing the seek and rotation time required for access, and by enabling drivers to pass larger requests to disks.

Cache Policies

The UNIX operating system supplies a standard asynchronous mode for writing to files, in which data is written through a write-back page cache in system memory, to accelerate read and write access. It also calls for a synchronous mode, which writes through the cache immediately, flushing all structures to disk. Both of these methods require data to be copied between user process buffer space and kernel buffers before being written to the disk, and copied back out when read. However, if the behavior of all processes that use a file is well known, the reliability requirements of synchronous I/O may be met using techniques which offer much higher performance, often increasing file access to about the speed of raw disk access. The VERITAS File System provides two cache policies which enable these performance improvements.

The first is called Direct I/O. Using this method the VERITAS File System does not copy data between user and kernel buffers; instead, it performs file I/O directly into and out of user buffers. This optimization, coupled with very large extents, allows file accesses to operate at raw-disk speed. Direct I/O may be enabled via a program interface, via a mount option, or, with the 2.3 version of the VERITAS File System, automatically based upon the I/O block size. This last feature is known as Discovered Direct I/O.

The second cache policy available with the 2.3 version of the VERITAS File System is Quick I/O for Database. While Direct I/O improves many types of large I/O performance, the single writer lock policy of the UNIX OS creates a performance bottleneck for some types of file system writes; database application writes are particularly affected. Included in the VERITAS ServerSuite Database Edition 1.0, Quick I/O for Database bypasses the single writer lock policy in the UNIX OS by representing files to applications as character devices. This allows database applications designed to use raw partitions to operate as if on a raw partition, while actually running on a file system. This combines the manageability of file systems with the performance of raw partitions.
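Extent-based allocation can be exercised directly with the vxbench reservation and fixed-extent options listed in the introduction. The sketch below is illustrative only: the file path is hypothetical, and the exact value syntax for -r and -f should be confirmed with ./vxbench -h.

# Reserve contiguous space for a test file before any I/O is issued,
# so that subsequent sequential writes land in large, contiguous extents
./vxbench -w write -r 262144 -f contig -i reserveonly /mnt/vxfs/extent_demo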
Chapter

Extent Based Allocation

Buffered File System Benchmark
These buffered tests were done using the default UFS and VxFS parameters. The hardware disk configuration used was a single SCSI controller connected to 4 SCSI drives, creating a single 8 GB RAID-0 volume.
[Graphs: buffered read and buffered write throughput, KB/sec vs. file size in KB, for UFS and VxFS]
As the results indicate, both UFS and VxFS standard buffered read throughput begins to accelerate at the 512 KB file size; however, the increase for VxFS is much larger, peaking at almost 26 MB/sec, whereas UFS only reaches about 12 MB/sec. Read CPU utilization shows a very similar curve for the two file systems. The buffered write results provide a similar picture: throughput again climbs at the 512 KB file size, but VxFS reaches almost 50 MB/sec while UFS manages almost 16 MB/sec.

Note that during the buffered write tests, the VERITAS File System, by default, reached its maximum limit of one half the system RAM for use as a buffer cache. Since the test platform had 256 MB of memory, the VERITAS File System limited itself to no more than half of that, or 128 MB, for system buffers. Adding memory to the system would raise this ceiling and increase the measured throughput.
Buffered Aged File System Benchmark

Results indicate that the fragmentation effect reduces the throughput for both UFS and VxFS. However, VxFS begins with and maintains a higher read throughput rate and is less affected by the fragmentation. The CPU time curves are almost identical for VxFS and UFS. (The next chapter will illustrate the VxFS Discovered Direct I/O technology, which results in much lower CPU utilization in addition to boosting large block I/O performance.) The following graphs show these test results:
[Graphs: aged file system read throughput (KB/sec) and CPU seconds for UFS and VxFS, across I/O sizes of 64 KB, 0.5 MB and 2 MB and 1 to 4 files]
Chapter
Cache Policies
The UNIX operating system supplies a standard asynchronous mode for writing to files, in which data is written through a write-back page cache in system memory, to accelerate read and write access. It also calls for a synchronous mode, which writes through the cache immediately, flushing all structures to disk. Both of these methods require data to be copied between user process buffer space and kernel buffers before being written to the disk, and copied back out when read.

As mentioned previously, the VERITAS File System provides two cache policies which enable the file system to circumvent the standard UNIX write-back page cache in system memory.

The first cache policy is a feature called Direct I/O. VERITAS implemented this feature in their file system to provide a mechanism for bypassing the UNIX system buffer cache while retaining the on-disk structure of a file system. This optimization, coupled with very large extents, allows file accesses to operate at raw-disk speed. Direct I/O may be enabled via a program interface, via a mount option, or, with the 2.3 version of the VERITAS File System, automatically based upon the I/O size. This feature is known as Discovered Direct I/O.

The second cache policy available with the 2.3 version of the VERITAS File System is Quick I/O for Database. While Direct I/O improves many types of large I/O performance, the single writer lock policy of the UNIX OS creates a performance bottleneck for some types of file system writes; database application writes are particularly affected. Included in the VERITAS ServerSuite Database Edition 1.0, Quick I/O for Database bypasses the single writer lock policy in the UNIX OS by representing files to applications as character devices. This allows database applications designed to use raw partitions to operate as if on a raw partition, while actually running on a file system.

File System Alignment

The most important requirement for implementing these two VERITAS File System cache policies is that all I/O requests must meet certain alignment criteria. These criteria are usually determined by the disk device driver, the disk controller, and the system memory management hardware and software. First, the file offset must be aligned on a sector boundary. Next, the transfer size must be a multiple of the disk sector size. Finally, depending on the underlying driver, the application buffer may need to be aligned on a sector or page boundary, and sub-page length requests should not cross page boundaries. The method for guaranteeing these requirements, as well as generally improving performance for RAID level volumes, is a technique called file system alignment.
In the 2.3 version of the VERITAS File System this is done automatically if the disks are being managed by the 2.3 version of the VERITAS Volume Manager.
Direct I/O
VERITAS Software has developed a cache policy called Direct I/O in their file system; it provides a mechanism for bypassing the UNIX system buffer cache while retaining the on-disk structure of a file system. The way Direct I/O works involves the way the system buffer cache is handled by the UNIX OS. In the UNIX operating system, once the type independent file system, or VFS, is handed an I/O request, the type dependent file system scans the system buffer cache and verifies whether or not the requested block is in memory. If it is not in memory, the type dependent file system manages the I/O processes that eventually put the requested block into the cache.
Since it is the type dependent file system that manages this process, the VERITAS File System uses this to bypass the UNIX system buffer cache. Once the VERITAS File System returns with the requested block, instead of copying the contents to a system buffer page, it copies the block directly into the application's buffer space, thereby reducing the time and CPU workload imposed by the system buffer cache.

In order to ensure that Direct I/O mode is always enabled safely, all Direct I/O requests must meet certain alignment criteria. These criteria are usually determined by the disk device driver, the disk controller, and the system memory management hardware and software. First, the file offset must be aligned on a sector boundary. Next, the transfer size must be a multiple of the disk sector size. Finally, depending on the underlying driver, the application buffer may need to be aligned on a sector or page boundary, and sub-page length requests should not cross page boundaries. Direct I/O requests which do not meet these alignment requirements, or which might conflict with mapped I/O requests to the same file, are performed as data-synchronous I/O. This optimization, coupled with very large extents, allows file accesses to operate at near raw-disk speed.
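In the benchmarks that follow, Direct I/O is requested through the vxbench -c direct cache advisory documented in the introduction. For completeness, the second command below shows the mount-time route as well; the option names (mincache=direct, convosync=direct) follow common VxFS practice but should be verified against the VxFS 2.3 documentation, and the device and mount point are hypothetical.

# Per-run Direct I/O: 512 KB sequential reads of a 256 MB test file
./vxbench -w read -c direct -i iosize=512k,iocount=512 /mnt/vxfs/testfile

# Mount-time Direct I/O for every file in the file system
mount -F vxfs -o mincache=direct,convosync=direct /dev/vx/dsk/perfvol1 /mnt/vxfs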
Controller Limited Benchmark - 4 way Stripe

While the Fast/Wide SCSI controller bus used in these tests has a theoretical throughput of 20 MB/sec, in practice the real limit approaches 16 MB/sec. Since each SCSI disk in the test platform can produce approximately 6.5 MB/sec, with four disks per controller this creates a bottleneck at the controller level. This was done in order to illustrate the performance differences between the tested technologies in this limited throughput environment.

Controller Limited Read Tests - VxFS Buffered / Discovered Direct / Direct / UNIX raw I/O
I/O Transfer    VxFS Buffered  Discovered Direct  Direct I/O  raw I/O   VxFS Buffered  Discovered Direct  Direct I/O  raw I/O
Block Size KB   KB/sec         I/O KB/sec         KB/sec      KB/sec    CPU sec        I/O CPU sec        CPU sec     CPU sec
64              14732          14733              9392        8582      6.67           6.03               1.26        1.15
256             14723          14687              12867       12683     6.28           6.24               0.76        0.79
512             14732          13684              13722       13745     6.6            0.7                0.85        0.75
1024            14730          14140              14149       14124     6.29           0.49               0.45        0.71
2048            14722          14770              14680       14898     5.91           0.72               0.7         0.74
These controller limited tests were done using the default UFS and VxFS parameters. The file size used in all iterations was 256 MB. The hardware disk configuration used was a single SCSI controller connected to 4 SCSI drives, creating a single 8 GB RAID-0 volume. Controller Limited Read Test Graphs - VxFS Buffered / Discovered Direct / Direct / UNIX raw I/O
[Graph: controller limited read throughput (KB/sec) vs. I/O transfer block size for VxFS Buffered, Discovered Direct I/O, Direct I/O and raw I/O]
These file system read results show the benefit of Discovered Direct I/O. While the I/O size remains below the discovered direct I/O size of 256K, the file system performs standard buffered I/O. Once above that size, both Discovered Direct I/O and Direct I/O throughput climb along the same curve as raw I/O. Note that the raw I/O final throughput of nearly 16 MB/sec is almost the maximum available throughput given the controller limited testing. This illustrates that, in terms of scalability, the VERITAS File System can provide throughput that is very close to the actual hardware limits. This model of providing the highest realized throughput for each installed system scales as you install the VERITAS File System on larger and faster platforms.

The second interesting result is that CPU utilization is high while the Discovered Direct I/O runs are still performing standard buffered I/O, and then drops appreciably once Direct I/O is invoked past the 256 KB block size. This demonstrates the potential for tremendous scalability, which will be realized in testing later in this chapter.
[Graph: controller limited read CPU seconds vs. I/O transfer block size for VxFS Buffered, Discovered Direct I/O, Direct I/O and raw I/O]
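The 256 KB crossover point shown above is a file system tunable rather than a fixed constant. A later test in this report sets max_direct_iosz with vxtunefs; the Discovered Direct I/O threshold can be inspected and adjusted the same way. The parameter name (discovered_direct_iosz), its units, and the mount point below are assumptions to be checked against the vxtunefs documentation for the installed release:

# List the current VxFS tunables for a mounted file system
vxtunefs /mnt/vxfs

# Raise the size at which Discovered Direct I/O switches over to direct I/O
vxtunefs -o discovered_direct_iosz=524288 /mnt/vxfs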
Disk Limited Benchmark - 8 way Stripe

This set of disk limited tests was done using the default UFS and VxFS parameters. The file size used in all iterations was 256 MB. The hardware disk configuration used was 4 SCSI controllers connected to 8 SCSI drives, creating a single 16 GB RAID-0 volume. This set of tests compares the file system read performance throughput of VxFS Buffered, Discovered Direct I/O, Direct I/O and Solaris UNIX raw I/O.

Here again we see the combined advantages of VxFS buffered I/O and Direct I/O performance. As soon as Discovered Direct I/O invokes the Direct I/O mode, the throughput of the three technologies is very similar. We also see the same drop in CPU utilization once Discovered Direct I/O invokes the Direct I/O technology. Again in this series of tests, note that the raw I/O final throughput of nearly 52 MB/sec is the maximum available throughput given the disk limited testing (6.5 MB/sec times 8 drives). This again illustrates that, in terms of scalability, the VERITAS File System can provide throughput that is very close to the actual hardware limits. It also illustrates that standard buffered technology reaches bottlenecks very quickly when pushed. This model of providing the highest realized throughput for each installed system scales as you install the VERITAS File System on larger and faster platforms.
Disk Limited Read Test Graphs - VxFS Buffered / Discovered Direct I/O / Direct I/O / raw I/O

[Graphs: disk limited read throughput (KB/sec) and CPU seconds vs. I/O transfer block size (64 KB to 2048 KB) for VxFS Buffered, Discovered Direct I/O, Direct I/O and raw I/O]
The next series of disk limited benchmarks compares the performance throughput of Solaris UNIX raw I/O and UFS buffered I/O with the VERITAS File System Discovered Direct I/O and Direct I/O. This entire series of disk limited tests used a specific mode of the vxbench program, defined here as multiple application mode, in which multiple threads perform their respective file I/O to their own unique file. This workload is similar to that of a standard server environment.

These tests were done using the default UFS and default VxFS parameters with one change: the vxtunefs parameter max_direct_iosz was set to 2 MB. The file size used in all iterations was 256 MB. The hardware disk configuration used was 4 SCSI controllers connected to 8 SCSI drives, creating a single 16 GB RAID-0 volume.

Disk Limited Multiple Application Read Tests
Number    I/O Transfer    UFS Buffered  VxFS Discovered  VxFS Direct  UFS Buffered  VxFS Discovered  VxFS Direct
of Files  Block Size KB   KB/sec        Direct KB/sec    KB/sec       CPU sec       Direct CPU sec   CPU sec
1         64              11818         31797            10615        5.8           6.6              1.3
2         64              14600         28665            11727        13.4          14.9             2.8
4         64              12992         28855            16797        31            31.2             5.5
8         64              13406         30179            26025        69.3          64.2             9.8
16        64              15244         30553            27639        132.4         133.6            21.4
1         256             11722         34146            20276        5.8           5.8              0.8
2         256             12895         28275            18421        12.6          13.9             1.7
4         256             12897         29193            25344        30.2          29.9             3.6
8         256             13427         29762            28598        67.2          63.2             7.2
16        256             15170         30385            29404        127.8         130.8            15
1         512             11771         29738            30181        5.6           0.6              0.8
2         512             14791         24510            25317        12.9          1.6              1.8
4         512             12952         26814            27085        30.1          3.4              3.1
8         512             13390         29435            29477        66.7          6.6              6.4
16        512             15228         30705            30212        132.9         13.4             12.9
1         1024            11781         45345            38373        5.2           0.8              0.8
2         1024            14482         26498            26531        13.5          1.6              1.7
4         1024            12997         28668            29143        30.1          3.1              3.3
8         1024            13461         30262            30191        67.8          7                6.2
16        1024            15195         31314            31264        131.7         13.5             13.5
1         2048            11795         48987            47010        5.5           0.6              0.7
2         2048            14152         27654            27128        13.3          1.4              1.3
4         2048            13209         29916            30138        30.3          3.4              3.1
8         2048            13309         30991            30737        68.9          7.3              6.5
16        2048            15266         31650            31195        131           13.8             13.5
These test results offer a good example of how the Discovered Direct I/O feature combines the strengths of VxFS buffered and Direct I/O. Note that while the UFS throughput results remain relatively flat, VxFS Discovered Direct I/O provides better initial throughput than Direct I/O and then follows the same rising curve as Direct I/O. The CPU time measurements again indicate the CPU resource differences between buffered file system activity and Direct I/O file system activity.
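A multiple application run of this kind is produced simply by passing several file names to vxbench, one per simulated application. A sketch of one cell of the table above (the eight-file, 512 KB case) follows; the file paths are hypothetical:

# Eight simulated applications, each sequentially reading its own 256 MB
# file in 512 KB transfers (512 KB x 512 I/Os = 256 MB)
./vxbench -w read -i iosize=512k,iocount=512 \
    /mnt/vxfs/app1 /mnt/vxfs/app2 /mnt/vxfs/app3 /mnt/vxfs/app4 \
    /mnt/vxfs/app5 /mnt/vxfs/app6 /mnt/vxfs/app7 /mnt/vxfs/app8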
[Graphs: disk limited multiple application read throughput (KB/sec) and CPU seconds for UFS Buffered, VxFS Discovered Direct and VxFS Direct, across I/O transfer block sizes and file counts]
Disk Limited Multiple App. Read Test Graphs - Tuned UFS Buffered / VxFS Discovered Direct

[Graphs: read throughput (KB/sec) and CPU seconds for Tuned UFS Buffered vs. VxFS Discovered Direct, across I/O transfer block sizes (64 KB to 2048 KB) and file counts (1 to 16)]
Tuned UFS demonstrates fairly consistent throughput, while VxFS again demonstrates higher throughput during the majority of the testing. These file system read performance numbers indicate that proper tuning of Solaris UFS can increase overall throughput while reducing CPU utilization compared to the default Solaris UFS. However, VxFS with Discovered Direct I/O still provides better throughput with reduced CPU utilization for large file I/O.

Additionally, tuning UFS for larger file I/O does impact general purpose servers that continue to perform a mix of I/O. For example, tuning UFS for large file I/O, with more than one application, can cause serious degradation in system performance, due to the OS paging mechanism and excessive CPU utilization. A better alternative is VxFS with Discovered Direct I/O.

Tuning UFS

There are a number of information resources available in the UNIX industry which describe tuning UFS for different application workloads. Among those, DBA Systems, Inc. of Melbourne, FL has done a lot of work in the area of tuning UFS for large file I/O. A report completed by William Roeder of DBA Systems, Inc. outlines the basic steps involved in performing this tuning; the information presented there is repeated here only in summary form.

In tuning UFS one of the most important settings is the maxcontig parameter. By using this to raise the number of contiguous file system blocks that UFS allocates for a file, the UFS file system will operate more like an extent based file system; with the 8 KB block size used here, maxcontig=1536 corresponds to clusters of up to 12 MB. Other settings that can be used for tuning UFS for large block I/O are fragsize, nbpi and nrpos. Using this information, the UFS file system created for this large file I/O testing was built with the following command:

mkfs -F ufs -o nsect=80,ntrack=19,bsize=8192,fragsize=8192,cgsize=16,free=10,rps=90,nbpi=32768,opt=t,apc=0,gap=0,nrpos=1,maxcontig=1536 /dev/vol/rdsk/vol01 256880
Passing the GB/sec Barrier

The hardware platform included:

36 Genroco Ultra-Wide S-Bus SCSI host adapter cards

Storage Hardware: 4 Terabytes of RAID-5 disk arrays, provided by Maximum Strategy Gen5-S XL Disk Storage Servers. Each disk server was attached via multiple UltraSCSI channels to the UltraSCSI host adapters on the ULTRA Enterprise I/O boards. Disk attachment was via the Ultra-Wide SCSI channels provided by the Genroco S-Bus to UltraSCSI host adapters.

The software platform consisted of:

Solaris 2.5.1
VERITAS Volume Manager (VxVM 2.3)
VERITAS File System (VxFS 2.3 Beta)
VERITAS Software vxbench benchmark program
Instrumental's Performance Manager (PerfStat)

The UltraSPARC 6000 was configured with 4 UltraSPARC CPU boards, each containing 2 CPUs and 832 MB of RAM, installed in 4 of its 16 slots. The remaining 12 slots were filled with UltraSPARC I/O cards, each containing 3 S-Bus card slots. This provided a total of 36 S-Bus card slots, into which the 36 Genroco Ultra-Wide S-Bus SCSI host adapter cards were installed. Each of the Genroco SCSI adapters was attached to one of the six ports on the Maximum Strategy Gen5-S XL RAID-5 disk arrays. In this configuration each of the ports on the Gen5-S XLs appears to the VERITAS Volume Manager as a single drive, even though it actually consists of a RAID-5 array of disks. The VERITAS Volume Manager was then used to configure all 6 Gen5-S XLs as a single RAID-0, 36 column array. Each column used a stripe width of 2 MB, for a full stripe size of 72 MB.

With the machine configured in this manner, the testing was performed by starting 30 processes using vxbench, with each process performing I/O on one sixth of a full stripe, or 12 MB. In the first series of tests these 30 processes performed successive 12 MB I/O operations in parallel on a single 2 GB file. The first performance throughput numbers were measured using 30 Discovered Direct I/O threads performing reads on the same 2 GB file, multiple times. Using this method we demonstrated 960 MB/sec file system throughput. The same method was used to produce a multithreaded write test on a single large file, with a throughput of 598 MB/sec.

At this point the testers determined that the throughput numbers, while impressive, were wholly constrained by the raw disk speed. This determination was reached since Solaris raw I/O generated the same performance throughput as VxFS. As a result, with all I/O slots in the UltraSPARC 6000 filled, the testers felt that the 1024 MB/sec barrier could be reached by hooking up additional drive arrays to the server. To accomplish this, several smaller disk arrays were attached to the SCSI host adapters built into the UltraSPARC I/O cards. Multiple file I/O operations were then run in parallel, by performing a single I/O operation on the combined Gen5-S XL large array and a single I/O operation on each of the smaller drive arrays. The final performance throughput measured was 1049 MB/sec while performing file system reads on multiple files.
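The report does not give the exact command lines used for the GB/sec runs. Under the assumption that each of the 30 reader processes was a separate vxbench invocation against the shared 2 GB file, a sketch using only the options documented earlier might look like the following; the file path, I/O count and repeat count are illustrative, and the report does not describe how the file was partitioned among the processes:

# Start 30 parallel vxbench readers issuing 12 MB reads of the striped file;
# at this transfer size Discovered Direct I/O switches to direct I/O on its own
i=1
while [ $i -le 30 ]; do
    ./vxbench -w read -i iosize=12288k,iocount=170,nrep=10 /bigfs/stripefile &
    i=`expr $i + 1`
done
wait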
As an interesting side note, the highest performance throughput measured on a single Genroco Ultra SCSI controller was 27 MB/sec. Once the system was configured with 36 controllers in parallel, the per-controller throughput only decreased to 26.67 MB/sec. This demonstrates continued impressive scalability for VxFS.
Chapter
Quick I/O for Database
Database applications have found that database performance will increase if the system buffer cache employs a most frequently used caching policy.

Another database application bottleneck with file systems is that the UNIX OS maintains a single writer / multiple reader access policy on each individual file block. This allows the OS to guarantee that each block is updated by only a single writer at a time, which keeps file blocks from being corrupted by multiple simultaneous writes. However, database applications lock data updates at a much more granular level, sometimes going so far as to lock updates based upon fields in a database record. As a result, locking an entire file block for data contained in a single field slows down database updates.

Bypassing the file system and using a raw I/O interface allows the database vendor to lock system writes in the manner most efficient for their application. Using raw I/O allows a database vendor to employ an I/O system that is optimized to provide the best performance for their application. The largest problem that using raw I/O creates is the fact that raw I/O disks do not contain a file system; therefore the data on the disks cannot be accessed using file system based tools, such as backup programs.

raw I/O Limitations

The largest single category of limitations with raw I/O partitions is their management. Since the application manages all file system services, any services such as backup, administration, and restore must be done within the application. Most of these tools treat the raw partition as one large image rather than separate files. System backups and restores must be done as a whole image, and performing maintenance on any one section of the raw system can be very time consuming. As database servers grow in size, the management problems associated with raw partitions increase.
Multiple Thread Random I/O Benchmarks
All of the multiple thread random tests were done using the default UFS and default VxFS parameters. The file size used in all iterations was 1 GB, and the block size used in all iterations was 2 KB. The hardware disk configuration used was 4 SCSI controllers connected to 16 SCSI drives, creating four 8 GB RAID-0 volumes. Finally, twenty 1 GB files were pre-allocated, five on each volume, for these tests.

These benchmarks illustrate that for reads, all of the technologies provide very similar throughput. In some cases Quick I/O for Database actually provides slightly better throughput than the raw partition; this is likely due to some of the alignment features inherent in the combination of VERITAS Volume Manager and the VERITAS File System. The one large difference in the read benchmarks is the marked decrease in CPU utilization when changing from buffered I/O to non-buffered I/O. Again, Direct I/O and Quick I/O for Database perform on par with raw I/O.

However, the write benchmark tests make apparent how much of a performance cost the single writer lock policy in the UNIX OS incurs. These bottlenecks exist with this type of I/O stream because I/O queues up behind the UNIX locking. Note that while buffered and Direct I/O hit throughput bottlenecks at their respective levels, the Quick I/O for Database technology demonstrates impressive throughput while circumventing this system limitation.
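Using the vxbench options documented in the introduction, a 16-thread random I/O run of this shape could be expressed roughly as follows. The file path is hypothetical, the read percentage is arbitrary, and maxfilesize is given in KB per the option summary; the report ran separate random read and random write tests rather than a single mixed job:

# 16 threads issuing 2 KB random I/Os across a pre-allocated 1 GB file
./vxbench -w rand_mixed \
    -i nthreads=16,iosize=2k,iocount=64k,maxfilesize=1048576,rdpct=70 \
    /mnt/vol1/dbfile01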
The following combination graphs show these results for the 16 thread tests:

[Graphs: 16-thread random I/O throughput (KB/sec) and CPU seconds for raw I/O, Quick I/O, Direct I/O and VxFS Buffered]
Chapter
Direct I/O vs. Quick I/O for Database
These benchmark tests continued the previous chapter's testing by performing multiple thread, file system write tests comparing Direct I/O with Quick I/O for Database for large block I/O, typical of imaging and other multimedia application workloads. All of these comparison tests were done using the default VxFS parameters. The file size used in all iterations was 256 MB. The hardware disk configuration used was 4 SCSI controllers connected to 8 SCSI drives, creating a single 16 GB RAID-0 volume.

Note that the performance curves begin to separate when four or more threads are used. Quick I/O for Database continues to demonstrate better throughput as the testing continues, while Direct I/O shows a regular decrease in performance whenever the block size is smaller. The CPU utilization curves appear very similar until the larger thread counts are reached; there the Quick I/O for Database technology uses more CPU time, attributable mostly to thread re-synchronization rather than file system activity.
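One cell of this write matrix corresponds to a vxbench invocation along the following lines; the file path is hypothetical, the iocount value is illustrative, and the Quick I/O for Database runs address the same data through the character-device interface that the product presents rather than through the regular file name:

# Eight threads writing the same file with 1 MB direct writes
./vxbench -w write -c direct -i nthreads=8,iosize=1024k,iocount=256 \
    /mnt/vxfs/dbtest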
The following are the graphs of these results:

Direct I/O vs. Quick I/O for Database Multiple Thread Write Test Graphs

[Graphs: write throughput (KB/sec) and CPU seconds for Direct I/O vs. Quick I/O, across I/O transfer block sizes of 64 KB to 2048 KB and 1 to 32 threads]
These results demonstrate an interesting fact about Direct I/O: as the I/O transfer size increases, the Direct I/O throughput increases. This shows that as the requested I/O gets larger, the effect of the UNIX single writer lock policy is lessened. The idea behind this result is that at small I/O sizes the disk subsystem sits waiting for work because of the lock bottleneck; as the I/O size increases, the disk subsystem has more work to do per request, and the bottleneck shifts from the file system lock toward the disk subsystem itself.

This concept is tested further in the last series of benchmark tests. Here, instead of performing multiple thread writes to a single file, the comparison of Direct I/O and Quick I/O for Database involves performing writes to different files, simulating the activity of multiple applications. This is typical of the workload imposed in a multiuser server environment.
Direct I/O vs. Quick I/O for Database Multiple Application Write Tests

Files     I/O Transfer    Direct I/O  Quick I/O for  Direct I/O  Quick I/O for
Written   Block Size KB   KB/sec      DB KB/sec      CPU sec     DB CPU sec
1         64              6122        5875           0.89        1.15
2         64              10803       10989          2.31        2.23
4         64              21718       21176          4.93        5.02
8         64              25750       26027          10.29       10.39
12        64              27154       27581          15.58       15.38
1         256             15826       15681          1.06        1.07
2         256             22058       22534          1.53        1.58
4         256             27187       27289          3.11        3.49
8         256             29542       29084          6.8         6.94
12        256             30014       29944          10.31       10.38
1         512             23400       23683          0.71        0.68
2         512             28667       28878          1.34        1.53
4         512             26731       26716          2.75        2.93
8         512             30202       30199          6.26        5.92
12        512             30242       30223          9.66        9.62
1         1024            31935       31335          0.54        0.76
2         1024            37247       33647          1.27        1.14
4         1024            38730       36731          3.05        2.91
8         1024            38269       38260          5.93        5.49
12        1024            37413       37454          8.39        8.55
1         2048            36134       36313          0.61        0.68
2         2048            41792       43808          1.32        1.43
4         2048            43821       43474          2.76        2.89
8         2048            43575       43398          5.79        5.51
12        2048            42729       42627          8.65        8.32
This time the performance curves are almost identical, an interesting result considering the different technologies. It demonstrates that for large I/O in a multiuser environment using Direct I/O technology, the UNIX single writer lock policy has much less impact on overall system throughput. The following are the graphs of these results:
Direct I/O vs. Quick I/O for Database Multiple Application Write Test Graphs
[Graphs: multiple application write throughput (KB/sec) and CPU seconds for Direct I/O vs. Quick I/O for Database, across I/O transfer block sizes of 64 KB to 2048 KB and 1 to 12 files]
Chapter
Conclusion
This performance report focused on the performance of the 2.3 version of the VERITAS File System. It included a discussion of the key performance components in the VERITAS File System, and it presented a series of benchmark tests comparing the throughput and CPU utilization of different VERITAS File System component technologies, as well as tests comparing the VERITAS File System (VxFS) with the Solaris UNIX File System (UFS).

It is clear that the VERITAS File System provides technologies for improved performance at smaller I/O sizes, with features such as extent based allocation, as well as technologies for advanced performance with large I/O, using the Direct I/O technology. With the release of the 2.3 version of the VERITAS File System, both buffered and Direct I/O performance can be combined in one file system with the Discovered Direct I/O feature. For database implementations, Quick I/O for Database provides throughput very close to that of database servers running on raw partitions.

This report outlined the performance advantages and system scalability of the VERITAS File System. Using the VERITAS File System and a Sun UltraSPARC server, VERITAS has been able to generate file system performance throughput in excess of 1 GB/sec.

In closing, VERITAS Software presents software technology that provides commercial class performance, availability, and manageability. This report described the very powerful performance component of VERITAS Software. Once the performance component is understood, it is important to realize that this performance does not come from a standard UNIX file system, but comes coupled with the highly available, journaled VERITAS File System.