
I've been asked to look at a database/application server that's "performing slow". No one could really tell me what was performing slow (except for the database backups) so this has been a discovery process.

This server is a Dell PowerEdge R720 running Windows Server 2008 R2 and SQL Server 2012 Standard (11.0.2100.60) (x64).

-Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, 2500 Mhz, 6 Core(s), 12 Logical Processor(s)

-16GB RAM

-DELL PERC S110 SCSI Disk Device running as a software RAID 5.

The disk is partitioned into two local drives: C and D. The OS files are all on C. SQL Server is installed on D, and the database files (.mdf and .ldf) are also on D. In addition, we're writing our nightly backups to the D drive! :-)

The database mdf file is around 6GB. Backups are taking 30 minutes.

A quick look at the wait types revealed that the WRITELOG wait type was the most prevalent - consuming 69%:

[Screenshot: wait statistics with WRITELOG at ~69% of total wait time]
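For reference, the wait-type breakdown came from a query against sys.dm_os_wait_stats along these lines (the exclusion list of benign background waits below is illustrative, not exhaustive):

```sql
-- Top waits since the last restart, as a share of total wait time.
-- The NOT IN filter excludes a few common benign/background waits;
-- this is an abbreviated list, not a complete one.
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    CAST(100.0 * wait_time_ms
         / SUM(wait_time_ms) OVER () AS DECIMAL(5,2)) AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN
    (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
     N'SQLTRACE_BUFFER_FLUSH', N'CHECKPOINT_QUEUE', N'XE_TIMER_EVENT',
     N'REQUEST_FOR_DEADLOCK_SEARCH', N'BROKER_TO_FLUSH', N'DIRTY_PAGE_POLL')
ORDER BY wait_time_ms DESC;
```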

Suspecting transaction log/disk performance issues, I enabled various PerfMon counters and established baseline averages over a couple of days. The PerfMon counters include, but are not limited to:

\PhysicalDisk(0 C: D:)\Avg. Disk sec/Write - Avg = 0.073 (≈73 ms per write)
\SQLServer:Databases\Log Flush Wait Time - Avg = 71.403 ms
\SQLServer:Databases\Log Flushes/sec - Avg = 1.910
\SQLServer:Databases\Log Bytes Flushed/sec - Avg = 1724.604

The transaction log was originally configured with an initial size of 100MB and set to grow at 10%. When I first logged onto the machine, the transaction log had grown to 5GB across 256 VLFs, suggesting some log fragmentation.
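(On SQL Server 2012 the VLF count is easiest to get from DBCC LOGINFO, which returns one row per VLF; sys.dm_db_log_info didn't arrive until 2016 SP2. Database name below is a placeholder.)

```sql
-- One row per VLF in the current database's transaction log,
-- so the returned row count is the VLF count.
USE MyDatabase;  -- placeholder database name
DBCC LOGINFO;
```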

Paul Randal's query against sys.dm_io_virtual_file_stats reveals serious write latency for both the data and log files - although, contrary to what the WRITELOG wait type suggests, the data file's write latency is considerably higher:

Data file: 1,369 ms
Log file: 66 ms
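Those figures are average write stalls in milliseconds per write. A minimal version of that kind of query (same idea as Paul Randal's, not his exact script):

```sql
-- Average write latency (ms) per database file since the last restart
SELECT
    DB_NAME(vfs.database_id)      AS database_name,
    mf.physical_name,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes
    END                           AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON  mf.database_id = vfs.database_id
  AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;
```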

Based on the article by Kimberly Tripp, I "rebuilt" the transaction log and assigned it a size of 2GB (somewhat arbitrarily), yielding 24 VLFs.
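The rebuild itself was nothing exotic - roughly the following, with placeholder database/logical file names, and with the log effectively empty before shrinking:

```sql
USE MyDatabase;  -- placeholder names throughout

-- Shrink the bloated log as far as possible...
DBCC SHRINKFILE (MyDatabase_log, 1);

-- ...then grow it back to the target size in a few steps.
-- On SQL Server 2012, each growth between 64MB and 1GB adds 8 VLFs
-- and each growth over 1GB adds 16, so stepping up to 2GB like this
-- lands in the neighborhood of 24 VLFs.
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, SIZE = 512MB);
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, SIZE = 1024MB);
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, SIZE = 2048MB);
```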

This seemed to decrease both the WRITELOG wait type percentage (now at ~61%) and the \SQLServer:Databases\Log Flush Wait Time (now at ~48 ms).

But this still seems high.

I realize that the data files, log files, and backups should each be on a different drive for parallel reads/writes and maximum throughput. Best practices aside, we have other production systems configured the same way as this one, yet they run better - e.g., Avg Log Flush Wait Time < 3 ms.

What else could I look at to figure out why this particular server is slower with regard to disk read/write response time and backups?

Could index fragmentation or fill factor settings cause performance this poor? What about disk fragmentation?
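For what it's worth, index fragmentation is easy to rule in or out with sys.dm_db_index_physical_stats (LIMITED mode keeps the scan cheap); disk-level fragmentation would be checked with the OS defrag tools instead:

```sql
-- Logical fragmentation for non-trivial indexes in the current
-- database (LIMITED mode avoids reading leaf pages).
SELECT
    OBJECT_NAME(ips.object_id)       AS table_name,
    i.name                           AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON  i.object_id = ips.object_id
  AND i.index_id  = ips.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

That said, index fragmentation mostly hurts large range reads; it wouldn't be the obvious explanation for high log-flush (WRITELOG) latency.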

If I need to tell the client we need another disk, then so be it - I just want to be certain before spending money.

Any other ideas???

Comments:

  • Points: 1) RAID 5 is a poorly performing configuration for database files, especially for writes, and 2) partitioning is even worse. Yes, it works and is reliable, but it's definitely a low-performance disk configuration that I would only use myself for development or the most lightweight of apps.
    – RBarryYoung
    Commented Jun 9, 2015 at 14:41
  • @RBarryYoung you should change this to an answer. The only thing missing is to check whether the RAID array is degraded because of a failed disk, but this disk configuration will never perform well regardless. To make certain you have the data, you can get the PAL toolkit (pal.codeplex.com), run a performance trace, and then generate a report from that trace.
    – Spörri
    Commented Jun 9, 2015 at 14:52
  • This is not a log problem. If backups, which are sequential, run at only 7MB/s then your IO system is overloaded. That's probably why all IO is slow, not just log IO. Optimizing the RAID might bring you some gains but not the gains you need. 256 VLFs are not a major problem and they can't cause the slow backup.
    – usr
    Commented Jun 9, 2015 at 21:02

1 Answer


Points:

  1. RAID 5 is a poorly performing configuration for database files, especially for writes, and

  2. Partitioning is even worse.

Yes, it works and is reliable, but it's definitely a low-performance disk configuration that I would only use myself for development or the most lightweight of apps. The standard configuration for the physical host of a SQL Server is separate physical volumes for the system and data disks, using RAID 1+0.

I'll also note @Spörri's recommendation: first, check whether the RAID array is degraded because of a failed disk.
