Designing a RAID 5 group

In a previous blog post, “Disk I/O for Oracle DBAs,” we talked about a simple way of estimating disk I/O. Below is a list of disk types and their approximate per-disk IOPS:

Enterprise Flash Drives (EFD): 2,000 – 4,000 IOPS
Fibre Channel 15k RPM drives: 180 IOPS
Fibre Channel 10k RPM drives: 120 IOPS
SATA 7,200 RPM drives: 90 IOPS

Then we used some simple math to estimate IOPS:

Number of disks = 50
Type of disk = Fibre Channel 15k RPM drives: 180 IOPS

So the total IOPS would look like: 50 disks x 180 IOPS = ~9,000 IOPS
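If you like to keep this rule of thumb handy, here is a minimal Python sketch of the same estimate (the dictionary values are just the ballpark figures from the list above, not measured numbers for any specific drive model):

```python
# Rough per-disk IOPS estimates from the list above (ballpark figures only).
IOPS_PER_DISK = {
    "EFD": 2000,             # low end of the 2,000 - 4,000 range
    "FC 15k RPM": 180,
    "FC 10k RPM": 120,
    "SATA 7,200 RPM": 90,
}

def estimate_total_iops(disk_count, disk_type):
    """Back-of-the-envelope estimate: number of disks x per-disk IOPS."""
    return disk_count * IOPS_PER_DISK[disk_type]

print(estimate_total_iops(50, "FC 15k RPM"))   # 50 x 180 = 9000
```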

Let’s complicate the picture a bit by adding RAID 5 into the mix. As most DBAs know, RAID 5 uses block-level striping with distributed parity, spreading both data and parity evenly across all the disks in the RAID 5 group. Use this link to read more about RAID 5 and the other RAID levels:

Wikipedia The Free Encyclopedia: Standard RAID Levels
http://en.wikipedia.org/wiki/Standard_RAID_levels

By the way, what does RAID stand for? Redundant Array of Independent (or Inexpensive) Disks. This is where we run into some complexity, because different models of storage arrays have different guidelines for creating RAID 5 groups. For example, the recommendation for a RAID 5 group on one storage array might be 4 + 1, which means four (4) disks’ worth of usable capacity and the equivalent of one disk used for parity. The calculation for determining usable capacity looks like this:

Smallest disk capacity x (total number of disks – 1), or the short version: Smin x (n – 1)

Let’s say we are using 1 TB disks in a 4 + 1 RAID 5 group; then our calculation looks like this:

1 TB x (5 – 1) = 4 TB of usable data capacity!
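That rule is easy to capture in a small helper. Here is a minimal sketch of Smin x (n – 1) in Python (the function name is my own):

```python
def raid5_usable_capacity(smallest_disk_tb, total_disks):
    """Usable RAID 5 capacity: smallest disk x (number of disks - 1)."""
    return smallest_disk_tb * (total_disks - 1)

print(raid5_usable_capacity(1, 5))   # 4 + 1 group of 1 TB disks -> 4 TB usable
```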

Great, we know the capacity, and it’s simple to estimate our disk IOPS:

4 Fibre Channel 15k RPM drives x 180 IOPS = ~720 IOPS

So, in summary, with a RAID 5 (4 + 1) group made of 1 TB 15k RPM Fibre Channel drives we can estimate:

Group Size                 Estimated Capacity               Estimated IOPS
(4 + 1)                         4 TB                                        ~720

Did you see what happened? We lost both capacity and IOPS by using RAID 5! Due to the distributed parity, the equivalent of one disk is dedicated to providing resiliency in the case of a disk failure. Most of us would gladly dedicate one disk if it meant keeping our Oracle databases up and running. Let’s do the same calculations for a larger 6 + 1 disk group, keeping everything else the same.

Capacity:  1 TB x (7 – 1) = 6 TB
IOPS:        6 x 180 IOPS = 1080 IOPS

Let’s add to our table:

Group Size                 Estimated Capacity               Estimated IOPS
(4 + 1)                         4 TB                                        ~720
(6 + 1)                         6 TB                                        ~1080
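One way to reproduce the rows of this table (purely illustrative; the helper and its name are my own) is to combine the capacity and IOPS rules into a single function:

```python
def raid5_group_estimate(data_disks, disk_capacity_tb, iops_per_disk):
    """Estimate usable capacity and IOPS for a RAID 5 (data_disks + 1) group."""
    capacity_tb = disk_capacity_tb * data_disks    # Smin x (n - 1)
    iops = data_disks * iops_per_disk              # data disks x per-disk IOPS
    return capacity_tb, iops

for data_disks in (4, 6):                          # the (4 + 1) and (6 + 1) groups
    cap, iops = raid5_group_estimate(data_disks, 1, 180)
    print(f"({data_disks} + 1): {cap} TB, ~{iops} IOPS")
```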

Interesting: a RAID 5 group consisting of 7 disks offers 50% more IOPS than a group with 5 disks, with the same resiliency for each group (one disk’s worth of parity). It would seem the larger the RAID group, the less disk capacity is lost and the greater the performance. Remember, though, that the trump card is the RAID 5 group size that works most efficiently in your EMC storage array.

Now let’s have some fun by adding a RAID 5 (4 + 1) group of Enterprise Flash Drives (EFDs) to our table.

Capacity:  400 GB x (5 – 1) = 1.6 TB
IOPS:        4 x 2000 IOPS = 8000 IOPS (using the low end of the EFD range)

Group Size                 Estimated Capacity               Estimated IOPS
(4 + 1)                         4 TB                                        ~720
(6 + 1)                         6 TB                                        ~1080
(4 + 1) EFD                 1.6 TB                                     ~8000
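Plugging the EFD numbers into the same helper sketched above (400 GB = 0.4 TB per disk, 2000 IOPS at the low end of the EFD range) reproduces that last row:

```python
cap, iops = raid5_group_estimate(4, 0.4, 2000)     # 4 + 1 group of 400 GB EFDs
print(f"(4 + 1) EFD: {cap} TB, ~{iops} IOPS")      # 1.6 TB, ~8000 IOPS
```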

The downside is that we lose some capacity, since our EFD disks are only 400 GB each. Still, the ~8000 IOPS offer us the opportunity to significantly reduce access times and increase the performance of our database. One idea is to use the new 11gR2 Flash Cache feature with a small EFD RAID 5 group to increase the overall performance of the database.

In the next blog post we will talk about how EMC improves the performance of RAID 5 using array caching. If you are interested in some assistance with storage design, contact your EMC Sales Representative or use this link:

http://www.emc.com/services/portfolio/implementation/design-implementation/index.htm

Have a great day!


About Sam Lucido

I’m a family man who loves playing with the kids, gardening, and sitting outside talking with the neighbors. I work for EMC, using my 14 years of experience with Oracle technology to talk with customers about some of the great storage and virtualization solutions we have to offer.

2 Responses to Designing a RAID 5 group

  1. Sam Lucido says:

    Thanks to Matt Kaberlein for proof reading this blog!

  2. Marc Jellinek says:

    Using IOPS to calculate storage and performance requirements (alone) is very misleading. IOPS started out as a technical measure but has since been co-opted by the marketing folks. They know us techies like acronyms and numbers.

    They should be used as a LOOSE guide, especially where RAID5 is concerned.

    There is a write penalty with RAID5 when compared with a straight disk or another method that doesn’t write parity.

    As you mentioned, RAID5 writes a parity block for every N-1 blocks written (where N = the number of disks in the stripe).

    Let’s say you’ve created a RAID5 stripe using 9 disks (contrived example in order to make the math easier) and an 8K block size.

    So if your block size is 8K and you want to write 63K (slightly less than 8 blocks), you are actually writing 9 blocks, 8 data blocks and 1 parity block. Data block 1 is written to disk 1, data block 2 is written to disk 2 and so on. The parity block is written to disk 9.

    So for every 8 input/output operations, there is a 9th input/output operation performed, to either calculate and read the parity block or to write/rewrite the parity block. Call it a performance penalty of slightly more than .125.

    You don’t lose an IOP, you are forcing additional IOPS.

    This is valid, if you are writing data that requires a number of blocks equal to a complete stripe. That’s why my contrived example was 63K… it required more than 7 blocks but less than 8 blocks to be written… plus the parity block.

    Now let’s say that you are writing slightly more than can be held in a single stripe, 71K.

    You’ll write 64K of data in 8 data blocks, plus 8K in a parity block. You’ll write the remaining 7K to another data block… and update the 8K parity block associated with that stripe. That’s 16K in parity (two 8K parity blocks) associated with 71K of data.

    This example will hold up if you are writing large blocks of data…. data export, ETL, initially loading a database, etc. You can be assured of writing complete stripes of data and the “tail ends” will be relatively small in comparison with the number of complete stripes written.

    What’s even more fun is figuring out the IO requirements for an OLTP system that uses small, atomic transactions. If each transaction writes less data than a single stripe, you may only wind up writing 3 or 4 blocks. But you have to also recalculate and rewrite the parity block associated with that stripe.

    Using the example above, the parity overhead isn’t slightly more than .125 (one parity block write for every 8 data block writes), it’s an extra parity write for every 3 or 4 blocks (overhead of .25-.33).
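    A rough way to put this parity overhead into numbers (a sketch only, in Python; it ignores read-modify-write mechanics and simply charges one parity-block write per stripe touched) might look like this:

    ```python
    import math

    def raid5_write_cost(data_kb, block_kb=8, data_disks=8):
        """Blocks written for a single RAID 5 write: data blocks plus one
        parity-block write per (full or partial) stripe touched."""
        data_blocks = math.ceil(data_kb / block_kb)             # e.g. 63K -> 8 blocks
        parity_blocks = math.ceil(data_blocks / data_disks)     # one per stripe touched
        return data_blocks, parity_blocks

    for kb in (63, 71, 24):   # full stripe, a stripe and a bit, a small OLTP-style write
        data, parity = raid5_write_cost(kb)
        print(f"{kb}K -> {data} data + {parity} parity block(s), "
              f"overhead {parity / data:.2f}")
    ```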

    The real mess starts when you bring in SAN filers which virtualize the exposed file system (like NetApp does with WAFL or EMC does with Block Storage Virtualization) or databases like Oracle and SQL Server which write their data to “pages”. Figuring out that overhead makes me rumbly in my tumbly.

    Add the marketing mix of terms like “Total IOPS” vs. “Random Read IOPS”, “Random Write IOPS”, “Sequential Read IOPS” and “Sequential Write IOPS”, plus the fact that no system is totally read-only or write-only and that it takes some doing to make sure all reads and writes are sequential… well, you start to see how these numbers become more of a very loose guide than anything prescriptive.

    I’m interested in attacking this from a different, complementary angle.

    The traditional way of dealing with storage from a database perspective is to separate data and transaction logs (my expertise is in Microsoft SQL Server… the analogous files in Oracle would be redo logs).

    With DAS (direct attached storage), the ideal is to have data stored on storage with a dedicated controller and have transaction logs stored on separate storage with a separate dedicated controller. The thought is to separate I/O requests from data and transactions so they don’t contend with each other.

    But the interesting thing is: if you want more throughput, throw more spindles at it (and more potential IOPS capacity). I’m wondering if anyone has taken a look at how this approach applies to SANs.

    Where does it become advantageous to co-mingle data and transaction logs on the same storage, through the same controller, against a huge number of spindles; or is it better to separate data and transaction logs onto their own discrete storage and controllers? I’m assuming that at some point the I/O benefits of many spindles outweigh the potential for contention when both data and transactions are managed on the same disks through the same controller.

    Do you have any insight into this issue?

    Since you are at EMC… if you know Duwayne Harrison… tell him I said hello!

    regards,

    Marc Jellinek
