Configuring Servers for Streaming, Part Two
More About RAID Parity
RAID parity protects data without keeping a full second copy of it. Instead, the array stores parity information computed from the data. Parity is quite small compared to the actual data, but it can be used to re-create the original data should it be destroyed as a result of a disk failure. Sophisticated algorithms are used to code and decode parity.
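To make the idea of re-creating data from parity concrete, here is a minimal sketch. It assumes single parity computed as a bytewise XOR, which is the scheme commonly used by the parity-based levels described later (RAID 3, 4, and 5). The function names and the sample blocks are illustrative only and are not taken from any real controller.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the bytewise XOR of a list of equal-length blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Re-create the one missing block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])

# Hypothetical data: one block per data disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the second disk failed and rebuild its block.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, the same routine both generates the parity and rebuilds a lost block. Note that one parity block protects against the loss of only one disk per stripe.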
More About RAID Levels
There are six main types of RAID array architectures: RAID 0 through RAID 5. Counting the combination of RAID 0 and RAID 1 (RAID 0+1, sometimes called RAID 10), there are actually seven variants of RAID arrays. Each except RAID 0 provides disk fault tolerance, and each represents trade-offs that depend on the characteristics of the application workload. The trade-offs of the various RAID levels concern the speed of reading data, the speed of writing data, and the kind of I/O the application performs. Let’s examine the performance attributes of these RAID levels as applied to streaming media requirements.
RAID Level 0
RAID 0 stripes data across drives without writing parity. RAID 0 is fast, but because it is not redundant, it has no fault tolerance: losing one disk means losing the data on the array. Because backup and restore times can be slow given the size of most streaming media content, RAID level 0 is not advised.
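Striping can be pictured as a simple mapping from a logical block number to a disk and a position on that disk. The sketch below assumes a plain round-robin layout; `locate_block` and its parameters are hypothetical names chosen for illustration, and real controllers may use different stripe-unit sizes and layouts.

```python
def locate_block(logical_block, num_disks, stripe_unit_blocks=1):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes round-robin striping: consecutive stripe units go to
    consecutive disks, wrapping around after the last disk.
    """
    stripe_unit = logical_block // stripe_unit_blocks   # which stripe unit
    disk = stripe_unit % num_disks                      # disk holding it
    row = stripe_unit // num_disks                      # stripe (row) number
    offset = row * stripe_unit_blocks + logical_block % stripe_unit_blocks
    return disk, offset

# Hypothetical 4-disk array, one block per stripe unit.
layout = [locate_block(n, num_disks=4) for n in range(8)]
```

Consecutive blocks land on different disks, so a long sequential read keeps every drive busy at once. The same mapping shows why there is no fault tolerance: every file is spread across every disk.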
RAID Level 1
RAID level 1 adds one or more mirror drives to a single drive without striping. As mentioned before, you can use non-striped SCSI drives as long as they are routinely defragmented. Adding a mirror adds fault tolerance. Remember that a mirror drive is not a backup for data; it’s only protection against a disk crash. If data is corrupted on the primary drive, that corruption is propagated to the mirror drive(s).
RAID Level 0+1 (RAID 10)
RAID 0+1 essentially stripes data across multiple drives, with a mirror of each drive adding fault tolerance. This is the fastest RAID solution available but also the most expensive: RAID 10 requires at least twice the drive space of the data to be stored. Two additional fault tolerance features exist with RAID 10: 1) The mirror of each drive can exist in a different RAID enclosure. If the first enclosure fails, the second enclosure can continue to stream without interrupting the I/O operation. 2) If the RAID 10 mirror set is temporarily broken, the mirror can be backed up without affecting the performance of the streaming media on the primary drive, and then the mirror set can be re-established.
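The cost of mirroring is easiest to see as usable capacity. The following rough calculator assumes identical drives, two-way mirrors for RAID 10, and one drive's worth of parity for the single-parity levels covered in the sections below. The function name and the minimum drive counts are assumptions made for this sketch, not vendor specifications.

```python
def usable_capacity_gb(level, num_disks, disk_size_gb):
    """Estimate usable capacity for an array of identical drives."""
    if level == "0":
        # Striping only: every drive holds data.
        return num_disks * disk_size_gb
    if level == "1":
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least two drives")
        # One drive of data; all other drives are mirrors of it.
        return disk_size_gb
    if level == "10":
        if num_disks < 4 or num_disks % 2:
            raise ValueError("this sketch assumes an even count of 4+ drives")
        # Half the drives hold data, half hold the mirrors.
        return (num_disks // 2) * disk_size_gb
    if level in ("3", "4", "5"):
        if num_disks < 3:
            raise ValueError("parity levels need at least three drives")
        # One drive's worth of space is consumed by parity.
        return (num_disks - 1) * disk_size_gb
    raise ValueError("unsupported RAID level: " + level)

# Hypothetical example: eight 100 GB drives.
mirrored = usable_capacity_gb("10", 8, 100)
parity_based = usable_capacity_gb("5", 8, 100)
```

With the same eight drives, the mirrored array stores roughly half the raw capacity, while a parity-based array gives up only one drive. That difference is the price paid for RAID 10's speed and its ability to split mirrors across enclosures.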
RAID Level 2
RAID level 2 is not used today because it was designed with built-in error detection and correction coding (ECC), which today is handled within the disk drives themselves.
RAID Level 3
RAID level 3 writes parity on a separate, dedicated parity disk drive. RAID 3 is best suited for long sequential reads, that is, reading large files whose data blocks are stored contiguously on disk. This makes RAID 3 the best RAID level for reading streaming media. RAID 3 is a poor choice for reading files of varying lengths, so do not co-locate Word files with your movie files on RAID 3. Dedicate RAID 3 arrays to huge streaming media files.
RAID Level 4
RAID 3 stripes and accesses data at the byte level. RAID 4 is the same as RAID 3 except that it stripes at the block level. RAID 4 is rarely used in commercial operations; RAID 3 is dominant.
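The difference between byte-level and block-level striping can be illustrated with a toy example. This is a simplified sketch that leaves out the parity drive and uses hypothetical function names; it only shows how the same data would be distributed across the data drives under each scheme.

```python
def stripe_bytewise(data, num_data_disks):
    """Byte-level striping: consecutive bytes go to consecutive disks."""
    return [data[i::num_data_disks] for i in range(num_data_disks)]

def stripe_blockwise(data, num_data_disks, block_size):
    """Block-level striping: whole blocks go to consecutive disks."""
    disks = [bytearray() for _ in range(num_data_disks)]
    for n, start in enumerate(range(0, len(data), block_size)):
        disks[n % num_data_disks] += data[start:start + block_size]
    return [bytes(d) for d in disks]

# Hypothetical 8-byte "file" spread over two data disks.
sample = b"ABCDEFGH"
by_byte = stripe_bytewise(sample, 2)
by_block = stripe_blockwise(sample, 2, block_size=4)
```

With byte-level striping, every read touches all of the data drives at once, which suits one long sequential stream. With block-level striping, a small read can be satisfied by a single drive.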
RAID Level 5
RAID 5 writes (interleaves) the parity data within the actual data. Parity is calculated during the write operation. For this reason, RAID 5 has an inherent write penalty in terms of performance. RAID 5 is best suited for mixed data transfer sizes. Because most commercial operations cannot practically predict what size files the users need to store, RAID 5 serves as a compromise for large and small data transfers. It’s not too fast, yet it’s not too slow, and cache on the controller often mitigates the write penalty. RAID 5 can be acceptable for reading streaming media if a mixed file size is required for user storage. However, if you can, stick with RAID 3 for streaming media applications.
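The write penalty described above can be sketched in a few lines. The example assumes XOR parity and one simple parity rotation; actual parity placement varies between controllers, and the function names here are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def update_parity(old_parity, old_data, new_data):
    """Parity after overwriting one data block (read-modify-write).

    The controller must first read the old data and the old parity,
    then write the new data and the new parity.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

def parity_disk_for_stripe(stripe, num_disks):
    """One possible rotation: parity moves back one disk per stripe."""
    return (num_disks - 1 - stripe) % num_disks

# Hypothetical stripe of three data blocks plus parity.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_bytes(xor_bytes(d0, d1), d2)

# Overwrite d1 and update parity without touching d0 or d2.
new_d1 = b"\xaa\xbb"
new_parity = update_parity(parity, d1, new_d1)
```

A single small write therefore costs two reads and two writes, which is the overhead a controller cache helps to hide. Rotating the parity across the drives keeps any one disk from becoming the bottleneck for parity updates, unlike the dedicated parity drive of RAID 3 and RAID 4.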
More About RAID Controllers
The RAID controllers that connect the array to the server are critical to performance in a hardware-based RAID array system. It’s the controller that manages the reads and writes within the various RAID levels.
Tip: Choose RAID controllers located inside the disk array. These solutions typically provide better performance for demanding streaming media requirements than a RAID controller located in the computer (which is a combination SCSI controller and RAID controller). Array-based RAID controllers attach to a computer’s SCSI HBA and are typically optimized for performance and redundancy.
Tip: Whenever possible, use multiple RAID controllers for each array. Multiple SCSI HBAs can be attached to multiple RAID controllers within the disk array. This provides fault tolerance and the multiple data paths increase performance.
Tip: Avoid software-based RAID configurations and go with hardware-based RAID controllers. Even though host-based (a.k.a. software-based) RAID systems are readily available on the market (for instance, the MD RAID software available for Linux), it is better to go with a hardware-based RAID controller because all RAID I/O activity occurs in the hardware controller, resulting in higher performance. By comparison, software-based RAID is run like an application by the operating system, which can result in slower performance.