A Storage Area Network Primer
When I first heard the acronym SAN a few years ago, I started asking questions about the technology. The conversation started to go over my head very quickly, so I switched to questions about price and market share. Then, when I found out from one vendor that a "low-cost" storage area network started around $55,000, I filed the SAN away as an idea that sounded great but probably wasn’t practical for small digital media businesses.
The last 18 months have brought a significant change in the price and complexity of SANs, however, with the widespread adoption of Gigabit (1Gb) Ethernet, Serial ATA drives that rival SCSI performance, and software that takes the pain out of administering a SAN.
SAN vs NAS
Before we jump into a generic SAN setup, though, let’s look at another form of network storage: network-attached storage (NAS). A typical NAS is designed to give multiple users access to content on the same volume or drive, which means that most NAS solutions are used as central file repositories without the need for an external host computer or server. NAS products are typically not robust enough to handle large reads and writes, as file system overhead tends to cripple throughput. A few NAS products, such as the EditShare systems for digital media post-production, are optimized to handle large data files and simultaneous reads and writes on a single volume.
SANs, by comparison, are heavily optimized for numerous requests for data, but typically allow only one user at a time to write content to a drive while all other users are limited to reading from that drive. In this way, SANs are ideal for streaming media solutions: most volumes would have one encoding device writing content to the drive while multiple streaming servers read the content from the volume and serve it out to viewers.
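To make that single-writer, many-reader model concrete, here is a rough Python sketch of the kind of per-volume gatekeeping a SAN performs. The class and node names are invented for illustration and don’t reflect any particular vendor’s software.

import threading

class VolumeAccess:
    """Grants one writer at a time; readers are unrestricted."""
    def __init__(self, name):
        self.name = name
        self._write_lock = threading.Lock()

    def acquire_write(self, node):
        # Only one node may hold write access to the volume.
        if self._write_lock.acquire(blocking=False):
            print(f"{node} is now the writer on {self.name}")
            return True
        print(f"{node} denied: another node is writing to {self.name}")
        return False

    def release_write(self, node):
        self._write_lock.release()
        print(f"{node} released write access on {self.name}")

    def read(self, node):
        # Reads never block on the writer in this simplified model.
        print(f"{node} reading from {self.name}")

volume = VolumeAccess("stream-archive")
volume.acquire_write("encoder-01")    # the encoding station writes...
volume.read("stream-server-01")       # ...while streaming servers read
volume.acquire_write("encoder-02")    # a second writer is turned away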
SANs of Change
In the early days, the only interface capable of handling the data throughput that SANs demanded was Fibre Channel (FC). FC offered 2Gbps bidirectional throughput long before Gigabit Ethernet appeared in mainstream computers.
These days, Gigabit Ethernet SAN solutions are more prevalent, thanks in part to the iSCSI standard. iSCSI, or Internet SCSI, is a transport defined within the SCSI-3 framework that carries the SCSI protocol over TCP/IP networks. Because TCP/IP processing adds overhead of its own, an alternative called ATA over Ethernet (AoE) has also emerged; it runs directly on Ethernet and doesn’t require the network layers above it (IP, UDP, TCP). The limitation of ATA over Ethernet is that its traffic cannot be routed beyond the local Ethernet segment; iSCSI traffic can.
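One way to see why iSCSI is routable is that it rides on ordinary TCP (the registered iSCSI port is 3260), so a target portal on another subnet can be reached like any other TCP service. The short Python sketch below illustrates the point with a made-up portal address; it only tests TCP reachability and is not a real iSCSI initiator.

import socket

TARGET_PORTAL = ("192.168.50.10", 3260)   # hypothetical iSCSI target portal

def portal_reachable(addr, timeout=2.0):
    """Return True if a plain TCP connection to the portal succeeds."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

print("iSCSI portal reachable:", portal_reachable(TARGET_PORTAL))
# ATA over Ethernet frames, by contrast, never leave the local Ethernet
# segment, because they carry no IP header for a router to act on.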
Another, less frequently used SAN interface is FireWire, which has the benefit of not requiring a host computer but lacks the throughput of 1Gb Ethernet or 2Gb Fibre Channel. Within a few months, too, expect to see 10Gb Ethernet SANs become a mainstream, viable solution for small digital media businesses.
A SAN Parts List
At its essence, a SAN consists of four pieces:
--Storage devices, which are either single drives with fiber or copper connectors or multiple drives housed in a single chassis that has a fiber or copper connector. With storage sizes up to 2 petabytes (2000 terabytes), your centralized data storage needs should be met for at least another year or two.
--Metadata controller, which is a computer that acts as the host or gatekeeper, managing access to shared storage.
--Switch, either copper or fiber, that routes storage traffic to/from the appropriate storage device, once the metadata controller has given permission for a computer on the network to access the volume.
--Host bus adapter (HBA), a card in each computer or node that moves content between the computer and the storage device.
Some SANs add to this list, as will be noted in the following section. Other SAN vendors are beginning to embed the metadata controller in a storage device chassis, as a blade alongside redundant power supplies, NIC cards, etc. While this doesn’t necessarily reduce the cost of the overall system, it does provide a reduction in physical space requirements.
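For illustration only, the sketch below models the four building blocks from the parts list in Python. The device names, port counts, and capacities are invented, and real SAN management software is of course far more involved.

from dataclasses import dataclass, field

@dataclass
class StorageDevice:
    name: str
    capacity_tb: int
    connector: str            # "fiber" or "copper"

@dataclass
class Switch:
    name: str
    ports: int

@dataclass
class Node:
    name: str
    hba_ports: int            # dual-port HBAs are common for redundancy

@dataclass
class MetadataController:
    name: str
    managed_volumes: list = field(default_factory=list)

san = {
    "storage":  [StorageDevice("array-01", capacity_tb=24, connector="fiber")],
    "switches": [Switch("fc-switch-01", ports=16)],
    "nodes":    [Node("encoder-01", hba_ports=2), Node("stream-01", hba_ports=2)],
    "metadata": MetadataController("mdc-01", managed_volumes=["stream-archive"]),
}
print(san["metadata"].name, "manages", san["metadata"].managed_volumes)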
Maintaining Redundancy
SANs are set up for two primary reasons: to maintain access to the mission-critical data that keeps a company in business, and to do so without sacrificing speed.
To meet the former need, there are several ways that SANs provide redundancy to maintain access to mission-critical data. These redundancies most often work on a failover model: if one device fails, the initial request rolls over to a secondary device.
Redundancy starts at each node: most HBAs have dual connectors, so that if one cable comes loose or stops working, the second cable provides an alternate path to the SAN. Many SANs also use two fiber or copper switches, with each of the two cables from the HBA plugged into a discrete switch, providing multiple pathways to the storage devices.
Some SANs also provide metadata controller failover, in which a secondary computer is dedicated to managing access to shared storage. As SANs become more sophisticated, though, and vendors attempt to reach customers who can’t afford to dedicate additional hardware, another metadata controller failover model is emerging: any of the nodes can act as the alternate metadata controller, meaning that access to data remains available as long as one node is active.
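The failover logic itself is simple in concept. The following Python sketch, with hypothetical controller names, shows a request rolling over from a failed primary metadata controller to a backup, whether that backup is a dedicated machine or one of the nodes pressed into service.

class Controller:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def grant(self, volume):
        if not self.alive:
            raise ConnectionError
        return f"{self.name} granted access to {volume}"

def request_access(volume, controllers):
    """Ask each metadata controller in order until one answers."""
    for controller in controllers:
        try:
            return controller.grant(volume)
        except ConnectionError:
            print(f"{controller.name} unreachable, failing over...")
    raise RuntimeError(f"no metadata controller available for {volume}")

# Dedicated secondary controller, or any surviving node acting as one:
print(request_access("stream-archive",
                     [Controller("mdc-primary", alive=False),
                      Controller("node-03-as-mdc")]))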
Final Thoughts
Setting up a SAN is a significant undertaking, given the number of options that exist for varying levels of redundancy and performance. As with all complex network infrastructures, hiring professional help to design a SAN is a good idea. Product manufacturers such as Dell/EMC, Apple, and QLogic are also responding to customer requests for "build your own SAN" packages that allow the average IT group to build and optimize the storage area network.
Persevering through the complexity of SAN options and system designs is well worth it, though, as a SAN is the best current model for redundant access to mission-critical data.