Scalable Video Coding: The Future of Video Delivery?
Over the next two or three years, streaming producers will be increasingly tasked with supplying optimized video streams to devices as disparate as cell phones and set-top boxes, along with different quality versions for users accessing the content over the general internet. While there have been multiple proprietary approaches to this problem, including Microsoft’s multiple bitrate video, one very strong candidate will be an H.264 extension called Scalable Video Coding.
The Problem
In the dawn of streaming, web sites had to supply multiple streams to satisfy users connecting via different devices, and web sites with different icons for modem, ISDN and LAN connections were very common. In the case of three streams, this tripled the administrative burden of encoding and linking these files, and very much complicated distribution over content delivery networks, particularly edge networks where files were stored for local delivery.
The most prevalent streaming suppliers of the day, RealNetworks and Microsoft, both developed technologies to reduce the problem, most notably Microsoft’s multiple bitrate technology, which encapsulated multiple streams into one file. Not only did this reduce the administrative burden associated with multiple files, a Microsoft Streaming server could also dynamically adjust to changing line conditions, sending a lower bit rate stream when the player reported packet loss.
Once broadband became pervasive and modems disappeared, the problem largely went away, as one 500Kbps stream could satisfy virtually all likely classes of users. With video over cellular becoming increasingly important, and high bitrate streams to set-top boxes in the living room also on the roadmap for many streaming producers, efficiently delivering different quality streams to multiple devices over various connection bandwidths is again becoming critical.
At least three new technologies have moved into the space: Adaptive Streaming from Move Networks, Dynamic Streaming from Adobe, and Smooth Streaming for Silverlight from Microsoft. Within the context of a relatively closed system (a single server to a single class of player with no delivery network), all of these technologies are very sound solutions.
However, once you involve a content delivery network, or an alternative form of transport or player, things get more complicated. For example, to add a CDN to the mix for any of the three mentioned technologies, the CDN may have to change its infrastructure to support the technology, or at the very least, fine-tune its platform to ensure optimal performance of the proprietary technology, which takes time and investment. For set-top boxes and cellular phones to play the streams, they would also have to support the proprietary technology, which the manufacturers of these devices are loath to do.
For the streaming market to successfully expand to the living room and cellular markets, it’s very likely that a vendor-agnostic standard for adaptive streaming will have to emerge. Fortunately, there’s one handy: the Scalable Video Coding extension to H.264 (H.264 SVC), which offers a very efficient and elegant solution to the problem of supplying multiple streams.
H.264 SVC
H.264 SVC encodes video into "layers," starting with the "base" layer, which contains the lowest level of detail spatially (resolution), temporally (frames per second) and in quality (picture fidelity). Additional layers can increase the quality of the stream along any or all of these three dimensions.
For example, the base layer of a stream might be encoded at 15 frames per second, a resolution of 320x240, and a data rate of 300Kbps. Additional layers could expand that stream to 720p video at 3Mbps suitable for a set-top box, with convenient stopping points for relatively high-quality streaming over the Internet, say at 640x480 at 30 fps and 600Kbps, and 720p at 2Mbps. All the layers are incorporated into a single file, reducing the administrative expense of linking and distributing via CDNs.
Compared to other approaches, H.264 SVC is very efficient, as the SVC-encoded file should only be about 20% larger than the file size necessary to supply the highest quality representation. In other words, if you were encoding the file to send to the set-top box at 3Mbps, the SVC-encoded file would have an overall data rate of about 3.6Mbps. In addition, the base layer should be compatible with existing H.264 players, so no player upgrade will be necessary to view the base layer stream. With hardware encoders, streaming producers can convert their current formats to SVC-compatible streams on the fly, so video publishers like CNN and ESPN won’t need to convert their entire library to leverage the new technology.
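To make the layered example concrete, here is a minimal Python sketch of how a server or player might pick a stopping point for a given connection, and how the 20% overhead figure compares with storing each quality level as a separate file. It is an illustration only, not how any real SVC encoder or server works: the Layer class, the layer names, the function names and the 30 fps figure for the 720p layers are all assumptions made for this example, and the data rates are simply the stopping points quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One stopping point in a hypothetical SVC stream."""
    name: str              # illustrative label, not part of any standard
    width: int
    height: int
    fps: int
    cumulative_kbps: int   # rate needed to play up to and including this layer


# Stopping points from the example in the text.
# Assumption: the text gives no frame rate for the 720p layers; 30 fps is used here.
LAYERS = [
    Layer("base", 320, 240, 15, 300),
    Layer("sd", 640, 480, 30, 600),
    Layer("720p-web", 1280, 720, 30, 2000),
    Layer("720p-stb", 1280, 720, 30, 3000),
]

# "About 20% larger" than the top representation, per the text.
SVC_OVERHEAD_PERCENT = 20


def select_layer(layers: List[Layer], available_kbps: int) -> Optional[Layer]:
    """Return the highest layer that fits the available bandwidth, or None."""
    best = None
    for layer in sorted(layers, key=lambda l: l.cumulative_kbps):
        if layer.cumulative_kbps <= available_kbps:
            best = layer
    return best


def svc_total_kbps(layers: List[Layer],
                   overhead_percent: int = SVC_OVERHEAD_PERCENT) -> int:
    """Approximate data rate of the single SVC file holding every layer."""
    top = max(l.cumulative_kbps for l in layers)
    return top * (100 + overhead_percent) // 100


def separate_files_total_kbps(layers: List[Layer]) -> int:
    """Combined data rate if each stopping point were its own file."""
    return sum(l.cumulative_kbps for l in layers)


if __name__ == "__main__":
    for bandwidth in (250, 700, 2500, 5000):
        chosen = select_layer(LAYERS, bandwidth)
        label = chosen.name if chosen else "nothing fits"
        print(f"{bandwidth} kbps available -> {label}")
    print("single SVC file:", svc_total_kbps(LAYERS), "kbps")
    print("four separate files:", separate_files_total_kbps(LAYERS), "kbps")
```

Under these assumptions, the single SVC file comes to 3,600Kbps, matching the 3.6Mbps figure above, while four separately encoded files would add up to 5,900Kbps. The sketch ignores the fact that extracting a lower stopping point from a real SVC stream may cost slightly more than a standalone encode at the same quality.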
You want it when?
When will H.264 SVC become generally available? Within the context of widespread adoption, it’s going to be a while, because all three elements (encoder, server and player) must be SVC-aware to fully leverage the technology. CDN support will also be necessary for larger outlets.
The wheels are in motion, however. For example, H.264 encoding vendor MainConcept showed a technology preview at IBC 2008 in Amsterdam, and is currently looking for technology partners for the end-to-end technology components – encoder, decoder and "network components." However, the technology will likely find its initial implementations in closed systems, like Google’s Gmail Chat, which is based upon H.264 SVC technology licensed from Vidyo, and security applications like those offered by GE Security.
Clearly, none of the other contenders are waving a white flag, and Adobe, Microsoft and Move are all aggressively pushing their proprietary alternatives. But with the ITU and ISO both behind H.264, and mobile emerging as the next great frontier for streaming, it’s tough to bet against a standard.
As I wander the disparate floors at NAB this spring, I’ll keep my eye out for new H.264 SVC applications. I’ll be surprised if I don’t see a boatload of them.