
How to Build a Scalable Low-Latency Streaming Solution


Low-Latency HLS

Many streamers are using HTTP Live Streaming (HLS) today, particularly those focused on iOS devices, which don’t support other protocols. Some of us may be using a mix of DASH and HLS, but if you want to support just one format today, it’s HLS, because iOS devices support nothing else. Given how effective chunked transfer encoding is in the DASH world, we should mimic that approach on the HLS side.

Periscope open-sourced this approach on top of HLS in the Periscope application, and Twitch does it as well. It involves many of the same mechanisms, such as being able to pre-request segments that don’t exist yet or only half-exist, and using chunked transfer to feed bytes into the player for playback before the segment is done.
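To make that concrete, here is a minimal sketch of the browser side of chunk-transfer playback: the Fetch API’s streaming response body feeds bytes of a still-in-progress segment into a Media Source Extensions buffer as they arrive, instead of waiting for the whole file. The URL and codec string are placeholder assumptions, not anything from a particular player.

```typescript
// Minimal sketch: stream an in-progress CMAF segment into MSE via
// chunked transfer, appending bytes as they arrive. The URL and the
// codec string below are illustrative placeholders.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
  // The server starts responding before the segment is fully written,
  // so the body arrives as a sequence of chunks at live speed.
  const res = await fetch('/live/segment_1024.m4s');
  const reader = res.body!.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Wait until the buffer is ready for another append.
    if (sb.updating) {
      await new Promise((r) => sb.addEventListener('updateend', r, { once: true }));
    }
    sb.appendBuffer(value);
  }
});
```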

As this community version of low-latency HLS was being solidified, Apple came out with its own approach. Instead of relying on chunk-transfer streaming, Apple’s version takes segments and chops them up into even smaller parts that the player can request much more rapidly. This gives players half-second parts they can start playing as soon as one is downloaded.
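To give a sense of what those parts look like, an LL-HLS media playlist advertises them with EXT-X-PART tags alongside the regular segments. Below is a minimal, simplified sketch of pulling the part URIs out of a playlist; the playlist excerpt is illustrative, and a real parser has far more to handle.

```typescript
// Minimal sketch: extract part URIs from an LL-HLS media playlist.
// The embedded playlist excerpt is illustrative; real playlists carry
// many more tags, and a production parser must handle them all.
const playlist = `#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=0.5
#EXT-X-PART:DURATION=0.5,URI="seg273.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.5,URI="seg273.part1.mp4"
#EXTINF:4.0,
seg272.mp4`;

function partUris(m3u8: string): string[] {
  const uris: string[] = [];
  for (const line of m3u8.split('\n')) {
    const match = line.match(/^#EXT-X-PART:.*URI="([^"]+)"/);
    if (match) uris.push(match[1]);
  }
  return uris;
}

console.log(partUris(playlist)); // ["seg273.part0.mp4", "seg273.part1.mp4"]
```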

The reason Apple didn’t go the chunk-transfer route, as I understand it, is that each of these parts arrives as a complete download at full network speed. This means you can actually time those parts, assess the network speed, and make decisions about how to adapt.
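That measurement is straightforward once each part arrives as a complete download: time it and divide. A minimal sketch, with a hypothetical part URL; a real player would smooth many samples rather than trust any single one.

```typescript
// Minimal sketch: estimate throughput from one LL-HLS part download.
// Each part is delivered at full network speed, so elapsed wall-clock
// time is a meaningful measure (unlike a chunk-transferred segment
// that trickles in at the live encoding rate).
async function measurePartKbps(url: string): Promise<number> {
  const start = performance.now();
  const bytes = (await (await fetch(url)).arrayBuffer()).byteLength;
  const seconds = (performance.now() - start) / 1000;
  return (bytes * 8) / 1000 / seconds; // kilobits per second
}

// e.g. feed a moving average of these samples into the rendition choice
const kbps = await measurePartKbps('/live/seg273.part0.mp4');
```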

The downside of Apple LL-HLS is that getting latency down to sub-4 seconds, or even 2 seconds, requires some modern networking features. It relies heavily on two parts of HTTP/2. One is the persistent connection the browser or device can hold open over H2, which lets it make lots of these little requests without a ton of per-request overhead.

That’s one benefit of HTTP/2. The other is HTTP/2 Push, or h2 push. Here’s how it works: let’s say I’m making a manifest request. The server recognizes that I’m requesting the manifest for the higher rendition, and it knows I’m also going to want the first segment of that rendition, so it sends that to me as well. Instead of making one request for the manifest and another for the segment, with HTTP/2 Push the server is smart enough to send both through. The browser receives the first segment, caches it, and makes it available behind the scenes, so when the player requests it, it’s already there and ready to go really fast.
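Server-side, that pairing of a playlist response with a pushed segment looks roughly like the sketch below, written against Node’s built-in http2 module; the paths, certificate files, and part names are all placeholders.

```typescript
// Minimal sketch: HTTP/2 server push with Node's built-in http2 module.
// When the client requests the playlist, the server proactively pushes
// the part it knows the player will want next. File paths, certificate
// files, and URLs here are placeholders.
import { createSecureServer } from 'node:http2';
import { readFileSync } from 'node:fs';

const server = createSecureServer({
  key: readFileSync('server.key'),
  cert: readFileSync('server.crt'),
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] === '/live/playlist.m3u8') {
    // Push the first part alongside the playlist so it is already in
    // the browser cache when the player asks for it.
    stream.pushStream({ ':path': '/live/seg273.part0.mp4' }, (err, push) => {
      if (err) return;
      push.respondWithFile('seg273.part0.mp4', { 'content-type': 'video/mp4' });
    });
    stream.respondWithFile('playlist.m3u8', {
      'content-type': 'application/vnd.apple.mpegurl',
    });
  }
});

server.listen(8443);
```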

This is a great mechanism, and it’s beneficial when you’re in an environment where you control all these things, such as iOS devices. In browsers, it presents some challenges, such as requiring the player to make 10-70 times as many requests as it would with standard HLS. Because JavaScript is a single-threaded language, meaning it can do only one thing at a time, a mechanism that has to make that many more requests is vulnerable to everything else on the page: anything that blocks JavaScript blocks the player, preventing it from making requests quickly enough to keep latency down. That’s a challenge that we need to solve with this part-based approach.
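One mitigation worth noting, and to be clear, this is my sketch rather than anything from the spec: move the request loop off the main thread into a Web Worker, so page jank can’t starve it. The file layout and the appendToSourceBuffer helper are hypothetical.

```typescript
// loader.worker.ts, one possible mitigation (not from the article):
// run the part-request loop in a Web Worker so that main-thread jank
// cannot starve it. URLs here are hypothetical.
self.onmessage = async (e: MessageEvent<string>) => {
  const buf = await (await fetch(e.data)).arrayBuffer();
  // Transfer (zero-copy) the bytes back to the main thread.
  (self as unknown as Worker).postMessage(buf, [buf]);
};

// main.ts
declare function appendToSourceBuffer(buf: ArrayBuffer): void; // player-side helper (assumed)

const worker = new Worker('loader.worker.js');
worker.onmessage = (e: MessageEvent<ArrayBuffer>) => appendToSourceBuffer(e.data);
worker.postMessage('/live/seg273.part1.mp4');
```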

The other challenge with h2 push is that the player doesn’t actually know that something was pushed to the browser when it tries to measure it. The player can’t tell whether a particular part was delivered fresh at network speed or was already cached and shouldn’t be used for measurement. Regardless of how they were delivered, all the segments look cached.
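You can see the problem directly in the browser’s Resource Timing data. The sketch below shows a naive throughput calculation that a pushed or cached part silently poisons; the part name is a placeholder.

```typescript
// Minimal sketch of the measurement problem. A part that was pushed by
// the server, or served from the browser cache, completes near-instantly,
// and its Resource Timing entry looks much the same either way.
const entry = performance
  .getEntriesByType('resource')
  .find((e) => e.name.endsWith('seg273.part0.mp4')) as PerformanceResourceTiming;

const seconds = entry.duration / 1000;
const kbps = (entry.encodedBodySize * 8) / 1000 / seconds;
// For a pushed or cached part, `seconds` is near zero and `kbps` is
// meaningless; the sample should be discarded, but nothing here tells
// the player which samples those are.
console.log(kbps);
```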

So we have some work to do on the Apple version of this approach to understand how it might work in a browser. Otherwise, we’re likely to end up with a split ecosystem: in the browser, a DASH-based, chunk-transfer approach, and on iOS devices, the Apple LL-HLS approach.

A month ago, we had an HLS Forum at Apple with Roger Pantos, people from Disney Streaming and Hulu, and all the major CDNs. The purpose was essentially to try to get alignment between Apple’s part-based approach and the chunk-transfer approach, at least enough that CDNs can operate the same way for both. Currently, CDNs use one mechanism, chunk transfer, for getting data from the origin to the edge, and the part-based approach everywhere else. There’s a big push to let CDNs use chunk transfer from the origin to the edge and then make both mechanisms available at the edge. But we may be stuck supporting both approaches for the foreseeable future.

Getting the Video Out

Going from the origin and the transcoder out to the players, we face some similar issues on the network side (Figure 5, below). There are a couple of problems with using real-time protocols like RTMP and WebRTC. One is that they set up a persistent connection between the client and the server that ties up resources on the edge server, the origin, or wherever the client connects. For as long as that connection is active, that server can do fewer things.


Figure 5. Getting the video out

Compare that to an HTTP server or a general CDN edge server, which runs on commodity hardware. The client requests a file, the server delivers it, and the connection’s resources are released. The client sets up a connection again any time it needs a new file, but in between, those resources are freed while the video plays on.

CDNs have been optimizing this approach for 20 years now. That’s why it’s super-cheap to stream through a CDN compared to an RTMP- or WebRTC-based system, where you need to scale up all these specialized, non-commodity servers and maintain all those persistent connections.

Our general calculation is that to scale up to events viewed by 1,000 people, you’re looking at costs roughly 10 times higher using one of these persistent connection-based protocols than with an HTTP-based approach. You can see why so many people in the industry are pushing for the HTTP-based approach: it keeps costs down while hopefully reaching a level of latency that works for these interactive experiences.

Players

As for native players, on Android you’ll work with Google’s ExoPlayer. Its current version supports both low-latency DASH and low-latency HLS. On the iOS side, you have AVPlayer, which supports Apple LL-HLS, although you still have to be whitelisted by Apple to ship that support in a production app. Until then, you’re just testing it.

On iOS you can build native DASH players, but that means going outside the safety of Apple’s AVPlayer and networking stack. Netflix, YouTube, and companies building proprietary players to sell will do this, but not many others, given the difficulty of developing and implementing these players.

On the web browser side, hls.js is probably the most popular HLS parser; dash.js and Shaka Player are the popular DASH parsers. dash.js supports the low-latency approach I’ve discussed here, but I don’t believe Shaka does quite yet. DASH is where we’re likely to see the most progress on the player side, including that newer approach to measuring the network for the adaptive algorithm.
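If you’re taking the dash.js route today, low latency is essentially a settings change. A minimal sketch, assuming a recent dash.js build (setting names here follow dash.js v3; check your version) and a placeholder manifest URL:

```typescript
// Minimal sketch: low-latency playback with dash.js. The manifest URL
// is a placeholder; setting names follow dash.js v3 and may differ in
// other versions.
import * as dashjs from 'dashjs';

const video = document.querySelector('video') as HTMLVideoElement;
const player = dashjs.MediaPlayer().create();

player.updateSettings({
  streaming: {
    lowLatencyEnabled: true, // use the chunk-transfer playback path
    liveDelay: 3,            // target seconds behind the live edge
  },
});

player.initialize(video, 'https://example.com/live/stream.mpd', /* autoplay */ true);
```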

On the OTT TV side, player support is really varied. It depends on the TV itself supporting chunked transfer encoding in its playback stack. Many TVs are built on top of an internal browser-based UI, in which case they pull in a lot of the browser features that are already there, but sometimes in inconsistent ways.

To address this issue, the Consumer Technology Association (CTA) has launched the WAVE Project to establish standards as more TV manufacturers and TV devices embed browser-based video playback into their applications. CTA WAVE aims to identify the browser features that we as streamers and player developers expect to have in the browser so we can create a consistent environment across TV devices. They’ve been making good progress, and a lot of TV devices are starting to align with the specifications the WAVE Project has created.

Building Your Solution

Figure 6 (below) combines all the elements of a scalable low-latency streaming solution that I’ve discussed here in a single graphic. Starting with capture, your best bet for professional production is a camera plus a hardware encoder. Then you send the stream out over a wired network, most likely via RTMP today because it’s so widely supported, or maybe SRT if you’re trying to get to the bleeding edge of low latency, to your origin or transcoder. There you’re likely transcoding to CMAF, or to DASH if you’re trying to do this today in the browser, on Android, or wherever else you can make it work.


Figure 6. Key components of a scalable low-latency streaming solution 

On today’s iOS devices, the lowest latency you’ll be able to achieve without any of these mechanisms is around 10 seconds. Short of hacking the AVPlayer API, you’re waiting for a future in which Apple’s approach to low latency is available on those devices.
