Demuxed: A Video Engineer's Nirvana
Once the system was implemented, Wowza found it could deliver between 1.5 and 2.5 seconds of end-to-end latency at scale, including delivery via midgress and edge servers, each running the Wowza Streaming Engine, with further optimizations and improvements planned all along the design chain. In terms of implementation, Wowza built its system on Microsoft Azure because it offered a very robust traffic manager that allowed it to optimize first mile performance via origin selection, and last mile performance via edge selection. In terms of the player, Wowza found that it could deliver two frames at a time most efficiently, with a very small buffer to minimize latency. They also measured drift to ascertain when the system fell behind and implemented a catchup algorithm that could accelerate playback to restore proper timing.
As implemented, the system does have several key limitations, including that it's a single-bitrate approach tied to H.264/AAC, and since it relies upon Media Source Extensions, it can't be played in iOS browsers. Since it's not a chunk-based protocol, it also can't leverage a traditional HTTP infrastructure. Looking forward, Wowza is considering HTTP/1.1 chunked transfer encoding using CMAF, and adding Secure Reliable Transport (SRT) delivery to enhance operation.
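To make the catchup idea concrete, here is a minimal Python sketch of how a player might turn measured drift into a playback speed. It is not Wowza's algorithm, which the talk summary does not detail; the function name `playback_rate` and the `tolerance_s`, `gain`, and `max_rate` parameters are all hypothetical choices for illustration.

```python
def playback_rate(drift_s, tolerance_s=0.25, gain=0.2, max_rate=1.1):
    """Return a playback speed multiplier for a given drift.

    drift_s is how far (in seconds) playback has fallen behind its
    target latency. All thresholds here are illustrative assumptions,
    not values from the talk.
    """
    excess = drift_s - tolerance_s
    if excess <= 0:
        # Within tolerance: play at normal speed.
        return 1.0
    # Speed up in proportion to the excess drift, capped so the
    # acceleration stays subtle for the viewer.
    return min(max_rate, 1.0 + gain * excess)


if __name__ == "__main__":
    for drift in (0.1, 0.5, 5.0):
        print(f"drift={drift:.1f}s -> rate={playback_rate(drift):.2f}x")
```

A proportional ramp with a cap is only one plausible design; a real player might also drop frames or jump forward once drift exceeds some larger limit.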
Overall, the talk was an interesting description of how Wowza created a new low-latency service offering at scale. The service is currently in preview with about 200 customers, and should be available for release in 4Q2017.
Bitmovin's Adaptive Delivery
One of the newer concepts in streaming is adaptive delivery, where files are encoded using a standard bitrate ladder, with delivery monitored so that higher bitrate segments will only be delivered when they improve perceptible quality. An example of this is MediaMelon's QBR technology, which we reviewed here.
At Demuxed, web infrastructure provider Bitmovin discussed a similar technology, as detailed by principal solutions architect Reinhard Grandl in his talk "Going Beyond Bitrate: Adaptation Based on Video Content." Called Per-Scene Adaptation, Bitmovin's technology makes the client aware of the quality of each scene in the video as measured by the Bitmovin encoder (Figure 5).
Figure 5. Bitmovin's per-scene adaptation is similar to MediaMelon QBR.
As shown on the right in Figure 5, Bitmovin's per-scene adaptation uses quality-related data to make stream switching decisions, in addition to the traditional buffer and performance-related data. So, if a 5800Kbps 1080p stream doesn't deliver additional quality as compared to a 3400Kbps 1080p stream, the player won't retrieve it.
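The switching logic described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not Bitmovin's implementation: the `Rung` type, the per-scene `quality` score, the `min_gain` threshold, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    bitrate_kbps: int
    quality: float  # hypothetical per-scene quality score, higher is better


def pick_rung(rungs, throughput_kbps, min_gain=1.0):
    """Pick the cheapest rung that the next affordable rung cannot
    beat by at least min_gain quality points for this scene."""
    affordable = sorted(
        (r for r in rungs if r.bitrate_kbps <= throughput_kbps),
        key=lambda r: r.bitrate_kbps,
    )
    if not affordable:
        # Nothing fits the measured throughput: fall back to the lowest rung.
        return min(rungs, key=lambda r: r.bitrate_kbps)
    choice = affordable[0]
    for rung in affordable[1:]:
        # Step up only when the extra bits buy a meaningful quality gain.
        if rung.quality - choice.quality >= min_gain:
            choice = rung
    return choice


if __name__ == "__main__":
    # Illustrative scene where the top rung adds almost no quality.
    easy_scene = [Rung(1800, 80.0), Rung(3400, 93.0), Rung(5800, 93.4)]
    print(pick_rung(easy_scene, throughput_kbps=8000).bitrate_kbps)
```

With these made-up scores, the player has bandwidth for the 5800Kbps stream but stays on the 3400Kbps stream, because the quality difference falls below the threshold.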
As with MediaMelon, Bitmovin claims that its system can both save bandwidth costs and avoid stalls by predicting high-bitrate scenes and saving up bandwidth for them to play at higher bitrates (MediaMelon calls this "uplift"). Bitmovin's implementation can switch playback between all rungs on the encoding ladder and is codec- and container format-agnostic. Though not currently available as a service offering, Bitmovin plans to introduce Per-Scene Adaptation over the next few months.
Other Points of Interest
Beyond these discussions in my encoding wheelhouse, several other presentations caught my eye. Stephen Robertson, a software engineer at Google, showed how YouTube collected and visualized data to help diagnose delivery problems. During the talk, Robertson presented several highly usable techniques that any OTT shop could easily deploy to assist their QoE improvement efforts.
Gary Katsevman, a senior software engineer at Brightcove, tackled the always scary post-lunch time slot with an epic tutorial on closed captions, which included a detailed comparison of WebVTT and TTML. His conclusion? That WebVTT is better for the web, and TTML better everywhere else. Anyone seeking to get up to speed quickly on the rules, regulations, history, and technologies relating to closed captioning should check out this talk.
The final talk of the day was among the most interesting, as two engineers from Twitch compared the live transcoding capabilities of its own homegrown encoder against FFmpeg. Yueshi Shen, a principal research engineer at Twitch, was the primary speaker, while software engineer Jeff Gong filled in many of the technical details.
The pair started by discussing Twitch's scale, which often involved up to 40,000 concurrent input channels, creating exceptional demands on encoding efficiency, particularly since most streams are input at 60fps. As shown in Figure 6, Twitch transmuxes the input stream for HLS delivery, and transcodes it to the other rungs shown on the upper right. This avoids the generational loss associated with transcoding, and reduces CPU requirements for producing the full bitrate ladder.
Figure 6. Twitch's encoding workflow and encoding ladder.
Then, point by point, the duo reviewed the reasons why FFmpeg's admittedly efficient live encoder wouldn't work for them, and how and why they designed their own live transcoder. The results? When both transcoders produced a 720p60 stream from the 1080p60 input, FFmpeg was slightly faster than the Twitch Transcoder. However, when producing 720p60 and 720p30 outputs from the same source, the Twitch Transcoder was about 40% more efficient, a margin that grew to 65% when producing the entire output ladder.
When you consider the cost savings associated with a 65% CPU reduction spread over 40,000 input streams, the scope of the accomplishment is clear. This was a lively and highly interesting description of a very successful product development.
Food, Drink, and More Discussion
Then it was off to heavy hors d'oeuvres and adult beverages. In mingling with other attendees, I asked what presentations had caught their eye. A group from CBS Interactive touted a talk by Allison Deal that detailed how to take traditional broadcast input and convert it to a personalized OTT experience. An engineer from MLBAM related that he found the presentation by Open Broadcast Systems' Kieran Kunhya on converting from HD-SDI-based workflows to IP-based workflows a useful overview, from a knowledgeable source, of the challenges faced by broadcasters making that transition. A group of engineers from multiple sources were still chuckling over the amusing anecdotes of senior video encoding engineer Derek Buitenhuis, who related the challenges of successfully working with the amazing variety of input formats uploaded to Vimeo.
This still leaves about four talks unmentioned, and one of them very well could be of critical interest to you. So check out the entire list at Demuxed. While you're there, you might follow Demuxed on Twitter or YouTube so you'll be sure to be in the loop for next year's event. If you get the opportunity to go, I heartily recommend it.