Choosing the Optimal Data Rate for Live Streaming
The object of the exercise was to show how output quality was affected by the data rate of the incoming video in a cloud transcoding scenario.
Results, Please
Let’s start with a brief description of each test and what it purports to measure. The Moscow University test measures the comparative quality of the clips, with lower scores being better. Scores are not tied to anticipated subjective ratings in any way.
The SSIMWave test assigns a numerical score to each clip, and those scores are designed to align with anticipated subjective evaluations. For example, clips that score in the 80–100 range should be considered excellent quality by real-world viewers, while clips that score between 60 and 80 should be rated as good quality.
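As a rough illustration, the score bands described above can be written as a small lookup. This is only a sketch, not part of any SSIMWave tool: the function name rating_band is hypothetical, and treating a score of exactly 80 as excellent is an assumption, since the description does not say which band owns the boundary. Scores below 60 are not characterized here, so the function only reports that they fall outside the described bands.

```python
def rating_band(score: float) -> str:
    """Map a 0-100 quality score to the rating bands described in the text.

    Assumption: a score of exactly 80 counts as "excellent".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    # The article does not describe bands below 60.
    return "below the bands described here"
```

For example, rating_band(92.5) falls in the excellent band and rating_band(70) in the good band.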
With this as background, let’s dive into the results.
Talking Head
Chart 1 (below) shows the talking-head results as measured by the MSU VQMT tool. As a brief aside, the Moscow University VQMT offers a range of different analyses, including Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Video Quality Metric, which is the measure I used in all my tests with this tool. With the VQM test, again, lower scores are better.
Chart 1. Talking-head source and transcoded clips as measured by the VQMT. Lower scores are better.
The blue line shows the scores for the source clips, which start high at 1 Mbps and drop quite nicely as the data rate increases, with the quality improvement persisting even at the higher data rates. The red line shows the VQM values for the 360p transcoded clips, while the yellow line shows the values for the 1080p transcoded clips. As you can see in the chart, while the values continue to improve with the increased data rate of the source clip, you reach diminishing returns at around 4 Mbps.
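One way to make "diminishing returns" less of an eyeball judgment is to look at how much each step up in source data rate improves the score, and flag the first rate where the next step improves it by less than some threshold. The sketch below assumes that approach; it is not how the charts in this article were analyzed. The function name, the 5% default threshold, and the sample numbers are all invented for illustration and are not the measured results.

```python
def diminishing_returns_rate(rates, scores, min_gain=0.05, lower_is_better=True):
    """Return the first data rate whose next step up improves the score
    by less than `min_gain` (a fraction of the current score), or None
    if every step clears the threshold."""
    if len(rates) != len(scores) or len(rates) < 2:
        raise ValueError("need two or more matching rate/score pairs")
    for i in range(len(rates) - 1):
        current, nxt = scores[i], scores[i + 1]
        if current <= 0:
            raise ValueError("scores must be positive")
        # A positive gain always means the next rate scored better.
        if lower_is_better:
            gain = (current - nxt) / current
        else:
            gain = (nxt - current) / current
        if gain < min_gain:
            return rates[i]
    return None


# Invented numbers for illustration only; not the measured results.
example_rates = [1, 2, 3, 4, 5, 6]                  # source data rate, Mbps
example_scores = [2.0, 1.6, 1.4, 1.3, 1.28, 1.27]   # VQM-style, lower is better
knee = diminishing_returns_rate(example_rates, example_scores)
```

For a higher-is-better metric, pass lower_is_better=False. The threshold is a judgment call, so the rate this returns is only as meaningful as the min_gain you choose.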
Let’s see what this looks like with the SSIMWave test, which uses the SSIMplus algorithm developed by the company to predict the subjective evaluation of actual viewers (Chart 2, below). As you can see, these results follow the same pattern as the VQM results. Specifically, the source clip continues to improve at higher data rates, while the 360p transcoded clip essentially stops improving at around the 3 Mbps mark. At 3 Mbps, the 1080p clip hits the 92.5% mark, and improves perhaps another 1.5% at higher capture bit rates. Of course, with all scores at 90 or above, all clips should be perceived as excellent by actual viewers.
Chart 2. Talking-head source and transcoded clips as measured by the SSIMWave SQM. Higher scores are better.
Taken together, the two tests indicate that data rates in excess of 3 Mbps deliver very little quality benefit that a viewer will actually notice. This is partially verified by Figure 1 (below), which shows the talking-head clip split between two transcoded clips: on the left, a clip encoded from the 3 Mbps source; on the right, the clip encoded from the 6 Mbps source. In this low-motion clip, it’s very tough to notice any difference between the two. In fact, taken literally, the SSIMWave tests suggest that you could drop the capture data rate down to 1 Mbps and still produce a very high-quality transcoded clip, with an SSIMplus score of around 90 for the 720p clip and 95 for the 360p clip.
Figure 1. Two transcoded clips, on the left from the 3 Mbps source, on the right the 6 Mbps source.
If you compare the 720p output from the 1 and 3 Mbps source files, you will notice a slight pulsing at key frames in the 1 Mbps source clip, but otherwise, there’s very little to distinguish the two. Why doesn’t the quality difference in the source clips proportionately pass through to the transcoded clips? Likely because the overall degradation caused by the second encoding is a great leveler that minimizes even moderate quality differences in the source clips.
This is not the first time I’ve seen this effect. I once tried to ascertain the optimal data rate for uploading mezzanine files for transcoding by a movie service. Starting with a ProRes mezzanine file, I produced 1080p H.264 output at data rates ranging from 10 to 50 Mbps to represent the potential alternatives. Comparing the 10 and 50 Mbps files using the Moscow University tool produced about a 20% quality differential. Then I encoded both clips using the most demanding preset in the client’s adaptive group, analyzed the output, and found that the quality difference had shrunk to 3%.
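To show the kind of arithmetic behind a quality differential like this, here is a minimal sketch that expresses the gap between two scores as a percentage of one of them. The text does not say exactly how the percentages were computed, so measuring against the first score is an assumption. The function name percent_differential is hypothetical, and the scores are invented values chosen only to land on gaps of about 20% and 3%; they are not the measured numbers.

```python
def percent_differential(reference: float, other: float) -> float:
    """Gap between two scores as a percentage of the reference score."""
    if reference == 0:
        raise ValueError("reference score must be non-zero")
    return abs(reference - other) / abs(reference) * 100.0


# Invented scores for illustration only (lower is better).
source_gap = percent_differential(1.00, 0.80)       # e.g. 10 vs. 50 Mbps sources
transcoded_gap = percent_differential(2.00, 1.94)   # the same pair after re-encoding
```

With these invented inputs, the gap shrinks from roughly 20% between the sources to roughly 3% after the second encode, the same leveling pattern described above.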
In this analysis, the bottom line is that for a low-motion talking-head clip shot with minimal detail in the foreground or background, source clip data rates in excess of 3 Mbps are hard to justify, and you likely could produce very good quality at substantially less. Let’s see how all this changes with the concert clips.
Concert Clips
Chart 3 (below) shows the VQMT scores for the concert clips, both source and transcoded. Again, the source clips show improving quality at the higher data rates, though, surprisingly, the gains taper off more quickly than they did with the talking-head clip. The 360p clip levels out almost completely at around 3 Mbps, while the 1080p clip reaches diminishing returns at the same data rate.
Chart 3. Concert source and transcoded clips as measured by the VQMT. Lower scores are better.
Not surprisingly, the SSIMWave SQM results (Chart 4, below) show a similar pattern. While the quality of the source clips continues to improve nicely at the higher data rates, the 360p clip stalls out almost completely at 4 Mbps, at which point quality improvements in the 1080p clip also start to slow.
Chart 4. Concert source and transcoded clips as measured by the SSIMWave SQM. Higher scores are better.
Figure 2 (below) shows the transcoded clip from the 3 Mbps source on the left and the clip from the 6 Mbps source on the right. As the charts suggest, there is very little difference between the two. Even when perusing the frames at high magnification, it is difficult to see major differences between the encoded clips.
Figure 2. Two transcoded clips, on the left from the 3 Mbps source, on the right the 6 Mbps source.