
’Round the Horn at NAB 2024: Videon, Telestream, Phenix, Ateme, V-Nova, Twelve Labs, Norsk, Dolby, and NETINT


Any NAB report is like the story of the blind man and the elephant: what you experience is what you touch, representing a fraction of the whole and perhaps not even a good sample. That being said, here’s what I touched during the show. Many of these experiences are accompanied by video that I shot of the interviews.

Videon LiveEdge Node and Max

One of my first stops at the show was at the Videon booth to see the LiveEdge Node and Max (Figure 1) as demonstrated by chief product officer Lionel Bringuier.

Videon LiveEdge Max

Figure 1. Videon's LiveEdge Max delivers more than twice the performance of Node, offers a confidence monitor, and, like Node, accepts Docker containers.

Briefly, Node and Max are compact edge live encoders with the specs shown in Table 1. Node is the established product while Max is the new product with more than double the capacity plus an onboard confidence monitor.

Table 1. LiveEdge Node vs. LiveEdge Max

| Feature | LiveEdge Node | LiveEdge Max |
| --- | --- | --- |
| Inputs | 1 x 3G-SDI or HDMI | Single or dual 12G-SDI 4Kp60 inputs (with 16-channel audio) or HDMI |
| Outputs | 4Kp30/1080p60 | Dual 4Kp60 |
| Codecs | H.264/HEVC | H.264/HEVC |
| Resolution | Up to 4Kp30; commonly used for 1080p60 | Up to dual 4Kp60 |
| Power | Power over Ethernet (PoE) | Power over Ethernet (PoE+) |
| Confidence monitor | Not on the device; available in the cloud | Yes, both on the front panel of the device and in the cloud |
| Cloud management | API for device and fleet management via the cloud | API for device and fleet management via the cloud |
| Additional features | Docker container support for third-party applications | Enhanced processing power, Docker container support |

The LiveEdge products include an API for individual device management and a cloud API for overseeing fleets of devices remotely. This dual API system is particularly useful for operations involving multiple devices across various locations, such as stadiums or event venues. Fleet management is facilitated through a cloud platform, which does not process media but offers tools for remote device supervision and control, enhancing efficiency and reducing the need for on-site management.
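To make the dual-API idea concrete, here's a minimal Python sketch of what supervising a fleet of edge encoders through a cloud REST API might look like. The base URL, endpoint paths, JSON fields, and token handling are hypothetical placeholders for illustration, not Videon's actual API.

```python
# Minimal sketch of cloud-based fleet supervision for edge encoders.
# The base URL, endpoint paths, and JSON fields are hypothetical
# placeholders -- consult the vendor's API docs for the real interface.
import requests

CLOUD_API = "https://cloud.example.com/api/v1"   # hypothetical base URL
TOKEN = "YOUR_API_TOKEN"                         # hypothetical auth token

def list_devices():
    """Return all registered edge encoders in the fleet."""
    resp = requests.get(
        f"{CLOUD_API}/devices",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def check_fleet_health(devices):
    """Flag any device that is not actively streaming."""
    return [(d.get("name"), d.get("status"))
            for d in devices if d.get("status") != "streaming"]

if __name__ == "__main__":
    for name, status in check_fleet_health(list_devices()):
        print(f"Attention: {name} reports status '{status}'")
```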

There are many live transcoders for event shooters, and most have cloud platforms. What distinguishes LiveEdge devices is their support for Docker containers, which allows them to integrate third-party applications directly into the hardware. Videon has a marketplace for such applications, which includes DRM from EZDRM, watermarking from Synamedia, error correction from Zixi, and LCEVC encoding from V-Nova. This lets users customize device functionality to suit specific needs and streamlines workflows by allowing direct on-device processing.
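Since the differentiator here is on-device Docker support, the mental model is simply "run a third-party container next to the encoder." The sketch below uses the standard Docker SDK for Python against a generic Docker host; the device address, image name, and environment variables are made up for illustration and are not taken from Videon's marketplace, where you would normally deploy such applications.

```python
# Illustrative only: deploying a third-party container to a Docker host.
# The device address and image name are hypothetical; on a real LiveEdge
# unit you would use the vendor's marketplace and management tooling.
import docker

# Hypothetical edge device exposing the Docker Engine API
client = docker.DockerClient(base_url="tcp://192.168.1.50:2375")

container = client.containers.run(
    "example/watermarker:latest",     # hypothetical third-party image
    detach=True,
    name="watermarking",
    environment={"INPUT_URL": "srt://127.0.0.1:9000"},  # illustrative
    restart_policy={"Name": "unless-stopped"},
)
print(f"Started container {container.short_id}")
```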

Telestream Vantage: AI-Driven Workflow Creation

My next stop was the Telestream booth for a quick demo of the AI-generated workflows in Telestream's Vantage Workflow Designer by John Maniccia, Director of Sales Engineering and Support. As you may know, Vantage is workflow-driven, so users can easily create different workflows with branching to deliver different outcomes based on file characteristics. For example, Vantage can detect whether a file is 1080p or 4K and assign it to a different encoding ladder based upon that determination.  
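To illustrate the kind of branching Vantage handles graphically, here's a simple Python sketch that probes a file with ffprobe and routes it to one of two encoding ladders based on resolution. The ladders and the 1080p threshold are illustrative placeholders, not Telestream presets.

```python
# Sketch of resolution-based workflow branching, in the spirit of what
# Vantage does graphically. The ladders below are illustrative only.
import json
import subprocess

def probe_resolution(path):
    """Use ffprobe to read the width/height of the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    stream = json.loads(out)["streams"][0]
    return stream["width"], stream["height"]

# Hypothetical ladders: (height, video bitrate in kbps)
LADDER_1080 = [(1080, 6000), (720, 3500), (540, 2000), (360, 900)]
LADDER_4K   = [(2160, 16000), (1440, 9000), (1080, 6000), (720, 3500)]

def choose_ladder(path):
    """Branch: files above 1080p go to the 4K ladder, everything else to 1080p."""
    _, height = probe_resolution(path)
    return LADDER_4K if height > 1080 else LADDER_1080

if __name__ == "__main__":
    for height, kbps in choose_ladder("mezzanine.mov"):
        print(f"{height}p @ {kbps} kbps")
```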

Previously, you built workflows via drag and drop; a completed workflow is shown in Figure 2. What's new is the ability to type the desired result in English and have Vantage build the workflow for you. On the upper right in Figure 2, you see the text that generated the workflow shown in the main panel.

Telestream Vantage

Figure 2. Vantage will build workflows from plain English commands. (Click the image to see it at full size.)

Given what we've all learned about generative AI over the last 18 months, this is more evolution than revolution, but it takes us one step closer to the point where you won't have to be a compression expert to create transcoding workflows. Good for management, bad for compression experts, but inevitable.

There are still several missing pieces, like how you should configure the ladder for mobile vs. the living room, or how to choose among various codecs and HDR and DRM technologies. Still, that level of automated operation will almost certainly be included in products like Vantage or AWS Elemental MediaConvert within a year or two. Telestream gives us a first glance at what that might look like.

Phenix: Low Latency and Ring Around the Collar

The last time I heard from Phenix Real-Time Solutions was an email pitch to participate in low-latency trials performed while viewing the 2024 Super Bowl. I declined, but when I ran into Phenix COO Kyle Bank at the show, I couldn't resist asking about the results. As shown in Figure 3, the latency figures are shocking, with Paramount+ delivering the lowest latency while still running 43 seconds behind real time. The report also found that the drift, or range of lag experienced by viewers, ran from a low of 28 seconds to an astonishing high of 134 seconds. To be clear, this means that viewers watching on the same service were as much as 134 seconds apart.

Phenix Super Bowl 2024

Figure 3. Average lag behind real time for streaming services at Super Bowl 2024

Interestingly, Kyle mentioned that the 2024 latency results were actually worse than 2023's, so it doesn't look like the identified services or their customers care much about latency. This led to a discussion of whether low latency is the Ring Around the Collar of the streaming world: a made-up problem to sell solutions that none of the major services seem to feel are necessary. This is especially so if you don't have close neighbors who can spoil your experience with a cheer two minutes before the score or interception appears on your smart TV.

Kyle politely explained that while Boomers may watch an event via a single screen, most younger generations watch with an eye on social media. So even if you don't share a wall with a sports fan with a faster service, posts on X can serve as a similar spoiler.

This prompted a conversation about deficits in WebRTC-based services that limit their attractiveness for traditional broadcasts. Kyle shared that Phenix has integrated server-side ad insertion and adaptive bitrate support into its WebRTC-based platform, addressing two of the major shortcomings. Kyle also mentioned that Phenix has served audiences as large as 500,000 viewers and can serve at least 1 million at latencies under 0.5 seconds.

That said, like most low-latency platforms, Phenix primarily serves the sports betting and gaming sectors, webinar platforms, and social media applications that integrate live content and influencers to drive user engagement. Still, it's good to see that Phenix—and presumably similar services—are advancing their low-latency technologies to serve an ever-broader range of viewers.

Ateme MV-HEVC and V-Nova PresenZ

Another early stop was the Ateme booth, where I saw a demonstration of MV-HEVC, an extension of HEVC designed for encoding multiview video content like 3D video. Specifically, MV-HEVC allows for efficient coding of multiple simultaneous camera views, using inter-layer prediction to improve compression by exploiting redundancies between the views.

In the Ateme booth I saw a demo of MV-HEVC on the Apple Vision Pro, and it was impressive, with excellent quality video. In the headset, the video image hovered a few feet away from me. When I turned to the left and right, the video cut off after about 180 degrees, and reached an edge when I looked too high or too low (see Figure 4 on the left).

That's because MV-HEVC is primarily designed for stereoscopic 3D applications, where different angles of the same scene are presented to each eye to create a sense of depth. The Apple iPhone 15 Pro and Pro Max produce video for MV-HEVC encoding by recording two 1080p videos simultaneously using two different lenses, then merging them into a single file.

The video appears to have depth because of the two different source files, and it does extend 180 degrees in the Apple Vision Pro, but it's not fully immersive because you can't move within the scene. This brings us to the distinction between what's called three degrees of freedom in movie viewing and six degrees of freedom, as shown in Figure 4. The former lets me turn my head left and right and look up and down, and this is what I saw in the Ateme booth. What I was about to see in the V-Nova booth was the PresenZ format, which delivers six degrees of freedom and puts me inside the video.

V-Nova PresenZ

Figure 4. The difference between three degrees of freedom (MV-HEVC) on the left and six degrees of freedom (PresenZ) on the right

Specifically, in the V-Nova booth I donned a Meta Quest 3 headset and found the experience mind-jarringly different from the Ateme demo. In the robot fight scene that I viewed, I flinched when debris flew toward my head and the combatants tumbled around me. If I took two steps into a room, I could see around a corner and view what had previously been hidden by a wall. I could turn 360 degrees and look as high and low as I wanted without extending beyond the video, though the quality was a bit soft, like 720p video scaled to 1080p. Noticeable to a compression geek, but not distracting.

V-Nova's Tristan Salomé offered a detailed explanation of these technologies. He highlighted that while the Apple Vision Pro creates an impeccable stereoscopic view by tracking the viewer's eye movements, the VR technology I experienced on the device did not support changes in viewer perspective relative to the content—akin to viewing on a standard 3D TV. In contrast, PresenZ reacts when a viewer moves their head in any direction (up, down, forward, backward, or side to side), enriching the sense of immersion and presence in the virtual environment by mimicking real-life interactions more closely.

Producing films for PresenZ involves using computer-generated imagery (CGI) or capturing scenes with multiple cameras positioned around the subject. These methods help create a volumetric or 3D portrayal of the scene that users can interact with in a VR setting. Tristan pointed out the significant computational demands and sophisticated encoding required to manage the extensive data involved in creating these immersive experiences. This is why V-Nova acquired the PresenZ technology, to marry it with their codec, LCEVC.

It's hard to see how a technology like PresenZ scales, though that's an issue with all AR/VR. It's also uncertain if most viewers, who have long enjoyed movies from a static seat or recliner, will find a more immersive experience appealing. Still, of everything I saw at NAB, PresenZ was the most striking.

Note that there is an upcoming standard called MPEG-I (for immersive) that will support full six degrees of freedom. Until that is available, PresenZ may be the best alternative, and yes, it is compatible with the Apple Vision Pro. 

Twelve Labs: Automated Deep Metadata Extraction

 

For many publishers, metadata is the key to unlocking the value of archived content, but manually creating metadata is expensive, time-consuming, and ultimately incomplete. But what if there were a way to automatically generate extensive metadata that would enable you to find and retrieve footage using an extensive array of prompts?

That's what Twelve Labs has done. I spoke with the Head of Operations, Anthony Giuliani. He explained that the company's technology utilizes advanced multimodal video understanding models that enable a deep understanding of video content similar to human cognition without relying on traditional metadata (Figure 5).

Twelve Labs

Figure 5. Twelve Labs' AI understands videos the way humans do.

Instead, the system creates video embeddings, akin to text embeddings in large language models, which facilitate dynamic interaction with video content. This technology allows users to search, classify, and perform other tasks with video data efficiently, complementing any existing metadata. Unlike text-based metadata, the technology harnesses various modalities within a video, including sound, speech, OCR, and visual elements, to enrich the video understanding process.

As an example, Giuliani asked me to think of a scene where the protagonist had to choose between a red pill and a blue pill. If you've seen The Matrix, you'll instantly flash to the scene where Keanu Reeves has to make that choice. Giuliani explained that this demonstrates how the human mind can instantly recall specific cinematic moments without needing to sift through every watched movie or rely on tagged metadata.

Twelve Labs' technology mimics this human-like recall by creating video embeddings, allowing dynamic interaction with video content. This enables users to quickly and efficiently pull up specific scenes from vast video databases, akin to how one might instantly remember and visualize the iconic Matrix scene.
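The embedding idea generalizes beyond Twelve Labs' specific models. As a conceptual sketch only (not their API), the Python below ranks precomputed clip embeddings against a text-query embedding by cosine similarity; the embed() function is a stand-in for whatever multimodal model actually produces the vectors, so the scores here merely demonstrate the mechanics.

```python
# Conceptual sketch of embedding-based video search: rank stored clip
# embeddings against a query embedding by cosine similarity.
# embed() is a random stand-in for a real multimodal model (not Twelve
# Labs' API), so the resulting ranking is arbitrary -- the point is the
# retrieval mechanics, not the scores.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call a trained model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

# Pretend index: clip IDs mapped to precomputed embeddings
clip_index = {
    "matrix_red_pill_scene": embed("protagonist chooses red pill or blue pill"),
    "car_chase_night": embed("night car chase through city streets"),
    "press_conference": embed("coach answers questions at press conference"),
}

def search(query: str, top_k: int = 3):
    """Return the top_k clips by cosine similarity to the query."""
    q = embed(query)
    scores = {cid: float(np.dot(q, vec)) for cid, vec in clip_index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

for clip_id, score in search("scene where someone picks between two pills"):
    print(f"{clip_id}: {score:.3f}")
```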

Twelve Labs offers this technology primarily through an API, making it accessible to developers and enterprises looking to integrate advanced video understanding into their applications. The pricing model is consumption-based, charging per minute of video indexed, with options for indexing on private or public clouds or on-premises. This flexible and scalable approach allows a wide range of users, from individual developers in the playground environment with up to ten free hours to large enterprises, which may require extensive, customized usage.

Currently, the platform serves diverse clients, including major names like the NFL, who utilize the technology to enhance their video content management and retrieval, particularly for managing large archives and post-production processes. The potential applications of this technology are vast, ranging from media and entertainment to security and beyond, indicating a significant advancement in how we can interact with and understand video content at a granular level.

Norsk: No Code/Low Code Media Workflows


I next chatted with Adrian Roe from id3as/Norsk, who introduced their new product, Norsk Studio, at NAB. Norsk Studio builds upon the Norsk SDK launched at Streaming Media East in May 2023, providing a graphical interface that enables users to drag, drop and connect pre-built components into a publishing workflow with no coding required.

Studio comes with multiple pre-built inputs, processes, and outputs, ranging from simple ten-line scripts to more complex modules, facilitating customized media workflows that can adapt to the specific needs of any project. Customers can build new reusable components using the Norsk SDK, which is supported by various programming languages. Adrian explained that most customers prefer TypeScript due to its expressiveness and the availability of skilled developers. Adrian also discussed Norsk's deployment options, noting that both SDK and Studio-created programs can be run on-premises or in the cloud.

Finally, Adrian shared that Norsk had won the IABM BaM award in the Produce category (Figure 6), which “celebrates outstanding technological innovations that deliver real business and creative benefits.”

Dolby Professional: Hybrik Cloud Media Processing

Dolby Hybrik is a cloud media processing facility that has long prioritized the ability to build QC into encoding workflows. At NAB, I spoke with David Trescot, Hybrik co-founder, who showed me multiple QC-related innovations, several of which were enabled via AI.

Some of the most useful additions relate to captions, a staple for most premium content. For example, Dolby added a dialogue enhancement capability that separates dialogue from background music. The dialogue can then be transcribed, and if the video doesn't have captions, Hybrik can create them. Hybrik can also compare the transcribed captions to the actual captions in the package to verify that they belong to that video and are in the correct language, and it can verify all language tracks in the master. From a pure audio-mixing perspective, once the dialogue and background are separated, you can remix them to make the dialogue more distinct.
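To make the caption-verification step concrete, here's a rough Python sketch of the comparison logic: transcribe the separated dialogue, then check word overlap and language against the caption file in the package. The transcribe() and detect_language() helpers are hypothetical stand-ins for whatever ASR and language-identification models a real pipeline would use; they are not Hybrik functions.

```python
# Rough sketch of caption QC: compare an ASR transcript of the separated
# dialogue stem against the packaged captions. transcribe() and
# detect_language() are hypothetical stand-ins, not Hybrik functions.

def transcribe(dialogue_audio_path: str) -> str:
    """Stand-in for an ASR model run on the separated dialogue track."""
    raise NotImplementedError("plug in your ASR of choice")

def detect_language(text: str) -> str:
    """Stand-in for a language-identification model."""
    raise NotImplementedError("plug in your language ID of choice")

def word_overlap(transcript: str, captions: str) -> float:
    """Fraction of caption words that also appear in the transcript."""
    t, c = set(transcript.lower().split()), set(captions.lower().split())
    return len(t & c) / max(len(c), 1)

def verify_captions(dialogue_audio_path: str, caption_text: str,
                    expected_lang: str, threshold: float = 0.6) -> dict:
    """Flag captions that don't match the video or aren't in the expected language."""
    transcript = transcribe(dialogue_audio_path)
    return {
        "captions_match_video": word_overlap(transcript, caption_text) >= threshold,
        "language_correct": detect_language(caption_text) == expected_lang,
    }
```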

Hybrik also added a useful GUI to the QA function so you can visually examine the video and listen to the audio at the locations of reported problems (Figure 7). For example, on the upper left of the timeline you see a spike in the Blockiness measure that warrants a look, as well as black detection on the upper right. For audio, you see an emergency alert signal on the bottom middle and silence detection on the far right. Absent the GUI, you'd have to download and play the content in your player of choice, which is cumbersome. Now, you can drag the playhead directly to the problem and assess it.

dolby hybrik

Figure 7. Hybrik's new GUI for QA. Click the image to see it at full resolution.

The technology behind the player, called Resource-Agnostic Swarm Processing (RASP), is as interesting as the player itself. Here's why: most cloud infrastructures can't play media files, particularly files stored in high-quality mezzanine formats like ProRes.

So, imagine that your masters are in the cloud in ProRes or a similar format and you must perform some QC function or visual inspection. Your only options would be to download the file or transcode it to a friendlier format and inspect that, though you'd still need a frame-accurate player. If you transcode, you may have to transcode the entire file, which is expensive; then you can either store the transcoded file, which adds to your monthly cost, or delete it and risk having to create it again for a later task.

RASP is a cloud media operating system that streamlines these operations by transcoding assets in small chunks only when necessary for the specific operation. In Figure 7, to sample the blocky region at the clip's start, the operator would drag the playhead over, click Play, and RASP would transcode the required video on the fly. These operations are transparent to the user, whose experience is similar to working with locally stored files. RASP is a natural for any application involving media stored in the cloud and will be available on a cost-per-minute basis from Dolby.
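The underlying idea, transcoding only the chunk under the playhead and caching it, can be sketched in a few lines. The Python below uses ffmpeg's -ss/-t options to cut and transcode a single chunk on demand; it's a conceptual illustration of just-in-time chunk transcoding, not Dolby's RASP implementation, and the chunk length and encoding settings are arbitrary.

```python
# Conceptual sketch of just-in-time chunk transcoding: when the playhead
# lands on a time range, transcode only that chunk (and cache it) instead
# of converting the whole mezzanine file. Not Dolby's RASP implementation.
import os
import subprocess

CHUNK_SECONDS = 4
CACHE_DIR = "chunk_cache"

def get_chunk(mezzanine_path: str, playhead_seconds: float) -> str:
    """Return the path of a playable H.264 chunk covering the playhead."""
    index = int(playhead_seconds // CHUNK_SECONDS)
    start = index * CHUNK_SECONDS
    os.makedirs(CACHE_DIR, exist_ok=True)
    out_path = os.path.join(CACHE_DIR, f"chunk_{index:06d}.mp4")
    if not os.path.exists(out_path):        # transcode only on first request
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(CHUNK_SECONDS),
             "-i", mezzanine_path, "-c:v", "libx264", "-preset", "veryfast",
             "-c:a", "aac", out_path],
            check=True,
        )
    return out_path

# e.g., the operator drags the playhead to 13 seconds in:
# print(get_chunk("master_prores.mov", 13.0))
```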

One Final Stop: NETINT

My last stop was at the NETINT booth to greet some former colleagues. There I saw the public debut of Whisper transcription as integrated into the new NETINT Bitstreams Edge media processing application, which runs on the NETINT Quadra Video Server Ampere Edition. The server is powered by a 96-core Ampere Altra CPU and ten NETINT Quadra transcoders and costs $19,000.

There were lots of products and services delivering captions via Whisper at the show. What's unique about this server is its ability to support up to 30 simultaneously transcoded live channels, each with five HLS- or DASH-packaged profiles encoded in H.264, HEVC, and AV1.
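A quick back-of-the-envelope in Python shows what that spec implies in concurrent encodes, assuming every channel actually carries a full five-rung ladder and the load is spread evenly across the ten Quadra transcoders.

```python
# Back-of-the-envelope: concurrent encodes implied by the quoted spec,
# assuming all 30 channels run a full five-profile ladder.
channels = 30                # simultaneously transcoded live channels
profiles_per_channel = 5     # HLS/DASH ladder rungs per channel
quadra_cards = 10            # Quadra transcoders in the server

total_encodes = channels * profiles_per_channel
print(f"{total_encodes} concurrent encodes")                    # 150
print(f"~{total_encodes / quadra_cards:.0f} encodes per card")  # ~15
```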

NETINT has been selling ASIC-based transcoders since 2019, but Bitstreams Edge is the company's first homegrown server software. NETINT and Ampere presented the solution at the Streaming Summit at the show, video of which should be available by the end of April. The captions shown in Figure 8 were created live with the new solution during that presentation.

netint whisper captions
Figure 8. NETINT demo'ing captioning with Bitstreams Edge/Whisper integration

Related Articles

NAB 2024: Blackmagic Design, SMPTE 2110, and Video Over IP

In this interview from the Blackmagic Design booth at NAB 2024, Blackmagic Design's Bob Caniglia and Streaming Media's Shawn Lam discuss how Blackmagic is enabling producers to convert 4K and HD signals to SMPTE 2110 so they can move content across IP networks, with their new open-source 2110 IP codec and new 10 gig port-equipped Blackmagic Design cameras that support it like the PYXIS 6K and the URSA Cine 12K.

NAB 2024: vMix Talks vMix 27 and Zoom Integration

Among the key features of vMix 27 are Zoom integration, enabling remote streaming producers to bring in (theoretically) an unlimited number of remote guests, vMix Senior Systems Engineer Heath Barker reports in this interview with Streaming Media's Shawn Lam in the vMix booth at NAB. Barker also does a quick hands-on demo of how the feature works.

Introduction: Foundations of AI-Based Codec Development

To promote the understanding of AI codec development, Deep Render's CTO and Co-founder, Arsalan Zafar, has launched an educational resource titled "Foundations of AI-Based Codec Development: An Introductory Course." In a recent interview with Streaming Media Magazine, Zafar provided insights into the course's content, target audience, and expected outcomes.

SMNYC Sneak Preview: Keynote: Microsoft and Fremantle Beat the BUZZR with Gamified FAST

On Tuesday, May 21, Microsoft CTO Andy Beach and Fremantle SVP, Global FAST Channels Laura Florence will present Streaming Media NYC's opening keynote. They will discuss the new Beat the BUZZR platform, how it gamifies FAST and capitalizes on TV, the collaboration that brought it about, and the AI under the hood.

NAB 2024: Top New Tech for a Disruption-Ready Streaming Industry

Exciting new and (mostly) AI-driven tools and services from NAB 2024 that solve very specific problems, from shooting great iPhone footage to automatically creating short clips to providing live low-latency translation and captioning to creating customized radio programming to building purpose-driven social communities.

NAB 2024: Assessing the AI Revolution in Entertainment

The signal-to-noise ratio in today's relentless AI buzz is far from optimal—particularly at NAB 2024, where the Everything AI vibe was off the chart from the moment the show began—but one session that cut through the noise and swapped hype for refreshing insight and candor was titled The AI Revolution in Entertainment: One Year On…
