The State of Machine Learning and AI 2019
Per-title encoding comes out of research that Netflix has published about adapting encoding to individual videos. “The way to do it without machine learning is kind of brute force. You end up doing like a hundred test encodes to find the optimal encoding ladder, which is fine if you’re Netflix because you have a lot of compute and a relatively small catalog, but it doesn’t work for a lot of other types of users,” Dahl says.
“We do it with machine learning because we can do what would take a hundred hours of compute in a few seconds of compute,” says Dahl. “It’s a neural net-based system that relies on three layers that classify low-level attributes of video frames, and then matches that against the trained model, and the trained model knows the optimal encoding ladder for different types of video.”
“It’s not exactly this, but the simplest way to think about this is a neural net recognizes that this content is complex, this content is simple, or this piece is a little bit blurry or a little bit sharp.” It actually looks for very low-level elements of video frames. Like what? “Well, that’s one interesting thing about neural nets, it doesn’t necessarily know what it’s doing. It’s just trained to deliver certain results.”
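For readers who want to see what the brute-force approach Dahl describes actually involves, here is a minimal sketch in Python. It assumes you have already run test encodes at several resolutions and bitrates and scored each one with a quality metric; the `TestEncode` type, the `build_ladder` function, and the scores in the example are hypothetical and illustrative, not Netflix’s or Dahl’s implementation. For each bitrate tested, it keeps whichever resolution scored best, which is the core idea behind a per-title ladder: at low bitrates a smaller resolution often looks better than a starved high-resolution encode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestEncode:
    height: int        # output resolution, e.g. 1080 for 1080p
    bitrate_kbps: int  # target bitrate of this test encode
    quality: float     # hypothetical quality score; higher is better


def build_ladder(encodes: list[TestEncode]) -> list[TestEncode]:
    """For each bitrate tested, keep the resolution that scored highest.

    Hypothetical helper: the quality scores are assumed to come from
    an objective metric computed over real test encodes.
    """
    best: dict[int, TestEncode] = {}
    for enc in encodes:
        current = best.get(enc.bitrate_kbps)
        if current is None or enc.quality > current.quality:
            best[enc.bitrate_kbps] = enc
    # Return ladder rungs ordered from lowest to highest bitrate.
    return [best[b] for b in sorted(best)]


if __name__ == "__main__":
    # Illustrative scores only, not measurements.
    tests = [
        TestEncode(1080, 1500, 62.0), TestEncode(720, 1500, 70.0),
        TestEncode(1080, 4500, 88.0), TestEncode(720, 4500, 84.0),
    ]
    for rung in build_ladder(tests):
        print(rung.bitrate_kbps, rung.height, rung.quality)
```

The cost Dahl points to comes from the input: every resolution/bitrate pair is a full encode plus a quality measurement, so a realistic grid quickly reaches the hundred or so test encodes he mentions. A trained model aims to predict the resulting ladder without running that grid.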
Image Quality
SSIMWAVE is working with multichannel video programming distributors (MVPDs) to make automated choices about video content processing, compression, and image quality. They are able to train systems to learn, analyze, and predict the optimal scenarios for delivering the highest-quality image.
Video content undergoes various processing stages, such as resolution, frame rate, scan order, dynamic range, color format, and color space conversions, at different points in a delivery chain. Encoding and transcoding variations, plus various transmission technologies, are some of the other variables that factor into the analysis, says Abdul Rehman, Ph.D., CEO of SSIMWAVE.
Often, sources with the same content are not identical. “This allows for an opportunity to select the source that would provide a better viewer experience across target viewing devices. For example, an MVPD may have CNN available as 1080i at 29.97 fps in MPEG-2 at 40 Mbps, as well as a 720p60 version compressed with H.264 (High profile) at 22 Mbps,” says Rehman. “Higher resolution does not always equal better viewing quality in real-world conditions, and identifying this in ever-increasing amounts of content requires machine learning and AI. It’s not possible to do this manually.”
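Rehman’s point can be pictured as a simple selection step once quality predictions exist. The sketch below assumes you already have a predicted quality score for each candidate source on each target device; producing those scores is the hard part that machine learning handles, and nothing here reflects SSIMWAVE’s actual product or API. The function name `select_source`, the source labels, and the scores are hypothetical. It picks the source with the best worst-case score across devices rather than the one with the highest resolution.

```python
from collections.abc import Mapping

# Hypothetical predicted quality scores (0-100) per source, per viewing device.
Scores = Mapping[str, Mapping[str, float]]


def select_source(scores: Scores, devices: list[str]) -> str:
    """Return the source whose lowest score across the target devices is highest."""
    if not scores:
        raise ValueError("no candidate sources")
    return max(scores, key=lambda src: min(scores[src][d] for d in devices))


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    candidates = {
        "1080i29.97_mpeg2_40mbps": {"tv": 81.0, "tablet": 74.0, "phone": 77.0},
        "720p60_h264_22mbps": {"tv": 79.0, "tablet": 83.0, "phone": 85.0},
    }
    print(select_source(candidates, ["tv", "tablet", "phone"]))
```

Using the worst-case score is one design choice among several; an operator might instead weight devices by audience share. With these placeholder numbers, the 720p source wins even though it has the lower resolution and bitrate.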
Microsoft’s Video Indexer
For those who want to try out a tool without having development skills, Video Indexer from Microsoft provides 27 different machine learning and AI tools via a visual interface (there are also APIs for developers). “[This solution] allows our customers to leverage AI for their video content without any prior machine learning knowledge and without the heavy lifting of integrating models together and tuning them,” says Ella Ben-Tov, principal program manager, Microsoft Video Indexer.
Video Indexer’s sweet spot is content identification within large archives of video and audio, says Ben-Tov. “By indexing archives, we enable customers to know what’s in their content. One type of use case we see is customers creating new content out of existing content more effectively, like creating a trailer for a movie or grabbing old clips on a specific subject for a newscast without having to manually scan all the assets.” There are also near-real-time closed-captioning and translation capabilities covering 54 languages.
Microsoft Video Indexer excels at identifying content topics and emotion within large archives of video and audio.
“The latest model we added is the cross-channel topic inferencing model that uses multiple insights such as transcript, OCR, and celebrity face recognition in order to infer the topics discussed in the video, even if they’re not explicitly mentioned,” says Ben-Tov. Free demo accounts are available on the Video Indexer website.
IBM Watson
IBM Watson generates rich metadata that provides insight into otherwise unstructured data: visual descriptions, full-text transcripts, and sentiment analysis are surfaced as content is processed.
“The use case I run into most as part of this team is really around sports highlights,” says Ethan Dreilinger, solutions engineer, Watson Media and IBM Cloud Video, whose team has worked on projects for Fox Sports, the FIFA World Cup, the US Open, and the Masters. “[These are] all examples of where we’re taking in a broadcast game feed, creating markers of moments that are highlight-worthy, and then exposing that back either to an end user, like we did with the World Cup, or to an editorial team [that] makes a decision on it, as we do for the US Open and the Masters.”
“We spent some time training Watson, our AI engine, on the specific sport or the specific events,” says Dreilinger. In the case of the FIFA World Cup, Watson was taught a dozen and a half moves or events, such as kicks, goals, and red cards. “The editorial team was then able to find specific plays from specific players in specific matches that they were interested in.”
“Output from Watson is a JSON [JavaScript Object Notation] file that has, in the case of video enrichment, lots of extended metadata and, in the case of closed captioning, it’s a speech-to-text file and an SRT closed-caption file,” says Dreilinger.
Better Metadata = More Viewing
Canadian broadcaster Rogers Communications’ Sportsnet uses Iris.tv to increase content consumption. “Iris helps us recommend video content that helps us grow our audience and our monetizable inventory,” says Dale Fallon, director, product management. “Iris has an algorithm which picks the next video (for viewers to watch) based on a number of factors. When there’s ten seconds left in the video, we bring up an overlay which tells the viewer the next video up will be highlights from the basketball game last night, and often we will go into a 15-second ad before the next video.” Iris.tv is also working with IBM Watson to dive even deeper into hidden content metadata.
“We have a number of clients using joint Iris.tv/Watson services,” says Field Garthwaite, co-founder and CEO of Iris.tv. Video libraries are run through Watson, and then Iris.tv analyzes the results for keywords to create new categories to which assets can be mapped. “(For metadata enrichment) we use their APIs, including Natural Language Classifier, as well as the wide array of APIs such as audio/visual analysis to help better identify and structure metadata,” says Garthwaite.
One large European publisher had hundreds of thousands of videos in its library, as well as syndicated video, with no taxonomy and limited metadata. Running that content through both technologies increased video lift by 125% and revenue by 87%, says Garthwaite.
What Have We Learned?
So, how can you start leveraging machine learning and AI for your workflow or consumer-facing service? For companies that have the resources to hire development services, the idea of mixing and matching services is now viable. “Last year you could get by with just having a machine learning service and it’s cool because I can detect objects and I can pull transcription out of speech within the video,” says Jun Heider, chief technology and operations officer, RealEyes Media. Now he’s seeing certain platforms excelling in specific areas. “A lot of these vendors are starting to expose the ability to create custom models. It’s going to be more expensive because they need to work with you to build a model that’s geared towards your content,” says Heider, but often the results are superior to using a generically trained model.
“The best thing to do is use a subset of your actual content to see what service handles your needs best,” says Heider. “Make sure to not think of these services as ‘I have to pick one or the other.’ If I know X does well with transcripts, Y is really good with facial detection and Z does a very good job with detecting when scenes stop and start, you can hit all three services, pull the data back and normalize it.”
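Heider’s “pull the data back and normalize it” step can be sketched as mapping each vendor’s output into one shared record type and merging the results into a single timeline. In the sketch below, the three response shapes and the functions `from_x`, `from_y`, and `from_z` are invented stand-ins for his services X, Y, and Z; real vendors each have their own schemas, which you would map in the same way.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Insight:
    source: str     # which service produced it
    kind: str       # "transcript", "face", or "scene"
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    value: str      # transcript text, person name, or scene label


# Hypothetical response shapes; real vendor schemas will differ.
def from_x(resp: dict[str, Any]) -> list[Insight]:
    # Service X: transcript segments with millisecond offsets.
    return [Insight("X", "transcript", s["start_ms"] / 1000,
                    s["end_ms"] / 1000, s["text"]) for s in resp["segments"]]


def from_y(resp: dict[str, Any]) -> list[Insight]:
    # Service Y: detected faces with a start time and a duration in seconds.
    return [Insight("Y", "face", f["t"], f["t"] + f["dur"], f["name"])
            for f in resp["faces"]]


def from_z(resp: dict[str, Any]) -> list[Insight]:
    # Service Z: scene cut points in seconds; each consecutive pair is one scene.
    cuts = resp["cuts"]
    return [Insight("Z", "scene", a, b, f"scene {i + 1}")
            for i, (a, b) in enumerate(zip(cuts, cuts[1:]))]


def merge(*groups: list[Insight]) -> list[Insight]:
    """Combine normalized insights into one timeline ordered by start time."""
    return sorted((i for g in groups for i in g),
                  key=lambda i: (i.start_s, i.source))
```

Keeping a `source` field on every record preserves which vendor each insight came from, which makes it easier to swap a service out later if another one proves stronger on your content.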
Machine learning and AI are not the solution for every problem, but they will be crucial in many use cases that are not easily handled by simple business logic and need a system that can process a lot of variables, learn, and then create a solution or result based on this data. Machine learning and AI certainly need human guidance, but the results can be truly mind-blowing.
[This article appears in the March 2019 issue of Streaming Media Magazine as "The State of Machine Learning and AI."]