The State of Generative AI 2025
Can generative AI (gen AI) rip things apart and then glue them back together with more insight, more intelligence, more efficiency, and more value? It depends on the application.
Start with the building blocks: large language models (LLMs) trained on huge amounts of textual data. The idea is that LLMs learn from this data and then generate ideas, content, and what some people regard as intelligence. That training data sparked the first major AI fight, in which individuals and companies objected to these models training on their intellectual property.
One school of thought holds that gen AI is just the progression of technology, which, by its nature, is continually disrupting itself. Another school says that gen AI is radically changing how things are done, because it lets us automate multi-step, manual activities, surface insights we couldn’t manage at scale, and pursue an end goal that materially moves business forward.
Autocorrect on Steroids
Within the video workflow, some tasks are good matches for using gen AI, and others aren’t. After being deluged by “AI-powered” oddities at CES 2025, such as Horse AI, Barbeque AI, and Laundry AI, it’s actually refreshing to hear an honest assessment of where things will and won’t work for those of us in the streaming world.
“Everyone is very curious about how we could apply this very exciting, interesting, new technology, and we definitely hear that from customers too,” says JWP Connatix CTO David LaPalomento. “It’s getting to the point where maybe it would be an interesting tool for the production process for parts of it, but that’s not where JWP Connatix sits in the supply chain.”
LaPalomento says that where AI excels at his company’s point in the supply chain is in taking “text, video, or audio” and “building rich media new content.” But he regards that function as more machine learning-driven than leveraging gen AI. “It’s more classification,” he notes. “Today, I don’t think we have anything that I would describe as generative AI in production offered to customers. What we have done is experimented in a bunch of different ways.”
LaPalomento says that one valued service gen AI is uniquely equipped to provide “is a really awesome autocorrect. It will make suggestions that are better than what you previously did.” This includes organization and classification that would ordinarily be handled by editors. Even if it doesn’t eliminate the need for human intervention, he states, “At least they will have a really great starting point that’s informed by some smart algorithms behind the scenes. I definitely see a lot of area for improvement there.”
Gen AI may well disrupt streaming workflows in various ways, but as a number of people interviewed here have said, it is still super early, and it hasn’t happened yet—no matter how much some companies with a stake in selling “AI-powered solutions” may want you to think otherwise.
What gen AI will deliver is “optimization, whether it’s monetization or video strategy,” says LaPalomento. “That’s a really weird intersection between a whole bunch of coding and some business problems that you’re trying to maximize.”
My intent in the interviews I did for this article was to focus on distribution-related use cases, but I found the conversations often turning to production-centric use cases. It was a bit like asking an LLM a question about A and having it return data about B. But the possibility of getting unexpected answers is one of the things that (still) makes interviewing real humans worthwhile.
Multimodal Model Training
To begin, let’s touch on training data first. TwelveLabs is an AI company that builds video foundation models. “Video is a very different data format from language or textual data,” says Maninder Saini, TwelveLabs’ head of growth. “Within video, you have audio, visual, and spatial temporal relationships—the things happening on screen and their relative relation to each other over time.” Those characteristics, Saini notes, are what make video data unique.
“Traditionally, with computer vision models, you splice video into different frames and do keyframe analysis, and in parallel, you might run transcription to try to understand the audio and then glue it together and—voilà—you have video understanding. A lot of the LLMs take this approach as well. They try to shoehorn video into a data format that’s not truly video understanding because you lose a lot of important context and nuance of the movement that happens in video,” says Saini.
TwelveLabs has been working on this for 4 years. Google also offers multimodal AI, using data sources from images, video, and audio on top of text. Google’s technology was introduced in December 2023 with models like Gemini.

Finding a scene in natural language with TwelveLabs’ multimodal AI
Describing a common use case, Saini says, “Editors might have massive bins where they’re trying to find the relevant moments from their dailies to splice together the specific scene they’re trying to make. Traditionally, you would have a human in the loop tagging manually, creating that metadata.”
Multimodal AI models eliminate the need for human intervention through scene-level understanding, so users can do a semantic search in natural language. The models, Saini explains, “can search for emotionally evocative moments. One of our clients, Maple Leaf Sports & Entertainment, told us it takes them about a couple of days to ideate, send the request, get the raw footage back, and find the specific moments. With the natural language query for their entire archive, they were able to pull together from idea to first cut in a couple of minutes. Social already has about 80,000 views on it.”
The cost is roughly $2 per hour of video, Saini says. “If you have 1,000 hours of footage, you’ll want to index on our models out of the box; it’s going to run you about $2,000 to do that.”
TwelveLabs doesn’t work with live content now, but for on-demand video, its search engine can process content with resolutions from 360p to 4K using FFmpeg.
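To make the arithmetic and the preprocessing step concrete, here is a minimal Python sketch. The per-hour rate comes from Saini’s figure above; the FFmpeg settings and helper names are illustrative assumptions, not TwelveLabs’ actual requirements.

import subprocess

INDEXING_COST_PER_HOUR = 2.00  # rough per-hour indexing figure cited by Saini

def estimate_indexing_cost(archive_hours: float) -> float:
    """Back-of-the-envelope indexing cost: 1,000 hours is roughly $2,000."""
    return archive_hours * INDEXING_COST_PER_HOUR

def normalize_for_indexing(src: str, dst: str, height: int = 720) -> None:
    """Transcode a mezzanine file to a uniform resolution before indexing.
    These FFmpeg settings are illustrative, not a vendor requirement."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-vf", f"scale=-2:{height}",      # preserve aspect ratio, keep width even
         "-c:v", "libx264", "-crf", "23",
         "-c:a", "aac", "-b:a", "128k",
         dst],
        check=True,
    )

if __name__ == "__main__":
    print(f"1,000 hours of footage is about ${estimate_indexing_cost(1000):,.0f} to index")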
When shopping for gen AI models for these types of applications, Saini advises, the questions should be, “What is the accuracy level, and how much does it cost to use those models for video? For the vast majority of larger studios—let alone indie content creators—it can be cost-prohibitive to use some of those models to power their workflows.”
As for training the models, Saini says, “We licensed our data from rights owners that have the license or have created the content themselves. We also do what we call ‘exchange of economics.’ That [refers to] rights owners that have unique proprietary video datasets who give us their video data in exchange for access to our models at favorable commercials. And then lastly, we also have started experimenting with synthetic data.” Their models are accessible via API to work with editorial, asset management, and distribution tools.
Legal Training
Flikforge is a “Gen AI monetization” company that tracks data provenance to identify what data is being used to train models and ensure fair royalties are collected by creators for their copyrighted works and licenses.
“It’s a legal problem when you can generate anything you want” from your models, says Jeff Allen, Flikforge’s CEO and head of AI platform. “All of the big tech LLMs will say, ‘We have incorporated content in our training algorithms, and we’re claiming fair use.’ They may or may not win those arguments,” but once a digital work is in your model, “you can generate anything you want. The problem is that an advertiser who generates a commercial or a filmmaker who generates a film using these technologies probably would not want to publish it. They don’t know where that training data has come from.”
That’s where Flikforge comes in. “We’re sourcing and tracking the licensing aspects of it, then we’re actually enabling transactions on our platform as well as doing the indexing. The indexing piece is actually where the value is, because we’re essentially preparing the files for the data science teams,” notes Allen.
There have been a number of high-profile examples of re-creating celebrities’ voices without their permission, as when OpenAI allegedly appropriated actor Scarlett Johansson’s voice for ChatGPT after she repeatedly refused the company’s requests to license it. Flikforge is also trying to make inroads in this area, working with stars to train and license their voice likenesses legally.
Advertising Gains
Tool is an advertising production company whose tagline is “Creating content and experiences at the speed of culture.” In 2024, it began incorporating gen AI content alongside live-action video into the ads it creates for its brand partners.
“Three months ago, we created a hybrid, experimental commercial for Land Rover,” recalls Dustin Callif, Tool’s president. “We decided to use live-action production to shoot the human performance and the actual car itself. But then we used generative AI to create all of the environmental scenes and one shot of the car. In October last year, we could not get human performance. We took photographs and were able to add motion and other elements. Fast-forward to January, and we received access to Google’s Veo 2, their generative AI video platform, and it has blown everyone away with the ability to get facial reactions and things that don’t look so unrealistic.”

Tool’s “Who Says Cars Can’t Dream” experimental ad for Land Rover. (See the full ad.)
From a technical perspective, issues with consistency persist. “The biggest thing that’s holding it back from being [usable] for entertainment and advertising—aside from the legal aspect—is you can’t do dialogue,” Callif concedes. “Everybody has started to understand that AI is going to be a central part of how we make and what we do. CFOs and CMOs are now challenging their teams to say, ‘We have to find new ways to create marketing content because 1) they need way more volume, and 2) the costs are getting too high to do it the way that we used to do it.’”
Tool trains its models on a specific brand’s look using open source models Stable Diffusion and Flux, Callif says. “We get anywhere from 50 to 100 images on a specific look and will train on that.” The company does this on a private network so that none of the data goes out to the public. “We’re still on the cusp because we need legal teams at brands to be OK with some of the risks with AI,” he notes.
But the day when this part-live action, part-gen AI approach becomes more than experimental isn’t far away, according to Callif. “We’re starting to hear a lot of brands say, ‘We’re now ready to do a pilot program.’”
Subbing and Dubbing
“I’m willing to confess, I’m confused. It’s really hard to burst all the PR around it,” says Joshua Stinehour, principal analyst at Devoncroft Partners, of the current state of gen AI hype in the streaming world. “Google put out what I actually thought was a pretty provocative thought piece on gen AI activities. They’re quoting in there, like 20, 30% productivity gains.”
Stinehour remains skeptical that productivity gains like this are happening in real-world scenarios today. “If that were occurring in media workflows, I feel like we would see it in income statements for big media companies,” he notes. “Or we would be hearing, ‘We’ve shrunk the shoot window for these films by a third.’ I have seen no evidence of things like that. You will see certain use cases, around captioning or translation, where you previously had these hideous workflows that have now been made into something coherent, and what previously took a month now takes a couple of days.”
Which use cases are genuinely active can be a matter of perception. If a vendor has a single customer willing to speak about a deployment, how representative are the findings? With any new technology, this issue comes with the territory early on.
“What we’ve seen this past year is that these media companies are doing this with measurable ROI,” says Albert Lai, global strategic industries director for media and entertainment at Google. He sees that work falling into three tiers: “content production, the understanding of content (personalization, which is more about engaging consumers in those consumer-facing experiences), and internal productivity.”
Localization is a popular use case. “It’s about creating captions, subtitles, and dubbing in order to increase accessibility of content. The best use case of this is Warner Bros. Discovery. They published a case study where they explained how they used AI with Google Cloud in order to augment their teams that were creating captions for their Max streaming product,” says Lai.
The case study identified a 50% reduction in overall costs and an 80% reduction in time, cutting a 4-hour captioning task for a half-hour show to under an hour. “[The file is] already timecoded, formatted, and ready to edit. And all we have to do is verify that all the words are correct and that the file is in sync. Before, it took about 4 hours for a show like this. And now, since we’re literally just going through and confirming that everything is correct, it’s reduced it down to sometimes under an hour for a half-hour show,” explains a Warner Bros. Discovery employee in the video.

Warner Bros. Discovery demonstrating AI-assisted captions with Google Cloud
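The case study doesn’t publish its pipeline, but the artifact it describes, a timecoded and formatted file that reviewers only confirm, would look roughly like the WebVTT output sketched below. The Segment structure and sample cues are hypothetical stand-ins for what a speech-to-text service returns.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the top of the program
    end: float
    text: str

def to_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def write_vtt(segments: list[Segment], path: str) -> None:
    """Emit a timecoded caption file an editor only has to verify, not author."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n{to_timestamp(seg.start)} --> {to_timestamp(seg.end)}\n{seg.text}\n\n")

write_vtt(
    [Segment(0.0, 2.4, "Previously on the show..."),
     Segment(2.4, 5.1, "We left our heroes at the edge of the canyon.")],
    "episode_101.en.vtt",
)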
“Our customers make deals to sell their content to streaming platforms or other broadcasters and so on for syndication deals and need to deliver those to those endpoints,” says Chris Braehler, VP of product at SDVI. “Both have localization and other metadata challenges. When it comes to subtitles, that obviously is a big gain there. While text-to-speech was around before, now having an LLM that is able to understand and make sense [of the content] is a great asset. And then adding translation on top of that and being able to speed that up makes that a much more affordable process. It’s the number-one use case.”
Braehler says that one ongoing challenge is helping customers to “properly understand what the tool is doing with the media that they feed it and understand what they can do with the media that’s being created. When it comes to a subbing-and-dubbing workflow, the impact there is very little,” he contends, because they are working with public information.
Still, there’s the possibility of infringing on copyright and the question, “If I throw my library at a gen AI model, does that tool have free rein to create derivative material?” This remains a common and relevant question. “Our customers are taking a bit more of a cautious approach toward fully jumping into utilizing gen AI in all aspects of their supply chain from a legal security standpoint,” Braehler says.
Another company that has relied on gen AI to support its international growth is Xperi/TiVo, which has eight OEMs and 33 brands in 26 countries and supports 42 different languages. Without relying on gen AI for translation and localization, according to Chris Ambrozic, Xperi’s VP and general manager for discovery, “We would not be able to serve all those languages.” Machine translation engines like Google Translate generally do a subpar job with video content, he explains, and to bring translations up to a reasonable standard, “You have to actually engage with a user to say, ‘How would you say this? And how would you achieve that in your language?’ Historically, that would’ve meant working with human beings.”
Moreover, translating content satisfactorily and cost-effectively is only one step toward the more ambitious goal of creating global brands. “How do we get recommendations, search, and voice interaction scaling up across multiple languages? We’ve been using gen AI for that,” Ambrozic states. “How do you describe The Hunt for Red October? Well, historically, metadata would have described The Hunt for Red October as being action, suspense, Navy. But you would not have had the emotional language associated with somebody who’s betraying their country in order to make what they think is the right decision. Now I have that for millions of different TV shows and assets across all these different regions,” and only gen AI makes that possible.

Accessing localized content through TiVo ONO, TiVo’s advanced TV service in Spain
Understanding the Customer Journey
Bitmovin has three products that use gen AI: AI Session Interpreter, AI Error Interpreter, and AI Contextual Advertising. AI Session Interpreter is the only one that’s live in the market at this writing. Bitmovin continues to work with some partners on developing AI Error Interpreter and AI Contextual Advertising.
AI Session Interpreter is a Retrieval-Augmented Generation (RAG) model designed to work with Bitmovin Analytics. It takes all session-level data—every event, every buffer, every seek—and distills it into “clear and concise summaries” for Bitmovin clients’ support teams, giving them insight into user behavior, video performance, and any technical issues that arose during the session so they can identify patterns and find areas to improve.
“We have a model which we’ve tuned ourselves to present back in plain text as a summarization of the session,” says Jacob Arends, Bitmovin’s senior product manager for playback and AI. “How long did they watch for? How many ads did they see? How much time did they spend buffering?” RAG is a cost-effective way to reference specialized information and extend LLMs to specific domain knowledge without the need to retrain the model. “[AI Session Interpreter has] been live for over a year now and won Best of Show at NAB last year,” Arends adds.

Bitmovin AI Session Interpreter
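Bitmovin hasn’t published the internals of AI Session Interpreter, but the RAG pattern Arends describes can be sketched generically: retrieve the session’s events, fold them into a prompt as grounding context, and ask a model for a plain-text summary. The event schema, prompt wording, and stubbed model call below are all assumptions.

from typing import Callable

def build_session_context(events: list[dict]) -> str:
    """Condense raw player events (play, seek, buffer, ad, error) into a
    compact chronological block the model can ground its summary in."""
    return "\n".join(
        f"{e['t']:>7.1f}s  {e['type']:<8}  {e.get('detail', '')}" for e in events
    )

def summarize_session(events: list[dict], llm: Callable[[str], str]) -> str:
    """Retrieval-augmented summarization: the retrieved session data is the only
    context, and the model is asked to restate it, not invent anything."""
    prompt = (
        "Summarize this playback session in plain language: how long the viewer "
        "watched, how many ads they saw, time spent buffering, and any errors. "
        "Use only the events listed.\n\n" + build_session_context(events)
    )
    return llm(prompt)

# Usage with a stubbed model call; a real deployment would swap in an LLM client.
fake_llm = lambda prompt: "Viewer watched ~25 min, saw 1 mid-roll ad, buffered 3 s, no errors."
events = [
    {"t": 0.0, "type": "play", "detail": "startup at 1080p"},
    {"t": 312.4, "type": "buffer", "detail": "3.1 s stall"},
    {"t": 600.0, "type": "ad", "detail": "mid-roll, 30 s"},
]
print(summarize_session(events, fake_llm))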
Better Search
Fox Sports has a vast library of content it has shot and produced over decades. With the help of Google Cloud’s gen AI-powered Vertex AI Vision, says Google’s Lai, “they can take that content, run it through these models, and then extract the metadata, which could be textual metadata or could be embeddings.”
The result is a search capability that lets internal users find what they are looking for within millions of videos, using natural-language queries to surface precise moments. “When we think about that understanding of content, it is both the extraction of metadata for use and monetization and personalization, but then also making that content searchable and accessible to those internal users,” says Lai.
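The mechanics of that kind of search are straightforward once embeddings exist: compare a query embedding against per-clip embeddings and return the nearest matches. The sketch below is a generic illustration with toy vectors, not Fox Sports’ or Vertex AI Vision’s implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_archive(query_vec, clip_index, top_k=5):
    """clip_index holds (clip_id, embedding) pairs produced at ingest time;
    return the clips whose embeddings sit closest to the query embedding."""
    scored = [(clip_id, cosine_similarity(query_vec, vec)) for clip_id, vec in clip_index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy example with 4-dimensional stand-in vectors; a real deployment would use a
# multimodal embedding model and an approximate-nearest-neighbor index.
index = [
    ("walk_off_home_run_2019", np.array([0.9, 0.1, 0.0, 0.2])),
    ("postgame_interview",     np.array([0.1, 0.8, 0.3, 0.0])),
]
query = np.array([0.85, 0.15, 0.05, 0.1])  # embedding of "dramatic game-winning hit"
print(search_archive(query, index, top_k=1))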
Google Cloud and Fox Sports partnered in 2020 to build Fox Sports’ Intelligent Asset Service (IAS) media asset management platform, which by 2022 had been implemented across all Fox Sports broadcasts. New developments in Vertex AI Vision continue to expand what IAS can do, as well as the degree of protection afforded to Fox Sports video content. “When they use and provide data with Vertex AI,” Lai says, “it is not shared with the broader Google ecosystem, it’s not shared with Google Cloud, it’s not used to train other models, their IP, their data, their consumer data. It is private and it is protected.”

Fox Sports’ AI-powered Intelligent Asset Service media asset management platform, aka “Fox YouTube”
Metadata
Underlying content metadata is an essential element of distribution for companies powering video workflows, whether at small or large scale. Metadata is also a logical starting point for AI use cases, because working with metadata can be far more affordable than working with the video files themselves.
“Metadata is core to being able to inform your content growth strategies,” says Zeenal Thakare, Ateliere’s SVP of enterprise solutions architecture. “It’s not just a way for you to search, curate, and index. It really is a way you inform yourself not only in terms of what you distribute, but what you acquire and what you produce. It’s really the fuel that helps you build that content growth strategy. The supply chain no longer needs to be just automated or efficient; it also has to be intelligent to be able to predict where you should be able to send content and what type of content you can repurpose from your inventory.”
Enterprise and smaller media companies often don’t have the same metadata within their content that larger media companies do. Brightcove’s new AI Content Suite is designed to make it easier to generate metadata by automating its creation. “We don’t give you just one title because we all know that generative AI hallucinates,” says Scott Levine, Brightcove’s chief product officer. The AI Content Suite provides a group of titles, tags, and thumbnails, plus short and long descriptions, all of which you can edit by hand. “We have customers like Gaia, Major League Fishing, and STV, who are using the products and [testing them] and giving us feedback. When you get into things like content creators, enterprise, or people who are putting up marketing videos, it becomes a lot more challenging. Another customer, Home Depot, is uploading thousands of videos a month about hammers, nails, saws, and things like that. Adding that level of metadata makes them more powerful and that team more engaged.”
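A generic sketch of that “never just one title” pattern might look like the following; the prompt, field names, and limits are assumptions rather than Brightcove’s actual API, and an editor reviews every candidate before anything is published.

import json
from typing import Callable

PROMPT = """Given this video transcript, return JSON with:
  "titles": 5 candidate titles,
  "tags": up to 10 tags,
  "short_description": at most 160 characters,
  "long_description": 2-3 sentences.

Transcript:
{transcript}
"""

def draft_metadata(transcript: str, llm: Callable[[str], str]) -> dict:
    """Generate several candidates per field and validate them, so an editor
    picks or rewrites rather than trusting a single generation."""
    draft = json.loads(llm(PROMPT.format(transcript=transcript)))
    assert len(draft["titles"]) > 1, "surface multiple options for human review"
    assert len(draft["short_description"]) <= 160, "respect the field limit"
    return draft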
A new offering in Brightcove’s AI Content Suite is AI-Content Multiplier, which converts long-form content into short-form clips, highlight reels, and theme-based chapters and provides conversion from horizontal to vertical.

Horizontal-to-vertical video conversion in Brightcove’s AI-Content Multiplier
Real-World Gen AI in 2025
“There’s a really big, interesting debate in the development community right now about how useful generative AI is,” says JWP Connatix’s LaPalomento. “These tools are just too interesting not to have people experimenting and trying them and seeing if they help them be more productive. But under no circumstances do we want an automated workflow where AI is creating code and it just gets slapped into production,” he cautions. “There’s a human in the loop every time to review and verify, and then once you’ve checked it out and said, ‘Oh yeah, this is what I want,’ then go ahead. And the human is the one who says, ‘We’re going to do this patch, and I take responsibility for this.’”
“The point is to not be gimmicky, and the point is to actually try to find a problem,” says Bitmovin’s Arends. “At the end of the day, it’s all still very new. People are still working out what they want.”
Arends points to the 7th Annual Bitmovin Video Developer Report, where AI ranks second as the “biggest topic in terms of innovation. It also tops the chart for the biggest technical challenge for teams,” he says. “The use cases weren’t very broad; transcription or metadata capturing, for example. Everything else was equal across the board at the lower end.”
When Devoncroft Partners looked at gen AI from a business perspective in its 2024 Big Broadcast Survey, it also garnered second place. “If you lump it all together in our trend index”—answering the broad question of how gen AI, AI, and machine learning will impact your business—“it was number two in 2024,” says Devoncroft Partners’ Stinehour. “In terms of projects budgeted, it was also number two.”
Future State
My interviewees talked about many gen AI use cases and features that are just emerging or not yet in the market, starting with metadata revisions. If you deliver something to a platform that requires the plot summary to be 180 characters rather than 250, you no longer need someone to rewrite it; previously, if you’d left it to a machine, it might have just cut it off mid-sentence. “Now you can just take any of the GPTs and say, ‘Rewrite that for me into a 180-character version, a 120-character version, or a 350-character version, depending on the target format or what the target endpoint needs,’” says SDVI’s Braehler. SDVI reports 90%-plus accuracy for these summaries in the company’s tests.
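In practice, that rewrite step needs a guardrail so the model’s output actually fits the target length. Here is a minimal sketch of how such a loop might work; the prompt wording and retry policy are assumptions, not SDVI’s implementation.

def fit_synopsis(synopsis: str, max_chars: int, llm, max_tries: int = 3) -> str:
    """Ask a model to rewrite a plot summary under a platform's character limit,
    retrying with a tighter instruction rather than truncating mid-sentence."""
    prompt = (f"Rewrite this plot summary in at most {max_chars} characters. "
              f"Keep complete sentences and the main plot points.\n\n{synopsis}")
    for _ in range(max_tries):
        candidate = llm(prompt).strip()
        if len(candidate) <= max_chars:
            return candidate
        prompt = (f"That was {len(candidate)} characters. Rewrite it again in at "
                  f"most {max_chars - 10} characters.\n\n{candidate}")
    # If the model never fits the budget, flag for an editor instead of cutting.
    raise ValueError("Could not meet the character budget; route to a human editor.")

# Usage: fit_synopsis(long_summary, 180, llm) for a platform that caps plots at 180 characters.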
AI-powered contextual advertising is another work in progress. “During the encoding process, we are basically analyzing the content to create a manifest of contextual information about all of the scenes within the content,” says Bitmovin’s Arends. “And whether that’s sentiment analysis, object detection, linking to an IAB taxonomy, standardized mapping for selling ad space, or ad topics, you would then pass that information to the player, which could infuse that data into the ad request to inform a better ad decision.”
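A simplified sketch of the hand-off Arends describes: a scene-level manifest produced at encode time, and a player-side helper that folds the current scene’s context into the ad request. The manifest fields, taxonomy codes, and parameter names are illustrative, not Bitmovin’s schema.

from urllib.parse import urlencode

# Scene-level context produced during encoding; field names are illustrative.
scene_manifest = [
    {"start": 0,  "end": 95,  "iab": "IAB17", "sentiment": "tense",     "objects": ["car", "rain"]},
    {"start": 95, "end": 210, "iab": "IAB1",  "sentiment": "uplifting", "objects": ["stadium"]},
]

def contextual_ad_params(position_s: float, base_params: dict) -> str:
    """Find the scene the playhead is in and fold its context into the ad request,
    so the ad server can target against the content as well as the viewer."""
    scene = next(s for s in scene_manifest if s["start"] <= position_s < s["end"])
    params = {**base_params, "iab_cat": scene["iab"], "sentiment": scene["sentiment"]}
    return urlencode(params)

print(contextual_ad_params(120, {"placement": "midroll_1", "dur": 30}))
# e.g. placement=midroll_1&dur=30&iab_cat=IAB1&sentiment=uplifting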
JWP Connatix’s LaPalomento highlights trending content identification as another key application area. “One of our data scientists has been taking the vector embeddings that are used in large language models and applying them to the problem of making better recommendations and categorizing videos,” LaPalomento says. “The LLM system will go through and look at the video and the contextual info—captions, metadata—and categorize them all. Then, we can aggregate your analytics based on those categories. You can find out at a more human level what actually is trending among your users. So that’s kind of cool.” And maybe that’s something gen AI can do.
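Stripped to its essentials, that workflow is embedding-based classification followed by an analytics rollup, as in the sketch below; the centroid approach and field names are assumptions, not JWP Connatix’s system.

from collections import defaultdict
import numpy as np

def assign_category(video_vec: np.ndarray, centroids: dict) -> str:
    """Pick the category whose centroid embedding is closest (vectors assumed
    L2-normalized, so the dot product acts as cosine similarity)."""
    return max(centroids, key=lambda name: float(np.dot(video_vec, centroids[name])))

def trending_by_category(videos: list[dict], centroids: dict) -> dict:
    """Roll per-video analytics up into human-readable categories, most-viewed first."""
    views = defaultdict(int)
    for v in videos:
        views[assign_category(v["embedding"], centroids)] += v["views_24h"]
    return dict(sorted(views.items(), key=lambda kv: kv[1], reverse=True))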
Arends also points to automatic ad placement. “If we already have an understanding of where all of the scenes are, we have an understanding of where ideal ad placement should be. So automatic SCTE marker insertion, these kind of things” are possible with gen AI, he says. “You can start looking at internal ads, and let’s say you’ve got a couple of different cuts from different trailers of other content. You could provide different trailers depending on the sentiment of the main content someone’s watching, if you’ve got a sad cut or an emotional cut versus a bit more of a scary one.”
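One simple way to turn scene boundaries into candidate cue points is to snap a target ad cadence to the nearest boundaries, as in this sketch; the interval and spacing rules are placeholders, and generating actual SCTE-35 messages would be a separate step.

def pick_ad_breaks(scene_boundaries: list[float], target_interval_s: float = 480.0,
                   min_gap_s: float = 240.0) -> list[float]:
    """Choose cue points (candidates for SCTE-35 splice markers) that fall on scene
    boundaries, roughly every target_interval_s and never closer than min_gap_s."""
    breaks, last = [], 0.0
    next_target = target_interval_s
    for b in sorted(scene_boundaries):
        if b >= next_target and (b - last) >= min_gap_s:
            breaks.append(b)
            last = b
            next_target = b + target_interval_s
    return breaks

print(pick_ad_breaks([52.0, 301.5, 498.2, 640.0, 1010.7, 1290.3]))
# -> [498.2, 1010.7]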
Gen AI can also act as an advanced error interpreter, according to Arends. “We have a lot of stack traces and network traces when it comes to errors in our analytics product,” he says. “And if you’re seeing common errors come up over a time period, the error interpreter will summarize all of those sessions and find commonalities between them.”
Social posting is another application that’s coming soon to Zype users, according to Chris Bassolino, Backlight Streaming’s SVP of sales engineering, who explains how it works. “We kick off a webhook notification that says, ‘Hey, there’s been a new piece of content that’s been uploaded to the platform.’ Then it goes into an automated workflow that is built outside of the platform, but ties in. It can use other products or product partners to be able to take that long-form piece of content, create multiple social clips from it, and leverage the transcription and all of the metadata to then write all of the content for those different social posts and actually go into the different social platforms and create a post using those new clips and that content to save as a draft so that the marketing team can review it.”
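The trigger Bassolino describes is an ordinary webhook. A minimal receiver might look like the sketch below, written with Flask; the endpoint path, payload fields, and enqueue helper are assumptions rather than Zype’s actual schema.

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/content-uploaded", methods=["POST"])
def content_uploaded():
    """Receive the 'new content uploaded' notification and hand the work to an
    asynchronous clip-and-post workflow outside the platform."""
    payload = request.get_json(force=True)
    video_id = payload.get("video_id")  # hypothetical field name
    enqueue_social_workflow(video_id)
    return {"status": "accepted", "video_id": video_id}, 202

def enqueue_social_workflow(video_id: str) -> None:
    # Placeholder: a real implementation would call clipping, transcription, and
    # copywriting services, then save drafts to each social platform for review.
    print(f"queued social-clip workflow for {video_id}")

if __name__ == "__main__":
    app.run(port=8080)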
Distinguishing Gen AI Present From Gen AI Future
In many of my conversations, it became very hard to tell what a company was supporting from what it was developing itself, and equally hard to distinguish what was currently available from what was still in the works. Many people I spoke with would pivot from something active to something on the road map so quickly that I still can’t figure out what is actually in the market. The localization and search use cases are compelling, but will they radically change the industry?
When it comes to AI saving media companies money, which is a critical goal for just about everyone, Devoncroft Partners’ Stinehour says, “We can see all their costs, and they’ve been cutting costs for a decade. What’s left? There’s just not a lot of fat to cut in these media companies anymore,” he contends. “For the public companies, we really can see their expenses. Show me where all this fat is now.”
When it comes to new technologies magically bringing about dramatic and immediate cost reductions, Stinehour insists, we’ve been down this road before. “People think this is the new technology that’s going to drive all these efficiencies. Well, what about all these other technologies we had that are supposed to drive all these efficiencies? Where do these efficiencies go?” he asks. “Media management has been getting cheaper for 20, 30 years—unit of storage, unit of delivery, unit of compute—and yet the ultimate process of creating content has gotten more expensive. So, to me, it is more interesting to try to think through how [a new technology like gen AI] could bring some fundamental efficiencies to the creation process.”
Indeed, there’s a good argument to be made that distribution has been optimized, and the question of where gen AI is likely to bring the greatest change and disruption is more of a conversation about production. Maybe the interviewees who talked about production when I asked about distribution were right all along.