The State of Education Video 2024

Is the state of education video in 2024 the quiet before or after the storm? With a pandemic in the rearview mirror, we approach a crossroads where it will be determined whether enterprise-scale video hosting and management services will remain profitable at the prices that schools are willing to carry in the new normal. It’s unlikely that we’ll cross a point of no return this year, but I recommend keeping an eye out for signs that will either allay or amplify concerns about the long-term future of schools having a degree of ownership and control of the video services they rely on.

Meanwhile, convenient captioning workflows in 2024 are now accessible to the masses thanks to the confluence of a hot new company (OpenAI) and an open source project lovingly tended to since 2001.

Business Demands Intensify

Ever since summer 2021, those of us who closely follow the industry that supports streaming media for schools have had detailed insight into representative vendors. That’s when Kaltura successfully completed its IPO and subsequently was required to file documents with the SEC, the most interesting of which are the quarterly 10-Q forms and the annual 10-K report. These documents include financial statements and a sober accounting of a public company’s perspective on its business climate.

Similarly, Zoom is required to publish these same documents, as it completed its IPO in 2019. Between the two companies and their SEC filings, we have a reliable view into both the synchronous and asynchronous video sectors from two of the most successful and well-run vendors serving the education video vertical.

State of Synchronous Education Video

Zoom is a little bit unusual for an emerging tech company in that it has turned a profit every year since 2018, the year prior to its IPO, and lately has had quite phenomenal profits. That synchronous video is a more profitable line of business is consistent with the general rule of cloud economics as detailed in The State of Education Video in last year’s Sourcebook:

In the cloud, variable-use resources like CPU, RAM, and bandwidth tend to be very cost-effective, while fixed-use resources like long-term disk storage are more expensive than what you can get with an on-prem investment. In other words, economies of scale work best when you pay for what you’re using to serve customers and handing those resources off to other public cloud tenants at other times and worst when you’re always paying to store data you’ve accumulated and may or may not be using.

Synchronous video requires paying for CPU and bandwidth whenever the service is being used, and not much of anything goes on the cloud bill when it’s not being used. By default, Zoom deletes meetings recorded to its cloud hosting after 180 days, so storage costs have a built-in mechanism to avoid snowballing. In searching for tea leaves to read in last year’s Sourcebook, we alit upon a decreasing rate of revenue growth period-over-period from 2021 to 2022. That trend continued into 2023, but Zoom revenue growth appears to have stabilized at around 3% through the first three quarters of 2023 compared to the same time periods in 2022.
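To make the storage argument concrete, here is a minimal Python sketch of how stored recordings pile up with and without a retention window. The monthly upload volume and the 6-month window (a rough stand-in for a 180-day default) are illustrative assumptions, not figures taken from any vendor's filings.

```python
def stored_volume(monthly_upload_tb, months, retention_months=None):
    """Return the TB held in storage at the end of each month.

    retention_months=None models a library that is never pruned;
    an integer models automatic deletion after that many months.
    """
    history = []
    for month in range(1, months + 1):
        if retention_months is None:
            kept_months = month
        else:
            kept_months = min(month, retention_months)
        history.append(monthly_upload_tb * kept_months)
    return history


# Hypothetical numbers: 2 TB of new recordings per month for 3 years.
unpruned = stored_volume(2.0, 36)
pruned = stored_volume(2.0, 36, retention_months=6)

print(f"No retention policy after 36 months: {unpruned[-1]:.0f} TB")
print(f"6-month retention after 36 months:   {pruned[-1]:.0f} TB")
```

Under these assumptions, the pruned library stops growing once the retention window fills, while the unpruned one keeps growing for as long as recordings are added.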

Zoom acquired two companies in 2023: Solvvy and Workvivo. Solvvy adds a mature chatbot offering to the Zoom portfolio, and Workvivo provides an employee experience platform (see Figure 1) that delivers streamlined communication and culture-building tools for business subscribers.

Figure 1. Workvivo for Zoom

The closest thing to an academic institution listed on the Workvivo website’s Partners page is the Hoover Institution at Stanford University, so I don’t expect that this acquisition will create immediate value for Zoom’s school customers. However, it may be a step toward closing feature gaps with Microsoft Teams down the road.

There is definite interest in developing custom chatbot applications in higher ed, though. The University of Central Florida (UCF) is a school that I admire for generally staying ahead of the curve with educational technology, and its Knightbot chat service, built in a partnership with engagement platform vendor Mainstay, is a good example of a successful chatbot. Another Mainstay customer, Georgia State University, in partnership with UCF and others, was recently awarded a $7.6 million grant to study whether chatbots can improve student learning outcomes by providing them with essentially a 24/7 AI teaching assistant to which they can ask questions. It will be interesting to see if that research also reveals whether students’ interactions with human faculty and teaching assistants decrease in civility as a result of more interaction with AI assistants in this role.

State of Asynchronous Education Video

Kaltura had its IPO at a somewhat unfortunate time as far as trendline optics go, although it was a good time to raise cash to the tune of $172.5 million. The NASDAQ composite closed at 14,631.95 on the day of the IPO and fell below 12,000 by May 2022 and below 11,000 in June 2022. Kaltura was priced at $10 for the IPO, had a peak closing price of $13.61 on Aug. 6, 2021—incidentally, the day that the last 2.25 million shares were sold at the original price—then fell all the way to $1.78 on March 7, 2022, roughly where the stock price has languished ever since.

That collapse in price made possible an unlikely, unsolicited purchase attempt from one of Kaltura’s top competitors, Panopto, in summer 2022. The purchase was ultimately shot down by Kaltura’s board. Kaltura has spent the past 2 years getting lean on operational costs, shedding 10% of the workforce in 2022 and laying off an additional 11% in 2023.

Layoffs have been widespread across the tech sector for the last several years and continue into 2024. Twitch laid off more than one-third of its employees in January 2024, for a dramatic example.

The effort to trim down has borne fruit in Kaltura’s case: The company’s non-R&D operating expenses fell below gross profits in Q4 2022 and have remained well below since. Period-over-period revenue growth in 2023 was strong, with Kaltura’s subscription income in the category that includes the education vertical increasing by 8.2%, 7%, and 4.8% over the first three quarters compared to 2022, handily beating the trend observed in last year’s Sourcebook.

It’s noteworthy that Kaltura—the biggest provider of educational VOD services that serve half of the R1 universities—has never shown a profit, either quarterly or annually, although again, Zoom is the outlier among emerging tech companies for consistently turning a profit. At some point, though, it would be reassuring to know that the vendors schools rely on for educational video services operate on sustainable business models. Kaltura recognizes this as well, recently recruiting John Doherty from Magic Leap to serve as its new CFO while specifically mentioning profitability as a component to the hire in its announcement.

Last year’s The State of Education Video article included a discussion of what might have happened if the two largest video management system vendors servicing schools had in fact merged and what options schools would have if their post-pandemic ed tech needs shrank and new circumstances warranted a de-escalation in their video management software (VMS) subscriptions. Since video services are tremendously valuable to schools, and school administrators tend to prefer to hire out core services to vendors rather than rely on the loyalty of highly skilled employees to support those critical operations, I’m bullish that the industry will thrive.

If that optimism is misplaced, the University of Toronto’s Opencast project may suggest a new direction. The University of Toronto is a bold and forward-thinking institution with a total enrollment of just under 100,000 students across its three campuses. It successfully built out its Opencast Content Capture System (go2sm.com/occs) to provide campuswide lecture capture, and it remains an excellent solution for schools that are willing to invest in on-prem solutions or to collaborate across institutions to pool resources to that end (see Figure 2).

Figure 2. A schematic of the University of Toronto’s Opencast Content Capture System

Kaltura’s 2024 10-K filing was expected in February. In the section of the filing that discusses risk factors, compliance with privacy regulations is always a major concern. In 2021, China passed the Personal Information Protection Law (PIPL), complicated legislation that includes specific cash ranges that companies can be held liable for if the law is not adhered to. Thus far, PIPL has not been mentioned in Kaltura’s SEC filings (and only obliquely in Zoom’s 2023 10-K), but navigating how this law impacts international educational institutions and the vendors that provide technology services for them is a major question.

I also expect some insightful discussion of new risks posed by modern AI. Generally, Kaltura includes a short paragraph about liability related to hosting content that violates copyright or licenses. It will be interesting to read if deepfake technologies are on Kaltura’s radar, as they present a more costly challenge to assisting institutions with policing take-down requests for offensive, highly personal content.

I’m also curious to see data on how Kaltura’s entrance into synchronous video services has developed, something that has yet to be teased out in any filings thus far. As discussed for Zoom, the economics of cloud resource provisioning for synchronous video are more favorable than those for asynchronous, so the more Kaltura can grow its synchronous service offerings, presumably the better for its bottom line. The company will also need to thread the needle of either more effectively passing on its storage costs to customers without creating dissatisfaction, or, better, providing data-driven tools for assessing what content can be inconsequentially deleted or archived to lower-cost storage by customers to minimize storage costs.

An advisable approach is to adopt the “with great knowledge comes great liability” angle on data retention policy, perhaps in concert with efforts to comply as painlessly as possible with PIPL, the General Data Protection Regulation, and U.S. privacy laws. Another appealing justification for conscientiously managing the accumulation of recorded video data is streaming green: Unnecessary video storage inflates electricity usage and adds to the carbon released into the atmosphere.
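As a rough idea of what a data-driven tool for finding archivable content might look like, the following Python sketch filters a video library by how long each recording has sat idle. The Recording fields, the thresholds, and the sample entries are all hypothetical; a real VMS would expose its own analytics fields.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Recording:
    title: str
    size_gb: float
    last_viewed: date
    views_last_year: int


def archive_candidates(recordings, today, idle_days=365, max_views=0):
    """Return recordings idle for at least idle_days with few recent views."""
    cutoff = today - timedelta(days=idle_days)
    return [
        r for r in recordings
        if r.last_viewed <= cutoff and r.views_last_year <= max_views
    ]


# Hypothetical library entries for illustration only.
library = [
    Recording("BIO 101 lecture 3 (2019)", 1.8, date(2020, 5, 1), 0),
    Recording("Orientation welcome", 0.6, date(2024, 1, 10), 240),
    Recording("CHEM 210 review (2021)", 2.4, date(2022, 12, 2), 0),
]

candidates = archive_candidates(library, today=date(2024, 3, 1))
reclaimable = sum(r.size_gb for r in candidates)
print([r.title for r in candidates], f"{reclaimable:.1f} GB")
```

A report like this only nominates content. Whether to delete it or move it to lower-cost storage remains a policy decision for the institution.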

Accessibility Now Easily Accessible

In last year’s “The State of Education Video,” I threw some cold water on the hype over ChatGPT based on the performance of GPT-3. GPT-4 was released right around the Sourcebook’s publication, and that skepticism was no longer warranted given GPT-4’s superior performance. GPT-4 has been shown to do well on standardized tests, Advanced Placement exams, and trade exams, making it a major factor in how teachers assess student performance.

The best advice I’ve seen for how to AI-proof your tests and assignments, loosely adapted from optics research scientist and AI researcher Janelle Shane (aiweirdness.com), is to give questions that students can answer but that a pre-trained transformer can’t by making the questions very local to the student doing the assignment, either in space or time. The transformer’s training data is many months stale from the public internet, so it wouldn’t be able to answer questions about very recent events and wouldn’t have access to what’s on specific pages of your class textbook or your course website (except as given by a student prompt).

Over the past year, many teachers have leaned into the transformer revolution and have tried to incorporate AI into their instruction. Perhaps the most intriguing use of AI text generation is for seeding inspiration. Here, the assignment would be to have your text generator produce several essays on various topics, choose whichever one you most want to rewrite, and produce an original essay of your own based on the prompt. This strikes me as a generalization of Cunningham’s Law, which can be stated as, “The best way to motivate experts to provide you with a correct answer is to invite their contempt by posting the wrong one on the public internet.” It rings true that for whatever reason, it’s easier and somehow more satisfying to put creative energy into disagreeing with someone than agreeing with them. A compelling writing assignment would be to have students rewrite two AI-generated essays—one that they agreed with and one that they disagreed with—and subjectively rate the experience. As a class, they would then reflect on why this is so (assuming that it does indeed prove to be the class’ experience).

Setting aside my embarrassing underestimation of how quickly large language model (LLM)-driven transformers would pose substantial challenges to assessments more sophisticated than short-answer quizzes, a main point of last year’s article holds up well: Whisper, OpenAI’s open source speech-to-text engine, would be a huge benefit for education in 2023. In 2024, Whisper and Whisper-powered tools are easy to use, even for technology-challenged teachers and students who need to have their videos captioned without spending a huge amount of time on the process.

The quality of automatic captioning offered by vendors has improved dramatically in the past 5 years with the rise of attention-based transformers and LLMs. Whisper being freely available since September 2022 upgraded the state of the art in how educators can produce closed captions for their educational video. Whisper is able to generate astonishingly accurate transcriptions in multiple languages. For example, I supported a research project by generating automatic transcripts of interviews in Ukrainian, Russian, English, and Czech with people fleeing the war in Ukraine and those providing aid to them. This technology dramatically improved the researchers’ procedure (correcting a transcript is a much faster process than writing one from scratch) and did not send the highly sensitive data anywhere untrusted. That Whisper adds on the ability to automatically translate from language to language as part of the speech-to-text process is almost unimaginable, but it works pretty well.

Whisper is not perfect, though, and has two major problems. The first is that it produces segments that are far, far too long; often three or four lines of captions fill the width of the player. The second is that Whisper is prone to hallucinate, as are all transformers, since they’re built to predict words and send them to output even when the input is very sparse or nonexistent from a human language user’s perspective. Typically, a hallucination happens after or during stretches of silence or a non-speech signal like music, producing unrelated text or often just a sequence of periods for the remainder of the run.
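A simple way to triage a transcript for the hallucinations described above is to flag the segments that look suspicious so a human checks them first. The Python sketch below assumes segments shaped like the dictionaries Whisper returns (start, end, text). The sample data is invented, and the two heuristics (punctuation-only text and verbatim repeats) are illustrative assumptions, not a complete detector.

```python
import re


def is_suspect(segment, previous_text=None):
    """Heuristically flag a segment that looks like a hallucination."""
    text = segment["text"].strip()
    # Nothing but punctuation or whitespace, e.g. ". . . ."
    if not re.search(r"\w", text):
        return True
    # The same text repeated verbatim from the previous segment.
    if previous_text is not None and text.lower() == previous_text.lower():
        return True
    return False


def flag_suspects(segments):
    """Return indexes of segments a human should check first."""
    flagged = []
    previous_text = None
    for index, segment in enumerate(segments):
        if is_suspect(segment, previous_text):
            flagged.append(index)
        previous_text = segment["text"].strip()
    return flagged


# Invented sample data shaped like Whisper's segment dictionaries.
segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the ceremony."},
    {"start": 4.2, "end": 34.2, "text": " . . . . . ."},
    {"start": 34.2, "end": 38.0, "text": " Thank you for watching."},
    {"start": 38.0, "end": 42.0, "text": " Thank you for watching."},
]

print(flag_suspects(segments))
```

A check like this will not catch fluent but unrelated text, so it complements a read-through of the captions and does not replace one.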

WhisperX is a project that’s being undertaken to address both of these problems head-on (github.com/m-bain/whisperX). WhisperX (see Figure 3) pre-processes the audio to be transcribed by detecting speech signals and cutting out all non-speech audio intervals so that Whisper doesn’t have an excuse to hallucinate. After generating a transcript of this edited audio, it performs forced alignment against the original audio using Meta’s wav2vec 2.0 models to time code and segment the transcript into a caption file. This is quite a brilliant solution, although it jettisons Whisper’s translation capability, and WhisperX’s segments are also often far too long.

Figure 3. The WhisperX pipeline as diagrammed on the project’s GitHub readme
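The first stage of that pipeline, finding where the speech is, can be illustrated with a toy example. The Python sketch below is not WhisperX's method: it swaps in a plain loudness threshold for a trained voice activity detection model, so it can only separate sound from near-silence and would not tell speech from music. The energy values and the threshold are invented for illustration.

```python
def speech_intervals(frame_energy, frame_seconds, threshold):
    """Return (start, end) times for runs of frames at or above threshold."""
    intervals = []
    run_start = None
    for index, energy in enumerate(frame_energy):
        if energy >= threshold and run_start is None:
            run_start = index
        elif energy < threshold and run_start is not None:
            intervals.append((run_start * frame_seconds, index * frame_seconds))
            run_start = None
    if run_start is not None:
        # The recording ended while a loud run was still open.
        intervals.append((run_start * frame_seconds, len(frame_energy) * frame_seconds))
    return intervals


# Invented per-second loudness values: talking, a quiet stretch, talking.
energy = [0.6, 0.7, 0.5, 0.02, 0.01, 0.03, 0.02, 0.8, 0.6]
intervals = speech_intervals(energy, frame_seconds=1.0, threshold=0.1)
kept = sum(end - start for start, end in intervals)
print(intervals, f"{kept:.0f} of {len(energy)} seconds kept")
```

Only the kept intervals would be passed on for transcription, which is the step that denies the model long stretches of empty input.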

However, hallucination is generally not a problem in instructional videos, where there are almost never extended periods of silence or non-speech sound. In fact, I had used Whisper for several months without ever seeing the phenomenon myself until we started throwing commencement ceremony recordings at it that included lengthy processionals. Thus, for a teacher, the only concern with using Whisper is getting it installed and being able to re-segment and easily correct the captions it produces.

To address Whisper’s challenges, Subtitle Edit is an excellent and free tool. Although I started using it only recently, it has been in development since 2001. The source code was version-controlled on GitHub for just over a decade and was at that time primarily a souped-up version of SubRip, the DVD subtitle picture OCR program that invented the SRT filetype. Development on Subtitle Edit (see Figure 4), though, focused on ergonomics instead of OCR, deferring the job of recognizing the text in DVD subtitles to the Tesseract OCR engine, originally written at HP and later adopted as an open source project by Google. Subtitle Edit was a fascinating program all along; by 2011, it had advanced features like real-time text-based chat so that multiple editors could collaborate on a DVD localization project and a fast Fourier transform (FFT) calculator to show a real-time spectrogram to assist experts with identifying ambiguous speech. As of 2014, it could export to 201 different caption formats. With the 3.6.8 release on Oct. 24, 2022, Subtitle Edit began experimenting with using Whisper to auto-generate captions for any video to be presented in its 2-decades-in-the-making caption correction user interface; this occurred about 1 month after Whisper was open sourced. The program makes downloading and installing Whisper and its pre-trained models a breeze. The default option for the Whisper version is a standalone executable wrapper of Faster-Whisper, the same variant of the engine used by WhisperX. Another easy option, CPP, a C++ port of Whisper by the brilliant and extraordinarily productive Georgi Gerganov, has some very useful extra features like live captioning from a microphone and more compact models.

Figure 4. Subtitle Edit about to download the Medium.en pretrained Whisper model

If you need to caption video that would be prone to hallucination, WhisperX is an option, but it would require a nonstandard installation procedure bypassing the Conda virtual environment steps. The original Whisper engine significantly benefits from inference on a GPU with at least 12GB of VRAM when using a large model, but both Faster-Whisper and Whisper CPP perform well on any modern computer.

Subtitle Edit will re-segment the transcript into timed text using default settings (see Figure 5) that are close enough to the Netflix text style guide, which has become the industry standard after the National Association of the Deaf persuaded the company to become an effective ally of accessibility in the streaming entertainment industry.

Figure 5. The Subtitle Edit settings menu
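For readers curious about what re-segmentation involves, here is a minimal Python sketch that splits one overlong segment into SRT cues. It is not Subtitle Edit's algorithm. The limits of 42 characters per line and two lines per cue are assumptions based on commonly cited caption style guidance, and dividing time in proportion to character count is a crude approximation that still needs an editor's review.

```python
import textwrap


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def resegment(start, end, text, max_chars=42, max_lines=2):
    """Split one long segment into cues of at most max_lines short lines.

    Time is divided among cues in proportion to their character counts.
    """
    lines = textwrap.wrap(text.strip(), width=max_chars)
    groups = [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
    total_chars = sum(len(line) for line in lines)
    cues = []
    cursor = start
    for group in groups:
        share = sum(len(line) for line in group) / total_chars
        cue_end = cursor + (end - start) * share
        cues.append((cursor, cue_end, "\n".join(group)))
        cursor = cue_end
    return cues


# A deliberately long segment, with invented start and end times.
long_text = ("In the cloud, variable-use resources like CPU, RAM, and bandwidth "
             "tend to be very cost-effective, while fixed-use resources like "
             "long-term disk storage are more expensive.")
cues = resegment(12.0, 24.0, long_text)
for number, (cue_start, cue_end, body) in enumerate(cues, 1):
    print(number)
    print(f"{srt_timestamp(cue_start)} --> {srt_timestamp(cue_end)}")
    print(body)
    print()
```

The printed result follows the SRT layout of a cue number, a time range, the caption text, and a blank line, so it can be saved with an .srt extension and opened in a caption editor for correction.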

With more than a year of development since Whisper was incorporated into the Subtitle Edit project, it’s an easy-to-use way to get started with this extremely advanced speech-to-text engine and one that I wholeheartedly recommend to teachers and students.
