
Cloud Control: How to Meet the Challenges of Cloud Migration and Operation


COVID accelerated the push to IP-based streaming workflows for many companies, proving that live streaming at scale via the cloud was possible, although not always in tip-top form. We’ve been covering live streaming since this publication started, but here we’re looking at content that has traditionally been delivered via production trucks, satellites, and a lot of on-prem resources.

In its 2024 annual ranking of the media industry trends that global media technology buyers and users consider most commercially important, analyst firm Devoncroft puts IP networking and content delivery at the top.

Questions that everyone on this path has had to ask include the following:

  • What element of my workflow or architecture should I transition to IP first (moving master control to the cloud, production, graphics, playout, recording, clipping, storage, or metadata)?
  • Which cloud provider should I choose?
  • Is multi-cloud a better approach?
  • Should I go open source?
  • What are the best/most important metrics to use?

Going to IP

In a November 2024 Streaming Media Connect panel titled Cloudsplitting: Tackling Common Cloud Live Broadcast Workflow Challenges, Loke Dupont, a staff engineer at TV 2 Denmark, described his operation’s cloud migration pattern as “a melding of some of the traditional streaming stuff and some of the broadcasting technology. Broadcast technology tended to be very different from streaming,” he explained. “It was all very specialized hardware [for] delivery of signals over fiber or satellite. Today, we’re seeing more and more delivery using SRT and other stuff that would be more at home in the traditional internet world of streaming. You’re using the same encoders; you’re doing the same sort of handling of streams.”

For broadcast networks like TV 2 Denmark, dependence on the “reliability of systems” brings different issues into play when transitioning to an IP-based architecture. “That becomes a bit of a challenge moving from traditional SDI infrastructure to IP-based infrastructure, but it adds a lot of flexibility for routing signals internally,” said Dupont.

This transition from satellite to cloud infrastructure has been captured in the new multi-part SMPTE ST 2110 suite of standards. The suite addresses synchronization of streams over IP and many of the aspects of delivery that need to be done via the cloud using one common internet protocol.

“The SMPTE ST 2110 standards suite specifies the carriage, synchronization, and description of separate elementary essence streams over IP for real-time production, playout, and other professional media applications,” according to SMPTE. “Each stream is individually timed by the ST 2110 system and can take different routes over the networked fabric to arrive via unicast or multicast at one or more receivers. The audio-video-data synchronization using PTP clocks ensures ... the accurate synchronization of all streams regardless of how the packets were routed.” SMPTE runs a training course on this for broadcast engineers.
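The per-stream timing model can be illustrated with a minimal sketch. This is not the actual ST 2110 wire format; the data structures here are invented for illustration. The idea it shows is that each essence packet carries a timestamp derived from a shared PTP clock, so a receiver can realign audio and video by capture time rather than by arrival order, no matter which network path each stream took.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class EssencePacket:
    ptp_time: float                        # capture time from the shared PTP clock (seconds)
    stream: str = field(compare=False)     # e.g. "video", "audio"
    payload: bytes = field(compare=False, default=b"")

def align(packets):
    """Reorder packets from independently routed streams by PTP capture time."""
    heap = list(packets)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

# Packets arrive out of order because each stream took a different route
# over the network fabric; alignment uses the PTP timestamp alone.
arrived = [
    EssencePacket(0.040, "video"),
    EssencePacket(0.020, "audio"),
    EssencePacket(0.020, "video"),
    EssencePacket(0.040, "audio"),
]
print([(p.ptp_time, p.stream) for p in align(arrived)])
```

The point of the sketch is that arrival order never enters the comparison: only the shared clock does.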

In a practical sense, any streaming operation migrating to cloud architecture will need to assemble a number of components and work with a number of vendors performing in unison to replicate on-prem equipment in a SaaS-based cloud workflow. “It’s a lot of integration with partners,” said John Barber, principal solutions architect for AWS Sports, during the Streaming Media Connect panel. “You can put 14 partners in a room and have a 50-50 shot at the end of the day that something will work, which a year ago, [was] not so much. We had a lot of lift and shift [where] people took sketchy software that ran on very large CPUs and memory systems and jammed them into the cloud with less than desirable results at times,” Barber recalled. “Unfortunately, what we’re finding is a lot of these first releases of these cloud-native services are not feature-equivalent with what they’ve been used to on-prem, and there’s a little bit of a burn on that as far as getting them up-to-speed.”

Team Cloud

One recent and very large cloud streaming success story was the 2024 Paris Olympics. It delivered cloud-based broadcasting with the Olympic Broadcasting Services (OBS) Live Cloud from Alibaba. For the first time, OBS became the main method of remote distribution to broadcasters. It transmitted a total of 379 video and 100 audio feeds, serving 54 broadcasters that have replaced satellite with cloud. Live coverage of approximately 11,000 hours of broadcast content in UHD HDR was captured, processed, and distributed, with 3,800 of those hours comprising athletic competition.

Alibaba Cloud’s Cloud Memento at the 2024 Paris Olympics

“For the Paris Olympics, one of the really cool things that we saw was, this is an event that reached all over the globe—over 4 billion [viewers]—and over two-thirds of the actual feeds were broadcast using cloud technology,” said Nishant Agrawal, industry solutions architect for Alibaba Cloud, during the panel. This new cloud delivery became a canvas for experimentation. It enabled the use of AI for multi-camera, freeze-frame, slow-motion replays for gymnastics, basketball, tennis, and more.

Alibaba Cloud worked with Olympic Broadcasting Services to provide multi-camera replay systems across 14 venues at the Paris 2024 Olympic Games

The same capabilities that the Olympics streams leveraged are also now available for smaller events. “The results can be very unique and valuable contributions to the business. I’ve been working with some customers in sports to pull [new capabilities] together,” said Barber. “The fan engagement, the sponsor engagement, the venue engagement, the parking services, the ticketing, the video, the data coming off the players, the data coming off the field, the data going into video cameras—[all of these] would be very difficult for a tier-three sport or a tier-two sport to even begin to think about on-prem or any other way [than cloud].”

Cloudsplitting: Advantages and Challenges of the Multi-Cloud Approach

Supporting more and more broadcasters and their content push into the cloud means covering more territory, especially for events like the Olympics. The assembled cloud architecture may start to look like a Venn diagram of overlapping cloud services. The answer to this is a multi-cloud environment, building networking infrastructure capacity across multiple networks, or “cloudsplitting,” when it comes to reaching everywhere that content is needed.

“There are a lot of vendors out there that are great in one cloud, but when you try to tie all three of those clouds together or even two of them, that’s when things start to fall down because the diversity of that particular vendor is not multi-cloud-aware,” said panelist Corey Smith, Deputy General Manager, Media-Enabled Services, TATA Communications. “They’re not cloud-diverse in their actual application environment. So now you start breaking things down into more silos. It’s very hard to diversify your cloud infrastructure because the vendor layer is preventing some of that.”

If you’re partnering with vendors that are not compatible with the same workflow, you’ll find yourself supporting multiple tech stacks, which becomes more complicated and more expensive. “So do you staff a team of 100 developers to support your tech stacks in these diverse environments to the end of time, or do you buy 90% off the shelf and hire the 10% to tie it all together?” asked Smith.

Ultimately, it comes down to a “build versus buy” dilemma, according to Agrawal, and an open-source approach becomes more appealing. “What I think is important is the understanding of the role of open source. When we think about different trends throughout the years, when we look at Kubernetes for workload orchestration and look at SRT, I think a lot of customers would choose to work with open source technologies. It gives them more freedom. My customers want to choose a stack which is either open source or a stack that they can understand and work with so that they’re not tied to a single provider,” said Agrawal.

The OpEx Connection

One of cloud’s innate advantages is the diminished CapEx of the architecture versus a predominantly or entirely on-prem approach, particularly when it comes to scalability. But managing, containing, and predicting OpEx costs remain key concerns for those contemplating or implementing cloud migration.

“The C-level management that I’ve had the chance to talk to want predictability. They don’t like unpredictable finance models,” said Barber. “The biggest challenge is that costs are variable. If you’re running the same thing all the time, that’s one thing, but for spiky events, there’s more variability.”

“For most of the mainstream broadcast guys out there, it’s really about getting their head wrapped around, ‘Why do I need to do cloud anymore?’ The broadcast model has gone back to where it was prior to the pandemic, where everybody’s very comfortable with trucks,” said Smith. “From a cloud perspective, people are still afraid of the costs for whatever reason.”

This can be a healthy fear because spiky traffic can cause spikes in budget. If you forget to turn off cloud services and leave the meter running when you’re not broadcasting, costs will jump and can quickly spiral out of control. The CapEx versus OpEx debate is a common one these days with all manner of cloud applications. As many more areas in technology move into a SaaS or subscription model, understanding how operational costs work is absolutely necessary, but they’re also manageable once you get the hang of it. “The bigger challenge is actually getting FinOps to convert their mindset from CapEx to OpEx and understand why they can’t write this stuff off every 3 years or every 5 years,” said Smith.

“Cloud distribution has a challenge with variable cost around live IP broadcasting over consumption-based egress costing models,” said panelist Daniel Healey, senior technology partner manager at Console Connect. “Every cloud service provider has some sort of gigabyte egress costing model on their infrastructure, on their compute, etc. If you find the right virtual private server company, you can actually get it as a fixed-rate cost model, and that can make your cost modeling easier to predict.”

But predicting costs with 100% accuracy remains elusive, Healey said, “because you can get a cost-per-gigabyte outbound from the cloud service provider, but you can’t predict how many packets are going to be discarded because of jitter, latency, or congestion and need to be reshipped,” he explained. “So you may have to put in a guess variable of about 30%.”
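Healey’s back-of-the-envelope method can be sketched as a simple estimate. The per-gigabyte rate and the event figures below are illustrative, not any provider’s actual pricing; the 30% retransmission buffer is the “guess variable” he describes.

```python
def estimated_egress_cost(hours, mbps, price_per_gb, retransmit_buffer=0.30):
    """Estimate cloud egress cost for a live stream.

    hours: broadcast duration
    mbps: average outbound bitrate in megabits per second
    price_per_gb: provider's per-gigabyte egress rate
    retransmit_buffer: guess variable for packets resent because of
        jitter, latency, or congestion (Healey suggests about 30%)
    """
    gigabytes = hours * 3600 * mbps / 8 / 1000   # megabits -> gigabytes (decimal)
    return gigabytes * (1 + retransmit_buffer) * price_per_gb

# A 3-hour event at 20Mbps with a hypothetical $0.08/GB egress rate:
cost = estimated_egress_cost(hours=3, mbps=20, price_per_gb=0.08)
print(f"${cost:.2f}")
```

Swapping `price_per_gb` for a fixed monthly rate is exactly the move Healey describes when a fixed-rate provider is available: the retransmission term drops out of the cost model entirely.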

Resending packets isn’t the only variable, and on-prem operations also come with a certain level of unpredictability. According to Dupont, on-prem is “also sometimes less predictable because [you don’t know] when you are going to need to replace that piece of hardware. If it’s in a cloud model, you have a predictable cost year over year that’s going to be about the same. Most top providers don’t suddenly raise their prices. If anything, they go down a little bit, or they stay the same. Whereas for the hardware, you don’t know if you’re going to get budget to replace that in 3 years or 5 years or 6 years.”

“In cloud, if you have a piece of machinery that’s acting up, you move it to another host,” said Smith. “On-prem, if you have a matrix card that dies in the box or a hard drive that goes sideways, that box is dead until you find a replacement.” Sometimes, he added, “you can’t even find a replacement.”

Looking for the Special Sauce

“Let’s say you’re sitting in a media center in Boston, L.A., Texas, London, Paris, Hong Kong, Tokyo, or anywhere else on the globe,” said Healey. In the on-prem era, “distance, latency, and buffering would kill remote production. Now, thanks to cloud, you can send a feed to any media center. Not only can you do editing, graphics, and captioning, you can also do localized ad insertion.

“You can also take the feed from its original contribution position and target ad insertion for different locations,” Healey continued. “Let’s say I’ve got a cricket tournament, and they’re playing in Pakistan. I could take that live signal and send it domestically to a cloud service. That’s doing production for domestic turnaround—a low-latency scenario and a push. Then I could send it to Melbourne, Australia, to a cloud instance there to do the ad insertion [for the Australian market].”

The above scenario can be architected any number of ways, and different cloud experts have their favorite approaches. Console Connect went about this by developing its own protocol to compete with SRT or Zixi. “We created our own proprietary protocol and then built AI command control on it for optimized routes [via public and private networks], but also the ability for it to be cost-modeled,” said Healey.

Healey promised that customers can watch content delivery switch in real time (50 milliseconds switching or less) between the different paths in a mesh infrastructure. “When you’re able to predictively understand the quality of service in a mesh network as it’s going into congestion order, as it’s showing latency, and be able to move before the path becomes a challenge instead of when you start seeing your own packet loss, that gives you a very strong capability to provide an SLA [service-level agreement].”
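The predictive-switching idea Healey describes can be sketched generically: score each path on observed QoS metrics and move traffic before loss becomes visible downstream. The weights, threshold, and path names below are invented for illustration; Console Connect’s actual protocol and routing logic are proprietary.

```python
from dataclasses import dataclass

@dataclass
class PathMetrics:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float

def path_score(m: PathMetrics) -> float:
    # Lower is better. Loss is weighted heavily so a path is abandoned
    # well before loss would be visible to the receiver; jitter is an
    # early congestion signal, so it outweighs raw latency.
    return m.latency_ms + 4 * m.jitter_ms + 100 * m.loss_pct

def choose_path(paths, current, switch_margin=10.0):
    """Switch only when another path beats the current one by a clear
    margin, to avoid flapping between near-equal routes."""
    best = min(paths, key=path_score)
    cur = next(p for p in paths if p.name == current)
    return best.name if path_score(cur) - path_score(best) > switch_margin else current

paths = [
    PathMetrics("private-fiber", latency_ms=18, jitter_ms=1.0, loss_pct=0.0),
    PathMetrics("public-internet", latency_ms=35, jitter_ms=6.0, loss_pct=0.2),
]
print(choose_path(paths, current="public-internet"))
```

Run on fresh metrics every few milliseconds, a loop like this is what makes sub-50-millisecond switching decisions plausible; the hard engineering is in gathering trustworthy per-path metrics that fast, not in the selection itself.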

Whenever a company discusses its special sauce, a number of questions float to the top. “How good is your mousetrap compared to the people that preceded you?” asked Smith.

Another part of the equation is transport in and out. “How I get stuff into it and out of it would be a good indication,” said Dupont. “ ‘Do I need specialized encoders just for this, or is it integrated into other encoders?’ would be a question I would ask. Since we’re talking about at least some proprietary technology, I would definitely have some questions about how to be able to observe. If it’s not SRT, but something custom-made that might be better, I would want to be able to figure out how to get some metrics out of that system.”

Integration options for APIs are another key question, Dupont said. “One of the issues in traditional broadcast is you tend to build stuff where you exchange a whole project of everything at once. My mindset is buying more individual components, and it sounds like this would be a good case for that, but that means that I would care more about the integration points of the technology I’m buying. If I exchange some other system down the line, would it still be integrated and working well?”

Real-time performance metrics is also a critical issue, Dupont added, especially with more closed systems. “I would like to be able to see how it’s performing and be able to monitor that as well and just react to issues myself. It’s great if a vendor can react to them as well, but I would also like to know if there’s something going on with the stuff I’m sending.”

Troubleshooting QOS

A contact told me about a scenario from a few years ago: an application that could simultaneously record 15 real-time live esports streams of between 10Mbps and 30Mbps but would choke at 16. Only so much input, output, and throughput was enabled by default. The newly cloud-native application was supposed to support recording 44 NDI feeds of up to 150Mbps at 1920x1080, 60fps.

Troubleshooting between the application provider and the cloud platform showed that separating the read/write operations and also writing the streams across two disks fixed things. It took a joint effort between the platform and application developer to troubleshoot the problem.
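The sizing problem behind this is simple arithmetic that is easy to overlook: 44 feeds at 150Mbps is an aggregate write rate that a single disk path cannot absorb comfortably. A quick capacity check makes the point; the disk throughput and headroom figures below are illustrative assumptions, not the actual hardware from this case.

```python
import math

def required_write_mbps(feeds, mbps_per_feed):
    """Aggregate sustained write rate for simultaneous recordings."""
    return feeds * mbps_per_feed

def disks_needed(total_mbps, disk_write_mbps, headroom=0.7):
    """Number of disks so each runs at no more than `headroom` of its
    rated sequential write throughput (both rates in megabits/s)."""
    return math.ceil(total_mbps / (disk_write_mbps * headroom))

total = required_write_mbps(44, 150)   # 6,600 Mbps aggregate
print(total, "Mbps total;", disks_needed(total, disk_write_mbps=4000), "disks")
```

The same arithmetic explains the fix in the troubleshooting session: separating reads from writes and spreading the streams across two disks divides that aggregate rate across independent I/O paths.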

TelevisaUnivision is another broadcaster that has migrated to the cloud, moving its international broadcast operations from traditional satellite networks to an all-IP workflow. It distributes and produces content to 60-plus channels in Mexico and the U.S., as well as ViX, its streaming service. TelevisaUnivision’s architecture uses a dual redundant IP: “[TelevisaUnivision is] transmitting content via SRT to two distinct cloud availability zones, complemented by a secondary on-premise site for geographical diversity, mirroring the primary site’s functionality,” according to a case study produced with AlvaLinks.

Each availability zone’s transcode gets distributed to four strategic cloud entry points for content preparation and delivery across the U.S. Each network path was architected for robust switching and redundancy capabilities to handle potential router failures and maintenance requirements.

Initial testing by TelevisaUnivision’s engineering team determined that there were video stability issues and that redundancy strategies didn’t meet expectations. TelevisaUnivision’s remediation efforts included increasing packet size to send fewer, larger packets; setting ACL rules; doing a DNS update; and setting QoS for each stream. After this, a network evaluation still showed the following:

  • The SRT transcoder intermittently stopped for several minutes.
  • There was a 2%–4% packet loss.
  • Frame errors were detected without corresponding network errors in traditional monitoring tools.
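The kind of loss that “traditional monitoring tools” miss can be caught by watching transport-level sequence numbers directly. A minimal sketch of the idea follows; this is generic sequence-number accounting, not tied to any particular SRT implementation’s statistics API.

```python
def loss_stats(seq_numbers, seq_bits=16):
    """Count lost packets from a stream of transport sequence numbers,
    tolerating wrap-around of the fixed-width counter."""
    modulus = 1 << seq_bits
    expected = lost = received = 0
    prev = None
    for seq in seq_numbers:
        if prev is None:
            expected += 1
        else:
            gap = (seq - prev) % modulus   # modular gap handles counter wrap
            lost += gap - 1                # packets skipped between arrivals
            expected += gap
        received += 1
        prev = seq
    return received, lost, 100 * lost / expected

# One packet (65535) missing across the counter wrap, one (2) missing later:
recv, lost, pct = loss_stats([65533, 65534, 0, 1, 3])
print(recv, "received,", lost, "lost,", f"{pct:.1f}% loss")
```

Tracking loss this way at the stream layer, rather than at the network interface, is what surfaces frame errors that never register as interface errors.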

Through monitoring, TelevisaUnivision was able to identify the root cause of these issues and then find a solution that included reconfiguring equipment for contribution feed and optimizing traffic flow.

YouTube Generation

Traditional broadcast engineers have been doing things the same way for 50 or 60 years, and many streaming engineers have come from that background. But others have never worked in that mode and don’t regard migration to IP or cloud workflows as any sort of break with tradition.

“There’s a younger generation that has gravitated to the technology of cloud because they’ve grown up with YouTube, Twitch, and other forms of technology at their fingertips,” said Smith. “Going from a mechanical world into a software world takes a little leap of faith that this stuff’s going to work, because the old-school mentality is like, ‘That’s software, some kid wrote it, it’s never going to work. Just give me that coax.’ ”

According to Smith, one way to ease the transition is to be aware that it doesn’t have to represent a wholesale change, and some elements of the old architecture may well be retained. “I think that there’s a lot of nuance to moving workloads to cloud where it doesn’t have to be all or nothing—this hybrid idea where maybe I move graphics, replay, or master control to cloud, but I keep some of the other core pieces that make television production what it is from my sound stage perspective,” Smith said. “That same master control switcher [in the cloud] can actually provide your same broadcast feed to your cable network for linear or your OTT channel that goes out to Roku or YouTube or wherever.”

Going Forward

Lower latency than satellite; better user experience, workflow, and business innovation; higher resolution; more flexibility without on-prem hardware commitment; and a changing cost structure are only the beginning.

“Most of the new development in the media space is now cloud-native, not on-prem-native, not hardware-native,” said Barber. “We’re seeing [companies] be able to run multi-cloud. A lot of our ISVs actually have built their software that’ll run multi-cloud, or you can pick the cloud from a business viewpoint. They didn’t want to lose any business or carrying their current customers forward, but there’s definitely been the shift now that a lot of the new technologies are cloud-native.”

“When you are using cloud, you have a lot more observability in understanding the metrics of all the streams,” said Agrawal. “And when you’re working with all the media rightsholders, you are able to really measure, map, and see, per click, and really create new pricing models. And that helps you be more efficient and just serve better models.”

“For us to really realize the power of the cloud and not treat it like a science experiment, we have to go about the way we sell cloud differently,” argued Smith. “We have to take the things around cloud that people are mystified about—scale, performance, pricing—and kind of pull the rug out from underneath it and say, ‘You know what? There’s quite a bit of stuff we can do in cloud today.’ ”

And tomorrow, based on the rate of change we have seen in the last few years, there should be a whole lot more.
