Repurposing TV for Streaming
By now it should be clear to anyone in the media industry who is paying attention that digital television (DTV) is no longer just about television. In addition to the digital distribution of programming and the advent of non-linear, file-based production, every network and local station has one or more websites that feature program content. TV content is also repurposed and distributed to cell phones and handheld devices, and mobile TV is in the early stages of deployment in the U.S.
Repurposing TV content for new-media distribution channels and consumer devices creates production and distribution problems not encountered when programming travels only over traditional television transmission channels. Content delivered to PCs and handheld devices requires different creative and assembly techniques, both technical and aesthetic.
The good news is that content can be produced and repurposed for a three-screen universe (TV, PC, and handheld) without having to use completely independent workflows. A converged workflow consisting of parallel production and conversion processes will create audio, video, and graphics elements only once and then use automated processes to convert them to the proper formats for each delivery channel.
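As a concrete illustration, here is a minimal sketch of such an automated conversion stage in Python, assuming the ffmpeg command-line tool is installed; the profile names, codecs, bitrates, and frame sizes below are illustrative assumptions, not recommended settings.

```python
import subprocess

# One conversion profile per screen: (name, container, ffmpeg arguments).
# The codecs and bitrates are illustrative placeholders only.
PROFILES = [
    ("tv",       "ts",  ["-c:v", "mpeg2video", "-b:v", "14M",
                         "-c:a", "ac3", "-b:a", "384k"]),
    ("pc",       "mp4", ["-c:v", "libx264", "-b:v", "2M",
                         "-c:a", "aac", "-b:a", "128k", "-vf", "scale=1280:720"]),
    ("handheld", "mp4", ["-c:v", "libx264", "-b:v", "400k",
                         "-c:a", "aac", "-b:a", "64k", "-vf", "scale=480:270"]),
]

def convert_master(master_file):
    """Fan a single finished master out to every delivery profile."""
    base = master_file.rsplit(".", 1)[0]
    for name, ext, args in PROFILES:
        subprocess.run(
            ["ffmpeg", "-y", "-i", master_file, *args, f"{base}_{name}.{ext}"],
            check=True)

convert_master("finished_program.mov")
```

The point of the sketch is the shape of the workflow: the master is created once, and each delivery channel gets its own automated, unattended conversion pass.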
To understand how to efficiently repurpose audio, video, and graphics produced for TV for delivery over a streaming channel, a basic understanding of a television facility's workflow and air chain is necessary.
TV Workflow and Air Chain
Even though digital technology has exponentially increased the complexity of television broadcast operations, the big-picture view of a facility's infrastructure looks much as it always has. Figure 1 illustrates the flow of content through production and assembly.
Figure 1. Functional block diagram of a broadcast facility
Programming enters a television facility from a remote venue, originates from a studio, or is played back from a storage device. Mixing of audio, video, and graphics elements into a program happens in a Program Control Room (PCR). For a remote event, the PCR is a broadcast truck or a control room at the event site. Studio shows are produced in sophisticated PCRs located in the broadcast operations center, while preproduced long-form pieces are "good to go" and bypass a PCR when taken to air.
Editing and graphics production are non-real-time processes. Editing produces finished programs, segments, or clips that are ready for air. In the past, these were laid to tape on a VTR, but in a modern facility, most preproduced content is stored on playout servers and taken directly to air.
Graphics elements include sophisticated animation sequences and complex, template-based lower thirds composited with video by the production switcher in the PCR. Air-time assembly of graphics increasingly uses preproduced templates that are populated with data, JPEGs, and other elements in real time during a broadcast.
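The data-driven fill step can be as simple as the following sketch; the template fields and data are hypothetical, and a real character generator would composite the result over video.

```python
from string import Template

# A preproduced lower-third reduced to its variable fields (hypothetical names).
lower_third = Template("$name | $title | $headline")

def fill_lower_third(live_data):
    """Populate the template with data arriving in real time during a broadcast."""
    return lower_third.substitute(live_data)

print(fill_lower_third({"name": "J. Smith", "title": "Anchor",
                        "headline": "Election results at 11"}))
```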
Insertion of station and network logos, rating "bugs," and closed captions, as well as network-distributed commercials and promos, occurs in the originating facility's Master Control Room (MCR). Programming distributed to local affiliates and retransmitted by cable, satellite, and telco headends carries signaling cues, typically SCTE-35 messages, inserted in the outgoing program stream to trigger automated, downstream regional and local commercial insertion.
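A sketch of what such a cue carries, modeled loosely on the SCTE-35 splice_insert message; this is a plain data record for illustration, not the bit-level wire format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpliceCue:
    """Loose model of an SCTE-35 splice_insert cue (not the wire format)."""
    splice_event_id: int
    out_of_network: bool          # True: leave the network feed; local break starts
    splice_pts: int               # splice point on the 90kHz presentation clock
    break_duration: Optional[int] = None  # break length, also in 90kHz ticks

# Signal a 30-second local break (30 s * 90,000 ticks/s = 2,700,000 ticks).
cue = SpliceCue(splice_event_id=1001, out_of_network=True,
                splice_pts=8_100_000, break_duration=2_700_000)
```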
After Master Control, the audio and video are compressed. The compression process is also referred to as encoding or, sometimes, formatting. It takes a compression ratio of over 50:1 to squeeze raw 1Gbps HD content into a 14Mbps video stream. Audio is no bargain, either. Although raw PCM (pulse-code modulation) audio is less than 1Mbps per channel, for 5.1 surround sound, even with the limited frequency response of the LFE (low-frequency effects) channel, the baseband audio data rate will be over 5Mbps. The transmitted compressed audio bitrate for surround sound is usually 384Kbps.
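The arithmetic behind those figures, as a quick sanity check; the 20-bit, 48kHz PCM assumption is one plausible reading of the per-channel figure above.

```python
# Video: ~1 Gbps raw HD into a 14 Mbps MPEG-2 stream.
raw_video_bps   = 1_000_000_000
coded_video_bps = 14_000_000
print(raw_video_bps / coded_video_bps)   # ~71:1, comfortably over 50:1

# Audio: six PCM channels (5.1) versus a 384 kbps compressed stream.
pcm_channel_bps = 48_000 * 20            # assumed 48 kHz, 20-bit PCM < 1 Mbps
raw_audio_bps   = 6 * pcm_channel_bps    # ~5.76 Mbps baseband, i.e. over 5 Mbps
coded_audio_bps = 384_000
print(raw_audio_bps / coded_audio_bps)   # ~15:1
```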
The compression encoding process converts baseband digital audio and video into Elementary Streams (ES) and then into Packetized Elementary Streams (PES) of up to 65,535 bytes in length. During the process, the temporal relationship between the audio and video in the source signals is lost, so audio and video packets are marked with a presentation time stamp (PTS) for use by the DTV receiver. The PES packets are then further divided into 184-byte payloads; an MPEG-2 transport stream packet adds a 4-byte header to each payload to create a 188-byte packet.
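The fixed packet layout makes the header easy to pick apart; a minimal parsing sketch (ignoring the optional adaptation field) follows.

```python
def parse_ts_packet(packet: bytes):
    """Split a 188-byte MPEG-2 transport stream packet into header fields
    and its 184-byte payload (optional adaptation field ignored for brevity)."""
    assert len(packet) == 188 and packet[0] == 0x47   # 0x47 sync byte
    payload_unit_start = bool(packet[1] & 0x40)       # a PES packet begins here?
    pid = ((packet[1] & 0x1F) << 8) | packet[2]       # 13-bit packet identifier
    continuity_counter = packet[3] & 0x0F             # detects lost packets
    return pid, payload_unit_start, continuity_counter, packet[4:]
```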
Assembly instructions, along with a reference system time clock, are required for audio and video decoding and presentation at the receiver and are multiplexed into the transport stream. The program assembly technique used in the U.S. has evolved beyond the fundamental method specified by the MPEG standards. MPEG-based systems such as Europe's DVB use Program Specific Information (PSI), while the ATSC DTV transmission specification used in the U.S. includes the MPEG PSI tables but adds the ATSC A/65 Program and System Information Protocol (PSIP). PSIP information enables a receiver to locate and decode the appropriate audio and video packets for a particular program; it also supports electronic program guides, presentation of closed captions, and program-rating-based automated "V-Chip" receiver control.
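Conceptually, the tables form a chain the receiver walks to find a program's packets, as in this sketch; the program number and elementary stream PIDs are hypothetical, while the PAT PID (0x0000) and the ATSC PSIP base PID (0x1FFB) are fixed by the standards.

```python
PAT_PID = 0x0000        # Program Association Table always travels on PID 0
PSIP_BASE_PID = 0x1FFB  # ATSC PSIP tables (MGT, VCT, RRT, STT) travel here

# Hypothetical table contents for a single program in the multiplex.
pat = {3: 0x0030}                                   # program_number -> PMT PID
pmt = {0x0030: {"video": 0x0031, "audio": 0x0034}}  # PMT PID -> elementary PIDs

def pids_for_program(program_number):
    """Walk PAT -> PMT to find the PIDs a decoder must filter for one program."""
    return pmt[pat[program_number]]

print(pids_for_program(3))   # the video and audio PIDs for program 3
```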