Tutorial: Speech-to-Text in Adobe Premiere Pro

Streaming Learning Center's Jan Ozer explores the dazzling new Speech-to-Text feature in Adobe Premiere Pro, a powerful new tool that makes it remarkably quick and painless to create accurate transcriptions and burned-in or exported captions for your videos.

By Jan Ozer
Posted on July 24, 2021


I've been using Adobe Premiere Pro probably for over 20 years, and I don't think there's ever been a feature I've been more excited about than the speech-to-text transcription function they've just introduced in the most recent upgrade. It's fast, it's accurate, it's free, and it's incredibly easy to use.

In this tutorial, I'll show you how to use it and how it's going to transform any video that can benefit from captioning--which is to say, almost any video, period. In the tutorial video above, you can see Premiere Pro's captioning in action.

Let's start by taking a quick look at how efficient it is to create captions. I'm working with a lesson from my latest course on live streaming. It's about three minutes and 21 seconds long. In Premiere Pro, I'll drag it into the timeline, and click Captions--making sure that Text is already selected--and click Transcribe. I'm also going to select all the defaults.

The steps that are going to occur behind the scenes are as follows: Adobe is going to create the audio file, upload that to the cloud, and then transcribe in the cloud. We're going to get back a transcript to which we can make any corrections that are necessary. And then we're going to convert that transcript into individual captions that are presented in the Captions tab in the Source Monitor and in the timeline. The transcription process took under a minute and a half for three minutes and 21 seconds of video. So it was very, very quick.
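To put that turnaround in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the two figures quoted above, and treats "under a minute and a half" as an upper bound of 90 seconds, so the result is a floor rather than a measurement. The function name is just for illustration.

```python
# Figures quoted in the article; 90 s is an upper bound ("under a minute and a half").
clip_seconds = 3 * 60 + 21       # 3 minutes, 21 seconds of video
transcribe_seconds_max = 90      # assumed worst case for the transcription time


def speed_factor(media_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of media are handled per second of waiting."""
    return media_seconds / processing_seconds


factor = speed_factor(clip_seconds, transcribe_seconds_max)
print(f"At least {factor:.2f}x faster than real time")
```

Because the real processing time was shorter than 90 seconds, the true ratio would be somewhat higher than what this prints.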

And if I wanted to, at this point, I could play the video in the Program Monitor, review the transcript for errors, and make changes as I see them while the video is playing. (I can also make changes after I tell Premiere Pro to create the captions.)

After you've corrected the transcript, you click Create Captions again, and accept all the defaults. And now Premiere Pro takes the transcript and creates individual caption entries in the timeline. And as they say, boom, you're done. At this point, you would click File > Export Media to export your fully captioned file with the captions burnt into the video or as an SCC sidecar file.

I don't know if you've ever done this process yourself, but to manually caption a three-minute file like this would probably take a half hour or longer just to get the text transcribed and also to get these individual entries, both created and aligned. It's very time consuming, very frustrating. And this new feature just totally does away with that.

Below, you can see the captioned 3-minute, 21-second file. Note that I've left these captions uncorrected, in order to demonstrate Premiere Pro's first-try speech-to-text accuracy.

Now, let's look at this feature in a little bit more detail with a different project. The other video I tested this feature on is an interview I performed over a GoToWebinar with Michelle Fore and Tim Siglin. This is an interesting use case because we know audio is degraded somewhat by Zoom and GoToWebinars. So we're going to check the accuracy, given that degradation. And we're also going to explore options like moving captions under one or the other of the speakers.

To begin, I'll create the sequence, click Captions, and set the In Point at 00:00:00, and set the Out Point at 05:00:00, five minutes in. We're going to create the captions for In to Out Point. Then I'll go to Text > Transcribe Sequence, and then take a look at the different features available in the Create Transcript dialog.

If you've got all the audio clips with dialogue in a dialogue track, you can select this. We don't have that here. If you've got all the dialogue on a single track, you can select Audio Clips Tagged as Dialogue. Or you can just choose the mix, as I did with this clip. You can also choose "Transcribe In Point to Out Point only," which is what we're doing with this example. You can also choose "Merge output with an existing transcription," if there is one. We don't have that here. Apparently, you have to opt in to identify speakers in the transcription, which is not available in Illinois for some strange reason. Once you select this once, you'll never see it again, so no big deal, even in Illinois.

Now, Adobe is going to create the audio file, upload that to the cloud, and then come back to us with a transcription. For this five-minute segment, it took well under two minutes to get to our transcription, which again appears in the Source Monitor. Again, if I want to make any changes, I can do that as I did before.

Next, let's look at a couple of interesting export options. I can export the transcript, which produces a transcript file that you can use to save all the details of the transcription to import into another Adobe Premiere Pro project. I'm not quite sure why you'd want to do that, but you can. You can also export a Text file. Before you do that, if you haven't edited the transcript at all, you'll want to change the speaker names, which by default are Speaker 1, Speaker 2, Speaker 3, etc. If I wanted to edit this, I could: I'm Speaker 1, Tim is Speaker 2, and Michelle is Speaker 3. Now that appears in the Transcript tab, but it doesn't appear if we export the transcript as Text.

If we export as Text, Premiere Pro creates the uncorrected file you see (in part) in Figure 1, below. You'll see the transcription, but not the identification of speakers. So I'm not really sure where the speaker info shows other than the transcript file that you can import back into Premiere Pro. So it didn't appear to show up in any captions or any of the exported files. Again, I can play through the video file, and correct the transcript as needed. Or I can do that after I create the captions.


In the Create Captions dialog, you can create captions from the transcript, or create a blank track if you want to insert them manually. I'm not sure why you'd want to do that. In most cases, you're just going to use the Subtitle Default preset. If you're exporting 608 or 708, you can choose them in the preset pull-down. But for burned-in captions, and for SRT sidecars as well, Subtitle Default is what you want to use. You also have a pull-down with additional formatting options. You can also create a preset from your preferred settings, and once that is created, you can choose it from the Caption Preset pull-down and use it in other transcription projects.

You can also set the maximum length in characters for each line of captions. In this particular case, 42 is a good length. I can stick it under speaker 1 or speaker 2 without any issue. But if you're putting captions in a smaller window--say, if you've got three speakers in a Zoom conference--you may want to make that number smaller. I've done that type of work before where I needed it. You can also set the minimum duration in seconds, set the gap between captions, and choose between single-line and two-line captions. I've always gone with two lines. Once again, Premiere Pro is going to create the captions, create the little inserts, and align them as necessary.
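If it helps to picture what a 42-character, two-line limit does to a passage of text, here is a minimal Python sketch. It is not how Premiere Pro splits captions (it ignores timing, minimum duration, and gaps entirely); it simply wraps text with the standard library's textwrap module and groups the wrapped lines in pairs. The function name and parameter names are my own.

```python
import textwrap


def split_into_captions(text: str, max_chars: int = 42, max_lines: int = 2) -> list[list[str]]:
    """Wrap text to max_chars per line, then group lines into captions of up to max_lines."""
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


sample = ("In this tutorial, I'll show you how to use it and how it's going to "
          "transform any video that can benefit from captioning.")

for caption in split_into_captions(sample):
    print("\n".join(caption))
    print()  # blank line between captions
```

Dropping max_chars to something like 30 shows why a smaller window needs a smaller number: the same sentence breaks into more, shorter captions.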

There are a couple of things you can do on the edit side. For example, if we see that Tim is speaking at a given moment, you can't see his name. If we wanted to move the caption horizontally over to his half of the screen, we can do that very easily. If you want to multi-select, hold down the Shift key to select multiple lines of text, and you can move them as well. That's a nice convenience. If you want to go through the aggravation of placing all of the captions under the individual speakers--and I'm not quite sure that's necessary in a case like this--it is doable. In addition, if you wanted to change the font to, say, Arial, you can do that as well, and push that out to all of the captions if that's your preference. You can also apply that change in Track Style as mentioned earlier, or create a new style, save it, and then choose that style the next time you create captions. Among the options you'll see in the captions area is one to export an SRT file.

If we export this as a Txt file, again, there's no speaker identification with the captions, and no timecode. The SRT, of course, is going to have the timecode, and you can import it into YouTube and other services that require SRT captions. And once you've got everything squared away, if you want to change anything in a particular caption, you can do it in the Captions tab in the Source Monitor, and Premiere Pro will correct it in both places at once.
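To make the difference between the two exports concrete, here is a small Python sketch that writes the same captions both ways. The SRT layout (a running index, a start --> end line in HH:MM:SS,mmm form, the caption text, then a blank line) is the standard SubRip convention. The caption text and timings below are made up for illustration, and none of this is Premiere Pro's own export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions) -> str:
    """Build SRT text from an iterable of (start_seconds, end_seconds, text)."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


def to_plain_text(captions) -> str:
    """Build a plain-text export: caption text only, no timecode."""
    return "\n".join(text for _, _, text in captions) + "\n"


# Hypothetical captions, for illustration only.
captions = [
    (0.0, 2.5, "Welcome to the lesson."),
    (2.5, 6.0, "Today we're talking about\nlive streaming."),
]

print(to_srt(captions))
print(to_plain_text(captions))
```

The plain-text version is fine for show notes or a blog post, but only the SRT carries the timing a player needs to display each caption at the right moment.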

When you're done, click File > Export Media. And then, as we saw the first time, you can export either captions burnt into the video, or you can create a sidecar file. Below you'll see this video with burned-in captions created in this test. Note that I've left these captions uncorrected, in order to demonstrate Premiere Pro's first-try speech-to-text accuracy.

So that's the new Speech-to-Text function in Adobe Premiere Pro. Not only does Adobe do a great job converting the speech into an accurate transcription, it also takes all the work out of placing all these individual files and copying and pasting text into them. So it's a huge timesaver for anyone creating captions in their videos.


Streaming Media Magazine