Review: Epiphan LiveScrypt
Epiphan's LiveScrypt combines a hardware appliance for audio input and a cloud application for transcription. Together, they deliver a polished, inexpensive, and easy-to-use solution for transcribing speech to text in real time for conferences, training, or similar sessions. You can display the text locally on a monitor, display it as closed captions in a live stream, and make the captions available on the web via a URL or QR code.
As with all transcription services, machine-based or human, accuracy isn't perfect, but Epiphan uses Google's AI-based Speech-to-Text application programming interface, so it should improve over time. If you're wrestling with how to affordably add live transcripts to your presentations, LiveScrypt is definitely worth a look.
Introducing LiveScrypt
The LiveScrypt hardware is a touchscreen-based appliance that feeds incoming audio from multiple sources into the cloud, where it's converted to text by Speech-to-Text. Once converted, you can display the text on the appliance itself, on a monitor connected via HDMI, and on a dedicated webpage that can be accessed by most mobile devices. You can also feed the text into a live-streaming application like Telestream's Wirecast or Epiphan's Pearl for display as closed captions. Once the event is over, you can download the completed transcript from the web.
The hardware costs $1,499.95 on Amazon, and the transcription service costs $9.95 per hour. The first 5 minutes of any presentation are free, and each hour or portion thereof is billed at the full hourly price, with no prorating.
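To make the billing rule concrete, here is a minimal Python sketch that estimates the charge for a single session. The rate and the free 5 minutes come from the pricing above; how the free minutes interact with the hourly rounding is an assumption on my part (the sketch deducts them first, then rounds up), and the function name is purely illustrative.

```python
HOURLY_RATE_CENTS = 995  # $9.95 per hour, as quoted above
FREE_MINUTES = 5         # the first 5 minutes of a presentation are free


def session_cost_cents(duration_minutes: int) -> int:
    """Estimate the transcription charge for one session, in cents.

    Assumption (not confirmed by Epiphan): the free minutes are deducted
    first, and whatever remains is rounded up to whole hours.
    """
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    billable = max(0, duration_minutes - FREE_MINUTES)
    hours = -(-billable // 60)  # integer ceiling division
    return hours * HOURLY_RATE_CENTS


for minutes in (5, 45, 90):
    print(f"{minutes:>3} min -> ${session_cost_cents(minutes) / 100:.2f}")
```

Under that assumption, a 45-minute session is billed as one hour and a 90-minute session as two, which is the practical effect of the no-prorating rule.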
Once you get the unit, you log into Epiphan's LiveScrypt portal to enter credit card information and register the device. Thereafter, you can run LiveScrypt entirely via its touchscreen interface or access it remotely via the portal. More on this later.
Inputs and Outputs
The hardware supports an extensive range of audio inputs, including two XLR inputs (with Phantom power), stereo RCA connectors, a 3.5mm audio port, two HDMI ports, SDI audio, and two USB ports (see Figure 1, at top of page). Audio isn't pass-through, so you'll probably have to double up your outputs to support both live-streaming/local speakers and LiveScrypt. During my tests, I input audio through the XLR and RCA connectors, both of which worked well.
As mentioned, you can drive the unit locally or via the web interface. Once I paired the unit and connected a microphone via XLR, I was transcribing in seconds; the only hiccup was that I had to manually enable Phantom power to the condenser microphone in the software. You can see the result in Figure 2.
Figure 2. I was transcribing in seconds after pairing the unit in the portal.
Operationally, you start and stop transcription via controls on the upper left of the touchscreen or by using the equivalent web controls. You can see the Start button in Figure 3 and the Stop button and running timecode in Figure 2. You open the controls shown in Figure 3 using one of the three buttons on the lower right.
As you can see in Figure 3, the controls are fairly simple. The System tab contains information like the IP address and serial number. Audio allows you to mute different audio inputs and set Gain levels on some, but not all, inputs.
Figure 3. The onboard controls
The Transcription tab lets you choose one of the 30 languages currently supported by the system (you can see a full list at go2sm.com/livescrypt). Currently, the system supports transcription only, so if you're speaking German, you can output only German subtitles. However, translation is on the developmental road map. The Transcription tab also provides options to enable the automatic insertion of punctuation and a profanity filter that converts dirty words to asterisks. The Security tab lets you set a password to operate the touchscreen and the web interface.
You'll spend most of your time in the Output tab shown in Figure 3. This is where you configure the text HDMI output from the unit for local display or input into a live-streaming system like Wirecast, shown in Figure 4. You can configure this output as text only or with text and a QR code so viewers watching a local display can retrieve the caption feed on their mobile devices.
Figure 4. Here's the transcription inserted into Wirecast.
To capture the text in Wirecast, I connected the LiveScrypt HDMI output to an Epiphan AV.io 4K USB capture device and configured the AV.io input into the cropped box appearing at the bottom of the video input. For the record, this video was from a Sennheiser microphone review shot a few years ago, and the audio quality was quite good. I played the video file on a Mac notebook, which I input into the LiveScrypt unit via RCA connectors. At the same time, I played the same video on an HP notebook, which I input into Wirecast via Desktop Presenter. I recorded the presentation you can view with the transcription shown in Figure 4.
I was surprised that the closed-caption use case is not explicitly supported via a preset that outputs two or three lines in a short, wide output resolution; it seems common enough to warrant one. It's not hard to configure the output for captions in a system like Wirecast, but it will take some experimentation to cleanly simulate closed captions.
After the presentation is finished, you can download the transcription in either .srt or .txt format from the web interface shown in Figure 5. Note that this portal contains all of the controls available on the LiveScrypt hardware itself, so you can run the system remotely. On the top right of Figure 5, you can see the Stream URL that displays the transcription on the web.
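If you download the .srt file but want plain running text for show notes or a blog post, a few lines of Python will do the conversion. This is a sketch that assumes the download follows the standard SubRip layout (cue number, timing line, caption text, blank line); the sample captions are made up for illustration and are not actual LiveScrypt output.

```python
import re

# Made-up sample in standard SubRip layout; not actual LiveScrypt output.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the microphone review.

2
00:00:04,500 --> 00:00:07,250
Let's start with
the audio quality.
"""

TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timing lines, joining caption text into one string."""
    lines = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        rows = block.splitlines()
        # A cue is: index line, timing line, then one or more text lines.
        if len(rows) >= 3 and TIMING.match(rows[1]):
            lines.extend(row.strip() for row in rows[2:])
    return " ".join(lines)


print(srt_to_text(SAMPLE_SRT))
```

The timing lines are what make the .srt version useful for adding captions to a recording after the fact; the .txt download is the better starting point if you only need the words.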
Figure 5. AVStudio provides controls for the paired LiveScrypt unit and downloads of completed presentations.
What About Accuracy?
To get a sense of how LiveScrypt works, I watched some videos available on the Epiphan website. In one of the videos, the Epiphan spokesperson claims 92% accuracy, and that feels about right, although some articles place Google's accuracy as high as 95%.
For perspective, note that the accuracy of human transcribers ranges from about 95% to 98% in some of the articles I found. So perfection is not commercially available.
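None of these sources say how they measured accuracy, but a common convention is to report one minus the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of reference words. Read that way, 92% accuracy works out to roughly 8 word errors per 100 words spoken. The Python sketch below shows the calculation; the two sentences are invented for illustration, and this is not necessarily the method Epiphan or Google uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution, or free if words match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, invented for illustration only.
reference = "the microphone delivers clean audio in a noisy room"
hypothesis = "the microphone delivers clean audio in noisy rooms"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.1%}, accuracy {1 - wer:.1%}")
```

Short samples like this one swing wildly with each error, so any meaningful comparison needs several minutes of speech scored against a carefully checked reference transcript.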
The big question is what level of accuracy is needed to be “useful.” You can be the judge of that yourself by watching this recording of the Sennheiser video shown in Figure 4.
Through most of the video, accuracy is quite good and only a second or two behind the audio, which was impressive. There are some instances where the transcription falls behind, after which the system catches up with an explosion of text that's a bit hard to follow (see around 1:24). Note that if you're working within a specific industry with unique jargon, you can enter a North American Industry Classification System (NAICS) code to increase transcription accuracy for specific terms and acronyms, which I did not try.
I should also point out that the accuracy exhibited in the test video was the best case in my testing. I also tested some recordings of Europeans speaking in lightly to heavily accented English, and the results were unusable. Of course, because Epiphan is relying on Google for the transcription, you should expect the accuracy for all use cases to improve over time.
In this regard, one way to evaluate LiveScrypt is as a hardware device and service designed to feed high-quality audio from different sources to Google and retrieve and make the transcription available for flexible delivery. Viewed in this light, Epiphan did a great job incorporating a range of audio inputs and making operation extremely simple. The only question is whether the accuracy delivered by Google at this point meets the needs of your application.