Live Transcription with Video Delay

This page describes how to configure FAB Subtitler BCAST/XCD for live transcription with video delay, so that live-transcribed subtitles appear on the delayed video in sync with the speech. In this configuration the subtitle text appears while the person is speaking that text, not a few seconds later.

Using video delay

FAB Subtitler BCAST/XCD can use a Decklink SDI card for the following functionality:

  • The SDI signal connected to the SDI input is used as audio source for live transcription
  • The same SDI signal is then delayed by a configurable number of seconds (1-30), and the live-transcribed subtitles are overlaid on the delayed video, which is provided on the SDI output.

FAB Subtitler BCAST/XCD takes both the delay of live transcription and the delay of the video into account when transmitting subtitles, so that each subtitle is displayed at the moment the person is speaking the text.
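
The timing relationship described above can be illustrated with a small calculation. This is only a sketch of the arithmetic, not the actual implementation in FAB Subtitler; the function name and parameters are hypothetical. It assumes that a subtitle recognized some time after the words were spoken must be held back for the remainder of the video delay.

```python
def subtitle_hold_ms(video_delay_ms: int, transcription_latency_ms: int) -> int:
    """Return how long (in ms) a recognized subtitle is held before transmission.

    Hypothetical helper for illustration only.
    video_delay_ms: delay applied to the video between SDI input and SDI output.
    transcription_latency_ms: time between the words being spoken and the
        recognized text becoming available.
    """
    if transcription_latency_ms > video_delay_ms:
        # The text arrives after the delayed video has already shown the speaker.
        raise ValueError("transcription latency exceeds the video delay")
    return video_delay_ms - transcription_latency_ms


# Video delayed by 30 s, text recognized 25 s after it was spoken:
print(subtitle_hold_ms(30000, 25000))  # prints 5000
```

In this example the subtitle would wait a further 5 seconds so that it reaches the output together with the delayed picture of the person speaking.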

Configuration of video delay

You can configure the video delay for the Decklink SDI card in FAB Subtitler Options:

To achieve very high quality of recognized text, it may be necessary to wait for the final result of live transcription, which may take up to 25 seconds. In such a case the live video should be delayed by 30 seconds. The delay is entered in milliseconds, so the value to enter in the configuration would be 30000.
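
As a quick check of the numbers above, the following sketch converts a delay in seconds to the millisecond value and verifies that it covers the longest expected transcription time. The function and constant names are hypothetical and not part of FAB Subtitler; the 1-30 second range is the one stated for the Decklink video delay on this page.

```python
MIN_DELAY_S = 1   # smallest video delay mentioned on this page
MAX_DELAY_S = 30  # largest video delay mentioned on this page


def video_delay_ms(delay_s: int, max_transcription_s: float) -> int:
    """Return the delay in milliseconds after basic plausibility checks.

    Hypothetical helper for illustration only.
    """
    if not MIN_DELAY_S <= delay_s <= MAX_DELAY_S:
        raise ValueError("video delay must be between 1 and 30 seconds")
    if delay_s < max_transcription_s:
        raise ValueError("video delay is shorter than the transcription time")
    return delay_s * 1000


# Final transcription results may take up to 25 s, video delayed by 30 s:
print(video_delay_ms(30, 25))  # prints 30000
```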

Configuration of delayed subtitle transmission

The second important setting instructs FAB Subtitler to transmit subtitles with the same delay that is configured for the video.

Select the subtitle output that should be delayed under “Active outputs”, click “Set name / delay”, and enter the name and the required delay for this output:

Configuration of live transcription

The live transcription service must be able to report timestamps of recognized words. This functionality is offered by Microsoft and FAB Subtitler Server.

It is necessary to activate the correct setting so that timestamps of recognized words are included in the calculation of the timing for subtitle transmission:

The value 5000 visible in the picture above is used only for transcription services that do not report timestamps of recognized words.
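
The difference between the two cases can be sketched as follows. This is an interpretation, not documented behavior: it assumes that the fixed value acts as an assumed transcription latency in milliseconds, which this page does not state explicitly. All names are hypothetical.

```python
from typing import Optional


def estimated_spoken_ms(
    result_received_ms: int,
    word_timestamp_ms: Optional[int],
    fallback_latency_ms: int = 5000,  # assumed meaning of the fixed value
) -> int:
    """Estimate when the recognized text was spoken (ms on the input clock).

    Hypothetical helper for illustration only.
    """
    if word_timestamp_ms is not None:
        # The service reports word timestamps: use them directly.
        return word_timestamp_ms
    # No timestamps: assume the text was spoken a fixed time before it arrived.
    return result_received_ms - fallback_latency_ms


print(estimated_spoken_ms(60000, 57200))  # prints 57200
print(estimated_spoken_ms(60000, None))   # prints 55000
```

With word timestamps the estimate follows the actual speech, while the fixed value can only approximate it, which is why a service that reports timestamps is needed for accurate timing.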

This page was last updated on 2024-10-10