Our auto-transcription mechanism uses Google's speech-to-text technology, one of the most advanced available.

Automated captions are NOT a replacement for captions created by a human.

The use of our transcription mechanism is optional. You can skip this step and import your .srt into the context editor.

If you don't have an .srt file at hand, the auto-transcription mechanism can save you a HUGE amount of time that you would otherwise spend timing captions and typing out every spoken word.

To get the most out of it and to achieve a high level of precision, follow this best-practices checklist:

Transcription Best Practices

We follow Google's speech-to-text best practices on our end. Follow the steps below to get the most precise transcription:

Capture the audio with a sampling rate of 16,000 Hz or higher.
Avoid resampling of your audio.
The audio should be as clean as possible.
Don't use any noise reduction as it reduces recognition accuracy.
Position the microphone as close to the person speaking as possible.
Don't use automatic gain control.
Avoid multiple people talking at the same time.
All speakers should speak at a similar volume level.
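
The first item on the checklist is easy to verify before you upload. The following is a minimal sketch in Python, assuming you have already exported the audio track of your video to an uncompressed WAV file. The function name `check_sample_rate` and the file name in the usage note are hypothetical and are not part of our product.

```python
import wave

# Minimum sampling rate from the checklist above.
MIN_SAMPLE_RATE_HZ = 16_000


def check_sample_rate(path):
    """Return a list of warnings for a WAV file; an empty list means no issue found."""
    warnings = []
    # The standard-library wave module reads uncompressed (PCM) WAV files only.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(
            f"Sampling rate is {rate} Hz; capture at {MIN_SAMPLE_RATE_HZ} Hz or higher."
        )
    return warnings
```

Calling `check_sample_rate("my_audio.wav")` returns an empty list when the file meets the sampling-rate recommendation. It only inspects the rate stored in the file header, so it cannot tell whether the audio was resampled earlier or how clean the recording is.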

In most cases, you shouldn't have to think about these requirements.

Just remember to upload a video whose audio has not been resampled. Provided that your audio is of good quality, it should be transcribed with a high level of confidence.

My transcription is NOT as precise as stated

If your audio does not follow these best practices, its confidence level may not be calculated properly.

This may happen if the transcription mechanism does not catch some of the spoken words in the video.

In this case, don't reupload your video: the audio quality is not sufficient, so uploading it again will not give you better results. We recommend that you use a human transcription service, such as rev.com, and upload the transcript in .srt format.
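
If your transcript arrives as plain timed text rather than as an .srt file, it can be converted. The sketch below assumes the transcript is available as a list of (start seconds, end seconds, text) tuples. The function names `format_timestamp` and `to_srt` are hypothetical helpers, not part of our product. It writes the common SubRip layout: a cue number, a line with `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text, and a blank line between cues.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SubRip form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build .srt text from an iterable of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)
```

For example, `to_srt([(0, 1.5, "Hello"), (1.5, 3, "world")])` produces two numbered cues that can be saved to a file with the .srt extension.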