Generate video subtitles in realtime

Learn how to convert speech to text in JS

Hey folks 👋🏻,

Hope you're doing great!

Today we will dive into an interesting topic: converting speech to text in JavaScript.

I had been wondering how subtitles are generated, so I did some research and found a simple piece of code that does it. Now I'll share it with you.

Initial Steps

  1. Get the user's permission to record audio.
  2. Create a WebSocket connection to Deepgram to receive realtime transcripts.
  3. Send the recorded audio to Deepgram at fixed time intervals.
  4. Display the messages received over the socket.

Deepgram Integration

Here we use the Deepgram API for speech recognition.

Log in to https://console.deepgram.com and generate an API key.
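
The key authenticates the WebSocket connection to wss://api.deepgram.com/v1/listen used in the code further down. That endpoint can also take options as query-string parameters, so here is a minimal sketch of building the URL. The helper name `buildListenUrl` is hypothetical, and the option names in the usage note (`punctuate`, `language`) are assumptions that should be checked against Deepgram's current documentation.

```javascript
// Hypothetical helper (not part of any Deepgram SDK): builds the streaming URL
// with optional query-string parameters.
const DEEPGRAM_LISTEN_URL = 'wss://api.deepgram.com/v1/listen'

const buildListenUrl = (options = {}) => {
  const params = new URLSearchParams()
  for (const [key, value] of Object.entries(options)) {
    // Skip unset options so they don't end up as "key=undefined".
    if (value !== undefined && value !== null) {
      params.set(key, String(value))
    }
  }
  const query = params.toString()
  return query ? `${DEEPGRAM_LISTEN_URL}?${query}` : DEEPGRAM_LISTEN_URL
}
```

For example, `new WebSocket(buildListenUrl({ punctuate: true, language: 'en-US' }), ['token', '<api_key>'])` would open the same connection with those (assumed) options attached.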

JS Code

Get user permission to record audio

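// Ask for microphone access, then start recording once it is granted.
navigator.mediaDevices
  .getUserMedia({ audio: true })
  .then((stream) => startRecording(stream))
  .catch((error) => console.error('Microphone access failed:', error))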
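
The promise returned by getUserMedia rejects when the user denies access or when no microphone is available. Here is a minimal sketch of reporting those cases with a readable message. The helper names `describeMicrophoneError` and `requestMicrophone` are hypothetical, and the `mediaDevices` parameter exists only so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: turns a getUserMedia rejection into a readable message.
const describeMicrophoneError = (error) => {
  switch (error && error.name) {
    case 'NotAllowedError':
      return 'Microphone access was denied.'
    case 'NotFoundError':
      return 'No microphone was found.'
    default:
      return `Could not start recording: ${(error && error.message) || 'unknown error'}`
  }
}

// Hypothetical wrapper: requests the microphone and hands the stream to `onStream`.
// The second argument to then() handles only getUserMedia rejections,
// so errors thrown inside `onStream` are not swallowed here.
const requestMicrophone = (onStream, mediaDevices = navigator.mediaDevices) =>
  mediaDevices
    .getUserMedia({ audio: true })
    .then(onStream, (error) => console.error(describeMicrophoneError(error)))
```

In the browser you would call `requestMicrophone(startRecording)`.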

Create Socket connection to Deepgram

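const startRecording = (stream) => {
  // MediaRecorder encodes the microphone stream into chunks we can send.
  const mediaRecorder = new MediaRecorder(stream)
  // The API key is passed to Deepgram as a WebSocket subprotocol.
  const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [
    'token',
    '<api_key>',
  ])

  socket.onopen = () => {
    mediaRecorder.addEventListener('dataavailable', (event) => {
      // Only send non-empty chunks while the socket is open.
      if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
        socket.send(event.data)
      }
    })
    // Emit a chunk of recorded audio every 250 ms.
    mediaRecorder.start(250)
  }

  socket.onmessage = (message) => {
    const received = JSON.parse(message.data)
    // Not every message is guaranteed to carry a transcript, so guard the lookup.
    const transcript = received.channel?.alternatives?.[0]?.transcript
    if (transcript) {
      console.log(transcript)
    }
  }
}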
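
The message handling above can be pulled into a small pure function, which makes it easier to try without a live socket. This is only a sketch: the helper name `extractTranscript` is hypothetical, and the `is_final` field is assumed to mark transcripts that will no longer be revised, so check it against Deepgram's response format before relying on it.

```javascript
// Hypothetical helper: returns { text, isFinal } for a socket message,
// or null when the message has no usable transcript.
const extractTranscript = (rawMessage) => {
  let payload
  try {
    payload = JSON.parse(rawMessage)
  } catch {
    return null // Not JSON, nothing to show.
  }
  const text = payload?.channel?.alternatives?.[0]?.transcript
  if (typeof text !== 'string' || text.trim() === '') {
    return null // Silence or a message without a transcript.
  }
  // `is_final` is an assumed field name; it is treated as false when absent.
  return { text: text.trim(), isFinal: payload.is_final === true }
}
```

Inside `socket.onmessage` you could then write `const result = extractTranscript(message.data)` and log `result.text` only when `result` is not null.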

Testing

To test the code:

  • Open a YouTube video of any TED talk.
  • Copy the above code and paste it into the developer console.
  • Replace <api_key> with the API key you generated.
  • Run the code.
  • Verify that microphone access has been allowed.

This is just the simplest possible demo and not the best way to show subtitles: because we record audio through the microphone, the device volume must be turned on so that the video's sound can be picked up.

You can also test it by talking into the microphone. Since audio recording is on, your speech will be converted to text as well.

The recorded speech will be shown as text in the developer console.
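
If you would rather see the text on the page than in the console, here is a minimal sketch of a caption bar. The helper name `createSubtitleOverlay` and all the styling values are my own assumptions, and the `doc` parameter is there only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: adds a fixed caption bar to the page and returns
// a function that updates its text.
const createSubtitleOverlay = (doc = document) => {
  const box = doc.createElement('div')
  Object.assign(box.style, {
    position: 'fixed',
    left: '50%',
    bottom: '10%',
    transform: 'translateX(-50%)',
    padding: '4px 12px',
    background: 'rgba(0, 0, 0, 0.75)',
    color: '#fff',
    fontSize: '20px',
    zIndex: '99999',
  })
  doc.body.appendChild(box)
  // textContent is used so the transcript is never interpreted as HTML.
  return (text) => {
    box.textContent = text
  }
}
```

Call `const showSubtitle = createSubtitleOverlay()` once, then pass each transcript to `showSubtitle(...)` where the code currently calls `console.log`.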
