The speech-to-text feature is useful for entering text without a keyboard. Among its advantages, implementing it in your app can help accommodate visitors with certain disabilities. A common use of speech-to-text is in note apps, which is what we will be building in this tutorial.
You can view the hosted app here.
The complete code for this tutorial is available on GitHub.
Prerequisites
To follow along with this tutorial, you should be familiar with React and have Node installed on your system.
What Is The React Speech Kit?
React Speech Kit is a library that makes it easy to use the Web Speech API, which provides text-to-speech conversion and speech recognition. It includes two hooks corresponding to the Web Speech API interfaces: useSpeechSynthesis for text-to-speech conversion and useSpeechRecognition for speech recognition. For our use case, we only need the useSpeechRecognition hook.
useSpeechRecognition Hook for Speech to Text
useSpeechRecognition is a wrapper around the SpeechRecognition interface. When called, it returns an object with the following properties:
- listen: A function that makes the browser start listening for input.
- stop: A function that makes the browser stop listening.
- listening: A boolean indicating whether the browser is actively listening for input.
- supported: A boolean indicating whether the browser supports SpeechRecognition.
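Under the hood, the supported flag boils down to a feature check on the global object (this is an assumption about the library's internals, not something its docs spell out, but you can perform the same check yourself). A minimal sketch:

```javascript
// Hypothetical sketch of the feature detection behind the `supported` flag.
// It checks for the standard SpeechRecognition constructor as well as the
// webkit-prefixed variant that Chromium-based browsers expose.
function isSpeechRecognitionSupported(globalObj) {
  return Boolean(globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition);
}

// In a browser you would call it with `window`; here we simulate both cases:
const chromeLike = { webkitSpeechRecognition: function () {} };
const unsupportedBrowser = {};
console.log(isSpeechRecognitionSupported(chromeLike));        // true
console.log(isSpeechRecognitionSupported(unsupportedBrowser)); // false
```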
To get the transcript of the speech input, we can use the onResult callback, which is passed as an option to the useSpeechRecognition hook. Here is a basic example of how to use the hook:
import { useState } from 'react';
import { useSpeechRecognition } from 'react-speech-kit';

function Example() {
  const [value, setValue] = useState('')
  const { listen, stop } = useSpeechRecognition({
    onResult: (result) => {
      setValue(result)
    }
  })

  return (
    <div>
      <textarea
        value={value}
        onChange={(event) => setValue(event.target.value)}
      />
      <button onMouseDown={listen} onMouseUp={stop}>
        🎤
      </button>
    </div>
  )
}
Pros and Cons of Speech-to-Text
Here are some of the pros and cons of speech-to-text technologies.
Pros
- It can capture speech much faster than you can type, which will increase productivity.
- It can streamline tedious jobs that have to do with typing.
- Enables those with limited use of their hands to use voice inputs instead of typing.
Cons
- Background noise interference decreases the accuracy of speech input.
- Accents can be a problem when converting speech to text.
- It offers little privacy, since speech input can easily be overheard by others.
Building a Note Speech-to-Text App
In the following sections, we will be building a note app that will be able to search for and add notes using the speech-to-text feature.
Setting Up React
I have already created a note app template so that we can focus on implementing the speech functionality. All we need to do now is clone the GitHub repo with the following commands:
git clone -b starter https://github.com/ElMirth/Voice-note-app.git
cd Voice-note-app
npm install
The above commands clone the starter repo and install the node modules. Once the installation is complete, we can start the app with the npm start command, and we will see the following screen:
Right now, in the app, we can add, edit, view, delete and search notes. At the bottom-right of the page, I have included a microphone icon which we will use later while implementing the speech-to-text feature.
Search for Notes Using Speech
In the app when we click the microphone icon at the bottom-right of the page we will see a search option, something like this:
When we click on the Search option and start speaking, the transcript will be entered in the search bar.
To make this work, let's first install React Speech Kit with the following command:
yarn add react-speech-kit
Next, in the Notes.js file, import the useSpeechRecognition hook:
// src/Notes.js
import { useSpeechRecognition } from 'react-speech-kit';
Next, add the following lines of code in the Notes component after the body state:
// src/Notes.js
const { listen, listening, stop, supported } = useSpeechRecognition({
  onResult: (result) => {
    setSearchValue(result)
  }
})
In the above code, the onResult callback of the useSpeechRecognition hook receives the transcript of our speech input, and we call setSearchValue to store that transcript in the search state.
Now, to start listening for search input, we need to call the listen function destructured from useSpeechRecognition. To do this, modify the rendered SpeakOption component with text='Search', near the bottom of the file, to the following:
// src/Notes.js
<SpeakOption
  icon={<img src='/search-i.png' alt='search' />}
  text='Search'
  onClick={() => { setSpeakOptionOpen(!speakOptionOpen); listen() }}
/>
With the above code, the browser can now start listening to speech inputs when we click the microphone icon and then click Search.
Right now, there is no indication in our app that the browser is listening, and we have no way to stop it. To fix that, modify the rendered Speak component to look like the following:
// src/Notes.js
<Speak
  open={speakOptionOpen}
  toogleViewOptions={() => setSpeakOptionOpen(!speakOptionOpen)}
  listening={listening}
  stopListening={stop}
>
In the Speak component, we are using the passed props to conditionally change the background color of the microphone image and also stop the browser from listening when the image is clicked. Here is what the Speak component looks like:
// src/Speak.js
function Speak({ children, open, toogleViewOptions, listening = false, stopListening }) {
  return (
    <div className={`speak ${open ? 'speak__OptionsView' : ''}`}>
      {open &&
        <div className='options'>
          {children}
        </div>
      }
      <img
        src='/speak.png'
        alt='microphone'
        onClick={listening ? stopListening : toogleViewOptions}
        style={{ backgroundColor: listening ? '#468ad3e8' : 'gray' }}
      />
    </div>
  )
}

export default Speak
Taking Notes With the Speech-to-Text Feature
Here, we are going to use the speech-to-text feature to add notes, each consisting of a title and body. In our app, when we click Add new note and then click on the microphone icon, we will see the Title and Body options:
Right now, the Title and Body speech options do not work; we will fix that shortly. To do this, we need to set the state corresponding to whichever option was chosen in the onResult callback of the useSpeechRecognition hook. Rather than calling setSearchValue directly in onResult, we will call a function that invokes the state setter for the selected option.
For this, let's first add a state that holds a string indicating the currently selected option. In the Notes component, add the following line of code after the body state:
// src/Notes.js
const [current, setCurrent] = useState('')
Next, add the following function before the useSpeechRecognition hook:
// src/Notes.js
const getStateSetter = (result) => {
  switch (current) {
    case 'search':
      setSearchValue(result)
      break;
    case 'title':
      setTitle(result.toUpperCase())
      break;
    case 'body':
      setBody((prev) => prev + ' ' + result)
      break;
    default:
      return null
  }
}
The above code calls the respective state setter based on the value of current. Next, let's replace setSearchValue(result) in the onResult callback with this function. Here is what onResult should now look like:
onResult: (result) => {
  getStateSetter(result)
}
Next, let's modify the stopListening prop of the rendered Speak component and the onClick props of the SpeakOption components to call setCurrent with the currently selected option.
Here is what the rendered Speak and SpeakOption components should now look like:
<Speak
  open={speakOptionOpen}
  toogleViewOptions={() => setSpeakOptionOpen(!speakOptionOpen)}
  listening={listening}
  stopListening={() => { setCurrent(''); stop() }}
>
  {!openAddModal
    ? <SpeakOption
        icon={<img src='/search-i.png' alt='search' />}
        text='Search'
        onClick={() => { setCurrent('search'); setSpeakOptionOpen(!speakOptionOpen) }}
      />
    : <>
        <SpeakOption
          icon='+'
          text='Title'
          onClick={() => { setCurrent('title'); setSpeakOptionOpen(!speakOptionOpen) }}
        />
        <SpeakOption
          icon='+'
          text='Body'
          onClick={() => { setCurrent('body'); setSpeakOptionOpen(!speakOptionOpen) }}
        />
      </>
  }
</Speak>
With the above code, we set the current state to an empty string when the browser stops listening, and to 'search', 'title', or 'body' based on the option that was selected. This way, when the getStateSetter function we created earlier is called in onResult, it will invoke the appropriate state setter.
Notice that we are no longer calling the listen function in the onClick prop of the SpeakOption with text='Search'. That's because we now want listen to be called when we click any option, not just Search, and before calling it we need to check whether speech recognition is supported in the browser being used.
To do this, add the following lines of code after the useSpeechRecognition hook:
// src/Notes.js
useEffect(() => {
  if (current !== '') {
    if (!supported) {
      alert('Speech recognition is not supported')
      return
    }
    if (current === 'body') {
      listen({ interimResults: false })
    } else {
      listen()
    }
  }
}, [current, listen, supported])
With the above code, whenever an option is clicked we check whether speech recognition is supported. If it is not, we alert a message saying so; if it is, we call the listen function, and when we start speaking, the transcript is entered in the respective input.
Notice that when the value of current equals 'body', we pass { interimResults: false }. We do this because the getStateSetter function appends each new transcript to the previous body state. With interimResults set to false, onResult does not fire with the real-time interim transcripts, which are often inaccurate at first; it only receives the fully processed final transcript, which is then appended to the previous body state.
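To see why interim results would garble the body, consider this small simulation in plain JavaScript, independent of the library: each call to onResult appends to the previous body, so feeding it every interim transcript duplicates words, while feeding it only the final transcript does not.

```javascript
// Simulates the append logic from getStateSetter: every transcript that
// reaches onResult gets appended to the previous body text.
function appendTranscripts(prevBody, transcripts) {
  return transcripts.reduce((body, t) => body + ' ' + t, prevBody);
}

// With interim results enabled, onResult fires for every partial transcript:
const interim = ['buy', 'buy some', 'buy some milk'];
console.log(appendTranscripts('Todo:', interim));
// → 'Todo: buy buy some buy some milk' (duplicated words)

// With { interimResults: false }, only the final transcript arrives:
console.log(appendTranscripts('Todo:', ['buy some milk']));
// → 'Todo: buy some milk'
```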
Conclusion
In this tutorial, we learned about React Speech Kit, a wrapper for the Web Speech API, and used its speech recognition hook to implement speech-to-text functionality by building a note app.