Offline Speech Recognition using Raspberry Pi4

AUTHOR: Kevin Shin

Outcome

Initial Goal: Set up the Reachy Raspberry Pi4 with offline speech recognition

Realized Outcome: All the applications have been installed and tested (as far as I could remotely)

Next Step: Follow the instructions outlined below to test real-time speech recognition using a speaker

Outline/Framework/Recipe/Steps

Referenced the tutorial Dan recommended (the pictures and commands are relevant to our Reachy Raspberry Pi4 image) - https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7

[STEPS FOR INSTALLATION AND TESTING]

Install the DeepSpeech package on the RPi4 (see the command & screenshot below). *Note: the installation steps have already been done, so the on-site team can jump straight to the testing portion.
- pip3 install deepspeech
Download the speech recognition models and language models (there are 3 models that need to be downloaded - ref screenshot)
Download & uncompress the example audio files for testing
- curl -LO https://github.com/mozilla/STT/releases/download/v0.7.1/audio-0.7.1.tar.gz
- tar xvf audio-0.7.1.tar.gz
Play the audio file & test the recognition
- deepspeech --model deepspeech-0.7.*models.tflite --scorer deepspeech-0.7.*models.scorer --audio audio/2830-3980-0043.wav
- It worked! It took 1.523 seconds to process an audio file that is 1.975 seconds long.
- Note: the speech recognition listens to the entire sentence and processes it afterward. This means we will need a "Human Interaction" strategy to make sure the user pauses at the right moment to allow Reachy to perform recognition (which can take as long as the time it took the human to say the words).
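
The timing above can be turned into a rough planning number for the pause strategy. The sketch below computes the real-time factor (processing time divided by audio duration) from the single measurement reported here, then uses it to estimate how long Reachy might need after a user stops talking. The function names and the 4-second example utterance are illustrative assumptions, and the estimate assumes processing time grows linearly with audio length, which one sample file cannot confirm.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (below 1.0 means faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_wait(utterance_seconds: float, rtf: float) -> float:
    """Rough processing delay after the speaker stops, assuming linear scaling."""
    return utterance_seconds * rtf


# Measurement from the sample file test: 1.523 s to process 1.975 s of audio.
rtf = real_time_factor(1.523, 1.975)
print(f"real-time factor: {rtf:.2f}")

# Hypothetical 4-second sentence, used only to illustrate the estimate.
print(f"estimated wait for a 4 s utterance: {estimated_wait(4.0, rtf):.1f} s")
```

With the numbers from this test the factor comes out at about 0.77, so a 4-second sentence would need roughly 3 seconds of processing. Repeating the measurement on the other sample files would give a more trustworthy figure.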

[STEPS FOR TESTING REALTIME SPEECH RECOGNITION]

Clone the DeepSpeech examples (this step has been completed)
- git clone https://github.com/mozilla/DeepSpeech-examples
Change to the directory and install the dependencies
- cd ~/DeepSpeech-examples/mic_vad_streaming
- pip3 install -r requirements.txt
- sudo apt install portaudio19-dev
The on-site team needs to test the real-time speech recognition (please reference the steps below or follow the original tutorial - https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7)
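
Since the installation was done remotely, it may be worth confirming on-site that the Python dependencies are actually importable before plugging in the microphone. The sketch below only checks whether each module can be located; it does not load the model or open an audio device. The module list is an assumption based on what the streaming example typically needs, so compare it against requirements.txt in mic_vad_streaming before trusting it.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be located as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the streaming example's dependencies; check them
# against requirements.txt in mic_vad_streaming before relying on this list.
REQUIRED = ["deepspeech", "pyaudio", "webrtcvad", "numpy", "scipy", "halo"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If pyaudio shows up as missing, rerunning the pip3 install step after the portaudio19-dev package is installed is a reasonable first thing to try.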

Connect the microphone to the Reachy RPi4. Any microphone can be used, but the quality of the microphone and the amount of background noise can affect the speech recognition. (Consideration: the speech recognition model we are using was likely trained on clear voice input recorded in a quiet location. Our robot may be placed in a noisy environment, and/or users may be wearing a COVID face mask.)
To test the system, please run the command below, which uses the model and scorer files we downloaded above. The result should look like the YouTube video below - Good luck!
- python3 ../DeepSpeech-examples/mic_vad_streaming/mic_vad_streaming.py --model deepspeech-0.7.*models.tflite --scorer deepspeech-0.7.*models.scorer
- YouTube video: https://youtu.be/a7n5XLZrM1w
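
The commands in this document rely on shell wildcards (deepspeech-0.7.*models.tflite and deepspeech-0.7.*models.scorer) to find the model files. If a pattern matches no file, or more than one, the command tends to fail with a confusing error. The sketch below is a small pre-flight check that each pattern resolves to exactly one file. The helper name resolve_single is hypothetical, and the current directory is assumed to be the one where the models were downloaded.

```python
import glob
import os


def resolve_single(pattern, directory="."):
    """Return the one file matching pattern in directory, or raise if zero or several match."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one match for {pattern!r} in {directory!r}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    # Same wildcard patterns as the commands in this document; the current
    # directory is assumed to be where the model files were downloaded.
    for pattern in ("deepspeech-0.7.*models.tflite", "deepspeech-0.7.*models.scorer"):
        try:
            print("OK:", resolve_single(pattern))
        except FileNotFoundError as err:
            print("Problem:", err)
```

Running this from the directory used for the test commands should print one OK line per pattern; a Problem line points to a missing download or a leftover duplicate file.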