Speech Recognition

Post#1 by **Rava** » 06 Dec 2020, 18:25

I came here from the forest
I tell you, from this post specifically: Howto: Popular AppImages one click away (Post by Rava #80155)

Rava wrote: ↑
03 Dec 2020, 09:29

Tolkem wrote: ↑
02 Dec 2020, 17:54
This Speech Recognition from audio/video file using PocketSphinx is such a great feature I'd never seen in any other subtitle editors,
Is that PocketSphinx Speech Recognition also usable for transcribing vocal recordings, so that the text can be used to create office documents?
Because I am in dire need of a speech recognizer that is free and open source software, is of good quality recognition-wise, and is also able to learn when unknown words are used.

I looked into PocketSphinx Speech Recognition and found this:
https://www.codesofinterest.com/2017/03 ... phinx.html

Easy Speech Recognition in Python with PyAudio and Pocketsphinx
If you remember, I was getting started with Audio Processing in Python (thinking of implementing an audio classification system) a couple of weeks back (see my earlier post). I got the PyAudio package setup and was having some success with it. As you know, one of the more interesting areas in audio processing in machine learning is Speech Recognition. So, although it wasn't my original intention of the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx

SpeechRecognition is a library for Speech Recognition (as the name suggests), which can work with many Speech Engines and APIs. The current version supports the following engines and APIs:

CMU Sphinx
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text

I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.

To start, you need to have the PyAudio package. SpeechRecognition requires PyAudio to interact with the microphone of your computer. If you don't have PyAudio installed already, you can follow the instructions from my earlier post to set it up.

Next, since we will be using the Sphinx engine, we need to install the pocketsphinx package:
Code:
 pip install pocketsphinx  
Finally, you can install SpeechRecognition, again from pip:
Code:
 pip install SpeechRecognition  
With everything set up, we are ready to code our speech recognition script.

The basic code is quite simple:
Code:
 import speech_recognition as sr

 # obtain audio from the microphone
 r = sr.Recognizer()
 with sr.Microphone() as source:
     print("Say something!")
     audio = r.listen(source)

 # recognize speech using Sphinx
 try:
     print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")
 except sr.UnknownValueError:
     # raised when the engine cannot make sense of the audio
     print("Sphinx could not understand audio")
 except sr.RequestError as e:
     # raised when the engine itself fails (for example, a missing installation)
     print("Sphinx error; {0}".format(e))
The code will create a Recognizer object, create a Microphone object, listen to the microphone to hear a spoken phrase, and use the appropriate recognizer engine ('recognize_sphinx' here) to recognize the phrase.

Sounds quite simple, right?

But, if you run this code, you may find that the code hangs sometimes, not recognizing you speaking.

Speech Recognition hangs, not recognizing you speaking
This happens due to ambient noise.

A typical microphone will pick up a lot of background noise, even when we don't notice it ourselves, and that noise interferes with the speech recognition.

We need to filter out this ambient noise to make the speech recognition more accurate. We do this by setting the energy threshold of the Recognizer object. The energy threshold defines which levels count as noise and which count as speech. We need to set it so that the recognizer ignores the ambient noise in our environment and can focus on the speech. But how do we know which value to set the threshold to?
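To make the idea of an energy threshold a bit more concrete, here is a minimal sketch of the principle only. It is not the package's actual code: the helper names `rms_energy` and `is_speech` and the sample values are made up for illustration, and the threshold of 300 is what I understand the package's default `energy_threshold` to be. If the room noise sits above the threshold, every chunk looks like speech and the recognizer never sees the pause that ends a phrase, which would explain the hanging.

```python
import math


def rms_energy(samples):
    """Root-mean-square level of a chunk of signed PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, energy_threshold):
    """Treat a chunk as speech only if it is louder than the threshold."""
    return rms_energy(samples) > energy_threshold


# Made-up sample values, purely for illustration.
quiet_room = [12, -9, 15, -11, 8, -14]
spoken_word = [2400, -1900, 3100, -2700, 2200, -2500]

threshold = 300  # assumed default; a noisy room would need a higher value

print(is_speech(quiet_room, threshold))   # False
print(is_speech(spoken_word, threshold))  # True
```

That still leaves open how to pick the number for a given room.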

Luckily, the SpeechRecognition package has a built-in method to help us with that.

We just need to call the adjust_for_ambient_noise method; it listens to the environment for a short while, then calculates and sets a suitable energy threshold for it.
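I have not copied the rest of the article, so the following is only my own sketch of how that calibration step might be wired into the basic script. `adjust_for_ambient_noise`, `listen` (with its `timeout` and `phrase_time_limit` arguments) and `WaitTimeoutError` are from the SpeechRecognition package as far as I know it; the helper name `calibrate_and_listen` and the chosen durations are my own assumptions.

```python
def calibrate_and_listen(recognizer, source, calibration_seconds=1.0,
                         timeout=10.0, phrase_time_limit=15.0):
    """Sample the ambient noise first, then capture one spoken phrase.

    `recognizer` is expected to be an sr.Recognizer and `source` an
    already opened sr.Microphone (i.e. used inside its `with` block).
    """
    # Listens for `calibration_seconds` and updates the energy threshold.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # timeout: give up if no phrase starts within that many seconds.
    # phrase_time_limit: stop recording after that many seconds of speech.
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


def main():
    # Needs SpeechRecognition, PyAudio and pocketsphinx to be installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating, then say something!")
        try:
            audio = calibrate_and_listen(r, source)
        except sr.WaitTimeoutError:
            print("No speech detected before the timeout")
            return

    try:
        print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))
```

Calling `main()` runs it against a real microphone. The timeout is an extra safety net of mine, so the script gives up instead of hanging if the calibration still misjudges the room.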

and the article continues.

It seems that getting offline speech recognition to work properly in Linux takes some effort, and I presume there is no one here who has already managed to set it up?
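For my actual use case, recordings rather than a live microphone, the same package appears to offer `sr.AudioFile` together with `Recognizer.record`. The sketch below is untested on my side and makes some assumptions: the function name `transcribe_recording` and the file name `memo.wav` are hypothetical, the `sr` parameter exists only so the function can be tried out without the library installed, and as far as I know the file must be WAV, AIFF or FLAC.

```python
def transcribe_recording(path, sr=None):
    """Return the Sphinx transcript of an audio file, or "" if unintelligible.

    `sr` is a hypothetical hook for passing in a stand-in module; leave it
    as None to use the real speech_recognition package.
    """
    if sr is None:
        # Needs SpeechRecognition and pocketsphinx to be installed.
        import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # record() reads the whole file; no microphone or PyAudio involved.
        audio = recognizer.record(source)

    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        # Sphinx could not make sense of the audio.
        return ""
```

Something like `print(transcribe_recording("memo.wav"))` would then give text that can be pasted into an office document. Teaching it unknown words is a separate matter that this sketch does not cover.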

__________
Yes, the first line and the next 3 words are a wintertime reference to a famous German poem by Theodor Storm - Knecht Ruprecht -- (approx English translation) - and the German original.