Making speech recognition in C# recognize phrases from a set vocabulary in continuous speech

Question

I am trying to recognize instructions using speech recognition in C#. These instructions are built from a known vocabulary of set phrases. So far nothing works the way I need, because I must be able to recognize these phrases when they immediately follow each other in continuous speech. The vocabulary for the next phrase depends on the first phrase of the instruction, but I could probably get this working even if all the phrases were recognizable at all times.

I have searched extensively for solutions and tried to get this to work, but I am at a loss, as nothing so far meets my needs. Hopefully somebody can think of a better solution to this problem. I am open to any solution, including other APIs, as long as it works offline (and is preferably free).

What I have tried:

I tried using the built-in System.Speech.Recognition API. With the dictation vocabulary, the accuracy is not good enough for me to work with. I also tried a specified Grammar, but the issue there is that it can't recognize the phrases when they immediately follow each other.

I also found the PauseRecognizerOnRecognition property, which seems to be something like what I am looking for. But it only seems to be available on SpeechRecognizer and not on SpeechRecognitionEngine. For my needs, the pop-up and sounds created when using SpeechRecognizer are undesirable, and I wasn't really able to get it working anyway.

I managed to get a little further using the Append method of GrammarBuilder. Thanks to this, I can now have a set of phrases that immediately follow each other recognized. One of my issues with this approach is that, depending on the start of the instruction, I may expect a different number of phrases to follow it. The only solution here would be to add all the possible phrases, as well as their variations with optional parts at the end. This would be not only tedious but likely very inefficient. For my use case, the phrases I expect to follow differ depending on what precedes them. Unfortunately, there doesn't seem to be any way to recognize the phrases fast enough to decide which phrases to expect afterward, nor did I find a way to buffer the input, wait for the first part to be recognized, and then process the following part based on what was said in the first. Whatever I do, the recognition just doesn't seem quick enough even to recognize the following phrase from a specified vocabulary, let alone to change the vocabulary based on what has been said.
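
For reference, the Append approach described above might look roughly like the following. This is a minimal sketch, not the poster's actual code: the phrases and the class name SequenceGrammar are hypothetical, and it assumes a project that references the System.Speech assembly (Windows only).

```csharp
using System.Speech.Recognition;

static class SequenceGrammar
{
    // Builds a grammar that expects exactly three phrases in a row.
    // All phrases below are placeholders, not the real vocabulary.
    public static Grammar Build()
    {
        var builder = new GrammarBuilder();
        builder.Append(new Choices("move", "rotate"));     // first phrase
        builder.Append(new Choices("left", "right"));      // second phrase
        builder.Append(new Choices("slowly", "quickly"));  // third phrase
        return new Grammar(builder);
    }
}
```

Every Append adds one more mandatory slot, and each slot offers the same alternatives no matter what was said before it, which is the limitation described above.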

Posted 19-May-19 8:32am

TheRorooo

Updated 19-May-19 9:43am

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Thomas Daniels · Accepted Answer · 2019-05-19T09:43:00

Solution 1

Check out the Choices class in the System.Speech.Recognition API. A Choices object is something you can append to a GrammarBuilder with .Append, and you can create a Choices out of either a String array or a GrammarBuilder array. I've never done experiments to this extent, but Choices should make it possible to create the "tree-like" grammar structure you want.

Posted 19-May-19 9:43am

Thomas Daniels

Comments

TheRorooo 19-May-19 16:07pm

Thank you for your reply. Maybe I didn't explain myself well. I am currently using a combination of Choices and .Append as you suggest. My problem is that, let's say, I call .Append 3 times and I want a fourth or even fifth optional one. It would also be best for me to change the order of the next appends based on the phrase recognized from the first one, as I currently would need to include the phrases in both. I also don't think this is really a tree structure, because once you branch to one phrase you still have the option of all the same phrases in the next append, which ideally you shouldn't. Or actually, reading your comment one more time, do you mean you can include a whole GrammarBuilder as one of the choices and try to make a tree-like structure that way?

TheRorooo 19-May-19 16:30pm

Okay, I think I got what you meant. Didn't know you could use a whole GrammarBuilder inside of Choices! This looks very promising from some initial testing :) Thank you very much.

Thomas Daniels 19-May-19 16:35pm

Yes, perhaps I should have emphasized that -- the fact that you can have Choices of GrammarBuilders makes it a good option for you. You could create a GrammarBuilder that has "[Instruction 1]" followed by choices "[1a]", "[1b]", etc. - and create such a GrammarBuilder for each top-level instruction and put all those in another Choices.
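
A minimal sketch of that structure, assuming a reference to the System.Speech assembly. The instruction words and the class name InstructionTree are made up for illustration; only Choices, GrammarBuilder and Grammar come from the API.

```csharp
using System.Speech.Recognition;

static class InstructionTree
{
    public static Grammar Build()
    {
        // Branch 1: "open" can only be followed by "door" or "window".
        var open = new GrammarBuilder("open");
        open.Append(new Choices("door", "window"));

        // Branch 2: "set" has its own follow-up vocabulary and one more slot.
        var set = new GrammarBuilder("set");
        set.Append(new Choices("volume", "brightness"));
        set.Append(new Choices("low", "high"));

        // Top level: a Choices made of whole GrammarBuilders, one per instruction.
        var root = new Choices(open, set);
        return new Grammar(new GrammarBuilder(root));
    }
}
```

Because each branch is its own GrammarBuilder, the branches can differ in both vocabulary and length, so "open door" and "set volume low" are accepted while "open volume" is not part of the grammar.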

TheRorooo 19-May-19 16:47pm

Works pretty well so far. The only thing that would be useful in some situations, and that I can't find a way to do so far, is having an instruction followed by some number of specified phrases, anywhere from one to all of them. I could probably append the same options multiple times, but then the recognition doesn't stop. As I plan to use push to talk for this project anyway, maybe I could have it just take the current hypothesized speech after releasing the button. This approach would probably also enable me to have optional phrases at the end of the instruction. If you have any better ideas, please let me know.
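
The push-to-talk idea could be sketched along these lines. This is an untested outline under several assumptions: PushToTalkListener, ButtonPressed and ButtonReleased are hypothetical names to be wired to whatever input handling the application uses, and a hypothesis is only the engine's best guess so far, so the returned text may not be a complete instruction.

```csharp
using System;
using System.Speech.Recognition;

sealed class PushToTalkListener : IDisposable
{
    private readonly SpeechRecognitionEngine _engine = new SpeechRecognitionEngine();
    private readonly object _gate = new object();
    private string _latestText = string.Empty;

    public PushToTalkListener(Grammar grammar)
    {
        _engine.LoadGrammar(grammar);
        _engine.SetInputToDefaultAudioDevice();
        // Keep the most recent text, whether it is a hypothesis or a final result.
        _engine.SpeechHypothesized += (s, e) => Remember(e.Result.Text);
        _engine.SpeechRecognized += (s, e) => Remember(e.Result.Text);
    }

    private void Remember(string text)
    {
        lock (_gate) { _latestText = text; }
    }

    // Hypothetical hook: call when the talk button goes down.
    // RecognizeAsync throws if a previous operation is still running.
    public void ButtonPressed()
    {
        lock (_gate) { _latestText = string.Empty; }
        _engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    // Hypothetical hook: call when the talk button is released.
    // Cancels recognition and returns whatever text was seen last.
    public string ButtonReleased()
    {
        _engine.RecognizeAsyncCancel();
        lock (_gate) { return _latestText; }
    }

    public void Dispose() => _engine.Dispose();
}
```

RecognizeAsyncCancel stops without waiting for the current phrase to finish; RecognizeAsyncStop is the alternative if the engine should complete the phrase in progress first.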

Thomas Daniels 19-May-19 16:59pm

I also haven't done experiments with this, but perhaps you could try an empty string or empty GrammarBuilder in a Choices object? And/or set the BabbleTimeout property on the SpeechRecognitionEngine to a nonzero value?

TheRorooo 19-May-19 17:11pm

Thanks for the suggestion. Including an empty string in Choices threw an error, but including a space seems to actually work. I am not very familiar with the BabbleTimeout property, although I tried messing with it earlier. If I understand correctly, that's the time the recognizer waits after it doesn't hear anything before it finishes recognition. Does a zero value make it wait infinitely in this case? Still, for my push-to-talk usage the approach I suggested earlier might be better, because that way the speech recognizer doesn't stop when I pause while naming those things. Thanks either way.
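
Besides the space workaround, GrammarBuilder also has Append overloads that take a minimum and maximum repeat count, which may be a more direct way to express "one to all of them" and optional trailing phrases. A minimal sketch, with hypothetical phrases and class name, and without any claim about how it interacts with end-of-speech detection:

```csharp
using System.Speech.Recognition;

static class RepeatedItems
{
    public static Grammar Build()
    {
        // Placeholder item names; replace with the real phrases.
        var items = new Choices("red", "green", "blue");

        var builder = new GrammarBuilder("select");
        // Between one and three items may follow the instruction.
        builder.Append(new GrammarBuilder(items), 1, 3);
        // A minimum of 0 makes the trailing phrase optional.
        builder.Append("please", 0, 1);
        return new Grammar(builder);
    }
}
```

Note that a repeat count does not prevent the same item from being named twice; duplicates would have to be filtered after recognition.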

Thomas Daniels 19-May-19 17:14pm

I think setting it to zero will make it wait indefinitely, yeah.

Though, on second thought, you probably shouldn't use BabbleTimeout. The RecognitionResult will be null in that case, which you probably can't use.
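
For completeness, SpeechRecognitionEngine exposes several timeout properties besides BabbleTimeout. The sketch below only shows where they are set; the values are arbitrary examples, the class name TimeoutSettings is hypothetical, and whether a longer end-silence timeout actually allows pauses between phrases in this scenario is an assumption that would need testing.

```csharp
using System;
using System.Speech.Recognition;

static class TimeoutSettings
{
    public static void Apply(SpeechRecognitionEngine engine)
    {
        // Zero turns the babble (background noise only) check off.
        engine.BabbleTimeout = TimeSpan.Zero;

        // Silence accepted at the end of unambiguous input before finalizing.
        engine.EndSilenceTimeout = TimeSpan.FromSeconds(1);

        // Silence accepted at the end of input that could still be continued.
        engine.EndSilenceTimeoutAmbiguous = TimeSpan.FromSeconds(1.5);

        // How long to wait for speech to begin before giving up.
        engine.InitialSilenceTimeout = TimeSpan.FromSeconds(5);
    }
}
```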

TheRorooo 19-May-19 17:23pm

Well, thanks a lot. I have something that works and I can experiment with now. The current approach with spaces works, but I might try using the hypothesized text after releasing the push to talk button once I get to implementing that. The biggest issue with the current approach really is that you can't take a break at all when naming those things and I don't think there is any way to customize the time it waits.
Anyway, I am new to this website, but very impressed so far. Was searching for a solution for a long time and wasn't able to get an answer on Stack Overflow. But then I decided to try asking here after I stumbled upon your speech recognition guide which helped me discover the .Append function :)

Thomas Daniels 19-May-19 17:31pm

Yes, that approach should also work. And glad to hear that my article was helpful :)