I am trying to recognize spoken instructions in C#. Each instruction is built from a fixed vocabulary of phrases that I know in advance. So far nothing works the way I need, because I have to recognize these phrases when they immediately follow each other in continuous speech. The vocabulary for the next phrase depends on the first phrase of the instruction, but I could probably get this working even if all the phrases were recognizable at all times.

I have searched extensively and tried several approaches, but nothing meets my needs and I am at a loss as to what to try next. Hopefully somebody can think of a better solution. I am open to anything, including other APIs, as long as it works offline (and is preferably free).

What I have tried:

I tried the built-in System.Speech.Recognition API. With the dictation grammar, the accuracy is not good enough for me to work with. I also tried a specified Grammar, but then it can't recognize the phrases when they immediately follow each other.

I also found the PauseRecognizerOnRecognition property, which seems to be close to what I am looking for. However, it only seems to exist on SpeechRecognizer and not on SpeechRecognitionEngine. For my needs, the pop-up and sounds that come with SpeechRecognizer are undesirable, and I wasn't really able to get it working anyway.

I got a little further with the Append method in GrammarBuilder. Thanks to this, a set of phrases that immediately follow each other can now be recognized. One issue with this approach is that, depending on the start of the instruction, I may expect a different number of phrases to follow it. The only solution I see is to add every possible phrase sequence, including the variations with optional parts at the end, which would be tedious and likely very inefficient. In my use case, the phrases I expect next also differ depending on what precedes them. Unfortunately, I found no way to recognize the first phrase quickly enough to decide which phrases to expect afterward, and no way to buffer the input so that the first part is recognized first and the rest is processed based on it. Whatever I do, the recognition doesn't seem quick enough even to recognize the following phrase from a specified vocabulary, let alone to change the vocabulary based on what has been said.
Updated 19-May-19 9:43am

1 solution

Check out the Choices class in the System.Speech.Recognition API. A Choices object is something you can append to a GrammarBuilder with .Append, and you can create a Choices out of either a String array or a GrammarBuilder array. I've never experimented with it to this extent, but Choices should make it possible to create the "tree-like" grammar structure you want.
 
Comments
TheRorooo 19-May-19 16:07pm    
Thank you for your reply. Maybe I didn't explain myself well. I am currently using a combination of Choices and .Append as you suggest. My problem is this: say I call .Append 3 times and want a fourth or even fifth optional phrase. Ideally I would also change what the later appends accept based on the phrase recognized in the first one, because right now I need to include the same phrases in both. I also don't think this is really a tree structure, since once you branch to one phrase, the next append still offers all the same phrases, which ideally it shouldn't. Or actually, reading your comment one more time, do you mean you can include a whole GrammarBuilder as one of the choices and build a tree-like structure that way?
TheRorooo 19-May-19 16:30pm    
Okay, I think I got what you meant. Didn't know you could use a whole GrammarBuilder inside of Choices! This looks very promising from some initial testing :) Thank you very much.
Thomas Daniels 19-May-19 16:35pm    
Yes, perhaps I should have emphasized that -- the fact that you can have Choices of GrammarBuilders makes it a good option for you. You could create a GrammarBuilder that has "[Instruction 1]" followed by choices "[1a]", "[1b]", etc. - and create such a GrammarBuilder for each top-level instruction and put all those in another Choices.
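For readers following along, the nesting described in this comment could be sketched roughly as below. This is only an illustration, not code from the thread: the class name `InstructionGrammar`, the helper `Branch`, and all the phrases ("move", "left", "set color", and so on) are made-up placeholders. It relies only on the `Choices(params GrammarBuilder[])` and `Choices(params string[])` constructors and on `GrammarBuilder.Append(Choices)`.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper class; names and phrases are placeholders.
static class InstructionGrammar
{
    // One branch of the tree: a top-level instruction followed by
    // exactly one of the phrases that are valid after it.
    static GrammarBuilder Branch(string instruction, params string[] followUps)
    {
        var branch = new GrammarBuilder(instruction);
        branch.Append(new Choices(followUps));
        return branch;
    }

    // The root is a Choices over whole GrammarBuilders, so each
    // instruction carries its own follow-up vocabulary.
    public static GrammarBuilder Build()
    {
        var root = new Choices(
            Branch("move", "left", "right"),
            Branch("set color", "red", "green", "blue"));
        return new GrammarBuilder(root);
    }
}
```

The result would be loaded with something like `engine.LoadGrammar(new Grammar(InstructionGrammar.Build()))` on a SpeechRecognitionEngine. With this shape, "move red" is not a valid path, which is the difference from appending the same flat Choices several times.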
TheRorooo 19-May-19 16:47pm    
Works pretty well so far. The one thing I still can't find a way to do, and that would be useful in some situations, is an instruction followed by a variable number of specified phrases, anywhere from one of them to all of them. I could probably append the same options multiple times, but then the recognition doesn't stop until all of them have been said. As I plan to use push-to-talk for this project anyway, maybe I could just take the current hypothesized speech when the button is released. This approach would probably also let me have optional phrases at the end of the instruction. If you have any better ideas, please let me know.
Thomas Daniels 19-May-19 16:59pm    
I also haven't done experiments with this, but perhaps you could try an empty string or empty GrammarBuilder in a Choices object? And/or set the BabbleTimeout property on the SpeechRecognitionEngine to a nonzero value?
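A different option from the ones suggested in this comment, offered only as an untested sketch: GrammarBuilder has `Append` overloads that take a minimum and maximum repeat count, which might cover both "one to all of them" and optional trailing phrases. The class name `ListGrammar`, the phrases, and the timeout values below are assumptions chosen for illustration. The timeout properties are real SpeechRecognitionEngine properties, but whether these values suit a given setup would need experimenting.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper class; names, phrases and values are placeholders.
static class ListGrammar
{
    public static GrammarBuilder Build()
    {
        var items = new Choices("alpha", "bravo", "charlie");

        var builder = new GrammarBuilder("select");
        // Between one and three items may follow the instruction.
        // Note: this does not stop the same item from being said twice.
        builder.Append(new GrammarBuilder(items), 1, 3);
        // A minimum of 0 makes the trailing phrase optional.
        builder.Append("please", 0, 1);
        return builder;
    }

    // Assumed tuning values. The "ambiguous" timeout is the one that
    // applies when the grammar would still accept more words.
    public static void Configure(SpeechRecognitionEngine engine)
    {
        engine.EndSilenceTimeout = TimeSpan.FromMilliseconds(300);
        engine.EndSilenceTimeoutAmbiguous = TimeSpan.FromMilliseconds(800);
        engine.BabbleTimeout = TimeSpan.FromSeconds(2);
    }
}
```

If recognition still waits too long after the last item, shortening `EndSilenceTimeoutAmbiguous` may be the first thing to try, since a grammar with optional or repeated parts leaves most utterances "ambiguous" in that sense.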

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


