Wrong recognition using sapi

Question

Hi,
I developed an application that converts a WAV file to text using C#. Using the SAPI TTS app tool, I saved the WAV file with a Microsoft voice. I chose the Microsoft voice only to get accurate recognition, yet the result is still not accurate. It recognizes words wrongly, such as "meeting" as "needing" and "cute" as "dubed".
I have attached my code.

C#

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using SpeechLib;
namespace using_wav
{
    public partial class Form1 : Form
    {
        private SpeechLib.ISpeechRecoContext wavRecoContext = null;
        private SpeechLib.SpFileStream InputWAV = null;
        private SpeechLib.ISpeechRecoGrammar Grammar = null;
        private String _WAVFile = null;
        private string strData = "No recording yet";
        private String _lastRecognized = "";

        public Form1()
        {
            InitializeComponent();
        }
        private void button2_Click(object sender, EventArgs e)
        {
           Close();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            //String[] filePaths = Directory.GetFiles(@"c:\MyDir\", "*.bmp",
            //                              SearchOption.AllDirectories);
            OpenFileDialog dialog = new OpenFileDialog();
            dialog.Title = "Select a Speech file";
            // FileName is an empty string (not null) when the dialog is cancelled,
            // so check the dialog result and the path explicitly.
            if (dialog.ShowDialog() != DialogResult.OK) return;
            _WAVFile = dialog.FileName;
            //_WAVFile = dialog.filePaths;
            if (String.IsNullOrEmpty(_WAVFile)) return;

            // In-process recognizer, so the audio input can be redirected to a file.
            wavRecoContext = new SpeechLib.SpInProcRecoContext();
            ((SpInProcRecoContext)wavRecoContext).Recognition += new _ISpeechRecoContextEvents_RecognitionEventHandler(wavRecoContext_Recognition);
            ((SpInProcRecoContext)wavRecoContext).EndStream += new _ISpeechRecoContextEvents_EndStreamEventHandler(wavRecoContext_EndStream);

            // Load the free-dictation grammar and feed the WAV file to the recognizer.
            Grammar = wavRecoContext.CreateGrammar(2);
            Grammar.DictationLoad("", SpeechLoadOption.SLOStatic);
            InputWAV = new SpFileStream();
            InputWAV.Open(_WAVFile, SpeechStreamFileMode.SSFMOpenForRead, false);
            wavRecoContext.Recognizer.AudioInputStream = InputWAV;
            Grammar.DictationSetState(SpeechRuleState.SGDSActive);
        }
        private void wavRecoContext_Recognition(int StreamNumber, object StreamPosition, SpeechRecognitionType RecognitionType, ISpeechRecoResult Result)
        {
            strData = Result.PhraseInfo.GetText(0, -1,true);
            _lastRecognized = textBox1.Text;
            textBox1.Text = strData;
        }
        private void wavRecoContext_EndStream(int StreamNumber, object StreamPosition, bool f)
        {
            // The whole file has been processed: stop dictation and release the file.
            Grammar.DictationSetState(SpeechRuleState.SGDSInactive);
            InputWAV.Close();
        }
    }
}

Is there any fault in this code?
Thanks in advance.

[edit]Code block added - OriginalGriff[/edit]

Posted 11-Feb-12 3:00am

psgviscom

Updated 11-Feb-12 3:12am

OriginalGriff

2 solutions

Solution 2

First of all, you don't need to use SAPI directly for this simple task. You could simply use the namespace System.Speech.Recognition in the assembly "System.Speech", available in the GAC. It is bundled with the (freely distributed) .NET Framework runtime package and is easy to use.
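
To illustrate the System.Speech.Recognition route, here is a minimal sketch of free dictation from a WAV file. It is not the code from the question: the class name and the command-line argument are made up for the example, and it assumes a console project that references System.Speech and a machine with a speech recognizer installed.

```csharp
using System;
using System.Speech.Recognition; // requires a reference to the System.Speech assembly
using System.Threading;

static class WavDictationSketch // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: WavDictationSketch <path-to-wav>");
            return;
        }

        using (var engine = new SpeechRecognitionEngine())
        using (var finished = new ManualResetEvent(false))
        {
            // Free dictation, the counterpart of DictationLoad in the SAPI code.
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(args[0]);

            // Print each recognized phrase together with the engine's confidence.
            engine.SpeechRecognized += (sender, e) =>
                Console.WriteLine("{0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);

            // Raised once the asynchronous recognition has finished with the input.
            engine.RecognizeCompleted += (sender, e) => finished.Set();

            engine.RecognizeAsync(RecognizeMode.Multiple);
            finished.WaitOne();
        }
    }
}
```

The confidence value is handy here: misrecognized words such as the ones in the question tend to come back with a low score, so you can at least see where the engine was unsure.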

The recognition quality is just lower than what you expected. For better results, use only small grammars with clearly distinct expressions. Pronounce more clearly. :-)
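
A small grammar could look roughly like the sketch below. The three phrases are invented for the example, as are the class name and the command-line argument; it assumes a reference to System.Speech and a WAV file that contains one short utterance.

```csharp
using System;
using System.Speech.Recognition; // requires a reference to the System.Speech assembly

static class SmallGrammarSketch // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: SmallGrammarSketch <path-to-wav>");
            return;
        }

        // Hypothetical phrase list: only these expressions can be recognized.
        var phrases = new Choices(
            "schedule a meeting",
            "cancel the meeting",
            "show the calendar");

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new Grammar(new GrammarBuilder(phrases)));
            engine.SetInputToWaveFile(args[0]);

            // A single synchronous recognition; null means nothing was recognized.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result == null
                ? "No phrase from the grammar was recognized."
                : result.Text + " (confidence " + result.Confidence.ToString("F2") + ")");
        }
    }
}
```

With a list this short the engine only has to choose between a few clearly different candidates, instead of searching the whole dictation vocabulary where "meeting" and "needing" compete.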

—SA

Posted 11-Feb-12 11:01am

Sergey Alexandrovich Kryukov

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Dave Kreskowiak · Accepted Answer · 2012-02-11T06:59:00

Solution 1

What made you think that using a synthesized voice was going to make recognition more accurate? I'd think it would be just the opposite, as synthesized speech doesn't always say every word properly.

Posted 11-Feb-12 6:59am

Dave Kreskowiak

Comments

Sergey Alexandrovich Kryukov 11-Feb-12 17:02pm

Agree, my 5, but I also added a couple of practical recommendations based on my experience, please see.
--SA

psgviscom 11-Feb-12 21:33pm

But it is synthesizing the words correctly, and it is SAPI's own pronunciation. Why can't it recognize the words correctly?

Sergey Alexandrovich Kryukov 11-Feb-12 22:44pm

Isn't this obvious? Comparing synthesis with recognition is pointless. Don't you see they have virtually nothing in common?
--SA

Dave Kreskowiak 12-Feb-12 9:10am

Because the sound it listens to is not compared to the sounds it uses to create words. The two algorithms and the data they use are completely separate from each other. They may as well be in separate libraries.

Dave Kreskowiak 12-Feb-12 9:11am

Look. Even the best speech recognition engine is not going to get every word correct. Dragon is about the best there is and not even it has a 100% success rate.