Data Scraping from Speech to Text

Eric M. H. Goh

28 Mar 2018

Speech to Text Recognition for Data Scraping and Collection in Data Mining

Introduction

Data science is a growing field. According to the CRISP-DM model and other data mining process models, we need to collect data before we can mine knowledge from it and conduct predictive analysis. Data collection can involve data scraping, which includes web scraping (HTML to text), image-to-text conversion and video-to-text conversion. Once the data is in text format, we usually apply text mining techniques to extract knowledge from it.

In this article, I introduce speech-to-text recognition. I developed Just Another Voice Transformer (JAVT) to convert videos into text files and consolidate them into a set of text data for text mining and natural language processing.

JAVT converts a video into an audio file using ffmpeg, and then converts the audio into a text file using either Microsoft SAPI or CMU Sphinx. I have included the source code for both the video-to-audio and the audio-to-text conversion. In this article, I explain only speech recognition and speech synthesis using Microsoft SAPI, and how to interface with ffmpeg.

Speech Recognition in C# using Microsoft SAPI

To use speech recognition in C#, add a reference to the System.Speech assembly and put the following using directives at the top of the code:

using System.Speech.Recognition;
using System.Speech.AudioFormat;

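Then declare the dictation grammar and the speech recognition engine as fields of the class, and create them during initialization:

// Fields of the form class
private DictationGrammar dictation;
private SpeechRecognitionEngine sr;

// In the constructor or an initialization method
dictation = new DictationGrammar();
sr = new SpeechRecognitionEngine();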

Next, load the dictation grammar into the speech recognition engine:

sr.LoadGrammar(dictation);

If you are using a .wav file as input, point the speech recognition engine at the file (here, the path is read from textBox3):

sr.SetInputToWaveFile(textBox3.Text);

If you are using an audio device such as a microphone as input, set the speech recognition engine to use the default audio device:

sr.SetInputToDefaultAudioDevice();

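To start asynchronous speech recognition, call RecognizeAsync. With RecognizeMode.Multiple, the engine keeps recognizing phrases until it is stopped or the input ends: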

sr.RecognizeAsync(RecognizeMode.Multiple);

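Attach these event handlers (in practice, before calling RecognizeAsync, so that no results are missed). Removing each handler first prevents duplicate subscriptions if this code runs more than once: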

sr.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
sr.EmulateRecognizeCompleted -= 
new EventHandler<EmulateRecognizeCompletedEventArgs>(EmulateRecognizeCompletedHandler);

sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
sr.EmulateRecognizeCompleted += 
new EventHandler<EmulateRecognizeCompletedEventArgs>(EmulateRecognizeCompletedHandler);

When speech is recognized, the SpeechRecognized() method is called. The following is the SpeechRecognized() method used in JAVT. The recognized text is available in e.Result.Text.

string finalResult;

private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    try {
        // e.Result.Text holds the text of the recognized phrase
        finalResult = e.Result.Text;
        richTextBox3.Text += " " + finalResult;
    }
    catch (Exception ex) {
        MessageBox.Show(ex.Message);
    }
}

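When recognition is completed, the EmulateRecognizeCompletedHandler() method is called. Note that System.Speech raises EmulateRecognizeCompleted for emulated input (EmulateRecognizeAsync); for audio recognized through RecognizeAsync, the corresponding event is RecognizeCompleted, so you may need to attach an equivalent handler to that event instead. The following is the EmulateRecognizeCompletedHandler() method in the program: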

bool isCompleted = false;

private void EmulateRecognizeCompletedHandler(object sender, EmulateRecognizeCompletedEventArgs e) {
    try {
        isCompleted = true;

        // Release the grammar and stop any further recognition
        sr.UnloadGrammar(dictation);
        sr.RecognizeAsyncStop();

        richTextBox3.Text += "\n\nCompleted. \n";
        MessageBox.Show("Completed. ");
    }
    catch (Exception ex) {
        MessageBox.Show(ex.Message);
    }
}
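The steps in this section can be combined into one method. The following is a minimal sketch, not JAVT's actual code: WavTranscriber and Transcribe are hypothetical names, and the sketch assumes a Windows machine with a reference to System.Speech and an installed recognizer that supports dictation. It waits on RecognizeCompleted, the completion event for RecognizeAsync, and is meant for a console program or background thread, because it blocks the calling thread until the file has been processed.

```csharp
using System.Speech.Recognition;
using System.Text;
using System.Threading;

// Hypothetical helper for illustration; not part of JAVT.
static class WavTranscriber
{
    // Transcribes a WAV file and returns the recognized text.
    // Blocks the calling thread until the whole file has been processed.
    public static string Transcribe(string wavPath)
    {
        var text = new StringBuilder();

        using (var done = new ManualResetEventSlim(false))
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(wavPath);

            // Attach handlers before starting so that no result is missed.
            engine.SpeechRecognized += (s, e) => text.Append(' ').Append(e.Result.Text);
            engine.RecognizeCompleted += (s, e) => done.Set();

            // Multiple: keep recognizing phrases until the input ends.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait();
        }

        return text.ToString().Trim();
    }
}
```

A call such as string transcript = WavTranscriber.Transcribe(wavPath); then returns the recognized text as one string, where wavPath is the path of the audio file. In a Windows Forms application like JAVT, the event-driven approach shown earlier is the better fit, since waiting on the UI thread would freeze the form.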

Text to Speech

Having covered speech recognition, we now turn to the reverse direction: text to speech (speech synthesis).

First, add the System.Speech.Synthesis namespace and create a SpeechSynthesizer:

using System.Speech.Synthesis;

SpeechSynthesizer speaker;
speaker = new SpeechSynthesizer();

Then set the Rate (from -10 to 10) and the Volume (from 0 to 100):

speaker.Rate = int.Parse(rateTextBox.Text);
speaker.Volume = int.Parse(volTextBox.Text);

To select a female voice, if one is installed:

speaker.SelectVoiceByHints(VoiceGender.Female);

Then run the Speech Synthesizer:

speaker.SpeakAsync(richTextBox2.Text);
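The int.Parse calls above throw on non-numeric input, and the synthesizer throws when a value falls outside its allowed range (Rate: -10 to 10; Volume: 0 to 100). A minimal sketch of a guard follows, assuming a hypothetical helper named SpeechSettings.ParseClamped that is not part of JAVT or System.Speech:

```csharp
using System;

// Hypothetical helper for illustration; not part of JAVT or System.Speech.
static class SpeechSettings
{
    // Parses user input as an integer and clamps it to [min, max].
    // Returns fallback when the input is not a valid integer.
    public static int ParseClamped(string input, int min, int max, int fallback)
    {
        int value;
        if (!int.TryParse(input, out value))
        {
            return fallback;
        }
        return Math.Max(min, Math.Min(max, value));
    }
}
```

With this helper, the two assignments could be written as speaker.Rate = SpeechSettings.ParseClamped(rateTextBox.Text, -10, 10, 0); and speaker.Volume = SpeechSettings.ParseClamped(volTextBox.Text, 0, 100, 100);. The fallback values 0 and 100 are assumptions chosen for this example.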

Video to Audio Conversion

I use ffmpeg to convert video into audio. To interface with ffmpeg, first include the System.Diagnostics namespace (for Process) and the System.IO namespace (for Directory, used below):

using System.Diagnostics;
using System.IO;

Then create a new process:

Process process = new Process();

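Build the ffmpeg arguments, where f is the path of the input video. Here -vn drops the video stream, -ac 2 selects two audio channels, -ar 44100 sets the sample rate to 44.1 kHz, and -ab 160k requests an audio bitrate of 160 kbit/s. The paths are wrapped in quotes so that file names containing spaces are passed correctly:

string arg = "-i \"" + f + "\" -ab 160k -ac 2 -ar 44100 -vn \"" + f + ".wav\"";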

Set the process settings:

process.StartInfo.FileName = Directory.GetCurrentDirectory() + "\\ffmpeg\\bin\\ffmpeg.exe";
process.StartInfo.Arguments = arg;
process.StartInfo.ErrorDialog = true;
process.StartInfo.WindowStyle = ProcessWindowStyle.Normal;

Start the process and wait for ffmpeg to finish:

process.Start();
process.WaitForExit();
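The conversion steps above can be wrapped in a small helper that also reports whether ffmpeg succeeded. This is a sketch under stated assumptions rather than JAVT's actual code: AudioExtractor, BuildArguments and ExtractAudio are hypothetical names. It keeps the article's ffmpeg options and adds -y, which tells ffmpeg to overwrite an existing output file without prompting.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of JAVT.
static class AudioExtractor
{
    // Builds the ffmpeg argument string. Both paths are quoted so that
    // file names containing spaces are passed as single arguments.
    public static string BuildArguments(string videoPath, string wavPath)
    {
        return string.Format(
            "-y -i \"{0}\" -ab 160k -ac 2 -ar 44100 -vn \"{1}\"",
            videoPath, wavPath);
    }

    // Runs ffmpeg and returns true when it exits with code 0.
    public static bool ExtractAudio(string ffmpegExe, string videoPath, string wavPath)
    {
        var info = new ProcessStartInfo
        {
            FileName = ffmpegExe,
            Arguments = BuildArguments(videoPath, wavPath),
            UseShellExecute = false,  // start ffmpeg.exe directly
            CreateNoWindow = true     // no console window for the child process
        };

        using (var process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

For example, AudioExtractor.ExtractAudio(ffmpegPath, f, f + ".wav") mirrors the naming used above, where ffmpegPath is the location of ffmpeg.exe. Checking the exit code lets the caller skip speech recognition when the audio file was not produced.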

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0

Written By

Eric M. H. Goh

Founder SVBook Pte. Ltd.

Singapore

Eric Goh is a data scientist, software engineer, adjunct faculty member and entrepreneur with years of experience in multiple industries. His varied career includes data science, data and text mining, natural language processing, machine learning, intelligent system development, and engineering product design. He founded SVBook Pte. Ltd. and extended it with DSTK.Tech and EMHAcademy.com. DSTK.Tech is where Eric develops his own DSTK data science software (public version). Eric also published “Learn R for Applied Statistics” with Apress, and has published books on LeanPub, Google Books, Amazon Kindle, and through SVBook Pte. Ltd. He teaches the content at EMHAcademy.com, Udemy, SkillShare, BitDegree, and Simpliv, and has developed 28 courses and 7 advanced certificates. Eric is also an adjunct faculty member at universities and institutions.

Eric Goh has led teams on various industrial projects, including the advanced product code classification system project, which automates Singapore Customs' trade facilitation process, and Nanyang Technological University's data science projects, where he developed his own DSTK data science software (NTU version). He has years of experience in C#, Java, C/C++, SPSS Statistics and Modeler, SAS Enterprise Miner, R, Python, Excel, Excel VBA, etc. He won the Tan Kah Kee Young Inventors' Merit Award 2007, and was a Shortlisted Entry for the TelR Data Mining Challenge.

Eric holds a Masters of Technology degree from the National University of Singapore, an Executive MBA degree from U21Global (currently GlobalNxt) and IGNOU, a Graduate Diploma in Mechatronics from A*STAR SIMTech (a national research institute located in Nanyang Technological University), Coursera Specialization Certificate in Business Statistics and Analysis (Excel) from Rice University, IBM Data Science Professional Certificate (Python, SQL), and Coursera Verified Certificate in R Programming from Johns Hopkins University. He possessed a Bachelor of Science degree in Computing fr
