ShakyVoice - A voice stress analysis tool

Vladimir Ralev


May 27, 2004

CPOL


This is an implementation of a simple voice stress analysis tool for Pocket PC; it can be used on the road as a lie detector.

Sample Image - ShakyVoice.jpg

Introduction

ShakyVoice (initially codenamed BatteryDrainer Pro) is a simple voice stress analysis tool for your Pocket PC. It measures only one of the many voice parameters that expose stress, but that is exactly what most cheap phone-conversation lie detectors do. The only thing I added is a timeline that lets you see how the values change over time. The program doesn't say what is true and what is false; that judgment you have to make on your own by comparing against sample recordings of a lie and of the truth. I think this makes the program more accurate and useful than the silly toys that have no idea what the truth sounds like.

How to use it?

*This program requires the Microsoft .NET Compact Framework to be installed.

The application measures the voice stress directly. You can measure a lie by comparing the stress in the truth file with the stress in the lie file you have recorded. Formally, it's like this:

LieMeasure = |TruthFileStressReading - LieFileStressReading|

The program doesn't display the LieMeasure, because you would always have to feed it with two files (truth and lie). You can view a single file if you already have an idea about the speaker's normal parameters, so you can see if there is a difference.
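Since the program leaves this subtraction to you, here is a minimal C# sketch of it. It assumes you take one stress reading per file, for instance the maximum of the red line; the class and method names are hypothetical and are not part of ShakyVoice.

```csharp
using System;

// Hypothetical helper, not part of ShakyVoice.
public static class LieMeasureSketch
{
    // Largest value in a series of per-frame stress readings (0 if empty).
    // Assumes the readings are non-negative, as a ratio of magnitudes is.
    public static float PeakStress(float[] stressPerFrame)
    {
        float peak = 0f;
        foreach (float value in stressPerFrame)
        {
            if (value > peak) peak = value;
        }
        return peak;
    }

    // LieMeasure = |TruthFileStressReading - LieFileStressReading|
    public static float LieMeasure(float truthStress, float lieStress)
    {
        return Math.Abs(truthStress - lieStress);
    }
}
```

For example, peak readings of 0.25 for the truth file and 0.75 for the lie file would give a LieMeasure of 0.5.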

You start by recording a truth file and a lie file. The program supports only 8 kHz, 16-bit, mono PCM files, so go to Start -> Program -> Settings -> Input and set the Voice Recording Format to 8,000 Hz, 16 Bit, Mono (16 KB/s). Make sure you are not selecting a GSM or Microsoft format: the app works on uncompressed PCM WAVE files only. Otherwise it will display an error message or wrong data (depending on what you have done wrong).

To record a truth file, go to Records -> Record truth and use the control in the normal way. Then record the lie file with Records -> Record lie. Any recorded file is stored in memory and can be recalled in later sessions. After that, you can analyze the files individually with Records -> Analyze truth/lie. Keep the recordings short, no more than 30 seconds, because the analysis is very slow (and drains the batteries quickly). This is one of the reasons why the utility and the precision are so basic.

After the analysis, the file parameters are displayed on the timeline. The red line is the stress: higher values mean greater stress. The white line is the total power of the signal and the yellow line is the tremor power (actually the square roots of the powers, for speed). The program computes the tremor power relative to the total speech signal power, which is a good measure of the stress. You should usually look at the maximum values (at the top of the graphs); the maximum of the red line is a good starting point for your analysis. The time bar is in milliseconds.
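The format requirement above (8,000 Hz, 16-bit, mono, uncompressed PCM) can be checked before analysis by reading the "fmt " chunk of the WAVE file. The following is only a sketch of such a check under the standard RIFF/WAVE layout. The class and method names are hypothetical, and this is not how ShakyVoice itself validates its input.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of ShakyVoice.
public static class WavFormatCheck
{
    // Returns true only for uncompressed PCM, mono, 8000 Hz, 16-bit data.
    public static bool IsSupported(Stream wav)
    {
        BinaryReader reader = new BinaryReader(wav); // reads little-endian values
        try
        {
            if (ReadTag(reader) != "RIFF") return false;
            reader.ReadUInt32();                      // RIFF chunk size, unused here
            if (ReadTag(reader) != "WAVE") return false;

            // Walk the chunks until the "fmt " chunk turns up.
            while (true)
            {
                string id = ReadTag(reader);
                uint size = reader.ReadUInt32();
                if (id != "fmt ")
                {
                    // Chunks are padded to an even number of bytes.
                    Skip(reader, (long)size + (size & 1));
                    continue;
                }
                ushort formatTag = reader.ReadUInt16();     // 1 = uncompressed PCM
                ushort channels = reader.ReadUInt16();
                uint sampleRate = reader.ReadUInt32();
                reader.ReadUInt32();                        // average bytes per second
                reader.ReadUInt16();                        // block align
                ushort bitsPerSample = reader.ReadUInt16();
                return formatTag == 1 && channels == 1
                    && sampleRate == 8000 && bitsPerSample == 16;
            }
        }
        catch (EndOfStreamException)
        {
            return false;   // truncated file, or no "fmt " chunk at all
        }
    }

    // Reads a four-character chunk identifier.
    private static string ReadTag(BinaryReader reader)
    {
        byte[] tag = reader.ReadBytes(4);
        if (tag.Length < 4) throw new EndOfStreamException();
        return Encoding.ASCII.GetString(tag);
    }

    // Skips count bytes without requiring a seekable stream.
    private static void Skip(BinaryReader reader, long count)
    {
        while (count > 0)
        {
            int want = (int)Math.Min(count, 4096L);
            int got = reader.ReadBytes(want).Length;
            if (got == 0) throw new EndOfStreamException();
            count -= got;
        }
    }
}
```

A GSM or other compressed recording carries a different format tag, so a check like this would reject it before any wrong data reaches the timeline.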

Generally, treat these situations as signs of stress:

The white line does not follow the yellow line closely.
The red line is high.

User interface description

To distinguish between speech and silence, watch that the white line does not go too low. A low white line means that the current segment may be empty (silence). The stress estimate is computed on chunks of about half a second that overlap by a hard-coded factor.
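The two ideas in this paragraph, overlapping half-second frames and ignoring frames where the total power is too low, can be sketched as follows. This is an illustration only: the names are hypothetical, the silence threshold is an assumption you would have to tune by eye, and unlike the program's own loop this sketch drops a trailing partial frame instead of zero-padding it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ShakyVoice.
public static class FramingSketch
{
    // Start offsets (in samples) of frames that are frameSize samples long
    // and advance by hop samples. A hop smaller than frameSize makes
    // neighbouring frames overlap.
    public static List<int> FrameStarts(int totalSamples, int frameSize, int hop)
    {
        if (frameSize <= 0 || hop <= 0)
            throw new ArgumentException("frameSize and hop must be positive");

        List<int> starts = new List<int>();
        for (int pos = 0; pos + frameSize <= totalSamples; pos += hop)
        {
            starts.Add(pos);
        }
        return starts;
    }

    // Copies the stress values, replacing those of frames whose total
    // magnitude is below silenceFloor with NaN so they can be skipped.
    public static float[] MaskSilence(float[] stress, float[] total, float silenceFloor)
    {
        if (stress.Length != total.Length)
            throw new ArgumentException("stress and total must have the same length");

        float[] masked = new float[stress.Length];
        for (int i = 0; i < stress.Length; i++)
        {
            masked[i] = total[i] >= silenceFloor ? stress[i] : float.NaN;
        }
        return masked;
    }
}
```

With 4096-sample frames at 8,000 Hz (about half a second) and a hop of 2048 samples, a 10,000-sample recording yields frames starting at samples 0, 2048 and 4096.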

Why do we lie?

Coming soon... yeah right ;)

Lie theory

People get stressed when telling a lie. The bigger the lie, the greater the stress. They get even more stressed when they know there is a chance of getting caught or when they are lying about something important. Stress is easy to detect: it is any deviation from the normal parameters, such as heart rhythm, retina reactions, eye-blinking frequency (this one really works), blood pressure, body temperature, EEG graphs, and voice. Other indicators are more subjective and I don't cover them at all.

In this article, I focus on the voice parameters that indicate stress. It is a very wide topic: there are many methods for detecting a stressed voice, and some of them are very sophisticated. I have implemented the simplest one, a "poor" micro-tremor detector. It estimates how strongly the speech is modulated by a sine wave with a frequency of 8-12 Hz. In other words, it checks whether the speech energy is jumping at intervals of 1/12 to 1/8 sec.

How do we do that? There are many ways, but the most common one seems to be spectral analysis of the speech data frames. Spectral analysis is usually done by taking the Fourier transform of the speech data using a Fast Fourier Transform (FFT) algorithm. This algorithm is not easy to implement, so I have taken mine from Stephan Bernsee; I just had to port some functions from C to C# (keeping them in an unsafe section).

If some DSP expert is reading this, he/she is already laughing, because this is not the right way to measure the modulation we are looking for. Yes, but in practice, if we take the energy of the 6-15 Hz band, we pretty much come up with the right modulation energy. This is not very precise, of course, but I guess it is still useful, since some people are actually selling such devices.
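When working with an FFT, a frequency band has to be translated into bin indices. Bin k of an N-point transform of data sampled at fs Hz is centered at k * fs / N Hz. The sketch below shows that conversion; the helper names are hypothetical, and the 8,000 Hz sample rate and 4096-point size used in the note afterwards are assumptions taken from the recording format and the frame length described in this article.

```csharp
using System;

// Hypothetical helper, not part of ShakyVoice.
public static class BinMath
{
    // Spacing between neighbouring FFT bins, in Hz.
    public static double BinWidthHz(int sampleRate, int fftSize)
    {
        return (double)sampleRate / fftSize;
    }

    // Index of the bin whose center is closest to the given frequency.
    public static int FrequencyToBin(double hz, int sampleRate, int fftSize)
    {
        return (int)Math.Round(hz * fftSize / sampleRate);
    }

    // Center frequency of a bin, in Hz.
    public static double BinToFrequency(int bin, int sampleRate, int fftSize)
    {
        return (double)bin * sampleRate / fftSize;
    }
}
```

Under these assumptions the bins are about 1.95 Hz apart, so 8 Hz and 12 Hz fall on bins 4 and 6. A band this narrow covers only a handful of bins, which is one reason the estimate is coarse.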

Code Review

The most important part is the analysis code. It first gets a chunk of data of size FrameSize, then performs the FFT, computes the total and the tremor energy, and stores the stress estimate. There are many, many details in how I read/record the wave file and how I display the data, and I can't really comment on everything. It is all a matter of work, nothing too bright.

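    // Slide a window of FrameSize samples over the recording.
    while (notdone)
    {
        // store_pos is a byte offset into the stream.
        this.LieFile.inStream.Position = store_pos;
        int read = this.LieFile.inStream.Read(tmp, 0, FrameSize * 2);  // 2 bytes per 16-bit sample
        if (read != FrameSize * 2)
        {
            // Short read: this is the last frame, so zero-pad it and stop afterwards.
            notdone = false;
            for (int q = read; q < FrameSize * 2; q++) tmp[q] = 0;
        }
        unsafe
        {
            fixed (byte* pdata = tmp)
            {
                byte* assignable = pdata;
                // Build the interleaved complex buffer that smbFft expects:
                // fdata[q] is the real part, fdata[q + 1] the imaginary part (zero).
                for (int q = 0; q < 2 * FrameSize; q += 2)
                {
                    // Assemble a little-endian 16-bit sample and scale it to about [-1, 1].
                    short tword = (short)((((int)assignable[1]) << 8) | assignable[0]);
                    fdata[q] = ((float)tword) / ((float)(0xffff >> 1));
                    fdata[q + 1] = 0;
                    assignable += 2;
                }
            }

            fixed (float* fftme = fdata)
            {
                // In-place 4096-point transform; -1 selects the forward FFT.
                smbFft(fftme, 4096, -1);
            }

            float total = 0; float tremor = 0;
            // Replace each complex bin with its magnitude and sum the magnitudes.
            // Doing this in place is safe: the write index q never overtakes
            // the pair (2q, 2q + 1) being read.
            for (int q = 0; q < FrameSize; q++)
            {
                fdata[q] = (float)Math.Sqrt(Math.Pow(fdata[2 * q], 2) +
                                            Math.Pow(fdata[2 * q + 1], 2));
                total += fdata[q];
            }
            total /= FrameSize;   // mean magnitude over all bins

            // Tremor band: bins 7..15. Assuming 8000 Hz input and a 4096-point
            // FFT, bin k sits at about k * 1.95 Hz.
            for (int q = 7; q <= 15; q++)
            {
                tremor += fdata[q];
            }
            // The loop adds nine bins but the divisor is 6; this only scales
            // the tremor and stress values by a constant.
            tremor /= 6;

            Data[1, 0].Add(tremor);          // tremor magnitude
            Data[1, 1].Add(total);           // total magnitude
            Data[1, 2].Add(tremor / total);  // stress estimate: tremor relative to total
        }
        // Advance to the next frame and update the progress bar.
        store_pos += OverlapFactor;
        progressBar1.Value =
            (int)(100F * (float)store_pos / (float)this.LieFile.inStream.Length);
    }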
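For readers who want to experiment outside an unsafe block, the two non-FFT steps of the listing (sample conversion and the tremor-to-total ratio) can be written with plain arrays. This is a sketch with hypothetical names, not the author's code. It assumes the magnitudes have already been produced by some FFT routine, and it averages the band over its actual number of bins, so its values differ from the listing's by a constant factor.

```csharp
using System;

// Hypothetical helper, not part of ShakyVoice.
public static class FrameMath
{
    // Converts little-endian 16-bit PCM bytes into an interleaved complex
    // buffer [re0, im0, re1, im1, ...] with all imaginary parts zero.
    // Samples beyond the end of pcm are left at zero (zero padding).
    public static float[] ToInterleavedComplex(byte[] pcm, int sampleCount)
    {
        float[] data = new float[2 * sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            int lo = 2 * i;
            int hi = lo + 1;
            if (hi >= pcm.Length) break;
            short sample = unchecked((short)((pcm[hi] << 8) | pcm[lo]));
            data[2 * i] = sample / 32767f;   // scale to about [-1, 1]
        }
        return data;
    }

    // Mean magnitude of bins lowBin..highBin (inclusive) divided by the mean
    // magnitude of all bins. Returns 0 for an all-zero (silent) frame.
    public static float StressRatio(float[] magnitudes, int lowBin, int highBin)
    {
        if (lowBin < 0 || highBin >= magnitudes.Length || lowBin > highBin)
            throw new ArgumentException("band is outside the spectrum");

        float total = 0f;
        foreach (float m in magnitudes) total += m;
        total /= magnitudes.Length;

        float tremor = 0f;
        for (int q = lowBin; q <= highBin; q++) tremor += magnitudes[q];
        tremor /= (highBin - lowBin + 1);

        return total > 0f ? tremor / total : 0f;
    }
}
```

Guarding against a zero total avoids a division by zero on silent frames, which the original loop does not check for.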

What could be done further?

I will mention some more advanced techniques. A more accurate stress indicator is the shaking of the speaker's pitch (fundamental frequency). Have you noticed, while lying, that your voice gets thin and high? Well, that's because your pitch is going higher, and the algorithm I am talking about can be up to 100x more sensitive to pitch changes than your ears and brain. This would be a great boost in accuracy.
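The article does not say which pitch algorithm is meant, so the following is only one common possibility: a plain autocorrelation estimate of the fundamental frequency of a single frame. All names and the search range are assumptions. Tracking how this value wanders from frame to frame would be the "shaking pitch" measurement.

```csharp
using System;

// Hypothetical helper, not part of ShakyVoice.
public static class PitchSketch
{
    // Estimates the fundamental frequency of a frame by finding the lag
    // (within the range implied by minHz..maxHz) at which the frame best
    // matches a shifted copy of itself. Returns 0 if no lag qualifies.
    public static double EstimatePitchHz(float[] frame, int sampleRate,
                                         double minHz, double maxHz)
    {
        int minLag = (int)(sampleRate / maxHz);
        int maxLag = (int)(sampleRate / minHz);
        if (minLag < 1) minLag = 1;
        if (maxLag >= frame.Length) maxLag = frame.Length - 1;

        int bestLag = 0;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++)
        {
            // Unnormalized autocorrelation at this lag.
            double sum = 0.0;
            for (int n = 0; n + lag < frame.Length; n++)
            {
                sum += (double)frame[n] * frame[n + lag];
            }
            if (sum > best)
            {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? (double)sampleRate / bestLag : 0.0;
    }
}
```

The result is quantized to whole-sample lags, so at 8,000 Hz the resolution near 200 Hz is about 5 Hz. A real implementation would need interpolation around the peak and a voiced/unvoiced decision to reach the sensitivity described above.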

Another alternative is to measure the breathing intervals and speech speed - this one is also very accurate.

All three of these methods can be implemented on a Pocket PC device with pretty much no extra hardware or performance requirements.

Used resources

Maximum thanks to Stephan Bernsee for the FFT routine.
Brenner, M., Branscomb, H., & Schwartz, G.E. (1979). Psychological stress evaluator: Two tests of a vocal measure. Psychophysiology, 16(4), 351-357.
comp.dsp.
Many Internet sites helped me to verify the method.