
Top Tips for Developing a Recordist Function

7 May 2022, CPOL, 2 min read
An introduction to a real-time transcription function and the journey of developing it.
This article describes a speech-to-text function and how to integrate it into an app.

Introduction

Efficient record management matters more now than ever. In the digital age, a rapidly growing volume of information, such as audio and video, must be handled in limited time. A real-time transcription function helps with this, and it is useful in many scenarios.

In audio or video conferences, this function records meeting minutes that I can refer to later, which is more convenient than writing everything down myself. I've seen my kids struggle to take notes during their online courses, and the transcription function makes this much easier: it takes over the job of writing down everything the teacher says, so the kids can focus on the lecture itself and review the content later. Live captions also give viewers real-time subtitles for a better viewing experience.

As a coder, I believe that actions speak louder than words. That's why I developed a real-time transcription function using the real-time transcription capability of HUAWEI ML Kit. It looks like this.

Demo

Image 1

This function transcribes up to 5 hours of speech in real time, in Chinese, English, mixed Chinese and English, or French. The output text is punctuated and contains timestamps.
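Because the output includes sentence and word timestamps, an app that shows captions or saves minutes usually needs to turn raw time offsets into readable times. The following is a minimal sketch, assuming the offsets are integer milliseconds measured from the start of the session; that unit and the OffsetFormatter class are assumptions for illustration, not part of ML Kit.

```java
import java.util.Locale;

// Hypothetical helper, not part of ML Kit. Assumes time offsets are
// integer milliseconds measured from the start of the transcription session.
final class OffsetFormatter {

    // The article states that up to 5 hours of speech can be transcribed.
    static final long MAX_SESSION_MS = 5L * 60 * 60 * 1000;

    // Formats an offset as HH:MM:SS.mmm, e.g., for a caption or minutes line.
    static String format(long offsetMs) {
        if (offsetMs < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        long hours = offsetMs / 3_600_000;
        long minutes = (offsetMs / 60_000) % 60;
        long seconds = (offsetMs / 1_000) % 60;
        long millis = offsetMs % 1_000;
        // Locale.ROOT keeps ASCII digits regardless of the device locale.
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d",
                hours, minutes, seconds, millis);
    }

    // True if the offset lies within the 5-hour session limit.
    static boolean withinLimit(long offsetMs) {
        return offsetMs >= 0 && offsetMs <= MAX_SESSION_MS;
    }

    public static void main(String[] args) {
        // 1 h 2 min 3 s 4 ms after the session started.
        System.out.println(format(3_723_004L)); // prints 01:02:03.004
        System.out.println(withinLimit(MAX_SESSION_MS + 1)); // prints false
    }
}
```

A caption line could then be built by prefixing each sentence with the formatted offset of its start.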

The function has some requirements: it needs an Internet connection, and while Chinese and English are supported on all phone models, French support depends on the phone model.
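Given these requirements, an app might pick the recognition language before creating the configuration: use the user's preferred language if the device supports it, fall back to English otherwise, and skip transcription when offline. The following is a minimal sketch of that decision in plain Java. The LanguagePicker class, the language code strings, and the assumption that the app already knows which languages the device supports are all hypothetical; this is not ML Kit's own API.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not an ML Kit API: chooses a transcription language.
// The language code strings are assumptions for illustration; a real app
// would pass the SDK's own language constants to setLanguage.
final class LanguagePicker {

    static final String CHINESE = "zh-CN";
    static final String ENGLISH = "en-US";
    static final String FRENCH = "fr-FR";

    // Returns the first preferred language the device supports. Falls back to
    // English, which the article says is available on all phone models.
    // Returns empty when offline, because transcription needs Internet access.
    static Optional<String> pick(List<String> preferred, Set<String> supported, boolean online) {
        if (!online) {
            return Optional.empty();
        }
        for (String lang : preferred) {
            if (supported.contains(lang)) {
                return Optional.of(lang);
            }
        }
        return Optional.of(ENGLISH);
    }

    public static void main(String[] args) {
        Set<String> deviceWithoutFrench = Set.of(CHINESE, ENGLISH);
        // French is preferred but unsupported here, so Chinese is chosen.
        System.out.println(pick(List.of(FRENCH, CHINESE), deviceWithoutFrench, true).orElse("offline"));
        // Offline: no language can be used.
        System.out.println(pick(List.of(FRENCH), deviceWithoutFrench, false).orElse("offline"));
    }
}
```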

Now let's move on to the point of this article: how I developed this real-time transcription function.

Development Procedure

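  1. Make the necessary preparations, such as integrating the ML Kit SDK into your app and declaring the microphone (RECORD_AUDIO) and Internet permissions.
  2. Create a recognition configuration, then obtain a speech recognizer instance.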
    Java
    MLSpeechRealTimeTranscriptionConfig config =
            new MLSpeechRealTimeTranscriptionConfig.Factory()
                    // Set the language: Chinese, English, mixed Chinese
                    // and English, or French.
                    .setLanguage(MLSpeechRealTimeTranscriptionConstants.LAN_ZH_CN)
                    // Add punctuation to the recognized text.
                    .enablePunctuation(true)
                    // Return the time offset of each sentence.
                    .enableSentenceTimeOffset(true)
                    // Return the time offset of each word.
                    .enableWordTimeOffset(true)
                    .create();

    MLSpeechRealTimeTranscription mSpeechRecognizer =
            MLSpeechRealTimeTranscription.getInstance();
  3. Create a listener that receives the speech recognition callbacks.
    Java
    // Implement the MLSpeechRealTimeTranscriptionListener interface and its
    // callback methods. Declare this as an inner class, e.g., of your Activity.
    protected class SpeechRecognitionListener
            implements MLSpeechRealTimeTranscriptionListener {
        @Override
        public void onStartListening() {
            // The recorder has started receiving speech.
        }

        @Override
        public void onStartingOfSpeech() {
            // The recognizer has detected that the user started speaking.
        }

        @Override
        public void onVoiceDataReceived(byte[] data, float energy, Bundle bundle) {
            // Receives the original PCM audio stream and the audio energy.
            // This callback runs on a worker thread, not the main thread,
            // so post any UI updates to the main thread.
        }

        @Override
        public void onRecognizingResults(Bundle partialResults) {
            // Receives the recognized text from MLSpeechRealTimeTranscription.
        }

        @Override
        public void onError(int error, String errorMessage) {
            // Called when an error occurs during recognition.
        }

        @Override
        public void onState(int state, Bundle params) {
            // Notifies the app of recognizer status changes.
        }
    }
  4. Bind the listener to the speech recognizer.
    Java
    mSpeechRecognizer.setRealTimeTranscriptionListener(new SpeechRecognitionListener());
  5. Call startRecognizing to begin speech recognition.
    Java
    mSpeechRecognizer.startRecognizing(config);
  6. Stop recognition and release resources occupied by the recognizer when the recognition is complete.
    Java
    if (mSpeechRecognizer != null) {
        mSpeechRecognizer.destroy();
    }
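For live captions, the text delivered to onRecognizingResults in step 3 typically has to be merged into a running transcript. One common approach, sketched here under stated assumptions, keeps finished sentences and replaces the in-progress sentence each time a new partial result arrives. The sketch assumes each callback carries the full text of the current sentence plus a flag marking it as final; extracting those two values from the Bundle is left out, so check the ML Kit documentation for the actual result keys. The TranscriptAccumulator class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator for streaming recognition results. Assumes each
// callback delivers the full text of the current sentence so far, plus a
// flag telling whether that sentence is final.
final class TranscriptAccumulator {

    private final List<String> finalized = new ArrayList<>();
    private String pending = "";

    // Call from the listener (e.g., onRecognizingResults) after extracting
    // the text and the final flag from the result Bundle.
    synchronized void onResult(String text, boolean isFinal) {
        if (isFinal) {
            finalized.add(text);
            pending = "";
        } else {
            // A partial result replaces, rather than appends to, the pending text.
            pending = text;
        }
    }

    // Text to display: all finalized sentences followed by the pending one.
    synchronized String displayText() {
        StringBuilder sb = new StringBuilder(String.join(" ", finalized));
        if (!pending.isEmpty()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(pending);
        }
        return sb.toString();
    }

    // Only the finalized sentences, e.g., for saving meeting minutes.
    synchronized List<String> finalizedSentences() {
        return new ArrayList<>(finalized);
    }

    public static void main(String[] args) {
        TranscriptAccumulator acc = new TranscriptAccumulator();
        acc.onResult("Hello", false);
        acc.onResult("Hello everyone.", true);
        acc.onResult("Today we", false);
        System.out.println(acc.displayText()); // prints Hello everyone. Today we
    }
}
```

The methods are synchronized because recognizer callbacks may arrive on a worker thread while the UI reads the text. Sentences are joined with spaces, which suits English and French; for Chinese, joining without a separator may read better.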

History

  • 6th May, 2022: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
China
