OCR in WPF using the WinRT OCR API

Meshack Musundi

17 Aug 2020

Optical character recognition in WPF using Windows Runtime OCR

Learn how to use the WinRT OCR API in a WPF application to extract text from an image.

Introduction

Optical character recognition (OCR) is one of the Windows Runtime (WinRT) features available to WPF and WinForms applications. This is possible thanks to the Windows 10 WinRT API Pack, which provides quick and easy access to a number of WinRT APIs, including the OCR API. This article looks at how to use the OCR API in a WPF application.

Background

The sample project for this article is a .NET Core WPF application. It contains a button that opens a file dialog for selecting an image, a combo box for selecting the language to use for text extraction, a button for running the extraction, and a button for copying the extracted text to the clipboard. The language codes listed in the combo box represent the languages installed on the device.

Figure: Sample application

WinRT OCR

The WinRT OCR API is a highly optimized optical character recognition system that currently supports 26 languages and works without an internet connection. The API can extract text from a wide variety of images, from scanned documents to photos of text in natural scenes.

Figure: Natural scene image text extraction

To use the API in a WPF application, you have to reference the Microsoft.Windows.SDK.Contracts NuGet package and install the languages you intend to use for text extraction. If you attempt an extraction with a language that isn't installed on the device or isn't supported by the API, the extraction will fail.

Figure: Some languages installed on a machine
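
Since an extraction fails when the language isn't available, it can help to ask the OCR engine directly which languages it can use. The following is a minimal sketch, not part of the sample project: the OcrLanguageInfo class and its method names are hypothetical, while OcrEngine.AvailableRecognizerLanguages and OcrEngine.IsLanguageSupported() are members of the WinRT OCR API.

```csharp
using System;
using System.Linq;
using Windows.Globalization;
using Windows.Media.Ocr;

// Hypothetical helper (not in the sample project) for querying OCR language support.
public static class OcrLanguageInfo
{
    // Returns the language tags (e.g. "en-US") that the OCR engine can use on this device.
    public static string[] GetRecognizerLanguageTags() =>
        OcrEngine.AvailableRecognizerLanguages
                 .Select(language => language.LanguageTag)
                 .ToArray();

    // True when an OCR engine can be created for the given language tag.
    // Note: the Language constructor throws if the tag is not well formed.
    public static bool CanRecognize(string languageTag) =>
        OcrEngine.IsLanguageSupported(new Language(languageTag));
}
```

A check such as OcrLanguageInfo.CanRecognize("en-US") could then be run before attempting an extraction, or the returned tags could be used to fill a language list.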

Extracting Text

With the Microsoft.Windows.SDK.Contracts package installed, using the OCR API is straightforward. In the sample project, the ExtractText() method of the OcrService class calls the RecognizeAsync() method of the API's OcrEngine class to extract text from a specified image using a specific language code.

      public async Task<string> ExtractText(string image, string languageCode)
      {
          ... 

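          // Pass the parameter name first; the single-string constructor would treat
          // the message as the parameter name.
          if (!GlobalizationPreferences.Languages.Contains(languageCode))
              throw new ArgumentOutOfRangeException(nameof(languageCode),
                  $"{languageCode} is not installed.");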
      
          StringBuilder text = new StringBuilder();
      
          await using (var fileStream = File.OpenRead(image))
          {
              var bmpDecoder = 
                  await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
              var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();
      
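              var ocrEngine = OcrEngine.TryCreateFromLanguage(new Language(languageCode));
              // TryCreateFromLanguage returns null when the OCR engine doesn't support the language.
              if (ocrEngine == null)
                  throw new NotSupportedException($"{languageCode} is not supported by the OCR engine.");
              var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);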
      
              foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
          }
      
          return text.ToString();
      }

ExtractText() throws an ArgumentOutOfRangeException if the specified language code doesn't match any language installed on the device. To keep the resulting text close to the layout of the text in the image, I append each line of extracted text to a StringBuilder on its own line before returning the combined text.
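
Each recognized line also exposes its individual words along with their positions in the image, which is useful when the layout matters beyond line breaks. The sketch below is illustrative only: the OcrResultFormatter class and DescribeWords() method are hypothetical names, while OcrResult.Lines, OcrLine.Words, OcrWord.Text and OcrWord.BoundingRect come from the WinRT OCR API.

```csharp
using System.Text;
using Windows.Media.Ocr;

// Hypothetical helper (not in the sample project) that reports where each word was found.
public static class OcrResultFormatter
{
    // Builds one report line per recognized word, with its bounding box in image pixels.
    public static string DescribeWords(OcrResult result)
    {
        var report = new StringBuilder();

        foreach (OcrLine line in result.Lines)
        {
            foreach (OcrWord word in line.Words)
            {
                var box = word.BoundingRect;
                report.AppendLine(
                    $"{word.Text} at ({box.X:F0}, {box.Y:F0}), size {box.Width:F0}x{box.Height:F0}");
            }
        }

        return report.ToString();
    }
}
```

The method takes the OcrResult returned by RecognizeAsync(), so it could be called right after the recognition step in ExtractText().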

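Text can also be extracted without specifying a language, in which case the engine uses the first of the user's profile languages that is available for OCR, typically the device's first preferred language. This is done by calling OcrEngine's TryCreateFromUserProfileLanguages() method.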

public async Task<string> ExtractText(string image)
{
    ...

    StringBuilder text = new StringBuilder();

    await using (var fileStream = File.OpenRead(image))
    {
        var bmpDecoder =
            await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();

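        var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
        // Returns null when none of the user profile languages is available for OCR.
        if (ocrEngine == null)
            throw new InvalidOperationException("No user profile language is available for OCR.");
        var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);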

        foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
    }

    return text.ToString();
}

Using the code above, if the device's first preferred language is Simplified Chinese and the text in the image is also Simplified Chinese, the extraction will succeed. The sample project's MainWindowViewModel uses the ExtractText() overload that takes a language code as a parameter.

public class MainWindowViewModel : ViewModelBase
{
    private readonly IDialogService dialogService;
    private readonly IOcrService ocrService;

    public MainWindowViewModel(IDialogService dialogSvc, IOcrService ocrSvc)
    {
        dialogService = dialogSvc;
        ocrService = ocrSvc;
    }

    // Language codes of installed languages.
    public List<string> InstalledLanguages => GlobalizationPreferences.Languages.ToList();

    private string _imageLanguageCode;
    public string ImageLanguageCode
    {
        get => _imageLanguageCode;
        set
        {
            _imageLanguageCode = value;
            OnPropertyChanged();
        }
    }

    private string _selectedImage;
    public string SelectedImage
    {
        get => _selectedImage;
        set
        {
            _selectedImage = value;
            OnPropertyChanged();
        }
    }

    private string _extractedText;
    public string ExtractedText
    {
        get => _extractedText;
        set
        {
            _extractedText = value;
            OnPropertyChanged();
        }
    }

    #region Select Image Command

    private RelayCommand _selectImageCommand;
    public RelayCommand SelectImageCommand =>
        _selectImageCommand ??= new RelayCommand(_ => SelectImage());

    private void SelectImage()
    {
        string image = dialogService.OpenFile("Select Image",
            "Image (*.jpg; *.jpeg; *.png; *.bmp)|*.jpg; *.jpeg; *.png; *.bmp");

        if (string.IsNullOrWhiteSpace(image)) return;

        SelectedImage = image;
        ExtractedText = string.Empty;
    }

    #endregion

    #region Extract Text Command

    private RelayCommandAsync _extractTextCommand;
    public RelayCommandAsync ExtractTextCommand =>
        _extractTextCommand ??= new RelayCommandAsync(ExtractText, _ => CanExtractText());

    private async Task ExtractText()
    {
        ExtractedText = await ocrService.ExtractText(SelectedImage, ImageLanguageCode);
    }

    private bool CanExtractText() => !string.IsNullOrWhiteSpace(ImageLanguageCode) &&
                                     !string.IsNullOrWhiteSpace(SelectedImage);

    #endregion

    #region Copy Text to Clipboard Command

    private RelayCommand _copyTextToClipboardCommand;
    public RelayCommand CopyTextToClipboardCommand => _copyTextToClipboardCommand ??=
        new RelayCommand(_ => CopyTextToClipboard(), _ => CanCopyTextToClipboard());

    private void CopyTextToClipboard() => Clipboard.SetData(DataFormats.Text, _extractedText);

    private bool CanCopyTextToClipboard() => !string.IsNullOrWhiteSpace(_extractedText);

    #endregion
}
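
The view model relies on a RelayCommandAsync class whose implementation isn't shown in the article. The following is a minimal sketch of what such a command could look like, assuming the constructor shape used above (a Func<Task> plus an optional predicate); it is a guess at the shape, not the sample project's actual class.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Input;

// Hypothetical sketch of an async command; the sample project's own version may differ.
public class RelayCommandAsync : ICommand
{
    private readonly Func<Task> _execute;
    private readonly Predicate<object> _canExecute;
    private bool _isExecuting;

    public RelayCommandAsync(Func<Task> execute, Predicate<object> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    // Let WPF re-query CanExecute whenever it thinks command state may have changed.
    public event EventHandler CanExecuteChanged
    {
        add => CommandManager.RequerySuggested += value;
        remove => CommandManager.RequerySuggested -= value;
    }

    // Disabled while a previous invocation is still running.
    public bool CanExecute(object parameter) =>
        !_isExecuting && (_canExecute?.Invoke(parameter) ?? true);

    // async void is required by ICommand; exceptions thrown by the task are not caught here.
    public async void Execute(object parameter)
    {
        _isExecuting = true;
        try
        {
            await _execute();
        }
        finally
        {
            _isExecuting = false;
            CommandManager.InvalidateRequerySuggested();
        }
    }
}
```

Blocking re-entry while the task runs keeps a second extraction from starting before the first one finishes, for example when the button is clicked twice in quick succession.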

Conclusion

As you can see, using the WinRT OCR API is simple, and it works well in most cases. Compared to Tesseract, I think it's a far better option, especially if you intend to extract text from natural scene images.

History

17th August, 2020: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.