Learn how to use the WinRT OCR API in a WPF application to extract text from an image.
Introduction
Optical Character Recognition (OCR) is one of the Windows Runtime features accessible to WPF and WinForms applications. This is made possible by the Windows 10 WinRT API Pack, which provides quick and easy access to a number of WinRT APIs, including the OCR API. This article shows how to use the OCR API in a WPF application.
Background
The sample project for this article is a .NET Core WPF application which contains a button for launching an open file dialog, used to select an image; a combo box for selecting the language to use for text extraction; a button for executing text extraction; and a button for copying the extracted text onto the clipboard. The language codes listed in the combo box represent languages installed on a device.
Sample application
WinRT OCR
The WinRT OCR API is a highly optimized optical character recognition system that currently supports 26 languages and works without requiring an internet connection. The API can extract text from a wide variety of images, from scanned documents to photos containing text in natural scenes.
Natural scene image text extraction
To use the API in a WPF application, you have to reference the Microsoft.Windows.SDK.Contracts NuGet package and install the languages you intend to use for text extraction. If you attempt an extraction using a language that isn't installed on the device or isn't supported by the API, the extraction will fail.
Some languages installed on a machine
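To check up front which languages the OCR engine can actually use on a device, the OcrEngine class exposes AvailableRecognizerLanguages and IsLanguageSupported(). The following is a minimal sketch, not code from the sample project; the OcrLanguageProbe class and its method names are hypothetical.

```csharp
using System;
using Windows.Globalization;
using Windows.Media.Ocr;

// Hypothetical helper, not part of the sample project.
public static class OcrLanguageProbe
{
    // Prints every language the OCR engine can currently recognize on this device.
    public static void PrintAvailableLanguages()
    {
        foreach (Language language in OcrEngine.AvailableRecognizerLanguages)
            Console.WriteLine($"{language.LanguageTag}: {language.DisplayName}");
    }

    // Returns true when the tag is a well-formed BCP-47 tag and OCR supports it.
    public static bool CanRecognize(string languageTag)
    {
        // The Language constructor throws on malformed tags, so validate first.
        if (string.IsNullOrWhiteSpace(languageTag) || !Language.IsWellFormed(languageTag))
            return false;

        return OcrEngine.IsLanguageSupported(new Language(languageTag));
    }
}
```

Calling OcrLanguageProbe.CanRecognize("en-US") before starting an extraction is one way to report an unsupported language to the user instead of letting the extraction fail.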
Extracting Text
With the Microsoft.Windows.SDK.Contracts package installed, using the OCR API is straightforward. In the sample project, the ExtractText() method in the OcrService class calls the RecognizeAsync() method of the API's OcrEngine class to extract text from a specified image using a specific language code.
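// Namespaces used: System.IO, System.Linq, System.Text, Windows.Globalization,
// Windows.Graphics.Imaging, Windows.Media.Ocr, Windows.System.UserProfile
public async Task<string> ExtractText(string image, string languageCode)
{
    ...
    // Fail early if the language isn't among the user's installed languages.
    if (!GlobalizationPreferences.Languages.Contains(languageCode))
        throw new ArgumentOutOfRangeException(nameof(languageCode),
            $"{languageCode} is not installed.");

    StringBuilder text = new StringBuilder();

    await using (var fileStream = File.OpenRead(image))
    {
        var bmpDecoder =
            await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();

        // TryCreateFromLanguage returns null when the language has no OCR support.
        var ocrEngine = OcrEngine.TryCreateFromLanguage(new Language(languageCode));
        if (ocrEngine == null)
            throw new NotSupportedException($"{languageCode} is not supported for OCR.");

        var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);

        // Append line by line so the output follows the layout of the image.
        foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
    }

    return text.ToString();
}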
In ExtractText(), an ArgumentOutOfRangeException is thrown if the specified language code doesn't represent any language installed on the device. To get the resultant text to closely match the layout of the text in the image, I'm getting each line of extracted text and adding it to a StringBuilder before returning the overall text.
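Beyond line text, each OcrLine exposes its Words, and each OcrWord carries a BoundingRect giving its position in the image. As a rough illustration of how that layout information could be inspected, here is a sketch; the OcrLayout class and its methods are hypothetical and are not part of the sample project.

```csharp
using System;
using System.Text;
using Windows.Media.Ocr;

// Hypothetical helper, not part of the sample project.
public static class OcrLayout
{
    // Formats one word and its bounding box, e.g. "Hi @ (1, 2) 30x10".
    public static string FormatWord(string text, double x, double y, double width, double height) =>
        FormattableString.Invariant($"{text} @ ({x:F0}, {y:F0}) {width:F0}x{height:F0}");

    // Lists every recognized word with its position, one word per output line.
    public static string DescribeWords(OcrResult result)
    {
        var report = new StringBuilder();
        foreach (OcrLine line in result.Lines)
        {
            foreach (OcrWord word in line.Words)
            {
                var box = word.BoundingRect;
                report.AppendLine(FormatWord(word.Text, box.X, box.Y, box.Width, box.Height));
            }
        }
        return report.ToString();
    }
}
```

Passing the OcrResult returned by RecognizeAsync() to DescribeWords() gives a quick view of where each word was found, which can help when the line-by-line output doesn't match the image as expected.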
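Text can also be extracted from an image by using a device's preferred languages. This is done by calling the OcrEngine's TryCreateFromUserProfileLanguages() method, which creates an engine for the first language in the user's preferred language list that is available for OCR, or returns null if none is.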
public async Task<string> ExtractText(string image)
{
    ...
    StringBuilder text = new StringBuilder();

    await using (var fileStream = File.OpenRead(image))
    {
        var bmpDecoder =
            await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
        var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();

        // Returns null when none of the user's profile languages supports OCR.
        var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (ocrEngine == null)
            throw new InvalidOperationException(
                "None of the user profile languages is available for OCR.");

        var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);
        foreach (var line in ocrResult.Lines) text.AppendLine(line.Text);
    }

    return text.ToString();
}
Using the code above, if a device's first preferred language is Simplified Chinese, and the text in the image is also Simplified Chinese, then the text extraction will be done successfully. The sample project's MainWindowViewModel uses the ExtractText() method that requires a language code to be passed as a parameter.
public class MainWindowViewModel : ViewModelBase
{
    private readonly IDialogService dialogService;
    private readonly IOcrService ocrService;

    public MainWindowViewModel(IDialogService dialogSvc, IOcrService ocrSvc)
    {
        dialogService = dialogSvc;
        ocrService = ocrSvc;
    }

    public List<string> InstalledLanguages => GlobalizationPreferences.Languages.ToList();

    private string _imageLanguageCode;
    public string ImageLanguageCode
    {
        get => _imageLanguageCode;
        set
        {
            _imageLanguageCode = value;
            OnPropertyChanged();
        }
    }

    private string _selectedImage;
    public string SelectedImage
    {
        get => _selectedImage;
        set
        {
            _selectedImage = value;
            OnPropertyChanged();
        }
    }

    private string _extractedText;
    public string ExtractedText
    {
        get => _extractedText;
        set
        {
            _extractedText = value;
            OnPropertyChanged();
        }
    }

    #region Select Image Command

    private RelayCommand _selectImageCommand;
    public RelayCommand SelectImageCommand =>
        _selectImageCommand ??= new RelayCommand(_ => SelectImage());

    private void SelectImage()
    {
        string image = dialogService.OpenFile("Select Image",
            "Image (*.jpg; *.jpeg; *.png; *.bmp)|*.jpg; *.jpeg; *.png; *.bmp");

        if (string.IsNullOrWhiteSpace(image)) return;

        SelectedImage = image;
        ExtractedText = string.Empty;
    }

    #endregion

    #region Extract Text Command

    private RelayCommandAsync _extractTextCommand;
    public RelayCommandAsync ExtractTextCommand =>
        _extractTextCommand ??= new RelayCommandAsync(ExtractText, _ => CanExtractText());

    private async Task ExtractText()
    {
        ExtractedText = await ocrService.ExtractText(SelectedImage, ImageLanguageCode);
    }

    private bool CanExtractText() => !string.IsNullOrWhiteSpace(ImageLanguageCode) &&
                                     !string.IsNullOrWhiteSpace(SelectedImage);

    #endregion

    #region Copy Text to Clipboard Command

    private RelayCommand _copyTextToClipboardCommand;
    public RelayCommand CopyTextToClipboardCommand => _copyTextToClipboardCommand ??=
        new RelayCommand(_ => CopyTextToClipboard(), _ => CanCopyTextToClipboard());

    private void CopyTextToClipboard() => Clipboard.SetData(DataFormats.Text, _extractedText);

    private bool CanCopyTextToClipboard() => !string.IsNullOrWhiteSpace(_extractedText);

    #endregion
}
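The view model always passes an explicit language code. If an application wanted to fall back to the user's profile languages when the requested language can't be used, the two engine-creation methods described earlier could be combined along these lines. This is only a sketch under that assumption; the OcrEngineFactory class and CreateEngine() method are hypothetical and do not appear in the sample project.

```csharp
using Windows.Globalization;
using Windows.Media.Ocr;

// Hypothetical helper, not part of the sample project.
public static class OcrEngineFactory
{
    // Prefers the requested language; otherwise falls back to the first
    // OCR-capable language in the user's profile. May still return null
    // when no OCR language is available at all.
    public static OcrEngine CreateEngine(string languageCode)
    {
        // The Language constructor throws on malformed tags, so validate first.
        if (!string.IsNullOrWhiteSpace(languageCode) && Language.IsWellFormed(languageCode))
        {
            var language = new Language(languageCode);
            if (OcrEngine.IsLanguageSupported(language))
                return OcrEngine.TryCreateFromLanguage(language);
        }

        return OcrEngine.TryCreateFromUserProfileLanguages();
    }
}
```

Callers still need to handle a null result, since both TryCreate methods return null rather than throwing when no suitable language is available.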
Conclusion
As you can see, using the WinRT OCR API is quite simple, and it works well in most cases. In comparison to Tesseract, I think it's a far better option, especially if you intend to extract text from natural scene images.
History
- 17th August, 2020: Initial post