Image to Text with Azure Computer Vision

Kenji Elzerman

Nov 13, 2022

CPOL

12 min read

A simple tutorial with some code of how to read text from an image with Azure Computer Vision

Download source code - 5.8 MB

Introduction

Not getting a receipt is something no computer app can help you with. But the first, the one with receipts, is something we can automate. Some techniques can transform an image into text. This is done with OCR, which stands for Optical Character Recognition. It can extract text from handwritten letters, images, street signs, receipts (!), product labels, and much more.

Microsoft Azure has Computer Vision, which is a resource and technique dedicated to what we want: Read the text from a receipt. Computer Vision can recognize a lot of languages. But I will stick to English for now.

I also tried another very popular OCR package: Aspose.OCR. Although the internet shows way more tutorials for this package, it didn’t do what I wanted. Not all text was found, and some of it was replaced with strange characters. “For better performance, you should buy it,” some say. I have Azure credits, and Computer Vision showed way better results while I was using only free services.

Summary: Let’s create a small console app that can read a picture of a receipt and get the total amount. We’ll be using C#, Visual Studio, and Azure Computer Vision.

Creating a Computer Vision Resource

Before we can use the OCR of Computer Vision, we need to set it up in Azure Cloud. It’s just a service like any other resource. Via the portal, it’s very easy to create a new Computer Vision service. After you are logged in, you can search for Computer Vision and select it.

Press the Create button in the top-left corner to create a new Computer Vision service.

Basics

The first page looks like that of many other Azure services, and so do the options. The first part asks for your subscription (which should be filled in already), a resource group, a region, a name, and a pricing tier.

Everything, except for the pricing tier, should be easy to understand. So let’s dive into the pricing tier a little bit.
The first option is Free F0. But there is a catch. You can only send 5,000 free calls to this Computer Vision resource per month, at 20 calls per minute. That seems like plenty, but you will reach the limit pretty quickly, especially when testing. This free tier is also meant for development and testing only, not for production. You can also use the free tier only once.

The second option is Standard S1. Although it says 10 calls per second, this tier comes with more ‘rules’. You pay a fee per block of transactions. For example, 0 to 1M transactions cost $1 per 1,000 transactions, and so on.

For now, let’s stick to the free one. Sadly, I already used the free one in another project, so I can’t use it again. So don’t be alarmed if you see Standard S1 in my screenshots.

Network

The network tab gives you the option of how the new service can be accessed. There are three options:

All networks, including the internet, can access this resource.
This basically means it’s open for everyone that uses the internet (humans and computers). Not really a safe option if you want to handle important data.
For a simple receipt reader, this option would be the best.
Selected networks, and configure network security for your cognitive service resource.
For this option, you need to create and configure your own Azure Networks, which is a whole different world. I am not going to cover that at this moment.
If you do have your own networks, this option would let you select the network that has access to this resource. A nice way of handling important data.
Disabled, no networks can access this resource. You could configure private endpoint connections that will be the exclusive way to access this resource.
With this option, you can create your own endpoints, just like with an API. The downside: it takes a lot more configuration than an API does.

I selected the first option.

Identity

With this tab, we can allow this new resource to grant access to other resources. This is a bit of a weird tab with an illogical explanation from Microsoft Azure, and I must admit I have never used it. I just keep the status off and move on.

If you do want to use this identity, set it to on and add your identities. Make sure to create them before you assign them… You can’t assign something that isn’t there, right?

Review & Create

Let’s click Create if all went well. This could take a while. You are not done yet! After the resource is created and up and running, we need to extract some information from it. That information allows a C# application to connect to the resource.

Endpoint and Key

To let our C# application communicate with the Computer Vision resource, we need to tell it what the endpoint of the resource is and authenticate with a key. You can find this via the overview of the just-made Computer Vision Resource.

It isn’t really hard to find the endpoint, it’s just there. On the right, in the middle. Copy it and save it for later.

The key, or keys, isn’t hard to find either. Just under the endpoint is “Manage Keys:” with a link behind it. Click this and you will be presented with a new screen, which also shows the endpoint… Again.

Simply click on the “Show Keys” button and the keys will be shown. You only need one key, so copy one and save it with the endpoint. You will be needing it later.

The endpoint is the location of the resource on the internet. The key is authentication. This way, not everyone can call the endpoint and make use of your resource(s).

That’s it! The Computer Vision resource is now created and running. We gathered the required information. Let’s move on to some C# programming.

The Console Application

To let a console app read the receipt, we need a receipt. You can use your own. Just take a picture of a receipt you have and save it on your drive. If you don’t have a receipt, you can use mine:

As you can see the receipt isn’t really nice. A bit unclear, with some wrinkles and smudged ink. Just the way I like it! Because this could be a challenge. Just remember: There is no perfect picture of a receipt or other piece of paper. I also took the picture with some margin, so you see some shadow too.

Let’s create the project. I create a new Console App with C# and .NET 6, nothing fancy, and call the project ReceiptReaderDemo. The path to the image should come from the command line arguments and not be hard-coded. This way I can test multiple images.

The first thing I do is remove the scaffolded code from program.cs. I will come back to this file later. Next, I create a new class called ReceiptReader in ReceiptReader.cs. This will be the class that handles everything: loading the image from my hard disk, sending it to Azure, and returning the text from Azure.

A Class for Reading

First, I declare two read-only fields at the top of my class. These two will hold the subscription key and the endpoint I got from Azure when I created the Computer Vision resource.

readonly string subscriptionKey = "e43cc17b60774baebc42fec1baac22d7";
readonly string endpoint = "https://receiptreadercv.cognitiveservices.azure.com/";

Personally, I would place this kind of information in a configuration file, like appsettings.json. But I don’t have that here and it’s not in the scope of this article.
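If you would rather not keep the key in source code at all, one lightweight option is to read both values from environment variables. The following is only a sketch under assumptions: the helper names LoadVisionSettings and ReadRequired and the variable names COMPUTER_VISION_KEY and COMPUTER_VISION_ENDPOINT are made up for this example and are not part of the Azure SDK.

```csharp
using System;

// Loads the key and endpoint through a lookup function, so the source of the
// values (environment variables, a test double, ...) can be swapped.
// The variable names are hypothetical; use whatever fits your setup.
static (string Key, string Endpoint) LoadVisionSettings(Func<string, string?> getVariable)
{
    return (ReadRequired("COMPUTER_VISION_KEY", getVariable),
            ReadRequired("COMPUTER_VISION_ENDPOINT", getVariable));
}

// Reads one required value and fails early with a clear message when it is missing.
static string ReadRequired(string name, Func<string, string?> getVariable)
{
    string? value = getVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException($"Setting '{name}' is not set.");
    return value.Trim();
}

// Demo with a fake lookup; in the real app pass Environment.GetEnvironmentVariable.
var (key, endpoint) = LoadVisionSettings(
    name => name.EndsWith("KEY") ? "<your-key>" : "https://<your-resource>.cognitiveservices.azure.com/");
Console.WriteLine($"Endpoint: {endpoint}, key length: {key.Length}");
```

Passing the lookup as a function keeps the helper easy to test. In the ReceiptReader class, the two fields could then be filled from LoadVisionSettings(Environment.GetEnvironmentVariable) in a constructor.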

Before I can do anything with Azure Computer Vision, I need to install a package. This package contains everything I need. Let’s install this package.

Install-Package Microsoft.Azure.CognitiveServices.Vision.ComputerVision

After the installation, we can authenticate with the key and endpoint and let Azure convert the image to text. The ReceiptReader class needs using directives for Microsoft.Azure.CognitiveServices.Vision.ComputerVision, its Models namespace, and System.Text at the top of the file. I create a new method in the ReceiptReader class: Authenticate. This method sets up the client with the subscription key that Azure uses to authenticate the application. The result is a client that can send the image to Azure Computer Vision to be processed. The method looks like this:

private ComputerVisionClient Authenticate()
{
    ComputerVisionClient client = 
            new(new ApiKeyServiceClientCredentials(subscriptionKey))
    {
        Endpoint = endpoint
    };
    return client;
}

First, I initialize a new ComputerVisionClient, which handles the communication between my Computer Vision resource and my application. The constructor needs some sort of authentication, so I give it a newly initialized ApiKeyServiceClientCredentials, which wraps my subscription key. I also set the endpoint on the ComputerVisionClient, so it knows where my Computer Vision API is living.

Now that we can authenticate, we can process a file. I create another async private method and call it ProcessFile. This is a pretty big method, so I will show it to you first.

private async Task<string> ProcessFile(ComputerVisionClient client, string pathToFile)
{
    using FileStream stream = File.OpenRead(pathToFile); // disposed when the method ends
    ReadInStreamHeaders textHeaders = await client.ReadInStreamAsync(stream);

    await Task.Delay(2000); // wait without blocking the thread

    string operationLocation = textHeaders.OperationLocation;
    string operationId = operationLocation[^36..];

    ReadOperationResult results;

    do
    {
        results = await client.GetReadResultAsync(Guid.Parse(operationId));
    }
    while (results.Status == OperationStatusCodes.Running || results.Status == OperationStatusCodes.NotStarted);

    IList<ReadResult> textUrlFileResults = results.AnalyzeResult.ReadResults;

    StringBuilder sb = new();
    foreach (ReadResult page in textUrlFileResults)
    {
        foreach (Line line in page.Lines)
        {
            sb.AppendLine(line.Text);
        }
    }
    return sb.ToString(); // AppendLine already ended each line with a new line
}

Looks like a lot of code, right? I’ll walk you through it. Ready? Here…We… Go!

Line 3 opens the image into a FileStream. Line 4 sends that stream to my Computer Vision resource (the endpoint, remember?). This could take a few moments.

Line 6 makes the application wait for 2 seconds because my resource needs some time to handle the image.

I have only sent the image to be processed, I don’t get anything back… Well, I get an OperationLocation. This is basically a URI to my API. This URI contains a GUID, which is the identifier of my processed image. I need to get that GUID and use it to get the results. Lines 8 and 9 retrieve this GUID, which is 36 characters long.
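Slicing off the last 36 characters works as long as the URI really ends with the GUID. As a hedged alternative, the sketch below takes the last path segment of the URI and lets Guid.TryParse validate it. ExtractOperationId is a hypothetical helper name, and the URL is made up for illustration only.

```csharp
using System;

// Pulls the operation ID out of an operation location URI by taking the last
// path segment instead of counting characters from the end.
static Guid ExtractOperationId(string operationLocation)
{
    var uri = new Uri(operationLocation, UriKind.Absolute);
    // Segments excludes the query string; a trailing slash is trimmed off.
    string lastSegment = uri.Segments[^1].TrimEnd('/');
    if (!Guid.TryParse(lastSegment, out Guid operationId))
        throw new FormatException($"No operation ID found in '{operationLocation}'.");
    return operationId;
}

// Made-up URL, only meant to resemble an operation location.
Guid id = ExtractOperationId(
    "https://example.cognitiveservices.azure.com/vision/v3.2/read/analyzeResults/0f8fad5b-d9cb-469f-a165-70867728950e");
Console.WriteLine(id); // prints 0f8fad5b-d9cb-469f-a165-70867728950e
```

Because the helper returns a Guid, its result can be passed to GetReadResultAsync without the extra Guid.Parse call.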

On line 11, I declare a ReadOperationResult variable. This will hold the result from my Computer Vision resource: the actual text.

Because Azure could still be processing my image, I created a do-while loop. It should repeat the loop as long as the status of my ReadOperationResult is running or if it has not been started. Line 13 starts the do. Line 15 gets the result of my process from Azure via the client, using the identifier of the operation, gotten from line 9.

If all goes well (on my machine it will), results.AnalyzeResult will contain ReadResults. These are the pages of text that the Computer Vision resource has found. On line 19, I place those pages in a list of ReadResult.

I want to iterate through the pages and get the lines. The following can be done with less code, but that would make it harder to understand.
A StringBuilder on line 21 will hold all the lines of text. Line 22 starts the foreach-loop that will iterate through the pages. Line 24 initiates a loop through the lines of the page. I add the lines of text to the StringBuilder on line 26.

When all is done, I return the collected lines of the StringBuilder as one string, with each line of text on its own line.
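One caveat about the do-while loop in ProcessFile: it asks Azure for the result again immediately, without pausing between calls, and it never gives up. With the 20 calls per minute of the free tier, that can hurt. A minimal sketch of a gentler polling helper could look like this. PollUntilAsync is a hypothetical helper, not part of the Azure SDK, and the status strings in the demo only stand in for the real status codes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Calls fetch repeatedly, pausing between attempts, until isDone returns true
// or the timeout elapses.
static async Task<T> PollUntilAsync<T>(
    Func<Task<T>> fetch,
    Func<T, bool> isDone,
    TimeSpan delay,
    TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    while (true)
    {
        T result = await fetch();
        if (isDone(result))
            return result;
        // Throws TaskCanceledException once the timeout has elapsed.
        await Task.Delay(delay, cts.Token);
    }
}

// Stand-in for the Azure call: reports "running" twice, then "succeeded".
int calls = 0;
string status = await PollUntilAsync(
    () => Task.FromResult(++calls < 3 ? "running" : "succeeded"),
    s => s == "succeeded",
    delay: TimeSpan.FromMilliseconds(50),
    timeout: TimeSpan.FromSeconds(5));
Console.WriteLine($"{status} after {calls} calls"); // prints "succeeded after 3 calls"
```

In ProcessFile, fetch would wrap the call to client.GetReadResultAsync, and isDone would check that the status is neither Running nor NotStarted. A delay of about a second is an assumption on my side, so tune it to your pricing tier.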

Alright, all we now have to do is create a public method that can be called from the Console App. I create a new async public method called Read. It will receive the parameter pathToFile, which is the path to the image we want to read. This method returns a string. It will get the client from the Authenticate method and then retrieve the text from the ProcessFile method.

public async Task<string> Read(string pathToFile)
{
    ComputerVisionClient client = Authenticate();

    return await ProcessFile(client, pathToFile);
}

The ReceiptReader class is now done and ready to do some work. Let’s move back to the program.cs.

Filling the program.cs

This does not contain a lot of code. It just checks whether the file exists, initializes the ReceiptReader class, retrieves the text from that class, and shows it on the screen.

using ReceiptReaderDemo;

string path = Environment.GetCommandLineArgs()[1];

if (!File.Exists(path))
    throw new FileNotFoundException($"File not found at {path}");

Console.WriteLine("Reading receipt...");
ReceiptReader receiptReader = new();
string content = await receiptReader.Read(path);
Console.WriteLine("\tSucceeded");

Console.WriteLine(content);

I don’t think this needs much explanation, right?

Oh, one thing though! If you are running this app from Visual Studio, you can’t enter the path to the file… Or can you?

Command Line Arguments

There is a way to add command line arguments to a console app in Visual Studio. Go to the properties of the console app (right-click Project -> Properties), expand Debug, and click General. Next, click on “Open debug launch profiles UI”. This gives you a new window.

The textbox under “Command line arguments” is where you can add the path to the image you want to read. Make sure you surround it with quotes.

This only works when you run the application in Visual Studio. When you release your console app or run it from the command prompt, you need to enter the path yourself.
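If you forget the path altogether, Environment.GetCommandLineArgs()[1] throws an IndexOutOfRangeException before the file check is even reached. A small guard can turn that into a friendly message. This is only a sketch: TryGetImagePath is a hypothetical helper, and the usage text is my own wording.

```csharp
using System;
using System.IO;

// Returns true and the full path when a usable path was given; otherwise
// returns false and a message explaining what is wrong.
static bool TryGetImagePath(string[] args, out string path, out string error)
{
    path = string.Empty;
    error = string.Empty;

    if (args.Length == 0 || string.IsNullOrWhiteSpace(args[0]))
    {
        error = "Usage: ReceiptReaderDemo <path-to-image>";
        return false;
    }

    path = Path.GetFullPath(args[0]);
    if (!File.Exists(path))
    {
        error = $"File not found at {path}";
        return false;
    }
    return true;
}

// Element 0 of GetCommandLineArgs is the executable itself, so skip it.
string[] arguments = Environment.GetCommandLineArgs()[1..];
if (TryGetImagePath(arguments, out string imagePath, out string problem))
    Console.WriteLine($"Reading receipt from {imagePath}...");
else
    Console.Error.WriteLine(problem);
```

This keeps program.cs free of exceptions for the two most likely mistakes: no argument at all and a path that does not exist.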

The Result

We have now combined Azure Computer Vision with some C# code. All this code and explanation sounds cool and all, but does it really work? Of course! If not, make sure you got it all right (I know you copied the code and just pasted it into your own project if you followed this article). If you do find a problem, let me know in the comments and I will fix it.

But if all goes well, the result should look something like this:

Conclusion

There are so many ways of converting images to text. I happen to know C# and Azure. Combine those two and you get a great tool to get text from an image. I am sure other languages work just as well, or maybe even better.

Staying with C#: I did try other techniques too, but most of them didn’t come close to the Computer Vision resource of Azure Cloud. I am not an Azure Cloud fan, but this works very smoothly.

All you need is a subscription, a Computer Vision resource, and some code.

In the introduction, I said I want to make an application that can remove some of the manual 'labor' when adding costs to a budget app. This article doesn't show it all, because there is still a lot of work to be done. But this is the first step. The next step is to identify the fields that need to be saved: the amount, the date, and a description. That article will follow soon.
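As a small taste of that next step, here is a rough sketch of how the total amount could be picked out of the returned text. It is not the approach of the follow-up article. It assumes that the label "Total" and the amount end up on the same line, which OCR output does not guarantee. FindTotal is a hypothetical helper and the sample text is invented, not the output of a real receipt.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Looks for a line that starts with "total" (so "Subtotal" is skipped) and
// ends with an amount such as 4.49 or 4,49. Returns null when nothing matches.
static decimal? FindTotal(string ocrText)
{
    Match match = Regex.Match(
        ocrText,
        @"^\s*total\b[^\d\r\n]*(\d+[.,]\d{2})\s*$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
    if (!match.Success)
        return null;

    // Normalize a decimal comma to a decimal point before parsing.
    string amount = match.Groups[1].Value.Replace(',', '.');
    return decimal.Parse(amount, NumberStyles.Number, CultureInfo.InvariantCulture);
}

// Invented sample text, shaped like the line-by-line output of the reader.
string sample = "SuperMart\nMilk 1.99\nBread 2.50\nSubtotal 4.49\nTotal: 4.49\nThank you";
decimal? total = FindTotal(sample);
Console.WriteLine(total?.ToString(CultureInfo.InvariantCulture) ?? "No total found");
```

Real receipts are messier than this, so treat the regular expression as a starting point rather than a finished parser.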

History

13th November, 2022: Initial version