AI Inference Software Fundamentals: Getting Started with Optical Character Recognition

Raymond_Lo

Nov 1, 2022

CPOL

In this post, I will show you how you can get started with OCR using the machine learning platform TensorFlow and the Intel® Distribution of OpenVINO™ Toolkit.

AI/ML is opening a world of possibilities for developers to do new and exciting things with their applications. But if you really want to become a true AI developer, you must first understand the basics. A great foundational stepping stone is getting familiar with Optical Character Recognition (OCR). OCR might be a basic machine learning application (it’s been around since 1965!), but it’s a significant one for a few reasons:

It’s often the first machine learning problem you encounter in school due to its simplicity.
With a deep-learning Convolutional Neural Network (CNN), we can now achieve very high accuracy (an error rate as low as about 0.17%).
It runs efficiently on modern hardware like laptop CPUs with OpenVINO™.

OCR applications enable users to extract, convert, and repurpose data from documents and images — eliminating error-prone and time-consuming manual data entry. Beyond the research, this application has found its home in many industrial use cases today, from digitizing books and bank transactions to warehouse inventories.

Take the mail system, for example. It takes a lot of effort to sort through the never-ending influx of letters and packages. Without OCR, deliveries may get delayed or lost. But with OCR capabilities, the mail sorting process can be automated, resulting in more packages and letters being delivered on time. And to my previous point, believe it or not, OCR has been around and implemented by the USPS since 1965; check out this video from the National Archives and Records Administration to learn more.

Because of its versatility, OCR is a great learning tool for developers. In this post, I will show you how you can get started with OCR using the machine learning platform TensorFlow and the Intel® Distribution of OpenVINO™ Toolkit.

For this demo, we will run through a simple program that recognizes handwritten digits from the MNIST dataset and runs efficiently on widely available hardware like your CPU. (For a quick introduction to the theory of AI-based digit recognition, I highly recommend watching this tutorial from Grant Sanderson.)

Learning the Basics

You can find the full source code to today’s demo in a Kaggle notebook where it is formatted as a series of very short, numbered blocks.

For the sake of brevity, this post will walk through only the most significant snippets of the notebook’s code. But, of course, you can study the full notebook at your leisure by the block number and learn how we trained a neural network from scratch to achieve a level of accuracy not possible a decade ago.

In blocks 1 to 3, the notebook sets up the Python environment for TensorFlow. In blocks 4 to 14, it loads the MNIST dataset and uses it to train a neural network that can recognize handwritten digits. The new and exciting part that Intel offers today is how these models can be optimized to run more efficiently and quickly on Intel hardware.

The OpenVINO Runtime (Core) is loaded in block 15 with this import:

from openvino.runtime import Core

Since OpenVINO works on models in its own "Intermediate Representation" (IR) format, it is also necessary to declare where the TensorFlow model is saved, along with the name and data type (FP16, that is, 16-bit floating-point numbers) of the IR version:

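from pathlib import Path

model_name = "mnist"          # directory of the TensorFlow SavedModel
model_path = Path(model_name)
ir_data_type = "FP16"         # precision of the generated IR weights
ir_model_name = "mnist_ir"    # base name of the generated IR files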

OpenVINO can also use a single-precision model (FP32, with 32 bits per number instead of just 16), but as you will see in a moment, FP16 models, which run faster and consume less memory even on standard CPUs, still give very good results.
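To make the FP16 versus FP32 trade-off concrete, here is a minimal sketch that uses only Python's standard `struct` module, not OpenVINO itself. It stores one example value in each format and compares the storage size and the rounding error. The value 0.1 and the helper name `round_trip` are illustrative assumptions, not taken from the notebook.

```python
import struct

FP16 = "<e"  # IEEE 754 half precision (16 bits)
FP32 = "<f"  # IEEE 754 single precision (32 bits)


def round_trip(value: float, fmt: str) -> float:
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


weight = 0.1  # hypothetical model weight, chosen only for illustration

fp16_value = round_trip(weight, FP16)
fp32_value = round_trip(weight, FP32)

fp16_bytes = struct.calcsize(FP16)
fp32_bytes = struct.calcsize(FP32)

print(f"FP16: {fp16_bytes} bytes, rounding error {abs(fp16_value - weight):.2e}")
print(f"FP32: {fp32_bytes} bytes, rounding error {abs(fp32_value - weight):.2e}")
```

Each FP16 number takes half the memory of an FP32 number, at the cost of a larger (but still small) rounding error per value.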

The middle part of block 15 assembles and then runs the Model Optimizer command that generates the IR model:

mo_command = f"""mo
                 --saved_model_dir "{model_name}"
                 --input_shape "[28,28]"
                 --data_type "{ir_data_type}"
                 --output_dir "{model_path.parent}"
                 --model_name "{ir_model_name}"
                 """
mo_command = " ".join(mo_command.split())

# Run the Model Optimizer (overwrites the older model)
print("Exporting TensorFlow model to IR... This may take a few minutes.")
mo_result = %sx $mo_command
print("\n".join(mo_result))

Next, block 16 loads the topology (model_xml) and weights (model_bin) of the IR model into the OpenVINO inference engine:

# Load network to the plugin
ie = Core()
model = ie.read_model(model=model_xml)
compiled_model = ie.compile_model(model=model, device_name="CPU")
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

It then gives the model some digit images from MNIST to recognize:

# Test against a few images from the dataset
input_list = x_test[:10]
for input_image in input_list:
    res = compiled_model([input_image])[output_layer]
    X = input_image
    X = X.reshape([28, 28])
    plt.figure()
    plt.gray()
    plt.imshow(X)
    plt.text(0, -1, "The prediction is " + str(np.argmax(res[0])) + " @ " + str(max(res[0]) * 100) + "%")

In the notebook's output, all 10 digits are identified correctly, with a confidence higher than 99.99% in many cases and around 99% in the rest! If you had shown this accuracy to someone in the 90s, you would basically have been a wizard creating the impossible.
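To see where a label such as "The prediction is 7 @ 99.99%" comes from, here is a minimal, pure-Python sketch of the last step. It assumes the model's output for one image is a list of 10 class probabilities (a softmax output), which the notebook's use of `max(res[0]) * 100` suggests but which is not confirmed here. The raw scores and the helper names `softmax` and `describe_prediction` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(scores):
    """Return (digit, confidence in percent) for one image's scores."""
    digit = max(range(len(scores)), key=scores.__getitem__)  # argmax
    return digit, scores[digit] * 100


# Hypothetical raw scores for one image: class 7 clearly dominates.
logits = [0.0] * 10
logits[7] = 12.0

probs = softmax(logits)
digit, confidence = describe_prediction(probs)
print(f"The prediction is {digit} @ {confidence:.2f}%")
```

The index of the largest score is the predicted digit, and the score itself, read as a probability, is the confidence shown next to each image.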

Today, machine learning is opening doors to many applications that were once imaginable only in science fiction. If you look at how many of these difficult problems are finally solved and have moved mankind forward, perhaps it is time for you to be part of this revolution in AI computing — starting with the basic OCR 101. 😃

It’s always good to start from scratch, understand the fundamentals, and know how things work instead of just pushing magic buttons without knowing why, right?

What’s Next?

Now you’ve seen for yourself just how easy it is to write high-performing OCR code with OpenVINO.

To learn more about why and how OCR really is a foundational AI concept, check out this video by the Intel® AI Dev Team. You can also practice with more advanced but equally easy applications of OCR in our two notebooks about recognizing text in photographs and recognizing text with a webcam, even when it is moving!

As soon as you are ready to take your AI skills even further, visit the Intel AI Dev Team Adventures as we take these foundational concepts and solve real-world problems.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.