Number Recognition with MLP Neural Network

mohammad farahi

May 23, 2017

CPOL

English number recognition with a multilayer perceptron (MLP) neural network

Introduction

This project recognizes numbers with a multilayer perceptron (MLP) and introduces some new ways to extract features from pictures.

The project takes some data (here, pictures of numbers) and trains an MLP neural network on it. After training, we test the network with other numbers, and it tells us which number each one is.

Background

Neural networks
Artificial intelligence
MATLAB

Using the Code

In this project, we have 10 different fonts of numbers for learning, and we extract a feature vector of 26 features from each picture of a number.

The features include:

Density of colors in the 4 parts of the picture
Ratio of black to white in the 4 parts of the picture
Presence of horizontal and vertical lines
Presence of a hole in the picture
Number of up, down, left and right moves made when writing the number, found with chain code
Conversion of the number to a 9-segment number
Tangent of the summed angles computed from the number of black pixels in each row and column

These are the features that we want to teach to our neural network.

This project includes the Main script and the "pref", "Extract Number", "Extract Features" and "chain code" functions, each of which does a specific job that we explain below. We also use the data from "datas" and the targets from "target", which we extracted with the "input data" function.

Input datas()

This function takes the number 1 as its argument to run. It passes the address of each picture to "Extract Features" and saves the returned features. Finally, it collects all the features in one matrix and returns it as the output.

Datas

This data includes all the feature vectors that we saved from "input data", so we do not need to extract them again on every run, which saves time.

We have two different sets of data in it:

num: the feature vectors of 10 different fonts of numbers (0 to 9), 100 numbers in total
Num_half: half of the "num" feature vectors

Target

Our targets come in four variants that combine two encodings, the "binary" type and the "one by one" type (one output per number), with two data sizes:

Target100: 10 different fonts, encoded "one by one"
target100: 10 different fonts, encoded as binary with 4 bits
Target50: 5 different fonts, encoded "one by one"
target50: 5 different fonts, encoded as binary with 4 bits

Tip: Pay attention to the capital and small letters in the target names.

To train the network on 10 fonts, use "Target100" or "target100" as the target argument; to train it on 5 fonts (half), use "Target50" or "target50".
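
The two target encodings can be illustrated with a minimal sketch. The label order, the most-significant-bit-first layout and all variable names are assumptions made for illustration and are not taken from "targets.mat"; only the convention that output 10 stands for the number 0 comes from the Main script.

```matlab
% Hypothetical label order for one font: the numbers 1..9 followed by 0.
labels = [1 2 3 4 5 6 7 8 9 0];
n = numel(labels);

% "One by one" encoding: 10 outputs, exactly one of them is 1 per column.
rowIdx = labels;
rowIdx(rowIdx == 0) = 10;            % output 10 stands for the number 0
oneByOne = zeros(10, n);
oneByOne(sub2ind([10 n], rowIdx, 1:n)) = 1;

% "Binary" encoding: 4 outputs per column, most significant bit first (assumed).
binary4 = zeros(4, n);
for c = 1:n
    binary4(:, c) = bitget(labels(c), 4:-1:1)';
end

disp(oneByOne(:, 10)')   % column for the number 0
disp(binary4(:, 5)')     % column for the number 5
```

The "one by one" form needs 10 network outputs and the binary form only 4, which is why the two target families cannot be mixed in one training run.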

Chain Code

This function takes a binary picture and simulates handwriting in one of two modes, "8 ways" or "4 ways". In other words, if we use 4 ways, chain code returns numbers from 1 to 4, which mean:

1 is move up
2 is move right
3 is move down
4 is move left

With this function, we simulate handwriting and count the number of up, down, left and right moves made while drawing the number on paper.
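
As a rough illustration of the 4-way idea, the sketch below turns an ordered list of boundary points into the codes 1 to 4 listed above and counts each direction. It is not the project's "chain code" function; the point list and the variable names are hypothetical.

```matlab
% Hypothetical closed boundary, given as [row col] points in drawing order.
pts = [3 1; 2 1; 1 1; 1 2; 1 3; 2 3; 3 3; 3 2; 3 1];

d = diff(pts);                       % step between consecutive points
code = zeros(size(d, 1), 1);
code(d(:,1) == -1 & d(:,2) ==  0) = 1;   % row decreases: move up
code(d(:,1) ==  0 & d(:,2) ==  1) = 2;   % column increases: move right
code(d(:,1) ==  1 & d(:,2) ==  0) = 3;   % row increases: move down
code(d(:,1) ==  0 & d(:,2) == -1) = 4;   % column decreases: move left

% counts(k) is the number of moves with code k (up, right, down, left).
counts = accumarray(code, 1, [4 1]);
disp(counts')
```

For this small square every direction is used twice, so the four counts are equal; for a real digit the imbalance between the counts is what carries information.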

Extract Features

This function takes the address of a picture as its input argument and returns all of the features described above as its output. Below, we explain each part of the code.

First of all, in lines 2 to 9, we convert the picture to grayscale and then to binary, and then we reshape it and compress it.

I = imread(address);

% change to grayscale
Igray = rgb2gray(I);
% change to black and white (threshold chosen by graythresh)
Ibw = im2bw(Igray,graythresh(Igray));
% flatten the picture (10000 pixels) column by column
T=reshape(Ibw,10000,1);
% shrink the picture to a quarter of its size
Ibw_comp=resizem(Ibw,0.25);

Then, in lines 13 to 56, we divide the picture into 4 parts and find the density of colors and the ratio of black to white in each part. (This creates the first 8 features.)

k=0;
j=0;
for i=1:2500
    if(T(i,1)== 1)
        k=k+1;%white
    else
        j=j+1;%black
    end
end
ch1=j/2500;   % density of black pixels in part 1
w_b1=j/k;     % ratio of black to white in part 1
j=0;
k=0;
for i=2501:5000   % start at 2501 so that the parts do not overlap
    if(T(i,1)== 1)
        k=k+1;%white
    else
        j=j+1;%black
    end
end
ch2=j/2500;
w_b2=j/k;
j=0;
k=0;
for i=5001:7500
    if(T(i,1)== 1)
        k=k+1;%white
    else
        j=j+1;%black
    end
end
ch3=j/2500;
w_b3=j/k;
j=0;
k=0;
for i=7501:10000
    if(T(i,1)== 1)
        k=k+1;%white
    else
        j=j+1;%black
    end
end
ch4=j/2500;
w_b4=j/k;
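
The same eight values can be computed without explicit loops. The following is a minimal sketch on a synthetic picture, assuming a 100x100 logical image in which black pixels are 0; the test image and the names Q, black and white are illustrative only. Because reshape works column by column, each of the 4 parts is a vertical strip of 25 columns.

```matlab
% Synthetic 100x100 picture: white everywhere, 10 black columns on the left.
Ibw = true(100, 100);
Ibw(:, 1:10) = false;

T = reshape(Ibw, [], 1);        % flatten column by column
Q = reshape(T, 2500, 4);        % column p of Q holds the pixels of part p

black = sum(~Q, 1);             % black pixels per part
white = sum(Q, 1);              % white pixels per part
ch  = black / 2500;             % density of black pixels per part
w_b = black ./ white;           % ratio of black to white per part

disp(ch)
disp(w_b)
```

If a part contains no white pixels, the ratio becomes Inf (or NaN for an empty part), so such values may need to be capped before they are fed to the network.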

In lines 59 to 92, we count the number of black pixels in each row and each column and take the arc-tangent of their ratio. After that, we add up all the angles and, finally, take the tangent of the sum to find the line slope. (This creates 2 more features.)

k=0;
j=0;
% teta(i,1): number of black pixels in row i
for i=1:25
    for j=1:25
        if (Ibw_comp(i,j)==0)
            k=k+1;
        end
    end
    teta(i,1)=k;
    k=0;
end
k=0;
% teta(i,2): number of black pixels in column i
for i=1:25
    for j=1:25
        if (Ibw_comp(j,i)==0)
            k=k+1;
        end
    end
    teta(i,2)=k;
    k=0;
end
% teta(i,3): column-to-row ratio, teta(i,4): its arc-tangent
for i=1:25
    teta(i,3)=(teta(i,2)/teta(i,1));
    teta(i,4)=atan(teta(i,3));
end
% replace NaN values (from 0/0) with 0
for i=1:25
    for j=1:4
        if (isnan(teta(i,j))==1)
            teta(i,j)=0;
        end
    end
end
teta(26,4)=sum(teta(:,4));    % sum of all angles
teta(27,4)=tan(teta(26,4));   % tangent of the summed angle
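
To make the arithmetic easier to follow, here is a vectorized sketch of the same two features on a synthetic 25x25 picture. The test image and the variable names are assumptions for illustration; as in the code above, row i is paired with column i and NaN values (from 0/0) are replaced by 0.

```matlab
% Synthetic 25x25 picture: a single vertical stroke in column 12.
Icomp = true(25, 25);
Icomp(5:20, 12) = false;

rowBlack = sum(~Icomp, 2);       % 25x1, black pixels in each row
colBlack = sum(~Icomp, 1)';      % 25x1, black pixels in each column

ratio = colBlack ./ rowBlack;    % column i divided by row i
ang = atan(ratio);
ang(isnan(ang)) = 0;             % 0/0 gives NaN, treated as angle 0

totalAngle = sum(ang);           % corresponds to teta(26,4)
slope = tan(totalAngle);         % corresponds to teta(27,4)
disp([totalAngle slope])
```

In this example only index 12 contributes (16 black pixels in column 12 against 1 in row 12), so the slope comes out close to 16.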

From lines 96 to 114, we count the black pixels in each column. If the largest count is more than our threshold, we conclude that there is a vertical line in the picture and record it by setting the feature to "1".

Tip: '0' means there is no vertical line and '1' means there is a vertical line in the picture.

t=0;
n=0;
j=1;
for i=1:100
    for j=1:100
        if(Ibw(j,i)==0)
            n=n+1;
            t(i)=n;   % black pixels counted so far in column i
        end
    end
    n=0;
end
n=max(t);

if(n>79)
    ver=1;
else
    ver=0;
end

From lines 118 to 137, we likewise count the black pixels in each row. If the largest count is more than our threshold, we conclude that there is a horizontal line in the picture and record it by setting the feature to "1".

Tip: As in the previous tip, '0' means there is no horizontal line and '1' means there is a horizontal line in the picture.

For example, number 1 has a vertical line and number 4 has a horizontal line.

(This creates 2 more features.)

t=0;
n=0;
j=1;
for i=1:100
    for j=1:100
        if(Ibw(i,j)==0)
            n=n+1;
            t(i)=n;   % black pixels counted so far in row i
        end
    end
    n=0;
end
n=max(t);

if(n>40)
    har=1;
else
    har=0;
end
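
Both line tests reduce to a column sum or a row sum followed by a threshold. The sketch below shows this on a synthetic 100x100 picture with the thresholds 79 and 40 used above; the test image and the names colBlack and rowBlack are illustrative assumptions.

```matlab
% Synthetic 100x100 picture: one vertical stroke, 86 pixels tall.
Ibw = true(100, 100);
Ibw(10:95, 50) = false;

colBlack = sum(~Ibw, 1);     % black pixels in each column
rowBlack = sum(~Ibw, 2);     % black pixels in each row

ver = double(max(colBlack) > 79);   % 1 if some column is mostly black
har = double(max(rowBlack) > 40);   % 1 if some row has a long black run

disp([ver har])
```

Note that the test counts all black pixels in a row or column, not only consecutive ones, so several short strokes on the same line can also trigger it.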

From lines 140 to 167, we simulate the handwriting with chain code and use the total number of moves in each direction as features. (This creates 4 more features: up, down, left and right.)

B = bwboundaries(Ibw,4); % find the boundaries of all objects

CC = cell(1, length(B)); % pre-allocate

for k = 1:length(B)
   CC{k} = chaincode(B{k},1); % chain code for the k'th object
end
up=0;
down=0;
left=0;
right=0;
% the codes tested here are 0 (right), 2 (up), 4 (left) and 6 (down)
for i=1:length(B)
    t=max(size(CC{1,i}.code));
    for j=1:t
        if(CC{1,i}.code(j,1)==0)
            right=right+1;
        elseif(CC{1,i}.code(j,1)==2)
            up=up+1;
        elseif(CC{1,i}.code(j,1)==4)
            left=left+1;
        elseif(CC{1,i}.code(j,1)==6)
            down=down+1;
        end
    end
end

From lines 171 to 292, we convert the number to our 9-segment number.

We divide the picture into 9 parts. In each part, if the number of black pixels is more than the threshold (here 150), the segment is black and is set to 1.

We do this for each of the 9 segments, so that the number looks like a digit on a 7-segment display. This creates 9 more features.

alfa=150;
k=0;
for i=1:33
    for j=1:33
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg1=1;
else
    seg1=0;
end
k=0;
%%------------------------
for i=1:33
    for j=33:66
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg2=1;
else
    seg2=0;
end
k=0;
for i=1:33
    for j=66:99
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg3=1;
else
    seg3=0;
end
k=0;
for i=33:66
    for j=1:33
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg4=1;
else
    seg4=0;
end
k=0;
for i=33:66
    for j=33:66
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg5=1;
else
    seg5=0;
end
k=0;
for i=33:66
    for j=66:99
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg6=1;
else
    seg6=0;
end
k=0;
for i=66:99
    for j=1:33
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg7=1;
else
    seg7=0;
end
k=0;
for i=66:99
    for j=33:66
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg8=1;
else
    seg8=0;
end
k=0;
for i=66:99
    for j=66:99
        if(Ibw(i,j)==0)
            k=k+1;
        end
    end
end
if(k>alfa)
    seg9=1;
else
    seg9=0;
end
seg10=[seg1 seg2 seg3
       seg4 seg5 seg6
       seg7 seg8 seg9];
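
The nine blocks above follow one pattern, so they can be sketched as a double loop. This sketch keeps the same index ranges (1:33, 33:66 and 66:99) and the same threshold; the synthetic test image and the names edges and seg are assumptions for illustration.

```matlab
% Synthetic 100x100 picture: a black blob in the top-middle region.
Ibw = true(100, 100);
Ibw(1:33, 40:60) = false;

alfa = 150;                        % threshold on the black-pixel count
edges = [1 33; 33 66; 66 99];      % same ranges as the nine blocks above

seg = zeros(3, 3);
for r = 1:3
    for c = 1:3
        block = Ibw(edges(r,1):edges(r,2), edges(c,1):edges(c,2));
        seg(r, c) = double(sum(~block(:)) > alfa);
    end
end
disp(seg)
```

Here seg(1,2) is the only segment set to 1. The matrix seg plays the role of seg10, and reshape(seg', 1, 9) gives the values in the order seg1 to seg9.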

In line 295, we determine whether our picture has a hole and record the result as a feature. (This creates 1 more feature.)

Tip: 1 means there is a hole in the picture and 0 means there is no hole in the picture. (For example, number 8 has a hole but number 1 does not.) Note that bweuler returns the Euler number of the binary image (the number of objects minus the number of holes), so the value is not strictly limited to 0 and 1.

hole=bweuler(Ibw);

Finally, in line 298, we collect all the features in one row vector and return it as the output argument.

chall=[ch1 ch2 ch3 ch4 w_b1 w_b2 w_b3 w_b4 ver har hole up down left right ...
       seg1 seg2 seg3 seg4 seg5 seg6 seg7 seg8 seg9 teta(26,4) teta(27,4)];

Main:

This is the main part of our code, where we run the project.

In lines 6 and 8, we load the inputs and targets.

% load data
load('datas.mat');
% load targets
load ('targets.mat'); 
input=num;
target=Target100;

As I mentioned, we have 4 types of target, and we tell the script which encoding is in use with the "type" parameter.

Tip: If type=1, our target has to be the "one by one" type, and if type=0, our target has to be the "binary" type.

type=1;

Next, we create our neural network in a variable named "newff" and train it with our data and targets.

newff=feedforwardnet([10 10],'trainlm');
  newff=train(newff,input,target);

In line 29, we test the number 1. If our network says the number is 1, it works well.

  r= newff(num1);     %<==Number for test

So we test it, but the network outputs numbers between 0 and 1, and we have to post-process them (with a threshold of 0.5 for the binary type) to find our answer.

Lines 31 to 39 show how we interpret the output if our type is "one by one" (the output with the maximum similarity is the answer).

if(type==1)
    % index of the largest output; avoids shadowing the built-in max
    [~,i]=max(r(:,1));
    disp('your number is:')
    disp(i)
end

%%check the answer:
% i=10 =>number is 0 % i=5 =>number is 5
% i=1 =>number is 1 % i=6 =>number is 6
% i=2 =>number is 2 % i=7 =>number is 7
% i=3 =>number is 3 % i=8 =>number is 8
% i=4 =>number is 4 % i=9 =>number is 9
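
The index-to-number table above can be folded into one line with mod, since index 10 stands for the number 0. A minimal sketch with a made-up output vector (the values of r are hypothetical, not real network output):

```matlab
% Hypothetical 10x1 network output for one test picture.
r = [0.05 0.10 0.02 0.03 0.01 0.04 0.02 0.08 0.03 0.90]';

[~, idx] = max(r);        % position of the strongest output
number = mod(idx, 10);    % indices 1..9 stay as they are, index 10 becomes 0

disp(number)
```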

Lines 45 to 56 show how we interpret the output if our type is "binary" (if an output is more than the threshold of 0.5, we change it to 1; otherwise, we change it to 0). After that, we send the result to the "FindNumber" function, which tells us what the number is.

if(type==0)
    for i=1:4
        if(r(i,1)>0.5)
            r(i,1)=1;
        else
            r(i,1)=0;
        end
    end
    
    FindNumber(r)
 end
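
The decoding step can be sketched as follows. This is not the project's "FindNumber" function, whose source is not shown here; the sketch assumes the 4 bits are ordered from most to least significant, and the values of r are made up.

```matlab
% Hypothetical 4x1 network output for one test picture.
r = [0.1; 0.8; 0.2; 0.9];

bits = double(r > 0.5);        % threshold each output at 0.5
number = [8 4 2 1] * bits;     % weight the bits, most significant first (assumed)

disp(number)
```

With 4 bits the result can be anything from 0 to 15, so values above 9 would have to be treated as "not recognized".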

In lines 60 to 111, we measure the performance on one font (10 numbers from 0 to 9).

disp('performance by test num from 0 to 9 (10 number)')
per=0;
r= newff(num0);
r=ExtractNumber(1,r);
if(r==10)
    per=per+1;
end
r= newff(num1);
r=ExtractNumber(1,r);
if(r==1)
    per=per+1;
end
r= newff(num2);
r=ExtractNumber(1,r);
if(r==2)
    per=per+1;
end
r= newff(num3);
r=ExtractNumber(1,r);
if(r==3)
    per=per+1;
end
r= newff(num4);
r=ExtractNumber(1,r);
if(r==4)
    per=per+1;
end
r= newff(num5);
r=ExtractNumber(1,r);
if(r==5)
    per=per+1;
end
r= newff(num6);
r=ExtractNumber(1,r);
if(r==6)
    per=per+1;
end
r= newff(num7);
r=ExtractNumber(1,r);
if(r==7)
    per=per+1;
end
r= newff(num8);
r=ExtractNumber(1,r);
if(r==8)
    per=per+1;
end
r= newff(num9);
r=ExtractNumber(1,r);
if(r==9)
    per=per+1;
end

To measure the performance of our code, we should test all of our data, so we created the "ExtractNumber" function to use on all the data and the "pref" function to compute the performance.

Pref tests all the input data, counts the right and wrong answers and reports the performance of our code.
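
The counting idea can be sketched with a loop instead of one block per number. This is not the project's "pref" function; the matrix outputs is a hypothetical stand-in for the network responses (one column per test picture, "one by one" encoding), with one mistake inserted on purpose.

```matlab
% Expected output index for the test numbers 0,1,...,9 (index 10 means 0).
expected = [10 1 2 3 4 5 6 7 8 9];

% Stand-in for the network outputs: a perfect answer in every column ...
I10 = eye(10);
outputs = I10(:, expected);
% ... except the picture of 3, which is answered as 4 in this example.
outputs(:, 4) = outputs(:, 5);

per = 0;
for k = 1:numel(expected)
    [~, idx] = max(outputs(:, k));      % strongest output for picture k
    per = per + (idx == expected(k));   % count a correct answer
end
accuracy = per / numel(expected);
disp(accuracy)
```

In the real script, the columns of outputs would come from calls such as newff(num0) to newff(num9).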

You can test your own number in this project. Just write the address of the picture of your number in line 29 of the Main script, and you will see the network's answer.

In our test, we got a performance of more than 70%, and we expect better performance if the number of fonts used for learning is increased to 200 or more.

There is a document in the attached source code.

History

23rd May, 2017: Initial version