Input image

I need to extract the CR No. from the sample image above. Using EasyOCR, I get the output as a nested list. How can I update the code to filter out all the other detected text and numbers and keep only the CR No.? I tried to find the text "CR No" first and then take the value that follows it, but that does not work every time. I am running out of ideas, and any help will be appreciated.

The expected output is only: 211022203481161


What I have tried:

<pre lang="Python">import easyocr

IMAGE_PATH = '----/input7.jpg'
reader = easyocr.Reader(['en'])
# Each result is a (bounding_box, text, confidence) tuple.
result3 = reader.readtext(IMAGE_PATH)

my_list2 = []

# Look for the 'CR No' label and keep the entry that follows it.
# Stop one short of the end so result3[i + 1] always exists.
for i in range(len(result3) - 1):
    if result3[i][1] == 'CR No':
        print(result3[i])
        print(result3[i + 1])
        my_list2.append(result3[i + 1] + result3[i])
        print(my_list2)

if my_list2:
    print('The CR No is:', my_list2[0][1])
else:
    print('CR No label not found')
</pre>
Updated 23-Sep-22 0:46am
Comments
Richard MacCutchan 23-Sep-22 5:27am    
You need to analyse the results to find the required data. You cannot assume that it will always be returned at a fixed location such as result3[i][1].
M@153 23-Sep-22 6:01am    
I have analysed the result. It always comes back as a nested list in the format given below:

[([[212, 26], [314, 26], [314, 50], [212, 50]],
'SCB MEDICAL',
0.998906268787747),
([[36, 56], [84, 56], [84, 80], [36, 80]], '6ein8', 0.19502338570146513),
([[303, 99], [335, 99], [335, 119], [303, 119]], 'OPD', 0.9985000298670181),
([[23, 119], [59, 119], [59, 135], [23, 135]], 'CR No', 0.996245605687589),
([[55, 105], [182, 105], [182, 134], [55, 134]],
'211022203481161',
0.9774761960967909),
([[373, 129], [395, 129], [395, 141], [373, 141]], 'Sal', 0.3229167512055152),
([[401, 123], [461, 123], [461, 141], [401, 141]],
'29 years/M',
0.760099694251474)]

So my concern is how to filter this list to get the CR No.
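One way to filter the list without relying on position is to match on the shape of the value itself. The sketch below assumes the CR No. is always a long run of digits (the sample has 15); the function name `find_long_numbers` and the `min_digits` threshold are hypothetical and would need tuning against real slips.

```python
import re

# EasyOCR output copied from the comment above.
results = [
    ([[212, 26], [314, 26], [314, 50], [212, 50]], 'SCB MEDICAL', 0.998906268787747),
    ([[36, 56], [84, 56], [84, 80], [36, 80]], '6ein8', 0.19502338570146513),
    ([[303, 99], [335, 99], [335, 119], [303, 119]], 'OPD', 0.9985000298670181),
    ([[23, 119], [59, 119], [59, 135], [23, 135]], 'CR No', 0.996245605687589),
    ([[55, 105], [182, 105], [182, 134], [55, 134]], '211022203481161', 0.9774761960967909),
    ([[373, 129], [395, 129], [395, 141], [373, 141]], 'Sal', 0.3229167512055152),
    ([[401, 123], [461, 123], [461, 141], [401, 141]], '29 years/M', 0.760099694251474),
]


def find_long_numbers(results, min_digits=10):
    """Return recognised strings that consist only of digits and are
    at least min_digits long (assumption: the CR No. looks like this)."""
    pattern = re.compile(r"\d{%d,}" % min_digits)
    return [text.strip() for _bbox, text, _conf in results
            if pattern.fullmatch(text.strip())]


print(find_long_numbers(results))  # ['211022203481161']
```

This ignores the "CR No" label entirely, so it would also pick up any other long number printed on the slip.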
Richard MacCutchan 23-Sep-22 6:20am    
It looks like the returned data is a list of tuples. Each tuple contains a list of coordinate pairs, followed by a string and a numeric value. So you just need to iterate through the list and examine the fields of each tuple. If the data is always in exactly that format then it is easy to find what you are looking for.
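To make that structure concrete: each entry that EasyOCR's `readtext` returns is a (bounding box, text, confidence) tuple, so it can be unpacked directly in the loop. A small sketch using three entries from the output above; the helper name `summarise` is hypothetical.

```python
# Three entries copied from the EasyOCR output shown earlier.
results = [
    ([[23, 119], [59, 119], [59, 135], [23, 135]], 'CR No', 0.996245605687589),
    ([[55, 105], [182, 105], [182, 134], [55, 134]], '211022203481161', 0.9774761960967909),
    ([[373, 129], [395, 129], [395, 141], [373, 141]], 'Sal', 0.3229167512055152),
]


def summarise(results):
    """Return (text, confidence, top-left corner) for every detection."""
    rows = []
    for bbox, text, confidence in results:
        top_left = tuple(bbox[0])  # first of the four corner points
        rows.append((text, confidence, top_left))
    return rows


for text, confidence, top_left in summarise(results):
    print(f"{text!r:20} conf={confidence:.2f} at {top_left}")
```

Unpacking by name avoids index expressions like `result3[i][1]` and makes it obvious which field is being compared.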
M@153 23-Sep-22 6:30am    
I just need to extract only the CR No., i.e. 211022203481161, from the list. What changes need to be made to the code?
Richard MacCutchan 23-Sep-22 6:45am    
See my Solution below.

1 solution

Something like:
Python
mainlist = [([[212, 26], [314, 26], [314, 50], [212, 50]], 'SCB MEDICAL', 0.998906268787747), ([[36, 56], [84, 56], [84, 80], [36, 80]], '6ein8', 0.19502338570146513), ([[303, 99], [335, 99], [335, 119], [303, 119]], 'OPD', 0.9985000298670181), ([[23, 119], [59, 119], [59, 135], [23, 135]], 'CR No', 0.996245605687589), ([[55, 105], [182, 105], [182, 134], [55, 134]], '211022203481161', 0.9774761960967909), ([[373, 129], [395, 129], [395, 141], [373, 141]], 'Sal', 0.3229167512055152), ([[401, 123], [461, 123], [461, 141], [401, 141]], '29 years/M', 0.760099694251474)]
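# Each entry is a (bounding_box, text, confidence) tuple, so the value
# that follows the label is the text of the *next* entry in the list.
for index, (box, text, confidence) in enumerate(mainlist):
    if text == 'CR No' and index + 1 < len(mainlist):
        print(text, mainlist[index + 1][1])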
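If the label and its value are not always adjacent in the list, the bounding boxes can be used instead of list order. The following is only a sketch under several assumptions: the value sits on the same line as the label and starts at or to the right of the label's left edge, it consists of digits only, and the label may be read with slightly different punctuation or case. The names `find_cr_number`, `_normalise` and `_extent` are hypothetical helpers, not part of EasyOCR.

```python
import re

mainlist = [
    ([[212, 26], [314, 26], [314, 50], [212, 50]], 'SCB MEDICAL', 0.998906268787747),
    ([[36, 56], [84, 56], [84, 80], [36, 80]], '6ein8', 0.19502338570146513),
    ([[303, 99], [335, 99], [335, 119], [303, 119]], 'OPD', 0.9985000298670181),
    ([[23, 119], [59, 119], [59, 135], [23, 135]], 'CR No', 0.996245605687589),
    ([[55, 105], [182, 105], [182, 134], [55, 134]], '211022203481161', 0.9774761960967909),
    ([[373, 129], [395, 129], [395, 141], [373, 141]], 'Sal', 0.3229167512055152),
    ([[401, 123], [461, 123], [461, 141], [401, 141]], '29 years/M', 0.760099694251474),
]


def _normalise(text):
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def _extent(bbox):
    """Return (left, top, right, bottom) of a four-point bounding box."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


def find_cr_number(results, label="CR No"):
    """Return the digit string nearest to the label on the same line, or None."""
    wanted = _normalise(label)
    for bbox, text, _conf in results:
        if _normalise(text) != wanted:
            continue
        left, top, _right, bottom = _extent(bbox)
        candidates = []
        for other_bbox, other_text, _other_conf in results:
            digits = re.sub(r"\s", "", other_text)
            if not digits.isdigit():
                continue  # skip anything that is not purely numeric
            o_left, o_top, _o_right, o_bottom = _extent(other_bbox)
            same_line = min(bottom, o_bottom) - max(top, o_top) > 0
            if same_line and o_left >= left:
                candidates.append((o_left - left, digits))
        if candidates:
            return min(candidates)[1]  # smallest horizontal distance wins
    return None


print(find_cr_number(mainlist))  # 211022203481161
```

Returning `None` when nothing matches lets the caller decide how to handle a slip where the label was misread, rather than failing with an index error.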

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)