I want to extract only the standalone numbers from the following text:
RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171


What I have tried:

I've tried using the regular expression
[\d]{4,}
Although it extracts the numbers, it also extracts digits from the alphanumeric parts, which is not what I want.

I want
2013
1348
33790
956226
Updated 11-Jan-23 20:23pm
Comments
0x01AA 1-Jan-23 11:49am    
The first thing you have to do is define all the rules. As far as I can see, they are:
a.) Numbers on a line of their own
b.) Numbers delimited by '/' (except when the number is at the end of the line?)
c.) What else? E.g. numbers delimited by whitespace characters such as space, tab, ...? E.g. 'AlphaPart 3051 ,'
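Rules (a) and (b) can be sketched without a regular expression, as shown here. This is only an illustration under assumptions the question does not state: the helper name extract_numbers is hypothetical, a wanted number is taken to be a '/'-separated field made only of digits, and the four-digit minimum is borrowed from the pattern the question tried. Rule (c) is deliberately not covered.

```python
# Hypothetical helper illustrating rules (a) and (b) only.
def extract_numbers(text, min_digits=4):
    wanted = []
    for line in text.splitlines():
        # Splitting on '/' turns a whole-line number into a single field,
        # so both rules reduce to "a field that consists only of digits".
        for field in line.split('/'):
            field = field.strip()
            if field.isascii() and field.isdigit() and len(field) >= min_digits:
                wanted.append(field)
    return wanted


sample = """RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171"""

print(extract_numbers(sample))
```

With min_digits=4 the trailing fields 166 and 171 are skipped; whether they should be is exactly the kind of rule that still has to be pinned down.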
adriancs 1-Jan-23 23:36pm    
import re, then re.findall(r'\d+', datasource_string)
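As a quick check of that suggestion, here is what the plain \d+ pattern returns on two of the sample lines. It also picks up the digits embedded in the alphanumeric codes, so some extra condition would still be needed. The variable names are for illustration only.

```python
import re

lines = [
    "RTGS-VEERPAL/KARBH13267941306",
    "By Clg/33790/ICICI/DELCTS1/166",
]

# One list of matches per line; every run of digits is returned.
found = [re.findall(r'\d+', line) for line in lines]
print(found)
```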

Try running this example

Python
import re

data = """
RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171
"""

pat = re.compile(r'(?<!\w)\d{4,}')

for line in data.split('\n'):
    if (m := pat.search(line)):
        print(m.group(0))


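To get that solution I went to https://www.autoregex.xyz/ and entered the description "four or more digits not preceded by an upper case letter". Note that the generated pattern is slightly stricter than that description: the lookbehind (?<!\w) rejects digits preceded by any word character (letter, digit or underscore), not just upper case letters.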
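The loop above reports at most one number per line, because search stops at the first match. If a line could hold several wanted numbers, a possible variant is to use findall over the whole text and add a matching lookahead. This is a sketch, not part of the original answer: the names STANDALONE and standalone_numbers are hypothetical, and it assumes a wanted number has no word character on either side.

```python
import re

# Assumption: a wanted number has 4+ digits and no letter, digit or
# underscore immediately before or after it.
STANDALONE = re.compile(r'(?<!\w)\d{4,}(?!\w)')


def standalone_numbers(text):
    # findall returns every non-overlapping match, across all lines.
    return STANDALONE.findall(text)


sample = """RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171"""

print(standalone_numbers(sample))
```

The lookahead matters for input such as 9999X, which the lookbehind alone would accept.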
 
Python
# STRING DATA MINING
def main():
    s = '''RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171'''
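    # Split the text into lines, then each line into its '/'-separated fields.
    L = s.split('\n')
    N = []
    for e in L:
        N.append(e.split('/'))
    print(s, '')
    #print('--- WANTED NUMBERS ---')
    # The row and field indexes below are hard-coded for this exact sample.
    print(int(N[2][0]))  # 2013
    print(int(N[3][0]))  # 1348
    print(int(N[4][1]))  # 33790
    print(int(N[7][1]))  # 956226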
    
if __name__ == '__main__':
    main()
    print('*** STRING DATA MINING OVER ***')
 
Python
#STRING DATA MINING BY REGEX
import re

def main():
    s = '''RTGS-VEERPAL/KARBH13267941306
TO STAMP PAPER CHARGES
2013
1348
By Clg/33790/ICICI/DELCTS1/166
RTGS-KUSUM TOMAR/CNRBH13271696922
RTGS-SHRI BALAJI BUILDHOME PVT /KARBH13294076078
By Clg/956226/AXIS/DELCTS1/171'''
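    L = s.split('\n')
    # Raw string, so that \d reaches the regex engine unchanged.
    regex = re.compile(r'\d+')
    M = []
    for n in L:
        # Only the first run of digits on each line is considered.
        match = regex.search(n)
        M.append(match.group() if match else None)
    # Drop the long reference numbers embedded in the alphanumeric codes;
    # the 1_000_000_000 cutoff is specific to this sample data.
    WANTED = [int(w) for w in M if w and int(w) <= 1_000_000_000]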
    print('--- WANTED NUMBERS ---')
    for w in WANTED:
        print(w)
    
if __name__ == '__main__':
    main()
    print('*** STRING DATA MINING OVER ***')
 
Comments
CHill60 12-Jan-23 5:48am    
You have posted two different solutions - both without any commentary - which one is meant to be the solution? Don't post multiple answers to a question - if you need to update a solution then use the "Improve Solution" link on your first one.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


