I'm a real newbie at regex. I have a banking program and it has regex rules for importing transactions. But sometimes the rules just don't work right. I'm trying to understand just what the rules are saying so that I can make them work the way they should.

I think I've figured out some of it but I just need some verification as to whether I'm on the right track or not.

Here's the actual wording that comes through on my credit card statement:
AT&T*BILL PAYMENT 800-331-0500 TX

Here's the regex rule that my banking program evaluates:
AT&T\*BILL PAYMENT +[0-9][^\p{L}]*T+[0-9][^\p{L}]*S+SQPZ

Here's how I understand the regex rule (this is what I'd like checked: am I reading the expression correctly?):

AT&T (Actual text to display)
\ (Escape character so the next symbol (*) will display)
BILL PAYMENT (Actual text to display)
+[0-9] (If there are numbers, then display those numbers however many times they exist and in the order they exist. Since there aren’t any numbers, I THINK it put in 9 spaces instead)
[^\p{L}] (When the ^ is used it can mean the start of a new line. But if it’s used inside a square bracket it means “not”. So the \p{L}, when used with the ^, indicates an Arabic letter that is not a letter {L} or a number {N})
*T (I think it stands for a Tab)
+[0-9] (As above, if there are numbers, then display those numbers however many times they exist and in the order they exist)
[^\p{L}] (As indicated above, it indicates an Arabic letter that is not a letter {L})
*S+SQPZ (I don’t know what these stand for. It would seem that they are specific letters that should be displayed at the end of the line)

So, am I anywhere near right in deciphering the regex?

Thanks for whatever help comes!

Art

1 solution

Not quite, because your Regular Expression does not match your example. Can it be fixed?

No! This is because you are missing the whole idea of a match. A regular expression cannot be built from just a single text sample. It describes a whole set of matching strings, and you have not defined that set; maybe you don't know it yourself.
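To make the mismatch concrete, here is a minimal Python sketch that runs the posted rule against the posted statement line. One assumption to flag: Python's built-in `re` module does not support `\p{L}`, so the sketch swaps in `[^A-Za-z]` as an ASCII-only stand-in for `[^\p{L}]`. The names `RULE`, `SAMPLE` and `PREFIX` are made up for this illustration, and the banking program's own regex engine may behave differently in other respects.

```python
import re

# The rule as posted, with one labeled substitution: [^A-Za-z] stands in
# for [^\p{L}] because Python's built-in re module has no \p{L}.
# The stand-in only covers ASCII letters.
RULE = r"AT&T\*BILL PAYMENT +[0-9][^A-Za-z]*T+[0-9][^A-Za-z]*S+SQPZ"
SAMPLE = "AT&T*BILL PAYMENT 800-331-0500 TX"

# Reading the rule left to right:
#   AT&T\*BILL PAYMENT   literal text; \* is a literal asterisk
#   " +"                 one or more spaces
#   [0-9]                exactly one digit
#   [^A-Za-z]*           zero or more characters that are not letters
#   T+                   one or more "T"
#   [0-9]                exactly one digit
#   [^A-Za-z]*           zero or more characters that are not letters
#   S+SQPZ               one or more "S", then the literal text "SQPZ"

rule_matches_sample = re.search(RULE, SAMPLE) is not None

# How far does the rule get? Keep only the part up to the first "T+".
PREFIX = r"AT&T\*BILL PAYMENT +[0-9][^A-Za-z]*T+"
prefix_match = re.match(PREFIX, SAMPLE)
matched_prefix = prefix_match.group(0) if prefix_match else None

print(rule_matches_sample)  # False
print(matched_prefix)       # AT&T*BILL PAYMENT 800-331-0500 T
```

The rule gets as far as the "T" of "TX" and then requires a digit, but the sample has "X" there, so the match fails. Note that `+` and `*` are quantifiers that apply to whatever precedes them, so ` +` means "one or more spaces" and `[^...]*` means "zero or more non-letters".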

For example, I can suggest one Regex which matches your example: AT&T\*BILL PAYMENT [0-9]{3}-[0-9]{3}-[0-9]{4} TX. But how do you know that it can be only "TX" at the end? What if it should be two arbitrary Latin letters? Then it should be AT&T\*BILL PAYMENT [0-9]{3}-[0-9]{3}-[0-9]{4} [A-Z]{2}. It also matches your example. But how do you know that it should be two? How do you know it should be upper-case? And so on…
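The difference between the two suggested patterns can be checked with a short Python sketch. The second test string (ending in "CA") is a hypothetical variant invented here for illustration; it does not come from a real statement. The names `FIXED_TX`, `ANY_TWO_UPPER` and `matches` are likewise illustrative.

```python
import re

# The two patterns suggested above, copied as written.
FIXED_TX = r"AT&T\*BILL PAYMENT [0-9]{3}-[0-9]{3}-[0-9]{4} TX"
ANY_TWO_UPPER = r"AT&T\*BILL PAYMENT [0-9]{3}-[0-9]{3}-[0-9]{4} [A-Z]{2}"

SAMPLE = "AT&T*BILL PAYMENT 800-331-0500 TX"   # from the question
VARIANT = "AT&T*BILL PAYMENT 800-331-0500 CA"  # hypothetical, for illustration


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


results = {
    ("FIXED_TX", "SAMPLE"): matches(FIXED_TX, SAMPLE),
    ("FIXED_TX", "VARIANT"): matches(FIXED_TX, VARIANT),
    ("ANY_TWO_UPPER", "SAMPLE"): matches(ANY_TWO_UPPER, SAMPLE),
    ("ANY_TWO_UPPER", "VARIANT"): matches(ANY_TWO_UPPER, VARIANT),
}

for key, value in results.items():
    print(key, value)
```

Both patterns accept the sample from the question, but only the second accepts the "CA" variant. Which of the two is "right" depends on what the bank can actually send, which a single sample cannot tell you.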

Your "problem" is a typical problem which is not really formulated. You need to know the exact definition of the format, which should cover the exact set of all possible string values. It can be defined mathematically, or… by a Regular Expression. :-)

—SA
 
Comments
aajoyce 28-Dec-15 19:53pm    
Thank you, Sergey. You're right, that "typical problem" wasn't really formulated, because I was trying to work only from the info that I had. Sometimes a transaction would have 3 or 4 rules that seemed to be all different or, if not different, so alike in almost every detail that they seemed to be the same. I would import transactions and think that the import rule (a regex that the program formulated) was what I wanted, but it would never be exactly right, and I found myself going over and over the same things, such as selecting the proper category, so it would register correctly in my bank program. I was trying to learn what the various regex codes meant so I could adjust them to give more consistent results. So I found myself removing parts of the regex without really knowing what I was removing. I will continue to expand my learning of regex, and perhaps one day I'll have a better grasp of it. I was also assuming that a company such as AT&T always sent the same text with every transaction, and I couldn't figure out why my program was forming the rule as it did. Anyway …thanks!
Sergey Alexandrovich Kryukov 28-Dec-15 20:35pm    
You are welcome. Well, that's simple enough: you can collect some statistics, form a hypothesis from some set of input strings, and generalize it in some reasonable way, but you cannot be 100% sure unless you receive a formal definition from the original source... It's not really related to your knowledge of Regex, which should not be a problem.
—SA
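The suggestion above, to collect input strings and generalize from them, could be sketched in Python roughly as follows. Only the first sample comes from the question; the other two are hypothetical lines invented for illustration, and the helper `unmatched` is an assumed name, not part of any banking program.

```python
import re


def unmatched(pattern: str, samples: list[str]) -> list[str]:
    """Return the samples that the pattern does not fully match."""
    compiled = re.compile(pattern)
    return [s for s in samples if compiled.fullmatch(s) is None]


samples = [
    "AT&T*BILL PAYMENT 800-331-0500 TX",  # from the question
    "AT&T*BILL PAYMENT 800-331-0500 CA",  # hypothetical
    "AT&T*BILL PAYMENT 8003310500 TX",    # hypothetical: no hyphens
]

# Candidate rule taken from the answer above.
candidate = r"AT&T\*BILL PAYMENT [0-9]{3}-[0-9]{3}-[0-9]{4} [A-Z]{2}"

failures = unmatched(candidate, samples)
print(failures)  # ['AT&T*BILL PAYMENT 8003310500 TX']
```

Any sample listed in `failures` shows where the hypothesis is too narrow. An empty list only means the candidate fits the samples collected so far, not every string the source might send.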

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


