I have used some regexes in my article Translitera - Phonetic Typing in Some Indian Languages, which describes a tool for transliterating from English to some Indian languages. The program uses regexes to identify patterns in each word, and hence to split each word into manageable parts.
Of course, there are some situations which are not handled; there is always scope for improvement.
|
I use them for input validation. Things like dates and times are straightforward. Names are not, even after imposing cultural restrictions (two capitalised names and some fussing around the edges). Patrick O'Reilly-Smythe and Ian McDonald are about as complex as I allowed for members of our Rural Fire Brigade. I can't remember whether Giulio d'Angelo would pass or fail. If he joins up, I'll revisit the code.
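As a rough illustration of that kind of check, here is a minimal Python sketch. The pattern and the names `PART`, `NAME`, and `is_valid_name` are assumptions made up for this example; it is not the pattern used for the brigade roster. It accepts two capitalised names, each optionally hyphenated, with an optional `O'`, `Mc`, or `Mac` prefix.

```python
import re

# Hypothetical pattern, for illustration only.
# One name part: optional O' / Mc / Mac prefix, then a capital and lowercase letters.
PART = r"(?:O'|Mc|Mac)?[A-Z][a-z]+"

# Two names separated by a single space; each may be hyphenated.
NAME = re.compile(rf"{PART}(?:-{PART})* {PART}(?:-{PART})*")


def is_valid_name(text):
    """Return True if the whole string fits the two-name pattern."""
    return NAME.fullmatch(text) is not None
```

Under these assumed rules the two names mentioned above pass, while a lowercase particle such as `d'` is rejected, which is exactly the kind of edge that would need revisiting.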
My other major use is in (often throwaway) sed scripts, or of course grep: things like extracting the word after "Invalid user" in security logs.
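The same extraction can be sketched in Python rather than sed or grep. The log line layout in the usage note is an assumption (a typical sshd-style message); only the phrase "Invalid user" comes from the post, and `extract_invalid_users` is a hypothetical helper name.

```python
import re

# Capture the run of non-whitespace characters right after "Invalid user ".
INVALID_USER = re.compile(r"Invalid user (\S+)")


def extract_invalid_users(lines):
    """Return the captured word from every line that contains the phrase."""
    users = []
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            users.append(match.group(1))
    return users
```

For example, given a line such as `Jan  3 11:31:02 host sshd[1234]: Invalid user admin from 203.0.113.5`, the helper would collect `admin`; lines without the phrase are skipped.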
Another one I used recently was to reconstruct words that were hyphenated across lines in an OCR'd manual. Google Translate barfs on the fragments of hyphenated words.
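A minimal Python sketch of that rejoining step, assuming the OCR output marks a split word with a hyphen at the end of the line; `join_hyphenated` is a hypothetical name, and this is not the script actually used on the manual.

```python
import re

# A word character, a hyphen, optional trailing blanks, a newline,
# optional leading blanks, then another word character.
HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")


def join_hyphenated(text):
    """Glue the two halves of a word that was split across a line break."""
    return HYPHEN_BREAK.sub(r"\1\2", text)
```

One caveat with this approach: a genuinely hyphenated compound that happens to break at its hyphen (for example `well-` / `known`) loses the hyphen too, so the result may still need a quick proofread.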
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
Regex is black magic, and sacrifices of caffeine and pizza offered in copious amounts are the only way to appease the beast.
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
|
But it's such useful, compelling black magic!
Real programmers use butterflies
|
That it is. I've used the Expresso Regular Expression Tool for years. It helps, but I still can't wrap my head around it. That and dark matter...
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
|
Forget backtracking regular expressions, as they don't have the same fancy mathematical properties as their non-backtracking counterparts. Use the non-backtracking operators and there are only five operations to remember: concatenation, alternation, parentheses (grouping), zero-or-one match (`?`), and Kleene star (looping `*`, zero or more matches). Concatenation is implicit.
They are
1. Simpler to understand
2. Faster to execute
3. Weirdly mathy but in a cool way
4. The same across almost all regular expression engines
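To make the five operations concrete, here is a small Python sketch that builds an integer matcher from nothing but concatenation, alternation, parentheses, `?`, and `*`, then checks it against the usual character-class shorthand. The names `DIGIT`, `NONZERO`, `INTEGER`, `SHORTHAND`, and `matches_both` are invented for this example. Note that Python's `re` module is itself a backtracking engine, so this only illustrates the syntax subset, not the speed claim.

```python
import re

# Alternation spelled out in full: (0|1|2|...|9) and (1|2|...|9).
DIGIT = "(" + "|".join("0123456789") + ")"
NONZERO = "(" + "|".join("123456789") + ")"

# Optional minus, then either a lone 0 or a nonzero digit followed by any digits.
INTEGER = re.compile("-?(0|" + NONZERO + DIGIT + "*)")

# The same language written with character classes as shorthand for alternation.
SHORTHAND = re.compile(r"-?(0|[1-9][0-9]*)")


def matches_both(text):
    """Return True if text is an integer; also check the two patterns agree."""
    plain = INTEGER.fullmatch(text) is not None
    short = SHORTHAND.fullmatch(text) is not None
    assert plain == short
    return plain
```

Character classes add no expressive power here; they are just a more compact way of writing the alternation.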
I give a primer at the end of this article. I taught them to my computer, and trust me - it's not very smart, but then I also taught it C in that article.
Fun With State Machines: Incrementally Parsing Numbers Using Hacked Regex
Real programmers use butterflies
|
Mike Hankey wrote: "dark matter"
Dark Matter; great series. A shame it only went three seasons.
Software Zen: delete this;
|
It was great right up until that bit where they time-skipped around the future. I could see the downfall of the series coming a mile away.
My current theory is that time-travel, if not baked in from the beginning, is a sign the writers have run out of ideas.
On the other hand, my daughter hates time travel in movies and shows because it's almost always done wrong. The episode where they went back in time was well done, according to her.
Bond
Keep all things as simple as possible, but no simpler. -said someone, somewhere
|
I enjoyed the series right up until they were able to insert a little card into their existing engine that allowed them to travel anywhere immediately. I know that FTL is imaginary, but I'd think that "blink" might have required a completely different set of physics requiring a new engine or something. My "willing suspension of disbelief" became unwilling at that point.
Outside of a dog, a book is a man's best friend; inside of a dog, it's too dark to read. -- Groucho Marx
|
I would rather use whatever language I am working with to perform the parse. As you just stated... regex is technically another small programming language.
I am not sure if you know this... but you can take a regular expression and use the Ragel state machine compiler to convert it to C/C++, D, Go, Java, Ruby and even Objective-C. Interestingly... I do not see C# support.
Ragel Cheat Sheet
Someone should probably write a little Visual Studio add-on that takes a regular expression and converts it to a C# state machine... as it seems .NET programmers use a lot of regex.
Best Wishes,
-David Delaune
|
Quote: "I am not sure if you know this..."
I didn't. Thank you for posting it.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
I could do that, since I have already written several apps that do exactly that. The issue is that it doesn't support backtracking, because I don't like backtracking regex.
Real programmers use butterflies
|
regex is a tool.
regex is great when used correctly.
read the owner's manual, as with any tool.
no one should be afraid of using a tool to complete a specific task.
|
I am afraid of you, messing up with regex.
Quote: "Or you can use them to generate code for state machines"
That's indeed intriguing.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
Reminds me of APL. The language of 80's modelling gods. Cryptic enough they had their own department.
At least in assembler we had a few letters for op codes.
Afraid? It's just too forgettable (for me).
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
I built a regex DOM for people like you.
Real programmers use butterflies
|
By law we need to quote this XKCD.
xkcd: Regular Expressions
I used to have this T-shirt, too: Regular Expressions
Personally, regular expressions are my indulgent cheat. Kinda like having pizza. I know I should go easy on them, and I'm trying to give them up, but when they are good, they are sooo good.
cheers
Chris Maunder
|
I would wear that shirt, but then people would ask me what it meant, and if I told them, they would ask me to fix their computers.
Apparently I have no impulse control because I even use regular expressions for things they were never intended for.
Real programmers use butterflies
|
The only regex(-like) syntax I felt somewhat comfortable working with was SNOBOL.
That was 30+ years ago. I first met it as a 200-source-line version of Eliza, the therapist, which fascinated me immensely. Obviously, that version never passed any Turing test, yet: try to write anything comparable in 200 lines of any ordinary, algorithmic language! So I started playing around with it, just for fun - I never used it commercially.
Actually, not too long ago I picked up the source code of an old SNOBOL interpreter, hoping one day to port it. It is currently #43 on my project lists. Tuits are hard to find nowadays, especially round ones.
|
I like using regex for day-to-day, throwaway things. It's especially good for reformatting text. I'm certainly not intimidated by them.
That said, I don't think I would ever use one in product code with a long life-span. You must admit that regular expressions tend to be write-only, which is a cardinal sin against those who must maintain the code, including your future selves. Code written very concisely (and regular expressions may be the ultimate in concise) requires a lot of mental unpacking during maintenance. Unless you write a ridiculous amount of comments for the expression, it might not be worth it.
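For what commenting an expression can look like in practice, here is a small Python sketch using the standard `re.VERBOSE` flag, which lets a pattern be spread over several lines with inline comments. The 24-hour time pattern and the names `TIME_24H` and `parse_time` are assumptions chosen for this example, not something from the discussion.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments,
# so each piece of the expression can carry its own explanation.
TIME_24H = re.compile(
    r"""
    (?P<hour>[01][0-9]|2[0-3])   # hours 00-23, always two digits
    :                            # literal separator
    (?P<minute>[0-5][0-9])       # minutes 00-59
    """,
    re.VERBOSE,
)


def parse_time(text):
    """Return (hour, minute) for a valid HH:MM string, else None."""
    match = TIME_24H.fullmatch(text)
    if match is None:
        return None
    return int(match.group("hour")), int(match.group("minute"))
```

Whether this amount of annotation tips the balance for long-lived product code is still a judgment call, but it does reduce the unpacking a maintainer has to do.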
Software Zen: delete this;
|
I think it depends. I generally agree that complicated regex is a mug's game. However:
How do you technically and accurately convey a set of rules around lexical requirements?
Such rules must be conveyed to other developers precisely.
Such rules must be unambiguous and testable.
Such rules must be absorbable in a reasonable amount of time, meaning no poring over RFCs if one can avoid it.
Imagine conveying the rules for what constitutes a JSON number.
You can either say:
(\-?)(0|[1-9][0-9]*)((\.[0-9]+)?([Ee][\+\-]?[0-9]+)?)
Which takes some unpacking, as you say, but is certainly readable.
Or I can give you a page-long document of requirements around JSON number parsing.
Personally, I can read that quite easily, but that's me.
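The "testable" part can be sketched directly in Python: the expression above is copied verbatim and wrapped in a small helper, so the rule can be exercised against examples. The names `JSON_NUMBER` and `is_json_number` are invented for this sketch, and `fullmatch` is used on the assumption that the whole string has to be a number.

```python
import re

# The JSON number expression from the post, unchanged.
JSON_NUMBER = re.compile(
    r"(\-?)(0|[1-9][0-9]*)((\.[0-9]+)?([Ee][\+\-]?[0-9]+)?)"
)


def is_json_number(text):
    """Return True if the entire string matches the JSON number pattern."""
    return JSON_NUMBER.fullmatch(text) is not None
```

Values such as `0`, `-12.5e+3`, and `1E10` are accepted, while `01`, `1.`, `.5`, and `+1` are rejected: each of those cases would otherwise take a sentence or two of written requirements.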
Let me propose something - there is a meaningful subset of regular expressions which is easy to understand, and can fulfill most simple lexical specifications like the above, or say, like an email address, or a URL, or any number of small, structured text fragments.
It beats the alternative, hands down.
Real programmers use butterflies
modified 3-Jan-21 11:31am.
|
To my mind that regex would be okay. It's the thousands of characters, wall-of-text abominations that I object to. I know, that's an example of poor use of regex, but it's the kind of thing you find. Inexperienced folks start using it, and all of a sudden it becomes their favorite toy.
A toy that's all sharp edges...
Software Zen: delete this;
|