|
You only have 9 chances to call the API.
CI/CD = Continuous Impediment/Continuous Despair
|
|
|
|
|
I guess your cat became a laptop... Like someone said, CatScript.
|
|
|
|
|
My kids have been SQL procs that I needed to call in the right order. I totally get that.
|
|
|
|
|
Isn't this exactly what Scratch was built for?
Or maybe I'm thinking of Claw.
Either way, it will probably end horribly.
|
|
|
|
|
You are aware that the algorithm for getting a cat to move of its own volition is classified as NP-hard?
Software Zen: delete this;
|
|
|
|
|
I even have a partial implementation:
string PullTail()
{
    return "Meeeewwwwww";
}
Nick Polyak
|
|
|
|
|
Let me know when you can disable the mouse cord attack mode and enable keyboard avoidance.
Money makes the world go round ... but documentation moves the money.
|
|
|
|
|
Maybe a short one, Chris. Then back on your head!
Will Rogers never met me.
|
|
|
|
|
I decided to start shoring up the regular expression syntax I use and settle on something that I can more readily document and that hopefully mimics what's already out there, because I don't want people to have to learn new things just to use my engine.
This is a lot harder than it sounds. For starters, most modern engines backtrack, trading speed and simplicity for expressive power. Mine is simple and fast but lacks some of that expressive power, which at the end of the day means I can't support certain constructs (backreferences, for example) because they require backtracking, which my engine doesn't do.
I'm not looking to change that, particularly when my engines are as much as 3 times as fast as Microsoft's .NET regex and, as a result, can readily be used for tokenization.
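Here's a rough sketch of why a non-backtracking engine suits tokenization. It uses a hand-built DFA for three assumed token kinds (identifiers, integers, whitespace), not the engine discussed here, and runs it with longest-match ("maximal munch") semantics. The scanner only moves forward through the transition table, and when it hits a dead state it cuts the token at the last accepting position.

```python
# Hypothetical token kinds, for illustration only.
IDENT, INT, WS = "IDENT", "INT", "WS"


def char_class(c):
    """Map a character onto the DFA's small input alphabet."""
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    if c in " \t\r\n":
        return "space"
    return "other"


# Transition table: state -> {input class -> next state}. Missing entries are dead.
TRANSITIONS = {
    0: {"letter": 1, "digit": 2, "space": 3},
    1: {"letter": 1, "digit": 1},  # identifier: letter (letter|digit)*
    2: {"digit": 2},               # integer: digit+
    3: {"space": 3},               # run of whitespace
}
ACCEPTING = {1: IDENT, 2: INT, 3: WS}


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        state, i = 0, pos
        last_accept = None  # (end index, kind) of the longest match so far
        while i < len(text):
            nxt = TRANSITIONS[state].get(char_class(text[i]))
            if nxt is None:
                break  # dead state: no alternatives to retry, just stop
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        end, kind = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For example, tokenize("42abc") yields an INT token "42" followed by an IDENT token "abc", because the integer state has no transition on a letter.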
But this leaves me with something of a conundrum.
There are 3 or 4 major regex syntax varieties out there: POSIX, Perl, JS, .NET, etc.
Historically I've started with POSIX, because it means I can use my engine with certain tools I use; things like FLEX use a POSIX-ish syntax.
I think I'd like to continue that, but I've noticed that I don't actually need much POSIX-specific nonsense to support almost all FLEX inputs in practice.
I hope I haven't lost or bored you yet. Bear with me.
I'm loath to add to the syntax nightmare that is regex, so I want to be very careful here.
I'd rather not support multiple "modes" to support different syntax styles like .NET's engine does.
However, I'm thinking of creating, as much as possible, "one syntax to rule them all": one that accepts the constructs allowed in any engine, wherever that's possible.
One part of me wants to strangle the other part of me in my sleep for even considering this. /s
The other part of me wonders what the alternative to this mega-syntax is.
90% of this has to do with what is allowed to appear inside [] brackets (character classes).
Any ideas?
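Since most of the trouble is inside the brackets, one way to approach a "mega-syntax" is a lenient bracket parser. It would accept both POSIX named classes like [:digit:] and Perl/.NET/JS shorthands like \d in the same class. The sketch below is hypothetical: the class tables are a small assumed subset, and it builds a Python predicate rather than compiling to a DFA.

```python
import string

# Assumed subset of POSIX named classes, written as [:name:] inside brackets.
POSIX_CLASSES = {
    "alpha": str.isalpha, "digit": str.isdigit, "alnum": str.isalnum,
    "space": str.isspace, "upper": str.isupper, "lower": str.islower,
    "punct": lambda c: c in string.punctuation,
}
# Perl/.NET/JS shorthand escapes, also accepted inside brackets.
SHORTHANDS = {"d": str.isdigit, "s": str.isspace,
              "w": lambda c: c.isalnum() or c == "_"}
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def _read_atom(pattern, i):
    """Return (kind, value, next_index), where kind is 'char' or 'class'."""
    if i >= len(pattern):
        raise ValueError("unterminated character class")
    if pattern.startswith("[:", i):
        end = pattern.find(":]", i + 2)
        name = pattern[i + 2:end] if end != -1 else ""
        if name not in POSIX_CLASSES:
            raise ValueError(f"unknown POSIX class at {i}")
        return "class", POSIX_CLASSES[name], end + 2
    if pattern[i] == "\\":
        if i + 1 >= len(pattern):
            raise ValueError("dangling backslash")
        e = pattern[i + 1]
        if e in SHORTHANDS:
            return "class", SHORTHANDS[e], i + 2
        # Any other escaped character is a literal, e.g. \] or \-.
        return "char", SIMPLE_ESCAPES.get(e, e), i + 2
    return "char", pattern[i], i + 1


def parse_bracket(pattern, pos):
    """Parse the class starting at pattern[pos] == '['.
    Returns (predicate, index just past the closing ']')."""
    if pattern[pos] != "[":
        raise ValueError("expected '['")
    i = pos + 1
    negated = pattern.startswith("^", i)
    if negated:
        i += 1
    tests, first = [], True
    while True:
        if i >= len(pattern):
            raise ValueError("unterminated character class")
        # As in POSIX, a ']' right after '[' or '[^' is a literal.
        if pattern[i] == "]" and not first:
            i += 1
            break
        first = False
        kind, value, i = _read_atom(pattern, i)
        is_range = (kind == "char" and pattern.startswith("-", i)
                    and i + 1 < len(pattern) and pattern[i + 1] != "]")
        if is_range:
            kind2, hi, i = _read_atom(pattern, i + 1)
            if kind2 != "char" or hi < value:
                raise ValueError("invalid range")
            tests.append(lambda c, lo=value, hi=hi: lo <= c <= hi)
        elif kind == "class":
            tests.append(value)
        else:
            tests.append(lambda c, ch=value: c == ch)

    def predicate(c):
        return any(t(c) for t in tests) != negated

    return predicate, i
```

With this, [a-z[:digit:]] and [\d_-] are both accepted, so a class written for either family parses the same way.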
Real programmers use butterflies
|
|
|
|
|
IMHO it is totally acceptable to use a subset of the available commands/syntaxes - just do not introduce something of your own...
Try to make that subset compatible with as many flavors as you can (that should be easy if you start by cutting down from POSIX), so your engine (or a port of it) can run in each environment...
"The only place where Success comes before Work is in the dictionary." Vidal Sassoon, 1928 - 2012
|
|
|
|
|
The problem with that is that your common subset may not be powerful enough to do what you want it to do.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
xkcd: Standards
AKA embrace and extend
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Ha! I had that very comic in mind when I was attacking this.
Real programmers use butterflies
|
|
|
|
|
I'm motivated to make things better, not the same.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
Oh, but I've already done that.
XBNF is a description language that can describe both Chomsky type 2 and Chomsky type 3 languages using simple compositions of logical constructs: a repeat construct {}, alternation |, concatenation (implicit), etc.
Regex is for the benefit of people who don't want to learn XBNF. It also lets people leverage all of the existing regular expressions out there, and this is where it gets important.
If I mimic what's out there, you can use my tools with the content *you already have*, saving you time.
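To illustrate how regular languages reduce to concatenation, alternation and repetition, here's a tiny matcher built on Brzozowski derivatives. This is a generic textbook technique, not XBNF and not the author's engine. It also never backtracks: each input character transforms the expression once.

```python
from dataclasses import dataclass


class Re:
    """Base class for regex terms."""


@dataclass(frozen=True)
class Empty(Re):  # matches nothing
    pass


@dataclass(frozen=True)
class Eps(Re):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):
    ch: str


@dataclass(frozen=True)
class Seq(Re):  # concatenation
    a: Re
    b: Re


@dataclass(frozen=True)
class Alt(Re):  # alternation a|b
    a: Re
    b: Re


@dataclass(frozen=True)
class Star(Re):  # repetition
    r: Re


def seq(a, b):
    """Smart constructor that keeps terms small."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Seq(a, b)


def alt(a, b):
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return a if a == b else Alt(a, b)


def lit(s):
    r = Eps()
    for c in s:
        r = seq(r, Chr(c))
    return r


def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Seq):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    return False  # Empty, Chr


def deriv(r, c):
    """Brzozowski derivative: what r must still match after consuming c."""
    if isinstance(r, Chr):
        return Eps() if r.ch == c else Empty()
    if isinstance(r, Alt):
        return alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Seq):
        d = seq(deriv(r.a, c), r.b)
        return alt(d, deriv(r.b, c)) if nullable(r.a) else d
    if isinstance(r, Star):
        return seq(deriv(r.r, c), r)
    return Empty()  # Empty, Eps


def matches(r, text):
    for c in text:
        r = deriv(r, c)
    return nullable(r)


# ab(c|d)*
PATTERN = seq(lit("ab"), Star(alt(Chr("c"), Chr("d"))))
```

A production engine would cache each distinct derivative as a DFA state (and normalize terms so the set of states stays finite) instead of recomputing derivatives for every input.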
That is an improvement, no?
There's no real reason to "improve" regex. I'd argue that too many already have, leading to the present situation.
I'm not trying to improve that situation. Too many people already have, again leading to the present situation.
I'm simply trying to make the present situation as painless for users of my code as possible.
Real programmers use butterflies
|
|
|
|
|
I'm not really good at regex, but what about this:
Keep your own syntax and offer some sort of "converter": basically a function that takes a known syntax (POSIX or .NET or whatever) and returns the regex translated into your syntax.
This way people can choose (perhaps based on knowledge or use case) which syntax they want to use.
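A converter along those lines might look like the minimal sketch below. It assumes POSIX ERE as the target syntax, since the engine's actual syntax isn't spelled out here. It translates only a few shorthands and refuses constructs with no ERE equivalent rather than guessing. Inside a POSIX bracket expression a backslash is an ordinary character, which is why escapes are handled differently inside and outside brackets.

```python
# Hypothetical converter: a few Perl/.NET constructs -> POSIX ERE.
SHORT_OUTSIDE = {
    "d": "[[:digit:]]", "D": "[^[:digit:]]",
    "s": "[[:space:]]", "S": "[^[:space:]]",
    "w": "[[:alnum:]_]", "W": "[^[:alnum:]_]",
    "n": "\n", "t": "\t",
}
SHORT_INSIDE = {"d": "[:digit:]", "s": "[:space:]", "w": "[:alnum:]_",
                "n": "\n", "t": "\t"}
# Escaped characters that can't simply be copied into a POSIX bracket.
UNSAFE_INSIDE = set("]\\-^[DSW")


def perl_to_posix(pattern):
    out, i, in_class = [], 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash")
            e = pattern[i + 1]
            if e.isdigit():
                raise ValueError(f"numeric escape \\{e} (backreference/octal) is not supported")
            if in_class:
                if e in SHORT_INSIDE:
                    out.append(SHORT_INSIDE[e])
                elif e in UNSAFE_INSIDE:
                    raise ValueError(f"cannot translate \\{e} inside brackets")
                else:
                    out.append(e)  # [\.] -> [.]
            else:
                # Known shorthands are expanded; anything else passes through (\. stays \.).
                out.append(SHORT_OUTSIDE.get(e, "\\" + e))
            i += 2
        elif in_class and pattern.startswith("[:", i):
            end = pattern.find(":]", i)  # copy an existing [:name:] verbatim
            if end == -1:
                raise ValueError("unterminated POSIX class")
            out.append(pattern[i:end + 2])
            i = end + 2
        elif in_class:
            if c == "]":
                in_class = False
            out.append(c)
            i += 1
        elif c == "[":
            j = i + 1
            if pattern.startswith("^", j):
                j += 1
            if pattern.startswith("]", j):  # a leading ']' is a literal
                j += 1
            out.append(pattern[i:j])
            in_class, i = True, j
        elif pattern.startswith("(?", i):
            raise ValueError(f"'(?' group syntax at {i} is not supported")
        else:
            out.append(c)
            i += 1
    if in_class:
        raise ValueError("unterminated bracket expression")
    return "".join(out)
```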
|
|
|
|
|
I've considered that, but there are some issues with it, like being able to scan reams of existing regexes (for example, in lengthy lexer specifications). It's possible to run each one through such a function, but it's also very difficult to build and test against an exact syntax, compared with making a syntax loose enough to accept most constructs from most flavors. The loose approach easily satisfies the 80/20 rule, so I think that's the way to go.
Real programmers use butterflies
|
|
|
|
|
'as painless ... as possible' is well worth the effort imo.
I wish I could help, but alas, all I can do is encourage. I look forward to seeing what you wind up with.
|
|
|
|
|
I have used the regex support provided in various applications, and noticed that some require backslashes in front of operators while others require that they not be used. There seems to be no common ground: the regular expressions must differ depending on which app one uses, so I would need different rules, and examples of them, for each application.
|
|
|
|
|
I handle escapes by allowing anything to be escaped, but only requiring it where necessary. Most regular expression engines are *supposed to* work that way.
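The "escape anything, but only require it where necessary" rule can be sketched as a small lexer. The metacharacter set and escape tables below are assumptions for illustration. An escaped character with no special escape meaning is simply a literal, so "\-" and "-" lex the same way. Engines differ on this: Python's re, for example, rejects unknown escapes of ASCII letters such as \q, whereas this sketch accepts them as literals.

```python
# Characters that are special outside brackets and so *must* be escaped
# to be literal. Everything else may be escaped but needn't be.
METACHARS = set(".^$|?*+()[]{}\\")
SPECIAL_ESCAPES = {"d": "digit", "w": "word", "s": "space"}
CONTROL_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "f": "\f", "v": "\v"}


def lex(pattern):
    """Split a pattern into ('lit', ch), ('meta', ch) and ('class', name) tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("pattern ends with a lone backslash")
            e = pattern[i + 1]
            if e in SPECIAL_ESCAPES:
                tokens.append(("class", SPECIAL_ESCAPES[e]))
            elif e in CONTROL_ESCAPES:
                tokens.append(("lit", CONTROL_ESCAPES[e]))
            else:
                # Lenient rule: any other escaped character is a literal,
                # whether or not escaping was required (\- , \/ , \. ...).
                tokens.append(("lit", e))
            i += 2
        elif c in METACHARS:
            tokens.append(("meta", c))
            i += 1
        else:
            tokens.append(("lit", c))  # e.g. '-', '/', '"' need no escape
            i += 1
    return tokens
```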
Real programmers use butterflies
|
|
|
|
|
- Decide which constructs/features it makes the most sense to support and do those well (as compliant as possible, without sacrificing performance on your most critical expressions).
- Fail cleanly whenever possible with as much information as possible (e.g., provide a warning about 'regex "x" may not work as expected because feature y is not supported').
- If users say they really need something you don't think you can/should support, then provide a fallback approach (e.g., switch to an off-the-shelf package and let the user know about the hit they're taking and why).
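The second and third points could be combined in a thin wrapper like the sketch below. It scans the pattern for constructs a DFA engine typically can't handle, warns with the reason, and falls back to an off-the-shelf backtracking engine, with Python's re standing in for that engine. The feature list here (numeric backreferences and lookaround) is an assumption and not exhaustive. The fast compiler is passed in as a parameter because it's hypothetical.

```python
import re
import warnings

LOOKAROUND = ("(?=", "(?!", "(?<=", "(?<!")


def unsupported_features(pattern):
    """Rough scan for constructs that usually need a backtracking engine.

    Escapes are consumed in pairs and bracket contents are skipped; a
    leading literal ']' inside brackets is not handled by this sketch.
    """
    found = set()
    i, in_class = 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            if not in_class and pattern[i + 1] in "123456789":
                found.add("backreference")
            i += 2
            continue
        if in_class:
            in_class = c != "]"
        elif c == "[":
            in_class = True
        elif pattern.startswith(LOOKAROUND, i):
            found.add("lookaround")
        i += 1
    return sorted(found)


def compile_with_fallback(pattern, fast_compile):
    """Use the caller-supplied (hypothetical) DFA compiler when it can
    handle the pattern; otherwise warn and fall back to Python's re."""
    problems = unsupported_features(pattern)
    if problems:
        warnings.warn(
            f"regex {pattern!r} may not work as expected with the fast engine "
            f"because {', '.join(problems)} is not supported; "
            "falling back to a slower backtracking engine",
            stacklevel=2,
        )
        return re.compile(pattern)
    return fast_compile(pattern)
```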
FWIW, I totally get it. I've become a big fan of RegEx over the years (despite initial reluctance). While regexes can be quite complicated and obtuse, simple ones are very easy to write, and most people adapt to them well because most simple searches just work. At the same time, I worry that I'm sometimes giving up too much performance processing RegExes, and compiling them hasn't always delivered the performance I'm looking for, so I've been very tempted to write an optimization wrapper that handles simple cases using language intrinsics (e.g., substring, begins with, and exact match).
I've not gotten around to the optimizations; however, I did (unfortunately) implement a couple of simple 'extensions' to RegEx as it doesn't support a couple of basic operations I frequently need:
- Use a leading "!" to mean NOT a match (where the rest of the input is the regex).
- Use a leading >, <, >=, or <= to treat the string as a numeric (or date) comparison (very handy for things like searching for numbers >10).
While it's possible to code those in RegEx, the complexity is enough to reduce the value to near zero (especially since I use this for quick ad hoc database column searches).
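The two extensions are easy to layer on top of a regex filter. This is a minimal sketch, not the actual implementation. It assumes case-insensitive regex matching and US-style m/d/yyyy dates (going by the 1/1/2000 example). It also assumes that cells which can't be parsed as a number or a date simply don't match a comparison.

```python
import re
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: US-style dates such as 1/1/2000

# Two-character operators come first so ">=" isn't read as ">".
COMPARISONS = [
    (">=", lambda a, b: a >= b),
    ("<=", lambda a, b: a <= b),
    (">", lambda a, b: a > b),
    ("<", lambda a, b: a < b),
]


def parse_value(text):
    """Parse a number, else a date; raises ValueError if neither fits."""
    try:
        return float(text)
    except ValueError:
        return datetime.strptime(text.strip(), DATE_FORMAT)


def make_filter(expr):
    """Turn a filter expression into a predicate over cell text."""
    if expr.startswith("!"):
        # "!" + regex: keep cells that do NOT match the regex.
        negated = re.compile(expr[1:], re.IGNORECASE)
        return lambda cell: negated.search(cell) is None
    for op, compare in COMPARISONS:
        if expr.startswith(op):
            target = parse_value(expr[len(op):])

            def predicate(cell, compare=compare, target=target):
                try:
                    return compare(parse_value(cell), target)
                except (ValueError, TypeError):
                    # Unparseable cells, or number-vs-date mixes, don't match.
                    return False

            return predicate
    pattern = re.compile(expr, re.IGNORECASE)
    return lambda cell: pattern.search(cell) is not None
```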
|
|
|
|
|
Honestly? You could just use a DFA regex. Even one written in C# can be up to 3 times faster than Microsoft's. The downside is no backtracking, but oh well.
Real programmers use butterflies
|
|
|
|
|
Thanks. Did you mean that I should replace the .NET RegEx with a 3rd party (DFA based) RegEx, or that I should have people use DFAs instead of RegEx?
Backtracking's not really an issue. Most of the time I'm just looking for one of the following:
- String Contains Pattern (e.g., "Pattern")
- String Contains Pattern A or Pattern B or ... (e.g., "PatternA|PatternB|...")
- String Starts with Pattern (e.g., "^Pattern") or Ends with Pattern (e.g., "Pattern$")
- String Exactly matches Pattern (e.g., "^Pattern$")
Sometimes I use more complex options:
- String Contains Pattern A or Pattern B (e.g., "Pattern(A|B)")
- String Matches product code (e.g., "P\d+(-\d+)?")
And the ones that RegEx doesn't support:
- String doesn't match one of the above (implemented by me as "!...")
- Numeric or Date comparison (implemented by me as "<10" or ">=1/1/2000" ...)
- Within range (currently not implemented except via regex starts with).
That's about as complex as it gets.
The main performance issue is that I'm using it for ad hoc live filtering of up to 3,000-4,000 records (the filter changes as each character is typed), with the potential for filters on multiple fields, and I'm trying to keep it responsive (< 2 seconds worst case, preferably < 1/10 second).
So far, the performance is reasonable, if not ideal (using the native .NET RegEx), so I've not been highly motivated to change. A 3rd-party drop-in engine might work, though my thoughts were more along the lines of recognizing the simple cases and hard-coding them: it's hard to beat string.IndexOf and other string intrinsics, which can easily handle three of the first four cases.
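Here's a sketch of that recognize-the-simple-cases idea. If the pattern, after optional ^ and $ anchors, contains no metacharacters, it uses plain string operations, and it treats an unanchored A|B|... list of literals as an "any contains" check. Everything else gets compiled as a regex. Python's str methods stand in for string.IndexOf and friends, and the metacharacter set is an assumption.

```python
import re

# Characters that make a pattern more than a plain literal (assumed set).
META = set(".^$|?*+()[]{}\\")


def is_literal(text):
    return bool(text) and not (set(text) & META)


def compile_filter(pattern):
    """Return a predicate; simple patterns skip the regex engine entirely."""
    starts = pattern.startswith("^")
    ends = pattern.endswith("$") and not pattern.endswith("\\$")
    core = pattern[int(starts):len(pattern) - int(ends)]
    if is_literal(core):
        if starts and ends:
            return lambda s: s == core           # "^Pattern$"
        if starts:
            return lambda s: s.startswith(core)  # "^Pattern"
        if ends:
            return lambda s: s.endswith(core)    # "Pattern$"
        return lambda s: core in s               # "Pattern"
    if not starts and not ends:
        parts = pattern.split("|")
        if all(is_literal(p) for p in parts):
            return lambda s: any(p in s for p in parts)  # "A|B|..."
    regex = re.compile(pattern)
    return lambda s: regex.search(s) is not None
```

One subtlety: in both .NET and Python, $ also matches just before a trailing newline, so for values ending in "\n" the fast path and the regex can disagree. Either strip such values first or accept the difference.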
I also use RegExes for backend filtering (before it gets to the UI) and there I'm limited to what the database engine supports. Performance is generally pretty good; however, I wonder if there would be value in my detecting simple cases up front and converting them to different operations before sending to the backend. For example:
- Instead of 'Field matches regex "Pattern"' generate 'Field contains "Pattern"'.
- Instead of 'Field matches regex "^Pattern$"' generate 'Field == "Pattern"'.
These patterns are also typically entered by users, but not live (they have to 'submit' the query). I already do some simple pattern manipulation, mostly adding a (?i) to the front because the engine is case-sensitive by default and I'd rather it not be. This would be a bit more complex, as I'd have to manipulate the operation, not just the pattern.
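The same classification can drive the backend rewrite. This sketch emits KQL text and assumes the field is a plain column name that needs no quoting. It also assumes the user's pattern doesn't already carry (?i). It relies on KQL's contains, startswith, endswith and =~ being case-insensitive, which matches the (?i) intent, and on regular (non-verbatim) KQL string literals using backslash escapes.

```python
# Characters that make a pattern more than a plain literal (assumed set).
META = set(".^$|?*+()[]{}\\")


def kql_string(text):
    """Quote text as a regular (non-verbatim) KQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def regex_to_kql(field, pattern):
    """Rewrite a user's regex filter as a cheaper KQL predicate when possible.

    Assumes case-insensitive intent (the (?i) prefix) and that `field`
    is a plain column name that needs no quoting.
    """
    starts = pattern.startswith("^")
    ends = pattern.endswith("$") and not pattern.endswith("\\$")
    core = pattern[int(starts):len(pattern) - int(ends)]
    if core and not (set(core) & META):
        literal = kql_string(core)
        if starts and ends:
            return f"{field} =~ {literal}"  # case-insensitive equality
        if starts:
            return f"{field} startswith {literal}"
        if ends:
            return f"{field} endswith {literal}"
        return f"{field} contains {literal}"
    # Not a simple literal: keep the regex, adding the (?i) prefix as before.
    return f"{field} matches regex {kql_string('(?i)' + pattern)}"
```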
|
|
|
|
|
David On Life wrote: Did you mean that I should replace the .NET RegEx with a 3rd party (DFA based) RegEx
Yes, that.
However, given what you're telling me - are those records coming from a database? If so, you might get more mileage using LIKE from within SQL itself, at least for the simple stuff. That will be orders of magnitude faster than anything you could do on the client side.
Real programmers use butterflies
|
|
|
|
|
Yes. However, the database is Kusto (aka Azure Data Explorer) which has native RegEx support. I use a two-stage approach.
The first stage allows the input of parameters which are passed to Kusto to select a small subset of relevant data (typically 1 to 1,000 records, sometimes more). Parameters may be RegEx, equals, list of matches, contains, startswith, or any other Kusto comparison operation (determined as part of the parameter setup, not by the user). They are not live but processed as part of the query (just like you're suggesting, except there's no 'like' operator in Kusto).
The second stage is local filtering once the data is already on the client. That's the live component. Since the data is already on the client at that point, local filtering is typically faster than requerying. I currently give users the option of either RegEx or simple Contains, but I'm not sure the Contains option is that meaningful.
A typical use case would be using the first stage to pull all storage performance test results in the last month for project x using configuration y. Then use client-side filtering to look for issues (e.g., performance < 90% of expected) and/or further filter on specific test setups (different storage types or different computer types).
|
|
|
|
|