|
Yep, kinda.
|
|
|
|
|
And there I was going to congratulate you on your New Year's resolution to give up smoking.
I wanna be a eunuchs developer! Pass me a bread knife!
|
|
|
|
|
An extra keystroke... THE HORROR!
|
|
|
|
|
Nand32 wrote: I've found new love for all small case separated by underscores
Ewwwww!
|
|
|
|
|
Typing in underscores sucks.
|
|
|
|
|
Why restrict yourself to an alternate restriction?
I do what I want to do when I want to do it because I can.
Ravings en masse
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
|
|
|
|
|
Okay, so I don't like to use goto, but it is sometimes, if not necessary, then at least efficient.
I'm running into a problem right now that is very similar to the problem of supporting multiple optional parameters in a stored proc.
Basically, for an LL(k) parse in a recursive descent parser, the number of ifs grows exponentially with every k>1 value.
if(firstSym==lparen) {
    if(secondSym==rbracket) {
    } else if(secondSym==intLiteral || secondSym==floatLiteral || secondSym==identifier) {
    } else if(secondSym==...) {
        ...
    } else ...
} else if(firstSym==lbracket) {
    ...
} else if(firstSym==...) {
    ...
} else ...
For each additional k value I need to add a thirdSym, fourthSym, and accompanying nested ifs.
Meanwhile, if I rewrite this as a state machine I can use gotos, and then recycle similar paths.
It's hard to show an example of this because the result is spaghetti, but it's easy to generate programmatically, or to graph visually. (I don't want to break out graphviz right now though - it's late.)
However, being able to recycle those paths as they converge can dramatically reduce the amount of code you need, but you have to be able to bounce between the different branches, the way an if won't let you.
You can do something similar using goto case, albeit less efficiently overall than a goto-based baked state machine.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
One of the good things about becoming an experienced developer is that you know when it's OK to use a goto, as well as when it's not.
Goto has value - but only in circumstances where performance is critical, and where commenting can help overcome the maintenance / readability / spaghettification issues it can easily bring.
Having said that, I've never needed one in C#, and the last time I used one was in assembler code... (which is even more of a performance improvement, and more of a maintenance / readability / spaghettification issue, than stuffing a goto into C# code would be).
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
The situation I laid out is applicable to every imperative programming language I've used.
It's applicable to all C/C++ family and C/C++-ish syntax languages, including the java and javascript beasts. It's applicable to Pascal. To Basic. To Perl. To Bash - you actually run into the problem I outlined a fair bit in shell scripts, in both unix and windows cmd, depending on what you're trying to do.
Goto shines in these situations - that is, in situations where you need a state machine and don't have ready access to tables. Batch files are a perfect example. Stored procedures are too, but only IF you don't want to touch the DB in your logic path - obviously once you go to the DB you have tabular data all day long.
But even in languages where you do have tables, constructing an array to traverse your state machine is often just as messy as traversing it with gotos.
So you're back to square one.
And this applies in any case where you need a deterministic finite automaton or something with its fundamental properties (a basic stackless state machine).
|
|
|
|
|
My "Programming 101" course was Fortran based, and we did use gotos because that was the only alternative. Ever since, I have been working with "structured" languages of the "Pascal class" (or C class, if you like). Whenever I have felt a need to skip out, there has been a reason for it, something related to the structured flow: I am through with the loop prematurely, and should break out of it. There is nothing more to do in this function, so I return. Etc.
Admittedly, not all languages provide all the facilities I would like. So sometimes I resort to tricks, like in C-type languages, where I might write a "do { ... } while (false);" so that I can handle alternatives in order: "if (so-and-so) { ...; break; }". This is an old habit from the CHILL programming language, where you could give any statement a label (which denoted that block, not a location in the code), allowing you to EXIT out of any labeled statement, whether a function, a loop, an if statement or whatever.
I never had to make a jump for completely unspecified reasons! So I never used a completely unspecified goto. If the language does not directly provide what I want, I do tricks like that "do while (false)". It is not bad: The indentation indicates what I am breaking out of, and usually the break occurs in another indented statement, making it stand out (or in). I much prefer that to an arbitrary goto.
|
|
|
|
|
Implement the canonical reduced DFA to match C# keywords without using gotos or a table-driven approach. There are about 520 states, IIRC.
You'd be writing code until you were dead.
|
|
|
|
|
Now, I have come to accept 520 states as being "within reason" - but of course it is large.
If you were familiar with X.225, with its mere 32 states, you would know that without the supplementary state variables and predicate mechanism, with each case modeled as a separate state, there would have been hundreds of states - maybe exceeding 520. State variables are a way to reduce the size of the FSM. By proper use of state variables to collapse "n" states into one, with substates representing those "n", you can make it far more manageable. Obviously those "n" substates have largely the same behaviour; they are not arbitrarily chosen.
Also, events may carry data. Say, if you parse a number, you do not model 0 through 9, +, - and the decimal point as thirteen different events: All digits are born equal, so you have a digit event, a sign event and a decpt event. Handling of digits is different in the integer and fractional parts, so you could make that two different states - or you could make a decpt event set a state variable indicating 'fraction part', which steers the handling of the subsequent digits. And you would have a predicate on this fraction-part variable: If it is set, another decpt or sign is a protocol/syntax error.
Starting out with 520 initial states before you start cleaning up and grouping them into "significant" states, and grouping those events that are treated equally in the FSM as single events carrying data, sounds like a task that can be managed with reasonable effort. The size in itself is not a reason for rejecting an FSM approach.
In the X.225 standard, the presentation of the FSM is split over several state tables, each with a subset of the states and events, so they look like quite small tables. But they are just subsets of the total state table, covering only those states (columns) that define handling of some events (rows): If an event occurs in any other state, it is illegal, and the square is empty. E.g. the events and states related to connection establishment are shown as one subtable. I have included this in my framework editor: You may define any number of sets of events and states, and display only those rows and columns that cover the area you are currently working on. You can have a "blob-and-arrow" plot of this subset. When you are done with, say, the connection part of the protocol, and go on to another subset, the connect-phase states may appear in the B&A plot as a single "super-state", with the intermediate states of that phase hidden inside a single blob.
I would be writing far less code than in any other approach - once the table is in place. FSM is NOT for that super-"agile" approach where you start with typing into your code editor "int main(int argc, char** argv) {" before you turn to the customer: "Now we are going - please describe your problem to me!" An FSM approach certainly requires a significant problem analysis, and even solution design, before you can start building your transition table. (In that sense, it is not a "modern" method...)
With an FSM, I would be relieved of writing tons of flow control. With a generator, I would be relieved of writing tons of function headings etc. I would be focusing on the true actions only, not how to get to them. Porting to another language would require a fraction of the work, compared to porting the full code body. I am using this approach because it saves me a whole lot of code writing!
|
|
|
|
|
A) Just so we're talking about the same thing: an FSM is a finite state automaton. It does not use a stack. If it does, it's no longer an FSM.
B) A PDA is a state machine that drives a stack. Parsers require a PDA.
Fundamentally, mathematically, A and B behave very differently, and the principles you are outlining here aren't really addressing a PDA.
C) And finally, none of what you wrote addresses the need to eliminate gotos from the final state machine. You cannot jump both from state 2 to state 2 *and* from state 2 to state 5 using only while loops. It's not possible because of how a while loop works. It's simple. Because of this, you cannot recycle paths. Every branch potentially leads to more branches, not fewer.
|
|
|
|
|
I know for sure that I am not going to convince you of anything...
The way OSI protocols, and my framework, use FSM implementations is a pragmatic approach to making safe and manageable code. They are useful adaptations of the mathematical, theoretical finite-state-automaton idea.
If it breaks with your theory, stick to your nested else-if sequences. I just do not see any advantage of that approach compared to a table-driven one. And that goes for a lot of aspects. You don't "need" switch statements either - in fact you do not need else clauses. But they are helpful. As are FSMs, as a flow control construct. But if you don't want to use it, don't use it!
|
|
|
|
|
You wouldn't see it there. You'd see it in its ability to interoperate with other parsers, delegating parts of the parse to them, and in Parsley's ability to let someone take over this entire procedure and parse in hand-written code <--- this is where the table approach falls down. You cannot expect a human being to manually work with a compiled FSM table!
I should add, I've written several table-driven parser generators. This one isn't. I can do things with it I can't with the others, like the above.
|
|
|
|
|
Very difficult not to use them in assembler - lots of implicit ones as well
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
You should write this as a 2D table of delegates, indexed by firstSym and secondSym. No "else if" tests at all, just a simple multiply and add ((firstSym * numSecondSyms + secondSym) * sizeOfADelegate + baseAddress) to find the delegate to run. At the C# level, you obviously see it as plain indexing, not as multiply and add.
Most likely, the table would be quite sparse. If the span of symbol values is large, making the table huge, you may want to use tricks to compress it. A delegate requires quite a few bytes of storage, so if your table is large but the number of elseif-bodies (actions) is moderate, your 2D table might contain not delegates, but byte or halfword delegate indexes into a dense delegate array.
Note that if the same delegate action is used in many places in the table, a direct delegate table would repeat the space overhead many times. With a byte/halfword index table and a separate delegate array, you pay the delegate space overhead only once per action routine; each new reference to it requires only another byte or halfword (when your 2D table is sparse and you have packed it).
If your actions are brief, just a single statement or two, you might avoid all the delegate overhead (both in space for the delegate table and the run time overhead) by setting up the actions as a switch case. If you make the entries in your 2D table not as plain integers but an enum type with symbolic value names that describe the semantics of the actions, it might in fact be quite readable!
I created such a framework this Yule vacation, and was positively surprised by how compact and efficient the driver could be made, how compact the tables turned out, and by the advantage of having a single copy of the action code used in several different cases. You fix a bug, or add an extension, in one place; you don't have to duplicate the fix in six different elseifs. The most important thing, though: Any empty square in the 2D table is a red flag: Here is a case you forgot to handle! Check it!
In my framework, you edit the body of each action by itself, and you edit the 2D table by itself. Then a generator (optionally) packs the table, and creates either a delegate table or a switch statement, and a driver function (which is oblivious to the problem at hand; the only reason for generating it for every model is to strip off the lines not required because no action calls for that specific functionality, such as a third dimension of the table). One other reason to use a generator is that the 2D table is independent of programming language; the action bodies exist in multiple instances in C#, C++, C, Python or whatever language you have written a generator for (which is a fairly trivial matter). The first "language" alternative is a pseudocode alternative that can be used as a model for other languages. You edit the action body only; the generator takes care of creating function headers, adding any (language independent) comment block at the top of the function etc. The red tape of coding complex flows is drastically reduced - once you've got that 2D table in place!
As a convenience feature, I added a red-flag alternative: Given an event (in your case: second symbol) in a given state (first symbol), the driver scans the state's row of the packed table for events that have a defined action. If you have defined a red-flag column in your table, it will be at the end of the list, substituting for all the empty table cells that have been compressed away. So I can make a general catch-all error handler: Illegal event (/secondSymbol) xxx in state (/firstSymbol) yyy.
I am considering presenting this framework as a CP article once it is polished. I really would like to make it as a VS plugin, but I have not yet learned how to make VS plugins. (Hints to a good guide would be welcome!) As of now, I must generate the code outside VS (which is fast, but still...), and if you edit the code in VS you must import it back in for the next time you generate it. That is a little cumbersome, so I guess that to make it generally usable, I must provide a VS plugin for it.
|
|
|
|
|
One issue with this approach: It works for k=2, but not k=n.
The issue is the index.
The number of symbols in a viable prefix (k) is arbitrary but known at generation time,
but my generated code needs to work for k=whatever, regardless of the overarching machine/code size.
I can compact the table in a recursive descent parser by only extending k where necessary. In the functions where k=1 (most of them) none of this is necessary.
But I can't use an actual table to begin with.
To understand why you'd have to understand Parsley. It generates recursive descent composition parsers. Basically it creates a series of recursive descent parsers that share the same symbol table and lexer and can call each other.
Now, I could handle this using tables, sort of (tables have a hard time delegating to other tables when parsing), but I could do it.
But what I couldn't handle is virtual non-terminal productions, which allow someone to completely take over the parse.
The reason I can't use tables there is that they increase the complexity of the code that exists in the grammar (the end user's code) to a point where it's not feasible. Tables aren't intelligible once they're cooked and machine generated, but this code has to be.
I can only refer you to using Parsley in a real-world scenario in hopes of illustrating the problem further. If you understand how to use Parsley, you should understand why this approach probably won't work, unless I'm missing something.
Parse Anything with Parsley: A Different Approach[^]
|
|
|
|
|
Can't you scale the higher values of k in powers of 2, effectively giving you a bit pattern leading to a unique number, and add the values rather than multiplying?
|
|
|
|
|
Only if I'm willing to work with runs of hundreds or thousands of bits or more.
I'd need one bit for every symbol in the table, and that wouldn't necessarily let me solve all of k's space requirements. It would just let me compact some of the comparisons.
|
|
|
|
|
Yes, you are right that finite state machines are finite.
My way of doing it assumes that both the set of states/firstSymbol and events/secondSymbol are known when the tables are generated. For the third dimension, predicates for denying or allowing a transition, the set of predicate tests are predefined, but may test on state variables which extend the main state: They are predeclared, but their values may vary. In some cases, it could be possible to express some of these as even more indexed dimensions. In all the protocols I have seen, conditional transitions occur so rarely that the array would be extremely sparse, and the predicates are so simple that they can be tried in order much more efficiently than handling more dimensions. (For almost all cases, the defined predicates are not generated as individual functions but as a switch case.)
My source of inspiration is essentially X.225[^]. The state tables are found in Annex A. I certainly know that transforming an "unruly" set of switches / if-then-elses, maybe with action code spread all over the place (and which may block) into an orderly, well behaved FSM can be a major task. But a surprising large fraction of nested if-then-else sequences can be refactored into FSMs.
I never looked at Parsley, so you may very well be right that it cannot make use of FSM techniques, neither for the generating parser nor for the parser generated. Then again, very few of the software development colleagues I have had over the years have had the slightest clue how to do a table-driven FSM, but they have great "aha experiences" when I explain. So the major reason why I started developing this FSM framework is to have something that works, so I can show them: This is another and better way to do it! Maybe some cases will require extensions to the framework, but I am not going to believe in any "That approach is not a viable one" until I understand why it is not. It might be true. It might be a question of refactoring cost. It might be a lack of understanding of the FSM approach.
|
|
|
|
|
It's actually quite simple. The parser cannot be built using an FSM because it is recursive descent.
It cannot be a repurposed PDA because it already is a PDA, and the stack is the call stack.
The only way to change that is to fundamentally change Parsley.
|
|
|
|
|
If you implement, say, a protocol stack, you couldn't possibly do it as one single, monolithic FSM. There would be at least one FSM per protocol layer, generating events to the layers above and below, and receiving events from either.
You are of course right that transforming a recursive parser (or any solution that is fundamentally a recursive design) as a whole into a single monolithic FSM is next to impossible while maintaining its recursive nature. (Well, in principle you could manage a recursion stack as an explicitly managed list or array of stack frames, but I would hardly recommend that, except in very special cases.)
Yet, running the logic of each recursion as an FSM could be a good idea. When you dive into another recursion layer, you dive into a new FSM. A primitive example: You (hopefully) wouldn't parse a floating point literal as a deep recursion - probably there would be few other function calls than to obtain the next character. Translating that string literal into a binary float would be a perfect task for a tiny FSM.
Think of an FSM as a control structure related to a switch statement. In a recursive parser, you can see a token loop around a switch on the token value. You could just as well run an FSM and loop over feeding it token events. An FSM is really not much more than a 2- or 3-dim switch!
Your objections strongly resemble those I am used to receiving. It just illustrates that FSMs are hardly taught as a programming technique nowadays. If no well-known programming language had had a switch-case statement, introducing it would have been met with similar rejection!
|
|
|
|
|
Using an FSM inside those procedures is only really needed for k>1. To explain, here's what a typical procedure looks like - the comparisons are only 1 deep. That's why k=1. Watch:
internal static ParseNode ParseIdentifier(ParserContext context) {
int line__ = context.Line;
int column__ = context.Column;
long position__ = context.Position;
if (((ExpressionParser.verbatimIdentifier == context.SymbolId)
&& ExpressionParser.WhereIdentifier(context.GetLookAhead(true)))) {
ParseNode[] children = new ParseNode[1];
if ((false
== (ExpressionParser.verbatimIdentifier == context.SymbolId))) {
context.Error("Expecting verbatimIdentifier at line {0}, column {1}, position {2}", context.Line, context.Column, context.Position);
}
children[0] = new ParseNode(ExpressionParser.verbatimIdentifier, "verbatimIdentifier", context.Value, context.Line, context.Column, context.Position);
context.Advance();
return new ParseNode(ExpressionParser.Identifier, "Identifier", children, line__, column__, position__);
}
if (((ExpressionParser.identifier2 == context.SymbolId)
&& ExpressionParser.WhereIdentifier(context.GetLookAhead(true)))) {
ParseNode[] children = new ParseNode[1];
if ((false
== (ExpressionParser.identifier2 == context.SymbolId))) {
context.Error("Expecting identifier at line {0}, column {1}, position {2}", context.Line, context.Column, context.Position);
}
children[0] = new ParseNode(ExpressionParser.identifier2, "identifier", context.Value, context.Line, context.Column, context.Position);
context.Advance();
return new ParseNode(ExpressionParser.Identifier, "Identifier", children, line__, column__, position__);
}
throw new SyntaxException("Expecting verbatimIdentifier or identifier", line__, column__, position__);
}
Sorry for posting so much code, but I wanted it to be true to what I'm generating.
Notice how the if statements are only one level deep (except for the where constraint - WhereIdentifier - but that's an aside right now; it's a separate conversation).
If I needed more than that, I'd certainly use an FSM inside this procedure. I hear you, and in fact it's how I plan to do it. My "in defense of gotos" post was about using gotos as an FSM to do what you're talking about here. Using whiles to do it complicates the generation and explodes the output code size, requiring other compression techniques which make the code more, not less, confusing. I prefer to make my state machines renderable to graphviz and then simply label each of the goto labels after the state in graphviz it represents. It's a map to the state machine. It makes it clear. That's my only contention in terms of how to go about it. I prefer the goto approach because I can make it followable to humans. I cannot do so with table compression techniques.
|
|
|
|
|
Plus you can have a label called Hell:, meaning the line jumping to it is goto Hell; - done that a few times, and once it got into production code!
|
|
|
|
|