honey the codewitch wrote:
if(firstSym==lparen)
{
    if(secondSym==rbracket) {
    } else if(secondSym==intLiteral || secondSym==floatLiteral || secondSym==identifier) {
    } else if(secondSym==...) {
        ...
    } else ...
} else if(firstSym==lbracket) {
    ...
} else if(firstSym==...) {
    ...
} else ...
As an alternative to this specific code, you could also use switch statements. The branch with the multiple "||" tests would become multiple case labels sharing one body.
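For illustration, a minimal sketch of that shape - the Sym names here are made up, not taken from the original code:

enum Sym { LParen, LBracket, RBracket, IntLiteral, FloatLiteral, Identifier }

class SwitchForm
{
    static void Dispatch(Sym secondSym)
    {
        switch (secondSym)
        {
            case Sym.RBracket:
                break;               // handle ']' here
            case Sym.IntLiteral:     // the three "||" tests become
            case Sym.FloatLiteral:   // three stacked case labels...
            case Sym.Identifier:
                break;               // ...sharing this one body
            default:
                break;               // the trailing "else ..."
        }
    }

    static void Main() => Dispatch(Sym.IntLiteral);
}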
I just present this as a choice, not necessarily a preferred choice.
I'm retired. There's a nap for that...
- Harvey
Well, CodeDOM doesn't do switch, so my code generator won't render a switch. Even then it's not much different from the if scenario, except you can use goto case, which I guess makes it a bit like gotos. Not a game changer in any case, if you'll pardon the pun.
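For what it's worth, a minimal sketch of goto case (made-up symbol names, just to show the shape - not my generator's output):

using System;

class GotoCaseDemo
{
    enum Sym { LParen, LBracket, Other }

    static void Handle(Sym sym)
    {
        switch (sym)
        {
            case Sym.LParen:
                Console.WriteLine("saw (");
                goto case Sym.LBracket;  // explicit jump into another case body
            case Sym.LBracket:
                Console.WriteLine("shared bracket handling");
                break;
            default:
                break;
        }
    }

    static void Main() => Handle(Sym.LParen);
}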
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Exactly because you have such a complex branching structure, you shouldn't use goto. I'm not sure I fully understand your problem, but it looks like it's in need of either refactoring or code generation (or both).
I did use a UML tool for generating code for a state machine some 20 years ago, and it let me choose between using switch statements or inheritance. Our team went with switch, because inheritance seemed over the top and we had limited resources on our embedded target machine. But if our target system had been a PC, nothing would have stopped us from using inheritance, i.e. one class for each state. Not sure if that would be a suitable approach in your case, just saying that code generators exist for such things, and indeed existed 20 years ago (for C++).
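To make "one class for each state" concrete, here is a minimal sketch - illustrative names, not what our UML tool actually generated:

using System;

// One class per state; the transitions live in virtual methods instead of
// a switch, and the vtable does the dispatching.
abstract class State
{
    public abstract State OnEvent(char input);
}

class Idle : State
{
    public override State OnEvent(char input)
        => input == 's' ? new Running() : (State)this;
}

class Running : State
{
    public override State OnEvent(char input)
        => input == 'x' ? new Idle() : (State)this;
}

class Machine
{
    State current = new Idle();
    public void Fire(char input) => current = current.OnEvent(input);

    static void Main()
    {
        var m = new Machine();
        m.Fire('s');  // Idle -> Running
        m.Fire('x');  // Running -> Idle
    }
}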
P.S.: that said, if your code generator does use goto, and you will never have to manage that code manually, then it's OK to do just that. After all, the main reason against goto is its bad behavior under maintenance, and that's a non-issue for 100% generated code.
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
The gotos are there for a reason and should be.
The code is generated.
It's either 20 thousand lines of gotos or 70 thousand lines of what you're talking about.
I'll take the gotos.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Well, as I said (maybe you hadn't seen the P.S. by the time you wrote your answer): if it's generated, it's fine. 20K lines doesn't sound like code you're going to maintain manually anyway.
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
The issue here is state machines.
There are three ways to implement them:
1. Table driven (see the sketch just after this list)
2. Goto driven
3. Using code synthesis to achieve whiles, fors, and ifs like you'd want.
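To ground #1 before getting to the problems with #3, here is a toy table-driven matcher for integer and float literals - hypothetical tables, nowhere near the size of real generated ones:

using System;

class TableDriven
{
    // Rows are states, columns are character classes (0 = digit, 1 = '.',
    // 2 = anything else); -1 rejects.
    // States: 0 = start, 1 = integer, 2 = after '.', 3 = float.
    static readonly int[,] Next =
    {
        { 1, -1, -1 },  // 0: start
        { 1,  2, -1 },  // 1: integer (accepting)
        { 3, -1, -1 },  // 2: saw '.'
        { 3, -1, -1 },  // 3: float (accepting)
    };
    static readonly bool[] Accepting = { false, true, false, true };

    static int Classify(char c) => char.IsDigit(c) ? 0 : c == '.' ? 1 : 2;

    static bool Match(string s)
    {
        int state = 0;
        foreach (char c in s)
        {
            state = Next[state, Classify(c)];
            if (state < 0) return false;
        }
        return s.Length > 0 && Accepting[state];
    }

    static void Main()
    {
        Console.WriteLine(Match("42"));    // True
        Console.WriteLine(Match("3.14"));  // True
        Console.WriteLine(Match("3."));    // False
    }
}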
There are a number of problems with #3, aside from code complexity.
State machines necessarily operate like spaghetti code and must jump freely.
Otherwise paths can't converge and you end up with a lot more code. Consider the following image: C# Keywords DFA[^]
That may seem extreme but it's not. This simply matches C# keywords. This exact state machine could be used in Microsoft's production C# lexer (part of the csc.exe compiler) to do exactly that - it certainly would be, if it were generated from the same keyword list.
See all the convergent paths? These are impossible to get to without gotos. The equivalent machine without convergent paths is about 5 times the size! So the code will be too, and the nesting will be so deep it will be unintelligible anyway.
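To show the shape of #2 at toy scale, here is a hand-written goto-driven matcher for just the keywords "if" and "in" - illustrative only, not my generator's output. Note how several states converge on the shared identifier state via goto:

using System;

class GotoDriven
{
    // Classifies a string of letters as "if", "in", or an identifier.
    static string Lex(string s)
    {
        int i = 0;
        if (s.Length == 0 || !char.IsLetter(s[0])) return "error";
        if (s[i++] != 'i') goto qIdent;           // not 'i': converge
        if (i == s.Length) return "identifier";   // just "i"
        char c = s[i++];
        if (!char.IsLetter(c)) return "error";
        if (c != 'f' && c != 'n') goto qIdent;    // "i?": converge
        if (i == s.Length) return c == 'f' ? "if" : "in";
        goto qIdent;                              // "if..", "in..": converge
    qIdent:
        while (i < s.Length && char.IsLetter(s[i])) i++;
        return i == s.Length ? "identifier" : "error";
    }

    static void Main()
    {
        Console.WriteLine(Lex("if"));     // if
        Console.WriteLine(Lex("infer"));  // identifier
    }
}

Without those convergent jumps, qIdent's loop would have to be duplicated into every state that can reach it - that's where the blowup comes from.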
Edit: I mean yes, you could use inheritance, but at that point all you're doing is presenting a facade over "gotos" - it's just using the vtable for it instead of explicit jumps.
It's still nasty when you have 500 states (not uncommon in a lexer), and the code would get huge.
A state is generally 3-5 lines of code in a lex routine. Adding a class to that mix at least doubles that, I'd think.
And yes, I found it matters: VS chokes on 1MB files when it comes to reporting the position of errors (sometimes).
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Normally I'd have to add a lot to these statements, but since you're generating code, I'll just say this:
Use whatever works for you. If VS can't handle the code your generator produces, then use another generator option, or a different generator. Discussing the pros and cons of the available methods is meaningless if your code doesn't compile.
That said, I'm not an expert on parsers, but I wonder whether it is the usual approach to implement one as a state machine. Wouldn't it be straightforward to just implement each rule of the grammar as a function that (recursively) converts a string of tokens into a syntax tree? IIRC, there's typically no need to maintain a global state, and therefore no need to construct a state engine. A state would only be needed in the case of a context-dependent grammar/language. I may have missed it, but is that what you have to deal with - a context-dependent language?
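To illustrate what I mean, a minimal recursive-descent sketch for a made-up two-rule grammar, Expr -> Term (('+'|'-') Term)* and Term -> number (all names here are mine, purely for illustration):

using System.Collections.Generic;
using System.Linq;

class Node
{
    public string Op;        // "+" or "-" for interior nodes
    public Node Left, Right;
    public int Value;        // for leaves
}

class Parser
{
    readonly List<string> toks;
    int pos;

    public Parser(IEnumerable<string> tokens) => toks = tokens.ToList();

    // Expr -> Term (('+'|'-') Term)*
    public Node Expr()
    {
        var left = Term();
        while (pos < toks.Count && (toks[pos] == "+" || toks[pos] == "-"))
        {
            var op = toks[pos++];
            left = new Node { Op = op, Left = left, Right = Term() };
        }
        return left;
    }

    // Term -> number
    Node Term() => new Node { Value = int.Parse(toks[pos++]) };
}

new Parser(new[] { "1", "+", "2" }).Expr() yields the tree for 1+2, and the only "state" is the call stack.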
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
For the lexer, a state machine is the typical approach.
The state machine technique for the parser is to implement LL(k), similar to how ANTLR implements LL(*).
The reason is that creating an LL(k) parser with your technique requires far too much code.
It's both more efficient and more compact to use a state machine rather than a giant parse table.
There's a research paper on doing it this way somewhere, but I just googled for it and can't dredge it up again.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
That's not what I expected, but as I said, I'm not an expert. Thanks for the feedback.
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
I'm no expert on parsers, but can't you use some parser generator, such as Yacc - Wikipedia[^]? Just an idea...
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
I could, but I don't see why I'd write a parser generator, throw it away, and then use Yacc, which doesn't even use the same parsing algorithm as the generator I just wrote.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Time after can change colour (4)
TINT
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
ya
There are some obvious cross-platform rich UI skills here currently being wasted on trivial pursuits [^].
What do you think about this approach?
Quote: Dear Russell,
Here at CodeProject, so many of us are impressed by your work in Material Design multimedia; our fourteen million++++++ member community can offer you a chance to develop your talents with support ... and friendly challenge. Here you can write your good name and ideas in the sky of technical eternity (good to the last byte).
You can enjoy the bonhomie of the Lounge Parser Discussion Forum, and I guarantee you the mix of esoteric conversations, personal revelations, daily-life minutiae, and heart-warming stories of geeks who love gadgets too much, will not only amuse you, but, stimulate deep reflections on the meaning of it all.
I want to assure you that, while I am just a way-past-use-by-date pedantic oddball, there are so many people here to bond with, learn from, teach, ventilate with, enjoy, celebrate, etc., who can write intelligible English uncluttered by archaic vocabulary. So, please don't let my flourishes and foofaraw put you off.
I hope you will not be offended if I dare to say your talents deserve a wider audience than mere photographers.
best wishes, Bill
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
But but but... I'm a photographer!
PIEBALDconsult wrote: But but but... I'm a photographer!
You are so much more than that to me
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
Isn't that kinda like asking someone who organises his wardrobe well to become a carpenter?
I wanna be a eunuchs developer! Pass me a bread knife!
After using them and building several of them, I still prefer building my parsers by hand.
The only reasons I don't are error handling and maintenance, although maintenance with generated parsers is a double-edged sword. For starters, a generator typically emits a lot more code than you would write if you hand rolled a parser. Second, with a hand-rolled parser you can parse directly into your final tree. With a generated parser you must first parse into a parse tree, before converting *that* to your final tree, so maintenance becomes an issue again, because that second phase is usually hand-written. Although, to be fair, by the second phase the data you're working with has already been validated and error-checked for syntax and structure.
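To show the shape of that second phase, here is a hypothetical conversion pass - none of these names come from CodeDOM Go Kit or any real generator:

using System;
using System.Collections.Generic;

// Generic parse-tree node, the kind a generated parser hands back:
// every rule and every token survives.
class ParseNode
{
    public string Rule;   // e.g. "addExpr", "number"
    public string Text;   // token text, for leaves
    public List<ParseNode> Children = new List<ParseNode>();
}

// Purpose-built AST node: only what later phases care about.
class BinaryExpr
{
    public string Op;
    public object Left, Right;
}

class SecondPhase
{
    // The hand-written pass: structure is already validated by now, so it
    // can just dispatch on rule names.
    public static object ToAst(ParseNode n)
    {
        switch (n.Rule)
        {
            case "number":
                return int.Parse(n.Text);
            case "addExpr":  // addExpr -> expr '+' expr in the parse tree
                return new BinaryExpr
                {
                    Left = ToAst(n.Children[0]),
                    Op = n.Children[1].Text,
                    Right = ToAst(n.Children[2])
                };
            default:
                throw new NotSupportedException(n.Rule);
        }
    }
}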
I provide macros to help with the second phase, but it doesn't help that much. It would be nice if I could make a grammar description that could completely shape a tree, and create arbitrary objects from it, to mimic the creation-into-the-final-tree that hand-rolled parsers can do.
I don't like that the generated parser will take my codedomgokit source from 500KB to about 1.5-2MB
But I also don't like how sketchy my hand rolled parser is.
Is the extra 1+MB worth it? I guess that's probably 400-500k compiled based on previous experience. Currently CodeDOM Go Kit is about 250K compiled.
I don't know. It's something to think about. I really want a Better Way(TM) - something that handles that second phase and rolls it into the first.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
honey the codewitch writes: I still prefer building my parsers by hand.
I even built lexical code by hand, so no arguments from this 🦕.
honey the codewitch writes: With a generated parser you must first parse into a parse tree, before converting that to your final tree...
🤮
honey the codewitch writes: I provide macros to help with the second phase...
Naughty! I'd deprecate macros with prejudice. No, not even deprecate: they'd just vanish overnight. I've only seen one instance where they were used elegantly: to generate assembly opcodes of all things. Maybe I've led a sheltered life. But I'll confess that I never "got" templates until I read a comment describing them as "macros on steroids".
honey the codewitch writes: But I also don't like how sketchy my hand rolled parser is.
Would you like to borrow my rubber 🦆, who's a refactoring taskmaster? He promises to be gentle.
honey the codewitch writes: Is the extra 1+MB worth it?
I'd bet this is a problem with all non-trivial generated code.
I'm glad you read this post. I was thinking of you when I wrote it. I enjoy our conversations, as we have similar passions, it seems.
My "Macros" are actually type checked. After the code gets parsed and dumped into an AST then I go back and resolve macros and I do it in a type safe manner. They resolve to function calls. It isn't a simple preprocessor, so doesn't have a lot of those traditional drawbacks.
Sure, templates are "macros on steroids" but more importantly perhaps, they're *type checked* "macros".
I love templates in C++. Once you learn Generic Programming(TM) you never go back unless you have to.
By the way, you might want to consider using a lexer like flex or lex for your parser.
The reason is speed, not just ease.
If you learn how to use it (the learning curve isn't big - lex is stupid simple, and mostly you write it in C/C++ code), it will do things like shortest-string discovery. It uses these optimizations to skip examining characters in the input stream, speeding up the parse. This is nearly impossible in a hand-built lexer, because you need a lot of tabular data to do it properly.
It also makes your parser less buggy than hand lexing, in my experience. Evaluating tokens is a hell of a lot less error-prone than evaluating individual characters - though I'm only assuming you're doing scannerless/lexerless parsing.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
If you could recommend a link for lex or flex, I'll take a look, though it'd probably be hard to retrofit at this point. The shortest-string stuff sounds nice. Even worse is having to back up and re-parse an entire identifier. I ran into this today while writing code to support a label that's the target of a goto. Not that they're used much, but goto is on the list of keywords that I never got around to supporting.
My lexer code is very string oriented, so there aren't distinct lexer and parser phases, which is what I assume you mean by "scannerless/lexerless". It's probably a trade-off between things like the shortest string optimization that you mentioned, and the ability to override rules (e.g. to allow punctuation in a name so that a template instance can be referred to by its verbatim name rather than some mangled monstrosity).
The man pages are actually VERY good. But also: Using flex - An Overview of flex, with Examples[^]
That's a very basic tutorial on using flex.
Download Flex[^]
Now, here's something for some reason nobody really talks about.
In practice, forget those samples. They give you the basics but nothing real-world. The real world, though, is even easier.
Have a set of constants for all your symbols. If you intend to parse >= you need a GREATERTHANOREQUAL/GTE constant. Give it an int value. Do it for all your terminals. It doesn't matter what the values are. You're going to use these to distinguish your tokens.
In your scanner:
%option unicode, codepage:raw, out: outputfile.c
%{
%}
%%
<<EOF>> { return EOS; }
"int" {return INT_TYPE; }
"uint" { return UINT_TYPE; }
(_|[[:IsLetter:]])(_|[[:IsLetterOrDigit:]])* { return IDENT; }
[\r\n\t\v\f ]
[\n]|[^\n] { return ERROR; }
Basically you just return a constant value for every token you capture. You can do more - whatever code you need in those brackets is fine. I use those little brace breakouts above, where we return consts, for building my skip lists. Like
{ yyskip(LINE_COMMENT); }
Since I didn't return a value, flex simply "hides" it. It only gives you something back when you return; otherwise it just keeps going. yyskip() is a function I made that would have been included in the "extra code" part above. It adds the current value to my skip list.
Oh, to get access to these values you use int symid = yylex();
and the matched text is in yytext.
It's all pretty simple. It's a bit of work defining your tokens and consts up front but hey, it's a lot less work than parsing them yourself.
And it will make this FAST.
I actually generate these files using Parsley
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
One thing I forgot to mention to you - regarding skipping comments.
I learned the hard way: if you're going to skip, you need to manually clear the skip list periodically.
I was trying to clear it after every read, and it doesn't work, because trailing items get lost if you aren't checking *everywhere*.
The way I exposed these things doesn't leave me room to do that. (Partly my fault, and partly limitations of the lexer generator I'm using - skip lists weren't really in the design.)
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I used to build databases by hand, and use PostScript and mark-up languages (the vast majority of which were far better suited to the job than XML) to produce documents.
... But then better tools came along, which allowed me to spend my time and energy on doing the work, rather than on setting up the infrastructure for it.
I wanna be a eunuchs developer! Pass me a bread knife!