How can I tell my parser not to skip "\n" before "///" in documentation comments?

Hi,
First of all: this is not about a syntax error. My grammar file does not contain any errors.
I want to write a C# grammar for my program. I downloaded this grammar from CodePlex: http://antlrcsharp.codeplex.com/.
I have not found a better C# grammar for ANTLR 4. This grammar does not support documentation comments, which are very important for my program. It skips all comments, so I removed the skipping of documentation comments and wrote my own rules for them. However, I do not know how to express that "\n" must be followed by "///" inside a documentation comment. I am afraid that the lexer, when it recognizes "\n", skips that token automatically, so this part of my parser rule never matches: "('\n' '///')?". Does anyone know how to solve this problem? Or, if my code is actually right, could anyone explain why?

Here are my parser rules for documentation comments:

C#

//documentation comments
doc_comment :
'///' (  summary remarks?
        |remarks
        );
summary :
    '<summary' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* tag_body+ (('\n' '///')* comment_text ('\n' '///')*)* '</summary>';
remarks :
    '<remarks' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</remarks>';
tag_body :   '<c' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</c>'
            |'<code' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</code>'
            |'<example' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</example>'
            |'<exception' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</exception>'
            |'<include' 'file' '=' '\'' comment_text '\'' 'path' '=' '\'' comment_text ('[' '@name' '=' '"'identifier '"' ']')? '\'' '/' '>'
            |'<list' 'type' '=' ('"bullet"' | '"number"' | '"table"') '>' ('\n' '///')* listheader? listitem? '</list>' 
            |'<para' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</para>' 
            |'<param' 'name' '=' '"' identifier '"' '>' (('\n' '///')* comment_text ('\n' '///')*)* '</param>'
            |'<paramref' 'name' '=' '"' comment_text '"' '/' '>'
            |'<permission' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</permission>'
            |'<returns' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</returns>'
            |'<see' cref '/' '>'
            |'<seealso' cref '/' '>'
            |'<typeparam' 'name' '=' '"' comment_text '"' '>' (('\n' '///')* comment_text ('\n' '///')*)* '</typeparam>'
            |'<typeparamref' 'name' '=' '"' comment_text '"' '/' '>'
            |'<value' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</value>';
cref : 'cref' '=' '"' comment_text '"' ;
listheader : '<listheader>' ('<term>' (('\n' '///')* comment_text ('\n' '///')*)* '</term>')? ('<description>' (('\n' '///')* comment_text ('\n' '///')*)* '</description>')? '</listheader>';
listitem : '<listitem>' ('<term>' (('\n' '///')* comment_text ('\n' '///')*)* '</term>')? ('<description>' (('\n' '///')* comment_text ('\n' '///')*)* '</description>')? '</listitem>';

And here are the lexer rule for whitespace (WS), the parser rule for comment_text, and the lexer rule for comment characters (ANY_CHARS):

C#

WS:
    (' '  |  '\r'  |  '\t'  |  '\n'  ) -> skip;
comment_text : ANY_CHARS;
fragment ANY_CHARS: (.)*;

Thanks for replies!
Pepin z Hane

Posted 23-Feb-14 10:44am

Pepin z Hane

Updated 23-Feb-14 10:45am

v2


Comments

Sergey Alexandrovich Kryukov 23-Feb-14 17:03pm

It does not look like a question.
—SA

Pepin z Hane 23-Feb-14 18:36pm

I want to know how it works: will this part of the rule, ...('\n' '///')..., be ignored or not, given that the lexer rule WS says to skip '\n'? So the question is: is this part of the rule ignored or not? Got it?
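
To make the concern concrete, here is a minimal Python sketch of a toy lexer. It is not ANTLR, and the token names DOC_START, WORD and SLASH are made up for illustration; it only shows what "skip" means in general: a token that the lexer discards never appears in the stream the parser works on, so a parser rule cannot match it. Whether that is exactly what happens in the grammar above also depends on how ANTLR treats the '\n' literal used in the parser rules, which this sketch does not model.

```python
import re

# Toy token specification, loosely modelled on the rules quoted above.
# This is NOT ANTLR; it only illustrates what "skip" means for a lexer.
TOKEN_SPEC = [
    ("DOC_START", r"///"),
    ("WS", r"[ \t\r\n]"),   # whitespace, discarded like "-> skip"
    ("WORD", r"[^\s/]+"),
    ("SLASH", r"/"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
SKIPPED = {"WS"}


def tokenize(text):
    """Return the (type, text) pairs that a parser would actually receive."""
    tokens = []
    for match in MASTER.finditer(text):
        if match.lastgroup in SKIPPED:
            continue  # dropped here, so the parser never sees it
        tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("/// first\n/// second")
print(tokens)
```

In this toy model the newline between the two comment lines is gone before any parser rule could ask for it.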

Andreas Gieriet 23-Feb-14 18:35pm

Do you need only the comments with their associated declarations, or do you need to parse everything? If you need a parser that collects all structured comments of a declaration, you do not need to parse everything in detail; you can employ a kind of lazy parser. It is easy to collect only the declarations and skip the bodies (e.g. of a function or property) and the initializations (e.g. of a member variable). The tricky parts are a) handling attributes properly and b) deducing which comments belong together.
I once wrote such a parser, which collected extended structured comments used for signing off suppression attributes in the code. I could outline such a parser if your problem is similar.
Cheers
Andi

Pepin z Hane 23-Feb-14 18:44pm

I have to parse nearly everything; the only thing I do not have to parse is the bodies of methods. I am creating a program for documentation generation, so I have to include namespaces, classes and class members. There could be a "\n" anywhere. If I parsed only comments, it would be no problem to say where "\n" can occur, and I would not use "\n" -> skip. But in my case I have to, and I want to parse everything in one pass if possible.

Andreas Gieriet 23-Feb-14 18:58pm

This has already been invented. Visual Studio generates an XML file for each compiled assembly (if enabled in the project properties).
That file contains all the structured comments with a reference to the respective member, class, namespace. See How to: Use the XML Documentation Features (C# Programming Guide).
Cheers
Andi
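
As a rough illustration of what such a file gives you, here is a small Python sketch that reads member names and summaries from a documentation file. The sample is hand-written in the general shape of a generated file (a doc root with assembly and members elements), and the Demo assembly and its members are made up; a real file would be loaded from disk instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the shape of a compiler-generated
# XML documentation file; the assembly and member names are made up.
SAMPLE = """<?xml version="1.0"?>
<doc>
  <assembly><name>Demo</name></assembly>
  <members>
    <member name="T:Demo.Greeter">
      <summary>Greets people.</summary>
    </member>
    <member name="M:Demo.Greeter.Hello(System.String)">
      <summary>Says hello.</summary>
      <param name="who">The person to greet.</param>
    </member>
  </members>
</doc>
"""


def read_summaries(xml_text):
    """Map each member's name attribute to its stripped summary text."""
    root = ET.fromstring(xml_text)
    return {
        member.get("name"): (member.findtext("summary") or "").strip()
        for member in root.iter("member")
    }


summaries = read_summaries(SAMPLE)
for name, text in summaries.items():
    print(name, "->", text)
```

The prefix in each name (T: for a type, M: for a method) identifies the kind of member, and the rest is its fully qualified name.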

Pepin z Hane 24-Feb-14 20:38pm

I know, but my tool is much more complicated and will be able to generate documentation in a format you specify yourself, something similar to doxygen. The XML documentation in C# contains only member names and comments, nothing about member types or structure, which is why it is not useful on its own (only together with reflection).

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Andreas Gieriet · Answer 1 · 2014-02-23T12:54:00

Solution 1

I don't know ANTLR well enough, but I suspect that ANTLR has lookahead predicates that let you tell the lexer to skip only those newlines that are *not* followed by optional whitespace and then ///.
E.g. something like

C#

WS1: ( ' '  |  '\r'  |  '\t' ) -> skip;
WS2: ( '\n' { _input.LT(1).getType() != COMMENT}? ) -> skip;

Of course, you also need a COMMENT production.
I have not tried it out; this is only a hint at a possible direction for solving this.
Cheers
Andi
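
The lookahead idea itself can be sketched outside ANTLR. The following Python toy lexer is only an illustration of the technique, not ANTLR predicate syntax, and the token names NL, DOC_START and WORD are invented for the example. It keeps a newline as a token only when the next line starts, after optional spaces or tabs, with ///, and drops all other whitespace.

```python
import re

# A newline counts as significant when the next line starts (after optional
# spaces or tabs) with "///". Every other newline is treated as skippable.
DOC_NEWLINE = re.compile(r"\n(?=[ \t]*///)")


def tokenize(text):
    """Toy lexer: emits NL only before a '///' line, drops other whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if DOC_NEWLINE.match(text, pos):
            tokens.append(("NL", "\n"))
            pos += 1
        elif text[pos].isspace():
            pos += 1  # ordinary whitespace: skipped
        elif text.startswith("///", pos):
            tokens.append(("DOC_START", "///"))
            pos += 3
        else:
            # anything else: read up to the next whitespace character
            end = pos
            while end < len(text) and not text[end].isspace():
                end += 1
            tokens.append(("WORD", text[pos:end]))
            pos = end
    return tokens


source = "/// one\n   /// two\nclass A\n{ }"
tokens = tokenize(source)
print(tokens)
```

Only the newline between the two comment lines survives as NL; the newlines after the comment and inside the class are dropped like any other whitespace.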

Posted 23-Feb-14 12:54pm

Andreas Gieriet

Updated 23-Feb-14 13:00pm

v3

Comments

Pepin z Hane 24-Feb-14 21:39pm

... does not contain a definition for LT... and La returns int, which is not the same... Probably I have to say somehow that, when it is called from this rule, the "\n" should not be skipped. Good idea, but how do I write it correctly? :-D

Andreas Gieriet 25-Feb-14 2:59am

I think your approach is flawed in at least two respects:
1) This tool has already been invented: use the generated XML files (they also contain all the needed type information) or, if that is not sufficient for your needs, get the missing information via reflection.
2) If you really want to re-invent it (it must be a hobby, otherwise it costs too much): you seem to be doing everything in one go, treating comments as tokens and at the same time already parsing their contents, which is a separate grammar *inside* the C# grammar.

So, for 2): treat comments as tokens, skip block comments, and collect all line comments. In a second pass, determine which structured comments belong together and concatenate their content (leaving out the leading ///), preserving newlines within the content. Then parse that content with its own grammar.

Andi
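
The two-pass idea for 2) could look roughly like the Python sketch below. It makes simplifying assumptions: a doc comment is any line whose first non-blank characters are ///, consecutive such lines belong together, and the comment content is well-formed XML. The wrapping <doc> element and the function names are made up for the example, and a standard XML parser stands in for the "own grammar".

```python
import xml.etree.ElementTree as ET


def collect_doc_blocks(source):
    """Group consecutive '///' lines; return each group's text without the prefix."""
    blocks, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            content = stripped[3:]
            # drop one optional space after the '///' prefix
            current.append(content[1:] if content.startswith(" ") else content)
        elif current:
            # any other line ends the current block
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def parse_doc_block(block):
    """Parse one block as XML by wrapping it in a made-up <doc> root element."""
    return ET.fromstring(f"<doc>{block}</doc>")


SOURCE = '''
/// <summary>
/// Adds two numbers.
/// </summary>
/// <returns>The sum.</returns>
int Add(int a, int b) { return a + b; }

/// <summary>Unrelated.</summary>
int x;
'''

blocks = collect_doc_blocks(SOURCE)
first = parse_doc_block(blocks[0])
summary = first.findtext("summary").strip()
returns = first.findtext("returns")
print(len(blocks), summary, returns)
```

With this split, the C# grammar only needs a single token for a /// line, and newlines inside the comment content are handled in the second pass rather than in the main parser.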