Hi,
First: this is not a syntax error! My grammar file does not contain any errors.
I want to write a C# grammar for my program. I downloaded this grammar from CodePlex: http://antlrcsharp.codeplex.com/.
I have not found a better C# grammar for ANTLR 4. This grammar does not support documentation comments, which are very important for my program. It skips all comments, so I removed the skipping of documentation comments and wrote my own rules for them. However, I do not know how to express that "\n" must be followed by "///" inside a documentation comment. I am afraid that the lexer, when it recognizes "\n", skips that character automatically, so this part of my parser rule never matches: "('\n' '///')?". Does anyone know how to solve this problem? Or could anyone confirm that my code is correct, if it is?

Here are my parser rules for documentation comments:
C#
//documentation comments
doc_comment :
'///' (  summary remarks?
        |remarks
        );
summary :
    '<summary' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* tag_body+ (('\n' '///')* comment_text ('\n' '///')*)* '</summary>';
remarks :
    '<remarks' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</remarks>';
tag_body :   '<c' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</c>'
            |'<code' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</code>'
            |'<example' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</example>'
            |'<exception' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</exception>'
            |'<include' 'file' '=' '\'' comment_text '\'' 'path' '=' '\'' comment_text ('[' '@name' '=' '"'identifier '"' ']')? '\'' '/' '>'
            |'<list' 'type' '=' ('"bullet"' | '"number"' | '"table"') '>' ('\n' '///')* listheader? listitem? '</list>' 
            |'<para' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</para>' 
            |'<param' 'name' '=' '"' identifier '"' '>' (('\n' '///')* comment_text ('\n' '///')*)* '</param>'
            |'<paramref' 'name' '=' '"' comment_text '"' '/' '>'
            |'<permission' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</permission>'
            |'<returns' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</returns>'
            |'<see' cref '/' '>'
            |'<seealso' cref '/' '>'
            |'<typeparam' 'name' '=' '"' comment_text '"' '>' (('\n' '///')* comment_text ('\n' '///')*)* '</typeparam>'
            |'<typeparamref' 'name' '=' '"' comment_text '"' '/' '>'
            |'<value' cref? '>' (('\n' '///')* comment_text ('\n' '///')*)* '</value>';
cref : 'cref' '=' '"' comment_text '"' ;
listheader : '<listheader>' ('<term>' (('\n' '///')* comment_text ('\n' '///')*)* '</term>')? ('<description>' (('\n' '///')* comment_text ('\n' '///')*)* '</description>')? '</listheader>';
listitem : '<listitem>' ('<term>' (('\n' '///')* comment_text ('\n' '///')*)* '</term>')? ('<description>' (('\n' '///')* comment_text ('\n' '///')*)* '</description>')? '</listitem>';


And here are the lexer rule for whitespace (WS), the parser rule for comment_text, and the lexer rule for comment characters (ANY_CHARS):
C#
WS:
    (' '  |  '\r'  |  '\t'  |  '\n'  ) -> skip;
comment_text : ANY_CHARS;
fragment ANY_CHARS: (.)*;


Thanks for replies!
Pepin z Hane
Updated 23-Feb-14 10:45am
Comments
Sergey Alexandrovich Kryukov 23-Feb-14 17:03pm    
It does not look like a question.
—SA
Pepin z Hane 23-Feb-14 18:36pm    
I want to know how it works: will this part of the rule, ...('\n' '///')..., be ignored or not, given that the lexer rule WS says to skip '\n'? So the question is: is this part of the rule ignored or not? Got it?
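The question above comes down to which lexer rule claims the newline character. As a rough illustration, here is a toy lexer in Python. It is not ANTLR and makes no claim about the generated lexer's internals; it only assumes the commonly documented policy "longest match wins, ties go to the rule listed first". The rule names NEWLINE, DOC and OTHER are made up for this sketch. It shows that once a skipping rule wins the newline, no later stage can ever see that character.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins, ties go to the rule listed first.
    Rules whose action is 'skip' consume input without emitting a token."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (match length, rule name, action)
        for name, pattern, action in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer matches replace the current best, so on a tie
            # the rule that was listed first is kept.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        length, name, action = best
        if action != "skip":
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens

# Hypothetical rules for the sketch (names are not from the original grammar).
WS = ("WS", r"[ \r\t\n]", "skip")
NEWLINE = ("NEWLINE", r"\n", "emit")
DOC = ("DOC", r"///", "emit")
OTHER = ("OTHER", r"[^\s/]+", "emit")

SOURCE = "/// <summary>\n/// text\nint x;"

# WS listed first: it wins the tie on '\n', so the newline is skipped
# and a parser fed with these tokens can never match it.
ws_first = [WS, NEWLINE, DOC, OTHER]

# NEWLINE listed first: every newline becomes a token, everywhere in the file.
newline_first = [NEWLINE, WS, DOC, OTHER]

for label, rules in (("WS first", ws_first), ("NEWLINE first", newline_first)):
    print(label, [name for name, _ in tokenize(SOURCE, rules)])
```

Under these assumptions neither ordering gives the desired result by itself: either the newline never reaches the parser, or it reaches the parser in every place where the rest of the grammar does not expect it.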
Andreas Gieriet 23-Feb-14 18:35pm    
Do you need only the comments together with their associated declarations, or do you need to parse everything? If you need a parser that collects all structured comments of a declaration, you do not need to parse everything in detail; you can employ a kind of lazy parser. It's easy to collect only the declarations and skip the bodies (e.g. of a function or property) and the initializers (e.g. of a member variable). The tricky parts are a) handling attributes properly and b) deducing which comments belong together.
I once wrote such a parser; it collected extended structured comments that were associated with sign-off suppression attributes in the code. I could outline such a parser if your problem is similar.
Cheers
Andi
Pepin z Hane 23-Feb-14 18:44pm    
I have to parse nearly everything; the only thing I don't have to parse is the bodies of methods. I am creating a program for documentation generation, so I have to include namespaces, classes and class members. A "\n" can appear anywhere. If I parsed only comments, it would be no problem to say where "\n" may appear, and I would not use "\n" -> skip. But in my case I have to. And I want to parse everything in one pass, if possible.
Andreas Gieriet 23-Feb-14 18:58pm    
This is already invented. Visual Studio generates an XML file for each compiled assembly (if enabled in the project properties).
That file contains all the structured comments with a reference to the respective member, class, namespace. See How to: Use the XML Documentation Features (C# Programming Guide).
Cheers
Andi
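To make the suggestion above concrete, here is a minimal Python sketch of reading such a compiler-generated XML documentation file. The sample content, the `Demo.Greeter` names and the `load_members` helper are invented for illustration. The `doc`/`members`/`member` layout and the `T:` and `M:` ID prefixes follow the documented format of these files. A real file would be loaded from disk instead of from an inline string.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a C# XML documentation file.
SAMPLE = """<doc>
  <assembly><name>Demo</name></assembly>
  <members>
    <member name="T:Demo.Greeter">
      <summary>Builds greetings.</summary>
    </member>
    <member name="M:Demo.Greeter.Greet(System.String)">
      <summary>
        Greets one person.
      </summary>
      <param name="who">The person to greet.</param>
      <returns>The greeting text.</returns>
    </member>
  </members>
</doc>"""

def load_members(xml_text):
    """Map each member ID (e.g. 'T:Ns.Type') to its whitespace-normalized summary."""
    root = ET.fromstring(xml_text)
    result = {}
    for member in root.iter("member"):
        summary = member.findtext("summary", default="")
        result[member.get("name")] = " ".join(summary.split())
    return result

for member_id, summary in load_members(SAMPLE).items():
    print(member_id, "->", summary)
```

The member ID already encodes whether the entry is a type, method, property and so on, so a documentation generator built this way does not need its own C# grammar for the comment text.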

1 solution

I don't know ANTLR well enough, but my suspicion is that ANTLR has lookahead predicates that let you tell the lexer to skip only those newlines that are *not* followed by optional whitespace and then ///.
E.g. something like
C#
WS1: ( ' '  |  '\r'  |  '\t' ) -> skip;
WS2: ( '\n' { _input.LT(1).getType() != COMMENT}? ) -> skip;

You of course also need a COMMENT production.
I have not tried this out; it is only a hint at a possible direction for solving the problem.
Cheers
Andi
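The idea of skipping only those newlines that are not followed by optional whitespace and /// can be tried out independently of ANTLR. The following Python sketch uses a regular-expression lookahead in place of a lexer predicate. The token names (DOC_NL, DOC, TEXT) and the function name are assumptions made for this example, and Python's regex alternation picks the first alternative that matches rather than the longest one, so it only approximates what a generated lexer would do.

```python
import re

# Order matters: DOC_NL must be tried before the general WS rule.
RULES = [
    ("DOC_NL", r"\n(?=[ \t]*///)"),  # newline followed by blanks and '///': keep
    ("WS", r"[ \t\r\n]"),            # any other whitespace character: skip
    ("DOC", r"///"),                 # start of a documentation comment line
    ("TEXT", r"[^\s/]+"),            # everything else (simplified for the sketch)
]
TOKEN = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))

def tokenize_keeping_doc_newlines(text):
    """Return (name, lexeme) pairs, dropping whitespace except doc-comment newlines."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

SOURCE = "/// a\n  /// b\nint x;"
print([name for name, _ in tokenize_keeping_doc_newlines(SOURCE)])
```

Only the newline between the two /// lines survives as a token; the newline before `int` is dropped like any other whitespace. Note that this also keeps the newline in front of the first /// line of every comment, which a real grammar would have to allow for.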
 
Comments
Pepin z Hane 24-Feb-14 21:39pm    
... does not contain a definition for LT... and La returns int, which is not the same... Probably I have to say somehow that the newline must not be skipped when it is matched from this rule. Good idea, but how do I write it correctly? :-D
Andreas Gieriet 25-Feb-14 2:59am    
I think your approach is flawed in at least two aspects:
1) This tool is already invented: use the produced XML files (they also contain all the needed type information) or, if that is not sufficient for your needs, grab the missing info via reflection.
2) If you really want to re-invent it (it must be a hobby, otherwise it costs too much): you seem to do everything in one go, having comments as tokens and at the same time already parsing their contents, which is a separate grammar *inside* the C# grammar.

So, for 2): treat comments as tokens, skip block comments, and collect all line comments. In a second pass, determine which structured comments belong together and concatenate their content (leaving out the leading ///), preserving newlines within the content. Then parse that content with its own grammar.

Andi
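The two-pass approach described for 2) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a replacement for a real C# lexer. It scans line by line, so it would be fooled by /// inside string literals or block comments, and it uses an XML parser as the "own grammar" for the comment content. The helper names and the sample source are invented for the example.

```python
import xml.etree.ElementTree as ET

def collect_doc_blocks(source):
    """Pass 1: group consecutive '///' lines and strip the leading '///'.
    Newlines inside each block are preserved."""
    blocks, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        # Lines with four or more slashes are treated as ordinary comments here.
        if stripped.startswith("///") and not stripped.startswith("////"):
            current.append(stripped[3:].strip())
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks

def parse_doc_block(block):
    """Pass 2: parse one block as XML. A synthetic root element is added
    because a doc comment may contain several top-level tags."""
    return ET.fromstring(f"<doc>{block}</doc>")

SOURCE = '''namespace Demo {
    /// <summary>
    /// Greets one person.
    /// </summary>
    /// <returns>The greeting.</returns>
    string Greet() { return "hi"; }
}'''

for block in collect_doc_blocks(SOURCE):
    root = parse_doc_block(block)
    print(root.findtext("summary").strip(), "|", root.findtext("returns"))
```

Because the /// prefixes are removed before the second pass, the comment grammar never has to mention '\n' '///' at all, which is the part that caused trouble in the single-pass grammar.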

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
