I am working with Antlr4 and want to parse C++ source code and detect every loop in it, so that I can run a dependency analysis on the loops. So far I have not been able to detect the loops correctly.

What I have tried:

This is the rule that I used for "for loops" in Antlr:
forBlock: 'for' '(' (classicFor | forEach) ')' controlStructureBody ;
forExpression: primaryExpression (',' primaryExpression)* ;

I print out the tokens in this code:
C++
#include <iostream>
using namespace std;
int main()
{
    for (int i = 0; i <= 5; i++) {
        for (int j = 0; j <= 5; j++) {
            cout << i << j << " \t";
        }
        cout << "\n";
    }
    return 0;
}

using this code:
Java
public void printToken(String inputFile) throws FileNotFoundException, IOException {
        System.out.println("The tokens of the source code is: \n");
        CharStream inputStream = CharStreams.fromFileName(inputFile);
        TokensLexer tokensLexer = new TokensLexer(inputStream);
        CommonTokenStream tokenStream = new CommonTokenStream(tokensLexer);
        tokenStream.fill();
        for (Token token : tokenStream.getTokens()) {
            System.out.println("<" + token.getText() + "> " + "<" + token.getType() + ">");
        }
    }
and it gave me the token type of each "for" keyword as
<for> <45>
<for> <45>

I tried this code:
Java
        CharStream inputStream = CharStreams.fromFileName(inputFile);
        // lexing the code
        TokensLexer tokensLexer = new TokensLexer(inputStream);
        CommonTokenStream tokenStream = new CommonTokenStream(tokensLexer);
        // parsing the code
        TokensParser tokensParser = new TokensParser(tokenStream);
        tokenStream.fill();
        for (Token token : tokenStream.getTokens()) {
            if (token.getType() == 45)
                System.out.println("loop is found");
        }
}

When I compare against 45, it prints "loop is found" twice, and when I change the number to 39, it prints "loop is found" only once.

I tried 39 because that value appears in the files generated from my Antlr grammar:
Tokens.tokens -> for = 45.
Tokens.lexer.tokens -> for = 45.
TokensParser.java -> Rule_forBlock = 39; Rule_forExpression = 40;

and when I try to add more loops:
C++
#include <iostream>
using namespace std;
int main()
{
    for (int i = 0; i <= 5; i++) 
    {
        for (int j = 0; j <= 5; j++) {
            cout << i << j << " \t";
        }
        cout << "\n";
    }
    for (int  i = 0; i < 100; i++)
    {
       cout<<"Test"<<endl;
    }
    
    return 0;
}
and keep using the number 39, it still detects only one loop.

Is there a way to detect the loops in the source code using Antlr and differentiate between the outer and inner loop?
Updated 25-Sep-20 12:08pm
Comments
Stefan_Lang 25-Aug-20 11:57am    
I have no idea about Antlr or how it works. But the grammar you defined doesn't distinguish between a for loop body that is a single statement ending in ';' and one that is a code block enclosed in '{' and '}', which does not end in ';'. Specifically, your last code example has multiple for loops that do not end in ';'.

1 solution

Why don't you post the whole grammar? This is just a fraction of it. In your examples, the parser should enter "classicFor". How is classicFor defined?

The output indicates that your code only recognized the "for" token.
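
For what it's worth, the two numbers in the question come from separate ranges: 45 is the token type of the "for" keyword, while 39 is a parser rule index, so comparing token.getType() with 39 only matches whichever other token happens to have type 39. Loops are easier to find on the parse tree than on the token stream. With Antlr that would normally be a generated listener or visitor for the forBlock rule that keeps a depth counter. The sketch below shows only the depth-counting idea on a hypothetical stand-in Node class (not an Antlr type), so it runs without the generated parser. The rule name "forBlock" is taken from the question; everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopDepthSketch {
    /** Hypothetical stand-in for a parse-tree node; not an Antlr class. */
    static final class Node {
        final String rule;          // rule name, e.g. "forBlock" or "statement"
        final List<Node> children;

        Node(String rule, Node... children) {
            this.rule = rule;
            this.children = Arrays.asList(children);
        }
    }

    /** Nesting depth of every loop node in document order (1 = outermost). */
    static List<Integer> loopDepths(Node root) {
        List<Integer> out = new ArrayList<>();
        walk(root, 0, out);
        return out;
    }

    private static void walk(Node node, int depth, List<Integer> out) {
        boolean isLoop = node.rule.equals("forBlock");
        int current = isLoop ? depth + 1 : depth;   // entering a loop goes one level deeper
        if (isLoop) {
            out.add(current);
        }
        for (Node child : node.children) {
            walk(child, current, out);
        }
    }

    public static void main(String[] args) {
        // Shape of the second C++ sample: a nested pair followed by a single loop.
        Node program = new Node("block",
            new Node("forBlock",
                new Node("forBlock", new Node("statement")),
                new Node("statement")),
            new Node("forBlock", new Node("statement")),
            new Node("statement"));
        System.out.println(loopDepths(program)); // prints [1, 2, 1]
    }
}
```

In a real Antlr listener the same counter would presumably be incremented in enterForBlock and decremented in exitForBlock; a depth of 1 then marks an outer loop and anything larger an inner one.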
 
Comments
HishamMohammedA 6-Oct-20 14:20pm    
Okay, I was able to locate the for loop, but now I have a problem recognizing nested loops when I use a visitor.

I am using this piece of grammar,
iterationstatement
: While '(' condition ')' statement #WhileStatement
| Do statement While '(' expression ')' ';' #DoWhileStatemen
| For '(' forinitstatement condition? ';' expression? ')' statement #ClassicForStatement
;
to visit each for loop and write above it, after doing some analysis on the string forStr.

public T visitClassicForStatement(GrammarParser.ClassicForStatementContext ctx) {
String forStr =ctx.statement().compoundstatement().statementseq().statement().getText();

This string is supposed to hold the statements inside the for loop, but when the loop contains a nested loop, the text I receive includes the inner for loop as well.
The problem is this. When I receive a statement like
for(){statements}
my analysis works on the statements just fine, but when I receive
for(){for(){statements}}
the inner loop comes along with the statements, and that breaks my analysis.

How can I ignore the inner for loop and take only the statements, the way I do with a single loop, and write only above the outer loop?


This is the grammar that I use:
https://gofile.io/d/CLRikA
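
One possible direction, sketched under assumptions: getText() on a context returns the text of the whole subtree, so the inner loop will always be part of it. Instead of taking the text of the body in one go, the body's children can be visited one by one, skipping any child that is itself a loop, and a parent check can tell whether the current loop is the outermost one. The code below uses a hypothetical stand-in Node class (not an Antlr type, and not the linked grammar) so that it runs on its own; the names ownStatements and isOutermost are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopBodySketch {
    /** Hypothetical stand-in for a parse-tree node; not an Antlr class. */
    static final class Node {
        final boolean isLoop;
        final String text;            // only meaningful for plain statements
        final List<Node> children;
        Node parent;                  // set when the node is attached to a loop

        private Node(boolean isLoop, String text, Node... children) {
            this.isLoop = isLoop;
            this.text = text;
            this.children = Arrays.asList(children);
            for (Node child : children) {
                child.parent = this;
            }
        }

        static Node statement(String text) { return new Node(false, text); }
        static Node loop(Node... body)     { return new Node(true, "", body); }
    }

    /** Texts of the statements that belong to this loop itself, skipping nested loops. */
    static List<String> ownStatements(Node loop) {
        List<String> out = new ArrayList<>();
        for (Node child : loop.children) {
            if (!child.isLoop) {
                out.add(child.text);
            }
        }
        return out;
    }

    /** True when no ancestor is a loop, i.e. this is an outermost loop. */
    static boolean isOutermost(Node loop) {
        for (Node p = loop.parent; p != null; p = p.parent) {
            if (p.isLoop) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node inner = Node.loop(Node.statement("b = 2;"));
        Node outer = Node.loop(Node.statement("a = 1;"), inner, Node.statement("c = 3;"));
        System.out.println(ownStatements(outer)); // prints [a = 1;, c = 3;]
        System.out.println(isOutermost(outer));   // prints true
        System.out.println(isOutermost(inner));   // prints false
    }
}
```

In the visitor, the equivalent would probably be to loop over the statement contexts of the body, skip the ones that are for statements, and use getParent() to walk upwards when deciding whether to write above the current loop.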

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


