I am writing a small tool to give me some code stats. Most of what I need is written, but I still need to identify commented-out lines of code. Identifying comments is not hard, but I specifically want to separate genuine comments from old commented-out code.

I'm thinking along the lines of

// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
Regex ccRegex = new Regex(cSharpPattern, RegexOptions.IgnoreCase);
IEnumerable<string> commentedCode = linesInFile.Where(l => ccRegex.IsMatch(l));

linesInFile is a string array of the lines of text read from the source file.

The question is: what should the pattern string cSharpPattern be?

Can this be done with a single Regex?

TIA
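A single pattern cannot be exact, but as a rough per-line heuristic (an assumption, not an established rule) one could flag `//` lines whose text ends the way a C# statement or block usually does, with `;`, `{` or `}`, while skipping `///` XML doc comments. A minimal sketch reusing the names from the snippet above; the pattern and the sample lines are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Heuristic pattern (illustrative, not definitive):
//   ^\s*//     the line starts with a "//" comment (leading whitespace allowed)
//   (?!/)      but is not a "///" XML doc comment
//   .*[;{}]    and its text ends with ';', '{' or '}', like a statement or block
//   \s*$       ignoring trailing whitespace
const string CommentedCodePattern = @"^\s*//(?!/).*[;{}]\s*$";
Regex ccRegex = new Regex(CommentedCodePattern);

// Hypothetical sample input standing in for the lines of a real source file.
string[] linesInFile =
{
    "// Console.WriteLine(\"TestString\");",
    "// Prints a message on the console",
    "/// <summary>Doc comments are skipped.</summary>",
    "int x = 5; // a trailing comment on a live line",
    "//}",
};

IEnumerable<string> commentedCode = linesInFile.Where(l => ccRegex.IsMatch(l));
foreach (string line in commentedCode)
    Console.WriteLine(line);
```

Because it judges each line in isolation, a call split across two comment lines is only half flagged: `//      ("TestString");` matches, while a `// Console.WriteLine` line before it does not.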
Comments
OriginalGriff 18-Aug-10 13:55pm    
"something that was mostly right would suffice."
It's not a case of "mostly right" - it's how much can you afford to be "mostly wrong"?
Neil Haughton 19-Aug-10 3:53am    
In this case, probably quite a lot. I'm identifying all the comment lines, and within those I want to identify, as closely as practical, which are commented-out code and which are genuine comments. This will tell me (broadly) how extensively the code is commented and how much progress is being made cleaning up unwanted (dead) code, two metrics I need to report. It's a big application, so simply reading the code to achieve this is not practical. So if I can apply some process to a commented line to see whether it is very probably old code, that will suffice. It doesn't have to be precisely definitive, and I can see from your explanation why that would be very difficult to achieve, although the C# compiler clearly does exactly what I am suggesting in order to compile the code in the first place. If I can apply the same process to commented lines, I will be home and dry.
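For the two metrics described in this comment (how heavily the code is commented, and how much commented-out code remains), a minimal sketch could count lines per file and sum them. The per-line test `LooksLikeCode` is a hypothetical heuristic (the comment text ends in `;`, `{` or `}`), `/* */` block comments are ignored, and the folder to scan is assumed to come from the first command-line argument:

```csharp
using System;
using System.IO;
using System.Linq;

// A "//" comment line; "///" doc comments count as genuine comments.
static bool IsCommentLine(string line) =>
    line.TrimStart().StartsWith("//", StringComparison.Ordinal);

// Hypothetical heuristic: a non-doc "//" comment whose text ends like a statement or block.
static bool LooksLikeCode(string line)
{
    string trimmed = line.TrimStart();
    if (!trimmed.StartsWith("//", StringComparison.Ordinal) ||
        trimmed.StartsWith("///", StringComparison.Ordinal))
        return false;
    string body = trimmed.Substring(2).Trim();
    return body.Length > 0 && ";{}".IndexOf(body[body.Length - 1]) >= 0;
}

// Per-file counts: total lines, comment lines, and comment lines flagged as probable dead code.
static (int Total, int Comments, int DeadCode) Measure(string[] lines) =>
    (lines.Length, lines.Count(IsCommentLine), lines.Count(LooksLikeCode));

if (args.Length > 0)
{
    var perFile = Directory.EnumerateFiles(args[0], "*.cs", SearchOption.AllDirectories)
        .Select(path => Measure(File.ReadAllLines(path)))
        .ToList();
    int totalLines = perFile.Sum(m => m.Total);
    int commentLines = perFile.Sum(m => m.Comments);
    int deadLines = perFile.Sum(m => m.DeadCode);

    double density = totalLines == 0 ? 0 : 100.0 * commentLines / totalLines;
    Console.WriteLine($"Comment lines: {commentLines} of {totalLines} ({density:F1}%)");
    Console.WriteLine($"Probable commented-out code: {deadLines} of {commentLines} comment lines");
}
```

Run periodically against the same tree, the second figure gives a rough trend for the dead-code cleanup even though individual lines will be misclassified.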

1 solution

"Can this be done with a single Regex?"

No.


To expand a little, in theory it could be done. But in practice the regex would be so complex that it would take more time to develop and debug than it is worth. Think about it:
// Console.WriteLine
Is that a comment? Or commented-out code?
If we look at it together with the following line:
// Console.WriteLine
//      ("TestString");

Then yes, it is commented out code.
If the following line is different:
// Console.WriteLine
//      Prints a message on the console

Then it isn't.
Now write a regex that works for both of those. Then replace "Console" with "MyClass" and write one that handles that as well...
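The two examples show that the verdict for the `// Console.WriteLine` line depends on the line after it, which a pattern applied to one line at a time cannot see. One way around that (a hypothetical sketch, not a proven method) is to group consecutive `//` lines into blocks, strip the markers, join them, and judge each block as a whole. The "ends with `;`, `{` or `}` and has balanced parentheses" test is an assumed heuristic:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed heuristic: the text ends like a statement or block and its parentheses balance.
static bool LooksLikeStatement(string text)
{
    if (text.Length == 0 || ";{}".IndexOf(text[text.Length - 1]) < 0) return false;
    int depth = 0;
    foreach (char ch in text)
    {
        if (ch == '(') depth++;
        else if (ch == ')' && --depth < 0) return false;
    }
    return depth == 0;
}

// Groups runs of consecutive "//" lines and classifies each run as a whole,
// so a line's verdict can depend on its neighbours.
static List<(List<string> Lines, bool LooksLikeCode)> ClassifyCommentBlocks(IEnumerable<string> lines)
{
    var blocks = new List<(List<string> Lines, bool LooksLikeCode)>();
    var current = new List<string>();

    void Flush()
    {
        if (current.Count == 0) return;
        // Drop the leading "//" from each line and join the run into one snippet.
        string joined = string.Join(" ", current.Select(l => l.TrimStart().Substring(2).Trim()));
        blocks.Add((current, LooksLikeStatement(joined)));
        current = new List<string>();
    }

    foreach (string line in lines)
    {
        if (line.TrimStart().StartsWith("//", StringComparison.Ordinal)) current.Add(line);
        else Flush();
    }
    Flush();
    return blocks;
}

string[] sample =
{
    "// Console.WriteLine",
    "//      (\"TestString\");",
    "int i = 0;",
    "// Console.WriteLine",
    "//      Prints a message on the console",
};

foreach (var (blockLines, isCode) in ClassifyCommentBlocks(sample))
    Console.WriteLine($"{(isCode ? "code   " : "comment")}: {string.Join(" | ", blockLines)}");
```

Replacing `Console` with `MyClass` makes no difference to this check, because it looks at shape rather than names; the price is that any prose that happens to end in a semicolon is misread as code.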
 
 
Comments
Sandeep Mewara 18-Aug-10 12:05pm    
Comment from OP:
Okay, it would be a challenge to be precise and complete. I can see that, but something that was mostly right would suffice.

Is there an alternative approach I could try, perhaps using something in the CodeDom namespace?
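On the CodeDom question: as far as I know, the C# CodeDom provider ships no parser (`CodeDomProvider.Parse` throws `NotImplementedException` unless a provider supplies one), so CodeDom is unlikely to help here. The Roslyn compiler platform does expose the compiler's own parser, which is close to what the OP describes earlier: remove the comment markers and ask whether the result is valid C#. A minimal sketch, assuming the `Microsoft.CodeAnalysis.CSharp` NuGet package is referenced (it is not part of the base class library):

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp; // from the Microsoft.CodeAnalysis.CSharp NuGet package (assumed referenced)

// Strips the "//" markers from a run of comment lines and asks the C# parser whether the
// result is one complete statement with no syntax errors. This is a parse check only:
// names are not resolved, so "MyClass.DoIt();" passes even if MyClass does not exist.
static bool ParsesAsStatement(string[] commentLines)
{
    string code = string.Join("\n", commentLines.Select(l => l.TrimStart().Substring(2)));
    var statement = SyntaxFactory.ParseStatement(code);
    return !statement.ContainsDiagnostics;
}

Console.WriteLine(ParsesAsStatement(new[] { "// Console.WriteLine", "//      (\"TestString\");" }));
Console.WriteLine(ParsesAsStatement(new[] { "// Console.WriteLine", "//      Prints a message on the console" }));
```

A parse check answers "is this syntactically a statement?", not "was this ever real code?", so short prose that happens to be valid syntax would still pass. Grouping consecutive comment lines before calling it lets multi-line statements be checked together.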

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900