So it looks like you want to find which files have errors according to the patterns. What you have will give you up to 4 lines in the log, indicating which types of errors exist somewhere in the file hierarchy. This seems very uninformative!
Wouldn't it be better to know (at least) which files have errors? Wouldn't it be better to know which errors are in each of those files?
Here's what I suggest.
(This may be totally off from your requirements, but what you have just feels like a lot of work for nearly zero information content.)
The way you have this structured, the parallelization is very inefficient: it reads each file once per pattern, so each file is potentially scanned five times! Parallelize across the filenames instead. This means that the Regex instances need to be created and compiled ahead of the execution loops.
You also read the whole file each time. The patterns that you've shown do not span lines, so match against each line one at a time instead; reading can then stop as soon as every pattern has been found, which avoids potentially a lot of IO.
I've extracted the per-file checking to a function so it can eliminate redundant IO and Regex matching. I also simplified the Regex patterns.
Something like (my VB is rusty):
' Needs System.IO, System.Linq and System.Text.RegularExpressions imported.
' Each entry is {log message, regex pattern}.
Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(?:figures?|figs?\.) \d+"}),
    ({"Check table link", "(?<!>)(?:tables?|tabs?\.) \d+"}),
    ({"Check section link", "(?<!>)(?:sections?|sect?\.) \d+"}),
    ({"Check space", "</inline>\w+"})}

' Compile every Regex once, up front, and map it to its message.
Dim compiledPatterns = New Dictionary(Of Regex, String)
For Each pat As String() In patterns
    compiledPatterns.Add(New Regex(pat(1), RegexOptions.Compiled), pat(0))
Next

' One (initially empty) list of file paths per message.
Dim pathsByMessage = New Dictionary(Of String, List(Of String))
For Each pat As String() In patterns
    pathsByMessage.Add(pat(0), New List(Of String))
Next

Dim filteredFilenames = From tFile In Directory.EnumerateFiles(TextBox1.Text, "*.txt", SearchOption.AllDirectories)
                        Where tFile Like "*\#*\#*\#*\txt\#*.txt"

' Parallelize across files; keep only files with at least one match.
Dim output = From tFile In filteredFilenames.AsParallel
             Let checks = CheckFile(tFile, compiledPatterns)
             Where checks.Any
             Select Path = tFile, Messages = checks

' Collect the results sequentially, so the lists need no locking.
For Each pm In output
    For Each msg In pm.Messages
        pathsByMessage(msg).Add(pm.Path)
    Next
Next

' Write each message that occurred, followed by the files it occurred in.
File.WriteAllLines(TextBox1.Text.TrimEnd("\"c) & "\Checklist.log",
                   From pbm In pathsByMessage
                   Where pbm.Value.Any
                   Select String.Format("""{0}""=={1}", pbm.Key, vbNewLine & String.Join(vbNewLine, pbm.Value)))

MsgBox("Process Complete")
And
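' Returns the messages of all patterns that match somewhere in the file.
Function CheckFile(tFile As String, compiledPatterns As Dictionary(Of Regex, String)) As List(Of String)
    Dim messages As New List(Of String)
    ' Patterns not yet found in this file.
    Dim checks = New HashSet(Of Regex)(compiledPatterns.Keys)

    ' File.ReadLines streams the file, so exiting early skips the remaining IO.
    For Each line In File.ReadLines(tFile)
        If Not checks.Any Then
            Exit For
        End If

        ' Iterate over a copy so matched patterns can be removed from checks.
        For Each re In checks.ToList
            If re.IsMatch(line) Then
                messages.Add(compiledPatterns(re))
                checks.Remove(re)
            End If
        Next
    Next

    Return messages
End Function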
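If you want to check what the simplified patterns accept and reject before running them over the whole hierarchy, a minimal sketch along these lines may help. It uses the figure pattern from above unchanged; the module name, the MatchesFigureLink helper and the sample lines are made up for illustration and are not from your code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Same figure pattern as above: "figure(s)" or "fig(s)." followed by a space
    ' and digits, but only when not immediately preceded by ">".
    Private ReadOnly FigurePattern As New Regex("(?<!>)(?:figures?|figs?\.) \d+", RegexOptions.Compiled)

    ' Hypothetical helper: True if the line contains a figure reference
    ' that the pattern would flag.
    Function MatchesFigureLink(line As String) As Boolean
        Return FigurePattern.IsMatch(line)
    End Function

    Sub Main()
        ' Hypothetical sample lines, invented for this demo.
        Dim samples = {
            "see figure 3 for details",
            "<link>figure 3</link>",
            "compare figs. 12 and 13"
        }

        For Each sample In samples
            Console.WriteLine("{0} -> {1}", sample, MatchesFigureLink(sample))
        Next
    End Sub
End Module
```

The first and third samples match; the second does not, because the reference is directly preceded by ">", which is what the negative lookbehind (?<!>) excludes. Note that the patterns as written are case-sensitive, so "Figure 3" would not be flagged unless you add RegexOptions.IgnoreCase.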