I have a list of keywords (sometimes with non-alphanumeric characters) that I’d like to find in a list of files. I can do that with the code below, but I want to avoid matching keywords if they are found inside another word, e.g.:
Keywords.csv:
Keywords
Lo.rem       <-- Match if not prefixed by nor suffixed with a letter
is           <-- Same
simply)      <-- Match if not prefixed by a letter
printing.    <-- Same
(text        <-- Match if not suffixed with a letter
-and         <-- Same
Files.csv:
Files
C:\AFolder\aFile.txt
C:\AFolder\AnotherFolder\anotherFile.txt
C:\AFolder\anotherFile2.txt
What I have tried:
Here's my code so far if useful:
$keywords = (((Import-Csv "C:\Keywords.csv" | Where Keywords).Keywords) -replace '[[+*?()\\.]','\$&')
$paths = ((Import-Csv "C:\Files.csv" | Where Files).Files)
$count = 0
ForEach ($path in $paths) {
    $file = [System.IO.FileInfo]$path
    Add-Content -Path "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Value $file.FullName
    $hash = @{}
    Get-Content $file |
        Select-String -Pattern $keywords -AllMatches |
        Foreach {$_.Matches.Value} |
        %{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} |
        Out-File -FilePath "C:\Matches\$($count)__$($file.BaseName)_Matches.txt" -Append -Encoding UTF8
    $count = $count + 1
}
I’ve tried playing with regex negative lookahead/lookbehind but did not get anywhere, especially since I’m a beginner in PowerShell, e.g.:
Select-String -Pattern "(?<![A-Za-z])$($keywords)(?![A-Za-z])" -AllMatches
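One likely reason the attempt above goes nowhere: $keywords is an array, and interpolating an array into a double-quoted string joins its elements with spaces, so the lookarounds end up wrapped around a single long string rather than around each keyword. To make the rules from the keyword list concrete, here is a rough sketch of what I think I'm after. It assumes the guard should depend on each keyword's own first and last character (lookbehind only if it starts with a letter, lookahead only if it ends with one). The sample keywords and the $sample line are made up for illustration, and I've swapped in [regex]::Escape for my hand-written escaping:

# Hypothetical sample data standing in for the Keywords column of Keywords.csv
$rawKeywords = 'Lo.rem', 'is', 'simply)', 'printing.', '(text', '-and'

# Build one guarded pattern per keyword
$patterns = foreach ($kw in $rawKeywords) {
    # Escape regex metacharacters so the keyword is matched literally
    $escaped = [regex]::Escape($kw)
    # Only guard the side of the keyword that is itself a letter
    $prefix = if ($kw -match '^[A-Za-z]') { '(?<![A-Za-z])' } else { '' }
    $suffix = if ($kw -match '[A-Za-z]$') { '(?![A-Za-z])' } else { '' }
    "$prefix$escaped$suffix"
}

# Join into a single alternation instead of interpolating the array
$pattern = $patterns -join '|'

# Made-up test line: 'is' inside 'This' and '-and' inside '-andy' should be skipped
$sample = 'This is Lo.rem (text) simply)x -andy printing.'
$found = [regex]::Matches($sample, $pattern) | ForEach-Object { $_.Value }
$found

If that is the right idea, $pattern could presumably be passed to Select-String -Pattern $pattern -AllMatches in the loop above. Note that [regex]::Matches is case-sensitive by default, while Select-String is not unless -CaseSensitive is given.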
Any suggestions? Much appreciated