I'm currently using the FileSystemObject (fso) to search the contents of PDFs for multiple strings. It works fine with fewer than 1,000 files, but with more than 5,000 a single search takes 5-10 minutes. Any ideas or help would be greatly appreciated. This is the code I'm using:

<% Server.ScriptTimeout = 100000000 %>
<%
'Search Text
Dim strtextToSearch
strtextToSearch = Request("TextToSearch")

'Now, we want to search all of the files
Dim fso

'Constant to read
Const ForReading = 1
Set fso = Server.CreateObject("Scripting.FileSystemObject")

'Specify the folder path to search.
Dim FolderToSearch
FolderToSearch = "C:\inetpub\mysite\Files\allpdfs\"

'Proceed if folder exists
if fso.FolderExists(FolderToSearch) then

Dim objFolder
Set objFolder = fso.GetFolder(FolderToSearch)

Dim objFile, objTextStream, strFileContents, bolFileFound
bolFileFound = False

Dim FilesCounter
FilesCounter = 0 'Total files found

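For Each objFile in objFolder.Files
    Set objTextStream = fso.OpenTextFile(objFile.Path, ForReading)
    'Read the whole file; ReadAll raises an error on an empty file, so guard it
    If objTextStream.AtEndOfStream Then
        strFileContents = ""
    Else
        strFileContents = objTextStream.ReadAll
    End If
    'Case-insensitive search (vbTextCompare = 1)
    If InStr(1, strFileContents, strtextToSearch, vbTextCompare) > 0 Then
        'Encode the name for HTML output and end the line with line breaks
        Response.Write Server.HTMLEncode(objFile.Name) & "<br><br>"
        FilesCounter = FilesCounter + 1
    End If
    objTextStream.Close
Next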

if FilesCounter = 0 then
Response.Write "Sorry, No matches found."
else
Response.Write "Total files found : " & FilesCounter
end if

'Destroy the objects
Set objTextStream = Nothing
Set objFolder = Nothing

else
Response.Write "Sorry, invalid folder name"
end if
Set fso = Nothing
%>
Comments
Patrice T 27-Oct-15 13:58pm    
1,000 and 5,000 what?
Member 10807414 27-Oct-15 14:27pm    
5,000 PDFs.
onelopez 27-Oct-15 14:10pm    
If these files aren't updated often, you might want to put their contents in a database and then query against that database. It would be a lot faster to search 5,000 records in a database than to open each file and look at its string content.
Member 10807414 27-Oct-15 14:29pm    
They are archived information and are not updated at all.
Member 10807414 27-Oct-15 17:50pm    
FYI, I tried turning on indexing on the server, and it still didn't speed up the process. I noticed that anything under 1,000 PDFs is searchable in an acceptable time (less than a minute), but anything over that makes for a long wait.

Thank you in advance,
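
The database suggestion in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested solution: it assumes a hypothetical table PdfText(FileName, Content) that a separate one-time job has filled with the text of each archived PDF, and a hypothetical connection string stored in Application("ConnStr"). Neither exists in the original code. The search page then runs one parameterized query instead of opening 5,000 files per request.

<%
'Sketch only: the table PdfText and Application("ConnStr") are hypothetical
'ADO constants, declared here so no include file is needed
Const adVarWChar = 202
Const adParamInput = 1
Const adCmdText = 1

Dim conn, cmd, rs, strPattern, MatchCounter

'Wrap the search text in wildcards for a LIKE comparison
strPattern = "%" & Request("TextToSearch") & "%"

Set conn = Server.CreateObject("ADODB.Connection")
conn.Open Application("ConnStr")

'A parameterized command keeps the user input out of the SQL text
Set cmd = Server.CreateObject("ADODB.Command")
Set cmd.ActiveConnection = conn
cmd.CommandType = adCmdText
cmd.CommandText = "SELECT FileName FROM PdfText WHERE Content LIKE ?"
cmd.Parameters.Append cmd.CreateParameter("term", adVarWChar, adParamInput, Len(strPattern), strPattern)

Set rs = cmd.Execute
MatchCounter = 0
Do Until rs.EOF
    Response.Write Server.HTMLEncode(rs.Fields("FileName").Value) & "<br>"
    MatchCounter = MatchCounter + 1
    rs.MoveNext
Loop

If MatchCounter = 0 Then
    Response.Write "Sorry, No matches found."
Else
    Response.Write "Total files found : " & MatchCounter
End If

'Release the database objects
rs.Close
conn.Close
Set rs = Nothing
Set cmd = Nothing
Set conn = Nothing
%>

A LIKE pattern with a leading wildcard still scans every row, so this mainly saves the cost of opening and reading each file on every request. If the database in use offers a full-text index, that would likely be the faster option for 5,000 or more documents.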

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


