Hi,

I have a number of tasks that run and process files, and it is possible for multiple tasks to process the same file (parsing, etc.).

I want to handle the files in such a way that we know a particular file has been processed by a particular task, so that the same file is not processed again next time.

How could I achieve that? Is there any way to distinguish two files that have the same name?

What I have tried:

Should we distinguish them by file size or by MD5 hash? I am not sure. Please let me know.
Updated 3-May-16 0:52am
Comments
Mohibur Rashid 3-May-16 5:54am    
Create another temporary file upon finishing the task

1 solution

In the past I've approached this in three slightly different ways

1. Each process had its own folder for handling the files "owned" by that process. Files were moved from the central drop area into the processing folder, then once that process was finished the file was physically moved into a "done" folder.

2. A file watcher looked for new files entering the drop zone and added their names to a database. In this instance we did use MD5 hashing to check for duplicate files being sent. If the files "have the same name" then they obviously wouldn't be in the same folder, so that is not an issue. Filenames were generated by the FTP process in our case, so we had to do something to check that the content had not already been processed.

3. For each file being claimed, a "lock" file was generated containing the name (PID) of the process that was claiming it.
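
As a rough illustration of method 3, here is a minimal Python sketch of one way a lock file could be created so that only one process wins the claim. The `.lock` suffix and the function names `try_claim` and `release` are assumptions made for this example, not part of the original setup.

```python
import os


def try_claim(path: str) -> bool:
    """Try to claim `path` by creating `path + '.lock'`.

    Returns True if this process created the lock file, False if it already existed.
    The '.lock' suffix is a hypothetical naming convention.
    """
    lock_path = path + ".lock"
    try:
        # O_CREAT | O_EXCL makes the call fail if the lock file already exists,
        # so at most one process can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock:
        lock.write(str(os.getpid()))  # record which process holds the claim
    return True


def release(path: str) -> None:
    """Remove the lock file once processing of `path` has finished."""
    os.remove(path + ".lock")
```

If a process dies while holding a lock, the lock file stays behind. The PID stored in it at least shows which process left it there.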

The most successful method for us was number 1 - it meant that it was easy to restart the process in the event of failure. Method number 2 makes for easier reporting (if required). Method 3 could be fiddly to restart in the event of failures.
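
To give an idea of how method 1 might look, here is a small Python sketch under some assumptions: the drop, per-process and "done" folders already exist and sit on the same filesystem (so the rename is a single atomic step), and the names `claim_by_move` and `mark_done` are made up for the example.

```python
import os
from typing import Optional


def claim_by_move(name: str, drop_dir: str, work_dir: str) -> Optional[str]:
    """Move `name` from the drop area into this process's own folder.

    Returns the new path, or None if the file was no longer in the drop area
    (for example because another process moved it first).
    """
    src = os.path.join(drop_dir, name)
    dst = os.path.join(work_dir, name)
    try:
        # Assumes both folders are on the same filesystem, where a rename
        # does not copy the file contents.
        os.rename(src, dst)
    except FileNotFoundError:
        return None
    return dst


def mark_done(claimed_path: str, done_dir: str) -> str:
    """Move a processed file from the process folder into the 'done' folder."""
    dst = os.path.join(done_dir, os.path.basename(claimed_path))
    os.rename(claimed_path, dst)
    return dst
```

On restart, anything still sitting in a process's own folder is work that was claimed but never finished, which is what makes recovery straightforward with this layout.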

If I were doing it again I'd probably use a combination of all three of these methods for robustness, ease of restart and ease of reporting/auditing.
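
On the size-versus-hash question: size alone cannot tell apart two different files of the same length, whereas a content hash can be used regardless of the file name. Below is a hedged Python sketch of the duplicate check from method 2. The in-memory dictionary is only a stand-in for the database, and `file_digest` and `ProcessedRegistry` are hypothetical names. MD5 is used because that is what we used; another `hashlib` algorithm such as SHA-256 could be substituted.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ProcessedRegistry:
    """In-memory stand-in for the database of files already seen."""

    def __init__(self) -> None:
        self._seen = {}  # maps content digest -> first path seen with it

    def register(self, path: str) -> bool:
        """Record `path`; return False if identical content was seen before."""
        digest = file_digest(path)
        if digest in self._seen:
            return False
        self._seen[digest] = path
        return True
```

Two files with different names but identical content produce the same digest, so the second one is reported as a duplicate.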
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


