The whole thing reminds me of a typical pattern I use regularly. First, your program should build the list of files by popping up a file open dialog. You might let the user do this multiple times to add files from different locations. When you are finished and have your 100 or whatever number of files, you can do this:
Create a list of the files (array, vector, whatever). This list is read-only. Initialize an integer index to -1. Start, let's say, 10 threads running the same code, which does the following: it calls the InterlockedIncrement() WinAPI function or the __sync_add_and_fetch() GCC builtin on the index. This atomically increments the index and returns the incremented value. The thread uses the returned index to pick an item from the read-only array for processing, and when the processing is done it repeats this step. Once the returned index is greater than or equal to the size of the array, the thread terminates. The main thread does nothing but wait for the worker threads to exit.
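The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it swaps in the portable std::atomic and std::thread from standard C++ in place of InterlockedIncrement() / __sync_add_and_fetch(), and the names process_item and process_all, as well as the "work" done per file, are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-file work: it just records the length of the file name.
// Every index is claimed by exactly one thread, so results[i] needs no lock.
void process_item(const std::vector<std::string>& files, std::size_t i,
                  std::vector<std::size_t>& results) {
    results[i] = files[i].size();
}

// Processes every entry of the read-only list with thread_count workers.
std::vector<std::size_t> process_all(const std::vector<std::string>& files,
                                     unsigned thread_count) {
    assert(thread_count > 0);
    std::vector<std::size_t> results(files.size(), 0);
    std::atomic<long> index(-1);  // shared counter, initialized to -1

    auto worker = [&]() {
        for (;;) {
            // fetch_add returns the old value, so adding 1 gives the
            // incremented value, like InterlockedIncrement() would return.
            long i = index.fetch_add(1) + 1;
            if (i >= static_cast<long>(files.size())) return;  // list exhausted
            process_item(files, static_cast<std::size_t>(i), results);
        }
    };

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();  // the main thread only waits
    return results;
}
```

Calling process_all(files, 10) would correspond to the 10 threads mentioned above; on GCC you may need to build with -pthread.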
I think this kind of threading is much easier to implement without errors, especially for a threading newbie, not to mention that it's usually faster than locking/unlocking a mutex.
Further room for optimization: let's say you have 1 million items and only 10 threads. Then you can speed up this kind of multithreading by putting more than one item (let's say 100) into each slot of your array, thus increasing the size of a single job. This speeds things up because you need far fewer thread synchronization calls, such as InterlockedIncrement() or locks, which are usually considered very slow operations.
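One possible way to sketch the bigger-job idea: instead of physically packing 100 items into each array slot, the variant below treats every counter value as a batch number, so one atomic increment claims batch_size consecutive items. That is an assumption about how you might implement it, again using std::atomic as a portable stand-in; square_all_batched and the squaring "work" are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Squares every element of the read-only input using thread_count workers.
// Each atomic increment claims a whole batch of batch_size consecutive items.
std::vector<long> square_all_batched(const std::vector<int>& items,
                                     unsigned thread_count,
                                     std::size_t batch_size) {
    assert(thread_count > 0 && batch_size > 0);
    const std::size_t n = items.size();
    // Number of batches, rounded up so a partial last batch is included.
    const long batch_count = static_cast<long>((n + batch_size - 1) / batch_size);
    std::vector<long> results(n, 0);
    std::atomic<long> batch_index(-1);

    auto worker = [&]() {
        for (;;) {
            long b = batch_index.fetch_add(1) + 1;  // one sync call per batch
            if (b >= batch_count) return;
            std::size_t begin = static_cast<std::size_t>(b) * batch_size;
            std::size_t end = std::min(begin + batch_size, n);
            for (std::size_t i = begin; i < end; ++i) {
                // Hypothetical per-item work.
                results[i] = static_cast<long>(items[i]) * items[i];
            }
        }
    };

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();
    return results;
}
```

With 1 million items and a batch size of 100, the workers perform roughly 10,000 successful increments instead of 1 million. A bigger batch means less synchronization but a coarser distribution of work near the end of the list, so the right size depends on how long one item takes.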
For multithreaded bulk processing on a fixed, prepared list, consider using this technique.
Note: InterlockedIncrement() on Windows expects a pointer to a LONG, so you might need a cast if you use another 32-bit integer type.
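To illustrate the cast, here is a rough sketch of a small wrapper that picks one of the two calls at compile time. The name next_index is made up for this example, and the pointer cast assumes that LONG and std::int32_t have the same size and representation, which holds on Windows; declaring the counter as LONG in the first place would avoid the cast entirely.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Atomically increments *counter and returns the incremented value.
inline std::int32_t next_index(volatile std::int32_t* counter) {
#ifdef _WIN32
    // LONG is a 32-bit signed integer on Windows, so the cast only changes
    // the nominal type of the pointer, not the size of the object behind it.
    static_assert(sizeof(LONG) == sizeof(std::int32_t), "LONG must be 32 bits");
    return static_cast<std::int32_t>(
        InterlockedIncrement(reinterpret_cast<volatile LONG*>(counter)));
#else
    // GCC/Clang builtin: returns the value after the addition.
    return __sync_add_and_fetch(counter, 1);
#endif
}
```

A worker thread would then call next_index(&index) in its loop and stop once the result reaches the size of the array.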