That could be a bit of a big/broad question for the 'Quick Answers' section. I suspect there are many tools that would do the job for you, but it's what you haven't mentioned that would determine the overall approach, and possibly the 'tools', language etc
Some questions that might be asked (ie, forming 'requirements') :-
- "large amount of documents" (how many ?)
- how much text is in each document ?
- what is the source of the documents - file system/web server/email/database (etc) ?
- why Excel or XML for output - what do you need to do with the extracted text, eg, search it, reformat it ?
- are you envisaging a batch process or a real-time/on-demand process ?
- do you have a budget ? ie, can you pay for tools ?
- what are the time factors for delivering your 'project' (as I'll call it) ?
- how are you going to track/trace which documents have been extracted, etc ?
(and probably lots more)
You say (of ByteScout & Aspose.PDF) "but I don't fully understand them" - we don't know your background or how much experience you have. If you're going to have to write and support something, you may be better off with a $$ product, so you can use the product supplier for help & support - any decent SDK should also come with a number of examples/samples. This is a 'buy vs build' question
Your answers to some of those questions might also suggest (for example) VB.Net over VBScript - ie, depending on the robustness, level of automation, ... that you need
So, I'm sorry, there's no 'best way' based on the information you have shown - there could be lots of good ways and more bad ways - the extract 'tool' is only a small part of the solution
[edit : Added]
You could also 'outsource' the extraction to a bureau/service of course - you send them the PDFs and they send you back the data in the format you require - no coding required on your part !
[/edit]
[edit 2]
ok, I would 'start' with a solution along the following lines, recognising that you may evolve some parts later on. Basically, it plays to your strengths in (for example) VB.Net and VBScript, and to what I believe are the strengths of each language, by developing a set of 'modules' - each 'module' has a single, simple purpose
Input Modules
a) write a set of 'input' modules - one for each type of input you have, for example
extract from email -> disk folder. May be VB.Net
copy from website folder -> disk folder. May be a VBScript module
(manual) from mail -> scan
Each input module needs to accept (command-line) parameters specific to how it gets its input - eg email server parameters - plus the directory into which to place the PDFs
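To show the shape of one such input module, here is a minimal sketch of the 'folder -> disk folder' case. It is written in Python purely for brevity; the same structure carries over to VB.Net or VBScript. The option names `--source` and `--dest` and the function names are my own assumptions, not anything from a particular tool.

```python
import argparse
import shutil
from pathlib import Path


def copy_pdfs(source_dir, dest_dir):
    """Copy every *.pdf in source_dir to dest_dir; return the copied file names."""
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the drop folder if needed
    copied = []
    for pdf in sorted(source.glob("*.pdf")):
        shutil.copy2(pdf, dest / pdf.name)  # copy2 also keeps the timestamps
        copied.append(pdf.name)
    return copied


def main(argv=None):
    # Hypothetical option names; every input module would take its own
    # source-specific options plus the common destination folder.
    parser = argparse.ArgumentParser(description="Input module: folder -> folder")
    parser.add_argument("--source", required=True, help="folder to read PDFs from")
    parser.add_argument("--dest", required=True, help="folder to place PDFs in")
    args = parser.parse_args(argv)
    copied = copy_pdfs(args.source, args.dest)
    print(f"copied {len(copied)} file(s)")
    return 0
```

Calling `main()` from the script's entry point gives you the command-line behaviour; an email module would look the same from the outside, just with different source options.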
Processing Modules
b) write a 'core' PDF Extractor - I'm suggesting VB.Net for this rather than VBScript - I think you'll find its power/flexibility/expressiveness suits the task - as a console program that reads each PDF from disk, extracts the text and stores the XML as a file on disk
The processing module needs to accept (command-line) parameters saying where to read the PDFs from and where to put (for example) the XML output from the extraction
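As a rough sketch of the 'read PDF from disk -> write XML to disk' step (Python again, only for brevity): the actual text extraction is deliberately left as a placeholder callable, `extract_text`, because that part depends on whichever SDK you end up choosing. The XML layout (a `document` root with a `source` attribute and a `text` child) is an assumption of mine, not a required format.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_to_xml(pdf_path, out_dir, extract_text):
    """Run extract_text on one PDF and store the result as <name>.xml in out_dir.

    extract_text is a placeholder: any callable taking a path and returning a str.
    """
    pdf_path, out_dir = Path(pdf_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: <document source="file.pdf"><text>...</text></document>
    root = ET.Element("document", {"source": pdf_path.name})
    ET.SubElement(root, "text").text = extract_text(pdf_path)
    xml_path = out_dir / (pdf_path.stem + ".xml")
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return xml_path


def extract_folder(pdf_dir, out_dir, extract_text):
    """Process every PDF in pdf_dir; return the XML paths written."""
    return [extract_to_xml(pdf, out_dir, extract_text)
            for pdf in sorted(Path(pdf_dir).glob("*.pdf"))]
```

Keeping the extraction call behind one function means you can swap the SDK later without touching the folder handling or the XML writing.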
c) write a database loader module (or use SSIS or ...) that reads each XML file produced by (b) from disk and uploads it into the database.
The database loader will need to accept (command-line) parameters indicating where the XML files are and how to connect to the DB
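A loader along these lines might look like the sketch below. It uses SQLite from the Python standard library only as a stand-in for whatever database you actually have; the table name `extracted_docs` and the XML layout (a `source` attribute on the root element, a `text` child holding the extracted text) are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def load_xml_folder(xml_dir, conn):
    """Insert one row per XML file into the (hypothetical) extracted_docs table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_docs ("
        "source TEXT PRIMARY KEY, body TEXT)"
    )
    loaded = 0
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        # INSERT OR REPLACE makes re-running the loader on the same file harmless.
        conn.execute(
            "INSERT OR REPLACE INTO extracted_docs (source, body) VALUES (?, ?)",
            (root.get("source"), root.findtext("text", default="")),
        )
        loaded += 1
    conn.commit()
    return loaded
```

Something like `load_xml_folder(folder, sqlite3.connect("docs.db"))` would then be driven from the command-line parameters; being able to re-run a load safely helps a lot when a batch fails half way through.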
VBScript is used like 'DOS Batch' language - a 'glue' to bind everything together .. it :-
- runs each of the input modules
- for each PDF File on disk, runs the PDF extractor
- for each XML file runs the upload to DB module
- runs any audit steps
- can be scheduled or run manually
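The 'glue' role above can be sketched as a small runner that executes each step as a separate program and stops as soon as one fails. This is Python rather than VBScript, purely as an illustration; the program names in the commented example are hypothetical.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order; stop at the first non-zero exit code.

    Returns a list of (name, returncode) for the steps that were actually run.
    """
    results = []
    for name, command in steps:
        completed = subprocess.run(command)
        results.append((name, completed.returncode))
        if completed.returncode != 0:
            print(f"step '{name}' failed with exit code {completed.returncode}",
                  file=sys.stderr)
            break  # later steps depend on earlier ones, so do not continue
    return results


# Example wiring (all program names and paths below are hypothetical):
# steps = [
#     ("input-email", ["EmailInput.exe", "--dest", r"C:\pdf\in"]),
#     ("extract",     ["PdfExtract.exe", "--in", r"C:\pdf\in", "--out", r"C:\pdf\xml"]),
#     ("load-db",     ["DbLoader.exe", "--in", r"C:\pdf\xml"]),
# ]
# run_steps(steps)
```

For this to work, each module has to return a meaningful exit code (0 for success), which is worth building in from the start whatever language you use.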
Keeping things as separate modules means that, for example, something written in VBScript can be upgraded/replaced with something written in VB.Net or C# or even C++ later on. Obviously, some inputs to the modules can be command-line parameters, and some you may wish to read from config-type files
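One common way to combine the two kinds of input is: built-in defaults, overridden by a config file, overridden in turn by anything given on the command line. A minimal sketch of that idea (Python for brevity; the `[extractor]` section name and the `pdf_dir`/`xml_dir` settings are made up for the example):

```python
import argparse
import configparser


def load_settings(argv, config_path=None):
    """Merge settings: built-in defaults < config file < command line."""
    settings = {"pdf_dir": "in", "xml_dir": "out"}  # hypothetical defaults
    if config_path:
        cfg = configparser.ConfigParser()
        cfg.read(config_path)  # a missing file is silently skipped
        if cfg.has_section("extractor"):
            settings.update(cfg["extractor"])
    parser = argparse.ArgumentParser()
    parser.add_argument("--pdf-dir")
    parser.add_argument("--xml-dir")
    args = parser.parse_args(argv)
    # Only override with options the caller actually supplied.
    settings.update({k: v for k, v in vars(args).items() if v is not None})
    return settings
```

That way the scheduled run can rely on the config file, while a manual run can override a single folder without editing anything.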
[/edit 2]
Updated 18-May-16 15:02pm