So I just installed pdf2docx and wanted to try it out, but I couldn't get past the first line of code without errors.
This is my entire, ultra-complex code:
from pdf2docx import parse
And it tells me:
ImportError: cannot import name 'Iterable' from 'collections' (C:\Users\spacemonkey\AppData\Local\Programs\Python\Python310\lib\collections\__init__.py)
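For context on this error: as of Python 3.10, abstract base classes such as Iterable can only be imported from collections.abc, because the old aliases in collections were removed. The message above does not show which installed package still used the old import, so the following is only a sketch of the compatible import pattern. The is_iterable helper is a hypothetical name used for illustration.

```python
# Compatible import: collections.abc is the correct location on Python 3.3+,
# and the only one that works on Python 3.10 and later.
try:
    from collections.abc import Iterable
except ImportError:
    # Fallback for very old interpreters only.
    from collections import Iterable


def is_iterable(obj):
    """Hypothetical helper: report whether obj supports iteration."""
    return isinstance(obj, Iterable)
```

If the old import lives inside a third-party package, upgrading that package is usually a cleaner route than editing its files.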
UPDATE: I got it working. I had to uninstall every trace of Python, restart, run pip install pdf2docx again, add the Scripts folder that got installed to the system PATH, and then restart again.
But now I have a new problem. This is the code now:
from pdf2docx import parse
from typing import Tuple

def convert(input_file: str, output_file: str, pages: Tuple = None):
    if pages:
        pages = [int(i) for i in list(pages) if i.isnumeric()]
    result = parse(pdf_file=input_file, docx_file=output_file, pages=pages)
    summary = { "File": input_file, "Pages": str(pages), "Output File": output_file }
    print("## Summary ##################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in summary.items()))
    print("#############################################################")
    return result

inp1 = str(input("Directory to PDF: "))
inp2 = str(input("Directory to save DOCX: "))
print(convert(input_file=inp1, output_file=inp2))
When I run it, this is the entire console output, including my input:
Directory to PDF: "C:\Users\spacemonkey\Downloads\Elon Musk - Wikipedia.pdf"
Directory to save DOCX: "C:\Users\spacemonkey\Downloads\Elon Musk - Wikipedia.docx"
mupdf: cannot open "C:\Users\spacemonkey\Downloads\Elon Musk - Wikipedia.pdf": Invalid argument
Traceback (most recent call last):
  File "c:\Users\spacemonkey\source\repos\SCRAM\python\pdftodocx.py", line 16, in <module>
    print(convert(input_file=inp1, output_file=inp2))
  File "c:\Users\spacemonkey\source\repos\SCRAM\python\pdftodocx.py", line 7, in convert
    result = parse(pdf_file=input_file, docx_file=output_file, pages=pages)
  File "C:\Users\spacemonkey\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pdf2docx\main.py", line 34, in convert
    cv = Converter(pdf_file, password)
  File "C:\Users\spacemonkey\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pdf2docx\converter.py", line 39, in __init__
    self._fitz_doc = fitz.Document(pdf_file)
  File "C:\Users\spacemonkey\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\fitz\fitz.py", line 3815, in __init__
    _fitz.new_Document(
RuntimeError: cannot open "C:\Users\spacemonkey\Downloads\Elon Musk - Wikipedia.pdf": Invalid argument
PS C:\Users\spacemonkey>
What I have tried:
I haven't really tried anything, since I have no idea whatsoever why this doesn't work. I assumed it would work since I had just downloaded it, but it seems there is something more I need to install or import into the script to make it work.
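One detail stands out in the console output above: both paths were typed with surrounding double quotes, and input() keeps those quotes as part of the string. The error message also shows the quotes inside the file name it tried to open. The following is a minimal sketch, assuming the quotes are what triggers the "Invalid argument" error. The clean_path helper is a hypothetical name, not part of pdf2docx.

```python
from pathlib import Path


def clean_path(raw: str) -> Path:
    """Strip whitespace and one pair of surrounding quotes from a typed path."""
    text = raw.strip()
    # Remove matching quotes only when they wrap the whole string.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1]
    return Path(text)


# Possible use with the script above (assumes pdf2docx is installed):
# pdf_path = clean_path(input("Path to PDF: "))
# docx_path = clean_path(input("Path to save DOCX: "))
# parse(pdf_file=str(pdf_path), docx_file=str(docx_path))
```

Typing the paths without quotes at the prompt should have the same effect, since input() does not need them even when the path contains spaces.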