The file names in both directories (before and after conversion) come out as random, unreadable strings (UUIDs), but I want each converted file to keep its original file name.

What I have tried:

I am working on a machine learning project and have found a way to convert documents to text using the 'textract' library. I made two directories: one containing the files to be converted and one for the files after conversion.
How can I keep the original file name after conversion?

Code
import os
import uuid
import textract
source_directory = "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/mainfolder"

for filename in os.listdir(source_directory):
    file, extension = os.path.splitext(filename)
    unique_filename = str(uuid.uuid4()) + extension
    os.rename(os.path.join(source_directory,  filename), os.path.join(source_directory, unique_filename))

training_directory = "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/trainingdata"

for process_file in os.listdir(source_directory):
    file, extension = os.path.splitext(process_file)

    # Build the new text file name by appending .txt to the file's UUID
    dest_file_path = file + '.txt'

    # Extract the text from the file (textract returns bytes)
    content = textract.process(os.path.join(source_directory, process_file))

    # Create the new file and write the binary data ("wb" = write binary);
    # the with block closes the file automatically
    with open(os.path.join(training_directory, dest_file_path), "wb") as write_text_file:
        write_text_file.write(content)
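The first loop above replaces every original name with a random UUID and keeps no record of it, so nothing links a converted file back to its source. If the UUID names are actually wanted (for example, to anonymize training data), one option is to record each rename as it happens. Below is a minimal sketch under that assumption; the helpers rename_with_map and original_name and the JSON map file are hypothetical and not part of the original code.

```python
import json
import os
import uuid


def rename_with_map(directory, map_path):
    """Hypothetical helper: rename each entry in `directory` to a UUID name
    and remember its original name.

    Returns a dict {uuid_stem: original_filename} and also saves it as JSON
    at `map_path` so the mapping survives after the script finishes.
    """
    name_map = {}
    for filename in os.listdir(directory):
        _, extension = os.path.splitext(filename)
        new_stem = str(uuid.uuid4())
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, new_stem + extension))
        name_map[new_stem] = filename

    with open(map_path, "w", encoding="utf-8") as f:
        json.dump(name_map, f, indent=2)
    return name_map


def original_name(converted_filename, name_map):
    """Hypothetical helper: map a converted file such as '<uuid>.txt'
    back to the original file name."""
    stem, _ = os.path.splitext(converted_filename)
    return name_map[stem]
```

Because the converted file keeps the UUID stem and only changes its extension to .txt, the stem is enough to look up the original name. Keep the map file outside the source directory, otherwise a later run would rename the map file as well.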
Updated 19-Feb-21 3:22am

1 solution

You lose the original names because your first loop renames every file to a UUID as you list them. Drop that loop and instead create a function that takes a source file name and a destination file name and does the conversion. Your main code then just needs to use os.listdir to get the source files, build each destination name from the original name, and pass the two values to the function.
Python
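import os
import textract

def convert(source, destination):
    # Extract the text from the source document (textract returns bytes)
    content = textract.process(source)
    # Write the extracted bytes to the destination file in binary mode
    with open(destination, "wb") as text_file:
        text_file.write(content)

# then you just need to keep the original name and change the extension;
# source_directory and training_directory are defined as in the question
for filename in os.listdir(source_directory):
    file, extension = os.path.splitext(filename)
    source = os.path.join(source_directory, filename)
    dest = os.path.join(training_directory, file + '.txt')
    convert(source, dest)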
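One thing to watch when keeping the original names: two sources that differ only in extension (say essay.pdf and essay.docx) would both become essay.txt, and the second conversion would silently overwrite the first. A minimal sketch of building the source/destination pairs up front with pathlib and rejecting such clashes follows; the function name plan_conversions is hypothetical, and raising an error is just one possible policy.

```python
from pathlib import Path


def plan_conversions(source_dir, dest_dir):
    """Hypothetical helper: pair each file in `source_dir` with a .txt file
    in `dest_dir` that keeps the original stem (file name without extension).

    Raises ValueError if two sources would map to the same destination.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    seen = {}  # destination path -> source path
    for source in sorted(source_dir.iterdir()):
        if not source.is_file():
            continue  # skip sub-directories
        destination = dest_dir / (source.stem + ".txt")
        if destination in seen:
            raise ValueError(
                f"{seen[destination].name} and {source.name} "
                f"both map to {destination.name}")
        seen[destination] = source
    return [(src, dst) for dst, src in seen.items()]
```

Each pair can then be handed to the convert function, e.g. convert(str(source), str(destination)), so no file is converted until every name has been checked.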
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


