I have a dataset with a very messy directory structure. A sample of the structure is shown below.
Data
----1
--------Jpeg
------------<arbitrary-name>.jpg
------------<arbitrary-name>(1).jpg
------------<arbitrary-name>(2).jpg
------------<arbitrary-name>(3).jpg
----2
--------Jpeg
------------<arbitrary-name>.jpg
----3
--------Jpeg
------------<arbitrary-name>.jpg
------------<arbitrary-name>(1).jpg
------------<arbitrary-name>(2).jpg
------------<arbitrary-name>(3).jpg
.
.
.
----67
--------Pose and expression change
------------<arbitrary-file-name>(1).jpg
------------<arbitrary-file-name>(2).jpg
------------<arbitrary-file-name>(3).jpg
--------Reference Image
------------<arbitrary-file-name>.jpg
Note that this is not the exact data structure.
For example, folders 1-15 might each have one sub-folder, JPEG, containing 3 or more images with extremely long file names, while folders 20 to 25 each have two sub-folders similar to the ones shown in folder 67.
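To see how irregular the layout actually is before renaming anything, a survey along the following lines might help. It is only a minimal sketch: `survey` is a name made up for illustration, and it assumes that every image ends in `.jpg` (case-insensitive) and that the numbered folders sit directly under the root.

```python
import os


def survey(root):
    """Map each top-level folder name to (sub-folders holding images, image count)."""
    report = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if not entry.is_dir():
            continue
        subfolders = set()
        jpg_count = 0
        # Walk everything below this numbered folder, however deep it goes.
        for dirpath, dirnames, filenames in os.walk(entry.path):
            jpgs = [f for f in filenames if f.lower().endswith('.jpg')]
            jpg_count += len(jpgs)
            # Record the sub-folder (relative to the numbered folder) if it holds images.
            if jpgs and dirpath != entry.path:
                subfolders.add(os.path.relpath(dirpath, entry.path))
        report[entry.name] = (sorted(subfolders), jpg_count)
    return report
```

Calling `survey` on the dataset root returns a dictionary keyed by folder name, so folders with unexpected sub-folders or no images at all can be spotted before any file is moved.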
To make the dataset more consistent for further processing, I want to reorganize the folders as follows:
data
----1
--------caucasian_male_1_[x].jpg
--------(if folder 1 contains 3 images, then x is one of [0, 1, 2])
----2
----3
----4
----5
----6
.
.
.
----500
--------caucasian_male_500_[x].jpg
--------(again, if folder 500 contains 4 jpg files, then x varies from 0 to 3: [0, 1, 2, 3])
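The target layout above could be produced by first planning every move and only then touching the disk. The following is a rough sketch rather than a tested solution: `plan_moves`, `apply_moves`, and the `prefix` argument are hypothetical names. It assumes the reorganized files go into a separate destination root, so that new names cannot collide with existing files, and that only `.jpg` files matter.

```python
import os
import shutil


def plan_moves(src_root, dst_root, prefix='caucasian_male'):
    """Return (old_path, new_path) pairs; nothing on disk is touched."""
    moves = []
    for entry in sorted(os.scandir(src_root), key=lambda e: e.name):
        if not entry.is_dir():
            continue
        # Gather every .jpg below this numbered folder, whatever the sub-folders are called.
        images = []
        for dirpath, dirnames, filenames in os.walk(entry.path):
            dirnames.sort()  # visit sub-folders in a stable order
            for name in sorted(filenames):
                if name.lower().endswith('.jpg'):
                    images.append(os.path.join(dirpath, name))
        # Number the images 0..n-1 and flatten them into <dst_root>/<folder>/.
        for x, old_path in enumerate(images):
            new_name = '{}_{}_{}.jpg'.format(prefix, entry.name, x)
            moves.append((old_path, os.path.join(dst_root, entry.name, new_name)))
    return moves


def apply_moves(moves):
    """Carry out the planned moves, creating destination folders as needed."""
    for old_path, new_path in moves:
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        shutil.move(old_path, new_path)
```

Printing the result of `plan_moves` before calling `apply_moves` works as a dry run, which seems safer than renaming in place on a dataset with hundreds of folders.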
My platform is Windows, and I am trying to come up with a solution to automate the process in Python. Any suggestions on how to reorganize the data folder are welcome.
What I have tried:
I am currently using the following tool, available on GitHub:
GitHub - weisslj/dir-edit: Rename or remove files in a directory using an editor
But this is not efficient, as you need to specify the directory every time and then open a text file to edit the file names.
If the directory has, say, 1000 sub-folders and each folder has 3 or more images, this quickly becomes impractical.
I have also tried the following method.
It lists all the directories in the folder and is able to print the file names, but somehow I don't think this is correct.
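import os

ROOT = 'D:/dataset/'

# Collect every directory below the dataset root (numbered folders and their sub-folders).
folders = []
for root, dirs, files in os.walk(ROOT):
    for dire in dirs:
        folders.append(os.path.join(root, dire))
folders = sorted(folders)

# Rename the images inside each directory, numbering them so the names do not collide.
for folder in folders:
    entries = sorted(os.scandir(folder), key=lambda e: e.name)
    images = [e for e in entries if e.is_file() and e.name.lower().endswith('.jpg')]
    for ctr, entry in enumerate(images):
        new_path = os.path.join(folder, 'my_image_name_{}.jpg'.format(ctr))
        os.rename(entry.path, new_path)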