I am planning to retrieve around 47M records from a Oracle DB to a flat file using python(Here I can not able to use other languages.). But it is taking nearly 2 hours to get it completed.
Can anyone please help me to get it done much in a faster way.
I have read a few articles regarding the same but could not be able to help.
Please find my code below which I have tried,
What I have tried:
import pandas as pd
import cx_Oracle
import datetime
import os
try:
chunk_size = 1000000
batch_no = 1
start_time = datetime.datetime.now()
print("Program Start Time:: ", start_time)
dsn_tns = cx_Oracle.makedsn('HOSTNAME', 'PORT', service_name=r'SERVICE_NAME')
conn = cx_Oracle.connect(user=r'USERNAME', password=r'PASSWORD', dsn=dsn_tns)
query = """Select /*+ parallel(a,16) */ * from SCHEMA.TABLE_NAME a"""
for chunk in pd.read_sql_query(query, con=conn, chunksize=chunk_size):
chunk.to_csv(r'<path>\Test_Sample_Output_20230303_'+str(batch_no)+'.txt', sep='|',
quotechar='"', escapechar='\\', index=False, encoding='utf-8-sig')
batch_no += 1
end_time = datetime.datetime.now()
print("Program End Time:: ", end_time)
Exact_time_Taken = end_time - start_time
print("Exact Time Taken :: ", Exact_time_Taken)
except cx_Oracle.DatabaseError as e:
print("There is a problem with Oracle", e)