I am planning to retrieve around 47M records from an Oracle DB to a flat file using Python (I am not able to use other languages here), but it is taking nearly 2 hours to complete.

Can anyone please help me get this done in a much faster way?

I have read a few articles on the subject, but they did not help.

Please find below the code I have tried.

What I have tried:

import pandas as pd
import cx_Oracle
import datetime
import os

try:
    chunk_size = 1000000
    batch_no = 1
    start_time = datetime.datetime.now()
    print("Program Start Time:: ", start_time)
    dsn_tns = cx_Oracle.makedsn('HOSTNAME', 'PORT', service_name=r'SERVICE_NAME')
    conn = cx_Oracle.connect(user=r'USERNAME', password=r'PASSWORD', dsn=dsn_tns)
    query = """Select /*+ parallel(a,16) */ * from SCHEMA.TABLE_NAME a"""
    # Stream the result set in 1,000,000-row chunks; write each chunk to its own pipe-delimited file
    for chunk in pd.read_sql_query(query, con=conn, chunksize=chunk_size):
        chunk.to_csv(r'<path>\Test_Sample_Output_20230303_' + str(batch_no) + '.txt', sep='|',
                     quotechar='"', escapechar='\\', index=False, encoding='utf-8-sig')
        batch_no += 1
    end_time = datetime.datetime.now()
    print("Program End Time:: ", end_time)
    Exact_time_Taken = end_time - start_time
    print("Exact Time Taken :: ", Exact_time_Taken)

except cx_Oracle.DatabaseError as e:
    print("There is a problem with Oracle", e)
Posted
Comments
Richard MacCutchan 7-Mar-23 4:03am    
It's due to the laws of physics. 47,000,000 multiplied by: the time to retrieve the data, format it into an SQL response packet, convert it to a pandas DataFrame item, and convert that to a CSV string, which then gets written to a flat file.
Dave Kreskowiak 7-Mar-23 7:43am    
On top of what Richard has said, another factor limiting your speed is the size of each record, times 47 million. How much data is that? What's the speed of your network between the code and the Oracle server?

Say each record is 2K bytes. That results in about 96 GIGABYTES of data being transferred!!

...and you want that transferred in less than 2 hours? You better be on a 10Gb network or faster.
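To put rough numbers on that, here is a quick back-of-envelope check in Python. The 2 KB per record figure is the assumption from the comment above, and the link speeds are illustrative only; protocol overhead is ignored.

# Rough volume/transfer estimate (illustrative figures, not measurements)
records = 47_000_000
bytes_per_record = 2 * 1024                                  # assumption: ~2 KB per row
total_bytes = records * bytes_per_record
print(f"Total volume: {total_bytes / 1e9:.0f} GB")           # ~96 GB

for label, bits_per_second in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    seconds = total_bytes * 8 / bits_per_second              # raw transfer time only
    print(f"{label}: ~{seconds / 60:.1f} minutes just to move the bytes")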
[no name] 7-Mar-23 12:54pm    
Defragging the output drive and / or using bigger chunks might squeeze out a few more seconds or minutes.
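One way to make the "bigger chunks" idea concrete is to skip pandas entirely and fetch with a tuned cx_Oracle cursor, streaming rows straight to disk with csv.writer. This is only a sketch, not the poster's code: the arraysize/prefetchrows values are guesses to tune, prefetchrows needs cx_Oracle 8+, and the credentials, path and output filename just mirror the placeholders in the script above.

import csv
import cx_Oracle

dsn_tns = cx_Oracle.makedsn('HOSTNAME', 'PORT', service_name=r'SERVICE_NAME')
conn = cx_Oracle.connect(user=r'USERNAME', password=r'PASSWORD', dsn=dsn_tns)
cursor = conn.cursor()
cursor.arraysize = 100000        # rows fetched per round trip; tune to available memory
cursor.prefetchrows = 100000     # cx_Oracle 8+; cuts round trips further
cursor.execute("Select /*+ parallel(a,16) */ * from SCHEMA.TABLE_NAME a")

with open(r'<path>\Test_Sample_Output.txt', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f, delimiter='|', quotechar='"')
    writer.writerow(col[0] for col in cursor.description)    # header row from column names
    while True:
        rows = cursor.fetchmany()                            # batch size = cursor.arraysize
        if not rows:
            break
        writer.writerows(rows)

cursor.close()
conn.close()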
Member 15627495 10-Mar-23 3:19am    
As written, your query loads *all* the records; the 'chunk_size' parameter only splits the SQL result every 1,000,000 rows. Try dividing that job further.

Did you try a dynamic query with a 'LIMIT'-style clause, so the SQL engine does the work instead of the Python runtime? You are really carrying a lot of work (and data) through read_sql_query(query, con=conn, chunksize=chunk_size), and through your SQL query too.
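If you try that route, note that Oracle has no LIMIT keyword; 12c and later express the same paging idea with OFFSET ... FETCH NEXT. The sketch below assumes pd and conn are the pandas import and connection from the script above, and the ORDER BY column is a placeholder; whether paging beats a single streamed query is something to measure.

# Paging sketch: Oracle 12c+ OFFSET/FETCH in place of LIMIT
page_size = 1000000
offset = 0
batch_no = 1
paged_query = """SELECT * FROM SCHEMA.TABLE_NAME a
                 ORDER BY a.<unique_key_column>
                 OFFSET :off ROWS FETCH NEXT :lim ROWS ONLY"""
while True:
    page = pd.read_sql_query(paged_query, con=conn, params={"off": offset, "lim": page_size})
    if page.empty:
        break
    page.to_csv(r'<path>\Test_Sample_Output_page_' + str(batch_no) + '.txt',
                sep='|', index=False, encoding='utf-8-sig')
    offset += page_size
    batch_no += 1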

Did you identify the heavy consumer of time (and therefore of computing)? Add a timer around each instruction to get a concrete answer about where the time goes.
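A minimal way to do that with the script above is to time the fetch and the write separately with time.perf_counter(); query, conn, chunk_size and the output path here are assumed to be the ones from the original code.

import time

fetch_seconds = 0.0
write_seconds = 0.0
batch_no = 1
reader = pd.read_sql_query(query, con=conn, chunksize=chunk_size)
while True:
    t0 = time.perf_counter()
    try:
        chunk = next(reader)                       # fetching + DataFrame conversion
    except StopIteration:
        break
    fetch_seconds += time.perf_counter() - t0

    t0 = time.perf_counter()
    chunk.to_csv(r'<path>\Test_Sample_Output_20230303_' + str(batch_no) + '.txt', sep='|',
                 quotechar='"', escapechar='\\', index=False, encoding='utf-8-sig')
    write_seconds += time.perf_counter() - t0      # CSV formatting + disk write
    batch_no += 1

print("Fetch/convert time:", fetch_seconds, "s")
print("Write time:        ", write_seconds, "s")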
Member 15627495 10-Mar-23 3:39am    
To make the file writing behave better and avoid keeping the operating system busy, you could add a pause after each CSV write. The write operations then run more smoothly and put less pressure on the OS (for a possible time gain).
In Python this is sleep(n seconds); explanation at

https://superfastpython.com/thread-sleep-in-python/
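For completeness, the call that article describes is time.sleep(); a minimal sketch of the idea applied to the loop above, with an arbitrary pause length (whether it actually gains anything is worth measuring):

import time

for chunk in pd.read_sql_query(query, con=conn, chunksize=chunk_size):
    chunk.to_csv(r'<path>\Test_Sample_Output_20230303_' + str(batch_no) + '.txt', sep='|',
                 quotechar='"', escapechar='\\', index=False, encoding='utf-8-sig')
    batch_no += 1
    time.sleep(2)    # arbitrary pause between writes, as the comment suggests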
