Hi - I have an INSERT query in one of my procedures that loads data into a target table. The DISTINCT clause is making a huge impact and taking too much time, since each table involved has more than 2-3 million rows. Is there any way to optimize this SELECT with DISTINCT?

What I have tried:

SQL
INSERT INTO target_table
SELECT DISTINCT
       tab1.col1  col1,
       tab1.col2  col2,
       tab2.col3  col3,
       tab2.col4  col4,
       tab3.col5  col5,
       tab3.col6  col6,
       tab3.col7  col7,
       tab3.col8  col8,
       tab3.col9  col9,
       TO_CHAR(tab3.col7, 'YYYY-MM') col10,
       TO_CHAR(tab3.col7, 'YYYY')    col11,
       TO_CHAR(tab3.col7, 'MM')      col12,
       CURRENT_DATE col13
  FROM tab3@dblink
  JOIN tab4@dblink ON tab3.col5  = tab4.col13
  JOIN tab5@dblink ON tab4.col14 = tab5.col15
  JOIN tab2@dblink ON tab5.col16 = tab2.col17
  JOIN tab1        ON tab1.col18 = tab2.col3;
Posted
Updated 10-May-18 9:54am
Comments
[no name] 10-May-18 15:19pm    
Not sure about this, but a LEFT JOIN would probably help a little. Otherwise, keep in mind:
- DISTINCT over that many columns is a heavy operation
- Make sure the columns involved in the joins are indexed
GJSS 10-May-18 23:54pm    
What kind of index would be able to increase the speed of operation?

1 solution

As for the DISTINCT problem: if you have lots of rows and you need to find non-duplicates based on multiple columns, it's going to take time and resources. If the selectivity is high, proper indexing would benefit you, but if the query retrieves, say, more than 10% of the rows in the table, then indexing isn't a viable option.
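To illustrate the indexing point, here is a hedged sketch; the index names are invented, and the columns are the join-predicate columns from the question. Whether the optimizer actually uses them depends on selectivity and statistics:

```sql
-- Hypothetical example: index the columns used in the join predicates.
-- Index names are placeholders; adapt to your own schema.
CREATE INDEX ix_tab1_col18 ON tab1 (col18);
CREATE INDEX ix_tab2_col17 ON tab2 (col17);

-- These only help if the predicates are selective enough (roughly,
-- the query touches well under ~10% of each table's rows); otherwise
-- a full scan plus hash join is usually cheaper.
```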

Based on the syntax, I take it this is Oracle? If so, a few things to consider:

- At the moment you use separate database links. This probably forces all rows to be transferred over the link before the join is done. If the links point to the same source database, consider creating a view in the source database that does the joins and the DISTINCT, and select only from that view. This could remove unnecessary data transfer between the databases.

- Ensure that a sufficient amount of PGA is available; in other words, that you're not limiting its size.

- At the same time, ensure that the server isn't paging. If paging occurs, it will drastically slow down the query.
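To make the first point concrete, here is a sketch of a view you could create in the SOURCE database, assuming all four remote tables live there. The view name is hypothetical; the columns and join conditions are taken from the question's query:

```sql
-- Hypothetical: created in the remote (source) database, so the joins
-- and the DISTINCT execute there, and only the reduced row set needs
-- to cross the database link.
CREATE OR REPLACE VIEW v_target_feed AS
SELECT DISTINCT
       t2.col3, t2.col4,
       t3.col5, t3.col6, t3.col7, t3.col8, t3.col9
  FROM tab3 t3
  JOIN tab4 t4 ON t3.col5  = t4.col13
  JOIN tab5 t5 ON t4.col14 = t5.col15
  JOIN tab2 t2 ON t5.col16 = t2.col17;
```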
 
Comments
GJSS 10-May-18 16:36pm    
Hi Wendelius - I'm using the same DB link, which points to the same database. Also, we reference not only remote tables but also tables that are in the local database. In that case, how can we create a view that includes the complete result of that SELECT query?
Wendelius 10-May-18 23:23pm    
If you're using tables from both locations, you should ensure that a minimum number of rows is transferred. Try creating a view in the remote database that does the filtering and joining based on the data present there. You can probably add the DISTINCT to that view as well, so the data is already distinct when you select from it. Then modify your query to select from the view via the dblink and join it to the tables in the local database.
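A sketch of what the rewritten load might then look like, assuming a remote view named v_target_feed (a hypothetical name) exposes the pre-joined, pre-DISTINCTed columns of tab2 and tab3:

```sql
-- Hypothetical: only the local table tab1 is joined here; the heavy
-- joins and the remote DISTINCT already happened inside the view.
INSERT INTO target_table
SELECT DISTINCT
       t1.col1, t1.col2,
       v.col3, v.col4, v.col5, v.col6, v.col7, v.col8, v.col9,
       TO_CHAR(v.col7, 'YYYY-MM'),
       TO_CHAR(v.col7, 'YYYY'),
       TO_CHAR(v.col7, 'MM'),
       CURRENT_DATE
  FROM tab1 t1
  JOIN v_target_feed@dblink v ON t1.col18 = v.col3;
```

The local DISTINCT is still needed because joining tab1 can reintroduce duplicates, but it now runs over far fewer rows.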
GJSS 10-May-18 23:51pm    
Thank you Wendelius. If we create a view for the query result, with the join and the distinct, how will it get refreshed with updated data every time? We are supposed to run this proc on a weekly basis to load incremental data into the target table.
In that case, can we go with a materialized view?
Wendelius 10-May-18 23:57pm    
A materialized view could be an option if, for example, you query the view several times and possible changes in the source data do not matter. However, why not start with a traditional view and see what happens; make modifications afterwards if further adjustments are needed. With a normal view you don't have to worry about refreshing the data: it's just a stored query, so it does not store any data.
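If the materialized-view route is chosen later, a hedged sketch of what it might look like in the source database (names and refresh policy are examples, not taken from the question):

```sql
-- Hypothetical: the result set is physically stored and must be
-- refreshed; ON DEMAND means we control when that happens.
CREATE MATERIALIZED VIEW mv_target_feed
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT DISTINCT
       t2.col3, t2.col4,
       t3.col5, t3.col6, t3.col7, t3.col8, t3.col9
  FROM tab3 t3
  JOIN tab4 t4 ON t3.col5  = t4.col13
  JOIN tab5 t5 ON t4.col14 = t5.col15
  JOIN tab2 t2 ON t5.col16 = t2.col17;

-- Refresh ('C' = complete) before each weekly load:
BEGIN
  DBMS_MVIEW.REFRESH('MV_TARGET_FEED', 'C');
END;
/
```

Unlike a plain view, this trades freshness for speed: the weekly job must refresh it first, or it will serve last week's data.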
GJSS 11-May-18 16:00pm    
Thank you, Wendelius.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
