Hi - I have an INSERT query in one of my procedures that loads data into a target table. The DISTINCT clause is making a huge impact and taking too much time, since each table involved has more than 2-3 million rows. Is there any way to optimize this SELECT with DISTINCT?

What I have tried:

SQL
INSERT INTO target_table
SELECT DISTINCT
       tab1.col1  col1,
       tab1.col2  col2,
       tab2.col3  col3,
       tab2.col4  col4,
       tab3.col5  col5,
       tab3.col6  col6,
       tab3.col7  col7,
       tab3.col8  col8,
       tab3.col9  col9,
       TO_CHAR(tab3.col7, 'YYYY-MM') col10,
       TO_CHAR(tab3.col7, 'YYYY')    col11,
       TO_CHAR(tab3.col7, 'MM')      col12,
       CURRENT_DATE col13
  FROM tab3@dblink
  JOIN tab4@dblink ON tab3.col5  = tab4.col13
  JOIN tab5@dblink ON tab4.col14 = tab5.col15
  JOIN tab2@dblink ON tab5.col16 = tab2.col17
  JOIN tab1        ON tab1.col18 = tab2.col3;
Posted
Updated 10-May-18 9:54am
Comments
[no name] 10-May-18 15:19pm    
Not sure about this, but a LEFT JOIN would probably help a little. Otherwise, keep in mind:
- DISTINCT over that many columns is a heavy operation
- Make sure the columns involved in the joins are indexed
GJSS 10-May-18 23:54pm    
What kind of index would be able to increase the speed of operation?

1 solution

As for the DISTINCT problem: if you have lots of rows and you need to find non-duplicates based on multiple columns, it's going to take time and resources. If the selectivity is high, proper indexing would benefit you, but if the query retrieves, say, more than 10% of the rows in the table, then indexing isn't a viable option.
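To illustrate the indexing point, here is a hedged sketch; the index names are invented, and the columns are the join-predicate columns from the question. Whether the optimizer actually uses them depends on selectivity and statistics:

```sql
-- Hypothetical example: index the columns used in the join predicates.
-- Index names are placeholders; adapt to your own schema.
CREATE INDEX ix_tab1_col18 ON tab1 (col18);
CREATE INDEX ix_tab2_col17 ON tab2 (col17);

-- These only help if the predicates are selective enough (roughly,
-- the query touches well under ~10% of each table's rows); otherwise
-- a full scan plus hash join is usually cheaper.
```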

Based on the syntax, I take it this is Oracle? If so, a few things to consider:

- At the moment you use separate database links. This probably forces all rows to be transferred over the link before the join is done. If the links point to the same source database, consider creating a view in the source database that does the joins and the DISTINCT, and select only from that view. This could remove unnecessary data transfer between the databases.

- Ensure that a sufficient amount of PGA is available; in other words, that you're not limiting its size.

- At the same time, ensure that the server isn't paging. If paging occurs, it will drastically slow down the query.
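To make the first point concrete, here is a sketch of a view you could create in the SOURCE database, assuming all four remote tables live there. The view name is hypothetical; the columns and join conditions are taken from the question's query:

```sql
-- Hypothetical: created in the remote (source) database, so the joins
-- and the DISTINCT execute there, and only the reduced row set needs
-- to cross the database link.
CREATE OR REPLACE VIEW v_target_feed AS
SELECT DISTINCT
       t2.col3, t2.col4,
       t3.col5, t3.col6, t3.col7, t3.col8, t3.col9
  FROM tab3 t3
  JOIN tab4 t4 ON t3.col5  = t4.col13
  JOIN tab5 t5 ON t4.col14 = t5.col15
  JOIN tab2 t2 ON t5.col16 = t2.col17;
```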
 
Comments
GJSS 10-May-18 16:36pm    
Hi Wendelius - I'm using the same DB link, which points to the same database. Also, we reference not only remote tables but also tables that are in the local database. In that case, how can we create a view that includes the complete result of that SELECT query?
Wendelius 10-May-18 23:23pm    
If you're using tables from both locations, you should ensure that a minimum number of rows is transferred. Try creating a view in the remote database that does the filtering and joining based on the data present there. You can probably add the DISTINCT to that view as well, so the data is already distinct when you select from it. Then modify your query to select from the view via the dblink and join it to the tables in the local database.
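A sketch of what the rewritten load might then look like, assuming a remote view named v_target_feed (a hypothetical name) exposes the pre-joined, pre-DISTINCTed columns of tab2 and tab3:

```sql
-- Hypothetical: only the local table tab1 is joined here; the heavy
-- joins and the remote DISTINCT already happened inside the view.
INSERT INTO target_table
SELECT DISTINCT
       t1.col1, t1.col2,
       v.col3, v.col4, v.col5, v.col6, v.col7, v.col8, v.col9,
       TO_CHAR(v.col7, 'YYYY-MM'),
       TO_CHAR(v.col7, 'YYYY'),
       TO_CHAR(v.col7, 'MM'),
       CURRENT_DATE
  FROM tab1 t1
  JOIN v_target_feed@dblink v ON t1.col18 = v.col3;
```

The local DISTINCT is still needed because joining tab1 can reintroduce duplicates, but it now runs over far fewer rows.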
GJSS 10-May-18 23:51pm    
Thank you Wendelius. If we create a view for the query result, with the join and the distinct, how will it get refreshed with updated data every time? We are supposed to run this proc on a weekly basis to load incremental data into the target table.
In that case, can we go with a materialized view?
Wendelius 10-May-18 23:57pm    
A materialized view could be an option if, for example, you query the view several times and possible changes in the source data do not matter. However, why not start with a traditional view and see what happens; make modifications afterwards if further adjustments are needed. With a normal view you don't have to worry about refreshing the data: it's just a stored query, so it does not store any data.
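If the materialized-view route is chosen later, a hedged sketch of what it might look like in the source database (names and refresh policy are examples, not taken from the question):

```sql
-- Hypothetical: the result set is physically stored and must be
-- refreshed; ON DEMAND means we control when that happens.
CREATE MATERIALIZED VIEW mv_target_feed
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT DISTINCT
       t2.col3, t2.col4,
       t3.col5, t3.col6, t3.col7, t3.col8, t3.col9
  FROM tab3 t3
  JOIN tab4 t4 ON t3.col5  = t4.col13
  JOIN tab5 t5 ON t4.col14 = t5.col15
  JOIN tab2 t2 ON t5.col16 = t2.col17;

-- Refresh ('C' = complete) before each weekly load:
BEGIN
  DBMS_MVIEW.REFRESH('MV_TARGET_FEED', 'C');
END;
/
```

Unlike a plain view, this trades freshness for speed: the weekly job must refresh it first, or it will serve last week's data.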
GJSS 11-May-18 16:00pm    
Thank you, Wendelius.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
