Selecting duplicating records

Question

0.00/5 (No votes)

See more:

Hi new to this site, hoping someone might be able to help me with this SQL question.

So I have a table called T_EE_EVALUATIONS that has duplicate records within in it. The table has the following structure below. I need to find the duplicate address where the house_id is not the same as it's duplicate address. Neither ID's are P keys. I tried this but it doesn't really work the way I want it to work.

SQL

Select E.eval_id , E.house_id , E.ClientCity, E.ClientAddr, E.Houseregion, T.eval_id , T.house_id, T.ClientCity , T.ClientAddr , T.Houseregion
From T_EE_EVALUATIONS E INNER JOIN
(Select eval_id , house_id,ClientCity, ClientAddr, Houseregion, COUNT(*) AS CountOf
FROM T_EE_EVALUATIONS GROUP BY eval_id , house_id,ClientCity, ClientAddr, Houseregion-- HAVING COUNT(*)>1
) T
ON E.ClientCity || E.ClientAddr || E.Houseregion = T.ClientCity || T.ClientAddr || T.Houseregion

eval_id  house_id  ClientCity  ClientAddr      Houseregion
------------------------------------------------------------
12345    56455     Quebec      34 Anyplace     Quebec
12345    23765     Quebec      34 Anyplace     Quebec
12345    94213     Quebec      34 Anyplace     Quebec
13456    23456     Russell     214 mystreet    Ontario
13456    33456     Russell     214 mystreet    Ontario
14526    14567     Quispamsis  456 Thatstreet  New Brunswick
14526    13567     Quispamsis  456 Thatstreet  New Brunswick

What I have tried:

SQL

Select E.eval_id , E.house_id , E.ClientCity, E.ClientAddr, E.Houseregion, T.eval_id , T.house_id, T.ClientCity , T.ClientAddr , T.Houseregion
From T_EE_EVALUATIONS E INNER JOIN
(Select eval_id , house_id,ClientCity, ClientAddr, Houseregion, COUNT(*) AS CountOf
FROM T_EE_EVALUATIONS GROUP BY eval_id , house_id,ClientCity, ClientAddr, Houseregion-- HAVING COUNT(*)>1
) T
ON E.ClientCity || E.ClientAddr || E.Houseregion = T.ClientCity || T.ClientAddr || T.Houseregion

Posted 13-Nov-19 4:37am

greatwhite1

Updated 14-Nov-19 0:57am

MadMyche

v2

Add a Solution

Comments

ZurdoDev 13-Nov-19 10:50am

Don't group by house_id because that is not part of the duplicate data.

greatwhite1 13-Nov-19 11:44am

How do I include house_id? I have to have that so I can see if the house_id is different

ZurdoDev 13-Nov-19 11:48am

One way is to use a derived table. Your derived table pulls groups on just the address and then join back to that based on the address fields.

If you want to delete all but one for each house there are simple scripts you can find online to do that.

greatwhite1 13-Nov-19 12:28pm

No once I find the records I won't be deleting them they have to have their house_id updated to the original. From my understanding before I started here something caused some addresses to be given a new house ID , Duplicate data is normal but they should all have the same house_id. I know weird but I didn't build it, just have to live with it.

ZurdoDev 13-Nov-19 13:49pm

1. Reply to the comment so that I am notified instead of posting a new comment.
2. Do something like this
SELECT *
FROM table t
LEFT JOIN
(
SELECT addr1, addr2, ...
FROM table
GROUP BY addr1, addr2, ...
HAVING COUNT(*) > 1
) x ON t.addr1 = x.addr1 AND t.addr2 = x.addr2, ...

greatwhite1 19-Nov-19 8:26am

Tried the solutions and tried modifying them to suit my needs. I beginning to think what I am trying to do might not be possible.

I need every address that is a duplicate. and I need it's eval_id and house_id. Once I isolate these I will review them to see why they are duplicates. The current queries can't be right because the return of records is more than is possible.

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Maciej Los · Answer 1 · 2019-11-14T00:58:00

Try to get all duplicates for specific eval_id and then to join house_id:

SQL

SELECT t1.*, t2.house_id 
FROM 
(
    SELECT eval_id, ClientCity, ClientAddr, Houseregion
    FROM T_EE_EVALUATIONS 
    GROUP BY eval_id, ClientCity, ClientAddr, Houseregion
    HAVING COUNT(*)>1
) AS t1 INNER JOIN T_EE_EVALUATIONS AS t2 ON t1.eval_id = t2.eval_id AND t1.ClientCity = t2.ClientCity AND t1.ClientAddr = t2.ClientAddr AND  t1.Houseregion = t2.Houseregion