How do i...check for duplicate entries on mysql DB while adding new row

Question

0.00/5 (No votes)

See more:

I currently use the following SQL Stored Procedure to add patient information. The stored procedure checks if the patient already exist on the DB. It is important to understand that the DB will store millions rows.

SQL

CREATE DEFINER=`rootuser`@`%` PROCEDURE `SP_patAddDetails`(
IN pFileNo 			VARCHAR(11), 
IN pKey 			VARCHAR(1),
IN pTitle 			VARCHAR(15),
IN pFirstName 		VARCHAR(30),
IN pInitials 		VARCHAR(5),
IN pSurname 		VARCHAR(30),
IN pIDNo 			VARCHAR(15),
IN pDOB 			VARCHAR(10),
IN pGender  		VARCHAR(8),
IN pLanguage 		VARCHAR(15),
IN pOccupation 		VARCHAR(30),
IN pDependantCode 	VARCHAR(2),
IN pRelationship 	VARCHAR(15),
IN pMemNo 			VARCHAR(20)
)
BEGIN
START TRANSACTION;

INSERT INTO patDetails(patFileNo, patKey, patTitle, patFirstName, patInitials, 
                       patSurname, patIDNo, patDOB, patGender, patLanguage,
                       patOccupation, patDependantCode, patRelationship, 
                       patMemNo, patUpdated) 
/* to avoid duplicate entries during the patient add process 
   check to see if patient already exist in the patDetails table*/		
        
        SELECT * FROM (SELECT pFileNo, pKey, pTitle, pFirstName, pInitials, 
                              pSurname, pIDNo, pDOB, pGender, pLanguage, 
                              pOccupation, pDependantCode, pRelationship, 
                              pMemNo, now()) AS tmp
        
        WHERE NOT EXISTS (
			SELECT patFileNo, patKey, patTitle, patFirstName, patInitials, patSurname
			FROM   patDetails 
			WHERE  patFileNo = pFileNo
			AND    patKey = pKey
			AND    patTitle = pTitle
			AND    patFirstName = pFirstName
			AND    patInitials = pInitials
			AND    patSurname = pSurname
        ) 
        LIMIT 1;
                
/*        VALUES (pAccKey, pKey, pTitle, pFirstName, pInitials, pSurname, pIDNo,
                pDOB, pGender, pLanguage, pOccupation, pDependantCode,
                pRelatioship, pMemNo, NOW);
                SELECT LAST_INSERT_ID()
*/
COMMIT;

SELECT LAST_INSERT_ID()

COMMIT;
END

There are three things I would like to know:

1. Can this MySQL procedure be improved to achieve better performance?

2. There are more information that needs to be stored apart from what is currently displayed in the SP and it is also stored in different tables e.g: Contact Info Table, Address Info Table, Medical Aid Table, etc.
Is there a way where I can test for a "false" (no duplicate entry) response, get the LAST_INSERT_ID to use as an Index when adding the other information,

3. Is it possible to put everything into a single Stored Procedure ?

What I have tried:

Stored Procedure as shown in question.

Posted 23-Jun-16 3:12am

Tino Fourie

Updated 23-Jun-16 4:46am

Add a Solution

Comments

[no name] 23-Jun-16 9:37am

brute force: Insert it and handle error situation. This of course needs a unique key over your fields.

Tino Fourie 23-Jun-16 9:43am

I understand column constraints and their uses but unfortunately I can't use that because I have to handle inconsistencies in the information about a patient.

How does brute force improve performance ?
How do you combine all the other Insert's into a single Stored Procedure and validate there is not a duplicate entry ?

[no name] 23-Jun-16 10:03am

Maybe I do not understand all in Details ;)
What I understand from your example is, that " patFileNo, patKey, patTitle, patFirstName, patInitials, patSurname" Needs to be unique. So I would define a Unique Key for These fields and insert. Insert will anyway check then and throw an error condition if you violate this condition.... so what is the Need that I do the same check by myself?

It looks/sounds like your current DB does not allow to establish this Unique Key at the Moment because you have allready inconsitent data in it. If it is like this, I would really try to clean up this first...

Tino Fourie 23-Jun-16 10:19am

Thank you for your reply and explanation. With regards to the inconsistencies of the information, example:
I could have a records like: AAA BBB CCC DDD EEE,
then I could have information on the same patient like: AAA BB1 CCC DD2 EEE.
AAA - being the ID No (SSN / PPSN) number of the individual.
This inconsistency is because the information comes from different sources and because people don't care much about the correctness of information I could end up with a patient with no Initials but a corresponding ID No.

If I place a NN constraint on the Initials column, and because the software used to capture patient information does not check for "blank" on the Initials field, I could potentially skip this patient where in comparison I could add the information and have the user update the information.

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CHill60 · Accepted Answer · 2016-06-23T04:46:00

Solution 1

1. The way you are searching for duplicates can be improved by using a Unique index on the fields you don't want duplicated - see MySQL Handling Duplicates[^]

But as you imply, user supplied information can have inconsistencies but I don't think there is much you can do about that when inserting the data. You could identify near matches to the data that is about to be entered (e.g. see php - Searching a single MySQL text column with fuzzy matching - Stack Overflow[^]) and present that list to the User ... along the lines of "we already have someone on the system with those details, is your patient one of these people..." - I will tell you however that the Users will still mess this up (experience speaking here :-().

2. Yes - reference documentation here - MySQL :: MySQL 5.7 Reference Manual :: 25.8.15.3 How to Get the Unique ID for the Last Inserted Row[^]

3. Yes. But remember that Stored Procedures should have a single purpose and you might get bogged down with a lot of parameters

Posted 23-Jun-16 4:46am

CHill60

Comments

Tino Fourie 23-Jun-16 11:49am

CHill, thank you for your reply.

I completely agree with your comment on users. Sadly the software they are using to initially capture the patient information does not enforce data validation... will hold the software vendor responsible for poor design :P

1. It takes me roughly about 2 hours to write 42000 records (consisting of 47 fields) to a web DB over a 2mb line to make sure that even the slowest of internet connections will achieve the desired results. The 2 hours it takes to write 42000 records is against an already populated DB of 42000 records (yes, I have written to an empty DB, then write the same 42000 records to the now populated DB).

I am handling duplicates in my WinForms app instead of a DB procedure or trigger, where I use currID (current ID) and prevID (previous ID) getting the latest DB ID with the LAST_INSERT_ID() (see provided stored procedure code).

2. Already using LAST_INSERT_ID() (see provided stored procedure code). I was wondering if one could use an IF...ELSE statement to check for a new DB ID with LAST_INSERT_ID(). If a new AUTO_INCREMENT ID was generated then INSERT the rest of the data in their respective Tables.

3. Would you still advise to handle the logic inside the WinForms app instead of having a long winded Stored Procedure ?
e.g:

IF currID <> prevID then
'New patient added, store rest of patient information
'Set prevID to latest AUTO_INCREMENT ID
END IF
Next Patient

Would you also go with the "Brute Force" method as suggested earlier ?

[no name] 23-Jun-16 12:32pm

"Brute Force" is probably a bad Name I choosed. It means simply let the "SQL Engine" check, whether it allready does exists or not.... and with proper error handling this should be ok.

Anyway I'm looking Forward to the comment of CHill60 on this.

Tino Fourie 23-Jun-16 13:03pm

Oh no I totally agree with your description of a straight forward INSERT with column constraints. Let the DB decide, could be a much faster and more concrete way of doing it... instead of trying to reinvent the validation wheel. :)

CHill60 23-Jun-16 18:05pm

It's a good name! Yes - let the SQL Engine do the work ... Stored Procedures working on the server end will take the load off the 2mb line.
I'll have another look through your questions in the morning when I'm a bit more awake (it's 23:00hrs here and I really shouldn't have looked at my messages!)

Tino Fourie 23-Jun-16 17:51pm

After a more thorough look at the data, it is clear that I can not use the "Brute Force" method because the only unique identifier in the data (patIDNo) is not always completed by the user. And as mentioned before the software used to capture the data does not validate the data for completeness. For example, the patIDNo will either be blank or consist of the Date of Birth. Out of 42000 records read, 15800 records had a blank patIDNo, 14200 records had an incomplete patIDNo and 1018 records exceeded the maximum allowed digits for an ID No.

The first reply I got seemed to have been the most obvious choice IF data validation were done properly by the software used to capture the data. CHill, in a way, confirmed this by his Solution.

To both, thank you for your input and advice.