Click here to Skip to main content
15,917,565 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hi - I have a requirement to match two different columns and delete the matching records.

say in column A, we have some 60000 records(contains duplicate) and in column B we have 900 records(unique).

Here i need to match(not row by row) 900 records (Col B) with that of 60000 records(col A) to know the matching counts (including duplicate records) in column A.

Could you please help me on this?

What I have tried:

i tried with some vlookup ,it doesnt work out
Posted
Updated 27-Mar-18 9:11am
Comments
RedDk 27-Mar-18 14:07pm    
Ok, here's a tip from a MSOffice adherent ... Excel is a spreadsheet application and Access is a database application. Because there are similarities between the two, mainly the external appearance of "cell" or "range" to "rowset" or "column"/"row", a new user becomes quickly confused.

If you are in need of an editing facility that will do calculations with digital input ... use Excel. If you are in need of relational manipulations based upon fields and rows ... use Access.

All this is clear in the MS Office documentation. The fact that you are mentioning "record" leads me to think Access would be of more use in this case.
GJSS 27-Mar-18 14:41pm    
Can you assist me in using Access?
RedDk 27-Mar-18 15:33pm    
Ok. Start by opening Access and attempting to use it to open your spreadsheet. This is a one-way process, by the way. There's no opening an Access database using Excel.
GJSS 5-Apr-18 11:17am    
Hi - I have accomplished this task by using the subquery running in SQL view in access DB.
Thanks for your assistance.

This is the typical situation where Excel is just not the right tool.
Your problem is typical for a database like Access or any other database engine.

It is impossible to help you without exact specification of your problem and of the data.
And it is impossible to teach you database in the space of this forum, and few directions will never be enough.

Advice: learn databases or find someone to do the job.
 
Share this answer
 
v2
Comments
Maciej Los 27-Mar-18 16:00pm    
Who downvoted your answer? Upvoted to balance unfair vote!
Patrice T 27-Mar-18 16:08pm    
Thank you
It's easy to achieve by using array formula[^]. How to: Create an array formula - Office Support[^].


I'd suggest to move unique data into - let's say - Sheet2 - column A. Then in Sheet1 (with non-unique data) - column B add the following formula (assuming that a first row is a header row):
= SUM(IF(A2=Sheet2!A$2:A$901,1,0))
Accept it by pressing combination of keys: Ctrl+Shift+Enter
Fill-down above formula till the end of data in Sheet1.

This will return the number of occurencies: 0 - none (new - unique), >=1 - exits (duplicated)

Finally, add filter to column B (higher than zero) and remove duplicated data ;)

Good luck!
 
Share this answer
 
Comments
Patrice T 27-Mar-18 15:33pm    
The problem with those formulas is that runtime grow geometrically with data size.
Maciej Los 27-Mar-18 15:39pm    
For such portion of data, a risk of unrestricted grow of RAM consumption is minimal. I worked out with much bigger data set.
;)
Patrice T 27-Mar-18 15:43pm    
For such problems, I always switch to database.
Maciej Los 27-Mar-18 15:57pm    
As to me - it depends... I'm constructively lazy, so i use that tool which is more-handy to me. ;)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900