Compare two pandas dataframes of different lengths and find the row index of the matches?

Question

I have two dataframes. The smaller one looks like this:

Compounds
0	a-viSvasan
1	SaSAfka-vakwre
2	ni-veSanam
3	SAswra-niwyAH
4	mArga-AyAsasya

This one has about 10,000 rows. The bigger dataframe, on the other hand, looks like this:

File Name	Source/Story	Raw Sentence	Components	Compound Word	Tag	Position	Clean Context	Total No Of Compounds	WX_Compounds
0	aBI_samasa.txt	वीरः अभिमन्युकुमारः	वीरः <अभिमन्यु-कुमारः>K1	अभिमन्यु-कुमारः	?	K1	1	वीरः अभिमन्यु-कुमारः	1.0	aBimanyu-kumAraH
1	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<इन्द्रप्रस्थ-नगर्यां>T6 पाण्डवाः राज्यं परि...	इन्द्रप्रस्थ-नगर्यां	?	T6	0	इन्द्रप्रस्थ-नगर्यां पाण्डवाः राज्यं परिपालय...	1.0	inxraprasWa-nagaryAM
2	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<सदा-आचारिणः>Bs6 <न्याय-प्रियाः>Bs6 <सत्य-व्...	सदा-आचारिणः	?	Bs6	0	सदा-आचारिणः न्याय-प्रियाः सत्य-व्रतिनः पराक्...	4.0	saxA-AcAriNaH
3	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<सदा-आचारिणः>Bs6 <न्याय-प्रियाः>Bs6 <सत्य-व्...	न्याय-प्रियाः	?	Bs6	1	सदा-आचारिणः न्याय-प्रियाः सत्य-व्रतिनः पराक्...	4.0	nyAya-priyAH

This dataframe has 11 columns. I need to find which words in the smaller dataframe match words in the WX_Compounds column of the bigger dataframe and get the row indices of those matches. Then I want to extract all the information from each matching row and write it, in the same format, into a third dataframe together with the matching word from the smaller dataframe. What would be a good way to go about this?
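One possible starting point for the matching, sketched under assumptions: `Series.isin` builds a boolean mask over the bigger dataframe's WX_Compounds column, and filtering the dataframe's index with that mask gives the row labels of the matches. The frames `small` and `big` below are hypothetical toy stand-ins with only a few of the real columns. The words "nyAya-priyAH" and "saxA-AcAriNaH" are added to the smaller frame purely so that something matches, since none of the sample words shown above occur in the sample rows of the bigger frame.

```python
import pandas as pd

# Hypothetical toy data: only some of the real columns are reproduced, and the
# last two words in `small` are added so that the example has matches.
small = pd.DataFrame({"Compounds": ["a-viSvasan", "nyAya-priyAH", "saxA-AcAriNaH"]})
big = pd.DataFrame({
    "File Name": ["aBI_samasa.txt"] * 4,
    "Tag": ["K1", "T6", "Bs6", "Bs6"],
    "WX_Compounds": ["aBimanyu-kumAraH", "inxraprasWa-nagaryAM",
                     "saxA-AcAriNaH", "nyAya-priyAH"],
})

# Boolean mask: True where the bigger frame's WX_Compounds value occurs
# anywhere in the smaller frame's Compounds column.
mask = big["WX_Compounds"].isin(small["Compounds"])

# Row index labels of the matching rows in the bigger frame.
match_idx = big.index[mask.to_numpy()].tolist()
print(match_idx)  # [2, 3]
```

`isin` compares whole cell values, so this only finds exact matches; stray whitespace or different transliteration would make otherwise equal words miss.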

What I have tried:

I can't figure out where to start with the matching. If I could get the row indices of the matches, I could probably use df.iloc[] to extract the information from those rows, but I don't know how to write it all out in the right format.
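A note on `df.iloc[]`, as a hedged sketch: the index returned by an `isin` mask holds index labels, while `iloc` selects by integer position. With the default 0..n-1 index the two coincide, but `df.loc[]` (or the boolean mask itself) is the safer choice if the index was ever changed. The labels 10 to 13 below are hypothetical, chosen only to show the difference, and the set `wanted` is a hypothetical stand-in for the smaller dataframe's Compounds column. `Index.get_indexer` converts labels to positions when `iloc` is really wanted.

```python
import pandas as pd

# Hypothetical frame with a non-default index, so labels != positions.
big = pd.DataFrame(
    {"Tag": ["K1", "T6", "Bs6", "Bs6"],
     "WX_Compounds": ["aBimanyu-kumAraH", "inxraprasWa-nagaryAM",
                      "saxA-AcAriNaH", "nyAya-priyAH"]},
    index=[10, 11, 12, 13],
)

# Stand-in for the smaller frame's Compounds column (hypothetical).
wanted = {"nyAya-priyAH", "saxA-AcAriNaH"}
mask = big["WX_Compounds"].isin(wanted)
match_idx = big.index[mask.to_numpy()]  # labels 12 and 13

by_label = big.loc[match_idx]                          # select by label
by_pos = big.iloc[big.index.get_indexer(match_idx)]    # labels -> positions 2, 3
by_mask = big[mask]                                    # skip the index step

print(by_label.equals(by_pos), by_label.equals(by_mask))  # True True
```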
For example, if the first row matched, the third dataframe should contain a row in this format:

Word from DF1	File Name	Source/Story	Raw Sentence	Components	Compound Word	Tag	Position	Clean Context	Total No Of Compounds	WX_Compounds

In other words, the same columns as the bigger dataframe, preceded by the matching word from DF1.
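For building the third dataframe in one step, a join on the word itself may be simpler than collecting row indices first. The sketch below uses `DataFrame.merge` with `left_on`/`right_on` and `how="inner"`, so only words present in both frames survive. The frames `small` and `big` are hypothetical toy stand-ins with only some of the real columns (two matching words are added to `small` so the result is not empty), and "Word from DF1" is simply the column name requested above. Calling `reset_index()` on the bigger frame first keeps its original row index as an extra column named `index`.

```python
import pandas as pd

# Hypothetical toy data with only a few of the real columns.
small = pd.DataFrame({"Compounds": ["a-viSvasan", "nyAya-priyAH", "saxA-AcAriNaH"]})
big = pd.DataFrame({
    "File Name": ["aBI_samasa.txt"] * 4,
    "Tag": ["K1", "T6", "Bs6", "Bs6"],
    "WX_Compounds": ["aBimanyu-kumAraH", "inxraprasWa-nagaryAM",
                     "saxA-AcAriNaH", "nyAya-priyAH"],
})

# Inner join on Compounds == WX_Compounds. The left frame's column comes
# first in the result, followed by all of the bigger frame's columns.
third = small.rename(columns={"Compounds": "Word from DF1"}).merge(
    big.reset_index(),          # original row index becomes column "index"
    left_on="Word from DF1",
    right_on="WX_Compounds",
    how="inner",
)
print(third)
```

If a word occurs in several rows of the bigger dataframe, the merge produces one output row per occurrence. `third.drop(columns="index")` removes the index column if the original row numbers are not needed.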

Any help will be appreciated!

Posted 3-Jun-21 7:33am

adideva98

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)