I have two dataframes. The smaller one looks like this:
Compounds
0	a-viSvasan
1	SaSAfka-vakwre
2	ni-veSanam
3	SAswra-niwyAH
4	mArga-AyAsasya


This has about 10,000 rows. The bigger dataframe, on the other hand, looks like this:
<blockquote class="quote"><div class="op">Quote:</div>File Name	Source/Story	Raw Sentence	Components	Compound Word	Tag	Position	Clean Context	Total No Of Compounds	WX_Compounds
0	aBI_samasa.txt	वीरः अभिमन्युकुमारः	वीरः <अभिमन्यु-कुमारः>K1	अभिमन्यु-कुमारः	?	K1	1	वीरः अभिमन्यु-कुमारः	1.0	aBimanyu-kumAraH
1	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<इन्द्रप्रस्थ-नगर्यां>T6 पाण्डवाः राज्यं परि...	इन्द्रप्रस्थ-नगर्यां	?	T6	0	इन्द्रप्रस्थ-नगर्यां पाण्डवाः राज्यं परिपालय...	1.0	inxraprasWa-nagaryAM
2	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<सदा-आचारिणः>Bs6 <न्याय-प्रियाः>Bs6 <सत्य-व्...	सदा-आचारिणः	?	Bs6	0	सदा-आचारिणः न्याय-प्रियाः सत्य-व्रतिनः पराक्...	4.0	saxA-AcAriNaH
3	aBI_samasa.txt	वीरः अभिमन्युकुमारः	<सदा-आचारिणः>Bs6 <न्याय-प्रियाः>Bs6 <सत्य-व्...	न्याय-प्रियाः	?	Bs6	1	सदा-आचारिणः न्याय-प्रियाः सत्य-व्रतिनः पराक्...	4.0	nyAya-priyAH</blockquote>


This dataframe has 11 columns. I need to find which words in the smaller dataframe match words in the WX_Compounds column of the bigger dataframe, and return the row index of each match. I then want to extract all the information from each matching row and place it, in the same format, in a third dataframe alongside the word from the smaller dataframe. What would be a good way to go about this?
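A minimal sketch of the matching step, assuming pandas and assuming the column names `Compounds` and `WX_Compounds` as shown in the samples. The variable names `small` and `big` and the cut-down sample data are hypothetical stand-ins for the real dataframes. `Series.isin` builds a boolean mask, and the row labels of the matches can be read from it:

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes (only a few columns shown).
small = pd.DataFrame({"Compounds": ["a-viSvasan", "saxA-AcAriNaH", "nyAya-priyAH"]})
big = pd.DataFrame({
    "File Name": ["aBI_samasa.txt"] * 4,
    "Tag": ["K1", "T6", "Bs6", "Bs6"],
    "WX_Compounds": ["aBimanyu-kumAraH", "inxraprasWa-nagaryAM",
                     "saxA-AcAriNaH", "nyAya-priyAH"],
})

# True for every row of `big` whose WX_Compounds value occurs in `small`.
mask = big["WX_Compounds"].isin(small["Compounds"])

# Row labels of the matching rows in `big`.
matching_index = big.index[mask].tolist()

# The full matching rows; .loc selects by label, .iloc would select by position.
matched_rows = big.loc[matching_index]
print(matching_index)
print(matched_rows)
```

This is an exact string comparison, so stray whitespace or case differences between the two columns would prevent a match.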

What I have tried:

I cannot figure out where to start with the matching. If I could get the row indices of the matches, I could probably use df.iloc[] to extract the information from those rows, but I do not know how to write it all out in the right format. Any help would be appreciated.
As an example, if the first line from the two columns matched, the third dataframe should contain a row for it with these columns:

Word from DF1 | File Name | Source/Story | Raw Sentence | Components | Compound Word | Tag | Position | Clean Context | Total No Of Compounds | WX_Compounds

Basically, the entire format of the bigger dataframe, only preceded by the word from DF1.
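One possible way to get that layout in a single step is an inner merge, sketched here under the same assumptions (pandas, columns named `Compounds` and `WX_Compounds`). The names `small`, `big` and `result` and the reduced sample data are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes (only a few columns shown).
small = pd.DataFrame({"Compounds": ["a-viSvasan", "saxA-AcAriNaH", "nyAya-priyAH"]})
big = pd.DataFrame({
    "File Name": ["aBI_samasa.txt"] * 4,
    "Tag": ["K1", "T6", "Bs6", "Bs6"],
    "WX_Compounds": ["aBimanyu-kumAraH", "inxraprasWa-nagaryAM",
                     "saxA-AcAriNaH", "nyAya-priyAH"],
})

# Inner merge keeps only the words that occur in both dataframes.
# The left-hand columns come first, so the word from the smaller
# dataframe precedes all columns of the bigger one.
result = small.merge(big, left_on="Compounds", right_on="WX_Compounds", how="inner")

# Rename the leading column to match the desired header.
result = result.rename(columns={"Compounds": "Word from DF1"})
print(result)
```

The merged result gets a fresh index, so if the original row numbers of the bigger dataframe are needed, they would have to be copied into a column (for example with `big.reset_index()`) before merging. If a word occurs several times in `WX_Compounds`, each occurrence produces its own output row.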

Any help will be appreciated!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


