Filling empty column values of a dataframe based on dictionary and a list in Python

Question

I have two dictionaries and a list that hold the following information:

dic_merged={'A': [['K', 'J'], 2.0],
 'B': [nan, nan],
 'C': [['Y'], 1.0],
 'D': [['B', 'C'], 2.0],
 'J': [nan, nan],
 'K': [nan, nan],
 'G': [['A', 'H'], 2.0],
 'Y': [['Z'], 1.0],
 'H': [nan, nan],
 'Z': [['G'], 1.0]}

In this dictionary each key is an id, and each value is a pair: the list of that id's children and the number of children in that list. The following list gives the ids that are new:

new_list=['J', 'G', 'Y', 'Z']

I also have a dictionary that gives the type for each (child, id) pair:

lookup_dict={('B', 'D'): 'AA',
 ('C', 'D'): 'BB',
 ('K', 'A'): 'AA',
 ('J', 'A'): 'BB',
 ('A', 'G'): 'AA',
 ('Z', 'Y'): 'AA',
 ('Y', 'C'): 'AA',
 ('H', 'G'): 'BB',
 ('G', 'Z'): 'BB'}

This is my dataframe; I want to fill its NaN values using new_list and lookup_dict:

Python

import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'id': ['A', 'B', 'C', 'D', 'J', 'K', 'Z', 'Y', 'H', 'G'],
    'test1': [10, 9, 8, 7, np.nan, 6, np.nan, np.nan, 5, np.nan],
    'test2': [1, 2, 3, 4, np.nan, 5, np.nan, np.nan, 6, np.nan],
})

The rows I would like to fill are those whose id is in new_list. The conditions for filling them are:

For each id in new_list:

1. If it has no children (NaN), assign zero to test1 and test2.

2. If the id has one child and that child is not new (not in new_list), then
-- If the type is AA, fill test1 and test2 with the test1 value of the child.
-- If the type is BB, fill test1 and test2 with the test2 value of the child.

3. If the id has more than one child (e.g. ['A', 'H'] for id G) and none of them is new, then
-- If all the children have type AA, fill test1 and test2 with the max of the test1 values of all the children (i.e. both A and H from [A, H]).
-- If all the children have type BB, fill test1 and test2 with the max of the test2 values of all the children.
-- If the children have both types AA and BB, fill test1 of the id with the max of the test1 values of the children and fill test2 of the id with the max of the test2 values of the children.

4. If the id has a child that is new, first apply processes 1, 2 and 3 to that child, then apply processes 1, 2 and 3 to the id itself, since each id depends on its children's values.

So far, I have managed to implement processes 1, 2 and 3, but I do not know how to handle process 4 or how to integrate it into my code. The correct output should be:

Python

correct_output= pd.DataFrame({ 'id':['A', 'B', 'C','D','J','K','Z','Y','H','G'], 
                    'test1':[10, 9, 8,7,0,6,6,6,5,10],
                   'test2':[1, 2, 3,4,0,5,6,6,6,6],
})

So far I have only been able to get this result:

Python

half_correct= pd.DataFrame({ 'id':['A', 'B', 'C','D','J','K','Z','Y','H','G'], 
                    'test1':[10, 9, 8,7,0,6,np.nan,np.nan,5,10],
                   'test2':[1, 2, 3,4,0,5,np.nan,np.nan,6,6],
})

The rows are updated as the conditions are processed: for each id in new_list, if it has no children (NaN), df2 is updated directly. However, the loop may reach an id before its children have been filled, so the algorithm should check whether the id has a child in new_list and, if it does, fill that child first and then come back to the id in question. This is just a small part of a much bigger dataframe. I think recursion may help, but I couldn't figure out how to apply it. Any help would be appreciated.
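
The dependency described here amounts to visiting each new id only after any of its children that are also new. A minimal sketch of that ordering follows, assuming the child relationships contain no cycles. The helper name fill_order is hypothetical and not part of the original code.

```python
import math

nan = math.nan

dic_merged = {'A': [['K', 'J'], 2.0], 'B': [nan, nan], 'C': [['Y'], 1.0],
              'D': [['B', 'C'], 2.0], 'J': [nan, nan], 'K': [nan, nan],
              'G': [['A', 'H'], 2.0], 'Y': [['Z'], 1.0], 'H': [nan, nan],
              'Z': [['G'], 1.0]}
new_list = ['J', 'G', 'Y', 'Z']


def fill_order(new_ids, children_of):
    """Return the new ids ordered so that new children come before their parents."""
    new_set = set(new_ids)
    order, seen = [], set()

    def visit(node):
        # Ids that are not new already have values, so they need no ordering.
        if node in seen or node not in new_set:
            return
        seen.add(node)
        kids = children_of[node][0]
        if isinstance(kids, list):  # NaN (a float) means "no children"
            for kid in kids:
                visit(kid)
        order.append(node)  # appended only after all of its children

    for node in new_ids:
        visit(node)
    return order


print(fill_order(new_list, dic_merged))  # ['J', 'G', 'Z', 'Y']
```

With this order, Z is filled before Y even though Y appears first in new_list.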

What I have tried:

Python

#get values into dictionary hoping for simplicity
dic_df2=dict([(i,[a,b]) for i,a,b in zip(df2['id'], df2['test1'],df2['test2'])])
act_aa={}
act_bb={}
def test_newcase(i):
    if str(dic_merged[i][0])=='nan':
        df2.loc[df2['id'] == i, ['test1','test2']] = 0
    elif any(x not in new_list for x in dic_merged[i][0]):
        if dic_merged[i][1]==1.0:
            for k in dic_merged[i][0]:
                if lookup_dict[(k, i)]=='AA':
                    df2.loc[df2['id'] == i, ['test1','test2']] = dic_df2[k][0]
                else:
                    df2.loc[df2['id'] == i, ['test1','test2']] = dic_df2[k][1]
        else:
            for k in dic_merged[i][0]:
                #print(i)
                if lookup_dict[(k, i)]=='AA':
                    act_aa[k]=dic_df2[k][0]
                else:
                    act_bb[k]=dic_df2[k][1]
            if act_aa and act_bb:
                df2.loc[df2['id'] == i, ['test1']] = max(act_aa.values())
                df2.loc[df2['id'] == i, ['test2']] = max(act_bb.values())
            elif act_aa and not act_bb:
                df2.loc[df2['id'] == i, ['test1','test2']] = max(act_aa.values())
            elif act_bb and not act_aa:
                df2.loc[df2['id'] == i, ['test1','test2']] = max(act_bb.values())
    elif any(x in new_list for x in dic_merged[i][0]):
        for k in dic_merged[i][0]:
            if k in new_list:
                test_newcase(k)
                
for i in new_list:
    test_newcase(i)
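
One possible way to integrate process 4 into the attempt above is to resolve any new children first and only then apply processes 1 to 3 to the id itself, reading child values from a dictionary that is updated as ids are filled. The following is a sketch under assumptions rather than a definitive solution: the names values, pending and resolve are hypothetical, the child graph is assumed to have no cycles, and the mixed AA/BB case follows the code above (test1 from the AA children, test2 from the BB children).

```python
import numpy as np
import pandas as pd

nan = np.nan
dic_merged = {'A': [['K', 'J'], 2.0], 'B': [nan, nan], 'C': [['Y'], 1.0],
              'D': [['B', 'C'], 2.0], 'J': [nan, nan], 'K': [nan, nan],
              'G': [['A', 'H'], 2.0], 'Y': [['Z'], 1.0], 'H': [nan, nan],
              'Z': [['G'], 1.0]}
new_list = ['J', 'G', 'Y', 'Z']
lookup_dict = {('B', 'D'): 'AA', ('C', 'D'): 'BB', ('K', 'A'): 'AA',
               ('J', 'A'): 'BB', ('A', 'G'): 'AA', ('Z', 'Y'): 'AA',
               ('Y', 'C'): 'AA', ('H', 'G'): 'BB', ('G', 'Z'): 'BB'}
df2 = pd.DataFrame({
    'id': ['A', 'B', 'C', 'D', 'J', 'K', 'Z', 'Y', 'H', 'G'],
    'test1': [10, 9, 8, 7, nan, 6, nan, nan, 5, nan],
    'test2': [1, 2, 3, 4, nan, 5, nan, nan, 6, nan],
})

# id -> [test1, test2]; updated in place so later lookups see filled values
values = {i: [a, b] for i, a, b in zip(df2['id'], df2['test1'], df2['test2'])}
pending = set(new_list)  # new ids that have not been filled yet


def resolve(node):
    """Fill values[node], first resolving any children that are still pending."""
    if node not in pending:
        return
    pending.discard(node)  # assumption: no cycles in the child graph
    children = dic_merged[node][0]
    if not isinstance(children, list):  # process 1: NaN means no children
        values[node] = [0, 0]
        return
    for child in children:  # process 4: fill new children first
        resolve(child)
    # processes 2 and 3: collect test1 of AA children and test2 of BB children
    aa = [values[c][0] for c in children if lookup_dict[(c, node)] == 'AA']
    bb = [values[c][1] for c in children if lookup_dict[(c, node)] == 'BB']
    if aa and bb:
        values[node] = [max(aa), max(bb)]
    else:
        best = max(aa or bb)
        values[node] = [best, best]


for i in new_list:
    resolve(i)

# Write the results back into a copy of the dataframe
filled = df2.copy()
filled['test1'] = filled['id'].map(lambda i: values[i][0])
filled['test2'] = filled['id'].map(lambda i: values[i][1])
print(filled)
```

On the sample data this gives the same test1 and test2 values as correct_output. For very deep child chains Python's recursion limit could become a concern, in which case an explicit stack would be an alternative.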

Posted 16-Oct-22 8:48am

Member 15753358


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)