I have two dictionaries and a list that hold the following information:
from numpy import nan
dic_merged={'A': [['K', 'J'], 2.0],
 'B': [nan, nan],
 'C': [['Y'], 1.0],
 'D': [['B', 'C'], 2.0],
 'J': [nan, nan],
 'K': [nan, nan],
 'G': [['A', 'H'], 2.0],
 'Y': [['Z'], 1.0],
 'H': [nan, nan],
 'Z': [['G'], 1.0]}

The keys of this dictionary are ids, and each value is a list holding the id's children and the number of children. The following list gives the ids that are new:
new_list=['J', 'G', 'Y', 'Z']

I also have a dictionary that maps each (child, id) pair to its type:
lookup_dict={('B', 'D'): 'AA',
 ('C', 'D'): 'BB',
 ('K', 'A'): 'AA',
 ('J', 'A'): 'BB',
 ('A', 'G'): 'AA',
 ('Z', 'Y'): 'AA',
 ('Y', 'C'): 'AA',
 ('H', 'G'): 'BB',
 ('G', 'Z'): 'BB'}

I have a dataframe in which I want to fill the 'NaN' values using 'new_list' and lookup_dict:
Python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'id': ['A', 'B', 'C', 'D', 'J', 'K', 'Z', 'Y', 'H', 'G'],
                    'test1': [10, 9, 8, 7, np.nan, 6, np.nan, np.nan, 5, np.nan],
                    'test2': [1, 2, 3, 4, np.nan, 5, np.nan, np.nan, 6, np.nan],
})

The rows I would like to fill are those whose id is in new_list. The conditions for filling them are as follows.

For each id in new_list:

-If it has no child (NaN), assign zero to test1 and test2.

-If the id has 1 child and that child is not new (not in new_list), then
--If the type is AA, fill test1 and test2 with the test1 value of the child.
--If the type is BB, fill test1 and test2 with the test2 value of the child.

-If the id has more than 1 child (e.g. ['A','H'] for id=G) and none of them is new, then
--If all the children are of type AA, fill test1 and test2 with the max of the test1 values of all the children (i.e. both A and H from [A, H]).
--If all the children are of type BB, fill test1 and test2 with the max of the test2 values of all the children.
--If the children are a mix of AA and BB, fill test1 of the id with the max of the test1 values of the children and test2 of the id with the max of the test2 values of the children.

-If the id has a child that is new, first apply conditions 1, 2 and 3 to that child, then apply them to the id itself, since each id depends on its children's values.
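
Conditions 1 to 3 can be read as one rule, because a single child is just the one-element case of taking a maximum. The following is only a sketch under that reading: the function name `fill_values` and the `(type, test1, test2)` tuple layout are hypothetical and do not come from my code, and it assumes the values of all children are already known.

```python
def fill_values(children):
    """Return (test1, test2) for one id.

    `children` is a list of (type, test1, test2) tuples, one per child,
    where type is 'AA' or 'BB'. The tuple layout is an assumption made
    for this sketch only.
    """
    if not children:                      # condition 1: no child
        return 0, 0
    types = {t for t, _, _ in children}
    max1 = max(t1 for _, t1, _ in children)
    max2 = max(t2 for _, _, t2 in children)
    if types == {'AA'}:                   # only AA: use test1 of the child(ren)
        return max1, max1
    if types == {'BB'}:                   # only BB: use test2 of the child(ren)
        return max2, max2
    return max1, max2                     # mixed: column-wise maximum


# G has children A (type AA, test1=10, test2=1) and H (type BB, test1=5, test2=6)
print(fill_values([('AA', 10, 1), ('BB', 5, 6)]))   # (10, 6)
```

The printed pair matches the values expected for id G in the correct output.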

So far, I have managed to implement conditions 1, 2 and 3, but I do not know how to handle condition 4 (how to integrate it into my code). The correct output should look like this:
Python
correct_output= pd.DataFrame({ 'id':['A', 'B', 'C','D','J','K','Z','Y','H','G'], 
                    'test1':[10, 9, 8,7,0,6,6,6,5,10],
                   'test2':[1, 2, 3,4,0,5,6,6,6,6],
})

So far I have only been able to get this result:
Python
half_correct= pd.DataFrame({ 'id':['A', 'B', 'C','D','J','K','Z','Y','H','G'], 
                    'test1':[10, 9, 8,7,0,6,np.nan,np.nan,5,10],
                   'test2':[1, 2, 3,4,0,5,np.nan,np.nan,6,6],
})

The rows are updated as the conditions are processed: if an id from 'new_list' has no child ('NaN'), 'df2' is updated with the right values and the next id in 'new_list' is examined. However, the algorithm may start from any id in 'new_list', so it should check whether the id has a child from 'new_list'; if it does, it should first fill in the values for that child and then come back to the id in question. This is just a small part of a much bigger dataframe. I think recursion may help, but I couldn't figure out how. Any help would be appreciated.
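
To illustrate the "child first, then come back" idea, here is a minimal sketch of a depth-first ordering that places every new child before its parent, whatever id the loop starts from. The name `processing_order` and the plain `children` dictionary are hypothetical, and the sketch assumes the child relations among the new ids contain no cycle.

```python
def processing_order(new_ids, children):
    """Order the new ids so that every new child comes before its parent."""
    new_set = set(new_ids)
    order, seen = [], set()

    def visit(i):
        if i in seen:                 # already placed (or being placed)
            return
        seen.add(i)
        for child in children.get(i, []):
            if child in new_set:      # only new children need filling first
                visit(child)
        order.append(i)               # all new children are placed by now

    for i in new_ids:
        visit(i)
    return order


# Children of the new ids, taken from dic_merged (empty list = no child)
children = {'J': [], 'G': ['A', 'H'], 'Y': ['Z'], 'Z': ['G']}
print(processing_order(['Y', 'J', 'G', 'Z'], children))  # ['G', 'Z', 'Y', 'J']
```

Even though the loop starts at Y here, G is placed before Z and Z before Y, which is the order in which their values become available.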

What I have tried:

Python
# Snapshot of df2 as {id: [test1, test2]}, hoping for simplicity
dic_df2 = dict([(i, [a, b]) for i, a, b in zip(df2['id'], df2['test1'], df2['test2'])])
# Values collected from children of type AA (test1) and of type BB (test2)
act_aa = {}
act_bb = {}

def test_newcase(i):
    # Condition 1: no child, so both columns become zero
    if str(dic_merged[i][0]) == 'nan':
        df2.loc[df2['id'] == i, ['test1', 'test2']] = 0
    # At least one child is not in new_list
    elif any(x not in new_list for x in dic_merged[i][0]):
        # Condition 2: exactly one child
        if dic_merged[i][1] == 1.0:
            for k in dic_merged[i][0]:
                if lookup_dict[(k, i)] == 'AA':
                    df2.loc[df2['id'] == i, ['test1', 'test2']] = dic_df2[k][0]
                else:
                    df2.loc[df2['id'] == i, ['test1', 'test2']] = dic_df2[k][1]
        # Condition 3: more than one child
        else:
            for k in dic_merged[i][0]:
                if lookup_dict[(k, i)] == 'AA':
                    act_aa[k] = dic_df2[k][0]
                else:
                    act_bb[k] = dic_df2[k][1]
            if act_aa and act_bb:
                df2.loc[df2['id'] == i, ['test1']] = max(act_aa.values())
                df2.loc[df2['id'] == i, ['test2']] = max(act_bb.values())
            elif act_aa and not act_bb:
                df2.loc[df2['id'] == i, ['test1', 'test2']] = max(act_aa.values())
            elif act_bb and not act_aa:
                df2.loc[df2['id'] == i, ['test1', 'test2']] = max(act_bb.values())
    # Every child is in new_list: recurse into each new child
    elif any(x in new_list for x in dic_merged[i][0]):
        for k in dic_merged[i][0]:
            if k in new_list:
                test_newcase(k)

for i in new_list:
    test_newcase(i)
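
One possible way to integrate condition 4 is sketched below. It is not a verified solution for the full data: it works on a plain dictionary `values` instead of `df2`, the names `fill_new`, `values` and `done` are hypothetical, it assumes there is no cycle among the new ids, and in the mixed AA/BB case it takes the maxima over all children, as the conditions are worded. The key differences from the attempt above are that new children are filled first, the id itself is filled afterwards, and child values are read from the structure that is being updated rather than from a snapshot.

```python
import math

nan = math.nan
dic_merged = {'A': [['K', 'J'], 2.0], 'B': [nan, nan], 'C': [['Y'], 1.0],
              'D': [['B', 'C'], 2.0], 'J': [nan, nan], 'K': [nan, nan],
              'G': [['A', 'H'], 2.0], 'Y': [['Z'], 1.0], 'H': [nan, nan],
              'Z': [['G'], 1.0]}
new_list = ['J', 'G', 'Y', 'Z']
lookup_dict = {('B', 'D'): 'AA', ('C', 'D'): 'BB', ('K', 'A'): 'AA',
               ('J', 'A'): 'BB', ('A', 'G'): 'AA', ('Z', 'Y'): 'AA',
               ('Y', 'C'): 'AA', ('H', 'G'): 'BB', ('G', 'Z'): 'BB'}
# id -> [test1, test2], mirroring df2; the new ids start out as NaN
values = {'A': [10, 1], 'B': [9, 2], 'C': [8, 3], 'D': [7, 4],
          'J': [nan, nan], 'K': [6, 5], 'Z': [nan, nan], 'Y': [nan, nan],
          'H': [5, 6], 'G': [nan, nan]}


def fill_new(i, done):
    """Fill values[i], filling any new children first (condition 4)."""
    if i in done:                        # already handled via another id
        return
    done.add(i)
    kids = dic_merged[i][0]
    if not isinstance(kids, list):       # NaN marks "no child" (condition 1)
        values[i] = [0, 0]
        return
    for k in kids:                       # condition 4: new children first
        if k in new_list:
            fill_new(k, done)
    # Conditions 2 and 3: read the (now up-to-date) child values
    types = {lookup_dict[(k, i)] for k in kids}
    max1 = max(values[k][0] for k in kids)
    max2 = max(values[k][1] for k in kids)
    if types == {'AA'}:
        values[i] = [max1, max1]
    elif types == {'BB'}:
        values[i] = [max2, max2]
    else:
        values[i] = [max1, max2]


done = set()
for i in new_list:
    fill_new(i, done)

print({i: values[i] for i in new_list})
# {'J': [0, 0], 'G': [10, 6], 'Y': [6, 6], 'Z': [6, 6]}
```

These values agree with the rows for J, G, Y and Z in correct_output. They could then be written back with something like `df2['test1'] = df2['id'].map(lambda i: values[i][0])`, and likewise for test2.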

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


