A sample of the dataset I am working on:
import pandas as pd

# List of tuples: id, three genre scores, three genre values
matrix = [(1, 0.3, 0, 0.7, 30, 0, 50),
          (2, 0.4, 0.4, 0.3, 20, 50, 30),
          (3, 0.5, 0.2, 0.3, 30, 20, 30),
          (4, 0.7, 0, 0.3, 100, 0, 40),
          (5, 0.2, 0.4, 0.4, 100, 30, 80)
          ]
# Create a DataFrame
df = pd.DataFrame(matrix, columns=["id", "terror", "drama", "action", "val_terror", "val_drama", "val_action"])




My goal is to create a column naming the genre with the highest score for each customer, considering at first only the three score columns (terror, drama, action). For example, in the first row action wins (0.7).

But in the second row there is a tie: terror and drama are both 0.4. With idxmax(axis=1), pandas returns the first column it finds holding the maximum, which here would be terror. When there is a tie, I would like to get the tied genre with the highest value in its val_ column instead. In the second row that would be drama, because val_drama (50) is larger than val_terror (20).
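To make the tie behaviour concrete, the sketch below rebuilds the sample data and compares `max(axis=1)` (which returns the maximum value, not a column name), `idxmax(axis=1)` (which returns the label of the first column holding the maximum) and a count of how many genres share each row's maximum. The name `scores` is just an illustrative label for the three score columns.

```python
import pandas as pd

# Sample data from the question
matrix = [(1, 0.3, 0, 0.7, 30, 0, 50),
          (2, 0.4, 0.4, 0.3, 20, 50, 30),
          (3, 0.5, 0.2, 0.3, 30, 20, 30),
          (4, 0.7, 0, 0.3, 100, 0, 40),
          (5, 0.2, 0.4, 0.4, 100, 30, 80)]
df = pd.DataFrame(matrix, columns=["id", "terror", "drama", "action",
                                   "val_terror", "val_drama", "val_action"])

scores = df[["terror", "drama", "action"]]

# max(axis=1): the largest score in each row (a number, not a label)
row_max = scores.max(axis=1)
# idxmax(axis=1): the column label of the FIRST occurrence of that maximum
first_winner = scores.idxmax(axis=1)
# How many genres share the row maximum; a value > 1 means a tie
n_tied = scores.eq(row_max, axis=0).sum(axis=1)

print(pd.DataFrame({"id": df["id"], "max": row_max,
                    "idxmax": first_winner, "n_tied": n_tied}))
```

Rows 2 and 5 are the tied rows; for both, `idxmax` silently reports whichever tied column comes first.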

What I have tried:

The first two lines after the import create two subsets, just for easier referencing. Then I use numpy.where to check whether a row's maximum occurs more than once. If it does, I pick the genre from the three val_ columns instead.
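import numpy as np
# Subsets for easier referencing: genre scores and genre values
ss1 = df[["terror", "drama", "action"]]
ss2 = df[["val_terror", "val_drama", "val_action"]]
# If the row maximum of ss1 appears more than once (a tie), take the column
# with the largest val_* value; otherwise take the column with the largest score
df["largest_score"] = np.where(ss1.eq(ss1.max(axis=1), axis=0).sum(axis=1) > 1,
                               ss2.idxmax(axis=1).str.replace("val_", ""),
                               ss1.idxmax(axis=1))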
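Running the attempt above, unchanged, on the sample data reproduces the issue. This self-contained sketch only rebuilds the DataFrame and prints the result:

```python
import numpy as np
import pandas as pd

# Sample data from the question
matrix = [(1, 0.3, 0, 0.7, 30, 0, 50),
          (2, 0.4, 0.4, 0.3, 20, 50, 30),
          (3, 0.5, 0.2, 0.3, 30, 20, 30),
          (4, 0.7, 0, 0.3, 100, 0, 40),
          (5, 0.2, 0.4, 0.4, 100, 30, 80)]
df = pd.DataFrame(matrix, columns=["id", "terror", "drama", "action",
                                   "val_terror", "val_drama", "val_action"])

ss1 = df[["terror", "drama", "action"]]
ss2 = df[["val_terror", "val_drama", "val_action"]]

# The attempt from the question: on a tie, idxmax runs over ALL of ss2
df["largest_score"] = np.where(ss1.eq(ss1.max(axis=1), axis=0).sum(axis=1) > 1,
                               ss2.idxmax(axis=1).str.replace("val_", ""),
                               ss1.idxmax(axis=1))
print(df[["id", "largest_score"]])
```

Row 2 comes out as drama, as intended, but row 5 comes out as terror even though its tie is between drama and action.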


But there is a problem: when there is a tie, the second argument of np.where compares all of ss2, not just the tied genres. For example, for id = 5 the tie is between drama and action (0.4 each), but val_terror (100) is larger than both val_drama (30) and val_action (80), so the code above puts 'terror' in largest_score even though terror is not one of the tied genres. Is there any way to compare only the values that are tied? Thanks for the answer above.
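One possible approach, offered as a sketch rather than the definitive answer: hide every val_ value whose genre is not at the row maximum, then take idxmax over what is left. Renaming the val_ columns to the bare genre names (an assumption of this sketch) lets `where` align the mask by column label. Because idxmax skips NaN, rows without a tie simply return their single winner, so np.where is no longer needed. If the val_ values of the tied genres are themselves equal, idxmax again returns the first of them.

```python
import pandas as pd

# Sample data from the question
matrix = [(1, 0.3, 0, 0.7, 30, 0, 50),
          (2, 0.4, 0.4, 0.3, 20, 50, 30),
          (3, 0.5, 0.2, 0.3, 30, 20, 30),
          (4, 0.7, 0, 0.3, 100, 0, 40),
          (5, 0.2, 0.4, 0.4, 100, 30, 80)]
df = pd.DataFrame(matrix, columns=["id", "terror", "drama", "action",
                                   "val_terror", "val_drama", "val_action"])

scores = df[["terror", "drama", "action"]]
# Rename val_* columns to the genre names so they line up with `scores`
values = df[["val_terror", "val_drama", "val_action"]].rename(
    columns=lambda c: c.replace("val_", ""))

# True where a genre's score equals the row maximum (one or more per row)
is_max = scores.eq(scores.max(axis=1), axis=0)

# Keep val_* only for the maximal genres (others become NaN); idxmax skips
# NaN, so ties are broken using only the tied genres' values
df["largest_score"] = values.where(is_max).idxmax(axis=1)
print(df[["id", "largest_score"]])
```

With the sample data, row 5 now resolves to action (80 beats 30) instead of terror, while the other rows are unchanged.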

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


