import pandas as pd

df = pd.DataFrame({'A': [1, 4, 4, 3, 7],
                   'B': [1, 2, 2, 6, 4],
                   'C': range(10, 5, -1)})


A B C
1 1 10
4 2 9
4 2 8
3 6 7
7 4 6

This is my data frame with 5 rows and 3 columns. I have to find a way to drop rows by checking A and B.

Condition: a row is dropped if some other row has a higher value in column A and a lower value in column B than the row being checked.

Example: in this data frame, row 4 (A = 3, B = 6) has to be dropped because other rows have a higher A value together with a lower B value (for instance A = 4, B = 2).

What I have tried:

I have the logic (loop through all data points and drop those that do not satisfy the condition), but I don't know how to implement it. Maybe with queries?
Updated 2-Jun-22 1:30am
Comments
Richard MacCutchan 2-Jun-22 4:23am    
Your question is not clear. Please show the code you are using and explain why it is not working.
A v Nov2021 2-Jun-22 4:34am    
I do not have any code. I don't know how to implement this logic in Python. I start with the first row and check all other rows for a higher A value AND a lower B value; if no such row exists, I don't drop that row. Here, for the first row there are higher A values, but no B value is lower than 1, so it should not be dropped. Likewise, I iterate over all rows.

Doing it this way, row 4 is dropped because there are rows with an A value higher than 3 and a B value lower than 6.
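The row-by-row check described above can be sketched as an explicit loop. This is only a minimal sketch, assuming the rule is "drop a row when some other row has a strictly higher A and a strictly lower B". The names `to_drop` and `result` are illustrative, and the third column is written as 'C' without the trailing space.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 4, 3, 7],
                   'B': [1, 2, 2, 6, 4],
                   'C': range(10, 5, -1)})

to_drop = []  # index labels of the rows that meet the drop condition
for label, row in df.iterrows():
    # Rows with a strictly higher A AND a strictly lower B than the current row.
    others = df[(df['A'] > row['A']) & (df['B'] < row['B'])]
    if not others.empty:
        to_drop.append(label)

result = df.drop(index=to_drop)
print(result)  # keeps index 0, 1, 2 and 4; index 3 (A=3, B=6) is dropped
```

A loop like this is easy to follow but compares every row with the whole frame, so it may be slow on large data.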
Richard MacCutchan 2-Jun-22 4:56am    
OK, so you understand the requirements, you understand what needs to be done, you have the logic to do it: what is the problem?
A v Nov2021 2-Jun-22 5:05am    
I cannot find the libraries that help me do it. How do I loop through and drop?
df = pd.DataFrame({'A': [1,4,4,3,7],
'B': [1,2,2,6,4],
'C ': range(10, 5, -1)})
df2= df.drop(df[(df['A'] < df['A']) & (df['B'] > df['B'])].index)

This code does not work for me.
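One likely reason the `drop` call above removes nothing: `df['A'] < df['A']` compares each value with itself, so the mask is False for every row. A possible vectorized sketch compares every row with every other row through NumPy broadcasting instead. The names `a`, `b`, `beaten` and `drop_mask` are illustrative, and the rule is assumed to be "some other row has a strictly higher A and a strictly lower B".

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 4, 3, 7],
                   'B': [1, 2, 2, 6, 4],
                   'C': range(10, 5, -1)})

a = df['A'].to_numpy()
b = df['B'].to_numpy()

# beaten[i, j] is True when row j has a higher A and a lower B than row i.
beaten = (a[None, :] > a[:, None]) & (b[None, :] < b[:, None])

# A row is dropped if at least one other row satisfies the condition.
drop_mask = beaten.any(axis=1)
df2 = df[~drop_mask]
print(df2)  # index 3 (A=3, B=6) is removed
```

The comparison matrix has one entry per pair of rows, so memory grows with the square of the row count.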
Richard MacCutchan 2-Jun-22 7:30am    
See my solution below.

1 solution

I am not 100% certain that this solves your problem, but it does what I think it is supposed to do:
Python
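print(df)
# Row labels of the maximum value in each column; keep the first two (A and B).
indexA, indexB = df.idxmax(axis='index').values[:2]
print(f"\n{indexA = }, {indexB = }")
# Drop the row holding the largest A unless it also holds the largest B.
df2 = df.drop(indexA) if indexA != indexB else df
print(f"\n{df2}")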


[edit]
Better use of pandas (still learning).
[/edit]
 
Comments
A v Nov2021 2-Jun-22 8:53am    
I will restate the question from an application perspective for better clarity. Sorry for my poor presentation skills.

Let's say this is our dataframe: df = pd.DataFrame({'resources': [100, 200, 300, 300, 400, 400, 400, 500, 1000], 'score': [1, 2, 1, 2, 3, 5, 6, 8, 9]})

I want to find a trade-off between the resources I use and my score. My priority is to get the best score with the fewest resources. I iterate over all combinations and check whether a row is eligible to be considered. So, of these 9 rows, rows 3, 4, 5 and 6 should be eliminated: row 3 because row 1 gives the same score with fewer resources, row 4 because row 2 gives the same score with fewer resources, and rows 5 and 6 because row 7 gives a better score with the same resources. I hope this makes my problem clearer.
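The elimination rule in this restated problem can be read as a dominance check: a row is removed when another row uses no more resources and scores at least as well, and is strictly better in at least one of the two. That reading is an assumption drawn from the examples given, not a confirmed specification. A minimal sketch with illustrative names (`r`, `s`, `dominated`, `kept`):

```python
import pandas as pd

df = pd.DataFrame({'resources': [100, 200, 300, 300, 400, 400, 400, 500, 1000],
                   'score': [1, 2, 1, 2, 3, 5, 6, 8, 9]})

r = df['resources'].to_numpy()
s = df['score'].to_numpy()

# no_worse[i, j]: row j needs no more resources and scores at least as well as row i.
no_worse = (r[None, :] <= r[:, None]) & (s[None, :] >= s[:, None])
# strictly_better[i, j]: row j is strictly better than row i in at least one column.
strictly_better = (r[None, :] < r[:, None]) | (s[None, :] > s[:, None])

dominated = (no_worse & strictly_better).any(axis=1)
kept = df[~dominated]
print(kept)  # keeps index 0, 1, 6, 7, 8 (rows 1, 2, 7, 8, 9 when counting from 1)
```

Under this rule, two rows that are identical in both columns do not eliminate each other, so exact duplicates would both be kept.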
Richard MacCutchan 2-Jun-22 9:02am    
Well, you will probably need to make multiple passes over the data to eliminate each row. In the first pass, eliminate the rows with duplicate scores. Then look for the rows with the same resources and keep the best score. Repeat until there are no more rows to delete.
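The passes suggested here might look like the following with `sort_values` and `drop_duplicates`. This is a sketch of one round only, using the sample data from the comment above; the names `pass1`, `pass2` and `result` are illustrative. Note the assumption: the two passes only compare rows that share a score or share a resources value, so a row that is unique in both columns but still worse than another row (more resources, lower score) would survive.

```python
import pandas as pd

df = pd.DataFrame({'resources': [100, 200, 300, 300, 400, 400, 400, 500, 1000],
                   'score': [1, 2, 1, 2, 3, 5, 6, 8, 9]})

# Pass 1: for each score, keep only the row that uses the fewest resources.
pass1 = (df.sort_values('resources', kind='stable')
           .drop_duplicates('score', keep='first'))

# Pass 2: for each resources value, keep only the row with the best score.
pass2 = (pass1.sort_values('score', ascending=False, kind='stable')
              .drop_duplicates('resources', keep='first'))

result = pass2.sort_index()  # restore the original row order
print(result)  # keeps index 0, 1, 6, 7, 8 for this data
```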
A v Nov2021 2-Jun-22 9:07am    
Thank you! But your code does not eliminate anything in the first pass
Richard MacCutchan 2-Jun-22 9:11am    
Yes, because that was based on your original question, which, as I said, I did not fully understand. Your latest problem description is somewhat different.
A v Nov2021 2-Jun-22 9:12am    
Can you help me solve this?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


