Eliminate rows in a dataframe using queries

Question

import pandas as pd

df = pd.DataFrame({'A': [1,4,4,3,7],
                   'B': [1,2,2,6,4],
                   'C': range(10, 5, -1)})

A B C
1 1 10
4 2 9
4 2 8
3 6 7
7 4 6

This is my data frame with 5 rows and 3 columns. I have to find a way to drop rows by checking A and B.

Condition: A row is dropped if some other row has a higher value in column A and a lower value in column B than the row being checked.

Example: In this data frame, row 4 (A = 3, B = 6) has to be dropped because other rows have a higher value in column A together with a lower value in column B.

What I have tried:

I have the logic: loop through all data points and drop each row that meets the condition above. But I don't know how to implement it. Maybe with queries?

Posted 1-Jun-22 22:12pm

A v Nov2021

Updated 2-Jun-22 1:30am

Comments

Richard MacCutchan 2-Jun-22 4:23am

Your question is not clear. Please show the code you are using and explain why it is not working.

A v Nov2021 2-Jun-22 4:34am

I do not have any code. I don't know how to implement this logic in Python. I start with the first row and check all other rows for a higher A value "AND" a lower B value. If no such row exists, I don't drop the row. Here, for the first row there are higher A values, but no B value is lower than 1, so it should not be dropped. I iterate over all rows in the same way.

Done like this, row 4 is dropped because there are rows with a value higher than 3 in A and lower than 6 in B.
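The row-by-row check described above could be written as an explicit loop. This is only a minimal sketch, assuming the data frame from the question and assuming that "higher" and "lower" mean strict comparisons; `rows_to_drop` and `df2` are illustrative names, not anything from the thread.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 4, 3, 7],
                   "B": [1, 2, 2, 6, 4],
                   "C": range(10, 5, -1)})

rows_to_drop = []
for i, row in df.iterrows():
    # Rows that have a higher A and, at the same time, a lower B
    # than the row currently being checked.
    beaten_by = df[(df["A"] > row["A"]) & (df["B"] < row["B"])]
    if not beaten_by.empty:
        rows_to_drop.append(i)

# Drop the collected index labels in one call.
df2 = df.drop(index=rows_to_drop)
print(df2)  # only the row with A = 3, B = 6 (index label 3) is removed
```

A loop like this is easy to follow but compares every row with every other row, so it may be slow on large frames.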

Richard MacCutchan 2-Jun-22 4:56am

OK, so you understand the requirements, you understand what needs to be done, you have the logic to do it: what is the problem?

A v Nov2021 2-Jun-22 5:05am

I cannot find the libraries that help me do it. How do I loop through and drop?
df = pd.DataFrame({'A': [1,4,4,3,7],
'B': [1,2,2,6,4],
'C ': range(10, 5, -1)})
df2= df.drop(df[(df['A'] < df['A']) & (df['B'] > df['B'])].index)

This code does not work for me.
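A likely reason the one-liner above drops nothing: `df['A'] < df['A']` compares each value with itself, so the mask is False on every row. To compare every row against every other row without a Python loop, one option is NumPy broadcasting. This is a sketch under the same reading of the condition (strict comparisons); the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 4, 3, 7],
                   "B": [1, 2, 2, 6, 4],
                   "C": range(10, 5, -1)})

a = df["A"].to_numpy()
b = df["B"].to_numpy()

# higher_a[i, j] is True when row j has a higher A than row i.
higher_a = a[np.newaxis, :] > a[:, np.newaxis]
# lower_b[i, j] is True when row j has a lower B than row i.
lower_b = b[np.newaxis, :] < b[:, np.newaxis]

# Row i is dropped if at least one row j satisfies both conditions.
to_drop = (higher_a & lower_b).any(axis=1)
df2 = df[~to_drop]
print(df2)
```

The intermediate arrays have one entry per pair of rows, so memory use grows with the square of the row count.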

Richard MacCutchan 2-Jun-22 7:30am

See my solution below.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Answer 1 · 2022-06-02T01:30:00

Solution 1

I am not 100% certain that this solves your problem, but it does what I think it is supposed to do:

Python

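print(df)
# idxmax gives, for each column, the index label of the row holding its maximum.
indexA, indexB = df.idxmax(axis='index').values[:2]
print(F"\n{indexA = }, {indexB = }")
# Drop the row with the largest A unless it also holds the largest B.
df2 = df.drop(indexA) if indexA != indexB else df
print(F"\n{df2}")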

[edit]
Better use of pandas (still learning).
[/edit]

Posted 2-Jun-22 1:30am

Richard MacCutchan

Updated 2-Jun-22 2:45am

v2

Comments

A v Nov2021 2-Jun-22 8:53am

I will rephrase the question from an application perspective for better clarity. Sorry for my poor presentation.

Let's say this is our dataframe: df = pd.DataFrame({'resources': [100,200,300,300,400,400,400,500,1000], 'score': [1,2,1,2,3,5,6,8,9]})

I want to find a trade-off between the resources I use and my score. My priority is to get the best score with the fewest resources. I iterate over all combinations and check whether each row is eligible to be considered. So in these 9 rows (counting from 1), rows 3, 4, 5 and 6 should be eliminated: 3 because 1 gives the same score with fewer resources, 4 because 2 gives the same score with fewer resources, and 5 and 6 because 7 gives a better score with the same resources. I hope this makes my problem clearer.
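Read this way, the rule looks like a dominance (Pareto) filter: a row goes if some other row needs no more resources, scores at least as well, and is strictly better on one of the two. That reading is an interpretation of the description, not something confirmed in the thread. A minimal sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"resources": [100, 200, 300, 300, 400, 400, 400, 500, 1000],
                   "score": [1, 2, 1, 2, 3, 5, 6, 8, 9]})

res = df["resources"].to_numpy()
score = df["score"].to_numpy()

# no_worse[i, j]: row j uses no more resources and scores at least as well as row i.
no_worse = (res[None, :] <= res[:, None]) & (score[None, :] >= score[:, None])
# better[i, j]: row j is strictly better than row i on at least one of the two.
better = (res[None, :] < res[:, None]) | (score[None, :] > score[:, None])

dominated = (no_worse & better).any(axis=1)
kept = df[~dominated]
print(kept)  # index labels 0, 1, 6, 7, 8 remain
```

Counting from 1, the removed rows are 3, 4, 5 and 6, which matches the list in the comment. Exact duplicate rows would all be kept, because none of them is strictly better than the others.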

Richard MacCutchan 2-Jun-22 9:02am

Well, you will probably need to do multiple passes over the data to eliminate each row. In the first pass you need to eliminate the rows with duplicate scores. Then look for the rows with the same resources and keep the best score. Repeat until there are no more rows to delete.
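The passes suggested here could be sketched with `groupby`. This assumes that "duplicate scores" means keeping the row with the fewest resources for each score, and that the second pass keeps the best score for each resource amount; `prune_once` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"resources": [100, 200, 300, 300, 400, 400, 400, 500, 1000],
                   "score": [1, 2, 1, 2, 3, 5, 6, 8, 9]})

def prune_once(frame):
    # Pass 1: among rows sharing a score, keep the one with the fewest resources.
    frame = frame.loc[frame.groupby("score")["resources"].idxmin()]
    # Pass 2: among rows sharing a resource amount, keep the one with the best score.
    frame = frame.loc[frame.groupby("resources")["score"].idxmax()]
    return frame.sort_index()

# Repeat until a full round removes nothing.
result = df
while True:
    pruned = prune_once(result)
    if len(pruned) == len(result):
        break
    result = pruned

print(result)
```

These two passes only compare rows that share a score or share a resource amount, so a row that is worse than another on both columns at once would survive them.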

A v Nov2021 2-Jun-22 9:07am

Thank you! But your code does not eliminate anything in the first pass.

Richard MacCutchan 2-Jun-22 9:11am

Yes, because that was based on your original question which, as I said, I did not fully understand. Your latest problem description is somewhat different.

A v Nov2021 2-Jun-22 9:12am

Can you help me solve this?

Richard MacCutchan 2-Jun-22 9:21am

I'm not a pandas expert so you need to find some of the DataFrame methods that will get the results you need.