Click here to Skip to main content
15,885,546 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I have the following data structure:

SQL
CREATE TABLE test_table(
 id int8 PRIMARY KEY,
 patientid VARCHAR(255) NOT NULL,
 studyid VARCHAR(255) NOT NULL,
 gid VARCHAR(255) NOT NULL,
 labels int8,
 timestamp TIMESTAMP
 );


With the following data:

SQL
INSERT INTO test_table(id, patientid, studyid, gid, labels, timestamp)
VALUES (1, 1, 1, 1, 0, '2015-01-11 00:51:14'),
        (4, 1, 1, 1, 1, '2015-01-11 00:54:14'),
        (3, 1, 1, 1, 3, '2015-01-11 00:53:14'),
        (2, 1, 1, 1, 0, '2015-01-11 00:52:14'),
        (5, 1, 1, 1, 0, '2015-01-11 00:55:14'),
        (6, 1, 2, 2, 1, '2015-01-12 08:10:14'),
        (7, 1, 2, 2, 0, '2015-01-12 08:11:14'),
        (8, 1, 2, 2, 1, '2015-01-12 08:13:25'),
        (9, 2, 1, 3, 0, '2015-01-12 09:14:25'),
        (10, 2, 1, 3, 0, '2015-01-12 09:15:25'),
        (11, 2, 1, 3, 2, '2015-01-12 09:17:25'),
        (12, 2, 1, 3, 6, '2015-01-12 09:18:25'),
        (13, 3, 1, 4, 0, '2015-01-13 07:14:25'),
        (14, 3, 1, 4, 0, '2015-01-13 07:15:25'),
        (15, 3, 1, 4, 1, '2015-01-13 07:17:25'),
        (16, 3, 1, 4, 0, '2015-01-13 07:18:25'),
        (17, 3, 1, 4, 0, '2015-01-13 07:18:25'),
        (18, 4, 1, 5, 0, '2015-01-13 03:14:25'),
        (19, 4, 1, 5, 0, '2015-01-13 03:15:25'),
        (20, 4, 1, 5, 0, '2015-01-13 03:17:25'),
        (21, 4, 1, 5, 1, '2015-01-13 03:18:25'),
        (22, 4, 2, 6, 0, '2015-01-13 03:12:13'),
        (23, 4, 2, 6, 1, '2015-01-13 03:14:07'),
        (24, 4, 2, 6, 1, '2015-01-13 03:17:05'),
        (25, 4, 2, 6, 0, '2015-01-13 03:19:42'),
        (26, 4, 2, 6, 0, '2015-01-13 03:22:51')
;


I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid.

The desired output contains the rows with the ids:
[1, 2, 7, 9, 10, 13, 14, 18, 19, 20, 22]


What I have tried:

SQL
select rr.* from (
select tt.*,
       -- rank by group and sort
       rank() over (partition by patientid, studyid order by timestamp) rank
from test_table tt where labels = 0) rr
where rr.rank <= 20;


20 is just an arbitrary number, which is great enough to capture enough possible rows with labels=0.
I hope someone can help me out
Posted
Updated 4-May-22 0:39am
Comments
Richard Deeming 4-May-22 3:51am    
Help with what? What's the problem with the code you've tried?

And if you only want the first row, why rank <= 20? If there are any ties for first place, they will all get a rank of 1.

1 solution

The only thing wrong with your code is testing for rank <= 20 instead of rank = 1 (credit @richard-deeming)

Your expected results are wrong.
Quote:
I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid
Row ID 2 is the second row where labels = 0 for Patient Id 1, Study Id 1.
Similarly Row 10 is the second row for Patient Id 2, Study Id 1,
Row 14 second for Patient Id 3, Study Id 1
Row 19 second row for Patient Id 4, Study Id 1
Row 20 third row for Patient Id 4, Study Id 1
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900