Postgresql: define query to get all first occurring events pr group

Question

0.00/5 (No votes)

See more:

I have the following data structure:

SQL

CREATE TABLE test_table(
 id int8 PRIMARY KEY,
 patientid VARCHAR(255) NOT NULL,
 studyid VARCHAR(255) NOT NULL,
 gid VARCHAR(255) NOT NULL,
 labels int8,
 timestamp TIMESTAMP
 );

With the following data:

SQL

INSERT INTO test_table(id, patientid, studyid, gid, labels, timestamp)
VALUES (1, 1, 1, 1, 0, '2015-01-11 00:51:14'),
        (4, 1, 1, 1, 1, '2015-01-11 00:54:14'),
        (3, 1, 1, 1, 3, '2015-01-11 00:53:14'),
        (2, 1, 1, 1, 0, '2015-01-11 00:52:14'),
        (5, 1, 1, 1, 0, '2015-01-11 00:55:14'),
        (6, 1, 2, 2, 1, '2015-01-12 08:10:14'),
        (7, 1, 2, 2, 0, '2015-01-12 08:11:14'),
        (8, 1, 2, 2, 1, '2015-01-12 08:13:25'),
        (9, 2, 1, 3, 0, '2015-01-12 09:14:25'),
        (10, 2, 1, 3, 0, '2015-01-12 09:15:25'),
        (11, 2, 1, 3, 2, '2015-01-12 09:17:25'),
        (12, 2, 1, 3, 6, '2015-01-12 09:18:25'),
        (13, 3, 1, 4, 0, '2015-01-13 07:14:25'),
        (14, 3, 1, 4, 0, '2015-01-13 07:15:25'),
        (15, 3, 1, 4, 1, '2015-01-13 07:17:25'),
        (16, 3, 1, 4, 0, '2015-01-13 07:18:25'),
        (17, 3, 1, 4, 0, '2015-01-13 07:18:25'),
        (18, 4, 1, 5, 0, '2015-01-13 03:14:25'),
        (19, 4, 1, 5, 0, '2015-01-13 03:15:25'),
        (20, 4, 1, 5, 0, '2015-01-13 03:17:25'),
        (21, 4, 1, 5, 1, '2015-01-13 03:18:25'),
        (22, 4, 2, 6, 0, '2015-01-13 03:12:13'),
        (23, 4, 2, 6, 1, '2015-01-13 03:14:07'),
        (24, 4, 2, 6, 1, '2015-01-13 03:17:05'),
        (25, 4, 2, 6, 0, '2015-01-13 03:19:42'),
        (26, 4, 2, 6, 0, '2015-01-13 03:22:51')
;

I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid.

The desired output contains the rows with the ids:

[1, 2, 7, 9, 10, 13, 14, 18, 19, 20, 22]

What I have tried:

SQL

select rr.* from (
select tt.*,
       -- rank by group and sort
       rank() over (partition by patientid, studyid order by timestamp) rank
from test_table tt where labels = 0) rr
where rr.rank <= 20;

20 is just an arbitrary number, which is great enough to capture enough possible rows with labels=0.
I hope someone can help me out

Posted 3-May-22 20:17pm

koter kong

Updated 4-May-22 0:39am

Add a Solution

Comments

Richard Deeming 4-May-22 3:51am

Help with what? What's the problem with the code you've tried?

And if you only want the first row, why rank <= 20? If there are any ties for first place, they will all get a rank of 1.

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CHill60 · Answer 1 · 2022-05-04T00:39:00

The only thing wrong with your code is testing for rank <= 20 instead of rank = 1 (credit @richard-deeming)

Your expected results are wrong.

Quote:
I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid

Row ID 2 is the second row where labels = 0 for Patient Id 1, Study Id 1.
Similarly Row 10 is the second row for Patient Id 2, Study Id 1,
Row 14 second for Patient Id 3, Study Id 1
Row 19 second row for Patient Id 4, Study Id 1
Row 20 third row for Patient Id 4, Study Id 1