How do you generate a table with random values taken from other tables in postgresql?

Question

0.00/5 (No votes)

See more:

The following code generates 100000 rows with random values for the Id column with uuid_generate_v4(). However, the nested selects are always choosing the same row so all the inserted rows have the same values for those columns. The goal is to create a table with 100k rows with random values taken from the other sample tables. Each of the sample tables only have two columns (Id and the column from which the values are taken). How can this be archived?

What I have tried:

SQL

insert into "Tag" (

"Id", "Time", "Account", "Name", "Value", "RollUpTableId"

)

select

uuid_generate_v4(),

current_timestamp,

(select "Account" from "AccountSamples" OFFSET floor(random()*358) LIMIT 1),

(select "Name" from "TagNameSamples" OFFSET floor(random()*19) LIMIT 1),

(select "Value" from "TagValueSamples" OFFSET floor(random()*26) LIMIT 1),

uuid_generate_v4()

from generate_series(1, 100000);

I've also tried the following that was suggested to me but gives me nulls(https://codeshare.io/5QD7dq) in many rows (I've no nulls in the sample tables and they only have 358, 19 and 26 rows each in that order.

SQL

insert into "Tag" (

"Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)

select uuid_generate_v4(), current_timestamp, a."Account", tns."Name", tvs."Value", uuid_generate_v4()

from generate_series(1, 1000) x(rn)

left join (

select "Account", row_number() over(order by random()) rn from "AccountSamples"

) a on a.rn = x.rn

left join (

select "Name", row_number() over(order by random()) rn from "TagNameSamples"

) tns on tns.rn = x.rn

left join (

select "Value", row_number() over(order by random()) rn from "TagValueSamples"

) tvs on tvs.rn = x.rn

I've also tried

SQL

select "Account" from "AccountSamples" where "Id" = (trunc(random() * 358)::integer)

Posted 17-Sep-20 13:59pm

TomHunn

Updated 18-Sep-20 15:36pm

v2

Add a Solution

2 solutions

Solution 1

PostgreSQL 8.4: sampling random rows at EXPLAIN EXTENDED[^]

Posted 18-Sep-20 0:23am

Gerry Schmitz

Comments

TomHunn 18-Sep-20 21:37pm

Thanks, the solution provided by pehrs worked. However I'm taking note of your link for future reference in this matter.

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

TomHunn · Accepted Answer · 2020-09-18T15:36:00

Quote:
In your first solution, the problem is that the subselect does not have any external dependency, so it will be optimized to call random only once. You can fix this by adding an external dependency. Try something like this:

SQL
select uuid_generate_v4(),
       current_timestamp,
       (select "Account" from "AccountSamples" WHERE gen=gen OFFSET floor(random()*358) LIMIT 1),
       (select "Name" from "TagNameSamples" WHERE gen=gen OFFSET floor(random()*19) LIMIT 1),
       (select "Value" from "TagValueSamples" WHERE gen=gen OFFSET floor(random()*26) LIMIT 1 ),
       uuid_generate_v4()
  from generate_series(1, 100000) gen;
By the way, the typical way to pick randomly from a table without having to precalculate the size of the table is something like

SQL
SELECT foo FROM bar ORDER BY random() LIMIT 1
It is not overly performant, but it is simple and well understood.

from Reddit.