Click here to Skip to main content
15,887,596 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
The following code generates 100000 rows with random values for the Id column with uuid_generate_v4(). However, the nested selects are always choosing the same row so all the inserted rows have the same values for those columns. The goal is to create a table with 100k rows with random values taken from the other sample tables. Each of the sample tables only have two columns (Id and the column from which the values are taken). How can this be archived?

What I have tried:

SQL
insert into "Tag" (

"Id", "Time", "Account", "Name", "Value", "RollUpTableId"

)

select

uuid_generate_v4(),

current_timestamp,

(select "Account" from "AccountSamples" OFFSET floor(random()*358) LIMIT 1),

(select "Name" from "TagNameSamples" OFFSET floor(random()*19) LIMIT 1),

(select "Value" from "TagValueSamples" OFFSET floor(random()*26) LIMIT 1),

uuid_generate_v4()

from generate_series(1, 100000);


I've also tried the following that was suggested to me but gives me nulls(https://codeshare.io/5QD7dq) in many rows (I've no nulls in the sample tables and they only have 358, 19 and 26 rows each in that order.

SQL
insert into "Tag" (

"Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)

select uuid_generate_v4(), current_timestamp, a."Account", tns."Name", tvs."Value", uuid_generate_v4()

from generate_series(1, 1000) x(rn)

left join (

select "Account", row_number() over(order by random()) rn from "AccountSamples"

) a on a.rn = x.rn

left join (

select "Name", row_number() over(order by random()) rn from "TagNameSamples"

) tns on tns.rn = x.rn

left join (

select "Value", row_number() over(order by random()) rn from "TagValueSamples"

) tvs on tvs.rn = x.rn


I've also tried
SQL
select "Account" from "AccountSamples" where "Id" = (trunc(random() * 358)::integer)
Posted
Updated 18-Sep-20 15:36pm
v2

 
Share this answer
 
Comments
TomHunn 18-Sep-20 21:37pm    
Thanks, the solution provided by pehrs worked. However I'm taking note of your link for future reference in this matter.
Quote:
In your first solution, the problem is that the subselect does not have any external dependency, so it will be optimized to call random only once. You can fix this by adding an external dependency. Try something like this:

SQL
select uuid_generate_v4(),
       current_timestamp,
       (select "Account" from "AccountSamples" WHERE gen=gen OFFSET floor(random()*358) LIMIT 1),
       (select "Name" from "TagNameSamples" WHERE gen=gen OFFSET floor(random()*19) LIMIT 1),
       (select "Value" from "TagValueSamples" WHERE gen=gen OFFSET floor(random()*26) LIMIT 1 ),
       uuid_generate_v4()
  from generate_series(1, 100000) gen;


By the way, the typical way to pick randomly from a table without having to precalculate the size of the table is something like

SQL
SELECT foo FROM bar ORDER BY random() LIMIT 1


It is not overly performant, but it is simple and well understood.


from Reddit.
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900