## Introduction

Welcome back. I confess, I am considering dropping the episode numbers, as each article is becoming less sequential and more self contained. I intend to keep writing SQL articles and using the overarching title, just to group them together, and still create the concept of a series that can be worked through as if it were a book, to allow people wanting to learn to stick with one set of information and not just read random articles of varying style, quality and accuracy.

I've also decided to rename the series to 'Thinking in SQL', because SQL does require a different mindset, and that's what all my articles are trying to encourage.

Databases are generally used to store data, in a structured format, so we can rely on them to provide us with the same information every time we make the same request. Nevertheless, sometimes we do want to generate random numbers inside SQL Server, not least when we're trying to populate test data. So, today's article is about how generating random numbers works in SQL Server. As always, it's more interesting and has more points to consider than you might first think.

## Generating a random number

`select rand()`

That's all you need to do, to generate a random number in SQL. This creates a value between 0 and 1, so you can create a range up to xx by multiplying by xx, and you can convert to int to make it a whole number, if you require it. The conversion is also implicit if you just use it somewhere that a whole number is expected.

## Generating a random number sequence

Now run this to get a set of five random numbers:

select rand() union all select rand() union all select rand() union all select rand() union all select rand()

Pretty straightforward, right ? But, it does not scale well if you wanted to create a LOT of them. Enter the tally table. Here is SQL to use a tally table to generate 100 random numbers.

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select rand() from e2 ;

What happened ? SQL Server has optimised our query and decided to call rand() once. It doesn't realise that it returns a different value every time, and so assumes that it's being efficient for us. This is, in general, a good thing, but in this case, it's an issue.

The first way we'll try to abstract out the random number generation, is with a view. Create this:

CREATE VIEW vwRand AS SELECT RAND() as Rand go

And then run it like this:

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select rand() from e2, vwRand ;This is a cross join, but because vwRand has one value in it, we still get 100 values back. Sadly, they are all the same. So, what can we run in SQL that knows not to be optimised away ? The answer is, a user defined function.

CREATE FUNCTION RandNumber() RETURNS float AS BEGIN RETURN (SELECT rand() as RandNumber) END goThis is not allowed. Rand is considered a 'side effecting function' ( perhaps it increments a sequence, all 'random' sequences being not really random at all ). So, we need to do it in two steps, as follows:

-- Function to create random number by selecting from view CREATE FUNCTION Random() RETURNS float AS BEGIN RETURN (SELECT Rand FROM vwRand) END goSo, our view is still creating the random number, but a function is causing the view to be invoked for every row. This is a good lesson in terms of the efficiency of functions in general, but in our case, it's perfect.

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select dbo.random() from e2 ;This finally gives us 100 random numbers. Of course, they are all between 0 and 1. So long as we have this wrapped up at all, may as well make it easy to specify what our range is. Here is a function to do this:

CREATE FUNCTION RandomNumber(@max int) RETURNS float AS BEGIN RETURN (SELECT Rand * @max FROM vwRand) END goAnd now you can call this to get a sequence of random values up to the specified maximum

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select dbo.randomNumber(40) from e2 ; goIf we wanted to get random values that are ints, we could cast to int, or we can take a different approach. This code takes the random number, moves the decimal place across, then uses the % operator ( which is mod ) to strip down to the remainder from the value. As this will return a range of 0 to n-1, we add one to the result.

CREATE FUNCTION RandomInt(@max int) RETURNS int AS BEGIN RETURN (SELECT convert(int, Rand * 100000)% @max FROM vwRand) + 1 END goAnd here's the code to call it.

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select dbo.randomInt(50) from e2 ;

## Benchmarking our results

The big question with any random sequence is, how random is it ? We're going to use SQL to try to find out. Our strategy will be to create a small range of numbers, count them and see if the spread is reasonably even. If we asked for a value between 0 and 5, and it was 4 60% of the time, that would not indicate a reasonable spread of values...WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows rand as(select dbo.randomInt(5) as r from e2) select r, count(r) from rand group by r order by count(r) desc ;This will return a result along the lines of:

2 26 3 21 1 19 4 18 5 16Which is a reasonable sort of spread ( if they were all 20, that would not be random, either ).

## Another way to create random numbers

Another way to create random numbers, is to start with a GUID. A GUID is, by definition, random. The following code casts a GUID to a binary value of 6 bytes. We then use the mod operator to clamp down to the range we want, maximum possible values exceed 2 billion. This also converts the value to an int, but it could be negative, so we use ABS to strip the negative sign, and again add one to clamp to our range (as above)SELECT ABS(CAST(NEWID() AS binary(6)) %10) + 1The next question is, will this work in inline code ? Run this and you will see it does work, NewId() is called for every row.

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select ABS(CAST(NEWID() AS binary(6)) %10) + 1 from e2 ;However, it's enough of a mouthful to make sense to wrap it in a view and a function. We need a function in order to be able to pass in the maximum value we want our range to be in, otherwise we could just use a view. A function makes more sense though, as we are after a single value. It is worth mentioning that using functions will slow down your SQL, so you should make an intelligent choice based on your needs.

create VIEW vwRandFromId AS select abs(convert(int, CAST(NEWID() AS binary(6)))) as rand go CREATE FUNCTION RandomIntFromId(@max int) RETURNS int AS BEGIN RETURN (SELECT Rand% @max FROM vwRandFromId) ENDNow we can do this where ever we need to:

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows select dbo.RandomIntFromId(500) from e2 ;Once again, let's see how random this really is. We're not using functionality designed for randomness, so this is worth looking in to.

WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), -- 1*10^1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows rand as(select dbo.RandomIntFromId(5) as r from e2) select r, count(r) from rand group by r order by count(r) desc ;My results were as follows:

0 28 1 22 3 17 4 17 2 16That seems like a decent spread to me, more than good enough to use instead of the build in function, if we wanted to.