Finding prime numbers

Kenneth Haugland

4.97/5 (65 votes)

Aug 20, 2012

CPOL

17 min read

237058

3497

Different schemas for finding prime numbers explained with code

Introduction

There are many schemas for finding Prime numbers, and how to use them around the web. This article is written to collect some of the most common techniques and explain how they work.

Background

I will take the definition for prime numbers from Clay Mathematics Institute, taken from the Millennium problem regarding the Riemann Hypothesis:

"Some numbers have the special property that they cannot be expressed as the product of two smaller numbers, e.g., 2, 3, 5, 7, etc. Such numbers are called prime numbers, and they play an important role, both in pure mathematics and its applications."

These numbers have been the subject of intense research in mathematics, as they have many interesting abilities. The first to study these problems were the ancient Greek's, and the evidence and logic they built from Euclids Axioms (although he lived in Alexandria around 300 BC, a city founded by Alexander the Great) are (for the most part) still valid today. This method of constructing a solid reference that would always be valid was not shattered until Kurt Gödel's theorem in the beginning of the 20th century, although Gauss and a number of other mathematicians did encounter a lot of problems with the axioms in the 19th century.

Euclid can be said to be the first know source of any Prime number investigations, and also the first contribution in pure number theory. He showed that there were indeed an infinatly number of primes, and the logic taken for the Elements is given below:

Take any finite list of prime numbers p₁, p₂, ..., p_n. It will be shown that at least one additional prime number not in this list exists. Let P be the product of all the prime numbers in the list: P = p₁p₂...p_n. Let q = P + 1. Then, q is either prime or not:

If q is prime then there is at least one more prime than is listed.

If q is not prime then some prime factor p divides q. This factor p is not on our list: if it were, then it would divide P (since P is the product of every number on the list); but as we know, p divides P + 1 = q. If p divides P and q then p would have to divide the difference of the two numbers, which is (P + 1) − P or just 1. But no prime number divides 1 so there would be a contradiction, and therefore p cannot be on the list. This means at least one more prime number exists beyond those in the list.

This proves that for every finite list of prime numbers, there is a prime number not on the list. Therefore there must be infinitely many prime numbers.

He also postulated the fundamental theorem in arithmetic, which states that all integers could be expressed as a product of one or more prime numbers.

Sieve of Eratosthenes

Eratosthenes of Cyrene (276 BC - 195 BC) was another person that was interested in primes, and especially how to find them (although we don't actually have any written sources from his thime to confirm this). What he proposed was a simple method of sieving out the number that isn’t a prime number and to create this I’m just going to break it down in simple programming steps:

Create a boolean list from the number 2 to N, and fill all the entities with the boolean value True
Find out how many sieves you need (this is simply squared root of N).
Cross out all numbers in the list that is not a prime. Lets start at 2, witch is a prime, the numbers 4, 6, 8 etc. is not a prime, and you set the according boolean to false.

You might be tempted to ask: "Why stop at Sqrt(N)? " The answer is that the highest possible combination of two numbers that could be factorized is B*B = N. This means that all the numbers that is not a prime would have factors that are found at or below B. (You could see a proper mathematical way of proving this here).

The sieving of Eratosthenes is beautifully shown in the animated picture from Wikipedia:

The next step is to actually implement the algoritm and find all the prime numbers from 2 to N in actual code. This could be done as follows (The code given in the downloads are different implementation, just differing in speed. They do however go though the same steps as the code below. I kept the code below in the article as it is easyer to understand (or read) than the faster ones.):

''' <summary>
''' Returns all the primes from 2 up to N
''' </summary>
''' <param name="N">Find all primes below N</param>
''' <returns></returns>
''' <remarks></remarks>
Private Function SieveOfEratosthenes(ByVal N As Integer) As List(Of Integer)
    'Stores all the prime numbers
    Dim result As New List(Of Integer)

    'We need an boolean array that indicates if the number is a prime
    Dim IsPrime As New List(Of Boolean)

    'Fill the array with all true values and we will start at the number 2
    For i As Integer = 0 To N - 2
        IsPrime.Add(True)
    Next

    'Find and store how many numbers we need to check
    Dim NumberOfPrimeChecks As Integer = CInt(Math.Sqrt(N))

    'Start checking at the number 2
    Dim CurrentNumber As Integer = 2

    'Loop through all the nessecary checks 
    For i As Integer = 0 To NumberOfPrimeChecks - 1
        If IsPrime(i) Then
            'if the number 2 is prime the number 4 is not etc...
            For j As Integer = i + CurrentNumber To IsPrime.Count - 1 Step CurrentNumber
                IsPrime(j) = False
            Next
        End If
        'Increments the counter
        CurrentNumber += 1
    Next

    'Print the stored result in our list based on the boolean values
    For i As Integer = 0 To IsPrime.Count - 1
        If IsPrime(i) Then
            result.Add(i + 2)
        End If
    Next

    Return result
End Function

This sieving is quite good and actually reasonable fast, so what is the catch? The answer is the required storage you need.

You would actually need to store all the boolean/bit values from 2 to N, and that would make a pretty big list if the primes were large (There is also the case that a Boolean is not always a bit in the actual RAM in .NET, so you could possibly minimize the storage in C++ by using bit/bytes as storage in combination with a simple pointer.). To summorize the results:

Speed	N ln ln N + O(N)
Storage	N

This means that the algorithm I showed you must be modified if you want to use it for larger prime numbers. We device a new algorithm that finds the prime numbers from M to N given that we know all the prime numbers below M. The algorithm looks like this:

''' <summary>
''' Gives out all the primes from 2 to N, given the primes below M and the start finding new primes at M
''' </summary>
''' <param name="M">Start the check fore new primes at M</param>
''' <param name="N">The upper boundary for finding primes N</param>
''' <param name="PiPrime">All the primes below M, includeed M</param>
''' <returns>A list of primes from 2 to N</returns>
''' <remarks>This function requires that you have all the prime numbers below M</remarks>
Private Function SieveOfEratosthenes(ByVal M As Integer, ByVal N As Integer, _
                 ByVal PiPrime As List(Of Integer)) As List(Of Integer)
    'Stores all the prime numbers
    Dim result As New List(Of Integer)
    result = PiPrime
    'We still have to do Sieveing with the lowest prime numbers 
    Dim HigestModPi As New List(Of Integer)
    For i As Integer = 0 To PiPrime.Count - 1
        Dim temp As Integer = CInt(Math.Ceiling(M / PiPrime(i))) * PiPrime(i)

        HigestModPi.Add(temp)

    Next

    'We need an boolean array that indicates if the number is a prime
    Dim IsPrime As New List(Of Boolean)

    'Fill the array with all true values and we will start at the number 2
    For i As Integer = M To N - 1
        IsPrime.Add(True)
    Next
 
    'Find and store how many numbers we need to check
    Dim PrimeCheckStop As Integer = CInt(Math.Sqrt(N))

    For i As Integer = 0 To PiPrime.Count - 1
        If PiPrime(i) > PrimeCheckStop Then
            Exit For
        End If

        For j As Integer = 0 To IsPrime.Count - 1 Step PiPrime(i)
            Dim h As Integer = HigestModPi(i) - M + j
            IsPrime(h) = False
        Next
    Next

    'Check if we have performed all checks fo rthe possible primes primes between M and N
    If M < Math.Sqrt(N) Then
        For i As Integer = M To PrimeCheckStop
            If IsPrime(i) Then
                For j As Integer = i * 2 To N Step i
                    IsPrime(j) = False
                Next
            End If
        Next
    End If

    For k As Integer = 0 To IsPrime.Count - 1
        If IsPrime(k) Then
            result.Add(M + k)
        End If
    Next

    Return result
End Function

The results are the same regarding to speed, but since we dont have infinite RAM, this is a practical way of getting nearly all the primes you want. The specifics of this algorithm is given below:

Speed	(N-M) ln ln (N-M) + O(N-M)
Storage	N-M + Primes

If you are wondering how to find an optimal span, that would be quite tricky, as it would be determined by your RAM capabilities. You would either way, at some point, be forced to check number span's that has no prime numbers in them, and with increasingly high numbers you will eventually have to stop, as some prime numbers are guaranteed (since they go on forever) to be larger than you could store on your computer. But I can at least say that you are guaranteed to find a prime between N and 2*N, as postulated by Bertrand and proven by Chebychev.

It is however not possible to include more than Integer.MaxValue items in an array, but the local memory of your computer are usually the bottleneck.

Breaking the "dark ages" of mathematics

The next algorithm that I’m going to show you was not discovered before 1999 and first published in 2004. To explain how this works I’m going to go through the most relevant history of prime number research you need to know In order to understand how the algorithm works.

After the fall of ancient Greek civilization, there was not any known research on pure number theory until the 17th century. One of the people who finally broke it was Fermat, and he did it in his own spare time as a hobby, although he conversed heavily with Pascal, another great French mathematician. He came up with the following formula that is known as Fermat's little theorem:

a^p-1 = 1 (mod p) , were a < p

For example, if a = 2 and p = 7, 2⁶ = 64, and 64 − 1 = 63 = 7 × 9. And the reminder (or rather Number Mod 7) is 1/7. This could be written several different ways but the general equation and the resulat is always the same.

After this period an explotion of brilliant Mathematicians followed up, among them there are especially five people that had an enourmus impact on the development of prime numbers before the modern area of computers: Euler, Chebychev, Lagrange, Gauss and Riemann.

Euler also designed the first prime number generation polynomial, and I have given it below as a curiosity. Other examples can be seen here:

P(x)= x²-x + 41

p(40) = 1601

All values from 0 to 40 gives you prime numbers, and with the values from p(40) to p(80) it generates 33 primes. If we go out to p(1000), 58% of all the numbers are primes, while only 7% are prime in the continued aretmetic progression in this range. A facinating video by William Dunham that explains some of the research done by Euler could be seen here.

Euler is also the person that first developed what is now is the building block formula, if you will, of the Riemann-Zeta function. But as usual Euler gave another facinating fact about prime numbers (the full derivation of the equation could be seen here, and it is actually a mathematical version of the sieve of Eratosthenes). The spectacular result is this:

With the series written out:

and

Euler also developed the first modern use of Modular arithmetic that means congruent modulo the number. It is a further development and a fomralization of Fermat's little theorem, and it is usually explained as clock arithmetic. Modern explanations of the methods could be viewed from Wikipedia or MathWorld, were the syntax today started with Lagrange.

It was the next mathematician Gauss (a contemporary of Lagrange) how expanded the convergent modulus that we know today. But we will start off with Gauss and the question of how close the primes are, or rather the likelihood that in collection of numbers from 2 to N, what is the probability to hit a prime within the selection? Well, the answer could be shown as the table below, were N is the integers from 1 to N, Pi(N) is the number of primes below N, and probability of a prime below N is 1/P(N):

N	Pi(N)	P(N) = N/Pi(N)
10	4	2.5
100	25	4.0
1000	168	6.0
10000	1,229	8.1
100000	9,592	10.4
1000000	78,498	12.7
10000000	664,579	15.0
100000000	5,761,455	17.4

The small table shows an increase of the natural logarithm of the number 10, ln(10) = 2.3, meaning that each time the arithmetic progression multiply 10, the (1/probability) on average increase by a factor of 2.3. This function could however be slightly modified to an integral representation that gives you the Li (x) function:

If one expands the integral in to a Taylor series as x goes to infinity, the first term in the following series below is the one Gauss conjugated by the age of 15!

The most important thing to realize is how the development of congurent modula formed the next algoritm that is most commenly used today (for fast implementations). The general formulation of congruence is this:

Why am I telling you this, well it is just that the next algorithm I’m going to show the code for uses the research with congruence to generate the primes.

It is also Gauss, named the Prince of mathematics, that asked Riemann to investigate the use of real and imaginary in graph plotting. The result and the generalized pattern would eventually result in Riemann's Zeta function were all the non trivial zeros are on the real line of the Riemann-Zeta function.

The reason that the real vs. imaginary plotting is the same in some sence that convergent modulo. When we multiplying a real number with the imaginary number i (1i) that is similar to a angle rotation of 90 degrees. One finds the zero's of the prime function by setting s= a + it, or as Riemann postulated that all the non trivial zeroes (zeros at the line were t=0 doesn’t count), he postulated that all the zeros would be found at s = 1/2+it.

It is actually really complicated to calculate these zeroes, as they involve some complex arithmetic. It usually involves taking the integral over a complex plane, (called Mellin integral), with the Riemann-Siegel formulation. Its a rather complicated story but you could find some explanations of it on Odlyzko's page (be warned, it involves FFT algorithm and many complex subjects in mathematics, and Odlyzko also holds theorems that have not yet been proven considering the distributions of the zeroes).

The picture below shows the Riemann-Zeta function and the critical line could be seen here were the real part is equal to 1/2 (the read part starts on 1/2).

There is also one more curiosity called the Ulam spiral, after its inventor Stanislaw Ulam discovered it in 1963. The Ulam spiral is used to create a simple visualizing of the prime numbers that reveals the apparent tendency of certain quadratic polynomials to generate unusually large numbers of primes. It was discovered during the presentation of a “long and very boring paper” at a scientific meeting. It is created by writing the numbers in a spiral as shown below, and you can read more about it on Wikipedia:

The spiral could be visualized by spheres that are bigger the more factors the number could be divided into. A picture of an Ulam spiral is shown below and the picture is taken from this website.

Prime numbers are assumed to be random or almost random, and it makes it very hard to find any pattern in them. If you take a look at the sieve of Eratosthenes you have already seen that we could easily find a pattern of numbers that are not primes, but the prime numbers were found only when all the other patterns were exhausted.

There is also some evidence of the randomness in primes in nature as well. In nature the evolutionary trait is so that if a species are seeking randomness, you will often find prime numbers at work. This is also the case with the Riemann equation, were you also get an error term that is consisted with random fluctuations.

The story also reveals that in order to find patterns in prime numbers, we usually visually transform them into some kind of either spiral or circular string of natural numbers. Nearly all the calculations with primes from congruent modulus, Ulm spiral and the complex number with Riemann's zeroes in the zeta function seem to be linked to this fact.

Sieve of Atkin

The sieve of Atkin (created by A.O.L Atkin and Daniel J. Bernstein in 2004) and this sieve is actually the exact opposite of sieve of Eratosthenes in that it marks off the primes insted of the non primes. For a detailed explanation of how it is derived see this article by the creators here. The sieve could be broken into these steps (from Wikipedia):

All remainders are modulo-sixty remainders (divide the number by sixty and return the remainder).
All numbers, including x and y, are positive integers.
Flipping an entry in the sieve list means to change the marking (prime or nonprime) to the opposite marking.
Create a results list, filled with 2, 3, and 5.
Create a sieve list with an entry for each positive integer; all entries of this list should initially be marked nonprime.
For each entry number n in the sieve list, with modulo-sixty remainder r :
If r is 1, 13, 17, 29, 37, 41, 49, or 53, flip the entry for each possible solution to 4x² + y² = n.
If r is 7, 19, 31, or 43, flip the entry for each possible solution to 3x² + y² = n.
If r is 11, 23, 47, or 59, flip the entry for each possible solution to 3x² − y² = n when x > y.
If r is something else, ignore it completely.

Start with the lowest number in the sieve list.
Take the next number in the sieve list still marked prime.
Include the number in the results list.
Square the number and mark all multiples of that square as nonprime.
Repeat steps five through eight.
This results in numbers with an odd number of solutions to the corresponding equation being prime, and an even number being nonprime.

The code is pretty much a converted copy from the pseudocode here (Though it was a huge help to see the implementation in C# here by Torleif Berger, and the code is nearly an exact copy, just converted to VB in this article). A waring to you, the code below will actually be much slower than the Sieve of Eratosthenes algorithm given the code below. The main reason is the complicated calculations and testing that would have to be performed on each of the quadratic equations. The code is just given to show how it works, in principle, and if you really want to implement a fast Sieve of Atkin you should use (or modefy) the C code that could be downloaded here. The implementation is faster due to the pre-calculated tables stored in the program. A little dissapointing perhaps, that the algorithm is essensialy a really good zip file for prime numbers.

''' <summary>
''' Returns a list of Primes from 2 to N
''' </summary>
''' <param name="N">Find prime numbers equal or lower than N</param>
''' <returns>List of primes as List(of BigInteger)</returns>
''' <remarks>For large primes you can get out of memory exceptions</remarks>
Private Function SieveOfAtkin(ByVal N As System.Numerics.BigInteger) As List(Of System.Numerics.BigInteger)
    Dim primes As New List(Of System.Numerics.BigInteger)

    Dim isPrime = New Boolean(N) {}
    Dim sqrt As System.Numerics.BigInteger = Math.Sqrt(N)

    For x As System.Numerics.BigInteger = 1 To sqrt
        For y As System.Numerics.BigInteger = 1 To sqrt
            Dim n1 = 4 * x * x + y * y
            If n1 <= N AndAlso (n1 Mod 12 = 1 OrElse n1 Mod 12 = 5) Then
                isPrime(n1) = isPrime(n1) Xor True
            End If

            n1 = 3 * x * x + y * y
            If n1 <= N AndAlso n1 Mod 12 = 7 Then
                isPrime(n1) = isPrime(n1) Xor True
            End If

            n1 = 3 * x * x - y * y
            If x > y AndAlso n1 <= N AndAlso n1 Mod 12 = 11 Then
                isPrime(n1) = isPrime(n1) Xor True
            End If
        Next
    Next

    For n2 As System.Numerics.BigInteger = 5 To sqrt
        If isPrime(n2) Then
            Dim s = n2 * n2
            Dim k As System.Numerics.BigInteger = s
            While k <= N
                isPrime(k) = False
                k += s
            End While
        End If
    Next

    primes.Add(2)
    primes.Add(3)
    For n3 As System.Numerics.BigInteger = 5 To N Step 2
        If isPrime(n3) Then
            primes.Add(n3)
        End If
    Next
    Return primes
End Function

The specifics results of this algorithm are given below, and we see a huge improvments especially when the primes get big. However for a small segemt of prime numbers, the difference is not huge:

Speed	N/log log N
Storage	N^{1/2 + o(1)}

We now see an improvment in speed and and a vast difference in storage need, although it is much trickier to implement than sieve of Eratosthenes.

History

This article is put together by bits and pieces of information that is scattered all over the web, and if I forgot to include some references that should have been given, please accepts my deepest apologies. The writing of the article was meant to show the sieves to get the primes, and also give you a quick overview of some of the properties of primes and the problems that are still unresolved.

Now you should have an excellent toolbox for finding prime numbers, and you could go straight over to the Project Euler web site and start solving the problems. You might even start to think about cracking coded messages that is encrypted, but you should not forget Terrence Tao's words, that it is actually not mathematical proven that, if you could find a fast formula for factorizing a number, you'll be able to crack coded messages.

The code in the C# and VB downloads are changed thanks to the alternative that was poublished by Clifford Nelson. I have however not changed the code inside the article, as the download basically uses the same steps, just rearranged a little. It also speeds up the implementation a greate deal, at it ustulizes a array instead of a list.

References

Ther book that I used as a primary source is written in 2001, three years before the Sieve of Atkin was published, so it lacks the explanation of recent discoveries. It is, however, still much relevant material and it also describes many practical uses of primes in the real world scenarios.

"Prime numbers - A computational perspective", 2001, Richard Cramdall and Carl Pomerance, Springer

"Music of primes" , 2004, Marcus Du Sautoy, Harper Perennial

Source code and article about Sieve of Atkin:

Other sieves:

There is an excellent video-series on the Riemann Hypothesis (this series is even useful if you only know basic mathematics) at YouTube by David Metzler:

A lecture were congruent modulus and other aspects of prime numbers are given by Richard Taylor:

Primes and Equations

And the video presantation from Marcus Du Sautoy:

The Music of the Primes

Online links on the subject from the CodeProject site: