15,662,570 members
Articles / General Programming / Algorithms
Article
Posted 30 May 2019

8.7K views
3 bookmarked

Integer Factorization: Optimizing Small Factors Checking

Rate me:
5.00/5 (1 vote)
How to optimize small factors checking
In this article, I show how I adapted a mental calculation technique to check some small factors faster than division.

Background

xPrompt.zip contains the binary of the xHarbour Scripting Engine for Windows. This is the app needed to run the prg code.

Usage :
- Copy prg file in same directory than xPrompt
- Launch xPromt
- Type "do FileName.prg" to run the code

xHarbour is freeware and can be dowloaded at xHarbour.org[^]

xHarbour is part of the xBase family languages: xBase - Wikipedia[^]

Introduction

Division is a good general purpose algorithm to check if a prime is factor of an integer, but sometimes we can get faster.

Small primes: Faster than division

When dealing with integer factorization, it is common to have a first stage for checking small factors before starting main algorithm. It happen that it is possible to check some of those factors faster than with a division.

Division is an efficient general purpose algorithm, but faster algorithms exist for particular divisors. Those algorithms resort to Mental Calculation techniques.

Some mental calculation techniques allow you to check easily the divisibility of a number for specific factors. The same techniques can be used on computer to check a few factors faster than with divisions because those thechniques also works with base 2.

Checking the Base

Humans Counting

Base 10 is how we count. Everyone can tell instantly if a number is multiple of 10 without resorting to division because the unit digit is all what matters. And everyone can also deduce if multiple of 2 and 5 as they are factors of 10.

2 is a factor if unit is 0, 2, 4, 6, 8.
5 is a factor if unit is 0, 5.
Ex: 3141592653
unit is 3, so 2 and 5 are not factor of 3141592653.

Cost is constant time. O(n) = 1.

Computer Counting

Base 2 is how computers count. The unit allow to know if 2 is a factor of a number. The fast operation is a bitwise AND.

2 is a factor if unit is 0.
Ex: 3141592653 => 0b10111011010000001110011001001101
unit is 1, so 2 is not a factor of 3141592653.

Cost is constant time. O(n) = 1.

dBase
if (N & 1) = 0
return(2)
endif
C++
if ((Cand & 1) == 0) {
return 2;
}

Checking Other Factors

Humans Counting

Everyone having learned to do additions and multiplications by hand have also learned how to proof check those operations by using the casting out nines method. The method can be used to know if a number have 3 as factor, and also 11 and others with a little teaking. All this is possible because, 3 is a factor of 10 - 1 and 11 is a factor of 100 - 1.

The casting out nines method is about adding the digits of a number and reapply the method until the result is 1 digit.

Factor 3

The techbique is about checking 9 which is base 10 - 1, 3 being only a factor of 9.

3 is a factor if result is 3, 6 or 9.
Ex: 3141592653 => 3+1+4+1+5+9+2+6+5+3 => 39 => 3+9 => 12 => 1+2 => 3
Result is 3, so 3 is a factor of 3141592653.

Cost for n is log(n). O(n) = log(n).

Factor 11

The technique is about checking 11 which is base 10 + 1. We adapt the method with packets of 2 digits.

for 11, we use the fact that 99 (100-1) is a multiple of 11.
11 is a factor if result is 0, 11, 22, 33 ... 88, 99 or if difference of the 2 digits is 0.
3141592653 => 31+41+59+26+53 => 210 => 2+10 => 12
Result is 12, so 11 is not a factor of 3141592653 because 1<>2.

Cost for n is log(n). O(n) = log(n).

Factor 101

The technique is about checking 101 which is base 10^2 + 1. We adapt the method with packets of 4 digits.

for 101, we use the fact that 9999 (10000-1) is a multiple of 101.
101 is a factor if result is 0, 101, 202, 303 ... 9999. or if difference of pairs of digits is 0.
3141592653 => 31+4159+2653 => 6843
So 101 is not a factor of 3141592653 because 68<>43.

Cost for n is log(n). O(n) = log(n).

Those factor check can be combined when the number of digits in a check is a multiple of the number of digit of another.

Because 10000 is a power of 100, the result for factor 101 can be used to check factor 11 and 9.
3141592653 => 31+4159+2653 => 6843 => 101 is not a factor

Because 10000 is a power of 100, the result for 10000 can be reused for 100.
6843 => 68+43 => 111 => 1+11 => 12 => 11 is not a factor

Because 100 is a power of 10, the result for 100 can be reused for 10.
12 => 1+2 => 3 => 3 is a factor

The cost for factors 11 and 3 are in constant time when done after 101.

Computer Counting

The principle of "casting out nines" can be applied to base 2. Processors have very efficient tools that can be used to butcher numbers, those tools are addition/subtraction, bitwise operation and shifting.

Interesting Small Primes

3	2^2-1=3=1*3
5	2^2+1=5=1*5
7	2^3-1=7=1*7
11	2^5+1=33=1*3*11
13	2^6+1=65=1*5*13
17	2^4+1=17=1*17
19	2^9+1=513=1*3*3*3*19
23	2^11-1=2047=1*23*89
29	2^14+1=16385=1*5*29*113
31	2^5-1=31=1*31
37	2^18+1=262145=1*5*13*37*109
41	2^10+1=1025=1*5*5*41
43	2^7+1=129=1*3*43
47	2^23-1=8388607=1*47*178481
53	2^26+1=67108865=1*5*53*157*1613
59	2^29+1=536870913=1*3*59*3033169

Cascade Factors 17, 5 and 3

Ex: 3141592653 to binary 0b10111011010000001110011001001101

Reduce by 2 ^ 16 as intermediate optimization
0b10111011010000001110011001001101 => 0b1011101101000000+0b1110011001001101
=> 0b11010000110001101 => 0b1+0b1010000110001101 => 0b1010000110001110

Check 17: Reduce by 2 ^ 8
0b10100001+0b10001110 => 0b1000101110 => 0b10+0b00101110 => 0b00110000 => 0b0011 0b0000
17 is not a factor of 3141592653 because 0b0011<>0b0000

Check 5: Reduce by 2 ^ 4
0b00110000 => 0b0011+0b0000 => 0b0011 => 0b00 0b11
5 is not a factor of 3141592653 because 0b00<>0b11

Check 3: Reduce by 2 ^ 2
0b0011 => 0b00+0b11 => 0b11
3 is a factor of 3141592653

Cost for n is log(n) + constant. O(n) = log(n).

Intermediate oprimization

In modern processors, registers are 64 bits and simple operation take 1 clockcycle, no matter the size of an operation, the cost is the same. When one want to reduce a 64 bits value to a 8 bits value, it takes 7 operations.

By using intermediate steps, we can divide the size of operation by 2 every time. A 64 bits value is first reduced to 32 bits, the 16 bits, and finally to 8 bits. That 3 operations.

Algorithm

There are 4 cases to handle:

• Factor is a power of 2 minus 1
• Reduce to power of 2.
Compare result with factor.

• Factor is a factor of a power of 2 minus 1
• Reduce to power of 2.
do a modulo.

• Factor is a power of 2 plus 1
• Reduce to double of power of 2.
test if both half are same.

• Factor is a factor of a power of 2 plus 1
• Reduce to double of power of 2.
do a modulo on the difference of both half.

and optimizations:

• Do a topological sort.

• Intermediate optimization
• Add intermediate reduction where needed.

Generator

As things can get complicated, It was easier to automate the code generation.

List of prime to integrate

dBase
*	List of: Prime, power of 2, +-1, is multiple
list={ {3, 2, -1, .F.}, {5, 2, +1, .F.}, {7, 3, -1, .F.}, {11, 5, +1, .T.}, ;
{13, 6, +1, .T.}, {17, 4, +1, .F.}, {19, 9, +1, .T.}, {23, 11, -1, .T.}, ;
{29, 14, +1, .T.}, {31, 5, -1, .F.}, {37, 18, +1, .T.}, {41, 10, +1, .T.}, ;
{43, 7, +1, .T.}, {47, 23, -1, .T.}, {53, 26, +1, .T.}, {59, 29, +1, .T.}}

dBase
*	tri topologique
*	Optimization by chaining operations	for scan=2 to len(list)
ptr= 0
for place= 1 to scan-1
if list[scan,5] % list[place,5] = 0 // scan multiple de place
if list[scan,5] = list[place,5]
tmp= list[scan]
ains(list, place, tmp, .T.)
ptr=0
exit
endif
if ispower(list[scan,5] / list[place,5], 2)= 1
tmp= list[scan]
ains(list, place, tmp, .T.)
ptr=0
exit
endif
endif
if list[place,5] % list[scan,5] = 0
if ispower(list[place,5] / list[scan,5], 2)= 1
ptr= place
endif
endif
next
if ptr != 0
tmp= list[scan]
ains(list, ptr+1, tmp, .T.)
endif
next

dBase
*	pad list with intermediate opeartions
for scan=len(list) to 2 step -1
while list[scan-1,5] % list[scan,5] = 0
tmp= list[scan-1,5] / list[scan,5]
tmpp= ispower(list[scan-1,5] / list[scan,5], 2)
if tmp > 2 .and. tmpp=1
ains(list, scan, {0,0,0,.F.,list[scan,5]*2}, .T.)
else
exit
endif
enddo
while list[scan-1,5] < list[scan,5]
if list[scan,5] < int_size/2
ains(list, scan, {0,0,0,.F.,list[scan,5]*2}, .T.)
else
exit
endif
enddo
next
while list[1,5] < int_size/2
ains(list, 1, {0,0,0,.F.,list[1,5]*2}, .T.)
enddo

Code generator

dBase
*	generate C code
? int_type+" IsSFactNonDiv("+int_type+" Cand) {"
? "  "+int_type+" Mask, Prime, tmp;"
? "  int Size;"
? "  // fact 2"
? "  if ((Cand & 1) == 0) {"
? "    return 2;"
? "  }"
?

for scan=1 to len(list)
if list[scan,1] != 0
? "  Prime="+str(list[scan,1],,,.T.)+";"
endif
if scan=1
org= "Cand"
elseif list[scan,5]*2=list[scan-1,5]
org= "N"+str(list[scan,5]*2,,,.T.)
else
org= "Cand"
endif
if (scan=1 .or. list[scan,5] != list[scan-1,5])
if list[scan,5] < int_size
? "  Size="+str(list[scan,5],,,.T.)+";"
? "  Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"

if list[scan,3]=0
? "  "+int_type+" N"+str(list[scan,5],,,.T.)+"= ("+org+" >> Size) + ("+org+" & Mask);"
else
? "  "+int_type+" N"+str(list[scan,5],,,.T.)+"= FactFold("+org+", Size, Mask);"
endif
else
? "  "+int_type+" N"+str(list[scan,5],,,.T.)+"= "+org+";"
endif
endif

if list[scan,3]=  1 .and. list[scan,4] // P2+1 et multiple
? "  Size="+str(list[scan,2],,,.T.)+";"
? "  Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"
? "  tmp = abs(( N"+str(list[scan,5],,,.T.)+" >> Size) - ( N"+str(list[scan,5],,,.T.)+" & Mask));"
? "  tmp = FactReduce(tmp, Prime);"
? "  if (tmp == 0) {"
? "    return Prime;"
? "  }"
elseif list[scan,3]= 1 // P2+1
? "  Size="+str(list[scan,2],,,.T.)+";"
? "  Mask= (0ULL-1) >> ("+str(int_size,,,.T.)+"-Size);"
? "  if (( N"+str(list[scan,5],,,.T.)+" >> Size) == ( N"+str(list[scan,5],,,.T.)+" & Mask)) {"
? "    return Prime;"
? "  }"
elseif list[scan,3]= -1 .and. list[scan,4] // P2-1 et multiple
? "  tmp = FactReduce( N"+str(list[scan,5],,,.T.)+", Prime);"
? "  if (tmp == 0) {"
? "    return Prime;"
? "  }"
elseif list[scan,3]= -1 // P2-1
? "  if ( N"+str(list[scan,5],,,.T.)+" == Prime) {"
? "    return Prime;"
? "  }"
endif

?
next
? "  return Cand;"
? "}"

Resulting code

dBase
// Fast checking of small primes
function SmallPrimes(N)
//  Check factor 2
if (N & 1) = 0
return(2)
endif

//  Check factor 257
N16= Reduce(N, 16)
if  (N16 >> 8) = (N16 & 255)
return(257)
endif
//  Check factor 17
N8= Reduce(N16, 8)
if  (N8 >> 4) = (N8 & 15)
return(17)
endif
//  Check factor 5
N4= Reduce(N8, 4)
if  (N4 >> 2) = (N4 & 3)
return(5)
endif
//  Check factor 3
N2= Reduce(N4, 2)
if  N2 = 3
return(3)
endif

// Check factor 13
N12= Reduce(N, 12)
Tmp= abs((N12 >> 6) - (N12 & 63))
while Tmp >= 13
Tmp= Tmp- 13
enddo
if  Tmp = 0
return(13)
endif
//  Check factor 7
N3= Reduce(N12, 3)
if  N3 = 7
return(7)
endif

//  Check factor 11
N10= Reduce(N, 10)
Tmp= abs((N10 >> 5) - (N10 & 31))
while Tmp >= 11
Tmp= Tmp- 11
enddo
if  Tmp = 0
return(11)
endif
//  Check factor 31
N5= Reduce(N10, 5)
if  N5 = 31
return(31)
endif

//  Check factor 127
N7= Reduce(N, 7)
if  N7 = 127
return(127)
endif

return 1

// Binary 'casting out nines'
function Reduce(N, Size)
Tmp= N
while (Tmp >> Size) <> 0
tmp= (Tmp >> Size) + (Tmp & Mask)
enddo
return(Tmp)
C++
// My modulo routine
inline long long FactModulo(long long cand, long long fact) {
// my modulo
while (cand >= fact) {
if (cand & 1) {
cand = cand - fact;
}
cand = cand >> 1;
}
return cand;
}

// the reduction routine
inline long long FactFold(long long cand, long long shift, long long Mask) {
// fold integer on itself
do {
cand = (cand >> shift) + (cand & Mask);
} while (cand >> shift);
return cand;
}

// My factor checker, result of automated code generation
long long IsSFactNonDiv(long long Cand) {
int Size;
// fact 2
if ((Cand & 1) == 0) {
return 2;
}

Size = 32;
Mask = (0ULL - 1) >> (64 - Size);
long long N32 = (Cand >> Size) + (Cand & Mask);

Size = 16;
Mask = (0ULL - 1) >> (64 - Size);
long long N16 = (N32 >> Size) + (N32 & Mask);

Prime = 17;
Size = 8;
Mask = (0ULL - 1) >> (64 - Size);
long long N8 = FactFold(N16, Size, Mask);
Size = 4;
Mask = (0ULL - 1) >> (64 - Size);
if ((N8 >> Size) == (N8 & Mask)) {
return Prime;
}

Prime = 5;
Size = 4;
Mask = (0ULL - 1) >> (64 - Size);
long long N4 = FactFold(N8, Size, Mask);
Size = 2;
Mask = (0ULL - 1) >> (64 - Size);
if ((N4 >> Size) == (N4 & Mask)) {
return Prime;
}

Prime = 3;
Size = 2;
Mask = (0ULL - 1) >> (64 - Size);
long long N2 = FactFold(N4, Size, Mask);
if (N2 == Prime) {
return Prime;
}

Size = 48;
Mask = (0ULL - 1) >> (64 - Size);
long long N48 = (Cand >> Size) + (Cand & Mask);

Size = 24;
Mask = (0ULL - 1) >> (64 - Size);
long long N24 = (N48 >> Size) + (N48 & Mask);

Prime = 13;
Size = 12;
Mask = (0ULL - 1) >> (64 - Size);
long long N12 = FactFold(N24, Size, Mask);
Size = 6;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N12 >> Size) - (N12 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Size = 6;
Mask = (0ULL - 1) >> (64 - Size);
long long N6 = (N12 >> Size) + (N12 & Mask);

Prime = 7;
Size = 3;
Mask = (0ULL - 1) >> (64 - Size);
long long N3 = FactFold(N6, Size, Mask);
if (N3 == Prime) {
return Prime;
}

Size = 40;
Mask = (0ULL - 1) >> (64 - Size);
long long N40 = (Cand >> Size) + (Cand & Mask);

Prime = 41;
Size = 20;
Mask = (0ULL - 1) >> (64 - Size);
long long N20 = FactFold(N40, Size, Mask);
Size = 10;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N20 >> Size) - (N20 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 11;
Size = 10;
Mask = (0ULL - 1) >> (64 - Size);
long long N10 = FactFold(N20, Size, Mask);
Size = 5;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N10 >> Size) - (N10 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 31;
Size = 5;
Mask = (0ULL - 1) >> (64 - Size);
long long N5 = FactFold(N10, Size, Mask);
if (N5 == Prime) {
return Prime;
}

Prime = 37;
Size = 36;
Mask = (0ULL - 1) >> (64 - Size);
long long N36 = FactFold(Cand, Size, Mask);
Size = 18;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N36 >> Size) - (N36 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 19;
Size = 18;
Mask = (0ULL - 1) >> (64 - Size);
long long N18 = FactFold(N36, Size, Mask);
Size = 9;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N18 >> Size) - (N18 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 23;
Size = 11;
Mask = (0ULL - 1) >> (64 - Size);
long long N11 = FactFold(Cand, Size, Mask);
tmp = FactModulo(N11, Prime);
if (tmp == 0) {
return Prime;
}

Size = 56;
Mask = (0ULL - 1) >> (64 - Size);
long long N56 = (Cand >> Size) + (Cand & Mask);

Prime = 29;
Size = 28;
Mask = (0ULL - 1) >> (64 - Size);
long long N28 = FactFold(N56, Size, Mask);
Size = 14;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N28 >> Size) - (N28 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 43;
Size = 14;
Mask = (0ULL - 1) >> (64 - Size);
long long N14 = FactFold(N28, Size, Mask);
Size = 7;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N14 >> Size) - (N14 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Size = 46;
Mask = (0ULL - 1) >> (64 - Size);
long long N46 = (Cand >> Size) + (Cand & Mask);

Prime = 47;
Size = 23;
Mask = (0ULL - 1) >> (64 - Size);
long long N23 = FactFold(N46, Size, Mask);
tmp = FactModulo(N23, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 53;
Size = 52;
Mask = (0ULL - 1) >> (64 - Size);
long long N52 = FactFold(Cand, Size, Mask);
Size = 26;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N52 >> Size) - (N52 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

Prime = 59;
Size = 58;
Mask = (0ULL - 1) >> (64 - Size);
long long N58 = FactFold(Cand, Size, Mask);
Size = 29;
Mask = (0ULL - 1) >> (64 - Size);
tmp = abs((N58 >> Size) - (N58 & Mask));
tmp = FactModulo(tmp, Prime);
if (tmp == 0) {
return Prime;
}

return Cand;
}

Benchmark

Runtime is so fast that timing is done on repeated execution of routines. Timing done on a i3 3GHz.

Timing
Integer				Factor				Division	My Method
6561				3					9 ms		6 ms
125					5					18.001 ms	4 ms
2401				7					39.002 ms	9 ms
14641				11					41.002 ms	20.001 ms
3601				13					42.002 ms	11 ms
83521				17					52.003 ms	3 ms
49999				49999				147.008 ms	59.003 ms
4611686018427387899	4611686018427387899	143.008 ms	55.003 ms
4611686018427387877	4611686018427387877	143.008 ms	55.003 ms

On last three numbers, my method is 2.5 times faster than division.

Points of Interest

Interest is limited if value is within register size, is value is big integer, the technique can be extended with more gains.

Unfortunately, it works only on a limited set of primes.

History

• 20190501: First draft
• 20201230: Second version

Written By
Database Developer
France
I am a professional programmer.
Problem analyse is certainly what I am best at.
My main programming expertise is in the xBase languages (dBase, Clipper, FoxPro, Harbour, xHarbour), then VBA for Excel and advanced Excel WorkBooks.

I also have knowledge on C/C++, d language, HTML, SVG, XML, XSLT, Javascript, PHP, BASICs, Python, COBOL, Assembly.
My personal interests goes to algorithm optimization and puzzles.

 First Prev Next
 long long FactModulo (long long cand, long long fact) { } Member 117206811-Jan-21 18:11 Member 11720681 1-Jan-21 18:11
 Re: long long FactModulo (long long cand, long long fact) { } Patrice T1-Jan-21 18:19 Patrice T 1-Jan-21 18:19
 Re: long long FactModulo (long long cand, long long fact) { } Member 117206812-Jan-21 5:26 Member 11720681 2-Jan-21 5:26
 Re: long long FactModulo (long long cand, long long fact) { } Patrice T2-Jan-21 6:04 Patrice T 2-Jan-21 6:04
 Re: long long FactModulo (long long cand, long long fact) { } Member 117206812-Jan-21 7:44 Member 11720681 2-Jan-21 7:44
 Re: long long FactModulo (long long cand, long long fact) { } Patrice T4-Jan-21 2:05 Patrice T 4-Jan-21 2:05
 The 'for loops' inside fcn void vitesse (unsigned long long cand) { } do not make sense Member 1172068131-Dec-20 20:23 Member 11720681 31-Dec-20 20:23
 C++ void vitesse(unsigned long long cand) { unsigned long long fact; auto start = high_resolution_clock::now(); for (int i = 0; i < rept; i++) { fact = IsSFactDiv(cand); } auto stop = high_resolution_clock::now(); auto duration = duration_cast(stop - start); cout << cand << " " << fact << " " << duration.count() << " ms" << endl; start = high_resolution_clock::now(); for (int i = 0; i < rept; i++) { fact = IsSFactNonDiv(cand); } stop = high_resolution_clock::now(); duration = duration_cast(stop - start); cout << cand << " " << fact << " " << duration.count() << " ms" << endl; }
 Re: The 'for loops' inside fcn void vitesse (unsigned long long cand) { } do not make sense Patrice T31-Dec-20 22:25 Patrice T 31-Dec-20 22:25
 Re: The 'for loops' inside fcn void vitesse (unsigned long long cand) { } do not make sense Member 117206811-Jan-21 6:49 Member 11720681 1-Jan-21 6:49
 My 5. CPallini30-Dec-20 19:56 CPallini 30-Dec-20 19:56
 Re: My 5. Patrice T30-Dec-20 21:46 Patrice T 30-Dec-20 21:46
 Last Visit: 31-Dec-99 18:00     Last Update: 28-May-23 0:04 Refresh 1