Click here to Skip to main content
15,993,913 members
Articles / General Programming / Algorithms

Integer math: beyond the limits

Rate me:
Please Sign up or sign in to vote.
4.79/5 (22 votes)
6 Jul 2016CPOL6 min read 19.4K   238   18   11
A coarse approach to operate huge integers.

Introduction

How often do we encounter numbers that exeed MAXUINT32 or even MAXUINT64 ? Indeed, 2^32 - 1 = 4294967295 and 2^64 - 1 = 18446744073709551615 are extremelly huge numbers. But what if you need to handle numbers like 2^3232 or greater without loss of precision ? That can never happen but in case you will encounter this it may to surprise you. I was suprised with the problem of computing such extra huge values while solving one exotic problem. So I thing my experience may someone come in handy.

Background

Lets look at arbitrary integer written on paper: 123456 for instance. We, human beings, use special numeral symbols from set [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and don't think a lot how much of such symbols to use to write a number on a paper or desk. Actually we are limited only with available empty space to write next digit. Even in case the line is over, we boldly keep note using next line below. When we need to add/substract/multiply/divide - we use our smart electronic helpers. Nowadays we are usually look at so called scientific notation of numbers because of length of number devices can handle is limited by system architecture. Scientific notation of numbers lets us to evaluate results of operations, exponent of number for instance. But what if results of huge and extra huge numbers arithmetics must be exact and precision is not a tradeoff ? In this case we need to build some lines of code to solve the problem.

Model

First of all, recall the ancient knowledge. CPU (hereinafter x86) uses binary code in it's work. Number 123456 (in decimal numeral system) looks like 1 1110 0010 0100 0000 in binary numeral system. When this number is stored in process memory at address N, it will look as follow:

N: [0100 0000]     N + 1: [1110 0010]     N+2: [0000 0001]

We have 3 bytes. As you know, byte is minimum memory storage unit (yes, it consists of 8 bits, but you can't add bit #5 to bit#7 atomically by only instruction). So arbitrary number being stored in memory splits into bytes and more significant bytes are stored at higher memory addresses. CPU can operate numbers in memory, but atomicity of this operations is limited. 32 - bit processor can handle 32 - bit operand (64/128 bits for some cases) by one instruction. As a finite state machine CPU changes it's state accordingly to result of perfomed operation.
Now it's time to refresh in mind binary arithmetics knowledges.

Let try to add two numbers, 17 (10001) and 18 (10010). I use small numbers to be brief and to the point.

   10001 (17)
+  10010 (18)
   -------
  100011 (35)

As you can see, when adding bits #4 there was an overflow and bit #5 is come in sight. Result width was expanded. If we would limit with the 5 bits the result was 00011 with overflow. In substraction there is situation when we have to "borrow" 1 from more significant bit and it is known as underflow.  Thereby in case of splitting number in memory we can't add (substract) next parts without taking as another operand state of overflow (underflow). Processor gives us such opporunity - adc and sbb instructions from "classic" instructions set are come in handy (snippet is for demonstration purposes only):
 
ASM
mov    ecx, [Size]
xor    esi, esi
clc
lahf
_AddLoop:
mov    ebx, [Addend]
mov    dl, [ebx][esi]
mov    ebx, [AddResult]
sahf
adc    [ebx][esi], dl 
lahf
inc    esi
loop    __AddLoop

Substraction operation is implemented similarly.
Now lets talk about the multiplication and try to multiply 17 (10001) and 18 (10010):

         10010 (18)
      *  10001 (17)
         --------
         10010
        00000
       00000
      00000
    010010
-----------------
    0100110010 (306)

As you see I just decompose multiply operation into sequence of shifts and additions. We shift left first operand everytime next bit of second operand is being checked. If second operand's bit is non-zero addition is involved.
It's pretty straighforward. Result width is sum of the operands widths. The implementation of multiply operation can be found at Source code.
And the last operation is division. As usually, do it by hands, pen and on the paper:

100110001 (305)       1110 (14)
01110                10101 (21) - Quotient
-------
0010100
0001110
---------
000011001
000001110
-----------
000001011 - Remainder (11)    

Indeed, 21 * 14 + 11 = 305.

So as you can see, divide operation can be decomposed into sequence of shifts, comparings and substractions and this algorithm is not too complex. Dividend and quotient are shifting left until it's most significant part is greater than divider (I call it "to alignment heads"). When divider is less than that shifted chunk - substraction is performed and 1 is set into quotient. Remainder is the bits are remained after all shifts and substractions. Divident is copied and handled into the separate double sized buffer.

Mechanics

All this functions are in Source code implemented with common assembler instructions and are subject to improve. My approach is straighforward and there are some confines: 

  • all numbers are unsigned. It lets to avoid some difficulties with negative numbers handling. 
  • all numbers are equal sized. 
  • all numbers are DWORDs arrays. 

Ways to increase perfomance.

First and simpliest - declare functions as naked and __fastcall. It reduces overheads for prologue/epilogue code execution and passing arguments via stack. Then - use instructions that handle max sized operands. (I used instructions from ancient "classic" command set). But before do that make sure they are supported by processor. CPUID is a good approach for this. Second and more sophisticated - review algorithms control flow. In case of add and sub routines it seems there are nothing to enchance in such way, but mul and div are subjects for such analysis and improvements. For mul algorithms first of all we can reduce number of first operand shifts. We can calculate the "distance" to next non-zero bit in second operand and shift first operand by that number of bits. For instance, following code snippet is consequetly checks second operands bits for set ones and calculates such "distance":

ASM
        mov        ebx, [lpUIntR]
        xor        eax, eax                //    Bit #no for test
        mov        esi, eax                //    lpUIntR DWORD part #no
        mov        edi, eax                //    Bits count for shift buffer
__L_BitTest:
        bt        [ebx][esi * 4], eax
        jc        __L_BitIsNonZero
        inc        edi
__L_BitTestLoop:
        inc        eax
        cmp        eax, 32
        jne        __L_BitTest
        xor        eax, eax
        inc        esi
        cmp        esi, [cnSize]
        jne        __L_BitTest
        jmp        __L_MulIsDone

But such manipulation can impact perfomance in case of "dense" second operand.

The div algorithms even more prone to improvements. As you remember, before iterations starts dividend and divider must be "aligned" to each other. This startup "align" operation can be made by one "pass" (before iterations). This can improve perfomance in case of dividing numbers with near-valued exponents but not in case of dividing numbers such as in source. Another enchancement is to calculate distance to shift buffer left to align it to the divider inside the iterations loop. But the advantages of such improvement are subtle: buffer is changed by every iteration.

Anyway you will find appropriate solution experimentally in case of interest. So in Source code you will find coarse algoritms implementations.

And now let me try to impress you. Look at results of test from source:

Dividend: 3121748550315992231381597229793166305748598142664971150859156959625371738819765620120306103063491971159826931121406622895447975679288285306290175
Divider: 100000
Result: 31217485503159922313815972297931663057485981426649711508591569596253717388197656201203061030634919711598269311214066228954479756792882853062
Remainder: 90175

Exponents of handled integers are impressive as for me.

Conclusions

There is nothing impossible with assembler. It gives us all the power of system, but we must to use it with extremelly care. As in case of integer operations you can let yourself to forget about loss of precision in integer calculations. But don't forget: we are always have to pay for everything. In this case you will pay with your application perfomance.

P.s. Sorry for my poor English.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer
Ukraine Ukraine
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionDo not understand well the multiply algorithm Pin
Andrea Simonassi27-Oct-16 3:09
Andrea Simonassi27-Oct-16 3:09 
AnswerRe: Do not understand well the multiply algorithm Pin
Victor Soldatov28-Oct-16 0:02
professionalVictor Soldatov28-Oct-16 0:02 
GeneralRe: Do not understand well the multiply algorithm Pin
Andrea Simonassi28-Oct-16 1:23
Andrea Simonassi28-Oct-16 1:23 
GeneralRe: Do not understand well the multiply algorithm Pin
Victor Soldatov28-Oct-16 7:16
professionalVictor Soldatov28-Oct-16 7:16 
You are welcome.
Indeed, 12 x 34 = (12 x 4) * 10^0 + (12 x 3) * 10^1. Try to expand expression:

(10 + 2) x 4 * 10^0 + (10 + 2) x 3 * 10^1 = 10 x 4 * 10^0 + 2 x 4 * 10^0 + 10 x 3 * 10^1 + 2 x 3 * 10^1.
Grouping results to 1 x 3 * 10^2 + (1 x 4 + 2 x 3) * 10^1 + 2 x 4 * 10^0.

Actually, generic case of two numbers multiply in arbitrary numeral system is like follow:

(XnXn-1Xn-2...X0) * (YmYm-1Ym-2...Y0) = SUM(i = 0 to m)(Bi * SUM(j = 0 to n)(Xi * B^j) * B^i), where B is numeral system base value you work with, (XnXn-1Xn-2...X0) and (YmYm-1Ym-2...Y0) are notations of X and Y numbers in positional numeral system with base value of B.

As I understand you try to split both multipliers to separate digits to handle them than. In this case you can simply multiply corresponding digits atomically by CPU instructions and add partial multiplication result to result of whole operation (if numbers length is above CPU atomicity limit). You do not need split both numbers. But if you will try to group or optimize intermediate multiplications, your algorithm's impementation performance will significantly decrease in contrast with straighforward implementations and you must have really good reasons to introduce extra logic here.

You are wrong with note "12x4+12x3". Proper note is 12x4+12x3x10.

Really, numeral system base value is no matter while you work with positional numeral system with constant base value. System base value affects number's view only: 100 in decimal is 64 in hex, 144 in oct and 1100100 in binary. Number value is not changed. Length of CPU general purposes registers (CPU atomicity) is enough for most kinds of calculations. But result of mul instruction is stored into 2 registers. What if you will need to multiply this value with result of another multiplication, i.e. number in 2 registers with another same length number ? That's why I wrote this article.
GeneralRe: Do not understand well the multiply algorithm Pin
Andrea Simonassi28-Oct-16 7:57
Andrea Simonassi28-Oct-16 7:57 
GeneralRe: Do not understand well the multiply algorithm Pin
Victor Soldatov28-Oct-16 10:58
professionalVictor Soldatov28-Oct-16 10:58 
QuestionInteresting subject (5 starts) Pin
Jose A Pascoa7-Sep-16 21:08
Jose A Pascoa7-Sep-16 21:08 
AnswerRe: Interesting subject (5 starts) Pin
Victor Soldatov14-Sep-16 8:24
professionalVictor Soldatov14-Sep-16 8:24 
GeneralRe: Interesting subject (5 starts) Pin
Jose A Pascoa15-Sep-16 7:00
Jose A Pascoa15-Sep-16 7:00 
GeneralRe: Interesting subject (5 starts) Pin
Victor Soldatov15-Sep-16 23:38
professionalVictor Soldatov15-Sep-16 23:38 
GeneralRe: Interesting subject (5 starts) Pin
Jose A Pascoa17-Sep-16 22:15
Jose A Pascoa17-Sep-16 22:15 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.