Dealing with very small numbers in C++

Question

I'm working with code that uses very small numbers, on the order of 10^-15 to 10^-25. I tried double and long double, but I get wrong answers: either 0.000000000000000000001 is rounded to 0, or a number like 0.00000000000000002 is represented as 0.00000000000000001999999999999. Even a small fraction such as 1/1000000 makes a significant difference in my final answers, so please suggest an appropriate fix. Thank you.

What I have tried:

C++

#include <cmath>
#include <iomanip>
#include <iostream>
using namespace std;

int main()
{
    double sum, a, b, c, d;
    a = 1;
    b = 1 * pow(10, -15);
    c = 2 * pow(10, -14);
    d = 3 * pow(10, -14);
    sum = a + b + c + d;

    // Print 30 digits after the decimal point in fixed notation.
    cout << fixed;
    cout << setprecision(30);
    cout << " a   : " << a << endl << " b   : " << b << endl << " c   : " << c << endl
         << " d   : " << d << endl;
    cout << " sum : " << sum << endl << endl;

    // Normalize each term by the sum, then add the terms up again.
    a = a / sum;
    b = b / sum;
    c = c / sum;
    d = d / sum;
    sum = a + b + c + d;
    cout << " a   : " << a << endl << " b   : " << b << endl << " c   : " << c << endl
         << " d   : " << d << endl;
    cout << " sum2: " << sum << endl;
    return 0;
}

The expected output is:
a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000030000000000000000
sum : 1.000000000000051000000000000000

a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000030000000000000000
sum1: 1.000000000000051000000000000000

But the output I get is:
a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000029999999999999998
sum : 1.000000000000051100000000000000

a : 0.999999999999998787999878998887
b : 0.000000000000000999999997897899
c : 0.000000000000019999999999999458
d : 0.000000000000029999999999996589
sum1: 0.999999999999989000000000000000

I tried double, long double, and even boost_dec_float, but the output I get is similar.

Posted 9-Jan-17 23:44pm

Member 12942893

Updated 10-Jan-17 0:41am

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Jochen Arndt · Answer 1 · 2017-01-10T00:41:00

Most decimal fractions cannot be represented exactly in binary floating point. As a result, the stored values differ slightly from the intended values. Double precision carries about 16 significant decimal digits (15 are always reliable, and 17 are needed to identify a stored value uniquely). Counting from the first non-zero digit, any digits more than about 16 positions to the right are not relevant: when printed, they only reflect the binary approximation that is actually stored, not the number you wrote.

When you perform operations on floating point values, further errors are introduced. For example, if you add a small value to a much larger one, the precision of the result is determined by the larger one:

  1.000 000 000 000 000 000
+ 0.000 000 000 000 001 xxx yyy
= 1.000 000 000 000 001 zzz

The digits marked with x and y in the above example are lost, and the digits marked with z are not meaningful (they happen to be zero in the above example).

Performing more operations on the result increases the errors further. That is what you are seeing.

If the errors are too large for your purpose, the only solution is a more precise number format. long double may help, but you should check whether it actually offers more precision on your platform. Microsoft Visual C++, for example, treats long double as a plain double.

You should also know about (and probably use) the scientific format for floating point numbers.

You can use it, for example, to replace the pow() calls:

//b=1*pow(10,-15);
//c=2*pow(10,-14);
//d=3*pow(10,-14);
b = 1e-15;
c = 2e-14;
d = 3e-14;

It can also be used when printing values with the printf function. Such output is often more readable than a long run of leading or trailing zeroes. The G format is especially useful (it switches to the scientific format only for very small and very large numbers):

C++

printf("d: %.16G\n", d);

In the above example the precision is limited to 16 significant digits, so that non-relevant digits are not printed.

So try the above formatting in your program. If the results are then as expected (non-relevant digits are no longer printed and the output is rounded instead), all is OK.

CPallini · Answer 2 · 2017-01-10T00:19:00

Have a look at Boost's float128 (Boost 1.63.0).