The article describes how to build a numeric library that performs calculations with quadruple floating-point precision and how to access the library from MSVC C/C++ code. The technique is illustrated by an example. This example demonstrates a dramatic increase in precision of the calculation compared to those performed with thestandard double precision.
Most of existing commercial compilers support the double precision calculation as the highest precision possible. This is due to the hardware restrictions of CPU. Extending the precision demands to emulate the precise calculations on the existing hardware thus leading to vast increase of the computation time.
The multi-precision libraries such as GNU MPFR (http://www.mpfr.org/) can perform the calculations with arbitrary precision. However, the usage of these libraries requires the user to rewrite the existing C or FORTRAN code in large extent. The quadruple floating point method is a good compromise between the double precision and the multi-precision calculations since it does not require the rewriting of the existing code (assuming it is supported by the compiler).
It is suggested to implement the quadruple floating-point calculations in C/C++ or FORTRAN dynamic or static library. The advantages of the precision increase is shown by the example of the polynomial roots calculation based on the LAPACK library usage. For this purpose the LAPACK library is compiled with the quadruple floating-point precision.
Floating-point number formats and variable types
The floating point numbers are presented in the format defined by IEEE Standard for Floating-Point Arithmetic (IEEE 754), see Wiki (https://en.wikipedia.org/wiki/IEEE_754). A floating-point number is presented as following:
| sign bit (S) |exponent bits (exp) | mantissa bits (man) |
The presented number is (-1)S * 1.mantissa * 2(exp) where 'mantissa' consists of binary digits and 'exp' is considered as a singed number.
We shall examine the following basic formats:
|Format||Description||Mantissa bits||Exponent bits||Decimal digits||Decimal exponent range|
The corresponding variable types in C++/FORTRAN are shown in the following table:
__float128 (GNU quadmath)
|FORTRAN (real)||REAL*4||REAL*8||REAL*16|| |
|FORTRAN (complex)||COMPLEX*8||COMPLEX*16||COMPLEX*32|| |
The popular compilers that support the quadruple floating point emulation on x86 and x86-64 CPU are Intel C/C++ and Fortran and GNU GCC and GFORTRAN with quadlib library. The Microsoft VC compiler does not support this type and is not seem to support it in the nearest future!
LAPACK library quadruple floating-point compilation
The LAPACK library is a standard numerical package (http://www.netlib.org/lapack/) that implements the linear algebra algorithms. The library is used in this article to calculate the roots of a polynomial. The library is FORTRAN written but is has it's C-port CLAPACK. The standard version of this library provides only the double precision functionality. The lack of higher precision implementation is most likely due to the fact that the double precision arithmetic is enough for most applications.
The two ways of the LAPACK modification for quadruple precision are proposed:
- Modification of LAPACK FORTRAN source
- Modification of CLAPACK C source
In the first case type declarations for variables
REAL*8, REAL, DOUBLE PRECISION, COMPLEX should be replaced by
If the FORTRAN90 standard is used, the types should be replaced by
real (kind=selected_real_kind (32)) and
complex (kind=selected_real_kind (32)) correspondingly.
COMPLEX*16 versions of in-build functions by their
COMPLEX*32versions. This procedure yet seems to require substantial modification of the code but it is rather straightforward.
In the second case, the include file f2c.h should be modified in order to redefine the basic types. The prototypes of the standard functions also should be replaced by their quadruple versions.
Polynomial roots calculation
The polynomial roots are calculated using the method of companion matrix. The companion matrix eigenvalues are the roots of the original polynomial. The companion matrix is calculated from the polynomial coefficients and the matrix eigenvalues are calculated using LAPACK function
Using the Code
Here is a brief description of how to use the article for coding.
The example code consists of the following blocks:
test_qroot.exe : Win32 executable, build by MSVC
qroot.dll : DLL, quadruple precision utilities,
build by GNU C++ with QuadMath or Intel C/C++ or Intel Visual Fortran
lapack_qd_w32.dll : DLL, quadruple precision LAPACK library,
build by GNU C++ with QuadMath or Intel C/C++ or Intel Visual Fortran
The quadruple floating-point type in qroot.dll is declared as follows
typedef _Quad real;
typedef _Quad doublereal;
typedef __float128 real;
typedef __float128 doublereal;
The prototypes of exported functions in qroot.dll are defined as follows:
__declspec(dllexport) void qroots(doublecomplex *r, doublecomplex *p, long *degP);
The same types in the caller application are declared as follows
typedef struct CMyQuad MyQuad;
typedef MyQuad* PQuad;
typedef struct CMyCQuad MyCQuad;
typedef MyCQuad* PCQuad;
The prototypes of the same imported functions in test_qroot.exe are defined as:
void qroots(PCQuad r, PCQuad p, long *degP);
It is critical here to preserve the same type size! The functions parameters should be passed by pointer, this also preserves the compatibility with FORTRAN.
Now, you must set addition compilation options.
For Intel C/C++ compiler the options are below. They enable the usage of the
For Intel Visual Fortran compiler the options as following. They interpret default real, complex, double precision, and double complex variable types and constants as 64-bit real, 128-bit complex, 128-bit real, and 256-bit complex. Note that the definite types should be replaced manually. Also it is recommended to increase the allowed line length for the LAPACK compilation.
/real-size:128 /Qimf-precision:high /extend_source:132
For the GNU C++ compiler you should only command to link with QuadMath library:
For the GNU Gfortran compiler the options are:
The intermediate results of
PCQuad type may be passed to the other utility functions. Finally a utility function may convert the results to standard
A tip: If DLLs are built by GCC, then when linking the example by MSVC set /SAFESEH:NO in Advanced MSVC Linker Options
The calculation with standard double precision is performed in MATLAB by the following script:
% Create polynom P(x) = ((x-x00)(x-conj(x00)) )^9
% Calculate the roots and compare with the original root x00.
ar = 0.00001;
ai = 10;
x0 = ar + ai*j;
x1 = ar - ai*j;
va = [1, -2*ar,ar^2+ai^2];
vb = va;
vb2 = conv(va,va);
vb4 = conv(vb2,vb2);
vb8 = conv(vb4,vb4);
vb9 = conv(vb8,va);
vb9r = real(vb9);
aa = vb9r;
xx = roots( aa);
yy(cnt) = polyval(aa,xx(cnt));
The results are
% 0.112215172144653 +10.131442757995252i
% 0.112215172144653 -10.131442757995252i
% 0.000010235951691 +10.173357532550744i
% 0.000010235951691 -10.173357532550744i
% -0.112194813184584 +10.131443062600157i
% -0.112194813184584 -10.131443062600157i
% -0.169232753684938 +10.027588249761465i
% -0.169232753684938 -10.027588249761465i
% 0.169252831553145 +10.027587788028235i
% 0.169252831553145 -10.027587788028235i
% -0.146212177557002 + 9.913350066605950i
% -0.146212177557002 - 9.913350066605950i
% 0.146231942318667 + 9.913349664139163i
% 0.146231942318667 - 9.913349664139163i
% -0.057077609506728 + 9.840940518344002i
% -0.057077609506728 - 9.840940518344002i
% 0.057097171965100 + 9.840940359975059i
% 0.057097171965100 - 9.840940359975059i
% 1.0e+004 *
% -3.584000000000000 + 0.515000000000000i
% -3.584000000000000 - 0.515000000000000i
% -7.155200000000000 + 0.000062243652344i
% -7.155200000000000 - 0.000062243652344i
% -5.324800000000000 - 0.705800000000000i
% -5.324800000000000 + 0.705800000000000i
% -4.300800000000000 - 1.023400000000000i
% -4.300800000000000 + 1.023400000000000i
% -4.838400000000000 + 0.984000000000000i
% -4.838400000000000 - 0.984000000000000i
% -6.412800000000000 - 1.089000000000000i
% -6.412800000000000 + 1.089000000000000i
% -6.464000000000000 + 1.140400000000000i
% -6.464000000000000 - 1.140400000000000i
% -3.520000000000000 - 0.276900000000000i
% -3.520000000000000 + 0.276900000000000i
% -5.030400000000000 + 0.330800000000000i
% -5.030400000000000 - 0.330800000000000i
The same calculations in the present example give roots
and there evaluation by the original polynomial P(roots):
We can see that absolute root error of 0.16 decreased to 0.002; the polynomial evaluation of 20000 decreased to 1e-14.
Points of Interest
Before I chose the described technique, I tried the multi-precision LAPACK library (MLAPACK). I did not succeed since function
zgeevx was not implemented and only some functions that deals with symmetrical matrices worked.
I also tried to use new object-oriented FORTRAN standards and incapsulate standart types by the implementation that uses the MFPR library. I also failed due to incorrect functionality of destructors in GNU GFORTRAN implementation.
It turned out that Intel Compiler has an advantage over MSVC by supporting the
_Quad variable type. Moreover, the numeric programming in Intel Visual Fortran is much simpler than that of Intel C/C++. It supports debugging, viewing and printing the quadruple variables while C++ environment has no these debugging features. The correct code high-lighting for
_Quad type variables is missing also.
But the main force of FORTRAN is direct operations on the complex numbers and working with matrices and arrays. It seems to be a great heritage of the nostalgic DEC compiler but the most of developers barf from FORTRAN now :).
And if you do not want to pay for a commercial compiler, use GNU C/C++/GFORTRAN.
P.S. It is interesting that the precision of the computation of multiple
polynomial roots may be increased with double precision calculations using
the algebraic technique developed by ZHONGGANG ZEN and implemented
in his MultRoot package:
 ZHONGGANG ZENG, "COMPUTING MULTIPLE ROOTS OF INEXACT POLYNOMIALS",
MATHEMATICS OF COMPUTATION,
Volume 74, Number 250, Pages 869-903
Article electronically published on July 22, 2004
 ZHONGGANG ZENG, "MultRoot - A Matlab package computing polynomial roots and multiplicities"
Journal ACM Transactions on Mathematical Software (TOMS), Volume 30 Issue 2, June 2004 , Pages 218-236
Software Engineer and Control Systems Expert at Digital Feedback Technologies Ltd.
PhD, Tel-Aviv University, 2009
M.S. , Faculty of Mechanics and Mathematics of Lomonosov Moscow State University,1994
Blog page: http://csafonov.blogspot.co.il/
Personal Open-source project: TMC Compiler, MATLAB to C code converter (https://sourceforge.net/projects/tmc-m-files-to-c-compiler)