Floating Point Multiplication - 32 and 64 bit differences

Post by brady_fh » Thu, 20 May 2010 00:39:29


Hello,

I need some help understanding floating point operations. I am seeing
a small difference in a program when it is compiled for 32 and 64 bit
machines. Here is the program:

--------------------------------------------------------
PROGRAM DPStudy
IMPLICIT NONE
REAL(8) :: FVF
REAL(8) :: THR
REAL(8) :: Dum1, Dum2

FVF = 3.3D0
THR = 3.0D0

Dum1 = FVF * THR * DSQRT(THR)
Dum2 = FVF * DSQRT(THR) * THR

1 FORMAT(A, ES28.22)
WRITE(*,*) ''
WRITE(*,1) 'Dum1 = ', Dum1
WRITE(*,1) 'Dum2 = ', Dum2
WRITE(*,*) ''
STOP
ENDPROGRAM DPStudy
--------------------------------------------------------

Pretty simple. I am using gfortran 4.3.3 on Ubuntu 9.04. The following
shows the different compiler options I use and the resulting output
from the above program:


--------------------------------------------------------
COMPILER:
gfortran -m32 -O0 main.f90

OUTPUT:
Dum1 = 1.7147302994931884256857E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m64 -O0 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m32 -O1 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m64 -O1 main.f90

OUTPUT:
Dum1 = 1.7147302994931880704144E+01
Dum2 = 1.7147302994931884256857E+01
--------------------------------------------------------


So it appears the order of multiplication yields a different result
which doesn't surprise me all that much. However, it looks like when
optimization is turned off for the 32 bit case the compiler re-orders
the order of multiplication. I tried going through the gfortran
documentation to understand what happens when -O1 is used but I
couldn't find a single flag that triggers this behavior. Can anybody
help me understand what is going on here?

For what it's worth - similar behavior was observed using the Intel
Fortran Compiler 11.1 for Windows.

Thank you!
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by robi » Thu, 20 May 2010 01:24:31


| Hello,
|
| I need some help understanding floating point operations. I am seeing
| a small difference in a program when it is compiled for 32 and 64 bit
| machines. Here is the program:

In the first place you specified a number kind (viz. 8).
That is implementation dependent and hence non-portable.
Whether it means the same to both compilers, I don't know,
but I suspect that it doesn't.

In the second place, Runs #2-4 have given you
two results that agree to 16 significant digits,
which is what I would expect for 64-bit float precision.

As for the slightly differing results in run #2 --
further investigation is suggested. Have you tried,
for example, substituting SQRT for DSQRT?

| --------------------------------------------------------
| PROGRAM DPStudy
| IMPLICIT NONE
| REAL(8) :: FVF
| REAL(8) :: THR
| REAL(8) :: Dum1, Dum2
|
| FVF = 3.3D0
| THR = 3.0D0
|
| Dum1 = FVF * THR * DSQRT(THR)
| Dum2 = FVF * DSQRT(THR) * THR
|
| 1 FORMAT(A, ES28.22)
| WRITE(*,*) ''
| WRITE(*,1) 'Dum1 = ', Dum1
| WRITE(*,1) 'Dum2 = ', Dum2
| WRITE(*,*) ''
| STOP
| ENDPROGRAM DPStudy
| --------------------------------------------------------
|
| Pretty simple. I am using gfortran 4.3.3 on Ubuntu 9.04. The following
| shows the different compiler options I use and the resulting output
| from the above program:
|
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m32 -O0 main.f90
|
| OUTPUT:
| Dum1 = 1.7147302994931884256857E+01
| Dum2 = 1.7147302994931884256857E+01
| --------------------------------------------------------
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m64 -O0 main.f90
|
| OUTPUT:
| Dum1 = 1.7147302994931880704144E+01
| Dum2 = 1.7147302994931884256857E+01
| --------------------------------------------------------
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m32 -O1 main.f90
|
| OUTPUT:
| Dum1 = 1.7147302994931880704144E+01
| Dum2 = 1.7147302994931884256857E+01
| --------------------------------------------------------
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m64 -O1 main.f90
|
| OUTPUT:
| Dum1 = 1.7147302994931880704144E+01
| Dum2 = 1.7147302994931884256857E+01
| --------------------------------------------------------
|
|
| So it appears the order of multiplication yields a different result
| which doesn't surprise me all that much. However, it looks like when
| optimization is turned off for the 32 bit case the compiler re-orders
| the order of multiplication. I tried going through the gfortran
| documentation to understand what happens when -O1 is used but I
| couldn't find a single flag that triggers this behavior. Can anybody
| help me understand what is going on here?
|
| For what it's worth - similar behavior was observed using the Intel
| Fortran Compiler 11.1 for Windows.
|
| Thank you!

 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by Gordon San » Thu, 20 May 2010 02:23:05

On 2010-05-18 12:39:29 -0300, brady_fht < XXXX@XXXXX.COM > said:


What is a 32 or 64 bit machine? Usually that refers to the length
of memory addresses and has nothing to do with arithmetic.


Without parens, what order do you think is "correct"? Can you justify
your claim? Have you tried it with explicit parens?


Most flags controlling low-level compilation issues for the GNU compilers
are handled by the back end and are not language specific. Look up GCC.
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by glen herrm » Thu, 20 May 2010 02:23:29


On most machines now in use, double precision floating point
has a 53 bit significand, for 53*log10(2.) or about 16 digits.
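
Glen's figure is easy to check directly. A quick sketch in Python rather than Fortran, since CPython floats are IEEE doubles on the platforms discussed here:

```python
import math
import sys

# IEEE double precision carries a 53-bit significand (52 stored bits
# plus the implicit leading 1).
mant_bits = sys.float_info.mant_dig          # 53 on IEEE-754 platforms
decimal_digits = mant_bits * math.log10(2)   # about 15.95 decimal digits

print(mant_bits)
print(round(decimal_digits, 2))
```

So roughly 16 significant decimal digits is the most one can expect from a double, which matches the agreement seen in runs #2-4.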

(snip)


(snip)

The use of constant KIND values is non-portable, but most
likely in this case gives the double precision you expect.

Many compilers optimize constant expressions, or expressions
that they can determine are constant, by computing the value
at compile time. Often that differs from the result that
would be obtained if the expressions were not constant.

You might try again, with

READ(*,*) FVF, THR

and then typing in the appropriate values.


(snip of gfortran -m32)



As I said, it is possible that the compiler is doing the
calculation at compile time, more likely with -O1.

In any case, compile with the -S option and post the
resulting .s file. (I would remove the unneeded write
statements first.)

(snip of additional tests with the same result.)


There are a few possibilities. One, though not as likely as it
used to be, is the use of the x87 floating point instructions, which
compute to 64 significant bits and then round the result to 53.
That is often one LSB different from multiplying the 53-bit values
directly. On machines not built by Cray, floating point multiplication
is usually commutative, but not always associative.
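
The non-associativity is easy to reproduce with plain IEEE doubles. A Python sketch (evaluated left to right like the Fortran expressions, with no x87 excess precision, i.e. the -m64 -O0 case):

```python
import math

fvf = 3.3
thr = 3.0

# The same two expressions as the original program, evaluated
# left to right in pure double precision.
dum1 = fvf * thr * math.sqrt(thr)   # (fvf * thr) * sqrt(thr)
dum2 = fvf * math.sqrt(thr) * thr   # (fvf * sqrt(thr)) * thr

print(repr(dum1))
print(repr(dum2))
print(abs(dum1 - dum2))  # tiny, on the order of one ULP of 17.147...
```

The two results differ in the last bit or so, exactly as the -m64 -O0 run shows.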


The usual rule is that you should not depend on it being better
than that. The differences you see are in the least significant
bit, or close to it, for a 53 bit significand.

-- glen
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by Tobias Bur » Thu, 20 May 2010 02:41:45


By default GCC on x86 (ia32, i686, ...) uses the "x87" floating-point
coprocessor (-mfpmath=387), while on x86-64 (AMD64, Intel64, EM64T,
...) SSE is used by default. While SSE rounds each operation properly,
the x87 unit carries excess precision in intermediate results. One way
to force the rounding is to store the intermediate result in memory
(as opposed to keeping it in the x87 registers); in GCC/gfortran you
can use -ffloat-store to force this.*
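
The double-rounding hazard behind this can be sketched exactly with Python's Fraction type. The value below is a constructed example (not the one from the thread), chosen so that rounding through a 64-bit x87 significand first gives a different double than rounding once:

```python
from fractions import Fraction

def round_to_bits(x, bits):
    """Round a positive Fraction to the nearest value with a
    `bits`-bit significand, ties to even (IEEE round-to-nearest)."""
    m, e = x, 0
    while m >= 2 ** bits:
        m /= 2
        e += 1
    while m < 2 ** (bits - 1):
        m *= 2
        e -= 1
    # Python's round() on a Fraction rounds ties to even.
    return Fraction(round(m)) * Fraction(2) ** e

# Constructed value: 1 + 2^-53 + 2^-65, just above a 53-bit rounding midpoint.
v = Fraction(1) + Fraction(1, 2**53) + Fraction(1, 2**65)

direct = round_to_bits(v, 53)                      # one rounding, straight to double
via_x87 = round_to_bits(round_to_bits(v, 64), 53)  # extended (64-bit) first, then double

print(direct == via_x87)  # False: the two paths differ by one ULP
```

Rounding to the 64-bit significand first discards the 2^-65 tail, turning the second rounding into an exact tie that goes the other way. This is the kind of one-LSB discrepancy -ffloat-store suppresses.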

When you use -O1 (or higher), the compiler plugs in the values of the
variables and calculates the result at compile time (with proper rounding).

If you want to see what the compiler does, you can use the
-fdump-tree-all option, which dumps a (simplified, C-syntax-like)
internal representation. The "original" and "optimized" are probably
most interesting.


(Newer) Intel compilers default to SSE.

* = GCC 4.5 and 4.6 also offer -fexcess-precision=, but I think this
option is not yet supported by GCC's Fortran front end. Additionally,
GCC 4.5 and 4.6 now have a compiler compile-time configuration option
that enables SSE by default (--with-fpmath=sse).
For details, see http://www.yqcomputer.com/

* * *

General remark: Don't look too closely at the last digits of
floating-point calculations, and do not assume that changing the order
of operations won't change the result. I would also suggest reading
the Goldberg paper.

a) Goldberg paper:
http://www.yqcomputer.com/ +What+Every+Computer+Scientist+Should+Know+About+Floating-Point+Arithmetic

b) Monniaux paper:
http://www.yqcomputer.com/

c) Semantics of Floating Point Math in GCC
http://www.yqcomputer.com/

d) Note on the x87 coprocessor (for GCC)
http://www.yqcomputer.com/
http://www.yqcomputer.com/

e) IEEE 754:2008 "IEEE Standard for Floating-Point Arithmetic"
http://www.yqcomputer.com/
http://www.yqcomputer.com/ (last
publicly available draft I found)

Tobias
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by brady_fh » Thu, 20 May 2010 02:45:06

Okay - I have changed the program slightly (Real declarations, added
parentheses, and eliminated one variable):

--------------------------------------------------
PROGRAM DPStudy
IMPLICIT NONE
REAL(KIND=KIND(0.D0)) :: FVF
REAL(KIND=KIND(0.D0)) :: THR
REAL(KIND=KIND(0.D0)) :: Dum1

FVF = 3.3D0
THR = 3.0D0

Dum1 = (FVF * THR) * DSQRT(THR)

1 FORMAT(A, ES28.22)
WRITE(*,1) 'Dum1 = ', Dum1
STOP
ENDPROGRAM DPStudy
--------------------------------------------------

Now I want to focus on only two scenarios:

--------------------------------------------------------
COMPILER:
gfortran -m32 -O0 main.f90
OUTPUT:
Dum1 = 1.7147302994931884256857E+01
--------------------------------------------------------

--------------------------------------------------------
COMPILER:
gfortran -m64 -O0 main.f90
OUTPUT:
Dum1 = 1.7147302994931880704144E+01
--------------------------------------------------------


So, neglecting optimization the 32 and 64 bit versions of this program
yield a different result. And the program now uses parentheses to
force the order of multiplication.

Also, here is the binary representation (all 64 bits) of each result
(1st line is 32 bit program result, and 2nd line is the 64 bit program
result):

0110101110101011100101000110010110101101101001001000110000000010
1010101110101011100101000110010110101101101001001000110000000010

So the only two bits that are different are the two least significant
bits.
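
One portable way to inspect those bits is via the raw IEEE-754 encoding. A Python sketch, using the two decimal literals printed above (22 digits is more than enough to uniquely identify each double):

```python
import struct

def double_bits(x):
    """Raw IEEE-754 bit pattern of a Python float, as a 64-bit integer."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

v32 = 1.7147302994931884256857E+01   # -m32 -O0 result from the thread
v64 = 1.7147302994931880704144E+01   # -m64 -O0 result from the thread

print(f"{double_bits(v32):064b}")
print(f"{double_bits(v64):064b}")
print(double_bits(v32) - double_bits(v64))  # 1: the two doubles are one ULP apart
```

The bit patterns show the two results are adjacent doubles, i.e. exactly one ULP apart.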

I guess I am just confused as to how this number is being computed
differently. Maybe I'm missing something obvious here or maybe
somebody is already slapping me in the face with the answer, but I
can't wrap my head around it.

Thanks!
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by brady_fh » Thu, 20 May 2010 02:49:56


Thanks Tobias! A lot of good info here that I will dive into. The
"-ffloat-store" flag did the trick and your explanation makes sense.

Just FYI - I couldn't get the Intel compiler to give the same behavior
using the program I posted but I have a much larger program that was
exhibiting similar behavior and turning optimization on eliminated the
differences in results. I will have to look at it closer.
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by Tim Princ » Thu, 20 May 2010 03:04:18


When you use ifort, by default the x87 precision mode is set to 53 bits,
so you won't see as many differences between SSE2 and x87 double
precision as you would with gfortran. Single-precision x87
expressions are still promoted to double implicitly.
You may also be interested in the list of options which have to be set
to make ifort comply with standards:
http://www.yqcomputer.com/


--
Tim Prince
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by glen herrm » Thu, 20 May 2010 03:19:41


(snip)



Note, though, that if your program is sensitive to the difference
in results in the last bit of a product, then you should fix the
program.

If you are trying to compare different versions of a program,
then the comparison should be such that errors in the LSB are
tolerated.

-- glen
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by robi » Thu, 20 May 2010 12:18:47


| Okay - I have changed the program slightly (Real declarations, added
| parentheses, and eliminated one variable):

But you haven't changed DSQRT to SQRT.
Also, you didn't read in the variables.
Have you tried examining the bits of 3.3 in both cases?

| --------------------------------------------------
| PROGRAM DPStudy
| IMPLICIT NONE
| REAL(KIND=KIND(0.D0)) :: FVF
| REAL(KIND=KIND(0.D0)) :: THR
| REAL(KIND=KIND(0.D0)) :: Dum1
|
| FVF = 3.3D0
| THR = 3.0D0
|
| Dum1 = (FVF * THR) * DSQRT(THR)
|
| 1 FORMAT(A, ES28.22)
| WRITE(*,1) 'Dum1 = ', Dum1
| STOP
| ENDPROGRAM DPStudy
| --------------------------------------------------
|
| Now I want to focus on only two scenarios:
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m32 -O0 main.f90
| OUTPUT:
| Dum1 = 1.7147302994931884256857E+01
| --------------------------------------------------------
|
| --------------------------------------------------------
| COMPILER:
| gfortran -m64 -O0 main.f90
| OUTPUT:
| Dum1 = 1.7147302994931880704144E+01
| --------------------------------------------------------
|
|
| So, neglecting optimization the 32 and 64 bit versions of this program
| yield a different result. And the program now uses parentheses to
| force the order of multiplication.
|
| Also, here is the binary representation (all 64 bits) of each result
| (1st line is 32 bit program result, and 2nd line is the 64 bit program
| result):
|
| 0110101110101011100101000110010110101101101001001000110000000010
| 1010101110101011100101000110010110101101101001001000110000000010
|
| So the only two bits that are different are the two least significant
| bits.
|
| I guess I am just confused as to how this number is being computed
| differently. Maybe I'm missing something obvious here or maybe
| somebody is already slapping me in the face with the answer, but I
| can't wrap my head around it.
|
| Thanks!
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by Eli Oshero » Fri, 28 May 2010 20:26:00


> REAL(KIND=KIND(0.D0)) :: FVF
> REAL(KIND=KIND(0.D0)) :: THR
> REAL(KIND=KIND(0.D0)) :: Dum1
>
> FVF = 3.3D0
> THR = 3.0D0
>
> Dum1 = (FVF * THR) * DSQRT(THR)
>
> 1 FORMAT(A, ES28.22)
> WRITE(*,1) 'Dum1 = ', Dum1
> STOP
> ENDPROGRAM DPStudy
> --------------------------------------------------
>
> Now I want to focus on only two scenarios:
>
> --------------------------------------------------------
> COMPILER:
> gfortran -m32 -O0 main.f90
> OUTPUT:
> Dum1 = 1.7147302994931884256857E+01
> --------------------------------------------------------
>
> --------------------------------------------------------
> COMPILER:
> gfortran -m64 -O0 main.f90
> OUTPUT:
> Dum1 = 1.7147302994931880704144E+01
> --------------------------------------------------------
>
> So, neglecting optimization the 32 and 64 bit versions of this program
> yield a different result. And the program now uses parentheses to
> force the order of multiplication.
>
> Also, here is the binary representation (all 64 bits) of each result
> (1st line is 32 bit program result, and 2nd line is the 64 bit program
> result):
>
> 0110101110101011100101000110010110101101101001001000110000000010
> 1010101110101011100101000110010110101101101001001000110000000010
>
> So the only two bits that are different are the two least significant
> bits.
>
> I guess I am just confused as to how this number is being computed
> differently. Maybe I'm missing something obvious here or maybe
> somebody is already slapping me in the face with the answer, but I
> can't wrap my head around it.
>
> Thanks!

As somebody suggested earlier, use the -S flag to see the assembly code.
I bet you will discover that -m32 causes gfortran to use x87
instructions while -m64 results in SSE code.
That explains the difference.
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by mecej » Fri, 28 May 2010 21:06:11


You should consider changing your viewpoint. We know that 64 bit IEEE reals
have 53 bits of precision, which is approximately 15 to 16 decimal digits
(53*log10(2)). Your two "different" results agree to 16 digits. To me, they
_are_ _the_ _same_!

Digits beyond the 16th are trash (artefacts of how the numbers are processed
on a specific CPU with a specific compiler with a specific set of options).
Once you condition yourself to regarding these numbers as equal, the
futility of printing more than 17 significant digits will be apparent. The
seventeenth (or earlier digits, depending on the algorithm) may also be
junk, but seeing them differ on different processors may be
comforting.
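
This point can be checked mechanically. A Python sketch using the two literals from the thread:

```python
v32 = 1.7147302994931884256857E+01   # -m32 result from the thread
v64 = 1.7147302994931880704144E+01   # -m64 result from the thread

# Distinct doubles...
print(v32 == v64)                      # False

# ...yet identical once printed to 16 significant digits.
print(f"{v32:.15e}")
print(f"{v64:.15e}")
print(f"{v32:.15e}" == f"{v64:.15e}")  # True
```

Everything past the 16th significant digit is exactly the "trash" being described.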

-- mecej4
 
 
 

Floating Point Multiplication - 32 and 64 bit differences

Post by Eli Oshero » Fri, 28 May 2010 22:19:52


I think the OP demonstrated that those 53 bits are not the same (the
last two being different).
I believe that -m32 and -m64 flags result in different code: x87 and
SSE respectively.