## SP on a 64-bit CPU and DP on a 32-bit CPU

### SP on a 64-bit CPU and DP on a 32-bit CPU

Hi guys. I've read around but haven't seen a definitive answer as to
whether evaluating a precision-sensitive function on a 64-bit CPU is
better than doing the same computation on a 32-bit CPU (or vice versa).

I did some tests of my own, comparing the output of the simple
Fortran program below on an AMD XP (32-bit) CPU and on an AMD64 CPU. The
outputs were different. I'm using Compaq Visual Fortran on Windows XP
for the 32-bit tests and the Intel Fortran compiler on 64-bit SuSE 9.2
for the 64-bit tests.

What is responsible for the discrepancy, and would this have
ramifications for precision-sensitive computations?

Results on a 32-bit AMD XP CPU:
30 1073741822.88889
31 2147483646.88889
32 -1.11111111193895
33 -1.11111111193895
34 -1.11111111193895

Results on 64-bit AMD64 CPU:
30 1073741824.00000
31 2147483648.00000
32 -1.11111116409302
33 -1.11111116409302
34 -1.11111116409302

Program test

double precision :: a30, a31, a32, a33, a34

a30 = 2**30-1-(1./9.)
a31 = 2**31-1-(1./9.)
a32 = 2**32-1-(1./9.)
a33 = 2**33-1-(1./9.)
a34 = 2**34-1-(1./9.)

Print *, '30',a30
Print *, '31',a31
Print *, '32',a32
Print *, '33',a33
Print *, '34',a34

end program test

### SP on a 64-bit CPU and DP on a 32-bit CPU

Best thing is not to do the calculations with default reals and
integers in the first place.

e.g. compare the output of

a34=2.0D0**34-1.0D0-(1.0D0/9.0D0)

on both machines.

### SP on a 64-bit CPU and DP on a 32-bit CPU

In Fortran, literal numeric constants such as 1.0 are single
precision by default.

try the following

a30 = 2**30-1-(1.0d0/9.0d0)
a31 = 2**31-1-(1.0d0/9.0d0)
a32 = 2**32-1-(1.0d0/9.0d0)
a33 = 2**33-1-(1.0d0/9.0d0)
a34 = 2**34-1-(1.0d0/9.0d0)

and rerun.

### SP on a 64-bit CPU and DP on a 32-bit CPU

OK, the results for a34 on both machines were identical
(17179869182.8889). The output of:

a36=2.0**34-1.0-(1.0/9.0)

on the AMD64 machine was 17179869184.000, while the output on the
32-bit AMD machine was correct (unchanged).

So I guess the AMD64 machine is more prone to rounding errors than the
32-bit machine? Or is it caused by the O/S and/or compiler?

How will this affect the output of LS-DYNA finite element simulations?
(I have no access to the source code to know whether the programmers took
the precautions Joost pointed out - hoping someone who knows does.) I've
found that, unsurprisingly, rounding errors tend to accrue as the
simulation progresses.

### SP on a 64-bit CPU and DP on a 32-bit CPU

oops

I didn't spot the integer exponentiation
overflow.

replace 2**30

with 2.0d0**30

### SP on a 64-bit CPU and DP on a 32-bit CPU


x87 vs SSE?

-- g

### SP on a 64-bit CPU and DP on a 32-bit CPU

Yes, the changes make the results on both processors consistent.

Why should the 32-bit system produce correct results while the 64-bit
one doesn't? Is it the CPU, O/S or compiler?

Thanks.

### SP on a 64-bit CPU and DP on a 32-bit CPU

a30 = 2**30-1-(1./9.)
a31 = 2.0d0**30-1-(1./9.)
a32 = 2.0d0**30-1-(1.d0/9.d0)

32-bit AMD:
30 1073741822.88889
31 1073741822.88889
32 1073741822.88889

64-bit AMD:
30 1073741824.00000
31 1073741822.88889
32 1073741822.88889

### SP on a 64-bit CPU and DP on a 32-bit CPU

In article <434e8312$0$142$ XXXX@XXXXX.COM>,
"Ian Chivers" <XXXX@XXXXX.COM> writes:

Most 32-bit architectures have a 32-bit default INTEGER kind.

In file rt.f90:7

a32 = 2**32-1-(1.0d0/9.0d0)
1
Error: Arithmetic overflow at (1)
In file rt.f90:8

a33 = 2**33-1-(1.0d0/9.0d0)
1
Error: Arithmetic overflow at (1)
In file rt.f90:9

a34 = 2**34-1-(1.0d0/9.0d0)
1
Error: Arithmetic overflow at (1)

--
Steve
http://www.yqcomputer.com/~kargl/

### SP on a 64-bit CPU and DP on a 32-bit CPU

Fundamentally this question has nothing (well, almost nothing - but
that's a better approximation than the other ones) to do with the CPU.

It is a very common mistake to assume that a 64-bit CPU translates
directly into anything at all in terms of Fortran (or other
programming languages). This is a mistake for multiple reasons.

1. Fortran is implemented by a compiler. The compiler makes the relevant
implementation choices. Hardware can make some choices simpler for the
compiler, so there is certainly an effect, sometimes a pretty big
effect. But hardware does not directly dictate anything Fortran-related.
Fortran has been implemented on CPUs that had no built-in floating point
capability at all (quite a few cases of that, actually). There
are cases where compilers deliberately make choices that might not
seem obvious at first if you focused only on a specific CPU
capability. For example, it is quite common for compilers to make 32-bit
floating point the default, even on systems where you might think a
64-bit default would be more natural; the usual reason for that is
compatibility with the large volume of code written with an assumption of
32-bit defaults.

2. Even if you restrict attention to the hardware, ignoring the role of
the compiler, the distinction between 32-bit and 64-bit CPUs does *NOT*
in general have much to do with floating point precision. It has a lot
more to do with memory addressing. Descriptions like 64-bit CPU (or
64-bit system) have been largely taken over by marketing folk and, as a
result, you can't actually count on them having any specific technical
meaning. The meaning changes over the years, depending on whatever helps
sales. I personally have copies of manufacturer documentation that
refers to the same CPU as having different bit sizes. Nothing changed in
the hardware; the difference was only in the time of publication of the
documents and in the manufacturer's changed definition of the terms. If
you want actual technical details, you need to look for actual technical
specs, which do not include terms like "64-bit CPU"; today, the details
of exactly what that term means vary depending on who is doing the
selling.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

### SP on a 64-bit CPU and DP on a 32-bit CPU

No, that's not true. 1.D0 is a double precision numeric literal.

cheers,

Rich

### SP on a 64-bit CPU and DP on a 32-bit CPU

That's what I suspect. The SSE floating point unit, while it has
parallel capabilities, doesn't have the temporary real with the 64-bit
mantissa that the x87 has - intermediate values will have 11 fewer bits
if you're using double precision and 40 fewer for single precision
reals. On the compilers that accept gcc switches, you can specify -msse
or -mno-sse to enable or disable SSE respectively.

Andy

### SP on a 64-bit CPU and DP on a 32-bit CPU

While this is a plausible, even believable, explanation, I don't
think it's the right explanation. Looking at the assembly output
generated by CVF, we see:

REAL8 041E0000000000000r ; REAL8 2147483648.00000
REAL8 0BFF1C71C80000000r ; REAL8 -1.11111116409302
REAL8 041D0000000000000r ; REAL8 1073741824.00000

So it looks like the compiler is computing the values in advance,
and I doubt that SSE vs. x87 is an issue in that case because the
compiler has to run on a processor without SSE. I feel that the
difference lies with the compilers themselves, and even the (not
given) switches chosen for compilation.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

### SP on a 64-bit CPU and DP on a 32-bit CPU

As Greg appears to be expecting us to figure out, the compiler the OP
used for the 32-bit case supports only x87 code, where all expressions
are evaluated in double precision. Compilers for x86-64 and Windows x64
default to SSE/SSE2 code, where there is no automatic promotion of
expressions to double precision. It's more a question of the compilers
and options specified, which the OP didn't see fit to disclose, than the
32- vs. 64-bitness of the hardware or the OS.

### SP on a 64-bit CPU and DP on a 32-bit CPU

This is almost correct, except that the OP used a Windows compiler,
which initializes the x87 precision mode to 53 bits. So there is no
extra precision beyond standard double precision, except for values
outside the exponent range of standard double.