Author Topic: Max / Min ranges of float, double and extended data types  (Read 2801 times)

Arnold

  • Guest
Max / Min ranges of float, double and extended data types
« on: October 07, 2018, 04:42:00 AM »
Hello,

With the help of Charles, I was able to determine the max/min representations of the float/single, double and extended data types, which I compared with float.h. I was not able to get a representation of long doubles in Windows using tcc or gcc. In particular, I do not know whether a more precise value for the minimum positive extended value is possible with Oxygen. But probably nobody will use a value with 4000 zeros following the decimal point.

Roland

Code: OxygenBasic
' Max limits of representation for float/single, double, extended
' comparing with TCC/float.h and
' //http://www.naic.edu/~phil/hardware/vertex/sharemegsvertex/lcu/pcr/c600/include/FLOAT.H

$ filename "RangeLimits.exe"

'uses rtl32
'uses rtl64

uses console

SetConsoleTitle("Max Representation of Float, Double, Extended")
string fmt(int v){ return right(hex(v,8),8) }

float f, double d, extended e
int  *i, quad *q

'FLT_MAX        3.402823466e+38F        /* max value */
f=2^127*1.9999999
@i=@f
printl "Max  float   =  " f  tab "hex: " hex(i,8)
f=-f
printl "Max -float   = " f  tab "hex: " right(hex(i),8)
'FLT_MIN        1.175494351e-38F        /* min positive value */
f=2^-126
printl "Min positive =  " f  tab "hex: " hex(i,8)
printl

'DBL_MAX        1.7976931348623158e+308 /* max value */
d=2^1023*1.9999999999999998
@q=@d
printl "Max  double   =  " d  tab "hex: " hex(q)
d=-d
printl "Max -double   = " d  tab "hex: " hex(q)
'DBL_MIN        2.2250738585072014e-308 /* min positive value */
d=2^-1022
printl "Min positive  =  " d  tab "hex: " hex(q,16)
printl

'LDBL_MAX       1.189731495357231765e+4932L /* max value */  'a bit different
'note: the largest finite 80-bit value is (2 - 2^-63) * 2^16383, so 2^16384 lies just above the finite range
e=2^16384
@i=@e
printl "Max  extended =   " e  tab "hex: " fmt(i[3]and 0xffff) "  " fmt(i[2]) "  " fmt(i[1])
e=-e
printl "Max -extended =  " e  tab "hex: " fmt(i[3]and 0xffff) "  " fmt(i[2]) "  " fmt(i[1])
'LDBL_MIN       3.3621031431120935063e-4932L /* min positive value */  'quite different
'note: LDBL_MIN (the smallest normalized value) is 2^-16382
e=2^-16330
printl "Min positive  =   " e  tab "hex: " fmt(i[3]and 0xffff) "  " fmt(i[2]) "  " fmt(i[1])
printl

printl "Enter ..." : waitkey
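For anyone who wants to cross-check these limits outside OxygenBasic, here is a minimal Python sketch. It is not the code from the post: the helper names `float_bits` and `double_bits` are made up for illustration, and the extended values are computed with exact integers and fractions because Python has no 80-bit type. It assumes the usual x87 layout of a 64-bit significand with an explicit integer bit and a 15-bit exponent.

```python
import struct
import sys
from fractions import Fraction


def float_bits(x):
    """Hex string of the IEEE 754 single-precision encoding of x (illustrative helper)."""
    return struct.pack(">f", x).hex()


def double_bits(x):
    """Hex string of the IEEE 754 double-precision encoding of x (illustrative helper)."""
    return struct.pack(">d", x).hex()


# Largest finite value = (2 - 2^-(p-1)) * 2^emax, smallest normalized = 2^emin
FLT_MAX = (2 - 2.0**-23) * 2.0**127
FLT_MIN = 2.0**-126
DBL_MAX = (2 - 2.0**-52) * 2.0**1023
DBL_MIN = 2.0**-1022

print(float_bits(FLT_MAX))    # 7f7fffff
print(float_bits(-FLT_MAX))   # ff7fffff
print(float_bits(FLT_MIN))    # 00800000
print(double_bits(DBL_MAX))   # 7fefffffffffffff
print(double_bits(DBL_MIN))   # 0010000000000000
print(DBL_MAX == sys.float_info.max, DBL_MIN == sys.float_info.min)  # True True

# 80-bit extended, computed exactly:
# (2 - 2^-63) * 2^16383 == (2^64 - 1) * 2^(16383 - 63)
LDBL_MAX = (2**64 - 1) * 2**(16383 - 63)
LDBL_MIN = Fraction(1, 2**16382)

# Leading decimal digits of LDBL_MAX (integer division avoids printing ~4900 digits)
print(LDBL_MAX // 10**4923)   # 1189731495
# 2^16384 is larger than the largest finite extended value
print(2**16384 > LDBL_MAX)    # True
# Leading digits of LDBL_MIN scaled by 10^4935
print(10**4935 // 2**16382)   # 3362
```

The leading digits agree with the float.h figures quoted in the listing (1.1897...e+4932 and 3.3621...e-4932).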

Mike Lobanovsky

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #1 on: October 07, 2018, 05:17:43 AM »
There are no long doubles proper under x86 or x64 Windows. MS VC, Intel C++, tcc, gcc et al. are all supposed to use ordinary 64-bit doubles instead of long doubles on both 32- and 64-bit platforms.

Built-in 80-bit (10 byte) extended precision calc as in VB6, PB and O2 is incompatible with C libraries and an overwhelming majority of other languages and as such, is of limited benefit to the end user under either bitness. Extended precision is the simplest special case of arbitrary precision calc and should be handled with specialized instruments (libraries, frameworks, etc) under Windows.

On i386 CPUs, Linux uses natively 64-bit doubles under 32 bits, and 80-bit long doubles under 64 bits.

Thus we can say that 64-bit Linux and Windows calc is supposed to be incompatible by design. And I suspect this has been done by the Linuxoids on purpose, just for the hell of being different from the dreaded OS they are jumping out of their pants to rival -- so far, unsuccessfully. ;)

jack

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #2 on: October 07, 2018, 05:58:37 AM »
@Arnold
gcc does support long double intrinsics but lacks IO without a special directive; see http://www.jose.it-berater.org/smfforum/index.php?topic=5276.msg22963#msg22963

Charles Pegge

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #3 on: October 07, 2018, 06:09:19 AM »
o2 performs all floating point calculations on the FPU stack (@80bits), thus minimising loss of precision from using 64bit intermediates. The 64bit significand also enables quad integers to be used in the same calculations.
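The point about quad integers can be illustrated with a small Python sketch. This is only an illustration of significand width, not o2 code: `fits_exactly` is a hypothetical helper, and it ignores exponent range entirely.

```python
def fits_exactly(n, significand_bits):
    """True if integer n is exactly representable with the given significand width.

    Hypothetical helper for illustration; exponent range is ignored.
    """
    n = abs(n)
    if n == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, so strip them first.
    while n % 2 == 0:
        n //= 2
    return n.bit_length() <= significand_bits


quad_max = 2**63 - 1   # largest signed 64-bit integer

print(fits_exactly(quad_max, 53))   # False: a double's 53-bit significand is too narrow
print(fits_exactly(quad_max, 64))   # True: the 80-bit format's 64-bit significand holds it

# The same effect seen with Python's own double-precision floats:
print(float(2**53 + 1) == 2**53)    # True: 2^53 + 1 was rounded on conversion
```

Every signed 64-bit integer passes the 64-bit check, which is why such values can move through 80-bit FPU registers without being rounded.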

Mike Lobanovsky

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #4 on: October 07, 2018, 09:13:35 AM »
Quote from: jack
gcc does support long double intrinsics but lacks IO without a special directive

Thanks Jack,

But the use of special directives switches gcc to a non-standard mode of operation that requires either static linking or a gcc specific runtime.

What's the point in using a compiler feature that's incompatible with the operating system's standard library and IO interface?

Quote from: Charles Pegge
o2 performs all floating point calculations on the FPU stack (@80bits), thus minimising loss of precision from using 64bit intermediates. The 64bit significand also enables quad integers to be used in the same calculations.

Thanks Charles,

But extra calc precision can often be detrimental to the resultant expected accuracy. I've seen such calc drawing weird fractals where compliant BASICs would draw predictable patterns, and I've seen it yield wrong results in geodesic tasks that would differ from what's written in the official reference manuals.
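One well-known mechanism by which an 80-bit intermediate can change a 64-bit result is double rounding: the value is rounded once to a 64-bit significand and then again to 53 bits. Whether that is what happened in the fractal and geodesic cases is not known; the Python sketch below only demonstrates the mechanism on a hand-picked value, using exact fractions. The helper `round_to_bits` is hypothetical and only handles values in [1, 2).

```python
from fractions import Fraction


def round_to_bits(v, bits):
    """Round v (assumed to lie in [1, 2)) to a `bits`-bit significand, ties to even.

    Hypothetical helper for illustration. Python's round() on a Fraction
    rounds halves to the even integer, which matches the FPU default mode.
    """
    scale = 2 ** (bits - 1)
    return Fraction(round(v * scale), scale)


# An exact value lying slightly above the midpoint between two neighbouring doubles
v = 1 + Fraction(1, 2**53) + Fraction(1, 2**70)

direct = round_to_bits(v, 53)          # one rounding, straight to double
via_extended = round_to_bits(v, 64)    # first rounding, to the 80-bit format
twice = round_to_bits(via_extended, 53)  # second rounding, down to double

print(direct == 1 + Fraction(1, 2**52))   # True: rounded up to the next double
print(twice == 1)                         # True: the tie was resolved downward
print(direct == twice)                    # False
```

The first rounding discards the small 2^-70 term, which turns the value into an exact tie for the second rounding, so the two paths end up one unit in the last place apart.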

By all means you're free to go your way. But if I were you, I'd prefer to stay as compatible with C as possible, and especially should I be considering a possibility to delegate some of the language features to the system helpers.

JRS

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #5 on: October 07, 2018, 09:26:12 AM »
Quote from: Mike
By all means you're free to go your way. But if I were you, I'd prefer to stay as compatible with C as possible, and especially should I be considering a possibility to delegate some of the language features to the system helpers.

I agree!

DLLC

Charles Pegge

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #6 on: October 07, 2018, 12:09:56 PM »
I see that the precision is configurable (fldcw/fstcw): bits 8 and 9

That means I don't have to rewrite a large section of o2 :)

Code:
PRECISION CONTROL FIELD
8,9 PC ->  Single Precision   (24-bit) = $00B
           Double Precision   (53-bit) = $10B
           Extended Precision (64-bit) = $11B
           Reserved                    = $01B

ROUNDING MODE
10,11 RC ->  Round to nearest even         = $00B
             Round down (toward -infinity) = $01B
             Round up (toward +infinity)   = $10B
             Round toward zero (trunc)     = $11B

http://www.efg2.com/Lab/Library/Delphi/MathFunctions/FPUControlWord.Txt
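To make the bit layout concrete, here is a small Python sketch that decodes the two fields from a 16-bit control word value. It only does the bit arithmetic; it does not read the real FPU register, and the function and table names are made up for illustration.

```python
# Field encodings as listed in the table above (bits 8-9 and bits 10-11)
PRECISION = {
    0b00: "single (24-bit)",
    0b01: "reserved",
    0b10: "double (53-bit)",
    0b11: "extended (64-bit)",
}
ROUNDING = {
    0b00: "nearest even",
    0b01: "down",
    0b10: "up",
    0b11: "toward zero",
}


def decode_control_word(cw):
    """Return (precision, rounding) names for an x87 control word value."""
    pc = (cw >> 8) & 0b11    # precision control, bits 8 and 9
    rc = (cw >> 10) & 0b11   # rounding control, bits 10 and 11
    return PRECISION[pc], ROUNDING[rc]


# 0x037F is the control word value after FINIT
print(decode_control_word(0x037F))   # ('extended (64-bit)', 'nearest even')
print(decode_control_word(0x027F))   # ('double (53-bit)', 'nearest even')
```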

JRS

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #7 on: October 07, 2018, 02:09:40 PM »
No matter what people say about O2 compatibility, you more than make up for it with avoiding limitations.

Mike Lobanovsky

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #8 on: October 07, 2018, 03:30:59 PM »
Charles,

The C language, which is the native language of perhaps 99% of modern MS Windows system facilities, follows the rules of type promotions and conversions.

You are not supposed to improvise on your calc precision if you want to stay compatible with the operating system. You may not voluntarily promote every calc operation to 80 bits if you don't want your fractals to curl in directions opposite to the rest of MS Windows and its BASIC world. ;)

Charles Pegge

  • Guest
Re: Max / Min ranges of float, double and extended data types
« Reply #9 on: October 07, 2018, 08:57:14 PM »
Hi Mike,

o2 broadly follows those rules, and converting parameter expressions to fit polymorphic functions gets really interesting :)

Most floating point hardware, including x86 SIMD, is 64bit. So it makes sense to lock into this standard. What a shame.

This code sets the FPU mode to 64bit:

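Code:
  'SET FPU TO 64BIT FLOAT (53BIT PRECISION)
  int cw
  'store the current control word (fstcw writes 16 bits)
  fstcw cw
  'clear the precision-control bits 8 and 9 (fldcw only loads the low 16 bits)
  and cw,0xfffcff
  'set PC to 10b: double precision
  or cw,0x200
  fldcw cw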

and back to its default 80bit:

Code:
  'SET FPU TO 80BIT FLOAT (64BIT PRECISION)
  int cw
  'store the current control word
  fstcw cw
  'set PC to 11b: extended precision
  or cw,0x300
  fldcw cw
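The mask arithmetic in the two snippets can be checked on plain integers. The Python sketch below mirrors the `and`/`or` steps on a control word value; it does not touch the FPU, and the constant and function names are invented for illustration.

```python
PC_MASK = 0x0300       # bits 8 and 9: precision control field
PC_DOUBLE = 0x0200     # 10b: 53-bit significand
PC_EXTENDED = 0x0300   # 11b: 64-bit significand


def to_double_precision(cw):
    """Clear the PC field, then select double precision (mirrors and/or in the first snippet)."""
    return ((cw & ~PC_MASK) | PC_DOUBLE) & 0xFFFF


def to_extended_precision(cw):
    """Set both PC bits, which selects extended precision (mirrors the second snippet)."""
    return (cw | PC_EXTENDED) & 0xFFFF


default_cw = 0x037F    # control word value after FINIT

cw64 = to_double_precision(default_cw)
print(hex(cw64))                          # 0x27f
print(hex(to_extended_precision(cw64)))   # 0x37f

# The low 16 bits of the mask used in the post clear exactly bits 8 and 9
print(hex(0xfffcff & 0xFFFF))             # 0xfcff
```

All other fields (exception masks, rounding control) pass through unchanged, so switching precision this way leaves the rounding mode as it was.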

« Last Edit: October 07, 2018, 09:31:17 PM by Charles Pegge »