Handbook of Floating-Point Arithmetic
by: Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre
Birkhäuser Basel, 2009
ISBN: 9780817647056
572 pages, download: 6119 KB
 
Format: PDF
Table of Contents

  Contents 5  
  Preface 14  
  List of Figures 16  
  List of Tables 19  
  I Introduction, Basic Definitions, and Standards 22  
     1 Introduction 23  
        1.1 Some History 23  
        1.2 Desirable Properties 26  
        1.3 Some Strange Behaviors 27  
           1.3.1 Some famous bugs 27  
           1.3.2 Difficult problems 28  
     2 Definitions and Basic Notions 33  
        2.1 Floating-Point Numbers 33  
        2.2 Rounding 40  
           2.2.1 Rounding modes 40  
           2.2.2 Useful properties 42  
           2.2.3 Relative error due to rounding 43  
        2.3 Exceptions 45  
        2.4 Lost or Preserved Properties of the Arithmetic on the Real Numbers 47  
        2.5 Note on the Choice of the Radix 49  
           2.5.1 Representation errors 49  
           2.5.2 A case for radix 10 50  
        2.6 Tools for Manipulating Floating-Point Errors 52  
           2.6.1 The ulp function 52  
           2.6.2 Errors in ulps and relative errors 57  
           2.6.3 An example: iterated products 57  
           2.6.4 Unit roundoff 59  
        2.7 Note on Radix Conversion 60  
           2.7.1 Conditions on the formats 60  
           2.7.2 Conversion algorithms 63  
        2.8 The Fused Multiply-Add (FMA) Instruction 71  
        2.9 Interval Arithmetic 71  
           2.9.1 Intervals with floating-point bounds 72  
           2.9.2 Optimized rounding 72  
     3 Floating-Point Formats and Environment 74  
        3.1 The IEEE 754-1985 Standard 75  
           3.1.1 Formats specified by IEEE 754-1985 75  
           3.1.2 Little-endian, big-endian 79  
           3.1.3 Rounding modes specified by IEEE 754-1985 80  
           3.1.4 Operations specified by IEEE 754-1985 81  
           3.1.5 Exceptions specified by IEEE 754-1985 85  
           3.1.6 Special values 88  
        3.2 The IEEE 854-1987 Standard 89  
           3.2.1 Constraints internal to a format 89  
           3.2.2 Various formats and the constraints between them 90  
           3.2.3 Conversions between floating-point numbers and decimal strings 91  
           3.2.4 Rounding 92  
           3.2.5 Operations 92  
           3.2.6 Comparisons 93  
           3.2.7 Exceptions 93  
        3.3 The Need for a Revision 93  
            3.3.1 A typical problem: "double rounding" 94  
           3.3.2 Various ambiguities 96  
        3.4 The New IEEE 754-2008 Standard 98  
           3.4.1 Formats specified by the revised standard 99  
           3.4.2 Binary interchange format encodings 100  
           3.4.3 Decimal interchange format encodings 101  
           3.4.4 Larger formats 111  
           3.4.5 Extended and extendable precisions 111  
           3.4.6 Attributes 112  
           3.4.7 Operations specified by the standard 116  
           3.4.8 Comparisons 118  
           3.4.9 Conversions 118  
           3.4.10 Default exception handling 119  
           3.4.11 Recommended transcendental functions 122  
        3.5 Floating-Point Hardware in Current Processors 123  
           3.5.1 The common hardware denominator 123  
           3.5.2 Fused multiply-add 123  
           3.5.3 Extended precision 123  
           3.5.4 Rounding and precision control 124  
           3.5.5 SIMD instructions 125  
           3.5.6 Floating-point on x86 processors: SSE2 versus x87 125  
           3.5.7 Decimal arithmetic 126  
         3.6 Floating-Point Hardware in Recent Graphics Processing Units 127  
        3.7 Relations with Programming Languages 128  
           3.7.1 The Language Independent Arithmetic (LIA) standard 128  
           3.7.2 Programming languages 129  
        3.8 Checking the Environment 129  
           3.8.1 MACHAR 130  
           3.8.2 Paranoia 130  
           3.8.3 UCBTest 134  
           3.8.4 TestFloat 135  
           3.8.5 IeeeCC754 135  
           3.8.6 Miscellaneous 135  
  II Cleverly Using Floating-Point Arithmetic 136  
     4 Basic Properties and Algorithms 137  
        4.1 Testing the Computational Environment 137  
           4.1.1 Computing the radix 137  
           4.1.2 Computing the precision 139  
        4.2 Exact Operations 140  
           4.2.1 Exact addition 140  
           4.2.2 Exact multiplications and divisions 142  
        4.3 Accurate Computations of Sums of Two Numbers 143  
           4.3.1 The Fast2Sum algorithm 144  
           4.3.2 The 2Sum algorithm 147  
           4.3.3 If we do not use rounding to nearest 149  
        4.4 Computation of Products 150  
           4.4.1 Veltkamp splitting 150  
           4.4.2 Dekker's multiplication algorithm 153  
        4.5 Complex numbers 157  
           4.5.1 Various error bounds 158  
           4.5.2 Error bound for complex multiplication 159  
           4.5.3 Complex division 162  
           4.5.4 Complex square root 167  
     5 The Fused Multiply-Add Instruction 169  
        5.1 The 2MultFMA Algorithm 170  
        5.2 Computation of Residuals of Division and Square Root 171  
         5.3 Newton–Raphson-Based Division with an FMA 173  
            5.3.1 Variants of the Newton–Raphson iteration 173  
            5.3.2 Using the Newton–Raphson iteration for correctly rounded division 178  
         5.4 Newton–Raphson-Based Square Root with an FMA 185  
            5.4.1 The basic iterations 185  
            5.4.2 Using the Newton–Raphson iteration for correctly rounded square roots 186  
        5.5 Multiplication by an Arbitrary-Precision Constant 189  
           5.5.1 Checking for a given constant C if Algorithm 5.2 will always work 190  
        5.6 Evaluation of the Error of an FMA 193  
        5.7 Evaluation of Integer Powers 195  
     6 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values 198  
        6.1 Preliminaries 199  
           6.1.1 Floating-point arithmetic models 200  
           6.1.2 Notation for error analysis and classical error estimates 201  
           6.1.3 Properties for deriving validated running error bounds 204  
        6.2 Computing Validated Running Error Bounds 205  
        6.3 Computing Sums More Accurately 207  
           6.3.1 Reordering the operands, and a bit more 207  
           6.3.2 Compensated sums 209  
            6.3.3 Implementing a "long accumulator" 216  
           6.3.4 On the sum of three floating-point numbers 216  
        6.4 Compensated Dot Products 218  
        6.5 Compensated Polynomial Evaluation 220  
     7 Languages and Compilers 222  
        7.1 A Play with Many Actors 222  
           7.1.1 Floating-point evaluation in programming languages 223  
           7.1.2 Processors, compilers, and operating systems 225  
           7.1.3 In the hands of the programmer 226  
        7.2 Floating Point in the C Language 226  
           7.2.1 Standard C99 headers and IEEE 754-1985 support 226  
           7.2.2 Types 227  
           7.2.3 Expression evaluation 230  
           7.2.4 Code transformations 233  
           7.2.5 Enabling unsafe optimizations 234  
           7.2.6 Summary: a few horror stories 235  
        7.3 Floating-Point Arithmetic in the C++ Language 237  
           7.3.1 Semantics 237  
           7.3.2 Numeric limits 238  
           7.3.3 Overloaded functions 239  
        7.4 FORTRAN Floating Point in a Nutshell 240  
           7.4.1 Philosophy 240  
           7.4.2 IEEE 754 support in FORTRAN 243  
        7.5 Java Floating Point in a Nutshell 244  
           7.5.1 Philosophy 244  
           7.5.2 Types and classes 245  
           7.5.3 Infinities, NaNs, and signed zeros 247  
           7.5.4 Missing features 248  
           7.5.5 Reproducibility 249  
           7.5.6 The BigDecimal package 250  
        7.6 Conclusion 251  
  III Implementing Floating-Point Operators 253  
     8 Algorithms for the Five Basic Operations 254  
        8.1 Overview of Basic Operation Implementation 254  
        8.2 Implementing IEEE 754-2008 Rounding 256  
           8.2.1 Rounding a nonzero finite value with unbounded exponent range 256  
           8.2.2 Overflow 258  
           8.2.3 Underflow and subnormal results 259  
           8.2.4 The inexact exception 260  
           8.2.5 Rounding for actual operations 260  
        8.3 Floating-Point Addition and Subtraction 261  
           8.3.1 Decimal addition 264  
           8.3.2 Decimal addition using binary encoding 265  
           8.3.3 Subnormal inputs and outputs in binary addition 266  
        8.4 Floating-Point Multiplication 266  
           8.4.1 Normal case 267  
           8.4.2 Handling subnormal numbers in binary multiplication 267  
           8.4.3 Decimal specifics 268  
        8.5 Floating-Point Fused Multiply-Add 269  
           8.5.1 Case analysis for normal inputs 269  
           8.5.2 Handling subnormal inputs 273  
           8.5.3 Handling decimal cohorts 274  
           8.5.4 Overview of a binary FMA implementation 274  
        8.6 Floating-Point Division 277  
           8.6.1 Overview and special cases 277  
           8.6.2 Computing the significand quotient 278  
           8.6.3 Managing subnormal numbers 279  
           8.6.4 The inexact exception 280  
           8.6.5 Decimal specifics 280  
        8.7 Floating-Point Square Root 280  
           8.7.1 Overview and special cases 280  
           8.7.2 Computing the significand square root 281  
           8.7.3 Managing subnormal numbers 282  
           8.7.4 The inexact exception 282  
           8.7.5 Decimal specifics 282  
     9 Hardware Implementation of Floating-Point Arithmetic 283  
        9.1 Introduction and Context 283  
           9.1.1 Processor internal formats 283  
           9.1.2 Hardware handling of subnormal numbers 284  
           9.1.3 Full-custom VLSI versus reconfigurable circuits 285  
           9.1.4 Hardware decimal arithmetic 286  
           9.1.5 Pipelining 287  
        9.2 The Primitives and Their Cost 288  
           9.2.1 Integer adders 288  
           9.2.2 Digit-by-integer multiplication in hardware 294  
           9.2.3 Using nonstandard representations of numbers 294  
           9.2.4 Binary integer multiplication 295  
           9.2.5 Decimal integer multiplication 297  
           9.2.6 Shifters 298  
           9.2.7 Leading-zero counters 298  
           9.2.8 Tables and table-based methods for fixed-point function approximation 300  
        9.3 Binary Floating-Point Addition 302  
           9.3.1 Overview 302  
           9.3.2 A first dual-path architecture 303  
           9.3.3 Leading-zero anticipation 305  
           9.3.4 Probing further on floating-point adders 309  
        9.4 Binary Floating-Point Multiplication 310  
           9.4.1 Basic architecture 310  
           9.4.2 FPGA implementation 310  
           9.4.3 VLSI implementation optimized for delay 312  
           9.4.4 Managing subnormals 315  
        9.5 Binary Fused Multiply-Add 316  
           9.5.1 Classic architecture 317  
           9.5.2 To probe further 319  
        9.6 Division 319  
           9.6.1 Digit-recurrence division 320  
           9.6.2 Decimal division 323  
        9.7 Conclusion: Beyond the FPU 323  
           9.7.1 Optimization in context of standard operators 324  
           9.7.2 Operation with a constant operand 325  
           9.7.3 Block floating point 327  
           9.7.4 Specific architectures for accumulation 327  
           9.7.5 Coarser-grain operators 331  
        9.8 Probing Further 334  
     10 Software Implementation of Floating-Point Arithmetic 335  
        10.1 Implementation Context 336  
           10.1.1 Standard encoding of binary floating-point data 336  
           10.1.2 Available integer operators 337  
           10.1.3 First examples 340  
           10.1.4 Design choices and optimizations 342  
        10.2 Binary Floating-Point Addition 343  
           10.2.1 Handling special values 344  
           10.2.2 Computing the sign of the result 346  
           10.2.3 Swapping the operands and computing the alignment shift 347  
           10.2.4 Getting the correctly rounded result 349  
        10.3 Binary Floating-Point Multiplication 355  
           10.3.1 Handling special values 355  
           10.3.2 Sign and exponent computation 357  
           10.3.3 Overflow detection 359  
           10.3.4 Getting the correctly rounded result 360  
        10.4 Binary Floating-Point Division 363  
           10.4.1 Handling special values 364  
           10.4.2 Sign and exponent computation 365  
           10.4.3 Overflow detection 368  
           10.4.4 Getting the correctly rounded result 369  
        10.5 Binary Floating-Point Square Root 375  
           10.5.1 Handling special values 376  
           10.5.2 Exponent computation 378  
           10.5.3 Getting the correctly rounded result 379  
  IV Elementary Functions 387  
     11 Evaluating Floating-Point Elementary Functions 388  
        11.1 Basic Range Reduction Algorithms 392  
           11.1.1 Cody and Waite's reduction algorithm 392  
           11.1.2 Payne and Hanek's algorithm 394  
        11.2 Bounding the Relative Error of Range Reduction 395  
        11.3 More Sophisticated Range Reduction Algorithms 397  
           11.3.1 An example of range reduction for the exponential function 399  
           11.3.2 An example of range reduction for the logarithm 400  
        11.4 Polynomial or Rational Approximations 401  
            11.4.1 L2 case 402  
            11.4.2 L∞, or minimax case 403  
            11.4.3 "Truncated" approximations 405  
        11.5 Evaluating Polynomials 406  
         11.6 Correct Rounding of Elementary Functions to binary64 407  
            11.6.1 The Table Maker's Dilemma and Ziv's onion peeling strategy 407  
           11.6.2 When the TMD is solved 408  
           11.6.3 Rounding test 409  
           11.6.4 Accurate second step 413  
           11.6.5 Error analysis and the accuracy/performance tradeoff 414  
        11.7 Computing Error Bounds 415  
           11.7.1 The point with efficient code 415  
            11.7.2 Example: a "double-double" polynomial evaluation 416  
     12 Solving the Table Maker's Dilemma 418  
        12.1 Introduction 418  
           12.1.1 The Table Maker's Dilemma 419  
           12.1.2 Brief history of the TMD 423  
           12.1.3 Organization of the chapter 424  
        12.2 Preliminary Remarks on the Table Maker's Dilemma 425  
           12.2.1 Statistical arguments: what can be expected in practice 425  
           12.2.2 In some domains, there is no need to find worst cases 429  
           12.2.3 Deducing the worst cases from other functions or domains 432  
        12.3 The Table Maker's Dilemma for Algebraic Functions 433  
           12.3.1 Algebraic and transcendental numbers and functions 433  
           12.3.2 The elementary case of quotients 435  
           12.3.3 Around Liouville's theorem 437  
           12.3.4 Generating bad rounding cases for the square root using Hensel 2-adic lifting 438  
        12.4 Solving the Table Maker's Dilemma for Arbitrary Functions 442  
           12.4.1 Lindemann's theorem: application to some transcendental functions 442  
           12.4.2 A theorem of Nesterenko and Waldschmidt 443  
           12.4.3 A first method: tabulated differences 445  
           12.4.4 From the TMD to the distance between a grid and a segment 447  
           12.4.5 Linear approximation: Lefèvre's algorithm 449  
           12.4.6 The SLZ algorithm 456  
           12.4.7 Periodic functions on large arguments 461  
        12.5 Some Results 462  
           12.5.1 Worst cases for the exponential, logarithmic, trigonometric, and hyperbolic functions 462  
           12.5.2 A special case: integer powers 471  
        12.6 Current Limits and Perspectives 471  
  V Extensions 473  
     13 Formalisms for Certifying Floating-Point Algorithms 474  
        13.1 Formalizing Floating-Point Arithmetic 474  
           13.1.1 Defining floating-point numbers 475  
           13.1.2 Simplifying the definition 477  
           13.1.3 Defining rounding operators 478  
           13.1.4 Extending the set of numbers 481  
        13.2 Formalisms for Certifying Algorithms by Hand 482  
           13.2.1 Hardware units 482  
           13.2.2 Low-level algorithms 483  
           13.2.3 Advanced algorithms 484  
        13.3 Automating Proofs 485  
           13.3.1 Computing on bounds 486  
           13.3.2 Counting digits 488  
           13.3.3 Manipulating expressions 490  
           13.3.4 Handling the relative error 494  
        13.4 Using Gappa 495  
           13.4.1 Toy implementation of sine 495  
           13.4.2 Integer division on Itanium 499  
     14 Extending the Precision 503  
        14.1 Double-Words, Triple-Words… 504  
           14.1.1 Double-word arithmetic 505  
           14.1.2 Static triple-word arithmetic 508  
           14.1.3 Quad-word arithmetic 510  
        14.2 Floating-Point Expansions 513  
        14.3 Floating-Point Numbers with Batched Additional Exponent 519  
        14.4 Large Precision Relying on Processor Integers 520  
           14.4.1 Using arbitrary-precision integer arithmetic for arbitrary-precision floating-point arithmetic 522  
           14.4.2 A brief introduction to arbitrary-precision integer arithmetic 523  
  VI Perspectives and Appendix 527  
     15 Conclusion and Perspectives 528  
      16 Appendix: Number Theory Tools for Floating-Point Arithmetic 529  
        16.1 Continued Fractions 529  
        16.2 The LLL Algorithm 532  
  Bibliography 537  
  Index 574  


© 2008-2020 ciando GmbH