- Floating-point arithmetic
- Arithmetic algorithms
- Linear equations solution algorithms
- Matrix diagonalization
- Wavelets
Floating-point arithmetic is arithmetic in which real numbers are represented approximately to a fixed number of significant digits and scaled using an exponent in some fixed base: $r = \text{significand} \times \text{base}^{\text{exponent}}$.
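The representation can be inspected directly. The sketch below assumes CPython, where `float` is an IEEE 754 binary64 value (base 2); the helper name `decompose` is made up for illustration.

```python
import math

def decompose(r):
    """Return (significand, exponent) such that r == significand * 2**exponent."""
    # math.frexp normalizes the significand so that 0.5 <= |significand| < 1
    # for any nonzero finite r.
    significand, exponent = math.frexp(r)
    return significand, exponent

significand, exponent = decompose(6.0)
print(significand, exponent)  # 0.75 3, because 6.0 == 0.75 * 2**3

# 0.1 has no finite base-2 expansion, so the stored value is only the
# nearest representable double; float.hex shows the exact stored bits.
print(float.hex(0.1))
```

`math.ldexp(significand, exponent)` reverses the decomposition exactly.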
📝
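- The bound on the relative error due to rounding is uniform, i.e. it is independent of the magnitude of the number (as long as the number stays in the normal range).
- The binary-based floating-point system has the smallest possible wobble (the factor by which the relative error corresponding to a fixed error in units in the last place can vary; this factor equals the base).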
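Both notes can be checked numerically. This is a minimal sketch, assuming Python 3.9+ (for `math.ulp`) and binary64 floats; `relative_spacing` is a hypothetical helper name. The relative gap between neighbouring doubles stays within a factor of 2 of machine epsilon regardless of magnitude, and that factor of 2 is the wobble for base 2.

```python
import math

def relative_spacing(x):
    """Gap from |x| to the next larger double, relative to |x|."""
    return math.ulp(x) / abs(x)

eps = 2.0 ** -52  # machine epsilon of IEEE 754 binary64

for x in (1.0, 1.5, 1.999, 1e-100, 1e100):
    # The ratio stays in the interval (0.5, 1] for every normal number.
    print(x, relative_spacing(x) / eps)
```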
🔗
- Floating-point arithmetic – Wikipedia
- C.Moler. Floating point arithmetic before IEEE 754 (2019)
📄
- J.Gustafson, I.Yonemoto. Beating floating point at its own game: Posit arithmetic
IEEE 754 is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE).
📝
- If two (non-extended) floating-point numbers in the same format are ordered, then they are ordered the same way when their bits are reinterpreted as sign-magnitude integers.
- NaNs are endowed with a field of bits into which software can record, say, how and/or where the NaN came into existence; no software exists now to exploit this feature.
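The ordering property in the first note can be sketched as follows, assuming binary64 values and excluding NaNs (which are unordered); `sign_magnitude_key` is an illustrative name, not part of any library.

```python
import struct

def sign_magnitude_key(x):
    """Reinterpret the 64 bits of a double as a sign-magnitude integer."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    magnitude = bits & 0x7FFF_FFFF_FFFF_FFFF  # lower 63 bits
    negative = bits >> 63                     # top bit is the sign
    return -magnitude if negative else magnitude

# Increasing floats, including subnormals and infinity.
values = [-1e300, -2.5, -1e-310, 0.0, 5e-324, 1.0, 1.5, 1e300, float("inf")]
keys = [sign_magnitude_key(v) for v in values]

# The integer keys come out in the same order as the floats.
assert keys == sorted(keys)
```

Both `0.0` and `-0.0` map to the integer 0, which matches the fact that they compare equal as floats.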
🔗
- IEEE 754 – Wikipedia
- D.Goldberg. What every computer scientist should know about floating-point arithmetic (1991)
- How many unique values are there between 0 and 1 of a standard float? – Stack Overflow
- Why does IEEE 754 reserve so many NaN values? – Stack Overflow
- How many normalized numbers can be represented using IEEE-754 single precision? – Stack Overflow
📄
- W.Kahan. Lecture notes on the status of IEEE standard 754 for binary floating-point arithmetic (1997)
- C.Allison. Where did all my decimals go? (2006)
📖
- M.L.Overton. Numerical computing with IEEE floating point arithmetic – SIAM (2001)
- Sec. 2.5: Floating-point arithmetic – D.H.Eberly. GPGPU programming for games and science – CRC Press (2014)
- C.Allison. Floating-point numbers aren’t real – K.Henney. 97 things every programmer should know (2010)
🎥
- J.Farrier. Demystifying floating point – CppCon (2015)
- J.Gustafson. Beating floats at their own game – HPC Advisory Council Australia Conference (2017)
🔗
- Denormal number – Wikipedia
- C.Moler. Floating point denormals, insignificant but controversial (2014)
🎥
- D.Kohlbrenner. On subnormal floating point and abnormal timing – IEEE Symposium on Security and Privacy (2015)
To compute the arithmetic mean $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$ in a numerically stable way, use the following recurrence relation: $\mu_n = \mu_{n-1} + \frac{1}{n} (x_n - \mu_{n-1})$.
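A minimal sketch of the recurrence, taking $\mu_0 = 0$ as the starting value; the function name `running_mean` is an assumption for illustration. The example data are chosen so that a plain running sum overflows while the recurrence does not.

```python
def running_mean(xs):
    """Arithmetic mean via mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n."""
    mean = 0.0
    for n, x in enumerate(xs, start=1):
        mean += (x - mean) / n
    return mean

data = [1e308, 1e308, 1e308]

# Naive approach: accumulate the sum first, then divide.
naive_total = 0.0
for x in data:
    naive_total += x  # exceeds the largest double and becomes inf
print(naive_total / len(data))  # inf

print(running_mean(data))       # 1e+308
```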
🔗
- D.Assencio. Numerically stable computation of arithmetic means (2015)
- T.Finch. Incremental calculation of weighted mean and variance (2009)
🔗
- M.Dominus. How to calculate binomial coefficients
- M.Dominus. How to calculate binomial coefficients, again
🔗
- I.Kaplan. Integer division (1996)
Horner’s method is a polynomial evaluation method expressed by $p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n = a_0 + x (a_1 + x (a_2 + \dots + x (a_n) \dots ))$.
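The nested form translates into a loop that walks the coefficients from $a_n$ down to $a_0$. In this sketch the coefficient order `[a0, a1, ..., an]` and the name `horner` are assumptions for illustration.

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n, with coeffs = [a0, a1, ..., an]."""
    result = 0.0
    for a in reversed(coeffs):
        # One multiplication and one addition per coefficient.
        result = result * x + a
    return result

# p(x) = 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12
print(horner([1.0, 2.0, 3.0], 2.0))  # 17.0
```

Compared with evaluating each power of $x$ separately, this uses $n$ multiplications and $n$ additions for a polynomial of degree $n$.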
🔗
- Horner’s method – Wikipedia
🔗
- Kahan summation algorithm – Wikipedia
- Kahan summation – Stack Overflow
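For orientation, Kahan (compensated) summation can be sketched as below. This is a textbook-style illustration rather than code taken from the links above. The naive loop is written out explicitly because the built-in `sum` in recent CPython versions may itself apply compensation to floats.

```python
import math

def naive_sum(xs):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total

def kahan_sum(xs):
    """Sum xs while carrying the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in xs:
        y = x - compensation            # apply the correction to the new term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (negated)
        total = t
    return total

# Each 1e-16 is below half an ulp of 1.0, so the naive sum never moves.
xs = [1.0] + [1e-16] * 100_000
print(naive_sum(xs))  # 1.0
print(kahan_sum(xs))  # close to 1.00000000001
print(math.fsum(xs))  # correctly rounded reference value
```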
🔗
- G.Strang. Iterative methods (2006)
📖
- Sec. 20.5: Relaxation methods for boundary value problems – W.H.Press et al. Numerical recipes: The art of scientific computing (2007)
The Jacobi method solves the linear system $A x = b$ by the recurrence relation $x_{k+1} = D^{-1} (D - A) x_k + D^{-1} b$, where the preconditioner $D$ is the diagonal part of $A$: $D = \operatorname{diag}(A)$.
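A minimal pure-Python sketch of this recurrence, written row by row as $x_i \leftarrow (b_i - \sum_{j \ne i} a_{ij} x_j) / a_{ii}$, which is the same update as $D^{-1}(D - A) x_k + D^{-1} b$. The function name, the zero starting vector and the fixed iteration count are assumptions. The example matrix is strictly diagonally dominant, which is sufficient for convergence; other matrices may fail to converge.

```python
def jacobi(A, b, iterations=50):
    """Approximate the solution of A x = b with a fixed number of Jacobi sweeps."""
    n = len(b)
    x = [0.0] * n  # assumed initial guess
    for _ in range(iterations):
        # Every component of the new iterate uses only the previous iterate.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x

A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]

# The exact solution is x = (1/11, 7/11).
print(jacobi(A, b))
```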
🔗
- Jacobi method – Wikipedia
- Jacobi method – Wolfram MathWorld
🎥
- G.Strang. Lec. 15: Iterative methods and preconditioners – MIT 18.086 Mathematical methods for engineers II (2006)
🔗
- Jacobi eigenvalue algorithm – Wikipedia
- J.Lambers. Jacobi methods – CME 335 (2010)
📖
- Sec. 11.1: Jacobi transformations of a symmetric matrix – W.H.Press et al. Numerical recipes: The art of scientific computing (2007)
- Sec. 8.5: Jacobi methods – G.H.Golub, C.F.Van Loan. Matrix computations – SIAM (2013)
- H.Rutishauser. Contrib. II/1: The Jacobi method for real symmetric matrices – J.H.Wilkinson, C.Reinsch. Handbook for automatic computation. Vol. II: Linear algebra (1971)
🎥
- G.Strang. Lec. 27: Multiresolution, wavelet transform and scaling function – MIT 18.085 Computational science and engineering I (2008?)
- G.Strang. Lec. 28: Splines and orthogonal wavelets: Daubechies construction – MIT 18.085 Computational science and engineering I (2008?)