- unsigned
- two's complement
- floating-point
- virtual address space
- a conceptual image presented to the machine-level program
- consists of:
- DRAM
- flash memory
- disk storage
- special hardware
- OS software
- `float`
  - single precision
- `double`
  - double precision
- `sizeof(T)`
  - number of bytes required to store an object of type `T`
void inplace_swap(int* x, int* y) {
*y = *x ^ *y;
*x = *x ^ *y;
*y = *x ^ *y;
}
- logical operators (`||`, `&&`, `!`) treat any nonzero argument as representing `true` and argument `0` as `false`
- right shift by `k`
  - logical: fills the left end with `k` zeros
  - arithmetic: fills the left end with `k` repetitions of the most significant bit
    - used by almost all compiler/machine combos for signed data
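A minimal sketch of the two right-shift flavors on 32-bit patterns. The function names `shr_logical` and `shr_arithmetic` are made up for illustration. The arithmetic variant is emulated with unsigned operations, because C leaves `>>` on a negative signed value implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: unsigned operands always shift in zeros. */
uint32_t shr_logical(uint32_t x, unsigned k) {
    assert(k < 32);
    return x >> k;
}

/* Arithmetic right shift, emulated on the raw bit pattern so the result
   does not depend on how the compiler shifts negative signed values. */
uint32_t shr_arithmetic(uint32_t x, unsigned k) {
    assert(k < 32);
    uint32_t shifted = x >> k;
    /* If the most significant bit was set, fill the top k bits with ones. */
    uint32_t fill = (x & 0x80000000u) ? ~(UINT32_MAX >> k) : 0u;
    return shifted | fill;
}
```

For example, shifting `0x80000000` right by 4 gives `0x08000000` logically and `0xF8000000` arithmetically.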
- For two's complement, `~x + 1 = -x`
- Bit pattern of form `[0,...0,1,...1]` with `w - k` zeros followed by `k` ones
  - can be represented as `(1 << k) - 1`
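The two identities above, sketched for `w = 32`. `low_mask` and `negate_bits` are hypothetical helper names. The `k == 32` case is special-cased because shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low k bits set, for 0 <= k <= 32. */
uint32_t low_mask(unsigned k) {
    assert(k <= 32);
    return (k == 32) ? UINT32_MAX : ((UINT32_C(1) << k) - 1);
}

/* Two's complement negation applied to the raw bit pattern: ~x + 1. */
uint32_t negate_bits(uint32_t x) {
    return ~x + 1u;
}
```

`negate_bits(1)` produces `0xFFFFFFFF`, the 32-bit pattern for `-1`.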
- format macros
  - `PRId32`
    - format conversion specifier to output a signed decimal integer value of type `std::int32_t`
  - `PRIu64`
    - format conversion specifier to output an unsigned decimal integer value of type `std::uint64_t`
// these two are equivalent only where int32_t is int and uint64_t is unsigned long (e.g., x86-64 Linux)
printf("x = %" PRId32 ", y = %" PRIu64 "\n", x, y);
printf("x = %d, y = %lu\n", x, y);
- `<limits.h>`
  - `INT_MAX`
  - `INT_MIN`
  - `UINT_MAX`
- From two's complement to unsigned:
  - $T2U_w(x) = x + 2^w$ if `x < 0`
  - $T2U_w(x) = x$ if `x >= 0`
- From unsigned to two's complement:
  - $U2T_w(u) = u$ if `u <= TMax_w`
  - $U2T_w(u) = u - 2^w$ if `u > TMax_w`
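A sketch of the two conversion formulas as arithmetic on mathematical values, for any word size `1 <= w <= 32`. The helpers `t2u` and `u2t` are illustrative names; they use `int64_t` so that $2^w$ itself is representable.

```c
#include <assert.h>
#include <stdint.h>

/* T2U_w(x): add 2^w to negative values, leave nonnegative values alone.
   Expects TMin_w <= x <= TMax_w. */
int64_t t2u(int64_t x, unsigned w) {
    assert(w >= 1 && w <= 32);
    int64_t two_w = INT64_C(1) << w;
    return (x < 0) ? x + two_w : x;
}

/* U2T_w(u): subtract 2^w from values above TMax_w.
   Expects 0 <= u <= UMax_w. */
int64_t u2t(int64_t u, unsigned w) {
    assert(w >= 1 && w <= 32);
    int64_t two_w = INT64_C(1) << w;
    int64_t tmax = (two_w >> 1) - 1;
    return (u > tmax) ? u - two_w : u;
}
```

With `w = 8`, `t2u(-1, 8)` is `255` and `u2t(128, 8)` is `-128`.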
- when an operation mixes an unsigned and a signed operand, C implicitly casts the signed operand to unsigned
  - it then performs the operation assuming both operands are nonnegative
/* prototype */
size_t strlen(const char *s);

/* BUGGY!!! */
/**
 * This uses unsigned arithmetic; when s is shorter than t,
 * strlen(s) - strlen(t) should be negative, but instead it wraps around to
 * a large, nonnegative number.
 * To fix this: `return strlen(s) > strlen(t);`
 */
int strlonger(char *s, char *t) {
    return strlen(s) - strlen(t) > 0;
}
- one way to avoid bugs like the one in `strlonger`: never use unsigned numbers
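The fix mentioned in the comment, written out as a sketch. `strlonger_fixed` is a hypothetical name; comparing the two `size_t` lengths directly avoids the unsigned subtraction altogether.

```c
#include <string.h>

/* Returns 1 if s is strictly longer than t, 0 otherwise. */
int strlonger_fixed(const char *s, const char *t) {
    return strlen(s) > strlen(t);
}
```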
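- in `<limits.h>`, `#define INT_MAX 2147483647` and `#define INT_MIN (-INT_MAX - 1)`
  - `INT_MIN` is written as an expression because the literal `2147483648` does not fit in a 32-bit `int`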
- when a cast changes both the size and the signedness (e.g., `short` to `unsigned`), the relative order is:
  - first, change the size
  - then, change the type
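A small sketch contrasting the two possible orders, using fixed-width types so the sizes are explicit. The function names are made up; `size_then_type` matches what C does for a direct `int16_t` to `uint32_t` conversion.

```c
#include <stdint.h>

/* What C does: sign-extend to 32 bits first, then reinterpret as unsigned. */
uint32_t size_then_type(int16_t sx) {
    return (uint32_t)(int32_t)sx;
}

/* The other order, shown only for contrast: reinterpret as unsigned
   16-bit first, then zero-extend to 32 bits. */
uint32_t type_then_size(int16_t sx) {
    return (uint32_t)(uint16_t)sx;
}
```

For `sx = -12345`, the first gives `4294954951` ($2^{32} - 12345$) and the second gives `53191` ($2^{16} - 12345$).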
- two's complement addition, $x +^t_w y$:
  - $x +^t_w y = x + y - 2^{w}$ if $2^{w-1} \le x + y$ (positive overflow)
  - $x +^t_w y = x + y$ if $-2^{w-1} \le x + y < 2^{w-1}$
  - $x +^t_w y = x + y + 2^{w}$ if $x + y < -2^{w-1}$ (negative overflow)
- for `x` and `y` in the range `TMin_w <= x, y <= TMax_w`, and `s = x + y` (the truncated `w`-bit sum)
  - `s` has positive overflow if and only if `x > 0` and `y > 0` but `s <= 0`
  - `s` has negative overflow if and only if `x < 0` and `y < 0` but `s >= 0`
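The overflow rule above can be sketched as a check for `w = 32`. `tadd_ok` is an illustrative name. The wrapped sum is computed in unsigned arithmetic, since signed overflow is undefined behavior in C.

```c
#include <stdint.h>

/* Returns 1 if x + y fits in 32-bit two's complement, 0 on overflow. */
int tadd_ok(int32_t x, int32_t y) {
    /* Wrap-around sum: unsigned addition is well defined modulo 2^32. */
    uint32_t us = (uint32_t)x + (uint32_t)y;
    /* Sign bit of the 32-bit sum, i.e. whether s < 0 when read as signed. */
    int s_negative = (us >> 31) != 0;
    int pos_over = x > 0 && y > 0 && (s_negative || us == 0); /* s <= 0 */
    int neg_over = x < 0 && y < 0 && !s_negative;             /* s >= 0 */
    return !pos_over && !neg_over;
}
```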
- `TMin`'s additive inverse is itself
  - $TMin_w + TMin_w = -2^{w-1} - 2^{w-1} = -2^w$, which causes negative overflow
  - hence the truncated `w`-bit sum is $-2^w + 2^w = 0$
- `~x = -x - 1`
- Truncating a number to `k` bits
  - Unsigned: `x' = x mod 2^k`, where `x` and `x'` are the unsigned values of the original and truncated bit vectors
  - Two's complement: `x' = U2T_k(x mod 2^k)`, where `x` and `x'` are the signed values of the original and truncated bit vectors
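A sketch of both truncation rules for a 32-bit source and `1 <= k <= 31`. `trunc_unsigned` and `trunc_signed` are hypothetical helpers; the signed version follows the formula literally instead of relying on a narrowing cast.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned truncation to k bits: x mod 2^k. */
uint32_t trunc_unsigned(uint32_t x, unsigned k) {
    assert(k >= 1 && k <= 31);
    return x & ((UINT32_C(1) << k) - 1);
}

/* Two's complement truncation to k bits: U2T_k(x mod 2^k). */
int32_t trunc_signed(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 31);
    int64_t low = trunc_unsigned((uint32_t)x, k);   /* x mod 2^k */
    int64_t tmax_k = (INT64_C(1) << (k - 1)) - 1;
    /* U2T_k: values above TMax_k represent low - 2^k. */
    return (int32_t)((low > tmax_k) ? low - (INT64_C(1) << k) : low);
}
```

For instance, `trunc_signed(53191, 16)` is `-12345`.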
- unsigned addition, $x +^u_w y$:
  - $x +^u_w y = x + y$ if $x + y < 2^{w}$
  - $x +^u_w y = x + y - 2^{w}$ if $2^{w} \le x + y < 2^{w+1}$
- for `x` and `y` in the range `0 <= x, y <= UMax_w`, and `s = x + y` (the truncated `w`-bit sum)
  - `s` has overflow if and only if `s < x` (or `s < y`)
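The unsigned overflow test, sketched for `w = 32`. `uadd_ok` is an illustrative name.

```c
#include <stdint.h>

/* Returns 1 if x + y fits in 32 unsigned bits, 0 if the sum wrapped. */
int uadd_ok(uint32_t x, uint32_t y) {
    uint32_t s = x + y;   /* computed modulo 2^32 */
    return s >= x;        /* a wrapped sum is always smaller than x */
}
```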
- Truncating an unsigned number to `w` bits is equivalent to computing its value modulo $2^w$
  - unsigned multiplication: $x *^u_w y = (x⋅y) \bmod 2^w$
- in C, signed multiplication is generally performed by truncating the $2w$-bit product to `w` bits
  - first, compute its value modulo $2^w$
  - then, convert from unsigned to two's complement
- $x *^t_w y = U2T_w((x⋅y) \bmod 2^w)$
  - $U2T_w$: unsigned to two's complement
/**
Determine whether two signed arguments can be multiplied without causing overflow
*/
int tmult_ok(int x, int y) {
int p = x * y;
// Either x is zero, or dividing p by x gives y
return !x || p/x == y;
}
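`tmult_ok` relies on the product wrapping around, but signed overflow is undefined behavior in C, and `p / x` can itself trap for `x == -1` when `p` is `INT_MIN`. An alternative sketch, assuming 32-bit operands and an available 64-bit type, computes the full product and range-checks it. `tmult_ok64` is a hypothetical name.

```c
#include <stdint.h>

/* Returns 1 if x * y fits in 32-bit two's complement, 0 otherwise. */
int tmult_ok64(int32_t x, int32_t y) {
    /* The product of two 32-bit values always fits in 64 bits. */
    int64_t p = (int64_t)x * y;
    return p >= INT32_MIN && p <= INT32_MAX;
}
```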
- Historically, multiplication was slower than addition/subtraction/shifting
  - so `x * 14` (14 = 8 + 4 + 2 = 16 - 2) can be rewritten as either:
    - `(x << 3) + (x << 2) + (x << 1)`
    - `(x << 4) - (x << 1)`
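Both rewrites of `x * 14`, sketched on `uint32_t` so that the shifts and wrap-around are well defined. The function names are illustrative.

```c
#include <stdint.h>

/* 14 = 8 + 4 + 2 */
uint32_t mul14_add(uint32_t x) {
    return (x << 3) + (x << 2) + (x << 1);
}

/* 14 = 16 - 2 */
uint32_t mul14_sub(uint32_t x) {
    return (x << 4) - (x << 1);
}
```

Both agree with `x * 14u` modulo $2^{32}$, including when the product wraps.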
- Integer division is even slower than integer multiplication
- 30+ clock cycles!
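- integer division always rounds toward zero
- Two's complement division by a power of 2
  - for `x < 0`, `(x + (1 << k) - 1) >> k` yields `x / 2^k` rounded toward zero
    - adding a bias of `(1 << k) - 1` before the arithmetic shift
  - `x / 2^k` is equivalent to `(x < 0 ? x + (1<<k) - 1 : x) >> k`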
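A sketch of the biased shift for 32-bit values. `div_pow2` is a hypothetical name, and the code assumes `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^k rounded toward zero, for 0 <= k <= 30. */
int32_t div_pow2(int32_t x, unsigned k) {
    assert(k <= 30);
    int32_t bias = (int32_t)((UINT32_C(1) << k) - 1);
    /* Bias only negative values; nonnegative ones already round toward zero. */
    return (x < 0 ? x + bias : x) >> k;
}
```

For example, `div_pow2(-12340, 4)` gives `-771`, the same as `-12340 / 16`, whereas an unbiased shift would give `-772`.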
- IEEE Floating-Point Representation
  - $V = (-1)^S × M × 2^E$
  - `M`: significand
- normalized values (the most common case)
  - when `exp` is neither all `0`s nor all `1`s
  - `E`
    - the exponent field is interpreted as a signed integer in biased form
    - exponent value, `E`: `E = e - Bias`
    - `e` has bit representation $e_{k-1} e_{k-2} \ldots e_1 e_0$
      - here `e` is the unsigned value of the actual bits in the exponent field
    - `Bias` == $2^{k-1} - 1$
  - `frac`, `0 <= f < 1`
    - binary representation $0.f_{n-1} \ldots f_1 f_0$
    - its value is the `frac` bit pattern, read as an unsigned integer, divided by $2^n$, where `n` is the number of fraction bits
  - significand: $M = 1 + f$
    - implied leading 1
    - bit representation: $1.f_{n-1} \ldots f_1 f_0$
    - `1 <= M < 2`
- denormalized values: when the exponent field is all zeros
  - $E = 1 - Bias$
  - $M = f$ (no implied leading 1)
  - Two purposes:
    - represent numeric `0`
      - `+0.0`: sign bit is `0`, exponent field all `0`s, fraction all `0`s
      - `-0.0` when sign bit is `1` but all others are `0`s
    - represent numbers very close to `0.0`
      - gradual underflow
- special values: when the exponent field is all `1`s
  - when fraction field all `0`s:
    - `+∞` when sign bit `0`
    - `-∞` when sign bit `1`
  - when fraction nonzero: `NaN`
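The three cases can be sketched as a classifier over the raw bits of a `float`. This assumes `float` is the 32-bit IEEE single-precision format (1 sign bit, `k = 8` exponent bits, `n = 23` fraction bits, `Bias = 127`). The type `float_kind` and all function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef enum { FK_ZERO, FK_DENORM, FK_NORM, FK_INF, FK_NAN } float_kind;

/* Build a float from a raw 32-bit pattern. */
float float_from_bits(uint32_t bits) {
    float v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Classify a float by its exponent and fraction fields. */
float_kind classify_float(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    uint32_t exp  = (bits >> 23) & 0xFFu;   /* k = 8 exponent bits  */
    uint32_t frac = bits & 0x7FFFFFu;       /* n = 23 fraction bits */
    if (exp == 0)    return frac == 0 ? FK_ZERO : FK_DENORM;
    if (exp == 0xFF) return frac == 0 ? FK_INF  : FK_NAN;
    return FK_NORM;
}

/* Exponent value E for finite inputs: e - Bias, or 1 - Bias when e == 0. */
int float_E(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    int e = (int)((bits >> 23) & 0xFFu);
    return (e == 0) ? 1 - 127 : e - 127;
}
```

`float_E(8.0f)` is `3`, and the smallest denormalized pattern `0x00000001` has `E = -126`.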
- 4 different rounding modes
- round to even (round to closest)
- round toward zero
- round down
- round up
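- $+∞ - ∞ = NaN$
- With single-precision, multiplication does not distribute over addition:
  - $1e20 × (1e20 - 1e20) = 0$
  - $1e20 × 1e20 - 1e20 × 1e20 = NaN$
    - each product overflows to $+∞$, and $+∞ - ∞ = NaN$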
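A sketch of the two evaluation orders, assuming IEEE single-precision `float` and no value-changing floating-point optimizations (e.g. no `-ffast-math`). The function names are illustrative; the `volatile` intermediates are there to force each step to be rounded to `float`.

```c
/* a * (a - a): the difference is exactly 0, so the product is 0. */
float subtract_then_multiply(float a) {
    volatile float diff = a - a;
    volatile float r = a * diff;
    return r;
}

/* a * a - a * a: for large a each product overflows to +inf,
   and inf - inf is NaN. */
float multiply_then_subtract(float a) {
    volatile float p = a * a;
    volatile float r = p - p;
    return r;
}
```

With `a = 1e20f`, the first returns `0.0f` and the second returns a NaN, which can be checked with `isnan` from `<math.h>` or by the fact that a NaN compares unequal to itself.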
- on machines supporting IEEE floating point, the round to even (closest) mode is used
- In GCC, after `#define _GNU_SOURCE 1` and `#include <math.h>`:
  - `INFINITY` is $+∞$
  - `NAN` is $NaN$
- when casting:
  - `double` to `float`: can overflow to `+∞` or `-∞`
  - `double`/`float` to `int`: round to zero; may overflow
    - On Intel-compatible arch, an out-of-range conversion yields the bit pattern `[1....0]` ($TMin_w$ for word size `w`), known as integer indefinite
    - `(int)+1e10` yields `-2147483648`
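Because an out-of-range `double` to `int` conversion is undefined in standard C (the integer-indefinite result above is what Intel-compatible hardware happens to produce), a range check before the cast is one way to stay portable. A minimal sketch, assuming `int` is 32 bits so that both bounds are exactly representable as `double`; `fits_in_int` and `trunc_to_int_or` are hypothetical names.

```c
#include <limits.h>

/* Returns 1 if truncating d toward zero yields a value representable as int.
   NaN fails both comparisons and is therefore rejected. */
int fits_in_int(double d) {
    return d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0;
}

/* Truncates d toward zero, or returns fallback if the result would not fit. */
int trunc_to_int_or(double d, int fallback) {
    return fits_in_int(d) ? (int)d : fallback;
}
```

`trunc_to_int_or(-2.9, 0)` gives `-2` (round toward zero), while `trunc_to_int_or(1e10, -1)` returns the fallback `-1` instead of performing the overflowing cast.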