Convert to base 10 is significantly slower than similar libraries #315
The current algorithm came in #216, and I mentioned then:

> Help is welcome!

I'm interested in helping. The first step is to merge #282 or something like that, since the above algorithm requires a fast multiplication algorithm.
EDIT: I should have started by reading the work by @HKalbasi on PR #316, it addresses pretty much everything I mentioned in my comment 😅

I ran into this issue as well. Here are my findings, for reference. I'm converting the following:

```rust
use num_bigint::BigUint;

fn main() {
    (BigUint::from(1u32) << (1u32 << 23)).to_string();
}
```

Here's the corresponding timing, and the comparison with Python 3.13:

```
$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,923 total
$ time PYTHONINTMAXSTRDIGITS=0 python -c "str(1 << (1 << 23))"
0,27s user 0,01s system 99% cpu 0,287 total
```

Interestingly enough, the Python conversion is implemented in pure Python; it does leverage the power of the `decimal` module, though. Here is the simplified code, which performs identically to the previous benchmark (i.e. about 300 ms):
```python
import decimal
import functools


def convert(n: int) -> str:
    @functools.cache
    def power10(k: int) -> decimal.Decimal:
        # Despite the name, this caches powers of two (as exact Decimals):
        # the split below is on bits, so the high half is scaled by 2**k.
        return decimal.Decimal(2) ** k

    def inner(n, w):
        if w <= 200:
            return decimal.Decimal(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi = n >> w2
        lo = n & ((1 << w2) - 1)
        return inner(lo, w2) + inner(hi, w1) * power10(w2)

    unbounded_dec_context = decimal.getcontext().copy()
    unbounded_dec_context.prec = decimal.MAX_PREC
    unbounded_dec_context.Emax = decimal.MAX_EMAX
    with decimal.localcontext(unbounded_dec_context):
        return str(inner(n, n.bit_length()))


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
```
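As a quick sanity check of the binary split used above (my own illustration, not from the thread): each level decomposes `n` as `hi * 2**w2 + lo`, and the `decimal` arithmetic stays exact as long as the context precision can hold every digit.

```python
# Illustrative check: one level of the binary split loses nothing, and
# recombining in Decimal with a large enough precision is exact.
import decimal

n = 3 ** 500                      # arbitrary large integer
w2 = n.bit_length() // 2
hi, lo = n >> w2, n & ((1 << w2) - 1)
assert (hi << w2) + lo == n       # the split is lossless

ctx = decimal.getcontext().copy()
ctx.prec = decimal.MAX_PREC
ctx.Emax = decimal.MAX_EMAX
with decimal.localcontext(ctx):
    recombined = decimal.Decimal(lo) + decimal.Decimal(hi) * decimal.Decimal(2) ** w2
    assert str(recombined) == str(n)
```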
However, it also provides a fallback implementation that doesn't rely on the `decimal` module. Here is the simplified corresponding code:

```python
import math
import functools


def convert(n: int) -> str:
    @functools.cache
    def power10(k: int) -> int:
        return 10 ** k

    def inner(n, w):
        if w <= 1000:
            return str(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi, lo = divmod(n, power10(w2))
        return inner(hi, w1) + inner(lo, w2).zfill(w2)

    w = int(n.bit_length() * math.log10(2) + 1)
    s = inner(n, w)
    if s[0] == "0" and n:
        s = s.lstrip("0")
    return s


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)
```
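One subtlety in the fallback worth spelling out (my own illustration, not from the thread): the low half can have fewer than `w2` decimal digits, and the `zfill` restores the leading zeros that plain concatenation would drop.

```python
# Why the zfill matters: leading zeros of the low half must be restored.
n = 1000203
w2 = 4                              # split off the low 4 decimal digits
hi, lo = divmod(n, 10 ** w2)        # hi = 100, lo = 203
assert str(hi) + str(lo) == "100203"             # wrong: a zero was dropped
assert str(hi) + str(lo).zfill(w2) == "1000203"  # correct
```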
It is much slower, but still faster than the `num-bigint` implementation:

```
$ time PYTHONINTMAXSTRDIGITS=0 python -O test.py
3,06s user 0,02s system 99% cpu 3,102 total
```

I ported it to Rust with the following implementation:
```rust
use num_bigint::BigUint;
use num_integer::Integer;
use std::collections::HashMap;

fn bigint_to_string(n: &BigUint) -> String {
    let mut cache: HashMap<u32, BigUint> = HashMap::new();

    fn inner(n: &BigUint, w: u32, cache: &mut HashMap<u32, BigUint>) -> String {
        if w <= 1000 {
            return n.to_string();
        }
        let q = w / 2;
        let w1 = q + w % 2;
        let w2 = q;
        let power10 = cache
            .entry(w2)
            .or_insert_with_key(|k| BigUint::from(10u32).pow(*k));
        let (hi, lo) = n.div_rem(power10);
        let hi_str = inner(&hi, w1, cache);
        let lo_str = inner(&lo, w2, cache);
        format!(
            "{}{}{}",
            hi_str,
            "0".repeat(w2 as usize - lo_str.len()),
            lo_str
        )
    }

    let w = (n.bits() as f64 * 2f64.log10() + 1f64) as u32;
    let s = inner(n, w, &mut cache);
    if s.starts_with('0') && n != &BigUint::from(0u32) {
        s.trim_start_matches('0').to_string()
    } else {
        s
    }
}

fn main() {
    let x = BigUint::from(1u32) << (1u32 << 23);
    bigint_to_string(&x);
    // assert_eq!(bigint_to_string(&x), x.to_string());
}
```

However, it does not perform better than the current implementation:
```
$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,929 total
```

I suspect that this is because the […]

EDIT: I created the corresponding issue, see #323. This new implementation is now 10 times faster than the current one:

```
$ time target/release/bigint
0,82s user 0,01s system 99% cpu 0,838 total
```
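For anyone who wants to reproduce the comparison, a minimal Python timing harness along these lines works (the sizes and names below are my own, and the operand is deliberately smaller than the issue's `1 << 23` bits so it finishes quickly):

```python
# Minimal, illustrative timing harness (not the exact setup from the issue).
import math
import sys
import time

# Python >= 3.11 caps int-to-str conversions by default; lift the limit,
# equivalent to PYTHONINTMAXSTRDIGITS=0 in the shell transcripts above.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

x = 1 << (1 << 18)
start = time.perf_counter()
s = str(x)
elapsed = time.perf_counter() - start
print(f"str() of a {x.bit_length()}-bit integer: {elapsed:.3f}s")

# Cross-check the digit count against the bit length.
assert len(s) == int((1 << 18) * math.log10(2)) + 1
```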
Other libraries like GMP use a divide and conquer algorithm which finds a power of 10 whose size is near half of the digits of the number, uses that as the base, converts the original number to a 2-digit number in that base, converts each digit recursively to base 10, and finally concatenates the results. Assuming that we have an O(n log n) multiplication routine, this algorithm would be O(n log² n) and would be faster than the current O(n √n) algorithm.
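The complexity claim can be sanity-checked numerically. This is an illustration under the stated assumption of an `n log n` multiplication cost; the cost model is a simplification, not GMP's actual one.

```python
# Back-of-the-envelope model of the divide and conquer conversion:
# T(n) = 2 * T(n / 2) + M(n), with an assumed multiplication cost
# M(n) = n * log2(n). The closed form is Theta(n * log(n)**2).
import math

def dc_cost(n: int) -> float:
    if n <= 1:
        return 0.0
    return 2 * dc_cost(n // 2) + n * math.log2(n)

# The ratio T(n) / (n * log2(n)**2) settles near a constant,
# which is what the O(n log^2 n) bound predicts.
for bits in (2 ** 10, 2 ** 15, 2 ** 20):
    print(bits, dc_cost(bits) / (bits * math.log2(bits) ** 2))
```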