Convert to base 10 is significantly slower than similar libraries #315

Open
HKalbasi opened this issue Dec 13, 2024 · 3 comments

Comments

@HKalbasi

Other libraries like GMP use a divide-and-conquer algorithm: find a power of 10 with roughly half as many digits as the number, use it as the base, split the original number into a two-digit number in that base, convert each digit to base 10 recursively, and finally concatenate the results. Assuming an O(n log n) multiplication routine, this gives the recurrence T(n) = 2T(n/2) + O(n log n), so the algorithm runs in O(n log^2 n), which is faster than the current O(n sqrt n) algorithm.
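To illustrate the splitting idea, here is a minimal, untested Rust sketch of that recursion built only on num-bigint's existing API (not GMP's actual implementation); it recomputes the splitting power of 10 on every call, whereas a real implementation would precompute and cache these powers:

use num_bigint::BigUint;
use num_integer::Integer;

// Illustrative sketch of the divide-and-conquer conversion.
fn to_decimal(n: &BigUint, digits: usize) -> String {
    // Small numbers: fall back to the existing conversion.
    if digits <= 1000 {
        return n.to_string();
    }
    let half = digits / 2;
    // Treat 10^half as the base: n = hi * 10^half + lo.
    let base = BigUint::from(10u32).pow(half as u32);
    let (hi, lo) = n.div_rem(&base);
    // Convert both "digits" recursively and concatenate,
    // zero-padding the low half back to `half` characters.
    let hi_str = to_decimal(&hi, digits - half);
    let lo_str = to_decimal(&lo, half);
    format!("{}{:0>width$}", hi_str, lo_str, width = half)
}
// The top-level caller still needs an upper bound on the digit count
// and must strip any leading zeros introduced by an over-estimate.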

@cuviper
Member

cuviper commented Dec 13, 2024

The current algorithm came in #216, and I mentioned then:

I honestly expect there's still room for improvement,

Help is welcome!

@HKalbasi
Author

I'm interested in helping. num-bigint is the default choice for arbitrary-precision integers in Rust, so I think it is important to make it fast.

The first step is to merge #282, or something like it, since the above algorithm requires a fast multiplication routine.

@vxgmichel

vxgmichel commented Mar 5, 2025

EDIT: I should have started by reading the work by @HKalbasi in PR #316; it addresses pretty much everything I mention in this comment 😅


I ran into this issue as well. Here are my findings, for reference.

I'm converting 2^(2^23) to a string using the following Rust code:

use num_bigint::BigUint;

fn main() {
    (BigUint::from(1u32) << (1u32 << 23)).to_string();
}

Here's the corresponding timing, along with a comparison against Python 3.13:

$ time ./target/release/bigint
8,91s user 0,01s system 99% cpu 8,923 total
$ time PYTHONINTMAXSTRDIGITS=0 python -c "str(1 << (1 << 23))"
0,27s user 0,01s system 99% cpu 0,287 total

Interestingly enough, the Python conversion is implemented in pure Python:
https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L178-L179

It does leverage the decimal module, though. Here is the simplified code, which performs identically to the previous benchmark (i.e. about 300 ms):

import decimal
import functools


def convert(n: int) -> str:

    @functools.cache
    def power2(k: int) -> decimal.Decimal:
        # 2**k as an exact Decimal.
        return decimal.Decimal(2) ** k

    def inner(n, w):
        # w is the bit length of n; below the cutoff, convert directly.
        if w <= 200:
            return decimal.Decimal(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        # Split n into hi * 2**w2 + lo and recombine in Decimal.
        hi = n >> w2
        lo = n & ((1 << w2) - 1)
        return inner(lo, w2) + inner(hi, w1) * power2(w2)

    unbounded_dec_context = decimal.getcontext().copy()
    unbounded_dec_context.prec = decimal.MAX_PREC
    unbounded_dec_context.Emax = decimal.MAX_EMAX
    with decimal.localcontext(unbounded_dec_context):
        return str(inner(n, n.bit_length()))


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)

However, it also provides a fallback implementation that doesn't rely on the decimal module:
https://github.com/python/cpython/blob/2905690a91bf72cdf0fb919b5193849bb67732e2/Lib/_pylong.py#L187-L190

Here is the simplified corresponding code:

import math
import functools


def convert(n: int) -> str:

    @functools.cache
    def power10(k: int) -> int:
        return 10**k

    def inner(n, w):
        if w <= 1000:
            return str(n)
        q, r = divmod(w, 2)
        w1, w2 = q + r, q
        hi, lo = divmod(n, power10(w2))
        return inner(hi, w1) + inner(lo, w2).zfill(w2)

    w = int(n.bit_length() * math.log10(2) + 1)
    s = inner(n, w)
    if s[0] == "0" and n:
        s = s.lstrip("0")
    return s


if __name__ == "__main__":
    x = 1 << (1 << 23)
    s = convert(x)
    assert s == str(x)

It is much slower, but still faster than the num-bigint implementation (about 3 seconds, vs 9 seconds):

$ time PYTHONINTMAXSTRDIGITS=0 python -O test.py
3,06s user 0,02s system 99% cpu 3,102 total

I ported it to Rust with the following implementation:

use num_bigint::BigUint;
use num_integer::Integer;
use std::collections::HashMap;

fn bigint_to_string(n: &BigUint) -> String {
    // Cache of 10^w2 values, keyed by exponent, shared across recursive calls.
    let mut cache: HashMap<u32, BigUint> = HashMap::new();

    fn inner(n: &BigUint, w: u32, cache: &mut HashMap<u32, BigUint>) -> String {
        // Below the cutoff, fall back to the built-in conversion.
        if w <= 1000 {
            return n.to_string();
        }
        let q = w / 2;
        let w1 = q + w % 2;
        let w2 = q;
        let power10 = cache
            .entry(w2)
            .or_insert_with_key(|k| BigUint::from(10u32).pow(*k));
        // Split n into hi * 10^w2 + lo and convert each half recursively.
        let (hi, lo) = n.div_rem(power10);
        let hi_str = inner(&hi, w1, cache);
        let lo_str = inner(&lo, w2, cache);
        // Zero-pad the low half back to w2 digits before concatenating.
        format!(
            "{}{}{}",
            hi_str,
            "0".repeat(w2 as usize - lo_str.len()),
            lo_str
        )
    }

    // Upper bound on the number of decimal digits of n.
    let w = (n.bits() as f64 * 2f64.log10() + 1f64) as u32;
    let s = inner(n, w, &mut cache);
    // The digit-count estimate may overshoot, leaving leading zeros to strip.
    if s.starts_with('0') && n != &BigUint::from(0u32) {
        s.trim_start_matches('0').to_string()
    } else {
        s
    }
}

fn main() {
    let x = BigUint::from(1u32) << (1u32 << 23);
    bigint_to_string(&x);
    // assert_eq!(bigint_to_string(&x), x.to_string());
}

However, it does not perform better than the current num-bigint implementation:

$ time ./target/release/bigint                        
8,91s user 0,01s system 99% cpu 8,929 total

I suspect this is because the n.div_rem(power10) operation is slower than the corresponding divmod in Python. I might open another issue about this.
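As a rough check (an illustrative sketch only, with an approximate exponent for half of the ~2.5 million digits of 2^(2^23)), one could time the single top-level division in isolation to see how much of the total it accounts for:

use num_bigint::BigUint;
use num_integer::Integer;
use std::time::Instant;

fn main() {
    let n = BigUint::from(1u32) << (1u32 << 23);
    // Roughly half of the ~2.5 million decimal digits of 2^(2^23).
    let power10 = BigUint::from(10u32).pow(1_262_612u32);
    let start = Instant::now();
    // Time only the top-level split of n into hi * 10^k + lo.
    let _ = n.div_rem(&power10);
    println!("single div_rem: {:?}", start.elapsed());
}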

EDIT: I created the corresponding issue, see #323.
I also used the Rust int_divmod from #323 (comment) to improve bigint_to_string; see this gist.

This new implementation is now 10 times faster than the current to_string() implementation:

$ time target/release/bigint
0,82s user 0,01s system 99% cpu 0,838 total
