Skip to content

char_indices is slower than while loop over ascii Chars, but faster for unicode #141855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hkBst opened this issue Jun 1, 2025 · 9 comments
Open
Labels
A-Unicode Area: Unicode C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@hkBst
Copy link
Member

hkBst commented Jun 1, 2025

I tried this code:

use std::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum EscapeError {
    /// Raw '\r' encountered.
    BareCarriageReturn,
    /// Raw '\r' encountered in raw string.
    BareCarriageReturnInRawString,
}

pub fn check_raw_str_while(
    src: &str,
    mut callback: impl FnMut(Range<usize>, Result<char, EscapeError>),
) {
    let mut chars = src.chars();

    while let Some(c) = chars.next() {
        let start = src.len() - chars.as_str().len() - c.len_utf8();
        let res = match c {
            '\r' => Err(EscapeError::BareCarriageReturn),
            _ => Ok(c),
        };
        let end = src.len() - chars.as_str().len();
        callback(start..end, res);
    }
}

pub fn check_raw_str_char_indices(
    src: &str,
    mut callback: impl FnMut(Range<usize>, Result<char, EscapeError>),
) {
    src.char_indices().for_each(|(pos, c)| {
        callback(
            pos..pos + c.len_utf8(),
            if c == '\r' {
                Err(EscapeError::BareCarriageReturn)
            } else {
                Ok(c)
            },
        );
    });
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn same() {
        let s = "abcdefghijklmnopqrstuvwxyz0123456789";
        let (mut r1, mut r2) = (vec![], vec![]);
        check_raw_str_while(&s, |r, c| r1.push((r, c)));
        check_raw_str_char_indices(&s, |r, c| r2.push((r, c)));
        assert_eq!(r1, r2);
    }
}

with these benches:

#![feature(test)]

extern crate test;

use bench_char_indices::*;

use std::iter::repeat_n;
const LEN: usize = 10_000;

macro_rules! fn_bench_check_raw {
    ($name:ident, $unit:ty, $check_raw:ident) => {
        fn $name(b: &mut test::Bencher, s: &str, expected: $unit) {
            let input: String = test::black_box(repeat_n(s, LEN).collect());
            assert_eq!(input.len(), LEN * s.len());
            b.iter(|| {
                let mut output = vec![];

                $check_raw(&input, |range, res| output.push((range, res)));
                assert_eq!(output.len(), LEN);
                assert_eq!(output[0], ((0..s.len()), Ok(expected)));
            });
        }
    };
}

fn_bench_check_raw!(bench_check_raw_str_while, char, check_raw_str_while);
fn_bench_check_raw!(
    bench_check_raw_str_char_indices,
    char,
    check_raw_str_char_indices
);

#[bench]
fn bench_check_raw_str_ascii_while(b: &mut test::Bencher) {
    bench_check_raw_str_while(b, "a", 'a');
}

#[bench]
fn bench_check_raw_str_ascii_char_indices(b: &mut test::Bencher) {
    bench_check_raw_str_char_indices(b, "a", 'a');
}

I expected to see this happen: NO performance difference

Instead, there is a more than 10% difference:

test bench_check_raw_str_ascii_char_indices ... bench:      27,733.96 ns/iter (+/- 103.24)
test bench_check_raw_str_ascii_while        ... bench:      24,525.01 ns/iter (+/- 582.28)

Tested on: rustc 1.88.0-nightly (2e6882a 2025-05-05)

@hkBst hkBst added the C-bug Category: This is a bug. label Jun 1, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 1, 2025
@hkBst
Copy link
Member Author

hkBst commented Jun 1, 2025

@rustbot: label +I-slow

@hkBst
Copy link
Member Author

hkBst commented Jun 1, 2025

Remarkably, if you remove the unused second option in the EscapeError enum, namely BareCarriageReturnInRawString, then the results become even more different:

test bench_check_raw_str_ascii_char_indices ... bench:      28,308.71 ns/iter (+/- 69.40)
test bench_check_raw_str_ascii_while        ... bench:      17,260.44 ns/iter (+/- 60.75)

@solson
Copy link
Member

solson commented Jun 1, 2025

I was curious so I tried this on my machine and got opposite results:

test bench_check_raw_str_ascii_char_indices ... bench:      11,329.42 ns/iter (+/- 870.58)
test bench_check_raw_str_ascii_while        ... bench:      13,474.87 ns/iter (+/- 553.02)

(on rustc 1.89.0-nightly (4d08223c0 2025-05-31), slightly newer)

@hkBst
Copy link
Member Author

hkBst commented Jun 1, 2025

@solson Interesting! I'm running on Linux x86_64. You?

In fact adding unicode benches I get the opposite as well:

#[bench]
fn bench_check_raw_str_unicode_while(b: &mut test::Bencher) {
    bench_check_raw_str_while(b, "🦀", '🦀');
}

#[bench]
fn bench_check_raw_str_unicode_char_indices(b: &mut test::Bencher) {
    bench_check_raw_str_char_indices(b, "🦀", '🦀');
}
test bench_check_raw_str_ascii_char_indices   ... bench:      27,720.16 ns/iter (+/- 128.21)
test bench_check_raw_str_ascii_while          ... bench:      23,548.37 ns/iter (+/- 898.86)
test bench_check_raw_str_unicode_char_indices ... bench:      35,898.00 ns/iter (+/- 509.55)
test bench_check_raw_str_unicode_while        ... bench:      39,119.22 ns/iter (+/- 134.90)

@solson
Copy link
Member

solson commented Jun 1, 2025

Yep, Linux x86_64 here too.

test bench_check_raw_str_ascii_char_indices   ... bench:      11,355.42 ns/iter (+/- 107.96)
test bench_check_raw_str_ascii_while          ... bench:      13,397.35 ns/iter (+/- 87.63)
test bench_check_raw_str_unicode_char_indices ... bench:      24,234.65 ns/iter (+/- 311.14)
test bench_check_raw_str_unicode_while        ... bench:      24,189.68 ns/iter (+/- 214.49)

@hkBst
Copy link
Member Author

hkBst commented Jun 1, 2025

After updating to the new nightly (1.89.0-nightly (4d08223 2025-05-31)), I indeed get different results:

test bench_check_raw_str_ascii_char_indices   ... bench:      27,876.82 ns/iter (+/- 60.59)
test bench_check_raw_str_ascii_while          ... bench:      29,549.55 ns/iter (+/- 1,212.18)
test bench_check_raw_str_unicode_char_indices ... bench:      36,247.43 ns/iter (+/- 559.77)
test bench_check_raw_str_unicode_while        ... bench:      41,801.26 ns/iter (+/- 736.20)

mostly because the while variant has become slower in this version???

@hkBst
Copy link
Member Author

hkBst commented Jun 4, 2025

On rustc 1.89.0-nightly (59aa1e8 2025-06-03) results are back to "normal":

test bench_check_raw_str_ascii_char_indices   ... bench:      27,674.92 ns/iter (+/- 148.05)
test bench_check_raw_str_ascii_while          ... bench:      23,855.94 ns/iter (+/- 627.58)
test bench_check_raw_str_unicode_char_indices ... bench:      36,272.82 ns/iter (+/- 1,479.99)
test bench_check_raw_str_unicode_while        ... bench:      39,030.70 ns/iter (+/- 119.96)

@hkBst hkBst changed the title char_indices is slower than manual while loop char_indices is slower than while loop over ascii Chars, but faster for unicode Jun 4, 2025
@hkBst
Copy link
Member Author

hkBst commented Jun 4, 2025

Now if I add strings which alternate between ascii and non-ascii, and change the benchmark code to accomodate that, like so:

#![feature(test)]

extern crate test;

use bench_char_indices::*;

use std::iter::repeat_n;
const LEN: usize = 10_000;

macro_rules! fn_bench_check_raw {
    ($name:ident, $unit:ty, $check_raw:ident) => {
        fn $name(b: &mut test::Bencher, s: &str, expected: Vec<$unit>) {
            let input: String = test::black_box(repeat_n(s, LEN).collect());
            assert_eq!(input.len(), LEN * s.len());
            b.iter(|| {
                let mut output = vec![];

                $check_raw(&input, |range, res| output.push((range, res)));
                assert_eq!(output.len(), LEN * s.chars().count());
                for ((i, &e), (p, c)) in expected.iter().enumerate().zip(s.char_indices()) {
                    assert_eq!(output[i], ((p..p + c.len_utf8()), Ok(e)));
                }
            });
        }
    };
}

fn_bench_check_raw!(bench_check_raw_str_while, char, check_raw_str_while);
fn_bench_check_raw!(
    bench_check_raw_str_char_indices,
    char,
    check_raw_str_char_indices
);

#[bench]
fn bench_check_raw_str_ascii_while(b: &mut test::Bencher) {
    bench_check_raw_str_while(b, "a", repeat_n('a', LEN).collect());
}

#[bench]
fn bench_check_raw_str_ascii_char_indices(b: &mut test::Bencher) {
    bench_check_raw_str_char_indices(b, "a", repeat_n('a', LEN).collect());
}

#[bench]
fn bench_check_raw_str_unicode_while(b: &mut test::Bencher) {
    bench_check_raw_str_while(b, "🦀", repeat_n('🦀', LEN).collect());
}

#[bench]
fn bench_check_raw_str_unicode_char_indices(b: &mut test::Bencher) {
    bench_check_raw_str_char_indices(b, "🦀", repeat_n('🦀', LEN).collect());
}

#[bench]
fn bench_check_raw_str_mixed_while(b: &mut test::Bencher) {
    bench_check_raw_str_while(
        b,
        "a🦀",
        (0..2 * LEN)
            .map(|i| if i % 2 == 0 { 'a' } else { '🦀' })
            .collect(),
    );
}

#[bench]
fn bench_check_raw_str_mixed_char_indices(b: &mut test::Bencher) {
    bench_check_raw_str_char_indices(
        b,
        "a🦀",
        (0..2 * LEN)
            .map(|i| if i % 2 == 0 { 'a' } else { '🦀' })
            .collect(),
    );
}

then I get these puzzling results:

test bench_check_raw_str_ascii_char_indices   ... bench:      22,114.24 ns/iter (+/- 1,117.53)
test bench_check_raw_str_ascii_while          ... bench:      26,663.63 ns/iter (+/- 118.35)
test bench_check_raw_str_mixed_char_indices   ... bench:      57,742.56 ns/iter (+/- 529.01)
test bench_check_raw_str_mixed_while          ... bench:      58,221.94 ns/iter (+/- 568.38)
test bench_check_raw_str_unicode_char_indices ... bench:      37,808.79 ns/iter (+/- 259.83)
test bench_check_raw_str_unicode_while        ... bench:      37,642.51 ns/iter (+/- 701.60)

Note how now bench_check_raw_str_ascii_char_indices is suddenly much faster than bench_check_raw_str_ascii_while, seemingly exchanging timings with it...

@lolbinarycat lolbinarycat added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 5, 2025
@lolbinarycat
Copy link
Contributor

I'd be curious to compare the length of the generated assembly, as longer asm with less jumps tends to perform better in microbenchmarks but suffers in real application code because of icache congestion.

@lolbinarycat lolbinarycat added A-Unicode Area: Unicode C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Unicode Area: Unicode C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants