Comparing changes
base repository: google/re2j
base: re2j-1.7
head repository: google/re2j
compare: master
  • 7 commits
  • 16 files changed
  • 5 contributors

Commits on Jun 30, 2022

  1. 3e685d9

Commits on Jul 10, 2022

  1. Add fast path for ASCII case folding

    One of our production services uses re2j to match several hundred mostly
    case-insensitive patterns of varying complexity against text.
    We observed that approximately 12% of CPU time was being spent in
    toLowerCase() as called from simpleFold(), due to the necessity of doing
    at least one character data lookup per Inst.Rune in the common case that
    the input rune being examined did not match the instruction.
    
    As a fix, implement a method equalsIgnoreCase() that performs
    Unicode-aware case-insensitive comparison between two runes, with a fast
    path for the common case where both input runes are ASCII, and use it in
    Inst for single-rune case-insensitive comparison. This takes character
    data lookups out of the hot path.
    
    The existing re2j benchmarks did not exercise case-insensitive patterns,
    so add a new benchmark that executes a mostly ASCII regex pattern on a
    text containing a mix of ASCII and Unicode characters (generated using
    a Hungarian "lorem ipsum" text generator).
    
    Also add unit tests for the new equality comparison logic.
    
    Signed-off-by: Máté Szabó <[email protected]>
    mszabo-wikia authored and sjamesr committed Jul 10, 2022
    dc7d6e5
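
The fast path described in dc7d6e5 can be illustrated roughly as follows. This is a minimal sketch, not the code from the commit: the class name `CaseFoldSketch` is hypothetical, and the slow path approximates Unicode case folding with the JDK's `Character.toUpperCase`/`Character.toLowerCase` mappings, whereas re2j relies on its own bundled Unicode tables.

```java
// Hypothetical sketch of an ASCII fast path for case-insensitive rune
// comparison. Not the actual re2j implementation.
final class CaseFoldSketch {
  // Returns true if the two code points are equal ignoring case.
  static boolean equalsIgnoreCase(int r1, int r2) {
    if (r1 == r2) {
      return true;
    }
    // Fast path: both runes are ASCII, so no character data lookup is needed.
    if (r1 < 0x80 && r2 < 0x80) {
      // Map 'A'..'Z' to 'a'..'z' by setting bit 0x20; leave other runes alone.
      if ('A' <= r1 && r1 <= 'Z') {
        r1 |= 0x20;
      }
      if ('A' <= r2 && r2 <= 'Z') {
        r2 |= 0x20;
      }
      return r1 == r2;
    }
    // Slow path (assumption): approximate case folding with JDK mappings.
    return Character.toLowerCase(Character.toUpperCase(r1))
        == Character.toLowerCase(Character.toUpperCase(r2));
  }

  public static void main(String[] args) {
    System.out.println(equalsIgnoreCase('a', 'A'));
    System.out.println(equalsIgnoreCase('[', '{'));
    System.out.println(equalsIgnoreCase(0xE9, 0xC9));
  }
}
```

Restricting the bit trick to the range 'A'..'Z' matters: punctuation pairs such as '[' and '{' also differ only in bit 0x20 and must not compare equal.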

Commits on Jul 17, 2023

  1. e3c736d

Commits on Aug 21, 2023

  1. Add support for (?<name>expr).

    This follows google/re2@6148386 (and
    golang/go@ee61186) to some extent.
    junyer authored and sjamesr committed Aug 21, 2023
    9b3f052
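
As a rough illustration of the `(?<name>expr)` syntax added in 9b3f052, the sketch below uses `java.util.regex` so that it stays self-contained. It assumes that re2j's `Pattern` and `Matcher` in `com.google.re2j` mirror this API closely enough for the same calls to apply. The class name `NamedGroupSketch` and the example pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: named capturing groups written as (?<name>expr).
// Uses java.util.regex; swapping in com.google.re2j is an assumption.
final class NamedGroupSketch {
  private static final Pattern DATE =
      Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

  // Returns the text captured by the "year" group, or null if nothing matches.
  static String extractYear(String text) {
    Matcher m = DATE.matcher(text);
    return m.find() ? m.group("year") : null;
  }

  public static void main(String[] args) {
    System.out.println(extractYear("released 2023-08"));
  }
}
```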

Commits on Aug 29, 2023

  1. Reduce the incidence of infinite loops while case folding

    dc7d6e5 unfortunately increases the
    incidence of infinite loops during case folding if re2j is running on a
    JVM newer than the version used to generate the bundled
    UnicodeTables.java and the input contains a rune that would require
    special case folding rules to form a closed fold loop. \u1C80 (Cyrillic
    Small Letter Rounded Ve) is an example of such a rune.
    
    Work around the issue by inverting the order of parameters passed to
    equalsIgnoreCase() so that the rune from the pattern being matched,
    rather than the rune from the input content, undergoes case folding.
    This does not fully eliminate the possibility of an infinite loop in
    this scenario, since the pattern may itself contain one of the
    problematic runes, but it effectively restores the behavior that
    predates dc7d6e5, since the previous logic also
    performed case folding on the rune from the pattern and not on the
    content.
    
    Signed-off-by: Máté Szabó <[email protected]>
    mszabo-wikia authored and sjamesr committed Aug 29, 2023
    10ba78d
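
The effect of the parameter order in 10ba78d can be sketched with a toy fold table. Everything here is hypothetical: the class `FoldOrderSketch`, its three-entry table, and the loop shape are illustrative assumptions rather than re2j's code. The point is only that the orbit of the first argument is walked, so runes from the input are never folded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: only the pattern rune's fold orbit is traversed.
final class FoldOrderSketch {
  // Toy fold table (assumption): each rune maps to the next rune in its orbit.
  private static final Map<Integer, Integer> NEXT = new HashMap<>();

  static {
    NEXT.put((int) 'K', (int) 'k');
    NEXT.put((int) 'k', 0x212A); // KELVIN SIGN
    NEXT.put(0x212A, (int) 'K');
  }

  // Returns the next rune in the orbit, or r itself if r has no entry.
  static int simpleFold(int r) {
    Integer next = NEXT.get(r);
    return next != null ? next : r;
  }

  // Folds patternRune only; inputRune is compared but never folded.
  static boolean equalsIgnoreCase(int patternRune, int inputRune) {
    if (patternRune == inputRune) {
      return true;
    }
    for (int f = simpleFold(patternRune); f != patternRune; f = simpleFold(f)) {
      if (f == inputRune) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(equalsIgnoreCase('k', 0x212A));
    System.out.println(equalsIgnoreCase('x', 0x1C80));
  }
}
```

In this sketch the loop terminates whenever the pattern rune's orbit is closed, regardless of what the input contains, which matches the commit's observation that the remaining risk lies in the pattern.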

Commits on Oct 24, 2023

  1. Minor test fixes to enable running internally with Blaze.

    PiperOrigin-RevId: 574899560
    herbyderby authored and sjamesr committed Oct 24, 2023
    Configuration menu
    Copy the full SHA
    7339d54 View commit details
    Browse the repository at this point in the history

Commits on Oct 25, 2023

  1. Use an explicit "UTF-8" character set argument when creating Strings from bytes.

    The platform default character set is not guaranteed to be UTF-8.
    
    PiperOrigin-RevId: 576577338
    herbyderby authored and sjamesr committed Oct 25, 2023
    97df44e
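
The change in 97df44e amounts to decoding bytes with a named character set instead of the platform default. A minimal sketch, assuming a hypothetical helper class `Utf8Sketch`: it uses `StandardCharsets.UTF_8`, which selects the same charset as the `"UTF-8"` name mentioned in the commit but avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decode bytes with an explicit charset rather than
// relying on the platform default.
final class Utf8Sketch {
  static String decode(byte[] bytes) {
    // new String(bytes) would use the platform default charset instead.
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] utf8 = {'M', (byte) 0xC3, (byte) 0xA1, 't', (byte) 0xC3, (byte) 0xA9};
    System.out.println(decode(utf8));
  }
}
```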