Allow CGI.escapeHTML to take a custom escape table #55

Open: byroot wants to merge 1 commit into base: master

byroot (Member) commented Feb 26, 2025

In some cases you may want to escape a string differently from the default behavior.

For instance, if you are trying to make some JSON safe to include in a `<script>` tag, you may want to escape fewer characters and use JavaScript escape sequences instead of HTML entities:

```ruby
>> CGI.escapeHTML('Hello </script>', ">" => '\u003e', "<" => '\u003c', "&" => '\u0026')
=> "Hello \\u003c/script\\u003e"
```
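For comparison, the same table can be applied with `gsub` on a Ruby that does not have this change. The following is only a minimal sketch of the `<script>` use case: `SCRIPT_SAFE`, `SCRIPT_SAFE_REGEX` and `script_safe_json` are hypothetical names chosen for illustration and are not part of this PR.

```ruby
require "json"

# Hypothetical table: the same three replacements as in the example above.
SCRIPT_SAFE = {
  ">" => '\u003e', "<" => '\u003c', "&" => '\u0026',
}.freeze
SCRIPT_SAFE_REGEX = Regexp.union(SCRIPT_SAFE.keys)

# Hypothetical helper: serialize to JSON, then replace the characters that
# could terminate a <script> element with JavaScript escape sequences.
def script_safe_json(object)
  JSON.generate(object).gsub(SCRIPT_SAFE_REGEX, SCRIPT_SAFE)
end

puts script_safe_json({ "title" => "Hello </script>" })
```

In JSON output, `<`, `>` and `&` can only occur inside string literals, where `\u003c`, `\u003e` and `\u0026` are valid escapes, so the result should still parse back to the original value.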

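You can always use `gsub` for that, but because `CGI.escapeHTML` is specialized for this job it can be significantly faster: about 2.8x on a 1024-byte string with nothing to escape, and about 5x when a single character needs escaping: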

```
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    82.135k i/100ms
          escapeHTML   221.405k i/100ms
Calculating -------------------------------------
                gsub    821.890k (± 2.2%) i/s    (1.22 μs/i) -      4.189M in   5.099152s
          escapeHTML      2.330M (± 0.5%) i/s  (429.22 ns/i) -     11.734M in   5.036770s

Comparison:
          escapeHTML:  2329816.5 i/s
                gsub:   821889.7 i/s - 2.83x  slower

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    36.235k i/100ms
          escapeHTML   171.347k i/100ms
Calculating -------------------------------------
                gsub    359.528k (± 1.5%) i/s    (2.78 μs/i) -      1.812M in   5.040422s
          escapeHTML      1.812M (± 0.7%) i/s  (551.84 ns/i) -      9.081M in   5.011762s

Comparison:
          escapeHTML:  1812105.3 i/s
                gsub:   359527.5 i/s - 5.04x  slower
```
The benchmark script:

```ruby
require "benchmark/ips"
require "cgi"

# Custom escape table passed to CGI.escapeHTML.
ESCAPE = {
  ">" => '\u003e', "<" => '\u003c', "&" => '\u0026',
}

# Binary-encoded copy of the same table, used by the gsub baseline.
ESCAPE_B = {
  ">".b => '\u003e'.b,
  "<".b => '\u003c'.b,
  "&".b => '\u0026'.b,
}
ESCAPE_REGEX = Regexp.union(ESCAPE_B.keys)

# Case 1: 1024 bytes with nothing to escape.
str = ("a" * 1024).freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end

# Case 2: 1024 bytes with a single character to escape at the end.
str = (("a" * 1023) + "<").freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end
```
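The `gsub` baseline works on a binary copy of the string and restores the original encoding afterwards. The sketch below, assuming UTF-8 input, checks that this byte-level replacement gives the same result as a character-level `gsub` even when the string contains multibyte characters. `TABLE`, `TABLE_B`, `TABLE_REGEX`, `escape_bytes` and `escape_chars` are hypothetical names used only for this illustration.

```ruby
# Same replacements as the benchmark's ESCAPE / ESCAPE_B tables.
TABLE = { ">" => '\u003e', "<" => '\u003c', "&" => '\u0026' }.freeze
TABLE_B = TABLE.to_h { |k, v| [k.b, v.b] }.freeze
TABLE_REGEX = Regexp.union(TABLE_B.keys)

# Hypothetical helper: byte-level replacement, as in the benchmark's gsub case.
# Uses gsub (not gsub!) so the result is never nil when nothing matches.
def escape_bytes(str)
  str.b.gsub(TABLE_REGEX, TABLE_B).force_encoding(str.encoding)
end

# Hypothetical helper: character-level replacement, for comparison.
def escape_chars(str)
  str.gsub(/[<>&]/, TABLE)
end

input = "caf\u00e9 <b> & \u65e5\u672c"
puts escape_bytes(input) == escape_chars(input)
puts escape_bytes(input).valid_encoding?
```

The two approaches should agree because in UTF-8 the bytes for `<`, `>` and `&` never appear inside a multibyte sequence, so matching them as raw bytes cannot split a character.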

NB: I haven't implemented the Java version, but can do it if there is interest in this feature.
