Compser is a parser builder library for Ruby inspired by elm-parser. Take a look at the JSON parser and Calculator to get a glimpse of the syntax and building blocks you can use to build complex parsers.
Add the following line to your Gemfile:
gem 'compser', '~> 0.2'
More details at https://rubygems.org/gems/compser.
The following result is a benchark of a JSON parser implemented with Compser compared against Parsby, Parslet and JSON.parse
.
The benchmark parses a 1,5kb payload 100 times.
Implementation | Time | Compared to JSON.parse |
---|---|---|
JSON.parse |
0.00067s | - |
Compser::Json with YJIT |
0.216s | 322x slower |
Compser::Json |
0.268s | 400x slower |
Parslet::MyJson with YJIT |
0.43s | 641x slower |
Parslet::MyJson |
0.57s | 850x slower |
Parsby::Example::JsonParser with YJIT |
24.19s | 36100x slower |
Parsby::Example::JsonParser |
27.22s | 40626x slower |
Compser is far behind JSON.parse
with the native C implementation, but results seem pretty good compared to other Ruby implementations.
Discard results produced by the parser.
parser = drop(:token, '[').take(:integer).drop(:token, ']')
parser.parse('[150]') # => Good<150>
parser.parse('[0]') # => Good<0>
parser.parse('1234') # => Bad<...>
parser.parse('[]') # => Bad<...>
parser.parse('[900') # => Bad<...>
Parse integers.
parser = take(:integer)
parser.parse('1') # => Good<1>
parser.parse('1234') # => Good<1234>
parser.parse('-500') # => Bad<...>
parser.parse('1.34') # => Bad<...>
parser.parse('1e31') # => Bad<...>
parser.parse('123a') # => Bad<...>
parser.parse('0x1A') # => Bad<...>
# support negative integers with '-' prefix
def my_integer
take(:one_of, [
map(->(x) { x * -1 }).drop(:token, '-').take(:integer),
take(:integer)
])
end
Parse floating points as BigDecimal.
parser = take(:decimal)
parser.parse('0.00009') # => Good<0.00009>
parser.parse('-0.00009') # => Bad<...>
parser.parse('bad') # => Bad<...>
parser.parse('1e31') # => Bad<...>
parser.parse('123a') # => Bad<...>
Parses the token from source.
parser = take(:token, 'module')
parser.parse('module') # => Good<'module'>
parser.parse('modules') # => Good<'module'> or use `keyword` if you want a failure in this case.
parser.parse('modu') # => Bad<...>
parser.parse('Module') # => Bad<...>
Parses the keyword from source. Similar to token
, but the next character after the keyword must be a space, symbol or number.
parser = take(:keyword, 'let')
parser.parse('let') # => Good<'let'>
parser.parse('letter') # => Bad<...>
parser.parse('Let') # => Bad<...>
parser.parse('le') # => Bad<...>
Parses a string between double quotes. Line breaks and tabs are supported.
parser = take(:double_quoted_string)
parser.parse('"Hello, world!"') # => Good<'Hello, world!'>
parser.parse('"line1\nline2"') # => Good<'line1\\nline2'>
parser.parse('"Hello, \\"world\\"!"') # => Good<'Hello, "world"!'>
parser.parse('foo') # => Bad<...>
parser.parse('foo "bar"') # => Bad<...>
parser.parse('"foo') # => Bad<...>
Calls the map function with the taken values in the current pipeline if it succeeds. The output from map becomes the output of the parser, that is, any parser with a map can be chained into other parsers.
Important: The arity of the map function should be equal to the amount of taken values in the pipeline.
Sum = ->(a, b) { a + b }
parser = map(Sum).take(:integer).drop(:token, '+').take(:integer)
parser.parse('1+1') # => Good<2>
parser.parse('1+') # => Bad<...>
Attempts to parse each branch in the order they appear in the list. If all branches fail then the parser fails.
Important: one_of
will fail on the current branch it had a partial success before failing. The branch has to fail
early without chomping any character from source .
parser = take(:one_of, [ take(:integer), take(:double_quoted_string) ])
parser.parse('2023') # => Good<2023>
parser.parse('"Hello, world!"') # => Good<'Hello, world!'>
parser.parse('true') # => Bad<...>
Iterates over the parser until done
is called. We don't know in advance how many values are gonna be taken,
so the map
call should use single splat operator to receive a list with all values taken in the loop.
ToList = ->(*integers) { integers }
CommaSeparatedInteger = ->(continue, done) do
take(:integer)
.drop(:spaces)
.take(:one_of, [
drop(:token, ',').drop(:spaces).and_then(continue),
done
])
end
parser = map(ToList).take(:sequence, CommaSeparatedInteger)
parser.parse('12, 23, 34') # => Good<[12, 23, 34]>
parser.parse('123') # => Good<[123]>
parser.parse('12,') # => Bad<...>
parser.parse(',12') # => Bad<...>
Wraps a parser in a lazy-evaluated proc. Use lazy
to build recursive parsers.
ToList = ->(*integers) { integers }
CommaSeparatedInteger = -> do
take(:integer)
.drop(:spaces)
.take(:one_of, [
drop(:token, ',').drop(:spaces).take(:lazy, CommaSeparatedInteger),
succeed
])
end
parser = map(ToList).take(CommaSeparatedInteger.call())
parser.parse('12, 23, 34') # => Good<[12, 23, 34]>
parser.parse('123') # => Good<[123]>
parser.parse('12,') # => Bad<...>
parser.parse(',12') # => Bad<...>
Chompes zero or more blankspaces, line breaks and tabs. Always succeeds.
Compser::State.new(' \nfoo').tap { take(:spaces).call(_1) } # => State<good?: true, offset: 5, chomped: ' \n'>
Chomps a single character from source if predicate returns true. Otherwise, a bad result is pushed to state.
parser = take(:chomp_if, ->(ch) { ch == 'a' })
Compser::State.new('aaabb').tap { parser.call(_1) } # => State<good?: true, offset: 1, chomped: 'a'>
Compser::State.new('cccdd').tap { parser.call(_1) } # => State<good?: false, offset: 0, chomped: ''>
Chomps characters from source as long as predicate returns true. This parser always succeeds even if predicate returns false for the first character. It is a zero-or-more loop.
parser = take(:chomp_while, ->(ch) { ch == 'a' })
Compser::State.new('aaabb').tap { parser.call(_1) } # => State<good?: true, offset: 3, chomped: 'aaa'>
Compser::State.new('cccdd').tap { parser.call(_1) } # => State<good?: true, offset: 0, chomped: ''>