A Yacc-like, procedural macro-based parser generator for Rust supporting LR(1), LALR(1), and GLR parsing strategies.
RustyLR enables you to define context-free grammars (CFGs) directly in Rust using macros or build scripts. It constructs deterministic finite automata (DFA) at compile time, ensuring efficient and reliable parsing.
- Multiple Parsing Strategies: Supports LR(1), LALR(1), and GLR parsers.
- Procedural Macros: Define grammars using lr1! macro for compile-time parser generation.
- Build Script Integration: Generate parsers via build scripts for complex grammars with detailed error messages.
- Custom Reduce Actions: Define custom actions during reductions to build ASTs or perform computations.
- Grammar Conflict Detection: Automatically detects shift/reduce and reduce/reduce conflicts during parser generation, providing informative diagnostics to help resolve ambiguities.
Add RustyLR to your Cargo.toml
:
[dependencies]
rusty_lr = "..."
To use buildscript tools:
[build-dependencies]
rusty_lr = { version = "...", features = ["build"] }
Or you want to use executable version (optional):
cargo install rustylr
rusty_lr
is designed for use with auto-generated code,
either through lr1!
macro (default), a build script (with build
feature), or the rustylr
executable.
When using a buildscript or executable, you can get beautiful and detailed messages generated from your grammar.
Define your grammar using the lr1!
macro:
// this define `EParser` struct
// where `E` is the start symbol
lr1! {
%userdata i32; // userdata type passed to parser
%tokentype char; // token type; sequence of `tokentype` is fed to parser
%start E; // start symbol; this is the final value of parser
%eof '\0'; // eof token; this token is used to finish parsing
// ================= Token definitions =================
%token zero '0';
%token one '1';
...
%token nine '9';
%token plus '+';
%token star '*';
%token space ' ';
%left [plus star]; // reduce-first for token 'plus', 'star'
// ================= Production rules =================
Digit(char): [zero-nine]; // character set '0' to '9'
Number(i32) // production rule `Number` holds `i32` value
: space* Digit+ space* // `Number` is one or more `Digit` surrounded by zero or more spaces
{ Digit.into_iter().collect::<String>().parse().unwrap() }; // this will be the value of `Number` (i32) by this production rule
A(f32)
: A plus a2=A {
*data += 1; // access userdata by `data`
println!( "{:?} {:?} {:?}", A, plus, a2 ); // any Rust code can be written here
A + a2 // this will be the value of `A` (f32) by this production rule
}
| M
;
M(f32): M star m2=M { M * m2 }
| Number { Number as f32 } // Number is `i32`, so cast to `f32`
;
E(f32) : A ; // start symbol
}
This defines a simple arithmetic expression parser.
For complex grammars, you can use a build script to generate the parser. This will provide more detailed error messages when conflicts occur.
1. Create a grammar file (e.g., src/parser.rs
) with the following content:
// Rust code of `use` and type definitions
%% // start of grammar definition
%tokentype u8;
%start E;
%eof b'\0';
%token a b'a';
%token lparen b'(';
%token rparen b')';
E: lparen E rparen
| a;
...
2. Setup build.rs
:
// build.rs
use rusty_lr::build;
fn main() {
println!("cargo::rerun-if-changed=src/parser.rs");
let output = format!("{}/parser.rs", std::env::var("OUT_DIR").unwrap());
build::Builder::new()
.file("src/parser.rs") // path to the input file
.build(&output); // path to the output file
}
3. Include the generated source code:
include!(concat!(env!("OUT_DIR"), "/parser.rs"));
4. Use the parser in your code:
let mut parser = parser::EParser::new(); // create <StartSymbol>Parser class
let mut context = parser::EContext::new(); // create <StartSymbol>Context class
let mut userdata: i32 = 0;
for b in input.chars() {
match context.feed(&parser, b, &mut userdata) {
Ok(_) => {}
Err(e) => {
eprintln!("error: {}", e);
return;
}
}
}
println!("{:?}", context);
context.feed(&parser, 0 as char, &mut userdata).unwrap(); // feed EOF
let result:i32 = context.accept(); // get value of start 'E'
The generated code will include several structs and enums:
<Start>Parser
: A struct that holds the parser table.<Start>Context
: A struct that maintains the current state and the values associated with each symbol.<Start>State
: A type representing a single parser state and its associated table.<Start>Rule
: A type representing a single production rule.<Start>NonTerminals
: A enum representing all non-terminal symbols in the grammar (and its data).
Note that the actual definitions are bit different if you are building GLR parser.
RustyLR offers built-in support for Generalized LR (GLR) parsing, enabling it to handle ambiguous or nondeterministic grammars that traditional LR(1) or LALR(1) parsers cannot process. See GLR.md for details.
- Calculator: A calculator using
u8
as token type. - lua 5.4 syntax parser
- Bootstrap: rusty_lr syntax parser is written in rusty_lr itself.
build
: Enable build script tools.fxhash
: Use FXHashMap instead ofstd::collections::HashMap
for parser tables.tree
: Enable automatic syntax tree construction (For debugging purposes).error
: Enable detailed parsing error messages (For debugging purposes).
RustyLR's grammar syntax is inspired by traditional Yacc/Bison formats. See SYNTAX.md for details of grammar-definition syntax.
- Any contribution is welcome.
- Please feel free to open an issue or pull request.
Either of
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
It is highly recommended to use buildscipt tools or executable instead of procedural macros, to generate readable error messages.