First draft of the upgrade doc for nom 4
% Upgrading to nom 4.0

# Upgrading to nom 4.0

nom 4.0 is a nearly complete rewrite of nom's internal structures, along with a cleanup of many parsers and combinators whose semantics were unclear. Upgrading from previous nom versions can require a lot of changes, especially if you have many unit tests, but most of those changes are straightforward.

## Changes in internal structures

Previous versions of nom all generated parsers with the following signature:

```rust,ignore
fn parser(input: I) -> IResult<I,O> { ... }
```

With the following definition for `IResult`:

```rust,ignore
pub enum IResult<I,O,E=u32> {
  /// remaining input, result value
  Done(I,O),
  /// indicates the parser encountered an error. E is a custom error type you can redefine
  Error(Err<E>),
  /// Incomplete contains a Needed, an enum that can represent a known quantity of input data, or unknown
  Incomplete(Needed)
}

pub enum Needed {
  /// needs more data, but we do not know how much
  Unknown,
  /// contains the required total data size
  Size(usize)
}

// if the "verbose-errors" feature is not active
pub type Err<E=u32> = ErrorKind<E>;

// if the "verbose-errors" feature is active
pub enum Err<P,E=u32>{
  /// An error code, represented by an ErrorKind, which can contain a custom error code represented by E
  Code(ErrorKind<E>),
  /// An error code, and the next error
  Node(ErrorKind<E>, Vec<Err<P,E>>),
  /// An error code, and the input position
  Position(ErrorKind<E>, P),
  /// An error code, the input position and the next error
  NodePosition(ErrorKind<E>, P, Vec<Err<P,E>>)
}
```

The new design uses the `Result` type from the standard library:

```rust,ignore
pub type IResult<I, O, E = u32> = Result<(I, O), Err<I, E>>;

pub enum Err<I, E = u32> {
  /// There was not enough data
  Incomplete(Needed),
  /// The parser had an error (recoverable)
  Error(Context<I, E>),
  /// The parser had an unrecoverable error
  Failure(Context<I, E>),
}

pub enum Needed {
  /// needs more data, but we do not know how much
  Unknown,
  /// contains the required additional data size
  Size(usize)
}

// if the "verbose-errors" feature is not active
pub enum Context<I, E = u32> {
  Code(I, ErrorKind<E>),
}

// if the "verbose-errors" feature is active
pub enum Context<I, E = u32> {
  Code(I, ErrorKind<E>),
  List(Vec<(I, ErrorKind<E>)>),
}
```

With this new design, the `Incomplete` case is now part of the error case, and we get a `Failure`
case representing an unrecoverable error (combinators like `alt!` will not try another branch).
The "verbose" error management is now a truly additive feature on top of the "simple" one (it adds a
case to an enum). Error management types also get smaller and more efficient. We can now return
the related input as part of the error in all cases.

All of this will likely not affect your existing parsers, but it will require changes to the surrounding
code that manipulates parser results.
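
To make the difference between `Error` and `Failure` concrete, here is a minimal sketch that does not depend on nom. The `Err` enum is a simplified stand-in for nom 4's type (no input position, no `ErrorKind`), and `literal`, `keyword` and `either` are hypothetical helpers written for this illustration, not nom APIs. `either` mimics the branching behaviour described above for `alt!`: it falls back to the second parser only on a recoverable `Error`.

```rust
// Simplified stand-in for nom 4's `Err` (assumption: no input position or ErrorKind).
#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete,
    Error(&'static str),
    Failure(&'static str),
}

pub type IResult<'a, O> = Result<(&'a str, O), Err>;

/// Hypothetical helper: matches a literal prefix, with a recoverable error otherwise.
pub fn literal<'a>(prefix: &'static str, input: &'a str) -> IResult<'a, &'a str> {
    if input.starts_with(prefix) {
        Ok((&input[prefix.len()..], &input[..prefix.len()]))
    } else {
        Err(Err::Error("literal"))
    }
}

/// Hypothetical helper: once `let` has matched, a missing space is unrecoverable.
pub fn keyword<'a>(input: &'a str) -> IResult<'a, &'a str> {
    let (rest, kw) = literal("let", input)?;
    match literal(" ", rest) {
        Ok((rest, _)) => Ok((rest, kw)),
        Err(_) => Err(Err::Failure("expected a space after the keyword")),
    }
}

/// Two-branch choice in the spirit of `alt!`: only `Error` triggers the fallback,
/// while `Failure` and `Incomplete` are returned to the caller unchanged.
pub fn either<'a, O>(
    input: &'a str,
    first: impl Fn(&'a str) -> IResult<'a, O>,
    second: impl Fn(&'a str) -> IResult<'a, O>,
) -> IResult<'a, O> {
    match first(input) {
        Err(Err::Error(_)) => second(input),
        other => other,
    }
}
```

With these definitions, `either("var x", keyword, |i| literal("var", i))` falls through to the second branch, while `either("letx", keyword, |i| literal("letx", i))` stops at the `Failure` even though the second branch would have matched.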

## Replacing parser result matchers

Whenever you use pattern matching on the result of a parser, or compare it to another parser
result (as in a unit test), you will have to make the following changes.

For the correct result case:

```rust,ignore
IResult::Done(i, o)
// becomes
Ok((i, o))
```

For the error case (note that the argument order for `error_position` and other such macros was changed
to match the rest of the code):

```rust,ignore
IResult::Error(error_position!(ErrorKind::OneOf, input))
// becomes
Err(Err::Error(error_position!(input, ErrorKind::OneOf)))
```

For the incomplete case:

```rust,ignore
IResult::Incomplete(Needed::Size(1))
// becomes
Err(Err::Incomplete(Needed::Size(1)))
```

For pattern matching, you now need to handle the `Failure` case as well, which works like the error
case:

```rust,ignore
match result {
  Ok((remaining, value)) => { ... },
  Err(Err::Incomplete(needed)) => { ... },
  Err(Err::Error(e)) | Err(Err::Failure(e)) => { ... }
}
```

## Errors on `Incomplete` data size calculation

In previous versions, `Needed::Size(sz)` indicated the total needed data size (counting the actual input).
It now indicates only the additional data needed, so the values expected in existing tests will change.
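
As a worked example of that arithmetic, the sketch below contrasts the two conventions for a hypothetical parser that requires `required` bytes and has `available` bytes in its input. `Needed` is redefined locally so the snippet stands alone, and both function names are made up for this illustration.

```rust
// Local copy of the `Needed` enum so the example is self-contained.
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

/// Old convention (nom 3 and before): report the total size required.
pub fn needed_total(available: usize, required: usize) -> Option<Needed> {
    if available >= required {
        None // enough data, nothing more is needed
    } else {
        Some(Needed::Size(required))
    }
}

/// New convention (nom 4): report only the missing part.
pub fn needed_additional(available: usize, required: usize) -> Option<Needed> {
    if available >= required {
        None
    } else {
        Some(Needed::Size(required - available))
    }
}
```

For a parser that wants 5 bytes and receives 3, the old convention yields `Needed::Size(5)` and the new one `Needed::Size(2)`, so a test expectation written for nom 3 would need the input length subtracted.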

## New trait for input types

nom allows input types other than `&[u8]` and `&str`, as long as they implement a set of traits
that are used everywhere in nom. This version introduces the `AtEof` trait:

```rust
pub trait AtEof {
  fn at_eof(&self) -> bool;
}
```

This trait allows the input value to indicate whether there can be more input coming later (buffering
data from a file, or waiting for network data).
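
As an illustration, here is a sketch of how a custom input type might implement the trait. The trait is redeclared locally so the snippet compiles on its own, and `Chunk` and `Whole` are hypothetical types invented for this example. A real custom input type would also have to implement the other input traits nom relies on, which are omitted here.

```rust
// Local copy of the trait shown above, so the example is self-contained.
pub trait AtEof {
    fn at_eof(&self) -> bool;
}

/// Hypothetical streaming input: a slice plus a flag set by the code that
/// fills the buffer once the file or socket has been fully read.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Chunk<'a> {
    pub data: &'a [u8],
    pub last: bool,
}

impl<'a> AtEof for Chunk<'a> {
    fn at_eof(&self) -> bool {
        self.last
    }
}

/// Hypothetical complete input: everything is already in memory,
/// so no further data can ever arrive.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Whole<'a>(pub &'a [u8]);

impl<'a> AtEof for Whole<'a> {
    fn at_eof(&self) -> bool {
        true
    }
}
```

The second type follows the same idea as an input wrapper whose `at_eof()` always returns true: the parser can treat the end of the slice as the end of the data.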

## Dealing with `Incomplete` usage

nom's parsers are designed to work around streaming issues: if there is not enough data to decide, a
parser will return `Incomplete` instead of returning a partial value that might be wrong.

As an example, if you want to parse alphabetic characters then digits, when you get the whole input
`abc123;`, the parser will return `abc` for the alphabetic characters, `123` for the digits, and `;`
as the remaining input.

But if you get that input in chunks, like `ab` then `c123;`, the alphabetic characters parser will
return `Incomplete`, because it does not know if there will be more matching characters afterwards.
If it returned `ab` directly, the digit parser would fail on the rest of the input, even though the
input had the valid format.
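
The behaviour described above can be sketched without nom. In this simplified model, `alpha` is a hypothetical function (not nom's own parser), the `Err` enum is a reduced stand-in for nom 4's error type, and the `at_eof` flag plays the role of the `AtEof` trait by telling the parser whether more input may still arrive.

```rust
// Reduced stand-in for nom 4's error type (assumption: no context attached).
#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete,
    Error,
}

/// Hypothetical parser for one or more alphabetic characters.
/// Returns `(remaining input, matched prefix)` on success.
pub fn alpha(input: &str, at_eof: bool) -> Result<(&str, &str), Err> {
    match input.find(|c: char| !c.is_alphabetic()) {
        // The very first character is not alphabetic: recoverable error.
        Some(0) => Err(Err::Error),
        // A non-alphabetic character ends the run, so the result is certain.
        Some(end) => Ok((&input[end..], &input[..end])),
        // The input ends inside the run and nothing more can arrive.
        None if at_eof && !input.is_empty() => Ok(("", input)),
        // Empty input and nothing more can arrive.
        None if at_eof => Err(Err::Error),
        // The input ends inside the run, but more data may follow.
        None => Err(Err::Incomplete),
    }
}
```

Calling `alpha("ab", false)` yields `Incomplete`, whereas `alpha("abc123;", false)` succeeds because the digit marks the end of the alphabetic run. With `at_eof` set to `true`, `alpha("ab", true)` returns `ab` directly.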

For some users, though, the input will never be partial (everything can be loaded in memory at once),
and the solution in nom 3 and before was to wrap parts of the parsers with the `complete!()` combinator,
which transforms `Incomplete` into `Error`.

nom 4 is much stricter about the behaviour with partial data, but provides better tools to deal with it.
Thanks to the new `AtEof` trait for input types, nom now provides the `CompleteByteSlice(&[u8])` and
`CompleteStr(&str)` input types, for which the `at_eof()` method always returns true.
With these types, there is no need to put a `complete!()` combinator everywhere. You can just apply those types
like this:

```rust,ignore
named!(parser<&str,ReturnType>, ... );
// becomes
named!(parser<CompleteStr,ReturnType>, ... );
```

```rust,ignore
named!(parser<&str,&str>, ... );
// becomes
named!(parser<CompleteStr,CompleteStr>, ... );
```

```rust,ignore
named!(parser, ... );
// becomes
named!(parser<CompleteByteSlice,CompleteByteSlice>, ... );
```

And as an example, for a unit test:

```rust,ignore
assert_eq!(parser("abcd123"), Ok(("123", "abcd")));
// becomes
assert_eq!(parser(CompleteStr("abcd123")), Ok((CompleteStr("123"), CompleteStr("abcd"))));
```

These types allow you to correctly handle cases like text formats that may or may not end with an
empty line, as seen in [one of the examples](https://github.com/Geal/nom/blob/87d837006467aebcdb0c37621da874a56c8562b5/tests/multiline.rs).

## Producers and consumers

Producers and consumers were removed in nom 4. That feature was too hard to integrate into code that
deals with IO.