Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce EbmlParsable trait to make the EBML parsing more ergonomic #115

Merged
merged 6 commits into from
Apr 4, 2023

Conversation

FreezyLemon
Copy link
Contributor

Okay, this should be the last time I do major changes to the EBML parsing code.

It has taken a while, but after this I feel good in going over the Matroska specification to fix the obvious issues (default values, optional vs. non-optional values). And then I'd like to fix the remuxer and the info tool.

* Introduce `EbmlParsable` trait to parse most known EBML types
* Rename `ErrorKind` to `ParseError`
* Move Element ID from `ParseError` into the `Error` enum
* Remove `ebml_` prefix from parsing functions
* Move `complete` combinator into parsing functions
* Implement EbmlParsable for Uuid
* Rename `skip_void` to `void`
  (The combinator doesn't really skip the void, it returns it)
* Rename `eat_void` to `skip_void` (better description)
Copy link
Member

@Luni-4 Luni-4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! The code is much cleaner! Just a question from my side

src/ebml.rs Outdated Show resolved Hide resolved
* float_def inserts a default value for 0-octet or missing Elements
* Change some struct fields to non-optional
Copy link
Member

@lu-zero lu-zero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, maybe float_with_default() might be more descriptive

@FreezyLemon
Copy link
Contributor Author

FreezyLemon commented Apr 3, 2023

The only thing I don't think we can handle well is quiet and signalling NaNs. EBML doesn't specify how to interpret the signalling bit, and not all platforms handle it the same.

I'll try to give a brief overview of how it works now, because it's a bit confusing.

When parsing a Float, there's 3 possible errors that can happen:

  1. What I call a "nom Error", where the parsing fails on a basic level. The usual example of this would be the VerifyError, which means the ID doesn't match -> Element doesn't exist in the bitstream.
  2. A ParseError::EmptyFloat -> Element exists but has a length of 0.
  3. A ParseError::FloatWidthIncorrect -> Element exists but has a length that is not 0, 4 or 8.

Now, with this information, we can define some behaviour for different situations. Option<f64> is used when we have a Float Element with a minOccurs of 0. f64 is used with a minOccurs = maxOccurs = 1, and we differentiate this between Elements with a default value and those without one.

No. Output type VerifyError EmptyFloat FloatWidthIncorrect
1 Option<f64> None Some(0) Err
2 f64, no default Err 0 Err
3 f64, default d d d Err

Basically, the float function exists to handle cases 1 and 2, while float_default handles case 3 (where the VerifyError also defaults instead of erroring)

@FreezyLemon FreezyLemon marked this pull request as draft April 3, 2023 15:08
@FreezyLemon
Copy link
Contributor Author

I'd like to double-check if the code now does what I explained above . I'll re-convert the PR after that.

@lu-zero
Copy link
Member

lu-zero commented Apr 3, 2023

@robUx4 do you have suggestions regarding NaNs?

@Luni-4
Copy link
Member

Luni-4 commented Apr 4, 2023

Thinking again about this PR, we can leave FIXMEs, open a new issue, talk there and subsequently create another PR. In this way we don't block this refactor. What do you think @FreezyLemon? Better to separate problems perhaps

@FreezyLemon
Copy link
Contributor Author

FreezyLemon commented Apr 4, 2023

@lu-zero I went with float_or(id, default) to make it similar to std's usual naming (unwrap_or, map_or, etc.)

@Luni-4 I don't mind. The float behaviour should in general be at least as good as before. So it should be fine to do those changes/fixes later in a separate PR.

@FreezyLemon FreezyLemon marked this pull request as ready for review April 4, 2023 10:11
Copy link
Member

@Luni-4 Luni-4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@Luni-4 Luni-4 merged commit 8e6c586 into rust-av:master Apr 4, 2023
@FreezyLemon FreezyLemon deleted the rewrite-parsing branch April 4, 2023 14:44
@robUx4
Copy link

robUx4 commented Apr 10, 2023

@robUx4 do you have suggestions regarding NaNs?

Our floats are using the IEEE.754 format. According to Wikipedia it's properly defined:

IEEE 754 NaNs are encoded with the exponent field filled with ones (like infinity values), and some non-zero number in the significand field (to make them distinct from infinity values);

There are multiple NaN values, but any of them is OK as EBML values.

@FreezyLemon
Copy link
Contributor Author

There are multiple NaN values, but any of them is OK as EBML values.

@robUx4 I guess EBML doesn't care about sNaN and qNaN? IEEE754 only has a recommendation regarding the signalling bit for NaNs, not a fast rule. This is because MIPS does it differently than x86 and ARM. I'd imagine EBML doesn't really care about this though?

@robUx4
Copy link

robUx4 commented Apr 25, 2023

EBML doesn't really care about this though?

We never got this far, no. I think it's too late to add constraints now. We can add recommendations, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants