feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort #3008

Merged
4 commits merged into main from edmond/longturtle
Jan 6, 2025

Conversation

edmondchuc
Contributor

@edmondchuc edmondchuc commented Dec 13, 2024

This PR improves upon #2997 by replacing the bespoke object blank node sorting technique with a sort over the N-Triples lines, applied after the graph has been run through the RGDA1 graph canonicalisation algorithm. Fixes #1890.

The sorted N-Triples lines must be read back in with skolemize=True to preserve the blank node identifiers assigned by the canonicalisation algorithm.

Because we can now sort reliably by blank node identifier, this implementation works for blank nodes in any position of an RDF statement, whether subject or object. It even works for top-level blank nodes.
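To illustrate just the sorting step, here is a minimal standard-library sketch. It is not the code in this PR: it assumes the blank node labels have already been made canonical (for example by rdflib's `rdflib.compare.to_canonical_graph`), and the `sort_ntriples` helper, the `_:cb0` label and the `example.org` IRIs are hypothetical, for illustration only.

```python
def sort_ntriples(nt_text: str) -> str:
    """Return an N-Triples document with its statements in sorted order.

    Assumes blank node labels are already canonical, so that the same
    graph always produces the same set of lines.
    """
    # Strip whitespace and drop empty lines so trailing newlines
    # in the input do not influence the result.
    lines = [line.strip() for line in nt_text.splitlines() if line.strip()]
    # A set removes duplicate statements; sorted() fixes the order.
    return "\n".join(sorted(set(lines))) + "\n"


# The same two statements, written in two different orders.
# "_:cb0" is a hypothetical stand-in for a canonical blank node label.
doc_a = (
    '_:cb0 <http://example.org/name> "Alice" .\n'
    "<http://example.org/s> <http://example.org/knows> _:cb0 .\n"
)
doc_b = (
    "<http://example.org/s> <http://example.org/knows> _:cb0 .\n"
    '_:cb0 <http://example.org/name> "Alice" .\n'
)

sorted_a = sort_ntriples(doc_a)
sorted_b = sort_ntriples(doc_b)

# Input order no longer matters once the lines are sorted.
print(sorted_a == sorted_b)
```

The sort is only deterministic across runs because the labels are canonical; with arbitrary blank node labels, the same graph could yield different lines and therefore a different order.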

@ajnelson-nist, I've added your blank node test from Sort Turtle output (#1978) and it's now passing, yay!

Update: this also fixes the bug where semicolons were doubled up when the subject is a top-level blank node. See 412fb5d.

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Checked that all tests and type checking passes.
  • If the change adds new features or changes the RDFLib public API:
    • Created an issue to discuss the change and get in-principle agreement.
    • Considered adding an example in ./examples.
  • If the change has a potential impact on users of this project:
    • Added or updated tests that fail without the change.
    • Updated relevant documentation to avoid inaccuracies.
    • Considered adding additional documentation.
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

@coveralls

coveralls commented Dec 13, 2024

Coverage Status

coverage: 90.274% (-0.002%) from 90.276%
when pulling 412fb5d on edmond/longturtle
into e8f61d4 on main.

@ajnelson-nist
Contributor

@edmondchuc Thank you for porting my test!

Though, I noticed there were some spots where semicolons got doubled up.

@edmondchuc
Contributor Author

@ajnelson-nist thanks for catching the doubled-up semicolons! I think it's fixed now.

@nicholascar nicholascar merged commit 182c3ba into main Jan 6, 2025
22 checks passed
@nicholascar nicholascar deleted the edmond/longturtle branch January 6, 2025 06:01
Development

Successfully merging this pull request may close these issues.

Sorting Turtle output?
4 participants