Valid IRI while missing scheme? #2975

Closed
hoijui opened this issue Jan 24, 2025 · 6 comments


@hoijui
Contributor

hoijui commented Jan 24, 2025

Version

5.4.0-SNAPSHOT

Question

I would expect these two tests to pass, but they both fail:

import java.util.*;
import org.apache.jena.iri.IRIFactory;
import org.apache.jena.iri.Violation;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

public class TestIriValidation {
    public boolean checkIriBad(final String iri) {
        final IRIFactory iriFac = IRIFactory.iriImplementation();
        final Iterator<Violation> dataTypeIriViolations = iriFac.construct(iri).violations(true);
        boolean hadViolations = false;
        while (dataTypeIriViolations.hasNext()) {
            // Consume the violation; without next() this loop would never terminate.
            dataTypeIriViolations.next();
            hadViolations = true;
        }
        return hadViolations;
    }

    @Test
    public void testBadIri1() {
        assertTrue(checkIriBad("#:bar"));
    }

    @Test
    public void testBadIri2() {
        assertTrue(checkIriBad("domain.com/directory"));
    }
}

They fail with:

org.opentest4j.AssertionFailedError: expected: <true> but was: <false>

Are these really valid IRIs?

(In Jena 2 the tests passed)

@afs
Member

afs commented Jan 24, 2025

Online validator for IRIs: https://www.sparql.org/iri-validator.html (running Jena 5.3.0 / jena-iri3986).
Code: IRIValidatorHTML.

The IRI class to use is IRIx, which abstracts over the provider.

There are two providers: the old/legacy jena-iri (which your code uses) and jena-iri3986 (which will replace jena-iri soon).

Even when using jena-iri, the correct route is via IRIx because the violation checking is different and reflects RDF better.

jena-iri3986 is up to date with all the RFCs that define the various URI schemes of interest, as well as RFC 3986/3987, which define the syntax of URIs and IRIs.

"Are these really valid IRIs?"

They are valid as IRIs, but neither is suitable for RDF, which requires resolved IRIs, not relative references. (Oddly, I was updating the "RDF Concepts" document in the RDF Working Group in this area last week.)

Jena calls the suitable ones "RDF references"; there isn't an exact term for this in the RFC 3986 spec. ("Absolute" does not mean what people intuitively think it means!)
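To make the distinction concrete without Jena on the classpath, here is a rough sketch using only the JDK's java.net.URI. It is an approximation under loud assumptions: java.net.URI follows the older RFC 2396 grammar rather than RFC 3986/3987 and performs no scheme-specific checks, so it is not a substitute for IRIx. The class and method names are hypothetical. It asks only two questions: does the string parse, and does it have a scheme (i.e. is it not a relative reference)?

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: approximates "usable as an RDF reference" as
// "parses as a URI reference AND has a scheme".
// Assumption: java.net.URI (RFC 2396) is close enough for illustration only.
class RdfReferenceCheck {

    static boolean isRdfReference(String s) {
        try {
            URI uri = new URI(s);
            // isAbsolute() is true exactly when a scheme component is present;
            // "#:bar" and "domain.com/directory" have none, so they are relative.
            return uri.isAbsolute();
        } catch (URISyntaxException e) {
            // Not even syntactically valid as a URI reference.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "#:bar",
            "domain.com/directory",
            "xsd:string",
            "http://domain.com/directory#good"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isRdfReference(s));
        }
    }
}
```

Note that "xsd:string" passes here too, because "xsd" is parsed as a scheme. Also, java.net.URI's isAbsolute() only means "has a scheme", which is looser than RFC 3986's absolute-URI (that grammar also excludes a fragment): one more case of "absolute" not meaning what people expect.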

This is an area that has tightened up over the years, both spec-wise and in Jena.

Jena 2 implements RDF 1.0. That spec has "RDF URI references"; this all predates RFC 3986/3987.

    public static boolean checkIriBad(final String iriStr) {
        final IRIx iri = IRIx.create(iriStr);
        System.out.println(iri);
        System.out.println("hasViolations: "+iri.hasViolations());
        System.out.println("Reference: "+iri.isReference());
        return iri.hasViolations();
    }

isReference() returns false for both strings, with both the jena-iri and jena-iri3986 implementations.

Welcome to the black hole of IRI details!

@afs afs added the IRI label Jan 24, 2025
@hoijui
Contributor Author

hoijui commented Jan 24, 2025

wooaahhhhhh....
(reading this message of yours.. does it mean I am past the event horizon already, and can't escape anymore? :-/ )

thanks a lot!
I do not get everything really, but ...

So instead of checkIriBad, what I really want is more like checkValidIriAndNotAReference?

Using the code you supplied, both test strings still pass as valid (and thus the tests fail):

#:bar
hasViolations: false
Reference: false
domain.com/directory
hasViolations: false
Reference: false

(Using latest jena main branch as of now: jena-5.3.0-18-ga1ca2b7c3d)

@afs
Member

afs commented Jan 24, 2025

checkValidIriAndNotAReference -- yes "checkValidIriForRDF"

jena-iri3986 is a general IRI processor; it does not apply or check RDF rules.

IRIx3986 is the adapter from jena-iri3986 to IRIx (which Jena uses everywhere).

A parser will always resolve an IRI against the base (there is always a base):

    public static void iriProcess(final String iriStr) {
        IRIx iri = IRIx.create(iriStr);
        System.out.println("Argument: "+iri);
        IRIx base = IRIs.getSystemBase();

        IRIx iriResolved = base.resolve(iri);

        System.out.println("iriResolved ="+iriResolved);
        System.out.println("Reference: "+iriResolved.isReference());
    }
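The same resolve-then-check idea can be sketched with the JDK's java.net.URI.resolve, again only as an approximation: JDK resolution follows RFC 2396 and may differ from RFC 3986 in edge cases. The base http://example.org/base/ is a hypothetical stand-in for whatever base the parser actually uses, and the class name is made up for illustration.

```java
import java.net.URI;

// Sketch: resolve possibly-relative references against a base, as an
// RDF parser would, then check that the result has a scheme.
// BASE is a hypothetical example base, not Jena's system base.
class ResolveAgainstBase {

    static final URI BASE = URI.create("http://example.org/base/");

    static URI resolve(String ref) {
        // URI.create throws IllegalArgumentException on bad syntax.
        return BASE.resolve(URI.create(ref));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"#:bar", "domain.com/directory"}) {
            URI resolved = resolve(s);
            System.out.println(s + " -> " + resolved
                    + " (has scheme: " + resolved.isAbsolute() + ")");
        }
    }
}
```

After resolution both strings from the question inherit the base's scheme and authority, so a has-a-scheme check passes; relative references only become acceptable in RDF once they have been resolved.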

@hoijui
Contributor Author

hoijui commented Jan 24, 2025

aHA!
isReference means exactly the opposite of what I thought it means... doh! :D
thanks again!

So I'll use:

    public static boolean checkValidIriForRDF(final String iriStr) {
        final IRIx iri = IRIx.create(iriStr);
        return !iri.hasViolations() && iri.isReference();
    }

    @Test
    public void testBadIri1() {
        assertFalse(checkValidIriForRDF("#:bar"));
    }

    @Test
    public void testBadIri2() {
        assertFalse(checkValidIriForRDF("domain.com/directory"));
    }

    @Test
    public void testGoodIri1() {
        assertTrue(checkValidIriForRDF("xsd:string"));
    }

    @Test
    public void testGoodIri2() {
        assertTrue(checkValidIriForRDF("http://domain.com/directory#good"));
    }

These all pass!

@afs
Member

afs commented Jan 24, 2025

A caveat about "xsd:string": that is not the XSD string datatype. It is the IRI <xsd:string>, with URI scheme "xsd" and path "string". Jena does not expand prefixed names at the API.
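Because the IRI API takes the string literally, a prefixed name has to be expanded before it is validated. Below is a minimal sketch with a hand-rolled prefix map; the map, class and expand method are hypothetical (in a real Jena program a PrefixMapping or vocabulary constants would normally do this job). The namespace http://www.w3.org/2001/XMLSchema# is the standard XSD namespace.

```java
import java.util.Map;

// Hypothetical prefix expansion: turns "xsd:string" into a full IRI
// before any IRI validation happens. Strings whose part before ':' is not
// a known prefix are returned unchanged, so "http://..." passes through.
class PrefixExpansion {

    static final Map<String, String> PREFIXES = Map.of(
            "xsd", "http://www.w3.org/2001/XMLSchema#",
            "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");

    static String expand(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // no prefix at all
        }
        String ns = PREFIXES.get(name.substring(0, colon));
        // Unknown prefix: leave the string alone (it may be a real scheme).
        return ns == null ? name : ns + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(expand("xsd:string"));
        System.out.println(expand("http://domain.com/directory#good"));
    }
}
```

The weak spot of this design is the ambiguity the thread ran into: a string like "xsd:string" could be either a prefixed name or an IRI with scheme "xsd", which is why syntaxes such as Turtle write full IRIs in angle brackets and expand prefixed names only where the grammar says they are prefixed names.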

@hoijui
Contributor Author

hoijui commented Jan 24, 2025

ouhhhh...
(I feel the pull, :O awwww....)

@afs afs closed this as completed Jan 26, 2025