Use IRIs for resource identification #347
Comments
Currently, CSS also assumes that identifiers are URIs. |
There is an open issue on what to do with this at solid/web-access-control-spec#93. Also related: #22. |
I don't know if it is actually the concern, but IRIs expand the attack surface for homograph attacks quite significantly. It makes sense to tread carefully in this area. |
That is a concern only for domain names, not for resource identifiers within a given domain. Even then, it is not a technical vulnerability per se, but a matter of naive usage. This need not prevent us from embracing IRIs, which enable international users to name resources in their own scripts. |
Human-readable labels should be part of the resource description. The IRI should preferably stay machine-readable, especially everything after the domain name. Most mobile browsers will not even show anything else to the user. |
Is it? I agree that if there is only one storage per authority, or every storage has the same owner, it is likely a marginal problem, but that is not necessarily the case.
Yes, but as most attacks do not target technical vulnerabilities, I think it is an important concern nevertheless. The problem is that people tend to rely on a bunch of flawed heuristics, such as looking at the URL, to decide on actions that have security implications; I see this in my own family members. We should ensure that they don't need to: they should be able to rely on a few security mechanisms that lay people can easily understand, but I believe we're not there yet. Until we are, homograph attacks look pretty scary to me, although admittedly I do not have numbers to back that up.
I believe there is nothing in the technical infrastructure that prevents us from adopting IRIs. It is actually not clear to me why we haven't. |
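To make the homograph concern above concrete, here is a minimal Python sketch. The two identifiers are hypothetical; the only difference between them is a Latin "a" versus a Cyrillic "а" in the path, which most fonts render identically.

```python
import unicodedata

# Hypothetical identifiers: the first uses Latin "a" (U+0061),
# the second uses Cyrillic "а" (U+0430) in the same position.
latin = "https://example.org/alice/profile"
mixed = "https://example.org/\u0430lice/profile"


def describe_non_ascii(iri: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in iri if ord(ch) > 127]


print(latin == mixed)             # False: the code points differ
print(describe_non_ascii(latin))  # []
print(describe_non_ascii(mixed))  # [('а', 'CYRILLIC SMALL LETTER A')]
```

A listing like this is only a diagnostic aid; it does not decide whether a given identifier is malicious.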
IRIs are not about end-user-friendly labels. They allow the identifiers themselves to be internationalized, which in some cases cannot be expressed in ergonomic URIs. For example, we may want an identifier for an entry in the lexicon of a language, or for concepts of a certain culture, without the complexities of lossy translation or transliteration, or simply for Unicode filenames. Of course we can use URIs with percent-encoding in this case, but that brings the issues above and is not ergonomic. |
There may actually be a few things Solid needs to specify in order to support IRIs. IRI support will be non-trivial for the following reasons.
Say, in an identifier, we want to have a segment
Now the issue is that, for both of the above approaches, if we want those identifiers in HTTP and also dereferenceable (as in the case of Solid), the HTTP URI will be the same. Thus RDF mandates distinguishing between […] Thus if we want to […] Solid is an interesting case for IRIs, as it has to support both RDF and HTTP concerns for dereferenceable information resources. |
Continued from above: there seem to be two straightforward behaviours possible to address this case.
The 2nd behaviour seems better to me. |
OK, thank you very much for your perspective, @damooo, it is very much appreciated. I believe we should address this in the future. It is very good that you already have concrete proposals to resolve this! |
I am not sure the gravity of this issue is clearly understood. Solid doesn't specify whether or not to decode the HTTP URI when computing the identifier of a resource. This has the potential to make Solid-based linked-data apps behave differently across Solid servers and fail at many basic tasks whenever a Unicode character enters an identifier. I just tested with […] I give here […] Let's say there is already a container with id […] In NSS: 1.
|
The 2nd behaviour, percent-decoding URIs, complicates a few other things. Specifically, resolving relative identifiers, base identifiers, etc. will be a nightmare to specify. There may be many other issues that pop up from different standards and operations. Thus Solid had better go with the first option, and must specify that it only supports URIs as identifiers for the information resources it manages. It should also mandate the use of literal URIs as identifiers for those resources in RDF documents. |
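One way such complications can show up, sketched in Python. This is an illustration under assumptions, not necessarily the case the comment above has in mind: the container URI is hypothetical and contains an encoded slash (`%2F`) in one segment.

```python
from urllib.parse import unquote, urljoin

# Hypothetical container whose last segment contains an encoded slash.
base = "http://example.org/x/a%2Fb/"

# Resolve the relative reference against the URI exactly as sent over HTTP.
literal = urljoin(base, "../c")

# Resolve the same reference after percent-decoding the base first.
decoded = urljoin(unquote(base), "../c")

print(literal)  # http://example.org/x/c
print(decoded)  # http://example.org/x/a/c
```

The same relative reference ends up pointing at two different resources depending on whether the base was decoded before resolution.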
My apologies, @damooo, for not giving this issue the attention it deserves. My mental queue is full since we have a deadline for the current milestone today. So, please take the following as little more than thinking out loud: On the backend, I suspect that IRIs would be prevalent, as RDF is defined in those terms. In the case of NSS, I found that it stores filenames on disc with UTF-8:
("blåbærsyltetøy" = "blueberry jam" has all three non-ASCII Norwegian characters in one word, and is thus my favorite word for looking into such problems :-) ) Thus, the entire problem seems to be in the upper layer of the server implementations. I don't have the bandwidth to understand the implications, but given that we could potentially have a SPARQL endpoint towards the stored data, it doesn't seem very attractive to me to only have percent-encoded URIs, but there is also the homograph attack problem... However, since NSS is in line with option 1, and the short-term goal for 0.9 is to describe NSS behavior, it nevertheless seems like what we should do in the short term. Yet, there seems to be potential for a more sophisticated approach in the longer term. |
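For reference, a small Python sketch of how the UTF-8 bytes of such a filename relate to its percent-encoded form. It assumes the name is NFC-normalized and makes no claim about how NSS itself maps between the two.

```python
import unicodedata
from urllib.parse import quote, unquote

# Normalize to NFC so that "å" is a single code point (U+00E5).
name = unicodedata.normalize("NFC", "blåbærsyltetøy")

on_disc = name.encode("utf-8")  # bytes a UTF-8 file system would store
in_uri = quote(name)            # percent-encoded form for an HTTP request target

print(on_disc)                  # b'bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'
print(in_uri)                   # bl%C3%A5b%C3%A6rsyltet%C3%B8y
print(unquote(in_uri) == name)  # True
```

Each `%XX` triplet in the URI is simply one byte of the UTF-8 sequence stored on disc.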
@kjetilk, thanks for the response.
I want to clarify the technical issue raised. |
Certainly! I just wanted to clarify that those implementation details are not what is holding us back. |
All, great issue and feedback.
All Solid specifications (not only SOLID-PROTOCOL) should clarify their requirements and considerations pertaining to internationalization. There is already some work on this, including spec content and open issues for review, so improvements are very welcome. As per AWWW, the "situation" in Solid Protocol is that while the Interaction component necessitates the use of URIs, the Identification and Data Formats components, for the most part, necessitate the use of IRIs. I suggest that we clarify the situations in which converting IRIs to URIs, and vice versa, could happen, and where new recommendations or considerations may be necessary. If so, there needs to be coherent round-tripping IRI->URI->IRI. Note: The table below is meant for documentation and discussion. It is NOT an exhaustive list of situations, and the notes are not necessarily correct (or implementable). Implementations may want to experiment or provide feedback. Test authors should ignore this.
[1] Current transmission on the wire. |
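The IRI->URI->IRI round trip mentioned above can be sketched in Python as follows. This is a simplified illustration under assumptions, not a full RFC 3987 implementation: the helper names `iri_to_uri` and `uri_to_iri` are hypothetical, the host is assumed to be ASCII, and the URI-to-IRI direction only decodes escapes that form valid non-ASCII UTF-8.

```python
import re
from urllib.parse import quote

# Characters left untouched when mapping an IRI to a URI: the RFC 3986
# reserved set plus "%", so existing percent-escapes are not double-encoded.
_SAFE = ":/?#[]@!$&'()*+,;=%"

# Runs of percent-escapes whose byte values are all >= 0x80 (non-ASCII).
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character."""
    return quote(iri, safe=_SAFE)


def uri_to_iri(uri: str) -> str:
    """Decode only escapes that form non-ASCII UTF-8; keep e.g. %2F and %20."""
    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: leave the escapes as-is
    return _HIGH_RUN.sub(decode_run, uri)


iri = "http://example.org/blåbær/a%2Fb"
uri = iri_to_uri(iri)
print(uri)                     # http://example.org/bl%C3%A5b%C3%A6r/a%2Fb
print(uri_to_iri(uri) == iri)  # True
```

Note that the round trip is only coherent if the starting IRI uses the literal characters. An IRI that already contains `%C3%A5` comes back with `å` instead, which is a different IRI as far as RDF is concerned.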
@csarven, I propose to take this issue (and the related #22) up in the milestone for v0.11. As long as we specify Solid's handling of identifiers, there should be no problem, so it is important to do so. I will draft a proposal based on the relevant specs and the idea above to get the conversation going again. |
Sure. The "idea above" being #347 (comment)? |
Is there a reason IRIs (Internationalized Resource Identifiers) (RFC3987) are not used for resource identification even though they are part of the RDF 1.1 specification and, in fact, they are already widely used?
I understand that at the level of HTTP, which may be the basis of the Solid protocol, only URIs are used. However, I have the feeling that the relationship between URIs and IRIs should be addressed somehow. RFC3987 defines a mapping (percent-encoding) from IRIs to URIs and back. However, in RDF Turtle, for example, an IRI and its percent-encoded URI counterpart are treated as two different IRIs, so some issues could arise.
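A minimal Python sketch of the mismatch described above. RDF 1.1 compares IRIs by simple string comparison, so the two identifiers below are different RDF terms even though they map to the same URI at the HTTP level. The helper `to_request_uri` and the example identifiers are hypothetical.

```python
from urllib.parse import quote


def to_request_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding its non-ASCII characters."""
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


iri = "http://example.org/日本語"
uri = "http://example.org/%E6%97%A5%E6%9C%AC%E8%AA%9E"

# As RDF terms, the two are compared character by character.
print(iri == uri)                  # False

# At the HTTP level, both map to the same URI.
print(to_request_uri(iri) == uri)  # True
print(to_request_uri(uri) == uri)  # True
```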
Maybe this was already discussed, but I cannot see any mention of IRIs in the specification.