-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathturtle.txt
187 lines (162 loc) · 15.2 KB
/
turtle.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
6. Turtle Grammar
A Turtle document is a Unicode[UNICODE] character string encoded in UTF-8. Unicode characters only in the range U+0000 to U+10FFFF inclusive are allowed.
6.1 White Space
White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Turtle parser.
White space is significant in the production String.
6.2 Comments
Comments in Turtle take the form of '#', outside an IRIREF or String, and continue to the end of line (marked by characters U+000D or U+000A) or end of file if there is no end of line after the comment marker. Comments are treated as white space.
6.3 IRI References
Relative IRIs are resolved with base IRIs as per Uniform Resource Identifier (URI): Generic Syntax [RFC3986] using only the basic algorithm in section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of Internationalized Resource Identifiers (IRIs) [RFC3987].
The @base or BASE directive defines the Base IRI used to resolve relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded in Content". Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the In-Scope Base IRI may come from an encapsulating document, such as a SOAP envelope with an xml:base directive or a mime multipart document with a Content-Location header. The "Retrieval URI" identified in 5.1.3, Base "URI from the Retrieval URI", is the URL from which a particular Turtle document was retrieved. If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used. Each @base or BASE directive sets a new In-Scope Base URI, relative to the previous one.
6.4 Escape Sequences
There are three forms of escapes used in turtle documents:
numeric escape sequences represent Unicode code points:
Escape sequence Unicode code point
'\u' hex hex hex hex A Unicode character in the range U+0000 to U+FFFF inclusive corresponding to the value encoded by the four hexadecimal digits interpreted from most significant to least significant digit.
'\U' hex hex hex hex hex hex hex hex A Unicode character in the range U+0000 to U+10FFFF inclusive corresponding to the value encoded by the eight hexadecimal digits interpreted from most significant to least significant digit.
where HEX is a hexadecimal character
HEX ::= [0-9] | [A-F] | [a-f]
string escape sequences represent the characters traditionally escaped in string literals:
Escape sequence Unicode code point
'\t' U+0009
'\b' U+0008
'\n' U+000A
'\r' U+000D
'\f' U+000C
'\"' U+0022
'\'' U+0027
'\\' U+005C
reserved character escape sequences consist of a '\' followed by one of ~.-!$&'()*+,;=/?#@%_ and represent the character to the right of the '\'.
Context where each kind of escape sequence can be used
numeric
escapes string
escapes reserved character
escapes
IRIs, used as RDF terms or as in @prefix, PREFIX, @base, or BASE declarations yes no no
local names no no yes
Strings yes yes no
Note
%-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> in Turtle designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term written as ex:%66oo-bar with a prefix @prefix ex: <http://a.example/> also designates the IRI http://a.example/%66oo-bar.
6.5 Grammar
The EBNF used here is defined in XML 1.0 [EBNF-NOTATION]. Production labels consisting of a number and a final 's', e.g. [60s], reference the production with that number in the SPARQL 1.1 Query Language grammar [SPARQL11-QUERY].
Notes:
Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
Escape sequences UCHAR and ECHAR are case sensitive.
When tokenizing the input and choosing grammar rules, the longest match is chosen.
The Turtle grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
The entry point into the grammar is turtleDoc.
In signed numbers, no white space is allowed between the sign and the number.
The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
[1] turtleDoc ::= statement*
[2] statement ::= directive | triples '.'
[3] directive ::= prefixID | base | sparqlPrefix | sparqlBase
[4] prefixID ::= '@prefix' PNAME_NS IRIREF '.'
[5] base ::= '@base' IRIREF '.'
[5s] sparqlBase ::= "BASE" IRIREF
[6s] sparqlPrefix ::= "PREFIX" PNAME_NS IRIREF
[6] triples ::= subject predicateObjectList | blankNodePropertyList predicateObjectList?
[7] predicateObjectList ::= verb objectList (';' (verb objectList)?)*
[8] objectList ::= object (',' object)*
[9] verb ::= predicate | 'a'
[10] subject ::= iri | BlankNode | collection
[11] predicate ::= iri
[12] object ::= iri | BlankNode | collection | blankNodePropertyList | literal
[13] literal ::= RDFLiteral | NumericLiteral | BooleanLiteral
[14] blankNodePropertyList ::= '[' predicateObjectList ']'
[15] collection ::= '(' object* ')'
[16] NumericLiteral ::= INTEGER | DECIMAL | DOUBLE
[128s] RDFLiteral ::= String (LANGTAG | '^^' iri)?
[133s] BooleanLiteral ::= 'true' | 'false'
[17] String ::= STRING_LITERAL_QUOTE | STRING_LITERAL_SINGLE_QUOTE | STRING_LITERAL_LONG_SINGLE_QUOTE | STRING_LITERAL_LONG_QUOTE
[135s] iri ::= IRIREF | PrefixedName
[136s] PrefixedName ::= PNAME_LN | PNAME_NS
[137s] BlankNode ::= BLANK_NODE_LABEL | ANON
Productions for terminals
[18] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>' /* #x00=NULL #01-#x1F=control codes #x20=space */
[139s] PNAME_NS ::= PN_PREFIX? ':'
[140s] PNAME_LN ::= PNAME_NS PN_LOCAL
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[19] INTEGER ::= [+-]? [0-9]+
[20] DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
[21] DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
[154s] EXPONENT ::= [eE] [+-]? [0-9]+
[22] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"' /* #x22=" #x5C=\ #xA=new line #xD=carriage return */
[23] STRING_LITERAL_SINGLE_QUOTE ::= "'" ([^#x27#x5C#xA#xD] | ECHAR | UCHAR)* "'" /* #x27=' #x5C=\ #xA=new line #xD=carriage return */
[24] STRING_LITERAL_LONG_SINGLE_QUOTE ::= "'''" (("'" | "''")? ([^'\] | ECHAR | UCHAR))* "'''"
[25] STRING_LITERAL_LONG_QUOTE ::= '"""' (('"' | '""')? ([^"\] | ECHAR | UCHAR))* '"""'
[26] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[159s] ECHAR ::= '\' [tbnrf"'\]
[161s] WS ::= #x20 | #x9 | #xD | #xA /* #x20=space #x9=character tabulation #xD=carriage return #xA=new line */
[162s] ANON ::= '[' WS* ']'
[163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
[166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[167s] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
[168s] PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX) ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?
[169s] PLX ::= PERCENT | PN_LOCAL_ESC
[170s] PERCENT ::= '%' HEX HEX
[171s] HEX ::= [0-9] | [A-F] | [a-f]
[172s] PN_LOCAL_ESC ::= '\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?' | '#' | '@' | '%')
7. Parsing
The RDF 1.1 Concepts and Abstract Syntax specification [RDF11-CONCEPTS] defines three types of RDF Term: IRIs, literals and blank nodes. Literals are composed of a lexical form and an optional language tag [BCP47] or datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs. This section maps a string conforming to the grammar in section 6.5 Grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.
7.1 Parser State
Parsing Turtle requires a state of five items:
IRI baseURI — When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution.
Map[prefix -> IRI] namespaces — The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the namespace. Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":".
Map[string -> blank node] bnodeLabels — A mapping from string to blank node.
RDF_Term curSubject — The curSubject is bound to the subject production.
RDF_Term curPredicate — The curPredicate is bound to the verb production. If token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
7.2 RDF Term Constructors
This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in section 7. Parsing:
production type procedure
IRIREF IRI The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per Section 6.3.
PNAME_NS prefix When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty unicode string matching the first argument of the rule is a key into the namespaces map.
IRI When used in a PrefixedName production, the iri is the value in the namespaces map corresponding to the first argument of the rule.
PNAME_LN IRI A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace.
STRING_LITERAL_SINGLE_QUOTE lexical form The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_QUOTE lexical form The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_SINGLE_QUOTE lexical form The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
STRING_LITERAL_LONG_QUOTE lexical form The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form.
LANGTAG language tag The characters following the @ form the unicode string of the language tag.
RDFLiteral literal The literal has a lexical form of the first rule argument, String. If the '^^' iri rule matched, the datatype is iri and the literal has no language tag. If the LANGTAG rule matched, the datatype is rdf:langString and the language tag is LANGTAG. If neither matched, the datatype is xsd:string and the literal has no language tag.
INTEGER literal The literal has a lexical form of the input string, and a datatype of xsd:integer.
DECIMAL literal The literal has a lexical form of the input string, and a datatype of xsd:decimal.
DOUBLE literal The literal has a lexical form of the input string, and a datatype of xsd:double.
BooleanLiteral literal The literal has a lexical form of the true or false, depending on which matched the input, and a datatype of xsd:boolean.
BLANK_NODE_LABEL blank node The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
ANON blank node A blank node is generated.
blankNodePropertyList blank node A blank node is generated. Note the rules for blankNodePropertyList in the next section.
collection blank node For non-empty lists, a blank node is generated. Note the rules for collection in the next section.
IRI For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section.
7.3 RDF Triples Constructors
A Turtle document defines an RDF graph composed of set of RDF triples. The subject production sets the curSubject. The verb production sets the curPredicate. Each object N in the document produces an RDF triple: curSubject curPredicate N .
Property Lists:
Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B. Finishing the blankNodePropertyList production restores curSubject and curPredicate. The node produced by matching blankNodePropertyList is the blank node B.
Collections:
Beginning the collection production records the curSubject and curPredicate. Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first. For each object objectn after the first produces a triple:objectn-1 rdf:rest objectn . Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil . and restores curSubject and curPredicate The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
7.4 Parsing Example
This section is non-normative.
The following informative example shows the semantic actions performed when parsing this Turtle document with an LALR(1) parser:
Example 27
@prefix ericFoaf: <http://www.w3.org/People/Eric/ericP-foaf.rdf#> .
@prefix : <http://xmlns.com/foaf/0.1/> .
ericFoaf:ericP :givenName "Eric" ;
:knows <http://norman.walsh.name/knows/who/dan-brickley> ,
[ :mbox <mailto:[email protected]> ] ,
<http://getopenid.com/amyvdh> .
Map the prefix ericFoaf to the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#.
Map the empty prefix to the IRI http://xmlns.com/foaf/0.1/.
Assign curSubject the IRI http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/givenName.
Emit an RDF triple: <...rdf#ericP> <.../givenName> "Eric" .
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/knows.
Emit an RDF triple: <...rdf#ericP> <.../knows> <...who/dan-brickley> .
Emit an RDF triple: <...rdf#ericP> <.../knows> _:1 .
Save curSubject and reassign to the blank node _:1.
Save curPredicate.
Assign curPredicate the IRI http://xmlns.com/foaf/0.1/mbox.
Emit an RDF triple: _:1 <.../mbox> <mailto:[email protected]> .
Restore curSubject and curPredicate to their saved values (<...rdf#ericP>, <.../knows>).
Emit an RDF triple: <...rdf#ericP> <.../knows> <http://getopenid.com/amyvdh> .