-
Notifications
You must be signed in to change notification settings - Fork 167
/
Copy pathBNFC_XML.html
195 lines (152 loc) · 5.17 KB
/
BNFC_XML.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>XML Representations of LBNF Grammars</title>
</head>
<body bgcolor="#ffffff" text="#000000">
<center>
<h1>Representing data types and their objects in XML</h1>
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
<p>
27 August 2004
</center>
The eXtended Markup Language, XML, is not just a document structuring
language, but also a way to represent data. Having an XML
representation of data objects is useful for the exchange of
data because many systems can read in data written in this form.
<p>
Therefore we have extended the BNF Converter with two flags,
<tt>-xml</tt> and <tt>-xmlt</tt>, which represent grammars
and abstract syntax trees in XML. For a given grammar <tt>Foo.cf</tt>,
the following files are generated:
<ul>
<li> <tt>Foo.dtd</tt>, Document Type Declaration
<li> <tt>XMLFoo.hs</tt>, a printer of trees into XML objects that are
valid w.r.t. <tt>Foo.dtd</tt>
</ul>
The test bench <tt>TestFoo.hs</tt> then also displays the XML
representation of each tree.
For the time being, the flags are only available in combination of
the Haskell back end.
<h2>Goals</h2>
The purpose of the XML generator of BNFC is to use XML as yet
another representation of abstract syntax, in addition to
Haskell's algebraic data types, Java's classes, and C's
unions of structures. Since algebraic datatypes are conceptually
the semantics of all these, we will briefly discuss how algebraic
datatype definitions and encoded in XML.
<p>
Two kinds of XML documents are to be generated:
<ul>
<li> DTDs = Document Type Declarations, from the algebraic datatype
definitions themselves
<li> XML elements, from trees built by constructors.
</ul>
Checking the <b>validity</b> of an element with respect to
a DTD is a central notion of XML. What we want to achieve is that
<ul>
<il> validity checking = type checking
</ul>
To encode a type system with a DTD makes it slightly more complicated
than would be needed otherwise, but we find this goal worth pursuing.
We are still left with several alternative encodings.
<h2>Alternative encodings</h2>
The following examples are used to illustrate the alternative encodings.
<pre>
data Prop = Falsum | Verum | Conj Prop Prop | Disj Prop Prop
Disj Verum Falsum
</pre>
<h4>Constructors as tags, no types</h4>
This gives nice elements, but the DTD is clumsy, since every argument
type of a constructor must be represented with the disjunction of all
constructors. Moreover, there is no natural "top level" tag unless
the type system explicitly includes a top type with just one "wrapper"
constructor.
<pre>
<!ELEMENT Verum EMPTY>
<!ELEMENT Falsum EMPTY>
<!ELEMENT Conj ((Verum|Falsum|Disj|Conj),(Verum|Falsum|Disj|Conj))>
<!ELEMENT Disj ((Verum|Falsum|Disj|Conj),(Verum|Falsum|Disj|Conj))>
<Disj>
<Verum/>
<Falsum/>
</Disj>
</pre>
tags(f xs) = 2 + sum (tags xs)
<h4>Constructors and types as tags</h4>
This gives a natural-looking DTD, but the trees become bulky.
<pre>
<!ELEMENT Verum EMPTY>
<!ELEMENT Falsum EMPTY>
<!ELEMENT Conj (Prop,Prop)>
<!ELEMENT Disj (Prop,Prop)>
<!ELEMENT Prop (Verum | Falsum | Disj | Conj)>
<Prop>
<Disj>
<Prop>
<Verum/>
</Prop>
<Prop>
<Falsum/>
</Prop>
</Disj>
</Prop>
</pre>
tags (f xs) = 4 + sum (tags xs)
<h4>Types as tags, constructors as attributes</h4>
The trees look deceptively natural, but this is a non-starter since
validation does not guarantee type correctness! The dual approach has
the same problem.
<pre>
<!ELEMENT Prop (() | (Prop, Prop))>
<!ATTLIST Prop name CDATA #required>
<Prop name="Disj">
<Prop name = "Verum" />
<Prop name = "Falsum" />
</Prop>
</pre>
tags (f xs) = 2 + sum (tags xs)
<h4>Types as tags, constructors as empty elements</h4>
This has a good balance between natural DTD and tree size, but
the introduction of empty elements to encode constructors feels
like a hack and certainly not very XML-like.
<pre>
<!ELEMENT Prop ((Verum) | (Falsum) | (Conj,Prop,Prop) | (Disj,Prop,Prop)>
<!ELEMENT Verum EMPTY>
<!ELEMENT Falsum EMPTY>
<!ELEMENT Conj EMPTY>
<!ELEMENT Disj EMPTY>
<Prop> <Disj/>
<Prop> <Verum/>
</Prop>
<Prop> <Falsum/>
</Prop>
</Prop>
</pre>
tags (f xs) = 3 + sum (tags xs)
<h2>Alternatives chosen</h2>
We chose to provide two encodings in the BNFC-XML generator.
They can be chosen by different flags:
<ul>
<li> <tt>-xml</tt>, constructors as tags, no types
<li> <tt>-xmlt</tt>, types as tags, constructors as empty elements
</ul>
The first encoding gives nice trees for exchange, but a horrible
DTD. The second encoding gives a nice DTD, but bulky trees.
Both encodings support type checking by validation.
<h2>Literals and token types</h2>
For literals and token types, both encodings use empty elements
with type as tag and the content as value of the attribute
<tt>value</tt>:
<pre>
<Integer value="123" />
<Ident value="x" />
<String value="aatu osaa soutaa" />
</pre>
The corresponding DTD needs two clauses per type <tt>Foo</tt>.
<pre>
<!ELEMENT Foo EMPTY>
<!ATTLIST Foo value CDATA #required>
</pre>
</body>
</html>