-
Notifications
You must be signed in to change notification settings - Fork 167
/
Copy pathBNF_Converter_C_Mode.html
440 lines (421 loc) · 20 KB
/
BNF_Converter_C_Mode.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<title>BNF Converter C Mode</title>
</head>
<body>
<div style="text-align: center; ">
<h2>BNF Converter</h2>
<h2>C Mode</h2>
</div>
<h3>By Michael Pellauer</h3>
<h3>2003</h3>
One of the best features of the LBNF file format is that it is completely
independent of the target programming language. The new C mode allows the
BNF Converter to output a compiler front end in C using Flex and Bison. In
theory it will also work with Lex and YACC as well, although this has not
been thoroughly tested.<br>
<br>
BNFC C mode has been tested to work with Flex version 2.5.4 and Bison version 1.875.<br>
<br>
BNF Converter Homepage:<br>
<a href="http://www.cs.chalmers.se/%7Emarkus/BNFC/">http://www.cs.chalmers.se/~markus/BNFC/</a>
<br>
<br>
Flex Homepage:<br>
<a href="http://www.gnu.org/software/flex/flex.html">http://www.gnu.org/software/flex/flex.html</a>
<br>
<br>
Bison Homepage:<br>
<a href="http://www.gnu.org/software/bison/bison.html">http://www.gnu.org/software/bison/bison.html</a>
<br>
<br>
<h3>USAGE<br>
</h3>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
bnfc [-m] -cpp FILENAME.cf</span></big><br style="font-family: monospace; ">
</div>
<br>
To access the C mode simply pass the -c command to bnfc. <br>
The result will be the following files:<br>
<br>
<table cellpadding="2" cellspacing="2" border="0" style="text-align: left; width: 100%; ">
<tbody>
<tr>
<td style="vertical-align: top; text-decoration: underline; "><big><span style="font-family: monospace; ">
Filename:</span></big></td>
<td style="vertical-align: top; text-decoration: underline; "><big><span style="font-family: monospace; ">
Description:</span></big></td>
</tr>
<tr>
<td style="vertical-align: top; "><big><span style="font-family: monospace; ">
Absyn.h, Absyn.c</span></big><big><span style="font-family: monospace; "><br>
</span></big><big><span style="font-family: monospace; ">FILENAME.l<br>
</span></big><big><span style="font-family: monospace; ">FILENAME.y<br>
Pri</span></big><big><span style="font-family: monospace; ">nter.h, Printer.c<br>
</span></big><big><span style="font-family: monospace; ">Skeleton.h,
Skeleton.c<br>
Parser.h<br>
</span></big><big><span style="font-family: monospace; ">Test.c<br>
</span></big><big><span style="font-family: monospace; ">Makefile</span></big><br>
</td>
<td style="vertical-align: top; "><big><span style="font-family: monospace; ">
Abstract Syntax tree interface && structs</span></big><big><span style="font-family: monospace; "></span></big><br>
<big><span style="font-family: monospace; ">Flex file (lexer specification)<br>
Bison</span></big><big><span style="font-family: monospace; "> file (parser
specification)<br>
</span></big><big><span style="font-family: monospace; ">A Pretty
Printer for the abstract syntax tree.<br>
</span></big><big><span style="font-family: monospace; ">A code skeleton
for traversing the syntax tree.<br>
</span></big><big><span style="font-family: monospace; ">Contains
definitions of tokens for the lexer.<br>
A testbench to test the final result.<br>
</span></big><big><span style="font-family: monospace; ">A Makefile
(generated only with the -m flag).</span></big><br>
</td>
</tr>
</tbody>
</table>
<br>
<br>
<h3>Compiling the Compiler Front End</h3>
You can use the generated Makefile to compile the generated C Code with
"make" ("gmake" on some systems).<br>
<br>
If all goes well the following files should be generated:<br>
<br>
<table cellpadding="2" cellspacing="2" border="0" style="text-align: left; width: 100%; ">
<tbody>
<tr>
<td style="vertical-align: top; font-family: monospace; text-decoration: underline; "><big>
File:</big></td>
<td style="vertical-align: top; font-family: monospace; text-decoration: underline; "><big>
Description:</big></td>
</tr>
<tr>
<td style="vertical-align: top; font-family: monospace; "><big>Absyn.o<br>
Lexer.c<br>
Lexer.o<br>
Parser.c<br>
Parser.o<br>
Printer.o<br>
Test.o<br>
testFILENAME<br>
</big></td>
<td style="vertical-align: top; font-family: monospace; "><big> Abstract
Syntax files<br>
Lexer generated by Flex<br>
Compiled Lexer<br>
Parser generated by Bison<br>
Compiled parser<br>
Compiled Pretty Printer<br>
Compiled Test Bench<br>
Compiled executable of all .o files<br>
</big></td>
</tr>
</tbody>
</table>
<h3>Testing the Compiler Front End<br>
</h3>
The executable <span style="font-family: monospace; font-weight: bold; ">
testFILENAME</span> (where <span style="font-family: monospace; font-weight: bold; ">
FILENAME</span> is the name of the <span style="font-family: monospace; font-weight: bold; ">
.cf</span> file) may be used to test the result of the generation. To use
it simply give it the name of a file written in the target language. If parsing
is correct, the abstract syntax tree (in Haskell-like syntax) and linearized
pretty-printed result will be shown.<br>
<br>
For example, if we had created a subset of Java called Javalette, in <span style="font-family: monospace; font-weight: bold; ">
Javalette.cf</span>:<br>
<br>
<big><span style="font-family: monospace; font-weight: bold; ">> ./testJavalette
helloworld.jl</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">Parse Successful!</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">[Abstract Syntax]</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">(PDefList [(FuncDef TInt(FFuncName
"main")[][(SPrintString "Hello World"),</span><span style="font-family: monospace; ">
(SReturn NoExpression)]NoSemicolon)])</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">[Linearized Tree]</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">int main () </span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">{</span><br style="font-family: monospace; ">
<span style="font-family: monospace; "> printString ("Hello World")
;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; "> return ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; "> </span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">}</span><br style="font-family: monospace; ">
</big><br>
If no argument is given then the executable attempts to read from stdin.<br>
<br>
<h3>The Abstract Syntax Tree</h3>
The Abstract Syntax tree is generated following the method Andrew Appel
outlines in<br>
"Modern Compiler Construction in C"<br>
<a href="http://www.cs.princeton.edu/%7Eappel/modern/c/">http://www.cs.princeton.edu/~appel/modern/c/</a>
<br>
<br>
The generated code closely follows Appels methodology, where each struct
represents an LBNF category. Labels are represented by a kind field, and
each kind has a field in a union, with the field name based on the LBNF label
name. <br>
Here is an example. Let us say that we have made a simple LBNF file to
describe lists of boolean expressions, called <span style="font-family: monospace; font-weight: bold; ">
BoolExp.cf</span>:<br>
<br>
<big><span style="font-family: monospace; ">--Begin LBNF file</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">PROG.
PROG ::= [EXP] ;</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">OrExp.
EXP ::= EXP "||" EXP1 ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">AndExp.
EXP1 ::= EXP1 "&&" EXP2 ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">TVal.
EXP2 ::= "true" ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">FVal.
EXP2 ::= "false" ;<br>
</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">separator EXP ";" ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">coercions EXP 2 ;</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">--LBNF file ends here</span></big><br>
<br>
Absyn.H will define the following classes:<br>
<br>
<big><tt><font color="#3333ff">struct</font> PROG_<br>
{<br>
<font color="#3333ff">enum</font> { is_PROG } kind;<br>
<font color="#3333ff">union</font><br>
{<br>
<font color="#3333ff">struct</font> { ListEXP listexp_;
} program_;<br>
} u;<br>
};<br>
<font color="#3333ff"> typedef</font> <font color="#3333ff">struct</font>
PROG_ *PROG;<br>
struct EXP_<br>
{<br>
<font color="#3333ff">enum</font> { is_EOr, is_EAnd, is_ETrue, is_EFalse,
is_EVar } kind;<br>
<font color="#3333ff">union</font><br>
{<br>
struct { EXP exp_1, exp_2; } eor_;<br>
struct { EXP exp_1, exp_2; } eand_;<br>
struct { Ident ident_; } evar_;<br>
} u; <br>
};<br>
<font color="#3333ff"> typedef</font> <font color="#3333ff">struct</font>
EXP_ *EXP;<br>
<font color="#3333ff">struct</font> ListEXP_<br>
{<br>
EXP exp_;<br>
ListEXP listexp_;<br>
};<big><br style="font-family: monospace; ">
<span style="font-family: monospace; "></span></big>typedef <font color="#3333ff">
struct</font> ListEXP_ *ListEXP;</tt></big><br>
<br>
Note that as in Appel's method, the actual struct name ends with an underscore,
and a typedef removes the need to specify pointers with *. Singular categories
such as <big><span style="font-family: monospace; ">PROG</span></big> maintain
their kind field for simplicity even though it has no other possible alternatives.<br>
<br>
BNFC generates constructor functions to create structs with a specific kind.
For example, for the constructors for the EXP struct above:<br>
<big><tt><br>
EXP make_EOr(EXP p1, EXP p2)<br>
{<br>
EXP tmp = (EXP) <b>malloc</b>(<b>sizeof</b>(*tmp));<br>
<b>if</b> (!tmp)<br>
{<br>
fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating EXP!\n"</font>);<br>
exit(1);<br>
}<br>
tmp->kind = is_EOr;<br>
tmp->u.eor_.exp_1 = p1;<br>
tmp->u.eor_.exp_2 = p2;<br>
<b>return</b> tmp;<br>
}<br>
...<br>
EXP make_ETrue()<br>
{<br>
EXP tmp = (EXP) malloc(sizeof(*tmp));<br>
if (!tmp)<br>
{<br>
fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating EXP!\n"</font>);<br>
exit(1);<br>
}<br>
tmp->kind = is_ETrue;<br>
<b>return</b> tmp;<br>
}<br>
...</tt></big><big> </big> <br>
<br>
<span style="font-weight: bold; "></span>The <big><span style="font-family: monospace; ">
ListEXP</span></big> struct is automatically generated to represent the
list of <big><span style="font-family: monospace; ">EXP</span></big>. This
is significantly different than the standard BNF Haskell mode, which usesHaskell's
built-in lists. Currently for each class <span style="font-style: italic; ">
NAME</span> that can be held in a list, a struct List<span style="font-style: italic; ">
NAME</span> is generated. These are straightforward null-terminated linked
lists and should be familiar to all C programmers. In the future this feature
could be extended to C++'s Standard Template Library.<br>
<br>
For example, here is the constructor for <big><span style="font-family: monospace; ">
ListEXP</span></big>:<br>
<big><tt><big><br>
</big>ListEXP make_ListEXP(EXP p1, ListEXP p2)<br>
{<br>
ListEXP tmp = (ListEXP) <b>malloc</b>(<b>sizeof</b>(*tmp));<br>
<b>if</b> (!tmp)<br>
{<br>
fprintf(stderr, <font color="#33cc00">"Error: out of
memory when allocating ListEXP!\n"</font>);<br>
exit(1);<br>
}<br>
tmp->exp_ = p1;<br>
tmp->listexp_ = p2;<br>
<b>return</b> tmp;<br>
}<big><span style="font-family: monospace; "><span style="color: rgb(51,51,255); ">
</span> </span></big></tt></big><big><span style="font-family: monospace; ">
</span><br style="font-family: monospace; ">
</big><br style="font-family: monospace; ">
The programmer can then traverse through the abstract syntax tree using
the generated Skeleton files (see below). The PrettyPrinter is an example
of this. It contains two functions that traverse the tree and return Strings.
The "show" functions print a Haskell-like view of the data type names. The
"print" function is a pretty-printer that uses simple heuristics (easily changed
in the functions "renderC" and "renderS") to linearize the syntax tree.<br>
<br>
We can see the PrettyPrinter class at work in the automatically generated
file <b><tt>Test.c</tt></b>. Continuing our early example, let us write a
simple input file in our language of boolean expressions:<br>
<br>
<big><span style="font-family: monospace; ">true && (false || true);</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">true;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">false || false</span><br style="font-family: monospace; ">
</big><br>
We can now test our parser using the Test class:<br>
<br style="font-weight: bold; ">
<big><span style="font-family: monospace; font-weight: bold; ">> ./testBoolExp
testfile</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">Parse Successful!</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">[Abstract Syntax]</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">(PROG [(AndExp TVal(OrExp FValTVal)),
TVal, (OrExp FValFVal)])</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">[Linearized Tree]</span><br style="font-family: monospace; ">
<br style="font-family: monospace; ">
<span style="font-family: monospace; ">true && (false || true)
;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">true ;</span><br style="font-family: monospace; ">
<span style="font-family: monospace; ">false || false </span></big><br>
<br>
<h3>Using the Skeleton File</h3>
Now that we have sucessfully constructed a parser, it's time to do something
useful with it. This generally involves traversing the Abstract Syntax tree
and producing a new data structure, such as translating from a high-level
language to assembly code. To assist the programmer with this process the
BNF Converter's C++ Mode provides the files <span style="font-family: monospace; font-weight: bold; ">
Skeleton.h</span> and <span style="font-family: monospace; font-weight: bold; ">
Skeleton.c</span><br>
<br>
This file is an empty skeleton that traverses the abstract syntax tree
(currently doing nothing interesting). It uses simple switch statements to
traverse the tree based on the kind field of the struct. For example, here
is the code that would be generated for EXP, above:<big><tt><br>
<br>
<font color="#3333ff">void</font> visitEXP(EXP _p_)<br>
{<br>
<font color="#3333ff">switch</font>(_p_->kind)<br>
{<br>
<font color="#3333ff">case</font> is_EOr:<br>
<font color="#cc0000">/* Code for EOrGoes Here */</font><br>
visitEXP(_p_->u.eor_.exp_1);<br>
visitEXP(_p_->u.eor_.exp_2);<br>
<font color="#3333ff">break;</font><br>
<font color="#3333ff">case</font> is_EAnd:<br>
<font color="#cc0000">/* Code for EAndGoes Here */</font><br>
visitEXP(_p_->u.eand_.exp_1);<br>
visitEXP(_p_->u.eand_.exp_2);<br>
<font color="#3333ff">break;</font><br>
<font color="#3333ff">case</font> is_ETrue:<br>
<font color="#cc0000">/* Code for ETrueGoes Here */</font><br>
break;<br>
<font color="#3333ff">case</font> is_EFalse:<br>
<font color="#cc0000">/* Code for EFalseGoes Here */</font><br>
<font color="#3333ff">break;</font><br>
<font color="#3333ff">case</font> is_EVar:<br>
<font color="#cc0000">/* Code for EVarGoes Here */</font><br>
visitIdent(_p_->u.evar_.ident_);<br>
<font color="#3333ff">break;</font><br>
<font color="#3333ff">default:</font><br>
fprintf(stderr, "Error: bad kind field when printing
EXP!\n");<br>
exit(1);<br>
}<br>
}</tt></big><br>
<br>
To use the skeleton the programmer must generally do the following:<br>
<ul>
<li>Copy <span style="font-family: monospace; font-weight: bold; ">Skeleton.h</span>
and <b><span style="font-family: monospace; ">Skeleton.c</span></b> to
a new file</li>
<li>Change the name of the functions from <tt>visit</tt> to something
meaningful.</li>
<li>Change the parameters and return types of the <tt>visit </tt>functions
with a search-and-replace.<br>
</li>
<li>Add some variables and a top level method to begin the transformation.</li>
</ul>
From this point it should be fairly straightforward to implement
the desired algorithm. Good examples of it can be found in the two functions
in <span style="font-family: monospace; font-weight: bold; ">Printer.c</span><br>
<h3>Known Issues<br>
</h3>
Here is a list of known issues with the BNF Converter's C Mode.<br>
<br>
<span style="font-weight: bold; text-decoration: underline; ">Keywords</span><br>
<br>
C contains many more keywords than Haskell, and currently there is nothing
to prevent a programmer from using these keywords in their LBNF grammar. For
example:<br>
<br>
<big><span style="font-family: monospace; ">Static. Typedef
::= Sychronized "." Switch ;</span></big><br>
<br>
While this will be accepted by BNF it might result in un-compilable code,
as the generated C occaisionally uses lowercase names for instance variables.
As such programmers should generally avoid matching Label names with C keywords.<br>
<br>
<br>
<span style="font-weight: bold; text-decoration: underline; ">Reuse of
Label Names</span><br>
<br>
Reusing label names currently results in a warning from BNFC. However,
for C this is a much more serious issue, as they are used for enumerated
types. For example:<br>
<big><tt><br>
<font color="#3333ff">struct</font> FOO_<br>
{<br>
<font color="#3333ff">enum </font>{ is_BAR...} kind;<br>
...<br>
<font color="#3333ff">struct</font> BAZ_<br>
enum { is_BAZ... <font color="#cc0000">/* Whoops! The compiler won't
like this! */</font></tt></big><br>
<br>
The best solution is for the programmer to ensure that all label names
are unique and not ignore the BNFC warning.<br>
<br>
</body>
</html>