forked from spitbol/x64
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy patherr.sbl
440 lines (377 loc) · 12.4 KB
/
err.sbl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
-in80
-title err.spt: compress spitbol error messages
-stitle initialization
* Copyright 1987-2012 Robert B. K. Dewar and Mark Emmer.
* Copyright 2012-2015 David Shields
*
* this file is part of macro spitbol.
*
* macro spitbol is free software: you can redistribute it and/or modify
* it under the terms of the gnu general public license as published by
* the free software foundation, either version 2 of the license, or
* (at your option) any later version.
*
* macro spitbol is distributed in the hope that it will be useful,
* but without any warranty; without even the implied warranty of
* merchantability or fitness for a particular purpose. see the
* gnu general public license for more details.
*
* you should have received a copy of the gnu general public license
* along with macro spitbol. if not, see <http://www.gnu.org/licenses/>.
*
*
* err.spt
*
* find a near optimum compression for a spitbol error message file.
*
* spitbol -u "target" -1<input file> -2<output file> err
*
* the variable target is set equal to the name of the target machine.
* usage is optional.
*
* technique:
* the 159 characters from 1 to 31, and 128 to 255 are used to
* represent words and phrases from the error file. the error
* list is then reduced to two character strings:
* 1. an error message string
* 2. a phrase string
*
* within each string, the individual messages or phrases are
* separated by '\0' characters. the nth element of either
* string can be found by scanning and counting '\0's.
*
* when a character with value 1 to 31, or 128 to 255 is encountered,
* it should be mapped to an ordinal number in the range [1-159], and
* the phrase string consulted to obtain the appropriate phrase.
*
* naturally (this being snobol4), phrases may recursively contain
* other code characters.
*
* the problem reduces to finding the 159 phrases that will
* provide the greatest compression. for any phrase p of length s
* that occurs n times in the source error message file, replacing
* it with a single character and adding it to the phrase string
* produces a savings of n*(s-1) - s - 1 characters.
*
* each of the n occurrances of p are replaced by a single character,
* for a savings of n*(s-1) in the error message string. the phrase
* string is lengthened by s + 1 characters (1 for the '\0' at the end).
*
* for a unique phrase (n=1), the savings is -2, that is, there is no
* savings, but rather a net gain of 2 characters.
*
* a sneaky pattern is used to generate all single- and multi-word
* phrases in a particular error message. each phrase indexes a table
* entry that counts the cost savings associated with that phrase.
*
* all error messages are scanned to build this savings table, which
* is then rsorted to obtain the phrase that if chosen, would produce
* the greatest savings. taking that phrase, all occurrances of it
* in all error messages are replaced by a single character, and the
* phrase is added to a phrase array.
*
* the entire process is repeated again, rather than just taking the
* next best entry from the table. this allows new combinations to
* materialize, as a result of having replaced the phrase with a single
* character, and also eliminates the problem of now obsolete table
* entries that partially overlapped the chosen phrase. the process
* is repeated for all 159 entries that can be accommodated in the
* phrase table.
*
* as each phrase is selected, the phrase array is also scanned
* to see any substitutions can be made there.
*
* the program makes considerable use of gimpel's seq function to
* sequence through various arrays.
*
* the output is a series of data statements in generic 68000 assembly
* language. there are two types of pseudo-ops:
* 1. byte to define decimal data bytes
* 2. ascii to define an ascii string
*
* needless to say, it does take some time to grind out the results (22
* minutes with 68k spitbol on a unix pc, 22 hours with snobol4+ on
* a pc/xt).
*
* note: moving the lower case characters (64-95) from the normal
* character set to the special set (thus providing for 191
* phrases instead of 159), only saved 40 bytes in the output
* file, and was deemed unnecessarily restrictive.
*
* concatenating the error and phrase arrays prior to each call
* roduced a less optimal packing, and was discarding
* in favor of just using the error array to pick the next
* phrase for compression.
* set target definition flag, if any.
*
target = (differ(host(0)) host(0), "unix_gas")
output = "target: " target
target break('_') . os '_' rem . asm
* tab character
tab = char(9)
ident(asm,'gas') :s(config.gas)
ident(asm,'nasm') :s(config.nasm)
output = 'unknown target assembler, using gas format'
config.gas
asm.byte = '.byte'
asm.comment = '#'
asm.data = '.data'
asm.delim = '"'
asm.global = '.global'
asm.string = '.ascii'
asm.word = '.word'
:(config.done)
config.nasm
asm.byte = 'db'
asm.comment = ';'
asm.data = 'segment' tab '.data'
asm.delim = '"'
asm.global = 'global'
asm.string = 'db'
config.done
*
byte = tab (ident(asm,'gas') '.byte','db') tab
* setup keywords
*
&anchor = 0
&trim = 1
* &stlimit = -1
&code = 1
* &fullscan = 1 ;* for snobol4+
* define the characters whose codes will be used as special pointers.
*
&alphabet len(1) len(31) . f1 len(96) len(128) . f2
specials = f1 f2
* functions
*
* seq(s,n)
*
* will sequence through a set of statements until failure is detected.
* the indexing variable is given by the name n. from gimpel, "algorithms
* in snobol4."
*
define('seq(arg_s,arg_name)') :(seq_end)
* entry point: initialize indexing variable. then convert arg_s to code.
*
seq $arg_name = 0
arg_s = code(arg_s ' :s(seq_1)f(seq_2)')
+ :f(error)
* increment indexing variable by 1 and spring off to compiled code.
* return will be to seq_1 or seq_2.
*
seq_1 $arg_name = $arg_name + 1 :<arg_s>
* control flows to seq_2 if a fail was detected. if first time through
* fail; otherwise succeed.
*
seq_2 eq($arg_name,1) :s(freturn)f(return)
seq_end
* asc(c)
*
* return the integer code for an ascii character c.
*
define('asc(c)') :(asc_end)
asc &alphabet break(c) @asc :(return)
asc_end
* spread(n)
*
* given n in the range [1-159], spread it out over the ranges [1-31]
* and [128-255].
*
define('spread(spread)') :(spread_end)
spread spread = ge(spread,32) spread + 96 :(return)
spread_end
* shrink(n)
*
* given n in the ranges [1-31] and [128-255], shrink it back to the
* range [1-159].
*
define('shrink(shrink)') :(shrink_end)
shrink shrink = ge(shrink,128) shrink - 96 :(return)
shrink_end
* best(err,n)
*
* select the best phrase from the error message array -- that is,
* the phrase that will provide the greatest savings if replaced
* by a one character code.
*
* technique: build a table of the savings associated with all
* possible phrases in the err array, sort it, and select the best.
* then make all possible replacements, using integer n for the code.
*
define('best(err,n)t,j') :(best_end)
* entry: define a fresh table to hold all the generated phrases.
*
best t = table(1001)
* generate phrases for all error messages, then sort and select best.
*
seq(" generate(err[j],t)", .j)
t = rsort(t,2) :f(freturn)
* abort the caller's sequencing if there is not a net savings possible
* any more.
*
gt(t[1,2],0) :f(freturn)
best = t[1,1]
* now replace the selected phrase in all error and phrase strings.
*
seq(" repl(.err[j],best,n)", .j)
seq(" repl(.phrase[j],best,n)", .j) :(return)
best_end
* generate(s,t)
*
* generate all phrases from a string s, and add their potential
* savings to table t. the first occurence of a phrase produces
* a savings of -2 (net loss). subsequent occurences provide an
* incremental savings of size(phrase) -1.
*
* the one phrase we do not generate from s is the entire
* string s itself, since moving it to the phrase table is
* meaningless. (assuming unique input strings.)
*
* this relies upon fail pattern to generate all the permutations.
* put a trace on variable w to watch the results.
*
define('generate(s,t)w')
gen_pat = fence ("" | (breakx(' ' specials) len(1)))
+ ((breakx(' ' specials) len(1)) | (len(1) rem)) $ w
+ *differ(s,w)
+ *?(t[w] = (ident(t[w]) -2, t[w] + size(w) - 1)) fail
+ :(generate_end)
generate
s ? gen_pat :(return)
generate_end
* repl(x,i,s,n)
*
* replace string s in $x with char(n).
* this must be isolated into a function to prevent seq from
* terminating early when a pattern match fails.
*
define('repl(x,s,n)') :(repl_end)
repl $x ? s = char(n) :f(return)s(repl)
repl_end
* putout(s,i)
*
* output string s as a series of ascii and byte data statements.
* i is the index of the string being compressed.
*
define('putout(s,i)c,n,w')
* pattern to break s into normal and special characters.
*
put_pat = (break(specials) | (len(1) rem)) . w
+ (span(specials) | "") . n
put_one = len(1) . c :(putout_end)
* entry. put out comment line to show what string is being compressed.
putout outfile =
outfile = asm.comment lpad(i,5) ' "' expand(s) '"'
* main loop. get segments of normal and special characters. each segment
* (or both) may be null.
*
put1 s ? put_pat =
* put out the normal characters, if any.
*
outfile = differ(w) tab asm.string tab asm.delim w asm.delim
outsize = outsize + size(w)
* if s is exhausted, append '\0' to special characters, then
* put it out as a series of byte data statements.
*
n = ident(s) n char(0)
outsize = outsize + size(n)
put2 n ? put_one = :f(put3)
outfile = byte asc(c) :(put2)
put3 differ(s) :s(put1)f(return)
putout_end
* expand(s)
*
* recursively expand a string with compressed phrases.
*
define('expand(s)f,c')
ex_pat = (break(specials) . f len(1) . c) |
+ ((len(1) rem) . f "" . c) :(expand_end)
expand s ? ex_pat = :f(return)
expand = ident(c) expand f :s(return)
expand = expand f expand(phrase[shrink(asc(c))])
+ :(expand)
expand_end
* readerr(err)
*
* read in the error message file, building array of messages.
* return maximum error number as function value.
*
define('readerr(a)s,n,msg')
rde_done = fence notany('0123456789')
rde_pat = fence len(3) . n len(1) rem . msg :(readerr_end)
readerr s = infile :f(return)
s ? rde_done :s(return)
s ? rde_pat :f(rderr1)
(differ(a[n]) differ(a[n],msg)) :s(rderr2)
a[n] = msg
readerr = gt(n,readerr) n
insize = insize + 1 + size(msg) :(readerr)
rderr1 output = "bad input line: " s :(freturn)
rderr2 output = "inconsistent re-use of message number " n
output = "first use: " a[n]
output = " re-use: " msg :(freturn)
readerr_end
***************************************************************************
*
* main program
*
*
* open channel 1 on command line for infile, channel 2 for outfile
output = ~input(.infile,1) "no input file" :s(end)
output = ~output(.outfile,2) "no output file" :s(end)
* arrays to hold error messages and phrases
*
in = array(400)
phrase = array(159)
* prepare statistics
*
insize = 0
outsize = 0
* read input file, create err array of exact size to just hold error messages.
*
err = array(readerr(in)) :f(end)
* copy in array to err array, then discard in array.
*
seq(" err[i] = in[i]", .i)
in =
* build phrase array
*
seq(" phrase[i] = best(err, spread(i))", .i)
* putout title and header information
*
outfile = asm.comment ' compressed spitbol error messages '
outfile = asm.comment
outfile = name
outfile = model
outfile = include
outfile = group
outfile = data
outfile =
* putout label and compressed error message table
* osx visible names must start with '_'.
*
* prefix = ident(target,"osx_64") "_"
outfile = tab asm.global tab prefix "errors"
outfile = prefix 'errors:' tab asm.byte tab '0'
seq(" putout(err[i],i)", .i)
* putout label and phrase table
*
outfile =
outfile = tab asm.global tab prefix 'phrases'
outfile = prefix 'phrases:' tab asm.byte tab '0'
seq(" putout(phrase[i],spread(i))", .i)
* finish up the output file
*
outfile =
outfile = enddata
outfile = endstmt
* print statistics
*
output = "input message length = " insize
output = "output string size = " outsize
output = "savings = " insize - outsize
output = "spitbol statements executed " &stcount
* clear error indicator for successful return
*
&code = 0
end