-
Notifications
You must be signed in to change notification settings - Fork 24
/
README
545 lines (428 loc) · 22.7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
AWK library function descriptions
Every function below is fully POSIX compliant, and has been tested on gawk 3
and 4, as well as nawk 20110810 and mawk 1.3.4. Interval notation has not been
used in this library, even though POSIX states that it should be supported,
as most of the current implementations still do not support it.
Note: mawk 1.3.3 is even less POSIX compliant than 1.3.4, and doesn't handle
POSIX character classes in regexes (like [:space:] or [:alpha:]), among other
things. It is currently the standard on ubuntu, and is most likely standard on
other debian-based linux distributions, as well. The functions below are not
guaranteed to work on versions of mawk prior to 1.3.4, although they should not
be too difficult to alter in order to do so.
If you are using gawk, I recommend adding the location of this repo to the
AWKPATH environment variable. This will allow you to only supply the file name
to -f and @include, instead of having to supply the actual path to the library.
The 'examples' directory includes a sample script for each library, with sample
usage of each function. While most of the examples are solely there to give
examples, the "cfold" script is fully functioning and is (in my opinion) rather
useful. It shows just how powerful these libraries can be... most of the script
is just there to parse options. These examples are written with gawk extensions.
Making them POSIX is left as an exercise to the user, if desired.
Most of the functions in this library work by themselves, with the exception of
the functions in sort.awk and the max() and min() functions in strings.awk.
This means that they can easily be copy/pasted into a script, and will function
fine on their own. In the case of sort.awk, the functions that the others depend
on begin with '__', and which functions they go with (as well as which functions
require what) are explained in the comments.
Libraries, and the available functions within:
math.awk
abs(number)
returns the absolute value of "number"
ceil(number)
returns "number" rounded UP to the nearest int
ceiling(multiple, number)
returns "number" rounded UP to the nearest multiple of "multiple". integers
only
floor(multiple, number)
returns "number" rounded DOWN to the nearest multiple of "multiple".
integers only
round(multiple, number)
returns "number" rounded to the nearest multiple of "multiple". integers
only
rint(number)
returns "number" rounded to the nearest integer
change_base(number, start_base, end_base)
converts "number" from "start_base" to "end_base"
bases must be between 2 and 64. the digits greater than 9 are represented
by the lowercase letters, the uppercase letters, @, and _, in that order.
if ibase is less than or equal to 36, lowercase and uppercase letters may
be used interchangeably to represent numbers between 10 and 35. integers
only. returns 0 if any argument is invalid
format_num(number)
adds commas to "number" to make it more readable. for example,
format_num(1000) will return "1,000", and format_num(123456.7890) will
return "123,456.7890". also trims leading zeroes
returns 0 if "number" is not a valid number
str_to_num(string)
examines "string", and returns its numeric value. if "string" begins with a
leading 0, assumes that "string" is an octal number. if "string" begins with
a leading "0x" or "0X", assumes that "string" is a hexadecimal number.
otherwise, decimal is assumed.
isint(string)
returns 1 if "string" is a valid integer, otherwise 0
isnum(string)
returns 1 if "string" is a valid number, otherwise 0
isprime(number)
returns 1 if "number" is a prime number, otherwise 0. "number" must be a
positive integer greater than one
gcd(a, b)
returns the greatest common denominator (greatest common factor) of a and b.
both a and b must be positive integers. uses the recursive euclid algorithm.
lcm(a, b)
returns the least common multiple of a and b. both a and b must be positive
integers.
calc_e()
approximates e by calculating the sumation from k=0 to k=50 of 1/k!
returns 10 decimal places
calc_pi()
returns pi, with an accuracy of 10 decimal places
calc_tau()
returns tau, with an accuracy of 10 decimal places
http://tauday.com/tau-manifesto
deg_to_rad(degrees)
converts degrees to radians
rad_to_deg(radians)
converts radians to degrees
tan(expr)
returns the tangent of expr, which is in radians
csc(expr)
returns the cosecant of expr, which is in radians
sec(expr)
returns the secant of expr, which is in radians
cot(expr)
returns the cotangent of expr, which is in radians
sys.awk
isatty(fd)
Checks if "fd" is open on a tty. Returns 1 if so, 0 if not, and -1 if an
error occurs
mktemp(template [, type])
creates a temporary file or directory, safely, and returns its name.
if template is not a pathname, the file will be created in ENVIRON["TMPDIR"]
if set, otherwise /tmp. the last six characters of template must be "XXXXXX",
and these are replaced with a string that makes the filename unique. type, if
supplied, is either "f", "d", or "u": for file, directory, or dry run (just
returns the name, doesn't create a file), respectively. If template is not
provided, uses "tmp.XXXXXX". Files are created u+rw, and directories u+rwx,
minus umask restrictions. returns -1 if an error occurs.
strings.awk
center(string [, width])
returns "string" centered based on "width". if "width" is not provided (or
is 0), uses the width of the terminal, or 80 if standard output is not open
on a terminal.
note: does not check the length of the string. if it's wider than the
terminal, it will not center lines other than the first. for best results,
combine with fold() (see the "cfold" script in the "examples" directory for
a script that does exactly this!)
delete_arr(array)
deletes every element in "array"
fold(string, sep [, width])
returns "string", wrapped, with lines broken on "sep" to "width" columns.
"sep" is a list of characters to break at, similar to IFS in a POSIX shell.
if "sep" is empty, wraps at exactly "width" characters. if "width" is not
provided (or is 0), uses the width of the terminal, or 80 if standard output
is not open on a terminal.
note: currently, tabs are squeezed to a single space. this will be fixed
shell_esc(string)
returns the string escaped so that it can be used in a shell command
ssub(ere, repl [, in])
behaves like sub, except returns the result and doesn't modify "in".
note: 'ere' must not use /.../ literal regex quoting
sgsub(ere, repl [, in])
behaves like gsub, except returns the result and doesn't modify "in".
note: 'ere' must not use /.../ literal regex quoting
lsub(str, repl [, in])
substites the string "repl" in place of the first instance of "str" in the
string "in" and returns the result. does not modify the original string. if
"in" is not provided, uses $0
glsub(str, repl [, in])
behaves like lsub, except it replaces all occurances of "str"
note: does not work in mawk when 'str' is empty
str_to_arr(string, array)
converts string to an array, one char per element, 1-indexed
returns the array length
extract_range(string, start, stop)
extracts fields "start" through "stop" from "string", based on FS, with the
original field separators intact. returns the extracted fields.
fwidths(width_spec [, string])
extracts substrings from "string" according to "width_spec" from left to
right and assigns them to $1, $2, etc. also assigns the NF variable. if
"string" is not supplied, uses $0. "width_spec" is a space separated list of
numbers that specify field widths, just like GNU awk's FIELDWIDTHS variable.
if there is data left over after the last width_spec, adds it to a final
field. returns the value for NF.
fwidths_arr(width_spec, array [, string])
the behavior is the same as that of fwidths(), except that the values are
assigned to "array", indexed with sequential integers starting with 1.
returns the length. everything else is described in fwidths() above.
lsplit(str, arr, sep)
splits the string "str" into array elements "arr[1]", "arr[2]", .., "arr[n]",
and returns "n". all elements of "arr" are deleted before the split is
performed. the separation is done on the literal string "sep".
ssplit(str, arr, seps [, ere])
similar to GNU awk 4's "seps" functionality for split(). splits the string
"str" into the array "arr" and the separators array "seps" on the regular
expression "ere", and returns the number of fields. the value of "seps[i]"
is the separator that appeared in front of "arr[i+1]". if "ere" is omitted or
empty, FS is used instead. if "ere" is a single space, leading whitespace in
"str" will go into the extra array element "seps[0]" and trailing whitespace
will go into the extra array element "seps[len]", where "len" is the return
value.
note: /regex/ style quoting cannot be used for "ere".
ends_with(string, substring)
returns 1 if "strings" ends with "substring", otherwise 0
trim(string)
returns "string" with leading and trailing whitespace trimmed
rev(string)
returns "string" backwards
max(array [, how ])
returns the maximum value in "array", 0 if the array is empty, or -1 if an
error occurs. the optional string "how" controls the comparison mode.
requires the __mcompare() function.
valid values for "how" are:
"std"
use awk's standard rules for comparison. this is the default
"str"
force comparison as strings
"num"
force a numeric comparison
maxi(array [, how ])
the behavior is the same as that of max(), except that the array indices are
used, not the array values. everything else is explained in max() above.
min(array [, how ])
the behavior is the same as that of max(), except that the minimum value is
returned instead of the maximum. everything else is explained in max() above.
mini(array [, how ])
the behavior is the same as that of min(), except that the array indices are
used instead of the array values. everything else is explained in min() and
max() above.
msort.awk
msort(s, d [, how])
sorts the elements in the array "s" using awk's normal rules for comparing
values, creating a new sorted array "d" indexed with sequential integers
starting with 1. returns the length, or -1 if an error occurs.. leaves the
indices of the source array "s" unchanged. the optional string "how" controls
the direction and the comparison mode. uses the merge sort algorithm, with an
insertion sort when the list size gets small enough. this is not a stable
sort. requires the __compare() and __mergesort() functions.
valid values for "how" are:
"std asc"
use awk's standard rules for comparison, ascending. this is the default
"std desc"
use awk's standard rules for comparison, descending.
"str asc"
force comparison as strings, ascending.
"str desc"
force comparison as strings, descending.
"num asc"
force a numeric comparison, ascending.
"num desc"
force a numeric comparison, descending.
imsort(s [, how])
the bevavior is the same as that of msort(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in msort() above.
msorti(s, d [, how])
the behavior is the same as that of msort(), except that the array indices
are used for sorting, not the array values. when done, the new array is
indexed numerically, and the values are those of the original indices.
everything else is described in msort() above.
imsorti(s [, how])
the bevavior is the same as that of msorti(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in msort() and msorti()
above.
msortv(s, d [, how])
sorts the indices in the array "s" based on the values, creating a new
sorted array "d" indexed with sequential integers starting with 1, and the
values the indices of "s". returns the length, or -1 if an error occurs.
leaves the source array "s" unchanged. the optional string "how" controls
the direction and the comparison mode. uses the merge sort algorithm, with
an insertion sort when the list size gets small enough. this is not a stable
sort. requires the __compare() and __mergesortv() functions. valid values for
"how" are explained in the msort() function above.
qsort.awk
qsort(s, d [, how])
sorts the elements in the array "s" using awk's normal rules for comparing
values, creating a new sorted array "d" indexed with sequential integers
starting with 1. returns the length, or -1 if an error occurs.. leaves the
indices of the source array "s" unchanged. the optional string "how" controls
the direction and the comparison mode. uses the quick sort algorithm, with a
random pivot to avoid worst-case behavior on already sorted arrays. this is
not a stable sort. requires the __compare() and __quicksort() functions.
valid values for "how" are:
"std asc"
use awk's standard rules for comparison, ascending. this is the default
"std desc"
use awk's standard rules for comparison, descending.
"str asc"
force comparison as strings, ascending.
"str desc"
force comparison as strings, descending.
"num asc"
force a numeric comparison, ascending.
"num desc"
force a numeric comparison, descending.
iqsort(s [, how])
the bevavior is the same as that of qsort(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in qsort() above.
qsorti(s, d [, how])
the behavior is the same as that of qsort(), except that the array indices
are used for sorting, not the array values. when done, the new array is
indexed numerically, and the values are those of the original indices.
everything else is described in qsort() above.
iqsorti(s [, how])
the bevavior is the same as that of qsorti(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in qsort() and qsorti()
above.
qsortv(s, d [, how])
sorts the indices in the array "s" based on the values, creating a new
sorted array "d" indexed with sequential integers starting with 1, and the
values the indices of "s". returns the length, or -1 if an error occurs.
leaves the source array "s" unchanged. the optional string "how" controls
the direction and the comparison mode. uses the quicksort algorithm, with a
random pivot to avoid worst-case behavior on already sorted arrays. this is
not a stable sort. requires the __compare() and __vquicksort() functions.
valid values for "how" are explained in the qsort() function above.
psort.awk
psort(s, d, patts, max [, how])
sorts the values of the array "s", based on the rules below. creates a new
sorted array "d" indexed with sequential integers starting with 1. "patts"
is a compact (*non-sparse) 1-indexed array containing regular expressions.
"max" is the length of the "patts" array. returns the length of the "d"
array. valid values for "how" are explained below. uses the quicksort
algorithm, with a random pivot to avoid worst-case behavior on already sorted
arrays. requires the __pcompare() and __pquicksort() functions.
Sorting rules:
- When sorting, values matching an expression in the "patts" array will
take priority over any other values
- Each expression in the "patts" array will have priority in ascending
order by index. "patts[1]" will have priority over "patts[2]" and
"patts[3]", etc
- Values both matching the same regex will be compared as usual
- All non-matching values will be compared as usual
valid values for "how" are:
"std asc"
use awk's standard rules for comparison, ascending. this is the default
"std desc"
use awk's standard rules for comparison, descending.
"str asc"
force comparison as strings, ascending.
"str desc"
force comparison as strings, descending.
"num asc"
force a numeric comparison, ascending.
"num desc"
force a numeric comparison, descending.
ipsort(s, patts, max [, how])
the bevavior is the same as that of psort(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in psort() above.
psorti(s, d, patts, max [, how])
the behavior is the same as that of psort(), except that the array indices
are used for sorting, not the array values. when done, the new array is
indexed numerically, and the values are those of the original indices.
everything else is described in psort() above.
ipsorti(s, patts, max [, how])
the bevavior is the same as that of psorti(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is described in psort() and psorti()
above.
shuf.awk
shuf(s, d)
shuffles the array "s", creating a new shuffled array "d" indexed with
sequential integers starting with one. returns the length, or -1 if an error
occurs. leaves the indices of the source array "s" unchanged. uses the knuth-
fisher-yates algorithm. requires the __shuffle() function.
ishuf(s)
the behavior is the same as that of shuf(), except the array "s" is sorted
in-place. the original indices are destroyed and replaced with sequential
integers. everything else is described in shuf() above.
shufi(s, d)
the bevavior is the same as that of shuf(), except that the array indices
are shuffled, not the array values. when done, the new array is indexed
numerically, and the values are those of the original indices. everything
else is described in shuf() above.
ishufi(s)
the behavior is tha same as that of shufi(), except that the array "s" is
sorted in-place. the original indices are destroyed and replaced with
sequential integers. everything else is describmed in shuf() and shufi()
above.
csv.awk
create_line(array, max [, sep [, qualifier [, quote_type] ] ])
Generates an output line in quoted CSV format, from the contents of "array"
"array" is expected to be an indexed array (1-indexed). "max" is the highest
index to be used. "sep", if provided, is the field separator. If it is more
than one character, the first character in the string is used. By default,
it is a comma. "qualifier", if provided, is the quote character. Like "sep",
it is one character. The default value is `"'. "quote_type", if provided, is
used to determine how the output fields are quoted. Valid values are given
below. For example, the array: a[1]="foo"; a[2]="bar,quux"; a[3]="blah\"baz"
when called with create_line(a, 3), will return: "foo","bar,quux","blah""baz"
note: expects a non-sparse array. empty or unset values will become
empty fields
Valid values for "quote_type":
"t": Quote all strings, do not quote numbers. This is the default
"a": Quote all fields
"m": Only quote fields with commas or quote characters in them
qsplit(string, array [, sep [, qualifier] ])
a version of split() designed for CSV-like data. splits "string" on "sep"
(,) if not provided, into array[1], array[2], ... array[n]. returns "n", or
"-1 * n" if the line is incomplete (it has an uneven number of quotes). both
"sep" and "qualifier" will use the first character in the provided string.
uses "qualifier" (" if not provided) and ignores "sep" within quoted fields.
doubled qualifiers are considered escaped, and a single qualifier character
is used in its place. for example, foo,"bar,baz""blah",quux will be split as
such: array[1] = "foo"; array[2] = "bar,baz\"blah"; array[3] = "quux";
options.awk
getopts(optstring [, longopt_array ])
parses options, and deletes them from ARGV. "optstring" is of the form
"ab:c". each letter is a possible option. if the letter is followed by a
colon (:), then the option requires an argument. if an argument is not
provided, or an invalid option is given, getopts will print the appropriate
error message and return "?". returns each option as it's read, and -1 when
no options are left. "optind" will be set to the index of the next
non-option argument when finished. "optarg" will be set to the option's
argument, when provided. if not provided, "optarg" will be empty. "optname"
will be set to the current option, as provided. getopts will delete each
option and argument that it successfully reads, so awk will be able to treat
whatever's left as filenames/assignments, as usual. if provided,
"longopt_array" is the name of an associative array that maps long options
to the appropriate short option. (do not include the hyphens on either).
sample usage can be found in the examples dir, with gawk extensions, or in
the ogrep script for a POSIX example: https://github.com/e36freak/ogrep
times.awk
month_to_num(month)
converts human readable month to the decimal representation
returns the number, -1 if the month doesn't exist
day_to_num(day)
converts human readable day to the decimal representation
returns the number, -1 if the day doesn't exist
like date +%w, sunday is 0
hr_to_sec(timestamp)
converts HH:MM:SS to seconds, returns -1 if invalid format
sec_to_hr(seconds)
converts seconds to HH:MM:SS
ms_to_hr(milliseconds)
converts milliseconds to a "time(1)"-similar human readable format, such
as 1m4.356s
add_day_suff(day_of_month)
prepends the appropriate suffix to "day_of_month". for example,
add_day_suff(1) will return "1st", and add_day_suff(22) will return "22nd"
returns -1 if "day_of_month" is not a positive integer
colors.awk
set_cols(array)
sets the following values in "array" with tput. printing them will format
any text afterwards. colors and formats are:
bold - bold text (can be combined with a color)
black - black text
red - red text
green - green text
yellow - yellow text
blue - blue text
magenta - magenta text
cyan - cyan text
white - white text
reset - resets to default settings
You can do whatever you want with this stuff, but a thanks is always appreciated