ROADMAP

  • peg: precedence climbing: S = num > mul_op > add_op;
  • package: build artifact for nix package.
  • peg: XXX: the better behavior is to give back the extra whitespace if the rule is not advancing, e.g. a = b c?;
  • peg: action property can be rolled up.
  • peg: e"error message". this is useful when combining with negative + cut.
  • match: optimization for choices of all literals.
  • peg: left recursion generates a polyfill token.
  • peg: runtime rules: INDENT, DEDENT, mustache opener/closer,
  • go: use left recursion for binary operations.
  • tests: more test spec for go, peppa, json, toml.
  • shell: peppa parse can parse multiple files.
  • shell: serialize / deserialize grammar object (cache, make loading grammar faster).
  • shell: add more meta info in peg.
  • shell: resolve config from ../etc/peppa/config.toml, /etc/peppa/config.toml, $HOME/.config/peppa/config.toml, ./config.toml.
  • shell: read PEPPAPATH.
  • package: install peg files to etc.
  • shell: read configuration from a toml file. The toml file provides grammar file and grammar entry conf for different file extensions.
  • shell: generate C code. It may share the AST structure of IR code.
  • shell: generate LLVM IR code.
  • package: add debian ppa.
  • package: add homebrew formula.
  • package: build deb.
  • package: build rpm.
  • code: split to multiple modules and use amalgamation.
  • match: use errno module for some errors, such as oom.
  • api: better error message, report filename, lineno, offset, line, highlight error slice, underline, error type, error code, error hint, error message, termcolor
  • peg: error string rule = (a b c ~ e"LiteralName form of CompositeLiteral shouldn't appear on if/for/switch statement without )]}.") expr. By using error string with cut, a syntax error is raised.
  • shell: peppa parse generates dot diagram.
  • spec: more tests for go grammar.
  • shell: resolve dir support special envvar: $PEPPAPATH, otherwise read from etc, home, cwd.
  • shell: peppa ast support option --language=xxx, which reads data from /etc/peppa/langs.d/xxx.peg, or $HOME/.config/peppa/langs.d/xxx.peg, or $CWD/xxx.peg.
  • shell: peppa test spec.json test a spec json in tinout style.
  • perf: faster spec checks - may use json-c or sorta library to compare the generated data - meanwhile, cache the grammar in memory during the session.
  • spec: @infix @left_assoc @precedence(1) compare_op = "==";, @infix @right_assoc power_op = "^"; rule = Primary (compare_op/power_op) Primary;. https://en.wikipedia.org/wiki/Operator-precedence_parser https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing https://docs.rs/peg/0.7.0/peg/
  • spec: JSON as the basic types for node property.
  • spec: explain "greed", "CFG", "lateral backtracking", "vertical backtracking", "not advance in repeat" in spec. https://github.com/norswap/autumn/blob/master/doc/A3-how-autumn-works.md
  • peg: @precedence(5): set precedence for operator when apply @left_recursion.
  • peg: @trie keyword = "key" / "world" / ... Optimize to use trie algorithm.
  • peg: @reserved rule = "key". !("key" / "word") identifier can be optimized to check against the words registered to P4_Source, rather than checking them one by one. The alternative is to process it the way autumn does.
  • peg: support @inside(XXX), @outside(XXX) decorator: it can check if XXX is in the frame stack. can support second number parameter to limit the frame stack backtrack. example: golang CompositeLit should be inside an ExpressionGroup if inside IfStmt,ForStmt,SwitchStmt, otherwise it is ambiguous: Literal = BasicLit / @outside(IfStmt / ForStmt / SwitchStmt) CompositeLit / @inside(IfStmt / ForStmt / SwitchStmt) @inside(ExpressionGroup) CompositeLit / FunctionLit ;.
  • peg: matching parent successfully immediately. example: semicolons in golang, autoreplace U+FFFD to U+0A in markdown. identifier = letter (letter / unicode_digit)* (&newline @inside(Statement) @success(Statement))?;.
  • api: support locale, utf-8, utf-16, utf-32 encoding.
  • peg: support action code -> { ...; ...; @squashed; }: children node will be produced but eventually be squashed.
  • peg: support action code rule = (a b) -> { some action code here } if the action is applied to the whole sequence.
  • peg: support action code ., .[], .[n], .[m:n], .["name"].
  • peg: support action code array = ("[" element* "]") -> { @property value = .[] }.
  • peg: support action code rule = number -> { @property value = . | replace("_", "") | as_int ; }, rule = bool -> { @property value = . | as_bool; }.
  • peg: support action code rule = a:b -> { @override another = b; }; another = " "; this can be used to override some rules in runtime and is useful when implementing Mustache tag set delimiter.
  • api: function P4_NodeEqual(node1, node2, NULL): check that text[slice] is the same, the children count is the same, and each child is the same. The third param checks user data.
  • api: function P4_FindNodeChild(node, node_name).
  • api: function P4_FindNodeChildren(node, node_name).
  • peg: support rule = a:b; - P4_Node->rule_name = "a", P4_Node->node_name = "b".
  • man: man page for peppa.
  • peg: support multiple ranges in []: [0-9a-zA-Z].
  • api: jsonify support pretty print (indent, newline).
  • shell: compile peg to LLVM IR/bitcode code (can use LLVM to compile to C, JS).
  • ci: support windows.
  • ci: publish binary to github release.
  • ci: publish ppa.
  • ci: let Homebrew users install the utility.
  • api: operate slice (copy string, get size, etc).
  • api: operate node.
  • api: set source verbose mode.
  • api: until.
  • api: Sanitize \0 to whitespace for the source input. This happens when creating the source or setting the source size. Example: Python Parser.
  • api: support UTF-8 BOM sequence (0xEF 0xBB 0xBF) at the start of source. Can be done via unistring: in P4_LoadSource, if s starts with these three bytes, add a pseudo slice.
  • api: register a function for matching source. This should help dealing with some inputs difficult to parse.
  • peg: numeric.
  • peg: panic.
  • peg: rule template.
  • peg: sub grammar.
  • peg: left recursive: https://tratt.net/laurie/research/pubs/html/tratt__direct_left_recursive_parsing_expression_grammars/ https://github.com/orlandohill/peg-left-recursion. introduce left_recursion, right_recursion in sequence.
  • peg: describe grammar AST using ADSL-style rules.
  • perf: pre-alloc tokens.
  • perf: trace: add a tracer in P4_Source. When matching, annotate the tracer. An additional tool can aggregate data and output a DOT / compile to png. Similar: https://textx.github.io/Arpeggio/stable/debugging/#parser-debug-mode https://ohmlang.github.io/editor/
  • perf: tracer: https://pegjs.org/documentation https://github.com/orlandohill/peg-left-recursion
  • tests: benchmark: example: json-c. valgrind --tool=massif --massif-out-file=massif.out ./build/tests/test_example_json && ms_print massif.out ms_print.out.
  • match: packrat or not? http://mousepeg.sourceforge.net/index.htm https://github.com/norswap/autumn/blob/master/doc/B4-debugging-tracing.md#memoization
  • tests: add a fuzz testing framework.
  • script: turn peg grammar into a railroad diagram.
  • shell: p4 ast --language=cmake. https://cmake.org/cmake/help/latest/manual/cmake-language.7.html
  • shell: p4 ast --grammar=python3. https://docs.python.org/3.11/reference/grammar.html
  • shell: p4 ast --grammar=c99: https://github.com/pointlander/peg/blob/master/grammars/c/c.peg
  • shell: p4 ast --grammar=es6: https://www.ecma-international.org/wp-content/uploads/ECMA-262_6th_edition_june_2015.pdf
  • shell: p4 ast --grammar=zig: https://ziglang.org/documentation/master/#Grammar
  • shell: p4 ast --grammar=awk: https://github.com/onetrueawk/awk/blob/master/lex.c https://github.com/onetrueawk/awk/blob/master/awkgram.y
  • shell: p4 ast --grammar=sql: https://tomassetti.me/parsing-sql/
  • peg: support Python-style INDENT rule.
  • api: print grammar and/or rules.
  • refactor: move some variables to frame to reduce function frame size.
  • api: add expect_rule_id, instead of saving errmsg.
  • shell: peppa ast -> peppa parse. v1.16.0
  • perf: only track backrefs if sequence has backref as member. v1.16.0
  • perf: Cache literal len or use better string structure internally. v1.16.0
  • peg: @sibling_to_descdent. can be used to transform cases like toml [key1.key2], key2 should be a child of key1. This can be done via right recursion: key = identifier "." key / identifier; see spec - right recursion. v1.16.0
  • peg: right_recursion. updated right recursion in spec. v1.16.0
  • peg: left_recursion. can be used to transform cases like expression: a = b @left_recursion (minus/plus/mul/div) b. The left side of @left_recursion is the operand; the right side is the infix operator and the other operand. Given 1+2+3, it will produce {{1, +, 2}, +, 3}. v1.16.0
  • shell: p4 ast --grammar --rule --input. Return json tree in stdout or error in stderr. v1.15.0
  • shell: p4 test spec.yml. The spec contains an array of [name, grammar(/path/to/peg,./path/to/peg,toml,c99,...), rule, input, output, error]. this is done by scripts/check_spec.py. v1.15.0.
  • build: binary cli - shell.c. v1.15.0
  • tests: introduce JSON-based test. v1.15.0
  • peg: optional ICU support - \p{...}. This allows even more choices: Lu, gc=Lu, General_Category=Uppercase_Letter. Wrap the feature in ENABLE_ICU. use libunistring instead of ICU, which is way smaller. v1.15.0
  • peg: we can skip using RuneRanges and instead save the category name, like "Co", in Range, which moves the evaluation to runtime. Use libunistring, which evaluates at runtime. v1.15.0
  • build: static lib. v1.15.0
  • cmake: make install. v1.15.0
  • peg: range support ID_Start, ID_Continue, Other_ID_Start, Other_ID_Continue. v1.15.0.
  • peg: insensitive back reference. v1.15.0.
  • peg: back reference. v1.15.0
  • tests: enable AddressSanitizer for Clang. v1.15.0
  • build: wasm. docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) emscripten/emsdk emcc peppapeg.c -Os -s WASM=1 -s SIDE_MODULE=1 -o /src/peppapeg.wasm. https://gist.github.com/kripken/59c67556dc03bb6d57052fedef1e61ab https://github.com/mbasso/awesome-wasm won't do it.
  • docs: read A parsing machine for PEGs.
  • peg: stop on first error v/s recover from Panic. e.g. cut.
  • api: cut: https://news.ycombinator.com/item?id=20502032. v1.14.0
  • api: no rule id. v1.14.0
  • docs: landing page for the project doc site.
  • bug: eval grammar: when failed, should produce no grammar. Fixed in v1.13.0
  • api: print error messages for Human. Added in v1.13.0.
  • api: support comment in peg grammar. Added in v1.12.0.
  • peg: CharacterSet. can use range.
  • peg: extend range: [0-9..2] / [a-z] / [\p{L}] / [\u{1}-\u{10ffff}]. Added in v1.12.0.
  • peg: built-in rules: letters.
  • peg: built-in rules: unicode letters.
  • peg: built-in rules: digits.
  • peg: built-in rules: unicode digits.
  • api: Support more spaced rules. Added in v1.11.0.
  • api: P4_AcquireSourceAst(source, &ast): set ast, reset source. It's useful when we need the parsed result but do not care about the source itself. The token tree is then owned by ast and must be freed by the caller. Added in v1.11.0
  • docs: add explanations.
  • GetErrorString.
  • Reset source so it can be re-parsed. Added in v1.11.0.
  • P4_InspectSourceAst(tok, bool (f) (tok)). Example: https://golang.org/pkg/go/ast/#Inspect. Added in v1.11.0.
  • Allow replacing malloc/free functions. Added in v1.11.0.
  • Custom allocator and destructor. # define P4_MALLOC malloc. Added in v1.11.0.
  • Add tutorials (json, mustache, ini) in docs.
  • Add how-to guides in docs.
  • any: .. Added in v1.9.0.
  • Report line num & col. Added in v1.9.0.
  • Support partial source: P4_SetSourceContentSlice(s, i, j). Added in v1.9.0.
  • Allow user setting Userdata for P4_Tokens. Added in v1.8.0.
  • Support non-ascii case insensitive literal matching. Added in v1.8.0.
  • Print token ast in json format. Added in v1.8.0.
  • GetErrorString. Added in v1.7.0.
  • Join. Added in v1.7.0.
  • Start-of-Input & End-of-Input. Added in v1.7.0.
  • Range stride. Added in v1.7.0.
  • Tokens only have an expr id, no expr. Added in v1.6.0.
  • Return NullError for CreatePositive/CreateNegative/... Added in v1.6.0.
  • build & test via github workflow ci. Added in v1.6.0
  • Dynamically change grammar rules. Added in v1.3.0 via Callback.
  • Case insensitive BackReference. Added in v1.5.0.
  • Create docs and push it to gh-pages branch. Added in v1.5.0.
  • Add doxygen references in docs. Added in v1.5.0.
  • Callbacks for P4_Token. Added in v1.4.0.
  • Example JSON: rfc7159. Added in v1.4.0.
  • Auto generate documentation for functions and defined macros. Example: doxygen. Added in v1.4.0.
  • NeedLoosen, NeedSquash, NeedLift should be well-tested.
  • Add depth setter. Added in v1.3.0.
  • Cache squash. Added in v1.3.0.
  • Cache loosen. Added in v1.3.0.
  • Add Valgrind to the tests. Added in v1.3.0.
  • New Expression Kind: BackReference. Added in v1.2.0.
  • Add Flag Set/Unset functions. Added in v1.1.0.
  • INI parser example. Added in v1.1.0.
  • Flag SPACED. Added in v1.0.0.
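
The left_recursion item above says that 1+2+3 should group to the left. The following is a minimal sketch of that folding in plain C, not the library's implementation: `fold_left` is a hypothetical helper, operands are assumed to be single digits, and all four operators are treated as one precedence level, as in the `a = b @left_recursion (minus/plus/mul/div) b` example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the Peppa PEG API.
 * Folds "operand (op operand)*" into a left-nested grouping and writes
 * it to out, e.g. "1+2+3" becomes "{{1, +, 2}, +, 3}".
 * Assumption: operands are single digits; operators are + - * /.
 * Returns 0 on success, -1 on malformed input or insufficient space. */
int fold_left(const char *input, char *out, size_t out_size)
{
    char acc[256];  /* grouping built so far (the left operand) */
    char next[256]; /* scratch buffer for the next, larger grouping */
    size_t i = 0;

    assert(input != NULL && out != NULL);

    /* The first operand seeds the accumulator. */
    if (input[i] < '0' || input[i] > '9')
        return -1;
    snprintf(acc, sizeof acc, "%c", input[i]);
    i++;

    /* Each (op operand) pair wraps everything parsed so far as the
     * left child, which is what makes the result left-associative. */
    while (input[i] != '\0') {
        char op = input[i++];
        if (strchr("+-*/", op) == NULL)
            return -1;
        if (input[i] < '0' || input[i] > '9')
            return -1;
        char operand = input[i++];
        int n = snprintf(next, sizeof next, "{%s, %c, %c}", acc, op, operand);
        if (n < 0 || (size_t)n >= sizeof next)
            return -1;
        strcpy(acc, next);
    }

    if (strlen(acc) >= out_size)
        return -1;
    strcpy(out, acc);
    return 0;
}
```

Calling `fold_left("1+2+3", buf, sizeof buf)` leaves `{{1, +, 2}, +, 3}` in `buf`. A right-recursive rule would instead nest the tail, giving `{1, +, {2, +, 3}}`.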
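
The UTF-8 BOM item above amounts to checking whether the source starts with the three bytes 0xEF 0xBB 0xBF. A minimal sketch of that check follows. `utf8_bom_length` is a hypothetical name, and how the result would feed into P4_LoadSource or a pseudo slice is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Peppa PEG API.
 * Returns the number of leading bytes to skip: 3 if the buffer starts
 * with the UTF-8 BOM (0xEF 0xBB 0xBF), otherwise 0. */
size_t utf8_bom_length(const char *src, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    assert(src != NULL);

    /* Compare raw bytes; the length guard avoids reading past short input. */
    if (len >= sizeof bom && memcmp(src, bom, sizeof bom) == 0)
        return sizeof bom;
    return 0;
}
```

A caller could start matching at `src + utf8_bom_length(src, len)` so that grammar rules never see the mark.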