This is an experimental project to compile Semgrep to JavaScript. It uses js_of_ocaml (a.k.a. jsoo) to compile the OCaml code, and emscripten to compile the C code (the parsers generated by tree-sitter, plus libyaml and libpcre) to WASM.
Make sure to have emscripten and nodejs installed via your favorite package manager, in addition to the normal Semgrep deps.
First, run one of the following:
$ make build-semgrep-jsoo # production release
$ make build-semgrep-jsoo-debug # development release w/ debug symbols and other things
You should get a file of a few MB at _build/default/js/engine/Main.bc.js.
Building with dune build js --profile=release (instead of the default --profile=dev) has a significant impact on the size of the generated JS file. You can go from 110MB to 16MB!
See https://discuss.ocaml.org/t/tutorial-full-stack-web-dev-in-ocaml-w-dream-bonsai-and-graphql/9963/8?u=aryx for more information
After you've built all the OCaml, you must build the WebAssembly modules and package everything together. The easiest way to do this is by running make in this folder.
Just type make build to build and then make test to run some regression tests.
You can then load this file in your browser and test it here.
You can also load this file in nodejs and test it with:
js % node
Welcome to Node.js v16.19.0.
Type ".help" for more information.
> const engine = await require("./engine/dist").EngineFactory();
undefined
> const lua = await require("./languages/lua/dist").ParserFactory();
undefined
> engine.addParser(lua);
undefined
> engine.execute("lua", "rules.json", __dirname, ["test.lua"])
'{"matches":[{"rule_id":"test","location":{"path":"test.lua","start":{"line":1,"col":1,"offset":0},"end":{"line":1,"col":10,"offset":9}},"extra":{"message":"test","metavars":{"$X":{"start":{"line":1,"col":7,"offset":6},"end":{"line":1,"col":9,"offset":8},"abstract_content":"42"}},"engine_kind":"OSS"}}],"errors":[],"stats":{"okfiles":1,"errorfiles":0},"rules_by_engine":[["test","OSS"]],"engine_requested":"OSS"}'
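The value returned by engine.execute is a JSON string, so a script would usually parse it before doing anything with it. A minimal sketch, assuming the output keeps the shape shown in the session above (summarizeMatches is a hypothetical helper, not part of the engine's API):

```javascript
// Hypothetical helper: turn the JSON string returned by engine.execute
// into one human-readable line per match.
function summarizeMatches(resultJson) {
  const result = JSON.parse(resultJson);
  return result.matches.map((m) => {
    const { path, start, end } = m.location;
    return `${path}:${start.line}:${start.col}-${end.line}:${end.col} ${m.rule_id}: ${m.extra.message}`;
  });
}

// Sample copied from the session above, trimmed to the fields used here.
const sample = JSON.stringify({
  matches: [
    {
      rule_id: "test",
      location: {
        path: "test.lua",
        start: { line: 1, col: 1, offset: 0 },
        end: { line: 1, col: 10, offset: 9 },
      },
      extra: { message: "test" },
    },
  ],
  errors: [],
});

console.log(summarizeMatches(sample).join("\n"));
// prints "test.lua:1:1-1:10 test: test"
```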
Turbo mode related code. Called "engine", since it is just the Semgrep engine w/out any parser.
LSP.js related code, specifically code to make it a separate executable.
A way of running a subset of the existing Semgrep test suite, but in JS land.
Code to compile parsers with emcc, and also to build them as separate js_of_ocaml files so turbo mode isn't a huge file.
Tree sitter, PCRE and YAML C bindings. More emcc stuff.
OCaml modules shared between a bunch of different folders.
The way we build this stuff is weird, and split into a few stages. Since Semgrep is made up of OCaml and C code (for tree sitter, PCRE, YAML), we can't just transpile the OCaml code to JS and use it as is. Additionally, for turbo mode, we separate the parsers from the core Semgrep engine to reduce file size, so we must also hook those up somehow. Finally, for the language server, we need to wrap it in JS so IO works properly.
This is kicked off by make build-semgrep-jsoo(-debug), and uses js_of_ocaml to transpile OCaml -> JS. Note that any JS files referenced in the (js_of_ocaml (javascript_files)) field are pulled in at this point too.
These files are kind of useful, but as mentioned before, they don't include any parsers or other C libraries, and we need to use those from JavaScript somehow too.
When you run make in this folder, it kicks off this stage. It will compile the PCRE, YAML, and tree sitter C code to WebAssembly. Note that the parsers are actually compiled to intermediate objects instead of a final js/wasm file, so that LSP.js can use them too when it compiles all of its code into a single wasm file instead of separating out the parsers. This is done because the language server will always want all parsers enabled, whereas turbo mode only needs a single parser loaded at one time.
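Because turbo mode only needs the parser for the language being scanned, a caller can defer loading each parser until it is first requested. A rough sketch of that idea, built only on the ParserFactory and addParser calls from the nodejs session earlier; makeLazyParsers and the loaders map are hypothetical names, not something this repository provides:

```javascript
// Hypothetical sketch: register a parser with the engine the first time its
// language is requested, and reuse it afterwards.
// `loaders` maps a language name to an async factory, for example
//   { lua: () => require("./languages/lua/dist").ParserFactory() }
function makeLazyParsers(engine, loaders) {
  const loaded = new Map(); // language -> promise of the registered parser
  return async function ensureParser(lang) {
    if (!loaded.has(lang)) {
      const loader = loaders[lang];
      if (!loader) throw new Error(`no parser available for ${lang}`);
      // Store the promise right away so concurrent callers share one load.
      loaded.set(
        lang,
        loader().then((parser) => {
          engine.addParser(parser);
          return parser;
        })
      );
    }
    return loaded.get(lang);
  };
}
```

With this in place, a caller would await ensureParser("lua") before calling engine.execute("lua", ...). The language server has no use for this, since it links every parser into one wasm file up front.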
Now we have a bunch of javascript files, and we need to link them together. So at this point we just have another layer of javascript files to connect everything.
For turbo mode and the parsers, we finally use esbuild to bundle all the JS files together, instead of having a ton of files to distribute.
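For illustration, a bundling step of this kind can be driven from esbuild's JavaScript API. This is only a sketch under assumptions: the entry and output paths are placeholders, the option values are a guess at a reasonable Node setup, and the project's actual esbuild invocation may look different.

```javascript
// Build options for bundling one package into a single file.
// The paths passed in are placeholders, not the project's real layout.
function bundleOptions(entry, outfile) {
  return {
    entryPoints: [entry],
    bundle: true, // inline every imported JS file into the output
    platform: "node", // leave Node built-ins such as fs and path external
    format: "cjs", // so the result can be loaded with require()
    outfile,
  };
}

// Needs the esbuild package to be installed. It is required lazily so that
// defining these functions does not depend on it.
async function bundle(entry, outfile) {
  const esbuild = require("esbuild");
  return esbuild.build(bundleOptions(entry, outfile));
}
```

Calling bundle("src/index.js", "dist/index.js") (both paths hypothetical) would then write one distributable file per package.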