
Semgrep JS

This is an experimental project to compile Semgrep to JavaScript, using js_of_ocaml (a.k.a. jsoo) to compile the OCaml code, and emscripten to compile the C code (the tree-sitter parsers, libyaml, and libpcre) to WASM.

Dependencies

Make sure emscripten and nodejs are installed via your favorite package manager, in addition to the normal Semgrep dependencies.

Building

First run

$ make build-semgrep-jsoo # production release
$ make build-semgrep-jsoo-debug # development release w/ debug symbols and other things

You should get a file of a few MB at _build/default/js/engine/Main.bc.js.

Building with dune build js --profile=release (instead of the default --profile=dev) significantly reduces the size of the generated JS file: it can go from 110MB to 16MB. See https://discuss.ocaml.org/t/tutorial-full-stack-web-dev-in-ocaml-w-dream-bonsai-and-graphql/9963/8?u=aryx for more information.

After you've built all the OCaml, you must build the WebAssembly modules and package everything together. The easiest way to do this is by running make in this folder.

Just type make build to build and then make test to run some regression tests.

You can then load this file in your browser and test it here.

You can also load this file in nodejs and test it with:

js % node
Welcome to Node.js v16.19.0.
Type ".help" for more information.
> const engine = await require("./engine/dist").EngineFactory();
undefined
> const lua = await require("./languages/lua/dist").ParserFactory();
undefined
> engine.addParser(lua);
undefined
> engine.execute("lua", "rules.json", __dirname, ["test.lua"])
'{"matches":[{"rule_id":"test","location":{"path":"test.lua","start":{"line":1,"col":1,"offset":0},"end":{"line":1,"col":10,"offset":9}},"extra":{"message":"test","metavars":{"$X":{"start":{"line":1,"col":7,"offset":6},"end":{"line":1,"col":9,"offset":8},"abstract_content":"42"}},"engine_kind":"OSS"}}],"errors":[],"stats":{"okfiles":1,"errorfiles":0},"rules_by_engine":[["test","OSS"]],"engine_requested":"OSS"}'
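The string returned by engine.execute above is JSON, so it can be parsed and summarized. The following is a minimal sketch that uses the exact output from the session as sample data; summarizeMatches is a hypothetical helper name, not part of the Semgrep JS API.

```javascript
// Sample output copied verbatim from the Node.js session above.
const sampleOutput = '{"matches":[{"rule_id":"test","location":{"path":"test.lua","start":{"line":1,"col":1,"offset":0},"end":{"line":1,"col":10,"offset":9}},"extra":{"message":"test","metavars":{"$X":{"start":{"line":1,"col":7,"offset":6},"end":{"line":1,"col":9,"offset":8},"abstract_content":"42"}},"engine_kind":"OSS"}}],"errors":[],"stats":{"okfiles":1,"errorfiles":0},"rules_by_engine":[["test","OSS"]],"engine_requested":"OSS"}';

// Hypothetical helper: turn the JSON string into one readable line per match.
function summarizeMatches(resultJson) {
  const result = JSON.parse(resultJson);
  return result.matches.map((m) => {
    const { path, start, end } = m.location;
    return `${path}:${start.line}:${start.col}-${end.line}:${end.col} [${m.rule_id}] ${m.extra.message}`;
  });
}

console.log(summarizeMatches(sampleOutput).join("\n"));
// prints "test.lua:1:1-1:10 [test] test"
```

In a real script, the argument would be the return value of engine.execute rather than the hard-coded sample.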

File Structure

engine

Turbo mode related code. Called "engine" since it is just the Semgrep engine without any parsers.

language_server

LSP.js related code, specifically code to make it a separate executable.

tests

A way of running a subset of the existing Semgrep test suite, but in JS land.

languages

Code to compile the parsers with emcc, and also to build them as separate js_of_ocaml files so that turbo mode isn't one huge file.

libpcre, libyaml, tree-sitter

tree-sitter, PCRE, and YAML C bindings, plus more emcc build code.

shared, node_shared

OCaml modules shared between several of the other folders.

Build structure

The build is unusual and split into a few stages. Since Semgrep is made up of OCaml and C code (for tree-sitter, PCRE, and YAML), we can't just transpile the OCaml code to JS and use it as is. Additionally, for turbo mode, we separate the parsers from the core Semgrep engine to reduce file size, so we must also hook those up somehow. Finally, for the language server, we need to wrap it in JS so IO works properly.

OCaml stage

This is kicked off by make build-semgrep-jsoo(-debug), and uses js_of_ocaml to transpile OCaml to JS. Note that any JS files referenced in (js_of_ocaml (javascript_files)) are pulled in at this point too.

These files are useful on their own, but as mentioned above, they contain no parsers or other C libraries, and we need those in JavaScript too.

EMCC stage

Running make in this folder kicks off this stage. It compiles the PCRE, YAML, and tree-sitter C code to WebAssembly. Note that the parsers are compiled to intermediate objects rather than final js/wasm files, so that LSP.js can also use them when it compiles all of its code into a single wasm file instead of separating out the parsers. This is because the language server always wants all parsers enabled, whereas turbo mode only needs a single parser loaded at a time.
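Because turbo mode only needs one parser at a time, a caller can load parsers on demand and reuse them. Below is a minimal sketch of such a cache. createParserCache and loadFactory are hypothetical names, and the real factory is replaced with a stand-in so the example runs without the build output. In this repo the loader might look like (lang) => require(`./languages/${lang}/dist`).ParserFactory(), which is an assumption generalized from the lua example in the Node.js session.

```javascript
// Hypothetical helper: keeps one parser promise per language so each
// parser module is loaded at most once.
function createParserCache(loadFactory) {
  const cache = new Map();
  return {
    // Return the cached promise, loading the parser on first request.
    get(lang) {
      if (!cache.has(lang)) {
        cache.set(lang, loadFactory(lang));
      }
      return cache.get(lang);
    },
    // List the languages requested so far.
    loaded() {
      return [...cache.keys()];
    },
  };
}

// Stand-in for the real parser factory (assumption): records each load.
const requested = [];
const parsers = createParserCache((lang) => {
  requested.push(lang);
  return Promise.resolve({ lang });
});

parsers.get("lua");
parsers.get("lua"); // served from the cache, no second load
console.log(requested);
```

A language server style caller would instead request every language up front, which is why it bundles all parsers into one file.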

Linking

Now we have a bunch of JavaScript files that need to be linked together. At this point we just add another layer of JavaScript files to connect everything.

ESBuild

For turbo mode and the parsers, we finally use esbuild to bundle all the JS files together, so we don't have to distribute a large number of separate files.
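As a rough illustration of this bundling step, the sketch below builds an options object for esbuild's JavaScript API. The entry and output paths and the helper names are hypothetical, and the actual build in this repo may use different options or invoke esbuild from the command line instead.

```javascript
// Hypothetical helper: options for bundling one entry point into a single file.
function bundleOptions(entry, outfile) {
  return {
    entryPoints: [entry], // file where bundling starts
    bundle: true,         // inline all imported JS files
    platform: "node",     // target Node.js rather than the browser
    format: "cjs",        // emit CommonJS so require() works
    outfile,              // single output file
  };
}

// Not called here; invoke it from a build script once esbuild is installed.
async function bundle(entry, outfile) {
  const esbuild = require("esbuild");
  await esbuild.build(bundleOptions(entry, outfile));
}

// Paths are assumptions for illustration only.
console.log(bundleOptions("engine/src/index.js", "engine/dist/index.js"));
```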