semgrep is a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search. Quickly write rules so you can code with confidence.
Try it now: https://semgrep.live
Language support:
Python | JavaScript | Go | Java | C | TypeScript | PHP |
---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | Coming... | Coming... |
Example patterns:
Pattern | Matches |
---|---|
$X == $X | if (node.id == node.id): ... |
requests.get(..., verify=False, ...) | requests.get(url, timeout=3, verify=False) |
os.system(...) | from os import system; system('echo sgrep') |
$ELEMENT.innerHTML | el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>"; |
$TOKEN.SignedString([]byte("...")) | ss, err := token.SignedString([]byte("HARDCODED KEY")) |
→ see more example patterns in the semgrep-rules repository
Install semgrep with Docker:
docker pull returntocorp/sgrep
On OSX, binaries are available via Homebrew:
brew install returntocorp/semgrep/semgrep
Here is a simple Python example, test.py. We want to retrieve an object by ID:
    def get_node(node_id, nodes):
        for node in nodes:
            if node.id == node.id:  # Oops, supposed to be 'node_id'
                return node
        return None
This is a bug. Let's use semgrep to find bugs like it, using a simple search pattern: $X == $X. It will find all places in our code where the left- and right-hand sides of a comparison are the same expression:
$ docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --lang python --pattern '$X == $X' test.py
test.py
rule:python.deadcode.eqeq-is-bad: useless comparison operation `node.id == node.id` or `node.id != node.id`.
3: if node.id == node.id: # Oops, supposed to be 'node_id'
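To make the idea behind $X == $X concrete, here is a minimal sketch that approximates it with Python's standard ast module. This is not how semgrep is implemented; the function name find_self_comparisons is hypothetical, and the sketch assumes Python 3.9+ (for ast.unparse). It treats two operands as "the same expression" when their syntax trees are identical.

```python
import ast


def find_self_comparisons(source: str) -> list[tuple[int, str]]:
    """Return (line, expression) for every `X == X` comparison in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only consider simple two-operand `==` comparisons.
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
        ):
            left, right = node.left, node.comparators[0]
            # ast.dump omits line/column info by default, so identical
            # expressions produce identical dumps.
            if ast.dump(left) == ast.dump(right):
                hits.append((node.lineno, ast.unparse(node)))
    return sorted(hits)


SAMPLE = '''\
def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None
'''

print(find_self_comparisons(SAMPLE))  # [(3, 'node.id == node.id')]
```

Like the semgrep run above, the sketch reports line 3. Unlike semgrep, it only understands Python and only this one pattern.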
You can use rules developed by r2c to search for issues in your codebase:
cd /path/to/code
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --config r2c
You can also create your own rules:
cd /path/to/code
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --generate-config
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep
For simple patterns use the --lang and --pattern flags. This mode of operation is useful for quickly iterating on a pattern on a single file or folder:
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --lang javascript --pattern 'eval(...)' path/to/file.js
To see all the options available for fine-tuning your search, use the --help flag:
docker run --rm returntocorp/sgrep --help
For advanced configuration use the --config flag. This flag automatically handles several kinds of configuration input:
--config <file|folder|yaml_url|tarball_url|registry_name>
In the absence of this flag, a default configuration is loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml.
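The default lookup described above can be sketched in a few lines of Python. This is only an illustration of the two locations named in the text: the function name find_default_configs is hypothetical, and the precedence (a single .sgrep.yml wins over the .sgrep/ folder) is an assumption, not documented behavior.

```python
from pathlib import Path


def find_default_configs(root) -> list[Path]:
    """Return the config files a default lookup would find under `root`."""
    root = Path(root)
    single = root / ".sgrep.yml"
    if single.is_file():
        # Assumption: the single-file config takes precedence.
        return [single]
    folder = root / ".sgrep"
    if folder.is_dir():
        # `**/*.yml` matches .yml files at any depth below .sgrep/.
        return sorted(folder.glob("**/*.yml"))
    return []


print(find_default_configs("."))
```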
semgrep patterns make use of two primary features:
- Metavariables like $X, $WIDGET, or $USERS. Metavariable names can only contain uppercase characters; names like $x or $SOME_VALUE are invalid. Metavariables are used to track a variable across a specific code scope.
- The ... (ellipsis) operator. The ellipsis operator abstracts away sequences so you don't have to sweat the details of a particular code pattern.
For example, $FILE = open(...) will find all occurrences in your code where the result of an open() call is assigned to a variable.
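As a rough illustration of what $FILE = open(...) selects, the sketch below approximates it with Python's ast module. The function name find_open_assignments is hypothetical, and the sketch is narrower than a real metavariable: it only accepts a plain variable name on the left-hand side. The ellipsis corresponds to accepting any arguments to open.

```python
import ast


def find_open_assignments(source: str) -> list[tuple[int, str]]:
    """Return (line, variable) for each `name = open(...)` assignment."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target, value = node.targets[0], node.value
        # The right-hand side must be a direct call to `open`, with any arguments.
        is_open_call = (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)
            and value.func.id == "open"
        )
        if isinstance(target, ast.Name) and is_open_call:
            hits.append((node.lineno, target.id))
    return sorted(hits)


SAMPLE = '''\
log = open("app.log", "a")
data = open(path).read()
cfg = open(cfg_path, encoding="utf-8")
'''

print(find_open_assignments(SAMPLE))  # [(1, 'log'), (3, 'cfg')]
```

Line 2 is not reported because the value assigned is the result of .read(), not of open() itself.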
You can also construct rules by composing multiple patterns together.
Let's consider an example:
rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR
This rule looks for files that are opened but never closed. It accomplishes this by looking for the open(...) pattern and not a following close() pattern. The $FILE metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows for any arguments to be passed to open and any sequence of code statements in between the open and close calls. We don't care how open is called or what happens up to a close call; we just need to make sure close is called.
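The logic of this rule can be approximated in plain Python to show how the two patterns combine. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, it only looks for a name.close() statement later in the same block as the open, and it is not semgrep's matching algorithm.

```python
import ast


def _opened_name(stmt):
    """Return the variable name if `stmt` is `name = open(...)`, else None."""
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Name)
        and stmt.value.func.id == "open"
    ):
        return stmt.targets[0].id
    return None


def _closes(stmt, name):
    """True if `stmt` is the bare statement `name.close()`."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Attribute)
        and stmt.value.func.attr == "close"
        and isinstance(stmt.value.func.value, ast.Name)
        and stmt.value.func.value.id == name
    )


def open_never_closed(source: str) -> list[tuple[int, str]]:
    """Report `name = open(...)` with no later `name.close()` in the same block."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is an expression, not a block
            for i, stmt in enumerate(block):
                name = _opened_name(stmt)
                # Same name must be closed later: this mirrors the shared $FILE.
                if name and not any(_closes(s, name) for s in block[i + 1:]):
                    findings.append((stmt.lineno, name))
    return sorted(findings)


SAMPLE = '''\
def leaky(path):
    f = open(path)
    return f.read()

def tidy(path):
    g = open(path)
    data = g.read()
    g.close()
    return data
'''

print(open_never_closed(SAMPLE))  # [(2, 'f')]
```

Only leaky is reported: tidy has a matching g.close() after the open, which is the situation pattern-not-inside excludes.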
For more information on rule fields like patterns and pattern-not-inside, see the configuration documentation.
Equivalences are another key concept in semgrep. semgrep automatically searches for code that is semantically equivalent to a pattern. For example, the pattern below also matches the code that follows it, because the aliased import refers to the same function:
subprocess.Popen(...)
from subprocess import Popen as sub_popen
result = sub_popen("ls")
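One way to picture this kind of equivalence is to resolve import aliases back to fully qualified names before comparing. The sketch below does that with Python's ast module; it is an illustrative approximation, not semgrep's implementation, and the function name resolve_calls is hypothetical.

```python
import ast


def resolve_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, fully qualified name) for each call through an imported name."""
    tree = ast.parse(source)
    aliases = {}  # local name -> fully qualified name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name  # import subprocess as sp
                else:
                    top = a.name.split(".")[0]
                    aliases[top] = top  # import subprocess
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                # from subprocess import Popen as sub_popen
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"

    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            parts, func = [], node.func
            # Unwind attribute access such as `sp.Popen` down to its base name.
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            if isinstance(func, ast.Name) and func.id in aliases:
                parts.append(aliases[func.id])
                calls.append((node.lineno, ".".join(reversed(parts))))
    return sorted(calls)


SAMPLE = '''\
import subprocess
from subprocess import Popen as sub_popen
import subprocess as sp

a = subprocess.Popen("ls")
b = sub_popen("ls")
c = sp.Popen("ls")
'''

for line, name in resolve_calls(SAMPLE):
    print(line, name)
```

All three calls resolve to subprocess.Popen, which is why a single pattern written against the qualified name can cover each spelling.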
For a full list of semgrep feature support by language, see the language matrix.
As mentioned above, you may also specify a registry name as configuration. r2c provides a registry of configuration files. These rules have been tuned on thousands of repositories using our analysis platform.
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --config r2c
- r2c semgrep meetup slides
- Simple configuration documentation
- Advanced configuration documentation
- Integrations
- Development
- Bug reports
semgrep is LGPL-licensed, feel free to help out: CONTRIBUTING.
semgrep is a frontend to a larger program analysis library named pfff, where it was named sgrep. pfff began and was open-sourced at Facebook, but it is now archived, and its primary maintainer works at r2c.