semgrep

semgrep is a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search. Quickly write rules so you can code with confidence.

Try it now: https://semgrep.live

Overview

Language support:

Supported: Python, JavaScript, Go, Java, C
Coming: TypeScript, PHP

Example patterns:

| Pattern | Matches |
|---|---|
| $X == $X | if (node.id == node.id): ... |
| requests.get(..., verify=False, ...) | requests.get(url, timeout=3, verify=False) |
| os.system(...) | from os import system; system('echo sgrep') |
| $ELEMENT.innerHTML | el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>"; |
| $TOKEN.SignedString([]byte("...")) | ss, err := token.SignedString([]byte("HARDCODED KEY")) |

See more example patterns in the semgrep-rules repository.

Installation

Install semgrep with Docker:

docker pull returntocorp/sgrep

On macOS, binaries are available via Homebrew:

brew install returntocorp/semgrep/semgrep

Usage

Example Usage

Here is a simple Python example, test.py. We want to retrieve an object by ID:

def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None

This is a bug. Let's use semgrep to find bugs like it, using a simple search pattern: $X == $X. It will find all places in our code where the left- and right-hand sides of a comparison are the same expression:

$ docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --lang python --pattern '$X == $X' test.py
test.py
rule:python.deadcode.eqeq-is-bad: useless comparison operation `node.id == node.id` or `node.id != node.id`.
3:        if node.id == node.id:  # Oops, supposed to be 'node_id'
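To make the idea behind $X == $X concrete, here is a minimal sketch built on Python's standard ast module. It is an illustration only, not how semgrep is implemented; the function name find_self_comparisons is hypothetical, and the sketch handles only simple two-operand == comparisons.

```python
import ast


def find_self_comparisons(source):
    """Return (line, text) for every `a == a` comparison in source.

    Illustration only: mimics what the pattern `$X == $X` looks for,
    not how semgrep itself matches code.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only consider simple two-operand `==` comparisons.
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
        ):
            left, right = node.left, node.comparators[0]
            # Treat operands as "the same expression" when their ASTs dump identically.
            if ast.dump(left) == ast.dump(right):
                hits.append((node.lineno, ast.unparse(node)))  # needs Python 3.9+
    return hits


TEST_PY = '''\
def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None
'''

print(find_self_comparisons(TEST_PY))
```

Comparing AST dumps rather than source text means whitespace differences between the two operands do not matter, which is the "syntactical" half of what the pattern expresses.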

r2c-developed Rules

You can use rules developed by r2c to search for issues in your codebase:

cd /path/to/code
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --config r2c

Custom Rules

You can also create your own rules:

cd /path/to/code
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --generate-config
docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep

Configuration

For simple patterns, use the --lang and --pattern flags. This mode is useful for quickly iterating on a pattern against a single file or folder:

docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --lang javascript --pattern 'eval(...)' path/to/file.js

To see every option for fine-tuning a search, pass the --help flag:

docker run --rm returntocorp/sgrep --help

Configuration Files

For advanced configuration, use the --config flag. This flag accepts several kinds of configuration source:

  • --config <file|folder|yaml_url|tarball_url|registry_name>

In the absence of this flag, a default configuration is loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml.
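The default lookup described above might be sketched as follows. The precedence (the single file first, then the directory) and the function name default_config_files are assumptions made for illustration; they are not taken from semgrep's source.

```python
import tempfile
from pathlib import Path


def default_config_files(root="."):
    """Return the config files a default lookup would find under root.

    Assumption: `.sgrep.yml` wins if present; otherwise every file
    matching `.sgrep/**/*.yml` is collected.
    """
    root = Path(root)
    single = root / ".sgrep.yml"
    if single.is_file():
        return [single]
    config_dir = root / ".sgrep"
    if not config_dir.is_dir():
        return []
    # `**` matches the directory itself and any nested subdirectory.
    return sorted(config_dir.glob("**/*.yml"))


# Usage: a throwaway project with two rule files under .sgrep/
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / ".sgrep" / "python"
    nested.mkdir(parents=True)
    (nested / "eqeq.yml").write_text("rules: []\n")
    (Path(tmp) / ".sgrep" / "top.yml").write_text("rules: []\n")
    found = [p.relative_to(tmp).as_posix() for p in default_config_files(tmp)]

print(found)
```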

Pattern Features

semgrep patterns make use of two primary features:

  • Metavariables like $X, $WIDGET, or $USERS. Metavariable names can only contain uppercase characters; names like $x or $SOME_VALUE are invalid. A metavariable tracks a variable across a specific code scope, so every use of the same metavariable in a pattern must match the same thing.
  • The ... (ellipsis) operator. The ellipsis operator abstracts away sequences so you don't have to sweat the details of a particular code pattern.

For example,

$FILE = open(...)

will find all occurrences in your code where the result of an open() call is assigned to a variable.
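As a rough illustration of what the metavariable and the ellipsis each contribute, the sketch below uses Python's ast module to find the same kind of assignment. It is a simplification and not semgrep's matching engine: the name find_open_assignments is hypothetical, and only plain variable names are accepted on the left-hand side.

```python
import ast


def find_open_assignments(source):
    """Return (line, variable) for assignments shaped like `$FILE = open(...)`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)  # plays the role of $FILE
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "open"           # arguments are ignored, like `...`
        ):
            hits.append((node.lineno, node.targets[0].id))
    return sorted(hits)


SAMPLE = "\n".join([
    'log = open("app.log", "a")',              # matches: any arguments are fine
    "data = open(path).read()",                # no match: the value is .read(), not open()
    'cfg = open(cfg_path, encoding="utf-8")',  # matches
])

print(find_open_assignments(SAMPLE))
```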

Composing Patterns

You can also construct rules by composing multiple patterns together.

Let's consider an example:

rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR

This rule looks for files that are opened but never closed. It does so by matching the $FILE = open(...) assignment and then excluding any match that sits inside a span where a $FILE.close() call follows. The $FILE metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows any arguments to be passed to open and any sequence of statements between the open and close calls. We don't care how open is called or what happens before close; we only need to make sure close is called.
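A small target file can make the rule's behaviour easier to see. The file below is a hypothetical example (the names leaky and tidy are not from the semgrep project), and the comments mark what we would expect the rule to report, assuming pattern-not-inside behaves as described above.

```python
import os
import tempfile


def leaky(path):
    f = open(path)  # expected to be reported: no f.close() follows
    return f.read()


def tidy(path):
    f = open(path)  # expected to be skipped: f.close() follows below
    data = f.read()
    f.close()
    return data


# Quick runtime check that both helpers read the same content.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("hello")
same = leaky(sample_path) == tidy(sample_path)
os.remove(sample_path)

print(same)
```

Both functions return the same data at runtime, which is why this class of bug tends to slip through tests and is a good fit for a pattern-based check.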

For more information on rule fields like patterns and pattern-not-inside see the configuration documentation.

Equivalences

Equivalences are another key concept in semgrep. semgrep automatically searches for code that is semantically equivalent to a pattern. For example, the pattern

subprocess.Popen(...)

also matches a call made through an import alias:

from subprocess import Popen as sub_popen
result = sub_popen("ls")
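One way to picture this kind of equivalence is as import-alias resolution: map each local name back to its fully qualified name before comparing. The sketch below does that with Python's ast module. It is an assumption-laden illustration rather than semgrep's algorithm; find_calls is a hypothetical name, and scoping is ignored (all imports in the file are treated as global).

```python
import ast


def find_calls(source, qualified_name):
    """Return line numbers of calls that resolve to qualified_name."""
    tree = ast.parse(source)

    # Map each locally bound name to the fully qualified name it stands for.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name
                else:
                    top = a.name.split(".")[0]  # `import a.b` binds `a`
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"

    def resolve(expr):
        """Turn a Name/Attribute chain into a dotted, alias-free name."""
        if isinstance(expr, ast.Name):
            return aliases.get(expr.id, expr.id)
        if isinstance(expr, ast.Attribute):
            base = resolve(expr.value)
            return f"{base}.{expr.attr}" if base else None
        return None

    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and resolve(node.func) == qualified_name
    )


SAMPLE = "\n".join([
    "import subprocess",
    "import subprocess as sp",
    "from subprocess import Popen as sub_popen",
    "subprocess.Popen('ls')",
    "sp.Popen('ls')",
    "result = sub_popen('ls')",
    "print('ls')",
])

print(find_calls(SAMPLE, "subprocess.Popen"))
```

The sample is only parsed, never executed, so no subprocess is actually started.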

For a full list of semgrep feature support by language see the language matrix.

Registry

As mentioned above, you may also specify a registry name as configuration. r2c provides a registry of configuration files. These rules have been tuned on thousands of repositories using our analysis platform.

docker run --rm -v "${PWD}:/home/repo" returntocorp/sgrep --config r2c

Resources

Contribution

semgrep is LGPL-licensed. Feel free to help out: see CONTRIBUTING.

semgrep is a frontend to a larger program analysis library named pfff, where it was named sgrep. pfff was started and open-sourced at Facebook; it is now archived, and its primary maintainer works at r2c.