forked from semgrep/semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.

QPC-github/semgrep

sgrep

r2c community slack

sgrep.live: Try it now

sgrep, for syntactical (and occasionally semantic) grep, is a tool that helps find bugs by specifying code patterns in a familiar syntax. The idea is to mix the convenience of grep with the correctness and precision of a compiler frontend.

Quick Examples

pattern                      will match code like
$X == $X                     if (node.id == node.id): ...
foo(kwd1=1, kwd2=2, ...)     foo(kwd2=2, kwd1=1, kwd3=3)
subprocess.open(...)         import subprocess as s; s.open(['foo'])

See more examples in the sgrep-rules registry.

Supported Languages

Supported: JavaScript, Python, Go, Java, C
Coming: Ruby, Scala

Meetups

Want to learn more about sgrep? Check out these slides from the r2c February meetup.

Installation

Too lazy to install? Try out sgrep.live for instant access to the full power of sgrep.

Docker

sgrep is packaged as a Docker image, so installing it is as easy as installing Docker.

Quickstart

docker pull returntocorp/sgrep

cd /path/to/repo
# generate a template config file
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --generate-config

# look for findings
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep

Usage

Rule Development

To iterate rapidly on a single pattern, you can test it against a single file or folder. For example:

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep -l python -e '$X == $X' path/to/file.py

Here, sgrep searches the target for the pattern $X == $X (a pointless self-comparison) and prints the results to stdout. This also works for directories; any file that fails to parse is skipped. You can specify the language of the pattern with, for example, --lang javascript.
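To make the idea behind $X == $X concrete, here is a minimal sketch in Python using only the standard ast module. It is not how sgrep is implemented; the function name find_self_comparisons is hypothetical, and the sketch only handles a single == between two structurally identical expressions in Python source.

```python
import ast


def find_self_comparisons(source):
    """Return the line numbers of `expr == expr` comparisons in Python source.

    Illustrative approximation of the `$X == $X` pattern: both sides of a
    single `==` must have the same AST structure.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump omits line/column attributes by default, so equal
            # dumps mean the two operands are structurally identical.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    code = "if node.id == node.id:\n    pass\nif a == b:\n    pass\n"
    print(find_self_comparisons(code))
```

Comparing syntax trees rather than text is what lets a check like this ignore whitespace and formatting differences between the two operands.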

To see more options:

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --help

Config Files

Format

See config.md for example configuration files and details on the syntax.

sgrep Registry

r2c provides a registry of config files tuned using our analysis platform on thousands of repositories. To use:

sgrep --config r2c

Default

Default configs are loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml. They can be overridden with --config <file|folder|yaml_url|tarball_url|registry_name>.
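The default lookup described above can be pictured with a short Python sketch. This is an assumption-laden illustration, not sgrep's loader: the function name default_config_files is hypothetical, and since the text does not say how .sgrep.yml and the .sgrep/ directory interact when both exist, the sketch simply collects both.

```python
from pathlib import Path


def default_config_files(repo_root):
    """Collect the default config file locations under repo_root.

    Assumption: when both `.sgrep.yml` and `.sgrep/**/*.yml` exist, all of
    them are returned; the real precedence rules may differ.
    """
    root = Path(repo_root)
    found = []

    single = root / ".sgrep.yml"
    if single.is_file():
        found.append(single)

    config_dir = root / ".sgrep"
    if config_dir.is_dir():
        # "**/*.yml" matches .yml files at any depth, including directly
        # inside .sgrep/ itself.
        found.extend(sorted(p for p in config_dir.glob("**/*.yml") if p.is_file()))

    return found


if __name__ == "__main__":
    for path in default_config_files("."):
        print(path)
```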

Design

sgrep's design philosophy emphasizes simplicity and making a single pattern as expressive as possible:

  1. Use concrete code syntax: easy to learn
  2. Metavariables ($X): abstract away code
  3. '...' operator: abstract away sequences
  4. Knows about code equivalences: one pattern can match many equivalent variations on the code
  5. Less is more: abstract away additional details

Patterns

Patterns are snippets of code, with metavariables and other operators, that are parsed into an AST for the target language and then used to search code for matches. See patterns.md for full documentation.

Metavariables

$X, $FOO, and $RETURNCODE are all examples of metavariables. You can reference them later in your pattern and sgrep will ensure they match. Metavariables can only contain uppercase ASCII characters; $x and $SOME_VALUE are not valid metavariables.
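The naming rule can be expressed as a one-line check. The sketch below is only an illustration of the rule as stated here, not sgrep's own validation code; the helper name is_valid_metavariable is hypothetical, and it assumes "uppercase ASCII characters" means the letters A to Z.

```python
import re

# Assumption: a metavariable is "$" followed by one or more letters A-Z.
_METAVARIABLE = re.compile(r"\$[A-Z]+")


def is_valid_metavariable(name):
    """Return True if name follows the metavariable naming rule above."""
    return _METAVARIABLE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["$X", "$FOO", "$RETURNCODE", "$x", "$SOME_VALUE"]:
        print(candidate, is_valid_metavariable(candidate))
```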

Operators

... is the primary "match anything" operator.
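As an illustration of what ... buys you in a pattern such as foo(kwd1=1, kwd2=2, ...), the following Python sketch checks that a call supplies the required keyword arguments in any order and tolerates extra ones. It is a rough approximation rather than sgrep's matcher: call_matches is a hypothetical helper, and it only compares keyword arguments whose values are Python literals.

```python
import ast

_NOT_LITERAL = object()  # marker for values that are not plain literals


def _literal(node):
    """Evaluate a literal AST node, or return the marker if it is not one."""
    try:
        return ast.literal_eval(node)
    except (ValueError, TypeError, SyntaxError):
        return _NOT_LITERAL


def call_matches(call_src, func_name, required):
    """Check a single call expression against required keyword arguments.

    Keyword order is ignored and additional keywords are allowed, which is
    the role the `...` operator plays in the pattern.
    """
    node = ast.parse(call_src, mode="eval").body
    if not (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    ):
        return False
    # kw.arg is None for `**kwargs` expansions, which are skipped here.
    actual = {kw.arg: _literal(kw.value) for kw in node.keywords if kw.arg}
    return all(key in actual and actual[key] == value
               for key, value in required.items())


if __name__ == "__main__":
    print(call_matches("foo(kwd2=2, kwd1=1, kwd3=3)", "foo", {"kwd1": 1, "kwd2": 2}))
```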

Equivalences

sgrep automatically searches for code that is semantically equivalent. For example, a pattern for

subprocess.open(...)

will match

from subprocess import open as sub_open
result = sub_open("ls")

and other semantically equivalent configurations.
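One way to picture this kind of equivalence is import-alias resolution. The Python sketch below, built on the standard ast module, maps local names back to their fully qualified names before comparing them with the pattern's target. It is a simplified illustration and not sgrep's algorithm: calls_to is a hypothetical helper, it ignores scoping and reassignment, and it treats dotted imports such as import a.b naively.

```python
import ast


def calls_to(source, target):
    """Return line numbers of calls that resolve to the dotted name target."""
    tree = ast.parse(source)

    # Map each locally bound name to the qualified name it was imported as.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    def qualified(expr):
        """Rebuild a dotted name from Name/Attribute nodes, expanding aliases."""
        if isinstance(expr, ast.Name):
            return aliases.get(expr.id, expr.id)
        if isinstance(expr, ast.Attribute):
            base = qualified(expr.value)
            return f"{base}.{expr.attr}" if base else None
        return None

    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and qualified(node.func) == target
    )


if __name__ == "__main__":
    code = 'from subprocess import open as sub_open\nresult = sub_open("ls")\n'
    print(calls_to(code, "subprocess.open"))
```

The same helper also handles the import subprocess as s; s.open(['foo']) form, because both spellings resolve to the same qualified name.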

Integrations

See integrations.md.

Bug Reports

Reports are welcome! Please open an issue on this project.

Contributions

sgrep is LGPL-licensed and we would love your contributions. See docs/development.md.
