This document illustrates some examples and approaches for writing Semgrep rules.
- Auditing Dangerous Function Use
- Enforce Specific Use of an API
- Ensure One Function is Called Before Another
- Find All Routes in an Application
Using Semgrep to audit dangerous function calls is easy. The approach has three steps:
- Match a function call by name.
- Filter out hardcoded strings.
- Look explicitly for dangerous keyword arguments.
Let's do an example with the subprocess module in Python.
Match the function call by name. The ellipsis operator (...) abstracts away whole segments of code. Effectively, it says "I don't care about what's in here."
patterns:
- pattern: subprocess.call(...)
import subprocess
import sys
subprocess.call("echo 'hello'") # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1])) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user") # Matches here
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True) # Doesn't match here
Filter out hardcoded strings. The ellipsis operator can be used inside quotes ("...") to represent any string literal. Combining this with the pattern-not clause filters out calls whose only argument is a static string.
patterns:
- pattern-not: subprocess.call("...")
- pattern: subprocess.call(...)
import subprocess
import sys
subprocess.call("echo 'hello'") # Doesn't match here anymore!
subprocess.call("grep -R {} .".format(sys.argv[1])) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user") # Matches here
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True) # Doesn't match here
Look explicitly for dangerous keyword arguments. You may want to match only when certain keyword arguments are present. For example, when subprocess.call is passed the keyword argument shell=True, the command string is handed to the shell and Python does not escape any shell metacharacters in it. If an attacker controls part of that string, they may be able to run arbitrary shell commands.
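To make the risk concrete, here is a minimal sketch that builds the same grep command three ways. The build_commands helper and the sample input are hypothetical, introduced only for illustration, and nothing is actually executed.

```python
import shlex


def build_commands(user_input):
    """Build one grep invocation three ways (hypothetical helper for illustration)."""
    # Risky: raw input is interpolated into a string that would be run with shell=True.
    risky = "grep -R {} .".format(user_input)
    # Safer: an argument list run without a shell, so metacharacters stay literal.
    as_list = ["grep", "-R", user_input, "."]
    # If a shell is unavoidable, quote the untrusted part first.
    quoted = "grep -R {} .".format(shlex.quote(user_input))
    return risky, as_list, quoted


risky, as_list, quoted = build_commands("foo; echo INJECTED")
print(risky)    # the ';' would start a second shell command under shell=True
print(as_list)  # the whole input is a single argument to grep
print(quoted)   # the input is wrapped in single quotes for the shell
```

Passing as_list to subprocess.call without shell=True avoids shell parsing entirely, which is why the rules in this section focus on calls that set shell=True.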
Keyword arguments are written in a pattern just as they are in Python. Combined with the ellipsis operator, the following pattern matches when shell=True appears at the end of the argument list.
patterns:
- pattern-not: subprocess.call("...")
- pattern: subprocess.call(..., shell=True)
import subprocess
import sys
subprocess.call("echo 'hello'") # Doesn't match
subprocess.call("grep -R {} .".format(sys.argv[1])) # Doesn't match here anymore!
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user") # Oops! We don't match here anymore either!
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True) # Doesn't match here
Semgrep will match (..., shell=True) only when shell=True is the last argument, but that's not what we want: shell=True is dangerous regardless of where it appears in the argument list. We can handle this by placing the ellipsis operator on both sides of shell=True.
patterns:
- pattern-not: subprocess.call("...")
- pattern: subprocess.call(..., shell=True, ...)
import subprocess
import sys
subprocess.call("echo 'hello'") # Doesn't match
subprocess.call("grep -R {} .".format(sys.argv[1])) # Doesn't match
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user") # Matches here too!
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True) # Doesn't match here
Bonus: Match any subprocess function with shell=True. As you probably noticed, subprocess.run is subject to the same issue as subprocess.call. (subprocess.run was made available in Python 3.5.) We can match both subprocess.call and subprocess.run by using metavariables.
Metavariables let you match any code expression. A metavariable is written with a dollar-sign prefix followed by capital letters. In this example, we will use subprocess.$FUNC. The name can be anything; it's just like a variable in a normal language and will "hold" the expression it matches.
To learn more about metavariables, visit the primary documentation.
patterns:
- pattern-not: subprocess.$FUNC("...")
- pattern: subprocess.$FUNC(..., shell=True, ...)
import subprocess
import sys
subprocess.call("echo 'hello'") # Doesn't match
subprocess.call("grep -R {} .".format(sys.argv[1])) # Doesn't match
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user") # Matches here
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True) # Matches here too!
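As a rough way to reason about what the final pattern selects, the sketch below approximates subprocess.$FUNC(..., shell=True, ...) with Python's standard ast module. This is not how Semgrep works internally; find_shell_true_calls and SAMPLE are hypothetical names, and the sketch only recognizes calls written literally as subprocess.<name>(...). The pattern-not clause is left out because a call with a shell=True keyword is never a lone string-literal call.

```python
import ast


def find_shell_true_calls(source):
    """Return (line, function name) for each subprocess.<func>(..., shell=True, ...) call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plays the role of subprocess.$FUNC: any attribute of the bare name `subprocess`.
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "subprocess"):
            continue
        # shell=True may appear anywhere among the keyword arguments.
        for kw in node.keywords:
            if (kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                hits.append((node.lineno, func.attr))
                break
    return sorted(hits)


SAMPLE = '''import subprocess
import sys
subprocess.call("echo 'hello'")
subprocess.call("grep -R {} .".format(sys.argv[1]))
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True)
subprocess.call("grep -R {} .".format(sys.argv[1]), shell=True, cwd="/home/user")
subprocess.run("grep -R {} .".format(sys.argv[1]), shell=True)
'''

print(find_shell_true_calls(SAMPLE))  # [(5, 'call'), (6, 'call'), (7, 'run')]
```

The second element of each tuple is what the $FUNC metavariable would "hold" for that match.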
Sometimes you may wish to enforce the specific use of an API. There are many examples of this, such as subprocess.call(..., shell=True, ...) above; you may wish to match and fail any commit where shell=True is passed. This is easy to do in Semgrep, as seen in the above section.
However, there are also APIs that are insecure by default, or insecure depending on context. Jinja2, for example, does not enable autoescaping by default. Jinja2 is a general-purpose templating engine, so this makes sense in non-web contexts. This may not be obvious, though, and if you are working directly with the Jinja2 engine in a web context, you want to make sure to include autoescape=True.
(Not to scare anyone: Flask, for instance, autoescapes templates with the '.html' extension.)
This is an interesting case because we want to enforce the presence of autoescape=True. Matching this is easy:
patterns:
- pattern: jinja2.Environment(..., autoescape=True, ...)
But what we want is to alert when the opposite conditions are met. Therefore we want to match (1) when autoescape=False and (2) when autoescape is not present in the function call at all. This pattern will match when jinja2.Environment() does not contain autoescape=True:
patterns:
- pattern-not: jinja2.Environment(..., autoescape=True, ...)
- pattern: jinja2.Environment(...)
The full version of this rule, which filters out additional cases, looks like this: https://semgrep.dev/WADz.
This can be generalized with the following approach:
- Match the function call by name.
- Filter out good patterns.
Another example of this approach is setting secure cookies in Flask.
patterns:
- pattern-not: flask.response.set_cookie(..., httponly=True, secure=True, ...)
- pattern: flask.response.set_cookie(...)
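The two-step approach (match the call by name, then filter out the good patterns) can be approximated in plain Python to check your intuition about which calls a rule like this should flag. This is only a sketch built on the standard ast module, not Semgrep's implementation. The names dotted_name, calls_missing_keywords, and SAMPLE are hypothetical, and the sketch assumes keyword order is irrelevant and that only a literal True counts as "good".

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def calls_missing_keywords(source, func_name, required):
    """Line numbers of func_name(...) calls where a required keyword is not set to True."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Step 1: match the function call by name.
        if not isinstance(node, ast.Call) or dotted_name(node.func) != func_name:
            continue
        # Step 2: filter out the good pattern (every required keyword passed as True).
        set_true = {kw.arg for kw in node.keywords
                    if isinstance(kw.value, ast.Constant) and kw.value.value is True}
        if not set(required) <= set_true:
            flagged.append(node.lineno)
    return sorted(flagged)


SAMPLE = '''import jinja2
jinja2.Environment(loader=jinja2.BaseLoader())
jinja2.Environment(autoescape=False)
jinja2.Environment(autoescape=True)
flask.response.set_cookie("k", "v", httponly=True)
flask.response.set_cookie("k", "v", httponly=True, secure=True)
'''

print(calls_missing_keywords(SAMPLE, "jinja2.Environment", ["autoescape"]))  # [2, 3]
print(calls_missing_keywords(SAMPLE, "flask.response.set_cookie", ["httponly", "secure"]))  # [5]
```

Lines 2 and 3 correspond to the two cases called out above: autoescape missing entirely, and autoescape=False.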
You can ensure one function is called before another in Semgrep by using the pattern-not-inside clause. The approach will be:
- Match the last function call by name.
- Filter out when the right function is called above.
Match the last function call by name. Let's use this Java example from semgrep.dev. We want to make sure verify_transaction is called before make_transaction. First, match the function call that should be called last. In this case, it's make_transaction. (Yes, they're normally called "methods." Stick with me.)
patterns:
- pattern: make_transaction(...);
Filter out when the right function is called above. Next, we can filter out the case where verify_transaction appears above. To do this, we will use the pattern-not-inside clause. pattern-not-inside filters out any match that falls within the range its own pattern matches, including the code covered by the ellipsis operator. The patterns look like this:
patterns:
- pattern-not-inside: |
    verify_transaction(...);
    ...
- pattern: make_transaction(...);
(The pipe (|) is YAML syntax that permits a multi-line string.)
If I were to describe this pattern in English, it would read: Filter out any matches inside statements after verify_transaction; otherwise, match make_transaction. Written another way: match make_transaction only when verify_transaction is not above.
Bonus: Ensure the same variable is used in both functions. The above patterns have three matches in the given examples. There is a fourth case to match, where the wrong Transaction object is verified. To filter out a match only when the same variable is used in both functions, we can use a metavariable. Just like a variable, a metavariable must refer to the same code everywhere it is used in a rule. We can augment the pattern like this to catch the fourth case:
patterns:
- pattern-not-inside: |
    verify_transaction($TRANSACTION);
    ...
- pattern: make_transaction($TRANSACTION);
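The ordering check can be hard to picture, so here is a small sketch of the same idea using Python's ast module on Python source rather than Java. It is an approximation under stated assumptions, not Semgrep's algorithm: it only looks at statement-level calls of the form f(x) with a plain variable argument, and the names simple_call, unverified_transactions, and SAMPLE are hypothetical.

```python
import ast


def simple_call(stmt):
    """Return (function name, argument name) for a statement like `f(x)`, else None."""
    if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
        return None
    call = stmt.value
    if (isinstance(call.func, ast.Name) and len(call.args) == 1
            and not call.keywords and isinstance(call.args[0], ast.Name)):
        return call.func.id, call.args[0].id
    return None


def unverified_transactions(source):
    """Return (line, variable) for make_transaction(x) with no verify_transaction(x) above."""
    flagged = []

    def check_block(stmts, verified):
        verified = set(verified)  # copy, so a nested block cannot affect its parent
        for stmt in stmts:
            call = simple_call(stmt)
            if call is not None:
                name, arg = call
                if name == "verify_transaction":
                    # Everything after this statement is "inside" the filtered range.
                    verified.add(arg)
                elif name == "make_transaction" and arg not in verified:
                    flagged.append((stmt.lineno, arg))
            # Descend into nested blocks (function bodies, if/else, loops).
            for field in ("body", "orelse", "finalbody"):
                child = getattr(stmt, field, None)
                if isinstance(child, list):
                    check_block(child, verified)

    check_block(ast.parse(source).body, set())
    return flagged


SAMPLE = '''def ok(t):
    verify_transaction(t)
    make_transaction(t)

def missing(t):
    make_transaction(t)

def wrong_order(t):
    make_transaction(t)
    verify_transaction(t)

def wrong_object(t, other):
    verify_transaction(other)
    make_transaction(t)
'''

print(unverified_transactions(SAMPLE))  # [(6, 't'), (9, 't'), (14, 't')]
```

The wrong_object case is flagged only because the variable name is tracked, which is the role the $TRANSACTION metavariable plays in the rule.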
A real example of this is setting the secure flag on cookies in Java. Cookie objects are added to HttpServletResponse objects, and to set the secure flag, setSecure(true) must be called on the Cookie object prior to its addition.
Cookie cookie = new Cookie("key", "value");
cookie.setSecure(true);
response.addCookie(cookie);
Matching this is the same approach as make_transaction above.
- Match response.addCookie(...).
- Filter out when setSecure(true) is called.
Since we can't know the name of the variables in advance, we can use metavariables for the Cookie and HttpServletResponse objects. The patterns look like this:
patterns:
- pattern-not-inside: |
    $COOKIE.setSecure(true);
    ...
- pattern: $RESP.addCookie($COOKIE);
🚧🚧 Coming soon 🚧🚧
- Specify the route pattern (annotations).
- Use metavariables to match specific components.