This is the standard collection of rules for capa, the tool that automatically identifies capabilities of programs.
Rule writing should be easy and fun! A large rule corpus benefits everyone in the community and we encourage all kinds of contributions.
Any time you see something neat in malware, we want you to think of expressing it in a capa rule. Then, we'll make it as painless as possible to share your rule here and distribute it to capa users.
capa uses a collection of rules to identify capabilities within a program. These rules are easy to write, even for those new to reverse engineering. By authoring rules, you can extend the capabilities that capa recognizes. In some regards, capa rules are a mixture of the OpenIOC, YARA, and YAML formats.
Here's an example of a capa rule:
rule:
  meta:
    name: hash data with CRC32
    namespace: data-manipulation/checksum/crc32
    authors:
      - [email protected]
    scope: function
    mbc:
      - Data::Checksum::CRC32 [C0032.001]
    examples:
      - 2D3EDC218A90F03089CC01715A9F047F:0x403CBD
      - 7D28CB106CB54876B2A5C111724A07CD:0x402350  # RtlComputeCrc32
      - 7EFF498DE13CC734262F87E6B3EF38AB:0x100084A6
  features:
    - or:
      - and:
        - mnemonic: shr
        - or:
          - number: 0xEDB88320
          - bytes: 00 00 00 00 96 30 07 77 2C 61 0E EE BA 51 09 99 19 C4 6D 07 8F F4 6A 70 35 A5 63 E9 A3 95 64 9E = crc32_tab
        - number: 8
        - characteristic: nzxor
      - and:
        - number: 0x8320
        - number: 0xEDB8
        - characteristic: nzxor
      - api: RtlComputeCrc32
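To see why the rule looks for these particular features, it can help to look at a typical table-driven CRC32 implementation. The sketch below is a generic textbook CRC32 (not code taken from any of the example samples): building the table uses the polynomial 0xEDB88320 and a non-zeroing XOR, the main loop shifts right by 8, and the first eight table entries, packed little-endian, are exactly the `crc32_tab` byte pattern in the rule.

```python
import struct
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the constant the rule searches for


def make_crc32_table() -> list[int]:
    """Build the standard 256-entry CRC32 lookup table."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # shift right by one; XOR in the polynomial when the low bit was set
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        # table-driven step: look up an entry and XOR it with the CRC shifted right by 8
        crc = CRC32_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# First eight table entries as little-endian bytes, i.e. the rule's `bytes` feature.
first_entries = struct.pack("<8I", *CRC32_TABLE[:8])
print(first_entries.hex(" ").upper())
print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
print(crc32(b"capa") == zlib.crc32(b"capa"))  # True: agrees with the standard library
```

A binary that embeds a precomputed table instead of building it at runtime would still be caught by the `bytes` branch, while one that calls the Windows API directly is caught by `api: RtlComputeCrc32`.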
capa interprets the content of these rules as it inspects executable files. If you follow the guidelines of this rule format, then you can teach capa to identify new capabilities.
The doc/format.md file describes exactly how to construct rules. Please refer to it as you create rules for capa.
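As a rough mental model of how the `features` tree is read, the following sketch evaluates the CRC32 rule's boolean logic against a set of features extracted from one function. It is a simplified illustration, not capa's rule engine: the one-key-dict representation, the `evaluate` helper, and the sample feature sets are assumptions, the long `bytes` pattern is abbreviated to its description `crc32_tab`, and much of the real format (scopes, counts, negation, and so on) is left out.

```python
from typing import Any

# Hypothetical in-memory form of the CRC32 rule's `features` tree. Each node is
# a one-key dict mirroring the YAML; the `bytes` pattern is abbreviated.
CRC32_FEATURES: dict[str, Any] = {
    "or": [
        {"and": [
            {"mnemonic": "shr"},
            {"or": [{"number": 0xEDB88320}, {"bytes": "crc32_tab"}]},
            {"number": 8},
            {"characteristic": "nzxor"},
        ]},
        {"and": [
            {"number": 0x8320},
            {"number": 0xEDB8},
            {"characteristic": "nzxor"},
        ]},
        {"api": "RtlComputeCrc32"},
    ]
}


def evaluate(node: dict[str, Any], found: set[tuple[str, Any]]) -> bool:
    """Return True if `node` is satisfied by the features found in one scope."""
    kind, value = next(iter(node.items()))
    if kind == "and":
        return all(evaluate(child, found) for child in value)
    if kind == "or":
        return any(evaluate(child, found) for child in value)
    # Leaf feature: satisfied if the (feature, value) pair was extracted.
    return (kind, value) in found


# Hypothetical features extracted from a single function.
function_features = {
    ("mnemonic", "shr"),
    ("number", 0xEDB88320),
    ("number", 8),
    ("characteristic", "nzxor"),
}
print(evaluate(CRC32_FEATURES, function_features))  # True
print(evaluate(CRC32_FEATURES, {("mnemonic", "shr"), ("number", 8)}))  # False
```

The second call fails because the first `and` branch also needs the polynomial (or the table) and a non-zeroing XOR, and neither of the other branches is satisfied.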
The organization of this repository mirrors the namespaces of the rules it contains. capa uses namespaces to group related things together, especially when it renders its final report. Namespaces are hierarchical, so the children of a namespace encode its specific techniques. In a few words each, the top-level namespaces are:
- anti-analysis - packing, obfuscation, anti-X, etc.
- collection - data that may be enumerated and collected for exfiltration
- communication - HTTP, TCP, command and control (C2) traffic, etc.
- compiler - detection of build environments, such as MSVC, Delphi, or AutoIT
- data-manipulation - encryption, hashing, etc.
- executable - characteristics of the executable, such as PE sections or debug info
- host-interaction - access or manipulation of system resources, like processes or the Registry
- impact - end goal
- internal - used internally by capa to guide analysis
- lib - building blocks to create other rules
- linking - detection of dependencies, such as OpenSSL or Zlib
- load-code - runtime load and execution of code, such as embedded PE or shellcode
- malware-family - detection of malware families
- nursery - staging ground for rules that are not quite polished
- persistence - all sorts of ways to maintain access
- runtime - detection of language runtimes, such as the .NET platform or Go
- targeting - special handling of systems, such as ATMs
We can easily add more top level namespaces as the need arises.
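Because namespaces are just slash-separated paths, the hierarchy and the top-level grouping fall out of simple string handling. The sketch below illustrates this; the helper names are hypothetical and this is not capa's report-rendering code. Only the CRC32 namespace comes from this document; the other two matches are invented for illustration.

```python
from collections import defaultdict


def ancestors(namespace: str) -> list[str]:
    """Every level of a hierarchical namespace, from the top level down."""
    parts = namespace.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def group_by_top_level(matches: dict[str, str]) -> dict[str, list[str]]:
    """Group matched rule names (rule name -> namespace) by top-level namespace."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rule_name, namespace in matches.items():
        groups[ancestors(namespace)[0]].append(rule_name)
    return {top: sorted(names) for top, names in sorted(groups.items())}


print(ancestors("data-manipulation/checksum/crc32"))
# ['data-manipulation', 'data-manipulation/checksum', 'data-manipulation/checksum/crc32']

# The second and third entries are hypothetical matches used for illustration.
matches = {
    "hash data with CRC32": "data-manipulation/checksum/crc32",
    "example persistence rule": "persistence/example",
    "example encryption rule": "data-manipulation/encryption/example",
}
print(group_by_top_level(matches))
```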
capa supports rules that match other rules.
For example, the following rule set describes various methods of persistence.
Note that the rule "persistence" matches if either "run key" or "service" matches against a sample.
---
rule:
  meta:
    name: persistence
  features:
    - or:
      - match: run key
      - match: service
---
rule:
  meta:
    name: run key
  features:
    - string: /CurrentVersion\\Run/i
---
rule:
  meta:
    name: service
  features:
    - api: CreateService
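One way to picture a `match` feature is as a leaf that recursively evaluates the referenced rule. The sketch below models the three rules above that way; it is an illustration under stated assumptions, not capa's implementation. The `RULES` dict, the `matches`/`evaluate` helpers, and the separate `regex` leaf (standing in for capa's `/.../i` string syntax, written to match the backslash-separated registry path) are all hypothetical.

```python
import re
from typing import Any

# The three rules above as one-key feature trees (hypothetical representation).
RULES: dict[str, dict[str, Any]] = {
    "persistence": {"or": [{"match": "run key"}, {"match": "service"}]},
    "run key": {"regex": r"CurrentVersion\\Run"},
    "service": {"api": "CreateService"},
}


def matches(name: str, features: set[tuple[str, str]]) -> bool:
    """Return True if the named rule matches the extracted features."""
    return evaluate(RULES[name], features)


def evaluate(node: dict[str, Any], features: set[tuple[str, str]]) -> bool:
    kind, value = next(iter(node.items()))
    if kind == "or":
        return any(evaluate(child, features) for child in value)
    if kind == "and":
        return all(evaluate(child, features) for child in value)
    if kind == "match":
        # A `match` leaf is satisfied when the referenced rule matches.
        return matches(value, features)
    if kind == "regex":
        # Case-insensitive search over extracted strings, like the /.../i form.
        pattern = re.compile(value, re.IGNORECASE)
        return any(pattern.search(s) for k, s in features if k == "string")
    return (kind, value) in features


sample = {("string", r"Software\Microsoft\Windows\CurrentVersion\Run")}
print(matches("persistence", sample))  # True
print(matches("persistence", {("api", "CreateFileW")}))  # False
```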
Using this feature, we can capture common logic into "library rules". These rules don't get rendered as results but are used as building blocks to create other rules. For example, there are quite a few ways to write to files on Windows, so the following library rule makes it easy for other rules to thoroughly match file writing.
rule:
  meta:
    name: write file
    lib: True
  features:
    - or:
      - api: WriteFile
      - api: fwrite
      # ... (more file-writing APIs)
Set rule.meta.lib=True to declare a library rule, and place the rule file into the lib rule directory.
Library rules should not have a namespace, and they are not rendered as results.
capa only attempts to match library rules that are referenced by other rules,
so there's no performance overhead for defining many reusable library rules.
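The "no overhead" property follows if matching starts only from non-library rules and reaches library rules on demand through `match`. The sketch below shows one way to get that behaviour, with a cache so each referenced rule is evaluated at most once. It is a hypothetical illustration, not capa's actual matching code: "write file" mirrors the library rule above, while "unused helper" and "write log file" are invented rules.

```python
from typing import Any

# Hypothetical rule set. "write file" mirrors the library rule above;
# "unused helper" and "write log file" are invented for illustration.
RULES: dict[str, dict[str, Any]] = {
    "write file": {"lib": True, "features": {"or": [{"api": "WriteFile"}, {"api": "fwrite"}]}},
    "unused helper": {"lib": True, "features": {"api": "VirtualAlloc"}},
    "write log file": {
        "lib": False,
        "features": {"and": [{"match": "write file"}, {"string": "log.txt"}]},
    },
}


def find_capabilities(features: set[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (rendered results, names of every rule that was evaluated)."""
    cache: dict[str, bool] = {}
    evaluated: list[str] = []

    def match_rule(name: str) -> bool:
        # Evaluate each rule at most once and remember the answer.
        if name not in cache:
            evaluated.append(name)
            cache[name] = evaluate(RULES[name]["features"])
        return cache[name]

    def evaluate(node: dict[str, Any]) -> bool:
        kind, value = next(iter(node.items()))
        if kind == "and":
            return all(evaluate(child) for child in value)
        if kind == "or":
            return any(evaluate(child) for child in value)
        if kind == "match":
            return match_rule(value)  # library rules are only evaluated here, on demand
        return (kind, value) in features

    # Start only from non-library rules, and render only those.
    results = [name for name, rule in RULES.items() if not rule["lib"] and match_rule(name)]
    return results, evaluated


results, evaluated = find_capabilities({("api", "fwrite"), ("string", "log.txt")})
print(results)    # ['write log file']
print(evaluated)  # ['write log file', 'write file']
```

"unused helper" is never evaluated because no rule references it, and "write file" contributes to a result without appearing in it.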
The rule nursery is a staging ground for rules that are not quite polished. Nursery rule logic should still be solid, though metadata may be incomplete; for example, a rule may lack a public example of the technique.
The rule engine matches nursery rules just like any other rules. However, for nursery rules our rule linter only reports missing rule data and does not fail the CI build, because it's understood that these rules are incomplete.
We encourage contributors to create rules in the nursery, and hope that the community will work to "graduate" the rule once things are acceptable.
Examples of things that would place a rule into the nursery:
- no real-world examples
- missing categorization
- (maybe) questions about fidelity (e.g. RC4 PRNG algorithm)
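The linter policy described above (report missing data everywhere, but only fail CI for rules outside the nursery) can be sketched as follows. This is a hypothetical illustration rather than capa's linter: the `Rule` dataclass, the `lint`/`check_rules` helpers, the file paths, and the two required metadata keys (chosen to mirror "no real-world examples" and "missing categorization") are assumptions, and the real linter checks far more.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical subset of required metadata; the real linter checks much more.
REQUIRED_META = ("namespace", "examples")


@dataclass
class Rule:
    name: str
    path: str  # hypothetical repository-relative path, e.g. "nursery/..."
    meta: dict[str, Any] = field(default_factory=dict)


def lint(rule: Rule) -> list[str]:
    """List the required metadata fields that are missing or empty."""
    return [f"{rule.name}: missing meta.{key}" for key in REQUIRED_META if not rule.meta.get(key)]


def check_rules(rules: list[Rule]) -> tuple[bool, list[str]]:
    """Report every problem, but only fail for rules outside the nursery."""
    ok = True
    problems: list[str] = []
    for rule in rules:
        found = lint(rule)
        problems.extend(found)
        if found and not rule.path.startswith("nursery/"):
            ok = False
    return ok, problems


draft = Rule("draft technique", "nursery/draft-technique.yml")
print(check_rules([draft]))
# (True, ['draft technique: missing meta.namespace', 'draft technique: missing meta.examples'])

polished = Rule("polished technique", "collection/polished-technique.yml", {"namespace": "collection"})
print(check_rules([draft, polished])[0])  # False: missing examples outside the nursery
```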