Sigma Tools

This folder contains libraries and the following command line tools:

  • sigmac: converter between Sigma rules and SIEM queries
  • merge_sigma: Merge Sigma collections into simple Sigma rules.
  • sigma2misp: Import Sigma rules to MISP events.

Sigmac

The Sigmac is one of the most important files, because it sets the field names that your backend/database will use after translation from the original log source's field names. Read below to understand how a Sigmac is constructed. Additionally, see Choosing the Right Sigmac for guidance on which file and command line options (if applicable) best suit your environment.

Configuration File

The configuration file contains mappings for the target environments:

  • between generic Sigma field names and those used in the target environment
  • between log source identifiers from Sigma and...
    • ...index names from target
    • ...conditions that should be added to generated expression (e.g. EventLog: Microsoft-Windows-Sysmon) with AND.
  • between placeholders in sigma rules and lists that describe their values in the target environment

The mappings are configured in a YAML file with the following format:

title: short description of configuration
order: numeric value
backends:
  - backend_1
  - backend_2
  - ...
fieldmappings:
  sigma_fieldname_1: target_fieldname   # Simple mapping
  sigma_fieldname_2:                    # Multiple mappings
    - target_fieldname_1
    - target_fieldname_2
  sigma_fieldname_3:                    # Conditional mapping
    field1=value1:
    field2=value2:
      - target_fieldname_1
      - target_fieldname_2
logsources:
  sigma_logsource:
    category: ...
    product: ...
    service: ...
    index:
      - target_indexname1
      - target_indexname2
    conditions:
      field1: value1
      field2: value2
logsourcemerging: and/or
defaultindex: indexname
placeholders:
  name1:
    - value1
    - value2
  name2: value

Metadata

A configuration should contain the following attributes:

  • title: Short description of configuration shown in list printed by converter on request.
  • order: Numeric value that determines allowed order of usage. A configuration B can only be applied after another configuration A if order of B is higher or equal to order of A. The Sigma converter enforces this. Convention:
    • 10: Configurations for generic log sources
    • 20: Backend-specific configuration
  • backends: List of backend names. The configuration can't be used with backends not listed here. Don't define for generic configurations.
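
The order rule above can be sketched in a few lines of Python. This is only an illustration of the stated constraint (each configuration's order must be higher than or equal to that of the one applied before it), not the converter's actual code; the function name check_config_order is hypothetical.

```python
from typing import List


def check_config_order(orders: List[int]) -> bool:
    """Return True if configurations with these `order` values may be
    applied in the given sequence.

    Hypothetical helper: it only encodes the rule that a configuration B
    may follow a configuration A if order(B) >= order(A).
    """
    return all(later >= earlier for earlier, later in zip(orders, orders[1:]))
```

With the convention above, a generic configuration (10) followed by a backend-specific one (20) passes the check, while the reverse sequence does not.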

Field Mappings

Field mappings in the fieldmappings section map between Sigma field names and field names used in target SIEM systems. There are three types of field mappings:

  • Simple: the source field name corresponds to exactly one target field name given as string. Example: EventID: EventCode for translation of Windows event identifiers between Sigma and Splunk.
  • Multiple: a source field corresponds to a list of target fields. Sigmac generates an OR condition that covers all field names. This can be useful in configuration change and migration scenarios, when field names change. A further use case is when the SIEM normalizes one source field name into different target field names and the exact rules are unknown.
  • Conditional: a source field is translated to one or multiple target field names depending on values from other fields in specific rules. This is useful in scenarios where the SIEM maps the same Sigma field to different target field names depending on the event or log type, like Logpoint.
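
As a rough sketch of how simple and multiple mappings behave, the following Python function maps one Sigma field and value to a target expression. The function name map_field and the output syntax (field="value" terms joined with OR) are assumptions made for illustration; real backends emit their own query syntax. Keeping the Sigma name for unmapped fields is likewise an assumption here.

```python
def map_field(fieldmappings: dict, field: str, value: str) -> str:
    """Translate a Sigma field/value pair using a `fieldmappings` section.

    Hypothetical helper: the query syntax below is illustrative only.
    """
    # Assumption: a field without a mapping keeps its Sigma name.
    target = fieldmappings.get(field, field)
    # A simple mapping is a string, a multiple mapping is a list of strings.
    targets = [target] if isinstance(target, str) else list(target)
    terms = [f'{name}="{value}"' for name in targets]
    if len(terms) == 1:
        return terms[0]
    # Multiple mapping: an OR condition that covers all target field names.
    return "(" + " OR ".join(terms) + ")"
```

For example, with the mapping EventID: EventCode, the pair EventID/4624 becomes a single term on EventCode, while a list of two target names yields two terms joined with OR.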

While the simple and multiple mapping types are quite straightforward, conditional mappings require further explanation. The mapping is provided as a map whose keys have the following format:

  • field=value: condition that must be fulfilled for execution of the given translation
  • default: mapping that is used if no condition matches.

Sigmac applies conditional mappings as follows:

  1. All conditions are matched against all field:value pairs of the rule. Sigmac merges all pairs into one table and is therefore not able to distinguish between different definitions. Matching mappings are collected in a list.
  2. If the list is empty, the default mapping is used.
  3. The result set of target field name mappings is translated into an OR condition, similar to multiple field mappings. If no mapping could be determined, the Sigma field name is used.
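
The three steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Sigmac's implementation: the function name resolve_conditional is hypothetical, and the rule's merged table is modelled as a set of (field, value) tuples with string values.

```python
def resolve_conditional(mapping: dict, field: str, rule_pairs: set) -> list:
    """Resolve a conditional mapping to a list of target field names.

    mapping    -- keys are "field=value" conditions or "default"
    field      -- the Sigma field name being translated
    rule_pairs -- all (field, value) pairs of the rule, merged into one set
    """
    def as_list(targets):
        return [targets] if isinstance(targets, str) else list(targets)

    matched = []
    # Step 1: collect the targets of every condition found in the rule.
    for key, targets in mapping.items():
        if key == "default":
            continue
        cond_field, _, cond_value = key.partition("=")
        if (cond_field, cond_value) in rule_pairs:
            for name in as_list(targets):
                if name not in matched:
                    matched.append(name)
    # Step 2: fall back to the default mapping if nothing matched.
    if not matched and "default" in mapping:
        matched = as_list(mapping["default"])
    # Step 3: if no mapping could be determined, use the Sigma field name.
    return matched or [field]
```

The returned list would then be turned into an OR condition in the same way as a multiple mapping.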

Use the fieldlist backend to determine all field names used by rules. Example:

$ tools/sigmac.py -r -t fieldlist rules/windows/ 2>/dev/null | sort -u
AccessMask
CallTrace
CommandLine
[...]
TicketOptions
Type

Log Source Mappings

Each log source definition must contain at least one category, product or service element that corresponds to the same fields in the logsources part of sigma rules. If more than one field is given, all must match (AND).
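
The matching rule described above can be sketched like this. The function name logsource_matches is hypothetical, and the dictionaries stand in for a parsed log source definition and the logsource part of a rule.

```python
def logsource_matches(definition: dict, rule_logsource: dict) -> bool:
    """Check whether a configured log source definition applies to a rule.

    Only the category, product and service elements present in the
    definition are compared, and all of them must match (AND).
    """
    keys = [k for k in ("category", "product", "service") if k in definition]
    if not keys:
        raise ValueError("definition needs a category, product or service element")
    return all(rule_logsource.get(k) == definition[k] for k in keys)
```

A definition that only names a product therefore applies to every rule for that product, whereas a definition naming both product and service applies only to rules that carry both values.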

The index field can contain a string or a list of strings. They are converted to the target expression language in such a way that the rule is searched in all given index patterns.

The conditions part can be used to define field: value conditions if only a subset of the given indices is relevant. All fields are linked with logical AND, and the resulting expression is also linked with AND against the expression generated from the Sigma rule.

Example: a logstash configuration stores all Windows logs in one index. For Sysmon, only events that match EventLog:"Microsoft-Windows-Sysmon" are relevant. The config looks as follows:

...
logsources:
  sysmon:
    product: sysmon
    index: logstash-windows-*
    conditions:
      EventLog: Microsoft-Windows-Sysmon
...

If multiple log source definitions match, the result is merged from all matching rules. The parameter logsourcemerging determines how conditions are merged. The following methods are supported:

  • and (default): merge all conditions with logical AND.
  • or: merge all conditions with logical OR.

This makes it possible to define log sources hierarchically, e.g.:

logsources:
  windows:
    product: windows
    index: logstash-windows-*
  windows-application:
    product: windows
    service: application
    conditions:
      EventLog: Application
  windows-security:
    product: windows
    service: security
    conditions:
      EventLog: Security

Log source windows configures an index name. Log sources windows-application and windows-security define additional conditions for matching events in the windows indices.

The keyword defaultindex defines one or multiple index patterns that are used if the above calculation doesn't result in at least one index name.
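
The merging behaviour described in this section can be sketched as follows. This is an illustration under stated assumptions rather than Sigmac's actual logic: the function name merge_logsources is hypothetical, and the field:value output syntax is only a placeholder for whatever the target backend emits.

```python
def _as_list(value):
    """Normalize a missing value, a string or a list of strings to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def merge_logsources(config: dict, rule_logsource: dict):
    """Return (indices, condition) for a rule's logsource section."""
    indices, groups = [], []
    for definition in config.get("logsources", {}).values():
        keys = [k for k in ("category", "product", "service") if k in definition]
        if not keys or any(rule_logsource.get(k) != definition[k] for k in keys):
            continue  # this definition does not match the rule
        for index in _as_list(definition.get("index")):
            if index not in indices:
                indices.append(index)
        # Conditions inside one definition are always linked with AND.
        terms = [f"{f}:{v}" for f, v in definition.get("conditions", {}).items()]
        if terms:
            group = " AND ".join(terms)
            groups.append(f"({group})" if len(terms) > 1 else group)
    if not indices:
        # No matching definition supplied an index: fall back to defaultindex.
        indices = _as_list(config.get("defaultindex"))
    # logsourcemerging decides how conditions of different definitions combine.
    operator = " OR " if config.get("logsourcemerging", "and") == "or" else " AND "
    return indices, operator.join(groups)
```

Applied to the hierarchical example above, a rule with product windows and service security would pick up the index from the windows definition and the EventLog condition from windows-security.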

Addition of Target Formats

A target format is added by developing a backend class. A backend class receives a parse tree as input and must translate the parse tree nodes into the target format.

Translation Process

  1. Parsing YAML
  2. Parsing of Condition
  3. Internal representation of condition as parse tree
  4. Attachment of definitions into corresponding parse tree nodes
  5. Translation of field and log source identifiers into target names
  6. Translation of parse tree into target format (backend classes)
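
To give an idea of what step 6 involves, here is a toy backend that walks a small parse tree and emits a query string. All class names (FieldEquals, And, Or, Not, ToyBackend) and the output syntax are hypothetical and chosen for illustration; they are not the node or base classes of the Sigma tools.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FieldEquals:
    field: str
    value: str


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


@dataclass
class Not:
    child: "Node"


Node = Union[FieldEquals, And, Or, Not]


class ToyBackend:
    """Translate parse tree nodes into a simple query-string syntax."""

    def generate(self, node: Node) -> str:
        if isinstance(node, FieldEquals):
            return f'{node.field}:"{node.value}"'
        if isinstance(node, And):
            return "(" + " AND ".join(self.generate(c) for c in node.children) + ")"
        if isinstance(node, Or):
            return "(" + " OR ".join(self.generate(c) for c in node.children) + ")"
        if isinstance(node, Not):
            return "NOT " + self.generate(node.child)
        raise TypeError(f"unsupported node type: {type(node).__name__}")
```

Each node type gets its own translation rule, and composite nodes recurse into their children, which is the general shape of a tree-walking translator.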

Backend Configuration Files

You can also pass backend options from a configuration file, which simplifies the CLI usage.

You can specify individual backend options (--backend-option) and a configuration file at the same time. In this case the options are merged, and options passed via the CLI take priority.

Sample usages:

# Backend configuration file (here for Elastalert)
$ cat backend_config.yml 
alert_methods: email
emails: [email protected]
smtp_host: smtp.google.com
from_addr: [email protected]
expo_realert_time: 10m

# Rule to compile
$ RULE=rules/windows/builtin/win_susp_sam_dump.yml

# Generate an elastalert rule and take options from the configuration file
$ python3 tools/sigmac $RULE -t elastalert --backend-config backend_config.yml
alert:
- email
description: Detects suspicious SAM dump activity as cause by QuarksPwDump and other
  password dumpers
email:
- [email protected]
filter:
- query:
    query_string:
      query: (EventID:"16" AND "*\\AppData\\Local\\Temp\\SAM\-*.dmp\ *")
from_addr: [email protected]
index: logstash-*
name: SAM-Dump-to-AppData_0
priority: 2
realert:
  minutes: 0
smtp_host: smtp.google.com
type: any

# Override an option from the configuration file via the CLI
$ python3 tools/sigmac $RULE -t elastalert --backend-config backend_config.yml --backend-option smtp_host=smtp.mailgun.com
alert:
- email
description: Detects suspicious SAM dump activity as cause by QuarksPwDump and other
  password dumpers
email:
- [email protected]
filter:
- query:
    query_string:
      query: (EventID:"16" AND "*\\AppData\\Local\\Temp\\SAM\-*.dmp\ *")
from_addr: [email protected]
index: logstash-*
name: SAM-Dump-to-AppData_0
priority: 2
realert:
  minutes: 0
smtp_host: smtp.mailgun.com
type: any
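
The precedence between the configuration file and the CLI shown in these samples can be sketched as a simple dictionary merge. The function name merge_backend_options is hypothetical, and treating an option without "=" as a boolean flag is an assumption of this sketch.

```python
def merge_backend_options(file_options: dict, cli_options: list) -> dict:
    """Merge options from a backend configuration file with CLI options.

    file_options -- parsed content of the --backend-config YAML file
    cli_options  -- raw "key=value" strings given via --backend-option
    """
    merged = dict(file_options)  # start from the file, do not mutate it
    for item in cli_options:
        key, sep, value = item.partition("=")
        # Assumption: a bare key without "=" is treated as a boolean flag.
        merged[key] = value if sep else True
    return merged
```

Because the CLI values are written last, smtp_host=smtp.mailgun.com on the command line replaces the smtp_host value from the file while all other file options are kept.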

Choosing the right SIGMAC

This section shows you which -c option (the Sigmac) and which --backend-option(s) to use. The rest of the Sigma command stays as usual: choose the -t option (target backend) and the rule(s) to convert as you normally would.

If the target backend/database does not do a lot of field renaming/normalization, then it is easier to determine which Sigmac to use. Either way, this section will help guide you in this decision.

Elasticsearch or ELK

For this backend, there are two very important components. One is the field name, and the other is the way the value for the field name is analyzed, i.e. how it is searchable in the Elasticsearch database. If you are interested in understanding why this is important, you can read more here about the difference between keyword types and text types. There are several variations of what could be the correct Sigmac to use, depending on the version of Elasticsearch, whether you are using ECS, whether certain Beats settings are enabled, and so on.

To aid in choosing the correct Sigmac, there are a few quick questions to ask yourself; the answers determine which one to use. Please note the answer to each question. Not knowing the answer to every question is very common, and that's OK.

  1. What version of Filebeat are you using? (You may not be using it at all.)
  2. Are you using Elastic Common Schema (ECS)?
  3. What index do you store the log source's data in? Some examples:
    • Windows logs are most likely in winlogbeat-*
    • Linux logs are most likely in filebeat-*
    • Zeek/Bro data is most likely in filebeat-*
    • If you are using logstash, data is most likely in logstash-*
  4. If you are using Filebeat, do you have the module enabled? Here is a link showing the description for the Windows log Security channel

Now choose your data source:

Elastic - Zeek (FKA Bro) / Corelight Data

  • Corelight's implementation of ECS: -c tools/config/ecs-zeek-corelight.yml --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" --backend-option keyword_whitelist="event.dataset,source.ip,destination.ip,source.port,destination.port,*bytes*"
    Example of the full command, run on all the proxy rules and converting to a Kibana (Lucene) query:
tools/sigmac -t es-qs -c tools/config/ecs-zeek-corelight.yml --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" --backend-option keyword_whitelist="event.dataset,source.ip,destination.ip,source.port,destination.port,*bytes*" rules/proxy/*
  • Filebeat version 7 or higher and/or Elastic's implementation: -c tools/config/ecs-zeek-elastic-beats-implementation.yml --backend-option keyword_base_fields="*"
  • Using logstash and NOT using ECS: -c tools/config/logstash-zeek-default-json.yml

Elastic Windows Event Log / Sysmon Data Configurations

index templates

If you are able to, run the following command, because it is one of the best ways to determine which options to use. Take your answer from question 3 and use that index in place of winlogbeat in the example command. You can run this from the CLI against your Elasticsearch instance or from Kibana Dev Tools. You will only need to use the first index template pattern. Look under the section dynamic_templates for strings_as_keyword. If there is one, take note of it.

curl -XGET "http://127.0.0.1:9200/winlogbeat-*/_mapping/?filter_path=*.mappings.dynamic_templates*,*.index_patterns"
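
If you save the JSON output of that command, a short script can look for strings_as_keyword for you. The sketch below assumes the response is a JSON object keyed by index name, with dynamic_templates as a list of single-key objects under mappings; that shape, and the function names find_strings_as_keyword and contains_text, are assumptions of this example, so adjust them to what your cluster actually returns.

```python
def find_strings_as_keyword(mapping_response: dict) -> dict:
    """Return {index_name: mapping} for each index whose dynamic_templates
    contain a strings_as_keyword entry (assumed response shape, see above).
    """
    found = {}
    for index_name, body in mapping_response.items():
        templates = body.get("mappings", {}).get("dynamic_templates", [])
        for template in templates:
            if "strings_as_keyword" in template:
                found[index_name] = template["strings_as_keyword"].get("mapping", {})
    return found


def contains_text(mapping: dict) -> bool:
    """True if the mapping itself, or one of its sub-fields, has type text."""
    if mapping.get("type") == "text":
        return True
    return any(sub.get("type") == "text" for sub in mapping.get("fields", {}).values())
```

The parsed response can be obtained with json.load on the saved file; whether contains_text returns True or False is the piece of information used when choosing the options below.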

The next question to ask yourself: do you want queries that can easily be bypassed because searches are case sensitive? Take note of yes/no.

Now let's determine which options and Sigmac to use.

Sigmac's -c option

  1. Using winlogbeat version 6 or less: -c tools/config/winlogbeat-old.yml
  2. Using winlogbeat version 7 or higher without modules enabled (answer from question 4) and strings_as_keyword does not contain text: -c tools/config/winlogbeat-old.yml
  3. Using winlogbeat version 7 or higher with modules enabled (answer from question 4): -c tools/config/winlogbeat-modules-enabled.yml
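
The three cases in this list can be written down as a small decision function. It encodes only what the list states; the function name choose_winlogbeat_config and its parameters are hypothetical, and it returns None for combinations the list does not cover.

```python
from typing import Optional


def choose_winlogbeat_config(winlogbeat_major: int,
                             modules_enabled: bool,
                             strings_as_keyword_has_text: bool) -> Optional[str]:
    """Pick the -c file according to the three cases listed above."""
    if winlogbeat_major <= 6:
        return "tools/config/winlogbeat-old.yml"            # case 1
    if modules_enabled:
        return "tools/config/winlogbeat-modules-enabled.yml"  # case 3
    if not strings_as_keyword_has_text:
        return "tools/config/winlogbeat-old.yml"            # case 2
    return None  # combination not covered by the list
```

A None result means you need to decide based on the backend options and full examples in the rest of this section.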

Backend options --backend-option

You can add the following depending on additional information from your answers/input above.

  1. If you are using ECS, your data is going to the winlogbeat-* index, or your default field is a keyword type, then add the following to your Sigma command: --backend-option keyword_field=""

    • If you want to prevent case sensitive bypasses you can add the following to your command: --backend-option case_insensitive_whitelist="*"
    • If you want to prevent case sensitive bypasses but only for certain fields, you can use an option like this: --backend-option keyword_field="" --backend-option case_insensitive_whitelist="*CommandLine*, *ProcessName*, *Image*, process.*, *FileName*, *Path*, *ServiceName*, *ShareName*, file.*, *Directory*, *directory*, *hash*, *Hash*, *Object*, ComputerName, *Subject*, *Target*, *Service*"
  2. If you are using analyzed (text) fields or your index template portion of strings_as_keyword contains text then you can add the following:

    --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text"
  3. If you only have some analyzed fields then you would use an example like this:

    --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" --backend-option analyzed_sub_fields="TargetUserName, SourceUserName, TargetHostName, CommandLine, ProcessName, ParentProcessName, ParentImage, Image"
  4. Use an analyzed field or a different field for queries that contain wildcard(s):

    --backend-option wildcard_use_keyword="false"

Elastic - Some Final Examples

So putting it all together to help show everything from above, here are some "full" examples:

  • base field is keyword & no analyzed field, with case insensitivity (covers Elastic 7 with the default beats/ECS mappings), using winlogbeat with modules enabled. The keyword_whitelist option also keeps winlog.channel from being made case insensitive, as that is not necessary
tools/sigmac -t es-qs -c tools/config/winlogbeat-modules-enabled.yml --backend-option keyword_field="" --backend-option case_insensitive_whitelist="*" --backend-option keyword_whitelist="winlog.channel" rules/windows/process_creation/win_office_shell.yml
  • base field keyword & subfield is analyzed(.text) and winlogbeat with modules enabled
tools/sigmac -t es-qs -c tools/config/winlogbeat-modules-enabled.yml --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" rules/windows/process_creation/win_office_shell.yml
  • base field keyword & only some analyzed fields and winlogbeat without modules enabled
tools/sigmac -t es-qs -c tools/config/winlogbeat.yml  --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" --backend-option analyzed_sub_fields="TargetUserName, SourceUserName, TargetHostName, CommandLine, ProcessName, ParentProcessName, ParentImage, Image" rules/windows/process_creation/win_office_shell.yml
  • using beats/ECS Elastic 7 with case insensitivity and some .text fields, and winlogbeat without modules enabled
tools/sigmac -t es-qs -c tools/config/winlogbeat.yml --backend-option keyword_base_fields="*" --backend-option analyzed_sub_field_name=".text" --backend-option keyword_whitelist="winlog.channel,winlog.event_id" --backend-option case_insensitive_whitelist="*" --backend-option analyzed_sub_fields="TargetUserName, SourceUserName, TargetHostName, CommandLine, ProcessName, ParentProcessName, ParentImage, Image" rules/windows/process_creation/win_office_shell.yml
  • using keyword as a subfield and custom analyzed field as a subfield with winlogbeat mappings
tools/sigmac -t es-qs -c tools/config/winlogbeat.yml --backend-option keyword_field=".keyword" --backend-option analyzed_sub_field_name=".security" rules/windows/sysmon/sysmon_wmi_susp_scripting.yml
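
Since these command lines get long, it can help to assemble them from structured data. The sketch below uses only flags that appear in the examples above (-t, -c, --backend-option); the function name build_sigmac_argv is hypothetical.

```python
import shlex
from typing import Dict, List


def build_sigmac_argv(target: str, config: str,
                      backend_options: Dict[str, str],
                      rules: List[str]) -> List[str]:
    """Build the argument list for a tools/sigmac call."""
    argv = ["tools/sigmac", "-t", target, "-c", config]
    for key, value in backend_options.items():
        # One --backend-option flag per key=value pair.
        argv += ["--backend-option", f"{key}={value}"]
    return argv + list(rules)


# Usage: the first full example above, rebuilt from structured data.
argv = build_sigmac_argv(
    "es-qs",
    "tools/config/winlogbeat-modules-enabled.yml",
    {"keyword_field": "", "case_insensitive_whitelist": "*",
     "keyword_whitelist": "winlog.channel"},
    ["rules/windows/process_creation/win_office_shell.yml"],
)
command = shlex.join(argv)  # shell-quoted string, ready to paste or log
```

Passing the list directly to subprocess.run avoids shell quoting issues with values such as "*"; shlex.join (Python 3.8 or later) is only needed when you want a printable command string.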

Devo

The Devo backend accepts several configurations that, based on the data source type, apply a specific mapping and point to the proper Devo table. The currently available configurations are:

  • devo-windows, for windows sources
  • devo-web, for generic web sources (webserver, apache, proxy...)
  • devo-network, for generic network sources (firewall, dns...)

These backend configurations specify the Devo table to build the query upon, and the output query will reference that table if the rule's log sources match the configuration's log sources.

For example, in order to translate a windows-related Sigma rule, one would use:

tools/sigmac -t devo -c tools/config/devo-windows.yml rules/windows/sysmon/sysmon_wmi_susp_scripting.yml