Extension package for dbt inspired by the Great Expectations package for Python. The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt, vs having to add another integration with their data warehouse.
Include in packages.yml
packages:
- package: calogica/dbt_expectations
version: [">=0.2.0", "<0.3.0"]
# <see https://github.com/calogica/dbt-expectations/releases/latest> for the latest version tag
For latest release, see https://github.com/calogica/dbt-expectations/releases
This package includes a reference to dbt-date which in turn references dbt-utils so there's no need to also import dbt-utils in your local project.
The following variables need to be defined in your dbt_project.yml
file:
vars:
'dbt_date:time_zone': 'America/Los_Angeles'
You may specify any valid timezone string in place of America/Los_Angeles
.
For example, use America/New_York
for East Coast Time.
This project contains integration tests for all test macros in a separate integration_tests
dbt project contained in this repo.
To run the tests:
- You will need a profile called
integration_tests
in~/.dbt/profiles.yml
pointing to a writable database. - Then, from within the
integration_tests
folder, rundbt seed
to loaddata_test.csv
to thetest
schema of your database. - Then run
dbt test
to run the tests specified inintegration_tests/models/schema_tests/schema.yml
Expect the specified column to exist.
Applies to: Column
tests:
- dbt_expectations.expect_column_to_exist
Expect the number of columns in a model to be between two values.
Applies to: Column
tests:
- dbt_expectations.expect_table_column_count_to_be_between:
min_value: 1
max_value: 4
Expect the number of columns in a model to match another model.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_column_count_to_equal_other_table:
compare_model: ref("other_model")
Expect the number of columns in a model to be equal to expected_number_of_columns
.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_column_count_to_equal:
value: 7
Expect the columns to exactly match a specified list.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_columns_to_match_ordered_list:
column_list: ["col_a", "col_b"]
Expect the columns in a model to match a given list.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_columns_to_match_set:
column_list: ["col_a", "col_b"]
Expect the number of rows in a model to be between two values.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_row_count_to_be_between:
min_value: 1
max_value: 4
Expect the number of rows in a model match another model.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_row_count_to_equal_other_table:
compare_model: ref("other_model")
Expect the number of rows in a model to be equal to expected_number_of_rows
.
Applies to: Model, Seed, Source
models: # or seeds:
- name: my_model
tests:
- dbt_expectations.expect_table_row_count_to_equal:
value: 4
Expect each column value to be unique.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_unique
Expect column values to not be null.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_be_null
Expect column values to be null.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_null
Expect a column to contain values of a specified data type.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: date
Expect a column to contain values from a specified type list.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_in_type_list:
column_type_list: [date, datetime]
Expect each column value to be in a given set.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_in_set:
value_set: ['a','b','c']
Expect each column value to be between two values.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 10
Expect each column value not to be in a given set.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_be_in_set:
value_set: ['e','f','g']
Expect column values to be increasing.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_increasing:
sort_column: date_day
Expect column values to be decreasing.
If strictly=True, then this expectation is only satisfied if each consecutive value is strictly increasing–equal values are treated as failures.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_decreasing:
sort_column: col_numeric_a
strictly: false
Expect column entries to be strings with length between a min_value value and a max_value value (inclusive).
Applies to: Column
tests:
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 1
max_value: 4
Expect column entries to be strings with length equal to the provided value.
Applies to: Column
tests:
- dbt_expectations.expect_column_value_lengths_to_equal:
value: 10
Expect column entries to be strings that match a given regular expression. Valid matches can be found anywhere in the string, for example "[at]+" will identify the following strings as expected: "cat", "hat", "aa", "a", and "t", and the following strings as unexpected: "fish", "dog".
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: "[at]+"
Expect column entries to be strings that do NOT match a given regular expression. The regex must not match any portion of the provided string. For example, "[at]+" would identify the following strings as expected: "fish”, "dog”, and the following as unexpected: "cat”, "hat”.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_match_regex:
regex: "[at]+"
Expect the column entries to be strings that can be matched to either any of or all of a list of regular expressions. Matches can be anywhere in the string.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_match_regex_list:
regex_list: ["@[^.]*", "&[^.]*"]
Expect the column entries to be strings that do not match any of a list of regular expressions. Matches can be anywhere in the string.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_match_regex_list:
regex_list: ["@[^.]*", "&[^.]*"]
Expect column entries to be strings that match a given SQL like
pattern.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_match_like_pattern:
like_pattern: "%@%"
Expect column entries to be strings that do not match a given SQL like
pattern.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_match_like_pattern:
like_pattern: "%&%"
Expect the column entries to be strings that match any of a list of SQL like
patterns.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_match_like_pattern_list:
like_pattern_list: ["%@%", "%&%"]
Expect the column entries to be strings that do not match any of a list of SQL like
patterns.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_not_match_like_pattern_list:
like_pattern_list: ["%@%", "%&%"]
Expect the number of distinct column values to be equal to a given value.
Applies to: Column
tests:
- dbt_expectations.expect_column_distinct_count_to_equal:
value: 10
Expect the number of distinct column values to be greater than a given value.
Applies to: Column
tests:
- dbt_expectations.expect_column_distinct_count_to_be_greater_than:
value: 10
Expect the set of distinct column values to be contained by a given set.
Applies to: Column
tests:
- dbt_expectations.expect_column_distinct_values_to_be_in_set:
value_set: ['a','b','c','d']
Expect the set of distinct column values to contain a given set.
In contrast to expect_column_values_to_be_in_set
this ensures not that all column values are members of the given set but that values from the set must be present in the column.
Applies to: Column
tests:
- dbt_expectations.expect_column_distinct_values_to_contain_set:
value_set: ['a','b']
Expect the set of distinct column values to equal a given set.
In contrast to expect_column_distinct_values_to_contain_set
this ensures not only that a certain set of values are present in the column but that these and only these values are present.
Applies to: Column
tests:
- dbt_expectations.expect_column_distinct_values_to_equal_set:
value_set: ['a','b','c']
Expect the column mean to be between a min_value value and a max_value value (inclusive).
Applies to: Column
tests:
- dbt_expectations.expect_column_mean_to_be_between:
min_value: 0
max_value: 2
Expect the column median to be between a min_value value and a max_value value (inclusive).
Applies to: Column
tests:
- dbt_expectations.expect_column_median_to_be_between:
min_value: 0
max_value: 2
Expect specific provided column quantiles to be between provided min_value and max_value values.
Applies to: Column
tests:
- dbt_expectations.expect_column_quantile_values_to_be_between:
quantile: .95
min_value: 0
max_value: 2
Expect the column standard deviation to be between a min_value value and a max_value value. Uses sample standard deviation (normalized by N-1).
Applies to: Column
tests:
- dbt_expectations.expect_column_stdev_to_be_between:
min_value: 0
max_value: 2
Expect the number of unique values to be between a min_value value and a max_value value.
Applies to: Column
tests:
- dbt_expectations.expect_column_unique_value_count_to_be_between:
min_value: 3
max_value: 3
Expect the proportion of unique values to be between a min_value value and a max_value value.
For example, in a column containing [1, 2, 2, 3, 3, 3, 4, 4, 4, 4], there are 4 unique values and 10 total values for a proportion of 0.4.
Applies to: Column
tests:
- dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
min_value: 0
max_value: .4
Expect the most common value to be within the designated value set
Applies to: Column
tests:
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: [0.5]
top_n: 1
Expect the column max to be between a min and max value
Applies to: Column
tests:
- dbt_expectations.expect_column_max_to_be_between:
min_value: 1
max_value: 1
Expect the column min to be between a min and max value
Applies to: Column
tests:
- dbt_expectations.expect_column_min_to_be_between:
min_value: 0
max_value: 1
Expect the column to sum to be between a min and max value
Applies to: Column
tests:
- dbt_expectations.expect_column_sum_to_be_between:
min_value: 1
max_value: 2
Expect values in column A to be greater than column B.
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: col_numeric_a
column_B: col_numeric_a
or_equal: True
Expect the values in column A to be the same as column B.
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_column_pair_values_to_be_equal:
column_A: col_numeric_a
column_B: col_numeric_a
Expect paired values from columns A and B to belong to a set of valid pairs.
Note: value pairs are expressed as lists within lists
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_column_pair_values_to_be_in_set:
column_A: col_numeric_a
column_B: col_numeric_b
value_pairs_set: [[0, 1], [1, 0], [0.5, 0.5], [0.5, 0.5]]
Expect the values for each record to be unique across the columns listed. Note that records can be duplicated.
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_select_column_values_to_be_unique_within_record:
column_list: ["col_string_a", "col_string_b"]
ignore_row_if: "any_value_is_missing"
Expects that sum of all rows for a set of columns is equal to a specific value
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_multicolumn_sum_to_equal:
column_list: ["col_numeric_a", "col_numeric_b"]
sum_total: 4
Expect that the columns are unique together, e.g. a multi-column primary key.
Applies to: Model, Seed, Source
tests:
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ["date_col", "col_string_b"]
ignore_row_if: "any_value_is_missing"
Expects changes in metric values to be within Z sigma away from a moving average, taking the (optionally logged) differences of an aggregated metric value and comparing it to its value N days ago.
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
date_column_name: date
period: day
lookback_periods: 1
trend_periods: 7
test_periods: 14
sigma_threshold: 3
take_logs: true
Expects (optionally grouped & summed) metric values to be within Z sigma away from the column average
Applies to: Column
tests:
- dbt_expectations.expect_column_values_to_be_within_n_stdevs:
group_by: date_day
sigma_threshold: 3