Skip to content

Latest commit

 

History

History
 
 

5-data-modeling

Data modeling

architecture

What is data modeling?

Once the storage step is done loading enriched events into our storage target of choice, we can start to use that data to build intelligence.

Whilst it is possible to query the events directly, it is often more convenient to transform and aggregate the events into a set of tables that are then used by various data consumers within the business. This has a number of advantages:

  1. It guarantees that all users within the business are using the same basic business logic (e.g. identity stitching and sessionization);
  2. It is easier and faster to run queries against the modeled data
  3. It is possible to connect a BI or pivoting tool directly to the aggregate data

What data models are available?

Web model

The new generation of Snowplow officially-supported SQL data models for working with Snowplow data. There are two ways to run the Web model, depending on your choice of tooling to execute SQL against your database.

SQL-Runner version

dbt version

Mobile model

The first generation of Snowplow officially-supported SQL data models for working with Snowplow mobile data can be found in the dedicated data models GitHub repository. Redshift, BigQuery and Snowflake are supported using SQL-Runner, with a dbt based model to follow soon.

Copyright and license

The data models are copyright 2016-2021 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.