Coach

Overview

This workshop is intended to give Data Engineers a level 400 understanding of the Modern Data Warehouse architecture and the development skills to build it with Azure Synapse Analytics. First, data engineers will learn how to migrate their on-premises SQL Server workloads to Azure Synapse Analytics. The workshop will also provide the skills and best practices to integrate a Data Lake into the existing data warehouse platform. This requires refactoring the existing ETL (an SSIS package) into Azure Data Factory pipelines. Additionally, Modern Data Warehouse platforms are starting to integrate real-time data pipelines, so participants will stream clickstream data into the data lake and view it with Azure Databricks. Lastly, participants will build a Power BI data model and tune both it and the Synapse platform for optimal performance, showcasing Synapse Analytics performance through dashboards.

The format we're following is similar to other initiatives such as OpenHack and What the Hack. The material is intended to be light on presentations and heavy on hands-on experience. Participants will spend the majority of their time working on challenges. The challenges are not designed to be hands-on labs, but rather business problems with success criteria. The focus is on encouraging participants to think about what they're doing rather than blindly following the steps of a lab.

Expected / Suggested Timings

The following are the expected timings for a standard delivery.

| Topic | Duration |
| --- | --- |
| Presentation 0: Welcome and Introduction | 5 mins |
| Challenge 0: Environment Setup | 30 mins |
| Presentation 1: Intro to Modern Data Warehouse | 30 mins |
| Challenge 1: Data Warehouse Migration | 240 mins |
| Challenge 2: Data Lake Integration | 120 mins |
| Challenge 3: Data Pipeline Migration | 240 mins |
| Challenge 4: Real-time Data Pipelines | 120 mins |
| Challenge 5: Analytics Migration | 120 mins |
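Summing the durations gives the total contact time for a standard delivery, which helps when deciding how many days to book. The Python sketch below does that arithmetic. The names `AGENDA`, `total_minutes`, `hours_and_minutes`, and `days_needed` are illustrative only, and the 6-hours-per-day figure in `days_needed` is a hypothetical planning assumption, not part of the official timings. Breaks are not counted because the table does not list them.

```python
import math

# Agenda durations in minutes, taken from the timing table above.
AGENDA = [
    ("Presentation 0: Welcome and Introduction", 5),
    ("Challenge 0: Environment Setup", 30),
    ("Presentation 1: Intro to Modern Data Warehouse", 30),
    ("Challenge 1: Data Warehouse Migration", 240),
    ("Challenge 2: Data Lake Integration", 120),
    ("Challenge 3: Data Pipeline Migration", 240),
    ("Challenge 4: Real-time Data Pipelines", 120),
    ("Challenge 5: Analytics Migration", 120),
]


def total_minutes(agenda):
    """Sum the duration of every agenda item (breaks are not included)."""
    return sum(minutes for _, minutes in agenda)


def hours_and_minutes(minutes):
    """Split a minute count into (hours, minutes)."""
    return divmod(minutes, 60)


def days_needed(minutes, minutes_per_day=360):
    """Round up to whole delivery days.

    ASSUMPTION: minutes_per_day=360 (6 hours of hack time per day) is a
    hypothetical planning figure, not part of the official timings.
    """
    return math.ceil(minutes / minutes_per_day)


if __name__ == "__main__":
    total = total_minutes(AGENDA)
    hours, mins = hours_and_minutes(total)
    print(f"Total: {total} mins ({hours} h {mins} min)")
    print(f"Days at 6 h/day: {days_needed(total)}")
```

With these timings the content totals 905 minutes (15 h 5 min), so at an assumed 6 hours of hack time per day a delivery spans 3 days. Adjust `minutes_per_day` to match the actual schedule.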

Content

Delivering this hack requires a variety of supporting content, which is indexed below.

Challenges

  1. Data Warehouse Migration
  2. Data Lake Integration
  3. Data Pipeline Migration
  4. Real-time Data Pipelines
  5. Analytics Migration

Ideas for other Challenges (Kanban Board)

This area is for us to keep a running list of things we would like to incorporate into the Core or Optional challenges. Please contact Jason Virtue (repo owner) if you would like to pick one of these to work on, or want to add a new one yourself. Help and collaboration are always welcome.

  1. Set up incremental loads in the SSIS jobs
  2. Deploy the SSIS job to the Azure-SSIS Integration Runtime and SSIS Catalog in Azure Data Factory
  3. Generate new data and load it into Synapse
  4. Deploy an Azure Databricks workspace, mount your new storage, and enable interactive queries and analytics
  5. Refactor the PolyBase T-SQL code to use Python or Scala
  6. Build these data pipelines with Azure Data Factory Mapping Data Flows
  7. Set up an external table in Azure Synapse Analytics
  8. Create a Power BI report that uses the clickstream data
  9. Recreate this pipeline using a Synapse Spark pool