This repository aims to provide various Databricks tutorials and demos.
If you would like to follow along, check out the Databricks Community Cloud.
This demo is broken into logical sections using the New York City Taxi Tips dataset. Please complete them in the following order:
- Send Data to Azure Event Hub (Python)
- Read Data from Azure Event Hub (Scala)
- Train a Basic Machine Learning Model on Databricks (Scala)
- Create new Send Data Notebook
- Make Streaming Predictions
This demo is broken into logical sections. Please complete them in the following order:
- Setup Environment
- Data Ingestion
- Bronze Data to Silver Data
- A quick ML Model
- Silver Data To Gold Data
- A Few Cool Features of Delta
- Summary
Using Service Principals to Automate the Creation of a Databricks Access Token
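As a rough sketch of that automation, the flow is: exchange the service principal's credentials for an Azure AD token via the client-credentials flow, then call the Databricks Token API with it. The tenant, client, and workspace values below are placeholders, and the token comment/lifetime are illustrative choices, not part of the original demo.

```python
import json
import urllib.parse
import urllib.request

# Well-known Azure AD application ID for the Azure Databricks resource.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def _post(url, data, headers):
    """POST raw bytes and decode the JSON response."""
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def create_databricks_token(tenant_id, client_id, client_secret, workspace_url):
    # 1. Client-credentials flow: trade the service principal's secret
    #    for an Azure AD access token scoped to Databricks.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
    }).encode()
    aad_token = _post(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        form,
        {"Content-Type": "application/x-www-form-urlencoded"},
    )["access_token"]

    # 2. Use the AAD token to mint a Databricks personal access token.
    body = json.dumps({"comment": "automated token", "lifetime_seconds": 3600}).encode()
    result = _post(
        f"{workspace_url}/api/2.0/token/create",
        body,
        {"Authorization": f"Bearer {aad_token}", "Content-Type": "application/json"},
    )
    return result["token_value"]
```

The returned token can then be used as a bearer token for any other Databricks REST call, which is what makes this useful in CI/CD pipelines.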
Delta Lake does not actually support views as first-class objects, but they are a common ask from many clients. Whether views are desired to help enforce row-level security or to provide different presentations of the data, here are a few ways to get it done.
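One simple approach is a metastore SQL view defined over the Delta table. The sketch below builds the DDL for a view that exposes only one region's rows (a row-level-security style filter); the table and column names (`taxi_trips`, `driver_region`) are made up for illustration.

```python
def region_view_sql(table: str, region_col: str, region: str) -> str:
    """Build DDL for a view that exposes only one region's rows."""
    return (
        f"CREATE OR REPLACE VIEW {table}_{region.lower()}_v AS "
        f"SELECT * FROM {table} WHERE {region_col} = '{region}'"
    )

def create_region_view(spark, table, region_col, region):
    # Runs against any SparkSession attached to a metastore;
    # the underlying Delta table is untouched.
    spark.sql(region_view_sql(table, region_col, region))
```

Because the view lives in the metastore rather than in Delta itself, it survives table overwrites but must be granted to users separately from the table.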
Batch processing of changes within a Delta lake is common practice and easy to do. We provide a few examples of how to use Delta Lake's time travel capabilities to see how a table has changed between two versions.
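A minimal sketch of the idea: Delta's `VERSION AS OF` syntax lets you query a table at two points in its history and diff them. The table name and version numbers below are placeholders.

```python
def version_diff_sql(table: str, old_version: int, new_version: int) -> str:
    """Rows present in new_version but not in old_version
    (EXCEPT ALL keeps duplicate rows, unlike plain EXCEPT)."""
    return (
        f"SELECT * FROM {table} VERSION AS OF {new_version} "
        f"EXCEPT ALL "
        f"SELECT * FROM {table} VERSION AS OF {old_version}"
    )

def rows_added_between(spark, table, old_version, new_version):
    # Returns a DataFrame of rows added (or changed) between the two versions.
    return spark.sql(version_diff_sql(table, old_version, new_version))
```

The same versions can be read through the DataFrame API with `spark.read.format("delta").option("versionAsOf", n)`, and `DESCRIBE HISTORY` on the table shows which versions exist.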
An example of using the Auto Loader capabilities for file-based processing. Ensures exactly-once processing of files.
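The core of Auto Loader is a streaming read with the `cloudFiles` source. A minimal sketch, assuming a JSON landing path and a schema-tracking location (both paths are placeholders); exactly-once file processing comes from the checkpoint on the downstream write.

```python
# Illustrative options; cloudFiles.schemaLocation is where Auto Loader
# tracks the inferred schema across restarts.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/_schemas/trips",
}

def read_with_autoloader(spark, input_path, options=AUTOLOADER_OPTIONS):
    """Return a streaming DataFrame that picks up each new file in input_path."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(input_path)
```

The resulting stream is written out with a normal `writeStream` plus a `checkpointLocation`, which is what records which files have already been consumed.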
In this directory I keep a central repository of articles I have written and helpful resource links with short descriptions.
Below are a number of links with quick descriptions of what they cover.
- This blog provides a number of very helpful use cases that can be solved using an upsert operation. The parts I found most interesting were the different actions available when rows are matched or not matched: users have the ability to delete rows, update specific values, insert rows, or update entire rows. The `foreachBatch` function is crucial for CDC operations.
- Python and Scala examples completing an upsert with the `foreachBatch` function.
- Shows various scenarios for updating Delta tables via updates, inserts, and deletes.
  - There is specific information on schema evolution with upsert operations: the schema can evolve when using `insertAll` or `updateAll`, but the operation will fail if you try inserting a row with a column that does not exist yet.
  - There can be 1, 2, or 3 `whenMatched` or `whenNotMatched` clauses. Of these, at most 2 can be `whenMatched` clauses, and at most 1 can be a `whenNotMatched` clause.
  - There are more specifics about what actions each of these clauses can take as well.
  - Automatic Schema Evolution
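The upsert pattern these links describe can be sketched as a `foreachBatch` handler that MERGEs each micro-batch into a Delta table. The table name (`silver_trips`) and key column (`trip_id`) are placeholders, and the sketch assumes the `delta-spark` package is available on the cluster.

```python
def upsert_to_delta(microbatch_df, batch_id):
    """foreachBatch handler: merge one micro-batch into the target Delta table."""
    from delta.tables import DeltaTable

    target = DeltaTable.forName(microbatch_df.sparkSession, "silver_trips")
    (
        target.alias("t")
        .merge(microbatch_df.alias("s"), "t.trip_id = s.trip_id")
        .whenMatchedUpdateAll()      # update entire matched rows
        .whenNotMatchedInsertAll()   # insert entire new rows
        .execute()
    )

# Attached to a stream like so:
# (stream_df.writeStream
#     .foreachBatch(upsert_to_delta)
#     .option("checkpointLocation", "/tmp/_checkpoints/silver_trips")
#     .start())
```

Swapping `whenMatchedUpdateAll` for `whenMatchedDelete`, or adding a condition to either clause, covers the delete and partial-update cases mentioned above.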
Please feel free to recommend demos, or contact me if there are any confusing or broken steps. For any additional comments or questions, email me at [email protected].
These examples are not affiliated with Databricks and are not intended to be official documentation. For official documentation and tutorials, please go to the Databricks Academy or the Databricks blog.