Added partitioning and workaround for sql 2008r2 and cleaned up readme for demos
rgward committed Feb 20, 2019
1 parent c4d6a7e commit cb0c893
Showing 10 changed files with 65 additions and 35 deletions.
8 changes: 5 additions & 3 deletions demos/sqlserver/polybase/fundamentals/readme.md
@@ -2,7 +2,7 @@

This folder contains demo scripts to show the basic functionality of Polybase by examining the configuration of nodes through DMVs, creating an external table over HDFS, and monitoring execution details through DMVs.

## Requirements - Install and Configure Polybase

These demos require that you install SQL Server 2019 on Windows Server and configure a head node and at least one compute node (i.e. a scale out group). This demo currently requires SQL Server 2019 CTP 2.3 or higher.

@@ -32,13 +32,15 @@ I also first ensured that the Windows Firewall was configured for SQL Server and

I then used the sp_polybase_join_group procedure per the documentation on bwpolybase2 and bwpolybase3 to join the scale out group. This required restarting the Polybase services on each machine.
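The join itself is a single procedure call on each compute node; a sketch of what that step looks like (16450 is the default Polybase control channel port — verify yours before running):

```sql
-- Run on each compute node (bwpolybase2 and bwpolybase3), then restart the Polybase services.
-- Arguments: head node machine name, DMS control channel port (16450 is the default),
-- and the head node SQL Server instance name.
EXEC sp_polybase_join_group N'bwpolybase', 16450, N'MSSQLSERVER'
GO
```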

## Demo Steps

### Check the Polybase configuration

1. Run the T-SQL commands in the script **polybase_status.sql** to see the configuration of the scale out group and details of the head and compute nodes

2. Use SSMS to browse tables in the DWConfiguration, DWDiagnostics, and DWQueue databases which are installed on all nodes.
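The **polybase_status.sql** script itself is not reproduced here, but the checks are along these lines (a sketch using the SQL Server 2019 Polybase DMVs; run on the head node):

```sql
-- Nodes in the scale out group (head and compute)
SELECT * FROM sys.dm_exec_compute_nodes
GO
-- Health and status details for each node
SELECT * FROM sys.dm_exec_compute_node_status
GO
-- Recent distributed (Polybase) query requests
SELECT * FROM sys.dm_exec_distributed_requests
GO
```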

### Create an external table and track query and polybase execution

1. Download and restore the WideWorldImporters backup from https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/wide-world-importers

2 changes: 1 addition & 1 deletion demos/sqlserver/polybase/readme.md
@@ -6,7 +6,7 @@ The demos are organized into the following folders

## fundamentals

Use these demos to show the basics of Polybase head and compute nodes and the execution lifecycle of an external table over HDFS files. You must go through this demo to set up Polybase before you can run the demos in the sqldatahub folder, unless you already have a SQL Server 2019 CTP 2.3 or higher Polybase cluster.

## sqldatahub

6 changes: 5 additions & 1 deletion demos/sqlserver/polybase/sqldatahub/azuredb/readme.md
@@ -2,8 +2,12 @@

This demo shows you how to set up an Azure SQL Database external data source and table, with examples of how to query the data source and join data to local SQL Server 2019 data. The demo assumes you have installed a SQL Server 2019 Polybase scale out group as documented in the **fundamentals** folder of this overall demo.

## Requirements

1. Create a new database in Azure. For purposes of this demo it doesn't matter whether the database is a Managed Instance or any tier of Azure SQL Database. I called my database **wwiazure**. To make connectivity easier, I created a new virtual network for my Azure SQL Server and included the polybase head node server, bwpolybase, in the same virtual network as the Azure SQL Server. You can read more about how to do this at https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview

2. Connecting to the Azure SQL Server hosting your database, I ran the script **createazuredbtable.sql** to create the table and insert some data. Notice the COLLATE clauses I needed to use to match what WWI uses in the example database. The table created for this demo mimics the **Warehouse.StockItems** table in the WideWorldImporters database.

## Demo Steps

1. On my SQL Server 2019 head node (bwpolybase), I used the **azuredb_external_table.sql** script to create the database scoped credential, external data source, external table, and sample SELECT statements to query the external table and join it with local SQL Server 2019 tables in the WideWorldImporters database. Take note of the COLLATE required to match WWI and the syntax for the external data source to point to the Azure SQL Server.
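The shape of the credential and data source portion of **azuredb_external_table.sql** is roughly the following (a sketch, not the script itself; the server name, login, and secrets are placeholders you must replace):

```sql
-- Placeholder names throughout; see azuredb_external_table.sql for the real script
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>'
GO
CREATE DATABASE SCOPED CREDENTIAL AzureSQLCredentials
WITH IDENTITY = '<azure sql login>', SECRET = '<password>'
GO
-- SQL Server 2019 uses the sqlserver:// connector for Azure SQL Database targets
CREATE EXTERNAL DATA SOURCE AzureSQLDatabase
WITH (
LOCATION = 'sqlserver://<yourserver>.database.windows.net',
PUSHDOWN = ON,
CREDENTIAL = AzureSQLCredentials
)
GO
```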
6 changes: 5 additions & 1 deletion demos/sqlserver/polybase/sqldatahub/cosmosdb/readme.md
Expand Up @@ -2,6 +2,8 @@

This demo shows you how to set up a CosmosDB external data source and table, with examples of how to query the data source and join data to local SQL Server 2019 data. The demo assumes you have installed a SQL Server 2019 Polybase scale out group as documented in the **fundamentals** folder of this overall demo.

## Requirements

1. Create a new database, collection, and document with CosmosDB in Azure. I used the Azure portal to create a new CosmosDB database in the same resource group as my polybase head node (bwpolybase). When I used the portal to create a new CosmosDB instance, I chose the Azure Cosmos DB for MongoDB API for the API selection. I used the Data Explorer tool from the portal to create my database called WideWorldImporters with a collection called Orders. Then I created a new document with field names and values like the following (Note: the _id field was created by Data Explorer and the id field was a default value already provided by the tool):

{
@@ -16,4 +18,6 @@ This demo shows you how to setup a CosmosDB external data source and table with
"ExpectedDeliveryDate" : "2018-05-21"
}

## Demo Steps

1. On my SQL Server 2019 head node (bwpolybase), I used the **cosmosdb_external_table.sql** script to create the database scoped credential, external data source, external table, and sample SELECT statements to query the external table and join it with local SQL Server 2019 tables in the WideWorldImporters database. The **Connection String** option in the portal for the instance shows you the username and password to use. It also has HOST and PORT fields which are used to build the LOCATION syntax for the data source.
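A sketch of the data source portion (placeholders only — the real HOST, PORT, username, and password come from the portal's Connection String blade as described above):

```sql
-- <HOST> and <PORT> come from the Connection String option in the portal
CREATE DATABASE SCOPED CREDENTIAL CosmosDBCredentials
WITH IDENTITY = '<cosmosdb username>', SECRET = '<cosmosdb password>'
GO
-- The Mongo API endpoint is addressed with the mongodb:// connector
CREATE EXTERNAL DATA SOURCE CosmosDB
WITH (
LOCATION = 'mongodb://<HOST>:<PORT>',
PUSHDOWN = ON,
CREDENTIAL = CosmosDBCredentials
)
GO
```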
9 changes: 9 additions & 0 deletions demos/sqlserver/polybase/sqldatahub/hdfs/readme.md
@@ -0,0 +1,9 @@
# SQL Server demo with Polybase for HDFS

## Requirements

Follow all the instructions in the fundamentals folder which is at the same level as the sqldatahub folder.

## Demo Steps

1. Run all the T-SQL commands in **hdfs_external_table.sql**. You will need to edit the appropriate details to point to your Azure storage container including the credential and location for the data source.
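The edits called out above are roughly these (a sketch with placeholders; depending on your build you may also need TYPE = HADOOP on the data source):

```sql
-- Replace the container, account, and key placeholders with your Azure storage details
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>'
GO
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
)
GO
```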
6 changes: 2 additions & 4 deletions demos/sqlserver/polybase/sqldatahub/oracle/readme.md
@@ -2,7 +2,7 @@

This demo shows you how to set up an Oracle external data source and table, with examples of how to query the data source and join data to local SQL Server 2019 data. The demo assumes you have installed a SQL Server 2019 Polybase scale out group as documented in the **fundamentals** folder of this overall demo.

## Requirements - Installing and setting up Oracle

SQL Server external tables should work with most current Oracle versions (11g+), so for this demo you can choose any Oracle installation or platform you like. For my demo, I used Oracle Express 11g in docker containers on Red Hat Enterprise Linux in an Azure Virtual Machine. The following are the steps and scripts I used to install an Oracle instance using a docker container and create a table to be used for the demo. I created my RHEL VM in Azure in the same resource group (bwsql2019demos) as the polybase head node running SQL Server 2019 on Windows Server. On this head node server I then added an entry in the /etc/hosts file for the RHEL Azure VM private IP address with a name of bworacle so I can use this name when creating an external data source.

@@ -34,7 +34,7 @@ sqlplus64 gl/glpwd@localhost:49161/xe @createtab.sql

8. I then executed the **insertdata.sql** script, finding a valid CustomerTransactionID from the Sales.CustomerTransactions table in the WideWorldImporters database. This ID becomes the arref field in the accounts receivable table.

## Demo Steps

1. With everything in hand on my Oracle server, I can now use the **oracle_external_table.sql** script to create the data source and external table.

@@ -43,5 +43,3 @@ Note the syntax for the LOCATION string for the external table which I was required to use:
LOCATION='[XE].[GL].[ACCOUNTSRECEIVABLE]'

This script also includes examples to query the table and join together with the [Sales].[CustomerTransactions] table.
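The data source piece of **oracle_external_table.sql** looks roughly like this (a sketch; the gl/glpwd login and port 49161 match the docker setup earlier, but treat all names as placeholders for your environment):

```sql
-- bworacle is the /etc/hosts alias for the RHEL VM; 49161 is the mapped Oracle XE port
CREATE DATABASE SCOPED CREDENTIAL OracleCredentials
WITH IDENTITY = 'gl', SECRET = 'glpwd'
GO
CREATE EXTERNAL DATA SOURCE OracleServer
WITH (
LOCATION = 'oracle://bworacle:49161',
PUSHDOWN = ON,
CREDENTIAL = OracleCredentials
)
GO
-- The external table then points at the fully qualified, case-sensitive remote name:
-- LOCATION='[XE].[GL].[ACCOUNTSRECEIVABLE]'
```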


11 changes: 6 additions & 5 deletions demos/sqlserver/polybase/sqldatahub/saphana/readme.md
@@ -4,7 +4,9 @@ This is a demo to show how to connect to a SAP HANA database using the odbc conn

This demo assumes you have set up a Polybase scale out group as shown in the fundamentals folder above this folder. NOTE: You must be running SQL Server 2019 CTP 2.3 or higher in order to use this SAP HANA demo.

## Requirements - SAP HANA setup

### SAP HANA Server Setup

You can use an existing SAP HANA system. For purposes of this demo I created an SAP HANA server using the SAP HANA Express Edition template from Azure, which you can read more about at https://azuremarketplace.microsoft.com/ja-/marketplace/apps/sap.hanaexpress?tab=Overview. This is a SUSE Linux VM with SAP HANA installed. I created this VM in the resource group of the polybase head node, bwsql2019demos, so I would be on the same virtual net. I also took the private IP address of the SAP VM and put it in c:\windows\system32\drivers\etc\hosts as bwsaphana so I could just refer to that hostname on my polybase head node.

@@ -23,7 +25,7 @@ https://developers.sap.com/tutorials/hxe-ms-azure-marketplace-getting-started.ht

7. Execute **insertdata.sh** which will insert a row into the Customers table. I used a CustomerID of 100000 because this is far bigger than what the WideWorldImporters database would have in its Sales.Customer table set of ID values.

### Install the SAP HANA Driver

In order to use the general odbc connector built into SQL Server 2019, you must install the driver to connect to your ODBC data source. Since I am going to show you how to connect to SAP HANA, I found the official 64-bit ODBC driver for SAP HANA called HDBODBC. In order to support a scale out group you need to install this on each of the Polybase nodes. For purposes of this demo, I will only install this on the head node, bwpolybase.

@@ -42,7 +44,7 @@ The experience was interesting and here are some tips:
- You want your System DSN to use the right port for SAP HANA. The port is based on the instance number and tenant (database) of the SAP HANA Server. The port will always be 3XXYY where XX = instance number and YY = number for the database. The instance number was 90 from the default Azure template install for me. But what about the VANDELAY database? Turns out there is a view called sys_databases.m_services in the SYSTEMDB database which tells you the SQL port for each database. When I ran a query against this view logged in as SYSTEM I found out the port for VANDELAY was 39041. So I used this port in the DSN configuration.
- Since bwsaphana is in my hosts file pointing to the private IP of the SAP VM, I can use that as the host server name.
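The port lookup described above can be done with a query along these lines in SYSTEMDB (a sketch — the column names are my assumption based on what the view returned):

```sql
-- Run logged in as SYSTEM against SYSTEMDB; shows the SQL port per tenant database
SELECT DATABASE_NAME, SERVICE_NAME, SQL_PORT
  FROM SYS_DATABASES.M_SERVICES
 WHERE SERVICE_NAME = 'indexserver';
-- Per the text above, for the VANDELAY tenant on instance 90 this returned 39041
```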

## Demo Steps

1. All the instructions are in the **sap_external_table.sql** script to create the credential, data source, external table, and query against the remote table.

@@ -55,5 +57,4 @@ CONNECTION_OPTIONS = 'Driver={HDBODBC};ServerNode=bwsaphana:39041',
PUSHDOWN = ON,
CREDENTIAL = SAPHANACredentials
)
GO
34 changes: 20 additions & 14 deletions demos/sqlserver/polybase/sqldatahub/sql2008r2/justwwi_suppliers.sql
@@ -1,9 +1,19 @@
USE [JustWorldImporters]
GO

-- Create a partition function
--
CREATE PARTITION FUNCTION [PF_Supplier_Names](nvarchar(100))
AS RANGE RIGHT FOR VALUES (N'Brooks Brothers', N'Old Suppliers -1', N'Old Suppliers -250000',
N'Old Suppliers -500000', N'Old Suppliers -750000')
GO
-- Create the partition scheme
--
CREATE PARTITION SCHEME [PS_Supplier_Names]
AS PARTITION [PF_Supplier_Names]
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY])
GO
-- Create the table
--

DROP TABLE [Suppliers]
GO
CREATE TABLE [Suppliers](
@@ -35,13 +45,13 @@ CREATE TABLE [Suppliers](
[PostalPostalCode] [nvarchar](10) NOT NULL,
[LastEditedBy] [int] NOT NULL
CONSTRAINT [PK_Purchasing_Suppliers] PRIMARY KEY CLUSTERED
(
[SupplierID] ASC
),
CONSTRAINT [UQ_Purchasing_Suppliers_SupplierName] UNIQUE NONCLUSTERED
(
[SupplierName] ASC
)
) ON [PS_Supplier_Names]([SupplierName])
-- CONSTRAINT [UQ_Purchasing_Suppliers_ID] UNIQUE NONCLUSTERED
--(
-- [SupplierID] ASC
--)
-- Insert some data
--
@@ -51,11 +61,9 @@ INSERT INTO [Suppliers]
VALUES (-1, 'Brooks Brothers', 4, -1, -2, 1, 24161, 24161, 'First US Clothing', 'Bank of New York Mellon', 'New York', NULL, '123456789', NULL, 30, '2121111111', '2121112222', 'brooksbrothers.com', '1 Broadway', NULL, '10004', '1 Broadway', NULL, '10004', 1)
GO


-- Let's go insert 1M fake suppliers
-- (previously commented out due to an issue scanning a large result set against a SQL 2008R2 server;
-- worked around via the CONNECTION_OPTIONS setting in sql2008r2_external_table.sql)
--
SET NOCOUNT ON
GO
BEGIN TRAN
GO
@@ -67,11 +75,9 @@ BEGIN
SET @y = 'Old Supplier'+CAST(@X as nvarchar(10))
INSERT INTO Suppliers VALUES (@x, @Y, 4, -1, -2, 1, 24161, 24161, 'Unknown', 'Unknown', 'Unknown', NULL, '123456789', NULL, 0, '2121111111', '2121112222', 'Unknown', 'Unknown', NULL, '00000', 'Unknown', NULL, '00000', 1)
SET @x = @x - 1
END
GO
COMMIT TRAN
GO
SET NOCOUNT OFF
GO
12 changes: 9 additions & 3 deletions demos/sqlserver/polybase/sqldatahub/sql2008r2/readme.md
@@ -2,8 +2,14 @@

This demo shows you how to set up a SQL Server 2008R2 external data source and table, with examples of how to query the data source and join data to local SQL Server 2019 data. The demo assumes you have installed a SQL Server 2019 Polybase scale out group as documented in the **fundamentals** folder of this overall demo.

## Requirements

Install SQL Server 2008R2. For my environment, I installed SQL Server 2008R2 on Windows Server 2008R2 using Azure with the gallery template provided on Azure. I put this VM in the same resource group of my SQL Server 2019 Windows Server head node so I would be on the same virtual network. I then added the IP address of my SQL Server 2008R2 server in the /etc/hosts file of my SQL Server 2019 head node (bwpolybase) with the convenient name of bwsql2008r2. I also created a new user called sqluser (password is in the scripts to create the external data source), created a database (defaults) called JustWorldImporters, and made sqluser a dbo of that database.

## Demo Steps

1. Connecting to the bwsql2008r2 server, I ran the script **justwwi_suppliers.sql** to create the table and insert some data. Notice in this script I used partitions so that when scanning the remote table from Polybase, each compute node will query a specific set of partitions.

2. On my SQL Server 2019 head node (bwpolybase), I used the **sql2008r2_external_table.sql** script to create the database scoped credential, external data source, external table, and sample SELECT statements to query the external table and join it with local SQL Server 2019 tables in the WideWorldImporters database.

**BONUS**: Connect to your SQL Server 2008R2 instance and run SQL Profiler. Include the Hostname column in the trace. Notice when you scan the entire remote table all 3 nodes query individual partitions.
@@ -21,7 +21,8 @@ CREATE EXTERNAL DATA SOURCE SQLServerInstance
WITH (
LOCATION = 'sqlserver://bwsql2008r2',
PUSHDOWN = ON,
CREDENTIAL = SQLServerCredentials,
-- The CONNECTION_OPTIONS below is a workaround for a bug in SQL 2019 CTP 2.3 under investigation
CONNECTION_OPTIONS = 'UseDefaultEncryptionOptions=false'
)
GO
DROP SCHEMA sqlserver
@@ -68,8 +69,7 @@ CREATE EXTERNAL TABLE sqlserver.suppliers
GO
CREATE STATISTICS SupplierNameStatistics ON sqlserver.suppliers ([SupplierName]) WITH FULLSCAN
GO

-- Scan the table to make sure it works
--
SELECT * FROM sqlserver.suppliers
GO
