Commit 687985e

Fix illegal chars, add some info, other editing
I am fixing 4 illegal characters (> 0x7F): curly apostrophes and an n-dash.

My attention was brought to this article by my scan, for proper Versioning, of all articles that have been directly updated in upstream branch 'release-sqlseattle'.

I was surprised to see stub articles in our formal release-sqlseattle branch, which will be going live in a few weeks. It might be cleaner to keep stub and other incomplete articles in a non-release upstream branch until they are completed and thus ready for release-sqlseattle. Perhaps opinions vary on the approach. When I encountered the stubs, I decided to look through more PolyBase .md files, which led me to this file.

I could find no reason why this article needs to be updated directly in release-sqlseattle, or in other words, why it would need the new Versioning moniker 'sql-server-ver15'.

But when my attention was brought to this article, I could not help but notice a few minor editing issues. For example, the first mention of Transact-SQL must be spelled out that way, and only then can later occurrences abbreviate to T-SQL. And the occurrence here of lowercase t-sql is never proper. I noticed a couple of other minor issues, like the lack of any space between the end of one sentence and the start of the next. Also, the Microsoft style guide explicitly forbids the slash construct of "import/export", partly because the construct harms machine translation into non-English languages.

I have been curious about PolyBase, and felt that parts of the existing draft were not quite giving me all the short introductory info that I wanted. I did a little research elsewhere, and felt compelled to add a little info to the existing draft. But I found it kinda necessary to rephrase or rewrite a paragraph or two to make the added info integrate elegantly. I eventually found myself doing more rewriting than I had planned.

So my ask is that you review this PR. Then, in a Comment in the PR, @mention me to say whether you want me to Close and discard this PR, to #sign-off and Merge it (into 'master'), or whatever else.

Thank you Matthew.
GeneMi (MightyPen)
1 parent 959ccef commit 687985e

1 file changed, with 65 additions and 45 deletions.
@@ -1,6 +1,6 @@
 ---
 title: "PolyBase Guide | Microsoft Docs"
-ms.date: "05/30/2017"
+ms.date: "05/31/2017"
 ms.prod: sql
 ms.reviewer: ""
 ms.suite: "sql"
@@ -14,7 +14,7 @@ f1_keywords:
 helpviewer_keywords:
 - "PolyBase"
 - "PolyBase, overview"
-- "Hadoop import ×"
+- "Hadoop import"
 - "Hadoop export"
 - "Hadoop export, PolyBase overview"
 - "Hadoop import, PolyBase overview"
@@ -23,50 +23,70 @@ ms.author: jroth
 manager: craigg
 ---
 # PolyBase Guide
+
 [!INCLUDE[appliesto-ss-xxxx-asdw-pdw-md](../../includes/appliesto-ss-xxxx-asdw-pdw-md.md)]
-PolyBase is a technology that accesses data outside of the database via the t-sql language. In SQL Server 2016, it allows you to run queries on external data in Hadoop or to import/export data from Azure Blob Storage. Queries are optimized to push computation to Hadoop. In Azure SQL Data Warehouse, you can import/export data from Azure Blob Storage and Azure Data Lake Store.
-
-
-To use PolyBase, see [Get started with PolyBase](../../relational-databases/polybase/get-started-with-polybase.md).
-
-![PolyBase logical](../../relational-databases/polybase/media/polybase-logical.png "PolyBase logical")
-
-## Why use PolyBase?
-To make good decisions, you want to analyze both relational data and other data that is not structured into tables —notably Hadoop. This is difficult to do unless you have a way to transfer data among the different types of data stores. PolyBase bridges this gap by operating on data that is external to SQL Server.
-
-To keep it simple, PolyBase does not require you to install additional software to your Hadoop environment. Querying external data uses the same syntax as querying a database table. This all happens transparently. PolyBase handles all the details behind-the-scenes, and no knowledge about Hadoop is required by the end user to query external tables.
-
-PolyBase can:
-
-- **Query data stored in Hadoop from SQL Server or PDW.** Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.
-
-- **Query data stored in Azure Blob Storage.** Azure blob storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.
-
-- **Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store** Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.
 
-- **Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store.** Archive data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.
-
-- **Integrate with BI tools.** Use PolyBase with Microsoft’s business intelligence and analysis stack, or use any third party tools that are compatible with SQL Server.
-
-## Performance
-
-- **Push computation to Hadoop.**The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance. It uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.
-
-- **Scale compute resources.** To improve query performance, you can use SQL Server [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.
-
-## PolyBase Guide Topics
-This guide includes topics to help you use PolyBase efficiently and effectively.
-
-|||
-|-|-|
-|**Topic**|**Description**|
-|[Get started with PolyBase](../../relational-databases/polybase/get-started-with-polybase.md)|Basic steps to install and configure PolyBase. This shows how to create external objects that point to data in Hadoop or Azure blob storage, and gives query examples.|
-|[PolyBase Versioned Feature Summary](../../relational-databases/polybase/polybase-versioned-feature-summary.md)|Describes which PolyBase features are supported on SQL Server, SQL Database, and SQL Data Warehouse.|
-|[PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md)|Scale out parallelism between SQL Server and Hadoop by using SQL Server scale-out groups.|
-|[PolyBase installation](../../relational-databases/polybase/polybase-installation.md)|Reference and steps for installing PolyBase with the installation wizard or with a command-line tool.|
-|[PolyBase configuration](../../relational-databases/polybase/polybase-configuration.md)|Configure SQL Server settings for PolyBase. For example, configure computation pushdown and kerberos security.|
-|[PolyBase T-SQL objects](../../relational-databases/polybase/polybase-t-sql-objects.md)|Create the T-SQL objects that PolyBase uses to define and access external data.|
-|[PolyBase Queries](../../relational-databases/polybase/polybase-queries.md)|Use T-SQL statements to query, import, or export external data.|
-|[PolyBase troubleshooting](../../relational-databases/polybase/polybase-troubleshooting.md)|Techniques to manage PolyBase queries. Use dynamic management views (DMVs) to monitor PolyBase queries, and learn to read a PolyBase query plan to find performance bottlenecks.|
+PolyBase enables your SQL Server 2016 instance to process Transact-SQL queries that read data from Hadoop. The same query can also access relational tables in your SQL Server. PolyBase enables the same query to also join the data from Hadoop and SQL Server. In SQL Server, an [external table](../../t-sql/statements/create-external-table-transact-sql.md) or [external data source](../../t-sql/statements/create-external-data-source-transact-sql.md) provides the connection to Hadoop.
+
+PolyBase provides these same functionalities for the following SQL products from Microsoft:
+
+- SQL Server 2016, and later versions
+- Analytics Platform System (formerly Parallel Data Warehouse)
+- Azure SQL Data Warehouse
+
+PolyBase pushes some computations to the Hadoop node to optimize the overall query. However, PolyBase external access is not limited to Hadoop. Other unstructured non-relational tables are also supported, such as delimited text files.
+
+#### Data import and export
+
+With the underlying help of PolyBase, T-SQL queries can also import and export data from Azure Blob Storage. Further, PolyBase enables Azure SQL Data Warehouse to import and export data from Azure Data Lake Store, and from Azure Blob Storage.
+
+To use PolyBase, see [Get started with PolyBase](../../relational-databases/polybase/get-started-with-polybase.md).
 
+![PolyBase logical](../../relational-databases/polybase/media/polybase-logical.png "PolyBase logical")
+
+## Why use PolyBase?
+
+In the past it was more difficult to join your SQL Server data with external data. You had the two following unpleasant options:
+
+- Transfer half your data so that all your data was in one format or the other.
+- Query both sources of data, then write custom query logic to join and integrate the data at the client level.
+
+PolyBase avoids those unpleasant options by using T-SQL to join the data.
+
+To keep things simple, PolyBase does not require you to install additional software to your Hadoop environment. You query external data by using the same T-SQL syntax used to query a database table. The support actions implemented by PolyBase all happen transparently. The query author does not need any knowledge about Hadoop.
+
+PolyBase can:
+
+- **Query data stored in Hadoop from SQL Server or PDW.** Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.
+
+- **Query data stored in Azure Blob Storage.** Azure blob storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.
+
+- **Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store.** Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.
+
+- **Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store.** Archive data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.
+
+- **Integrate with BI tools.** Use PolyBase with Microsoft's business intelligence and analysis stack, or use any third party tools that are compatible with SQL Server.
+
+## Performance
+
+- **Push computation to Hadoop.** The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance. It uses statistics on external tables to make the cost-based decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.
+
+- **Scale compute resources.** To improve query performance, you can use SQL Server [PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md). This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.
+
+## PolyBase Guide Topics
+
+This guide includes topics to help you use PolyBase efficiently and effectively.
+
+|||
+|-|-|
+|**Topic**|**Description**|
+|[Get started with PolyBase](../../relational-databases/polybase/get-started-with-polybase.md)|Basic steps to install and configure PolyBase. This shows how to create external objects that point to data in Hadoop or Azure blob storage, and gives query examples.|
+|[PolyBase Versioned Feature Summary](../../relational-databases/polybase/polybase-versioned-feature-summary.md)|Describes which PolyBase features are supported on SQL Server, SQL Database, and SQL Data Warehouse.|
+|[PolyBase scale-out groups](../../relational-databases/polybase/polybase-scale-out-groups.md)|Scale out parallelism between SQL Server and Hadoop by using SQL Server scale-out groups.|
+|[PolyBase installation](../../relational-databases/polybase/polybase-installation.md)|Reference and steps for installing PolyBase with the installation wizard or with a command-line tool.|
+|[PolyBase configuration](../../relational-databases/polybase/polybase-configuration.md)|Configure SQL Server settings for PolyBase. For example, configure computation pushdown and kerberos security.|
+|[PolyBase T-SQL objects](../../relational-databases/polybase/polybase-t-sql-objects.md)|Create the T-SQL objects that PolyBase uses to define and access external data.|
+|[PolyBase Queries](../../relational-databases/polybase/polybase-queries.md)|Use T-SQL statements to query, import, or export external data.|
+|[PolyBase troubleshooting](../../relational-databases/polybase/polybase-troubleshooting.md)|Techniques to manage PolyBase queries. Use dynamic management views (DMVs) to monitor PolyBase queries, and learn to read a PolyBase query plan to find performance bottlenecks.|
+|   |   |
 