Load External Tables Using PolyBase
Hello,
In this blog, we will discuss one more interesting, easy-to-use, and most useful feature of SQL Server. i.e., PolyBase.
Many of us
would be aware of this feature. But if it is new for you, then this POC
would be helpful to understand the same.
Here, we
will discuss PolyBase and will see how we can configure the same to query
external data.
What is PolyBase?
PolyBase is a data virtualization feature for SQL Server.
PolyBase enables your SQL Server instance to query data with T-SQL directly from External sources like SQL Server, Oracle, Teradata, MongoDB, Hadoop clusters, Cosmos DB without separately installing client connection software.
We may have questions like we can configure Linked server as well to query external data. Right! But there are few differences between Polybase and linked server which is well explained here.
The Differences are,
PolyBase |
Linked Servers |
Database scoped object |
Instance scoped object |
Uses ODBC drivers |
Uses OLEDB providers |
Supports read-only operations for
all data sources and insert operation for HADOOP & data pool data source
only |
Supports both read and write operations |
Queries to remote data source from
a single connection can be scaled-out |
Queries to remote data source from
a single connection cannot be scaled-out |
No separate configuration needed
for availability group |
Separate configuration needed for each
instance in availability group |
Basic authentication only |
Basic & integrated
authentication |
Suitable for analytic queries
processing large number of rows |
Suitable for OLTP queries returning
single or few rows |
Queries using external table cannot
participate in distributed transaction |
Distributed queries can participate
in distributed transaction |
This is all
about basic Ideas about PolyBase.
Now we will
see how we can configure external data source using the same.
In order to load External tables using PolyBase, just remember below easy steps.
1) Create Master Key (If it is not available)
2) Create Database Scoped Credentials (Credentials for external data source)
3) Create External Data Source
4) Create External Table
Let’s start with our POC.
Scenario:
There are
two SQL databases names Fireworks and Flameworks.
Fireworks
database is on Instance1 and Flameworks database is on instance2.
We got a requirement to load tables [Flameworks].[dbo].[Flame_Color] into Fireworks database.
Let’s follow the below steps to complete the requirement
Make sure, PolyBase feature is installed and SQL Server PolyBase Engine and SQL Server PolyBase Data Movement service is running on Instance1.
USE [Fireworks]
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'PolyBase147#';
Note:
-- Query sys.symmetric_keys to check the existing database master key.
-- If master key is already present then use the same. If not then create the new one.
-- As soon as you create the master key make sure you take backup of the same for the safe side
Create Database Scoped Credentials
USE [Fireworks]
CREATE DATABASE SCOPED CREDENTIAL External_Credentials
WITH
IDENTITY = 'Superman',
SECRET = 'SaveTheWorld@2021’;
This
credential would be used to authenticate external data sources. Make sure user used
for database scoped credentials should have required access to the target database.
Create
External Data Source
Now Create
Data source using Database scoped credentials and Connection string of External
data source.
USE [Fireworks]
CREATE EXTERNAL DATA SOURCE
Flameworks_Ext
WITH
( LOCATION = 'SQLSERVER://Instance2:1433', -- Connection
string of External Data source
CREDENTIAL = External_Credentials
) ;
Create
External Table
USE [Fireworks]
CREATE EXTERNAL TABLE [dbo].[Flame_Color]
(
[Blue] [nchar](10) NULL,
[Red]
[nchar](10) NULL,
[Orange]
[nchar](10) NULL
)
WITH (LOCATION = '[Flameworks].[dbo].[Flame_Color]', -- Table Location including Database name, Schema and
Table name
DATA_SOURCE = Flameworks_Ext);
Note: It is good to have Table Definition as same as Target Table.
Now we are
done with the configuration of External data source and external table using
PolyBase.
We can view the same under Table-->External Tables
In the same way, we can load multiple tables from the same data source.
Also, the same approach should be used for other data sources like Oracle, MySQL ETC.
Now Let’s
discuss which operations can be performed on External Table.
If we expand the external table then we can only see the column and Statistics folder.
Means we can create statistics on external tables. Create Index is not allowed.
Also, As mentioned above, we can perform Read-Only operations on all data-source and insert operations on Hadoop.
Ref:
Introducing data virtualization with PolyBase - SQL Server | Microsoft Docs
CREATE EXTERNAL TABLE (Transact-SQL) - SQL Server | Microsoft Docs
Learn new thing
ReplyDeleteInteresting 👍👍👍
ReplyDelete