Load External Tables Using PolyBase

Hello, 

In this blog, we will discuss one more interesting, easy-to-use, and most useful feature of SQL Server. i.e., PolyBase.

Many of us would be aware of this feature. But if it is new for you, then this POC would be helpful to understand the same.

Here, we will discuss PolyBase and will see how we can configure the same to query external data.

What is PolyBase?

PolyBase is a data virtualization feature for SQL Server.

PolyBase enables your SQL Server instance to query data with T-SQL directly from External sources like SQL Server, Oracle, Teradata, MongoDB, Hadoop clusters, Cosmos DB without separately installing client connection software.

We may have questions like we can configure Linked server as well to query external data. Right! But there are few differences between Polybase and linked server which is well explained here.

The Differences are,

PolyBase

Linked Servers

Database scoped object

Instance scoped object

Uses ODBC drivers

Uses OLEDB providers

Supports read-only operations for all data sources and insert operation for HADOOP & data pool data source only

Supports both read and write operations

Queries to remote data source from a single connection can be scaled-out

Queries to remote data source from a single connection cannot be scaled-out

No separate configuration needed for availability group

Separate configuration needed for each instance in availability group

Basic authentication only

Basic & integrated authentication

Suitable for analytic queries processing large number of rows

Suitable for OLTP queries returning single or few rows

Queries using external table cannot participate in distributed transaction

Distributed queries can participate in distributed transaction

This is all about basic Ideas about PolyBase.

Now we will see how we can configure external data source using the same.

In order to load External tables using PolyBase, just remember below easy steps.

1) Create Master Key (If it is not available)

2) Create Database Scoped Credentials (Credentials for external data source)

3) Create External Data Source

4) Create External Table


Let’s start with our POC.

Scenario:

There are two SQL databases names Fireworks and Flameworks.

Fireworks database is on Instance1 and Flameworks database is on instance2.

We got a requirement to load tables [Flameworks].[dbo].[Flame_Color] into Fireworks database.

Let’s follow the below steps to complete the requirement

Make sure, PolyBase feature is installed and SQL Server PolyBase Engine and SQL Server PolyBase Data Movement service is running on Instance1.


Create Master Key

USE [Fireworks]

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'PolyBase147#';

Note: 

-- Query sys.symmetric_keys to check the existing database master key.

-- If master key is already present then use the same. If not then create the new one.

-- As soon as you create the master key make sure you take backup of the same for the safe side


Create Database Scoped Credentials

USE [Fireworks]

CREATE DATABASE SCOPED CREDENTIAL External_Credentials

WITH

IDENTITY = 'Superman',

SECRET = 'SaveTheWorld@2021’;

This credential would be used to authenticate external data sources. Make sure user used for database scoped credentials should have required access to the target database.


Create External Data Source

Now Create Data source using Database scoped credentials and Connection string of External data source. 

USE [Fireworks]

CREATE EXTERNAL DATA SOURCE Flameworks_Ext

WITH

  ( LOCATION = 'SQLSERVER://Instance2:1433', -- Connection string of External Data source

    CREDENTIAL = External_Credentials   

  ) ;


Create External Table

USE [Fireworks]

CREATE EXTERNAL TABLE [dbo].[Flame_Color]

        (

            [Blue] [nchar](10) NULL,

            [Red] [nchar](10) NULL,

            [Orange] [nchar](10) NULL

        )

WITH (LOCATION = '[Flameworks].[dbo].[Flame_Color]', -- Table Location including Database name, Schema and Table name

DATA_SOURCE = Flameworks_Ext);

Note: It is good to have Table Definition as same as Target Table. 

Now we are done with the configuration of External data source and external table using PolyBase.

We can view the same under Table-->External Tables

In the same way, we can load multiple tables from the same data source.

Also, the same approach should be used for other data sources like Oracle, MySQL ETC.

Now Let’s discuss which operations can be performed on External Table.

If we expand the external table then we can only see the column and Statistics folder.

Means we can create statistics on external tables. Create Index is not allowed.

Also, As mentioned above, we can perform Read-Only operations on all data-source and insert operations on Hadoop.


Thank You! 

Ref:






Comments

Post a Comment

Popular posts from this blog

What is DACPAC and how to use it?

Backdoor in SQL server. Is this hacking, loop hole or Feature ?

Always on setup end to end - Part 1 || Pre-requisites