Azure Data Factory Linked Services Parameterization
Linked services (LS) hold the connection information, such as connection strings, needed to reach external resources. In Azure Data Factory (ADF) there are currently close to 100 different LS types.
These connections can be parameterized and the values passed at runtime. For some LS, the parameterization can be done directly in the ADF UI. For the remaining ones, it can be achieved by editing the LS JSON definition.
In this blog post, we’ll go through both options.
Table of Contents
- Prerequisites
- Why parameterize a Linked Service?
- How does it work?
- Building the Demo Scenario
- Executing and Testing the Demo Scenario
- Wrapping Up
Prerequisites
If you want to follow the examples described in this post, I suggest you follow the instructions available here, which will deploy all the objects required for this blog post. All you need is an Azure subscription.
If you don’t want to use that option, make sure you have the following items available:
- An Azure subscription. You may register for a free Azure subscription if you don’t have one yet
- An Azure Resource Group where you will place the resources created. Please refer to Create a Resource Group
- An Azure Data Factory instance. Please refer to Create a Data Factory to see how to create a new instance using the portal
- Two Azure SQL Servers, each with a basic database (cheapest option). Please refer to Create a Single Database. Also, make sure to allow all Azure services in the server-level firewall rules, as well as your own IP as seen here
- Two Azure Data Lake Storage Gen2 accounts, each with a container created. Please refer to Create a Data Lake Storage Account
- Upload the FootballClubs.csv sample file to the container created in the previous step, in one of the storage accounts (this will be the source account)
- Create the dbo.FootballClubs table in both databases and insert the sample records into just one of them (the source database)
- Assign the role Storage Blob Data Contributor on the 2 storage accounts to the Azure Data Factory managed identity. Please refer to Assign Azure roles
Why parameterize a Linked Service?
If you use ADF to orchestrate the movement of data between several locations, it’s quite easy to start accumulating a lot of different LS. You can easily end up with 4 or 5 Azure SQL Databases, 4 or 5 storage accounts, 4 or 5 REST APIs, and so on. This makes the management of an ADF instance harder, not only from a development perspective but also when it comes to migrating objects to other environments.
This is where the parameterization of LS pays off. We can create a single LS per connection type and, by making use of parameters, define at runtime which location we’re trying to reach. These parameters can easily be defined and controlled through metadata and incorporated into the data flow process.
How does it work?
Defining parameters in our LS to make them dynamic means that all downstream components (datasets, activities and pipelines) inherit that dynamic behaviour. It is through a set of parameter values passed between these components that we define what is executed at runtime.
Looking at the image above, and assuming a Data Lake Storage Gen2 example, we have the following:
- A parameter LS_AccountURL defined at the LS level, which is assigned to the URL attribute of the LS
- A parameter DS_AccountURL defined at the dataset level, which is assigned to the dataset's LS property LS_AccountURL
- Two parameters, PL_SourceAccountURL and PL_TargetAccountURL, defined at the pipeline level, which are used in a copy activity to set the dataset property DS_AccountURL
When we execute the pipeline, we assign values to the 2 pipeline parameters. These values are passed to the copy activity (and to any other activity that uses them), then to the dataset and, finally, to the LS. This allows us to reach 2 different storage accounts while having only 1 LS.
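To make the chain concrete, here is a rough sketch of how each hop can be expressed in JSON, using the parameter names from the diagram together with the LS and dataset names we'll create later in this post. These are illustrative fragments rather than complete object definitions, and the JSON generated by the UI may differ slightly.
The LS references its own parameter in the url property:
"typeProperties": {
    "url": "@{linkedService().LS_AccountURL}"
}
The dataset feeds that LS parameter from one of its own parameters:
"linkedServiceName": {
    "referenceName": "LS_DataLakeStorage",
    "type": "LinkedServiceReference",
    "parameters": {
        "LS_AccountURL": "@dataset().DS_AccountURL"
    }
}
And the copy activity feeds the dataset parameter from a pipeline parameter:
"inputs": [
    {
        "referenceName": "GenericDelimitedFile",
        "type": "DatasetReference",
        "parameters": {
            "DS_AccountURL": "@pipeline().parameters.PL_SourceAccountURL"
        }
    }
]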
Building the Demo Scenario
The examples we’re going to build are simple: we’ll move some data between 2 Azure SQL Databases and copy a .csv file from one storage account to another.
We’ll parameterize the Azure SQL Database LS using the ADF UI and parameterize the Storage Account LS by manipulating the JSON directly. After that, we’ll create 2 datasets (one for each type of linked service) and 2 pipelines to move the data.
Parameterize the Azure SQL Database Linked Service
If you used the GitHub content to deploy the objects required to follow this blog post, you’ll have two LS already created: one for an Azure SQL Database and another one for an Azure Data Lake Storage Gen2 (if you didn’t, you’ll need to create them).
- We’ll start by editing the Azure SQL Database LS. Click on the object to open it.
- Create 4 parameters named ServerName, DatabaseName, Username and Password. After that, fill in the details of the LS as seen in the image below and click on the Apply button.
- Fully qualified domain name - @{linkedService().ServerName}
- Database name - @{linkedService().DatabaseName}
- Authentication type - leave default (SQL Authentication)
- User name - @{linkedService().Username}
- Password - @{linkedService().Password}
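Although the UI takes care of everything here, it helps to know roughly what the resulting JSON looks like. The sketch below shows one possible shape, using a connection-string-based definition; the exact properties generated by the UI depend on the connector version, but the key part is the @{linkedService().<ParameterName>} expressions. In a real scenario you would typically keep the password in Azure Key Vault rather than passing it around as a plain parameter.
{
    "name": "LS_SQLServer",
    "properties": {
        "type": "AzureSqlDatabase",
        "parameters": {
            "ServerName": { "type": "string" },
            "DatabaseName": { "type": "string" },
            "Username": { "type": "string" },
            "Password": { "type": "string" }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": "Server=tcp:@{linkedService().ServerName},1433;Database=@{linkedService().DatabaseName};User ID=@{linkedService().Username};Password=@{linkedService().Password};Encrypt=True;Connection Timeout=30"
        }
    }
}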
Parameterize the Azure Data Lake Storage Linked Service
We’re going to edit this specific LS by directly manipulating the JSON that defines the object.
- Click on the Code button to see the JSON definition of the object.
{
"name": "LS_DataLakeStorage",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureBlobFS",
"typeProperties": {
"url": "https://youraccountnamehere.dfs.core.windows.net"
},
"annotations": []
}
}
- To parameterize this LS, we’ll need to add a parameter, along with its data type, to the JSON definition and replace the url value with an expression referencing that parameter. The definition will look like the one below. Click on the Apply button.
{
"name": "LS_DataLakeStorage",
"properties": {
"parameters": {
"AccountUrl": {
"type": "string"
}
},
"annotations": [],
"type": "AzureBlobFS",
"typeProperties": {
"url": "@{linkedService().AccountUrl}"
}
},
"type": "Microsoft.DataFactory/factories/linkedservices"
}
Create an Azure SQL Database Dataset
- Go to the Author page and select the option to create a New dataset. Once the window opens, choose the Azure SQL Database option and click on Continue.
- On the next screen, name the dataset GenericSQLTable and choose the LS already created, LS_SQLServer. Click on the OK button.
- Once you click on the OK button, a new tab will open with the dataset just created. The first thing you’ll notice is that the dataset has 4 LS properties, corresponding to the 4 parameters we defined on the LS. Choose the Parameters tab and create 5 parameters:
- LS_ServerName
- LS_DatabaseName
- LS_Username
- LS_Password
- TableName
- Go back to the Connection tab and input the 5 parameters we just defined as seen below.
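For reference, the dataset JSON should end up looking roughly like the sketch below. The exact typeProperties (for example tableName versus separate schema and table properties) depend on the connector version, so treat this as a sketch rather than the exact output of the UI.
{
    "name": "GenericSQLTable",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_SQLServer",
            "type": "LinkedServiceReference",
            "parameters": {
                "ServerName": "@dataset().LS_ServerName",
                "DatabaseName": "@dataset().LS_DatabaseName",
                "Username": "@dataset().LS_Username",
                "Password": "@dataset().LS_Password"
            }
        },
        "parameters": {
            "LS_ServerName": { "type": "string" },
            "LS_DatabaseName": { "type": "string" },
            "LS_Username": { "type": "string" },
            "LS_Password": { "type": "string" },
            "TableName": { "type": "string" }
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "typeProperties": {
            "tableName": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}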
Create an Azure Data Lake Storage Gen2 Dataset
- Go to the Author page and select the option to create a New dataset. Once the window opens, choose the Azure Data Lake Storage Gen2 option and click on Continue.
- On the next screen, select the option Delimited Text and click on the Continue button.
- On the next screen, name the dataset GenericDelimitedFile and choose the LS already created, LS_DataLakeStorage. Click on the OK button.
- As before, a new tab will open with the dataset just created, this time with 1 LS property. Choose the Parameters tab and create 3 parameters:
- LS_AccountUrl
- ContainerName
- FileName
- Go back to the Connection tab and input the 3 parameters we just defined as seen below. Leave all other options as default.
- Click the Publish button to save the 2 datasets just created. When a new window opens on the right-hand side, click the Publish button again.
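For reference, the GenericDelimitedFile dataset JSON should look roughly like the sketch below (keeping the default delimited text options); again, treat it as a sketch rather than the exact output of the UI.
{
    "name": "GenericDelimitedFile",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_DataLakeStorage",
            "type": "LinkedServiceReference",
            "parameters": {
                "AccountUrl": "@dataset().LS_AccountUrl"
            }
        },
        "parameters": {
            "LS_AccountUrl": { "type": "string" },
            "ContainerName": { "type": "string" },
            "FileName": { "type": "string" }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": {
                    "value": "@dataset().ContainerName",
                    "type": "Expression"
                },
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ","
        }
    }
}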
Create Pipeline to Copy SQL Database Data
- Go to the Author page and select the option to create a New pipeline. A new tab will open with the new pipeline created.
- Add a Copy data activity to the pipeline.
- On the Parameters tab of the pipeline, define the following 10 parameters:
- SourceServer
- SourceDatabase
- SourceUsername
- SourcePassword
- SourceTable
- TargetServer
- TargetDatabase
- TargetUsername
- TargetPassword
- TargetTable
- Click on the Copy data activity in the pipeline and, in the Source tab of the activity, select GenericSQLTable as the source dataset. Map the Source* parameters we just defined to the Dataset properties (see image below).
- In the Sink tab of the Copy data activity, select GenericSQLTable as the sink dataset and map the Target* parameters to the Dataset properties (see image below).
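For reference, the copy activity in this pipeline ends up looking roughly like the sketch below. Only the parts relevant to parameter passing are shown, the activity name is illustrative, and the UI may wrap each expression in a more verbose value/Expression object.
{
    "name": "CopySqlData",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink": { "type": "AzureSqlSink" }
    },
    "inputs": [
        {
            "referenceName": "GenericSQLTable",
            "type": "DatasetReference",
            "parameters": {
                "LS_ServerName": "@pipeline().parameters.SourceServer",
                "LS_DatabaseName": "@pipeline().parameters.SourceDatabase",
                "LS_Username": "@pipeline().parameters.SourceUsername",
                "LS_Password": "@pipeline().parameters.SourcePassword",
                "TableName": "@pipeline().parameters.SourceTable"
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "GenericSQLTable",
            "type": "DatasetReference",
            "parameters": {
                "LS_ServerName": "@pipeline().parameters.TargetServer",
                "LS_DatabaseName": "@pipeline().parameters.TargetDatabase",
                "LS_Username": "@pipeline().parameters.TargetUsername",
                "LS_Password": "@pipeline().parameters.TargetPassword",
                "TableName": "@pipeline().parameters.TargetTable"
            }
        }
    ]
}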
Create Pipeline to Copy Storage Account Blob
As in the previous section, create a new pipeline and add a Copy data activity to it.
On the Parameters tab of the pipeline, define the following 6 parameters:
- SourceAccountUrl
- SourceContainer
- SourceFileName
- TargetAccountUrl
- TargetContainer
- TargetFileName
- Click on the Copy data activity in the pipeline and, in the Source tab of the activity, select GenericDelimitedFile as the source dataset. Map the Source* parameters we just defined to the Dataset properties (see image below).
- In the Sink tab of the Copy data activity, select GenericDelimitedFile as the sink dataset and map the Target* parameters to the Dataset properties (see image below; a JSON sketch of this mapping follows the list).
- Click the Publish button to save the 2 pipelines just created. When a new window opens on the right-hand side, click the Publish button again.
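As with the first pipeline, the source and sink dataset references of this copy activity simply map the pipeline parameters to the dataset parameters, roughly as sketched below (only the inputs/outputs fragment is shown):
"inputs": [
    {
        "referenceName": "GenericDelimitedFile",
        "type": "DatasetReference",
        "parameters": {
            "LS_AccountUrl": "@pipeline().parameters.SourceAccountUrl",
            "ContainerName": "@pipeline().parameters.SourceContainer",
            "FileName": "@pipeline().parameters.SourceFileName"
        }
    }
],
"outputs": [
    {
        "referenceName": "GenericDelimitedFile",
        "type": "DatasetReference",
        "parameters": {
            "LS_AccountUrl": "@pipeline().parameters.TargetAccountUrl",
            "ContainerName": "@pipeline().parameters.TargetContainer",
            "FileName": "@pipeline().parameters.TargetFileName"
        }
    }
]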
Executing and Testing the Demo Scenario
We’re now going to execute the 2 pipelines created. The first will copy records from a table in our source database to our target database. The second will copy a blob from our source storage account to our target storage account.
First, let’s check our database tables to make sure we only have records in our source table. The easiest way to do that is to go to the Azure portal, select the database resource and choose the option Query editor (preview). You’ll see a SQL server authentication area where you need to enter the password defined when deploying the resources.
After logging in, execute the following query:
SELECT * FROM dbo.FootballClubs
In the source database, you should see 9 records being returned.
In the target database, there won’t be any records.
To check our storage accounts, the best option is again to use the portal, select the storage resource and choose the option Containers. You should then see a container named sourcecontainer or targetcontainer depending on which storage account you’re looking at.
Within the container sourcecontainer, you should see the file FootballClubs.csv.
The container targetcontainer should be empty.
Copy Azure SQL Database data
- Back in ADF, open pipeline1 and choose the option Trigger now.
- A window will open on the right-hand side asking for values of all the parameters we defined in the pipeline. Input the following values and click on the OK button.
- SourceServer - fully qualified name of the server such as sourcesqlsrvXXXXXX.database.windows.net
- SourceDatabase - SourceDB
- SourceUsername - saXXXXXX (name of the server admin that can be seen on the portal’s server page)
- SourcePassword - password you defined when deploying the resources
- SourceTable - FootballClubs
- TargetServer - fully qualified name of the server such as targetsqlsrvXXXXXX.database.windows.net
- TargetDatabase - TargetDB
- TargetUsername - saXXXXXX (name of the server admin that can be seen on the portal’s server page)
- TargetPassword - password you defined when deploying the resources
- TargetTable - FootballClubs
- Since the pipeline is now running, go to the Monitor page; after a few seconds you should see a Succeeded status for the pipeline. Click on the pipeline name.
- Once you click on the pipeline name, you’ll be taken to a page where you can see that the copy activity completed successfully, along with the details of the copy activity.
- We can see in the copy activity details that 9 rows were copied. To confirm, go back to the target database in the portal and run the same query we executed before running the pipeline. You should now get 9 rows back.
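As a side note, if you prefer to trigger this run programmatically instead of through the UI (for example via the ADF REST API's pipeline Create Run call or the Azure SDKs), the same parameter values can be supplied as a plain JSON object. The values below are just the placeholders used throughout this post.
{
    "SourceServer": "sourcesqlsrvXXXXXX.database.windows.net",
    "SourceDatabase": "SourceDB",
    "SourceUsername": "saXXXXXX",
    "SourcePassword": "<password defined during deployment>",
    "SourceTable": "FootballClubs",
    "TargetServer": "targetsqlsrvXXXXXX.database.windows.net",
    "TargetDatabase": "TargetDB",
    "TargetUsername": "saXXXXXX",
    "TargetPassword": "<password defined during deployment>",
    "TargetTable": "FootballClubs"
}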
Copy Storage Account Blob
Back in ADF, open pipeline2 and choose the option Trigger now.
A window will open on the right-hand side asking for values of all the parameters we defined in the pipeline. Input the following values and click on the OK button.
- SourceAccountUrl - fully qualified URL of the source storage account, such as https://stsourceXXXX.dfs.core.windows.net/
- SourceContainer - sourcecontainer
- SourceFileName - FootballClubs.csv
- TargetAccountUrl - fully qualified URL of the target storage account, such as https://sttargetXXXX.dfs.core.windows.net/
- TargetContainer - targetcontainer
- TargetFileName - FootballClubs.csv
- We go back to the Monitor page to see the status of the pipeline. Click on the pipeline name.
- We can see in the copy activity details that 1 file was copied. To make sure that happened, we go to our target storage account in the portal and check the contents of the container. There should now be one file named FootballClubs.csv.
Wrapping Up
In this blog post we saw how to make use of parameterized LS in ADF. The parameterization of LS gives a lot of flexibility when designing data flows and is an important building block for achieving a metadata-driven process.
Don’t forget to delete any resources you’ve created to follow this blog post if you no longer plan on using them.
Thanks for reading!