[EN] Azure Data Factory V2 and Azure Automation – Running pipeline from runbook with PowerShell

This post explains things that are difficult to find even in English. That's why I will break my rule and not write it in my native language! For the Polish version, head over to Google Translate :>

 

Introduction


Azure Automation is essentially a platform for running PowerShell and Python scripts in the cloud.

In marketing language, it's a Swiss Army knife 😛

Here is how Microsoft describes it:

Azure Automation delivers a cloud-based automation and configuration service that provides consistent management across your Azure and non-Azure environments. It consists of process automation, update management, and configuration features. Azure Automation provides complete control during deployment, operations, and decommissioning of workloads and resources.

Apart from this gibberish, I will point out some important issues…

Know your Automation

  • It has a so-called "feature", Fair Share, which basically prevents you from running scripts for longer than 3 hours.
  • More precisely, it will pause your script after 3 hours. If you didn't implement it as a workflow with checkpoints, it will RESTART your script from the beginning.
  • If you did implement checkpoints, it will resume your script from the last known checkpoint. BUT it will do this only 3 times! So you cannot implement logic that takes more than 9 hours to process…
  • The workaround is to connect your own machine (server or laptop) as a hybrid worker.

Read more about fair share here: https://docs.microsoft.com/en-us/azure/automation/automation-runbook-execution#fair-share
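
To make the checkpoint idea concrete, here is a minimal sketch of a PowerShell Workflow runbook that saves its state after each unit of work. The workflow name and the list of batches are hypothetical placeholders, and the runbook has to be created with the type "PowerShell Workflow" rather than plain "PowerShell".

```powershell
# Hypothetical workflow runbook; the name and the batch list are placeholders.
workflow Invoke-LongRunningJob
{
    $batches = 1..5

    foreach ($batch in $batches)
    {
        # Replace this line with the real work for the batch.
        Write-Output "Processing batch $batch"

        # Persist the workflow state. If fair share unloads the job,
        # it resumes after the last checkpoint instead of from the top.
        Checkpoint-Workflow
    }
}
```

The smaller the unit of work between two checkpoints, the less gets repeated after a resume.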

 

Since Azure Data Factory cannot simply pause and resume an activity, we have to assume that the pipeline will not run for more than 3 hours.

Any other scenario requires you to write custom logic, perhaps splitting the pipeline into shorter ones and implementing checkpoints between their runs…

Preparations


Before we create the runbook, we must set up a credential and some variables.

 

Adding credential

We have to set up a credential that PowerShell will use to start and monitor the pipeline run in Azure Data Factory V2.

  1. Go to your Automation account and, under Shared Resources, click "Credentials".
  2. Add a credential. It must be an account with privileges to run and monitor a pipeline in ADF. I will name it "AzureDataFactoryUser". Set the login and password.

 

Adding variables

We will use variables to parametrize some account information, so that it is not hardcoded in our script.

  1. Go to your Automation account and, under Shared Resources, click "Variables".
  2. Add four string variables and set their values. The first points to the credential name, the second holds the data factory name, the third the resource group name, and the fourth the ADF's subscription ID.
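
If you prefer scripting over clicking, the same four variables can be created with the AzureRM.Automation cmdlets. This is only a sketch: every name and value below (the resource groups, the Automation account, the data factory and the "ADF-" variable names) is a placeholder I made up, and it assumes you are already signed in to Azure from your workstation.

```powershell
# Run from a machine with the AzureRM modules installed, after signing in
# (for example with Add-AzureRmAccount). All names and values are placeholders.
$automation = @{
    ResourceGroupName     = 'my-automation-rg'
    AutomationAccountName = 'my-automation-account'
}

$variables = @{
    'ADF-CredentialName'    = 'AzureDataFactoryUser'
    'ADF-DataFactoryName'   = 'my-data-factory'
    'ADF-ResourceGroupName' = 'my-adf-rg'
    'ADF-SubscriptionId'    = '00000000-0000-0000-0000-000000000000'
}

foreach ($name in $variables.Keys)
{
    # -Encrypted is mandatory; $false keeps the value readable in the portal.
    New-AzureRmAutomationVariable @automation -Name $name -Value $variables[$name] -Encrypted $false
}
```

Inside a runbook, each value can then be read with Get-AutomationVariable, for example `Get-AutomationVariable -Name 'ADF-DataFactoryName'`.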
 

 

Adding AzureRM.DataFactoryV2 module

You have to add the PowerShell module to your Automation account. Just go to Modules, click "Browse gallery" and search for "AzureRM.DataFactoryV2".

Select it from the results list and click "Import".

 

Creating the runbook

Now we can create a PowerShell runbook.

Bear in mind that working with PowerShell in the Azure Portal is not the best way to create, debug and test your runbooks. I really suggest using the PowerShell ISE add-on for Azure Automation. Go to https://azure.microsoft.com/en-us/blog/announcing-azure-automation-powershell-ise-add-on/ and see for yourself.
  1. Go to the Automation portal and, under "PROCESS AUTOMATION", click "Runbooks".
  2. Select "Add a runbook".
  3. We will use quick create, so select "Create a new runbook", then name it and set the type to "PowerShell".
  4. Paste the script below in "Edit" mode, then save it and publish.

 

PowerShell script


Parameters

The script takes two parameters:

PipelineName – the name of the pipeline to run

CheckLoopTime – the number of seconds between status checks of the triggered pipeline run

 

Invoke-AzureRmDataFactoryV2Pipeline is the cmdlet I use to trigger a pipeline. Unfortunately, it is an asynchronous operation: it returns as soon as the run is started, so after triggering we have to check the run status periodically.
The script does this in a simple loop, waiting before every iteration. The number of seconds to wait is a parameter. Every iteration also prints the last known pipeline status and the timestamp of that check.
Any final status other than "Succeeded" is treated as a failure.

 

The code
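
What follows is a minimal sketch of a runbook along the lines described above, not necessarily identical to the original script. The four variable asset names (the "ADF-" prefix), the default of 60 seconds for CheckLoopTime and the list of status strings treated as "still running" are my assumptions, so adjust them to your setup.

```powershell
param
(
    [Parameter(Mandatory = $true)]
    [string] $PipelineName,

    # Seconds to wait between status checks (the default is an assumption).
    [Parameter(Mandatory = $false)]
    [int] $CheckLoopTime = 60
)

$ErrorActionPreference = 'Stop'

# Asset names are assumptions; use the names you gave your variables.
$credentialName    = Get-AutomationVariable -Name 'ADF-CredentialName'
$dataFactoryName   = Get-AutomationVariable -Name 'ADF-DataFactoryName'
$resourceGroupName = Get-AutomationVariable -Name 'ADF-ResourceGroupName'
$subscriptionId    = Get-AutomationVariable -Name 'ADF-SubscriptionId'

# Sign in with the stored credential and select the ADF subscription.
$credential = Get-AutomationPSCredential -Name $credentialName
Add-AzureRmAccount -Credential $credential | Out-Null
Set-AzureRmContext -SubscriptionId $subscriptionId | Out-Null

# Trigger the pipeline; the cmdlet returns the run ID immediately.
$runId = Invoke-AzureRmDataFactoryV2Pipeline `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -PipelineName $PipelineName
Write-Output "Pipeline '$PipelineName' started, run ID: $runId"

# Statuses assumed to mean "not finished yet".
$runningStatuses = @('Queued', 'InProgress', 'Canceling')

do
{
    Start-Sleep -Seconds $CheckLoopTime

    $run = Get-AzureRmDataFactoryV2PipelineRun `
        -ResourceGroupName $resourceGroupName `
        -DataFactoryName $dataFactoryName `
        -PipelineRunId $runId

    $timestamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'
    Write-Output "$timestamp  status: $($run.Status)"
}
while ($runningStatuses -contains $run.Status)

# Anything other than "Succeeded" fails the runbook job.
if ($run.Status -ne 'Succeeded')
{
    throw "Pipeline '$PipelineName' (run $runId) ended with status '$($run.Status)': $($run.Message)"
}

Write-Output "Pipeline '$PipelineName' finished successfully."
```

Signing in with a stored username and password only works for an organizational account without multi-factor authentication. The final throw is what turns a failed pipeline into a failed runbook job, so whatever calls the runbook can react to it.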

 

Test run

I will run my test pipeline, which simply starts a Wait activity (5 seconds) and then tries to run a non-existent stored procedure, at which point the pipeline should fail.

Go to the saved runbook and click "Start".

Provide the parameters (PipelineName and CheckLoopTime) and click OK:

 

The runbook will be queued. Go to Output and wait for the results.
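
The same test can also be started from a PowerShell session instead of the portal. The sketch below uses the AzureRM.Automation cmdlets; the resource group, Automation account, runbook and pipeline names are placeholders, and the list of final job statuses is my assumption.

```powershell
# Placeholder names; replace with your own account, runbook and pipeline.
$automation = @{
    ResourceGroupName     = 'my-automation-rg'
    AutomationAccountName = 'my-automation-account'
}

# Queue the runbook with its two parameters.
$job = Start-AzureRmAutomationRunbook @automation `
    -Name 'Run-AdfPipeline' `
    -Parameters @{ PipelineName = 'TestPipeline'; CheckLoopTime = 5 }

# Job statuses assumed to be final.
$finalStatuses = @('Completed', 'Failed', 'Stopped', 'Suspended')

do
{
    Start-Sleep -Seconds 10
    $job = Get-AzureRmAutomationJob @automation -Id $job.JobId
    Write-Output "Job status: $($job.Status)"
}
while ($finalStatuses -notcontains $job.Status)

# Print everything the runbook wrote, including errors.
Get-AzureRmAutomationJobOutput @automation -Id $job.JobId -Stream Any |
    Select-Object -ExpandProperty Summary
```

For the failing test pipeline described above, the job should end as "Failed" and the output should contain the status lines printed by the loop.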

 

 
