Uploading files to Azure Data Lake Storage Gen2 from PowerShell using REST API, OAuth 2.0 bearer token and Access Control List (ACL) privileges

 

Introduction


In my previous article “Connecting to Azure Data Lake Storage Gen2 from PowerShell using REST API – a step-by-step guide”, I showed and explained how to connect using access keys.

As you probably know, an access key grants a lot of privileges. In fact, your storage account key is similar to the root password for your storage account. As Microsoft says:

Always be careful to protect your account key. Avoid distributing it to other users, hard-coding it, or saving it anywhere in plaintext that is accessible to others. Regenerate your account key using the Azure portal if you believe it may have been compromised.

So what if you don’t want to use access keys at all? And what if you need to grant access only to a particular folder? Fortunately, there is an alternative.

  • You can create an application in Azure Active Directory and grant it ACL privileges in ADLS Gen2 on a particular path.
  • With the ApplicationID and its secret, you can authorize your connection and get the bearer token…
  • …and use this token in your REST connection to ADLS Gen2 endpoint 🙂

This time you don’t need any sophisticated algorithm to encrypt headers before making a connection 🙂 Unfortunately, in the case of data upload, you need to do a few more steps to make it work. But don’t worry, it’s easy-peasy 😉

What you will learn

With this detailed step-by-step guide you will learn how to:

  • Register a new application in AAD, create a secret, and get the proper service principal id (there are many IDs and it can be confusing…)
  • Get the service principal id of that application if you don’t have permission to access Azure Active Directory in the portal
  • Grant this application a particular ACL permission in ADLS Gen2 that allows access only to the desired path, for example to read/write only in folder2 and not in folder1 when the path looks like this: /folder1/folder2/. No IAM (RBAC) roles will be used, pure ACL!
  • Access ADLS Gen2 using the above application credentials via REST API
  • Upload a file into ADLS Gen2 using REST API in three steps (CREATE, UPDATE, FLUSH)

 

Applications in Azure Active Directory


Let me start with an important piece of information:

In most scenarios, access to Azure Active Directory is restricted to privileged admins. Even if you have the Owner role on a subscription, this functionality may be blocked for you. AAD users can still register an app using REST or PowerShell, but I know from experience that administrators often turn this off as well. See the related article on the Microsoft site for more information.

 

In case you need to add an application, please ask your AAD administrator to create a new one. This is quite a common task, so the admin should already know how to do it 🙂 If not, just send them a link to this page; I’ll show how to do it.

 

If you already have an application created for you and you don’t have its service principal id, just head to the section below and check how to get it manually. Bear in mind that this is quite a confusing part, because there are many different ids available for an application: ApplicationID (same as ClientID), the registered app’s ObjectID, the Enterprise App ObjectID (which is the service principal id), and DirectoryID (also known as TenantID). As you can see, it can give you a headache…

So even if a colleague has given you some ID or application name, please check it anyway!

Application names are not unique, and IDs (GUIDs) are everywhere in Azure. There are many ways to get confused 🙂

An application is an object that you create together with a service principal and use to manage some identity features. Unfortunately, the difference between apps and service principals is not straightforward to describe, and it always confuses people. I recommend reading at least the first two paragraphs of this documentation. That should help you understand how the two relate and why we create an app but still use the service principal id (and not the application id) in permissions.

 

Creating application in Azure portal


We will create the app, then get service principal id.

1. Enter the Azure portal, go to “Azure Active Directory” and then “App registrations”. THIS LINK should redirect you exactly to the proper page.

 

2. Click “New registration”, choose proper app type and provide a name. In my example it will be “ADLS_ACCESS_APP”

We will use regular app scoped to the AAD directory only. You can read about other types in screenshot below:

 

3. Save the Directory (tenant) ID and the Application (client) ID somewhere; they will be needed for authorization in our upload example. Remember, ApplicationID IS NOT your service principal id!

 

Adding a secret to your application


Just click “Certificates & secrets” in your App registrations, then select “+ New client secret”.

It will generate a secret for you. Don’t worry, my app does not exist anymore, so don’t waste time using these values to do something evil xD

Copy the value now, it will be needed in our OAuth example.

 

How to get service principal id of the application using Azure portal


1. Now we need to go to the Azure Active Directory pane once again and this time click “Enterprise applications”.

There should be your application available on the list. Click it. In my example it’s “ADLS_ACCESS_APP” application!

 

2. Click “Properties” in the “Manage” section of the left pane. As you can see, the Application ID in the form is the same, but the ObjectID is different. That’s because this one is the service principal id 🙂 Copy it and save it alongside the other values.

 

How to get service principal id of the application using PowerShell


So you don’t have permission to access Azure Active Directory?

If you have an application already prepared by someone and you want to get or check its service principal id, you can do so as long as you are a member of that directory 🙂 This example uses the good old AzureRM module (I still don’t like the new Az…). If you don’t have this module, install it with the command below. Of course, don’t forget to log in to the proper Azure account!
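A minimal sketch of the install and login step, assuming the AzureRM module mentioned above is still what you want to use and that you may install modules for your own user. With the newer Az module the corresponding commands would be `Install-Module Az` and `Connect-AzAccount`.

```powershell
# One-time step: install the AzureRM module for the current user only.
Install-Module -Name AzureRM -Scope CurrentUser

# Interactive login; choose the account that belongs to the target directory.
Connect-AzureRmAccount
```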

 

Search by Application Display Name (or App name)
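A small sketch of the lookup, assuming an AzureRM session is already logged in and that the app is named “ADLS_ACCESS_APP” as in my example. `-SearchString` matches on the beginning of the display name, so more than one row may come back.

```powershell
# Look up the service principal by (the beginning of) its display name.
# "ADLS_ACCESS_APP" is the example app name used in this article.
Get-AzureRmADServicePrincipal -SearchString "ADLS_ACCESS_APP" |
    Select-Object DisplayName, ApplicationId, Id
```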

The value from “Id” will be your service principal id!

 

Search by Application ID
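A sketch of the same lookup by ApplicationID, assuming a recent AzureRM version that offers the `-ApplicationId` parameter (on older versions, passing the same GUID to `-ServicePrincipalName` may be needed instead). The GUID is the example ApplicationID from this article.

```powershell
# Look up the service principal that belongs to a given Application (client) ID.
$ApplicationID = "27e1347d-40b9-421e-9546-1a4727ae1fa4"

Get-AzureRmADServicePrincipal -ApplicationId $ApplicationID |
    Select-Object DisplayName, ApplicationId, Id
```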

Again, Id is your service principal id 🙂

 

Check if given service principal id belongs to your application

You should always check if your admin gave you the proper id 🙂
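A sketch of the reverse check, assuming an AzureRM session is already logged in. If the id really is a service principal id, the returned DisplayName and ApplicationId should match your application; the GUID below is the example value from this article.

```powershell
# Service principal id (ObjectID of the Enterprise application) that you were given.
$ServicePrincipalId = "8d1f246e-e8fa-46a5-9dce-06f60c7f1c3a"

# Resolve it back to an application name and ApplicationID.
Get-AzureRmADServicePrincipal -ObjectId $ServicePrincipalId |
    Select-Object DisplayName, ApplicationId, Id
```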

 

 

ACLs in ADLS Gen2


Once we have the proper service principal id, we can grant permissions in Azure Data Lake Storage Gen2. You can read more about ACLs in Microsoft Docs.

Just remember some important facts:

  • There is a limit on the number of entries in an Access Control List, and it is currently 32. So once you add our application, you will have 31 left. This is why Microsoft suggests adding your application to a proper AAD group and then adding the group as the ACL entry. Makes sense, but choose wisely 🙂
  • To allow access you need to have ACL permissions set on every level of the given path: starting with the filesystem itself, then on every folder and subfolder that the user needs to access.
  • The basic permission to allow “access” is “Execute”, and remember: it will not let the user read or list anything! It only lets them “traverse” the path!
  • So if you want to grant read access only to folder2 in the path my-filesystem:/folder1/folder2/, you need to grant Execute permission on my-filesystem, folder1 and folder2, and Read on folder2.
  • To make it easier to propagate permissions, ACLs implement something called the “Default” permission. Setting this permission on folder1 will copy the permission to folder2 BUT ONLY if folder2 did not exist before those default permissions were applied. So it affects only newly created files and folders. This is definitely different from what everyone got used to in Windows. Permissions do not inherit! Defaults are copied only at creation time!
  • Applications and AAD groups that you add are not shown in the ACL list with their user-friendly names. Unfortunately the names are not unique (this is the reason given), so you will always be struggling with Object IDs in Azure Storage Explorer or the portal. That’s a pain in the ass! Hope it will change someday 😐 Because well… it looks ugly and is difficult to manage.

 

Applying ACL permissions using Azure Storage Explorer


Currently the “Storage Explorer (preview)” in the portal cannot manage ACLs on the filesystem itself, only on files. So we need to use Azure Storage Explorer, an application that you can download from here.

We will implement permissions according to the following assumptions:

  • Path will be: my-filesystem:/folder1/folder2
  • We want the user to be able to read and write only in folder2. So there are no R/W permissions on my-filesystem and folder1, which means the user will not even be able to list the content of the filesystem or of folder1.
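To make the plan concrete, here is a small sketch that lists which permission the service principal needs on each level of the path. The function name `Get-RequiredAclEntries` is hypothetical, and the `user:<id>:<permissions>` short form is only used here for illustration; in Storage Explorer you tick the matching checkboxes instead.

```powershell
# Hypothetical helper: Execute-only on every parent level, the target
# permission on the last folder of the path.
function Get-RequiredAclEntries {
    param(
        [string]$Filesystem,          # e.g. "my-filesystem"
        [string]$Path,                # e.g. "folder1/folder2"
        [string]$TargetPermission,    # permission on the last folder, e.g. "rwx"
        [string]$ServicePrincipalId   # ObjectID of the Enterprise application
    )
    # The filesystem is the first level, followed by each folder in the path.
    $levels = @($Filesystem) + @($Path.Trim('/') -split '/')
    for ($i = 0; $i -lt $levels.Count; $i++) {
        $isTarget = ($i -eq $levels.Count - 1)
        $perm = if ($isTarget) { $TargetPermission } else { '--x' }
        [pscustomobject]@{
            Level = $levels[$i]
            Entry = "user:${ServicePrincipalId}:$perm"
        }
    }
}

# Usage with the example values from this article:
# Get-RequiredAclEntries -Filesystem 'my-filesystem' -Path 'folder1/folder2' `
#     -TargetPermission 'rwx' -ServicePrincipalId '8d1f246e-e8fa-46a5-9dce-06f60c7f1c3a'
```

Only the access entries are covered; the matching “Default” entries for future children are set separately, as described in the steps below.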

 

1. Open Azure Storage Explorer and go to your ADLS Gen2 account and the target filesystem.

 

2. Right-click the filesystem and choose “Manage Access…”

 

3. In our example we will use the service principal id of the application that was created at the beginning of this tutorial. Of course you can add the ObjectID of an AAD group instead. You can also add a user principal name (UPN), so a regular user like user@yourdomain.com, to allow access to this storage, for example to use Azure Databricks and the Credential Passthrough feature 🙂

Add your service principal id and grant “Access” with “Execute” then “Default” also with “Execute”

The Default, as its name says, will automatically add this permission to all new children of this directory. Click “Save”.

 

4. Now we will create folder1, then folder2 inside folder1. Note that folder1 and folder2 already have our service principal id added and “Execute” permission was applied.

Note also that if you remove this permission from folder1, it will not be removed from folder2! That is very important information. I must repeat it: permissions are not inherited from parent objects. Bad management can lead to unexpected security breaches, so watch out!

 

5. Click RMB on folder2, click LMB on “Manage access…” and change permissions of the application. Now we want to read and write everywhere in folder2, even in all future subdirectories.

So we will grant R/W/E on both “layers” : “Access” and “Default”.

 

UPLOADING A FILE


To upload a file we need all required parameters and several important steps:

  1. Get the bearer token from Azure OAuth 2.0 API
  2. Create an empty file on ADLS Gen2. This is required! Uploading a file can be done only as ‘append’ operation to already existing object.
  3. Upload the content using proper data stream and position offset (with single upload the position is zero)
  4. Flush the file on ADLS side to commit uploaded data with the offset (the position of the last byte, so the file size 🙂 )

 

Summarizing required parameters


So far we have all the parameters of the created app, and all permissions are granted. We want to upload a file… so we also need a file and an exact target location 🙂

Let’s summarize all available details:

  • ApplicationID: 27e1347d-40b9-421e-9546-1a4727ae1fa4
  • Application Secret: 8h]D//c3FHp_FNF49]SygxL3_lm2O:G4
  • Application Service Principal ID: 8d1f246e-e8fa-46a5-9dce-06f60c7f1c3a
  • DirectoryID (also known as TenantID): [GUID but REDACTED :P]
  • Datalake Storage Account Name: upload0example
  • Filesystem name: my-filesystem
  • ADLS file path: folder1/folder2/file.txt
  • Local file path: C:\tmp\My_50MB_file_that_i_want_to_upload.txt

Again, do not worry about secrets and id values ;] They are not valid anymore. I put them just to make the example as real as possible.

Get the bearer token


We need to get a token from oauth2 endpoint: https://login.microsoftonline.com/$DirectoryID/oauth2/v2.0/token

It requires some important variables. Set the scope to https://storage.azure.com/.default and the grant type to client_credentials. Use your ApplicationID and app secret, and store the returned token in a variable.
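A minimal sketch of the token request under those settings. The function names `New-TokenRequestBody` and `Get-AdlsBearerToken` are hypothetical helpers introduced here for illustration; the endpoint, scope and grant type are the ones named above.

```powershell
# Hypothetical helper: builds the form body for the client-credentials request.
function New-TokenRequestBody {
    param([string]$ApplicationID, [string]$ApplicationSecret)
    @{
        client_id     = $ApplicationID
        client_secret = $ApplicationSecret
        scope         = 'https://storage.azure.com/.default'
        grant_type    = 'client_credentials'
    }
}

# Hypothetical helper: requests the token and returns only the access_token string.
function Get-AdlsBearerToken {
    param([string]$DirectoryID, [string]$ApplicationID, [string]$ApplicationSecret)
    $uri  = "https://login.microsoftonline.com/$DirectoryID/oauth2/v2.0/token"
    $body = New-TokenRequestBody -ApplicationID $ApplicationID -ApplicationSecret $ApplicationSecret
    # A hashtable body on POST is sent as application/x-www-form-urlencoded.
    $response = Invoke-RestMethod -Method Post -Uri $uri -Body $body
    $response.access_token
}

# Usage (placeholders, fill in your own values):
# $BearerToken = Get-AdlsBearerToken -DirectoryID '<tenant-guid>' `
#     -ApplicationID '<application-id>' -ApplicationSecret '<secret>'
```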

 

The content of $BearerToken will look like this (reallly looooong value)

 

 

Create a file


We need to create the file before we start uploading to it. Important facts:

  • In my example there is an additional conditional header, If-None-Match with the value *, that makes the creation fail if the file already exists.
  • Use $BearerToken from the previous example to pass the token.
  • You can always replace the Invoke-RestMethod cmdlet with Invoke-WebRequest to read all the additional headers returned by the REST API.
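A sketch of the create call, assuming the bearer token is already available. `Get-AdlsUri` and `New-AdlsFile` are hypothetical helper names, and the `x-ms-version` value is an assumption: any API version that supports ADLS Gen2 should do.

```powershell
# Hypothetical helper: builds the DFS endpoint URI for a path plus a query string.
function Get-AdlsUri {
    param([string]$StorageAccountName, [string]$FilesystemName, [string]$Path, [string]$Query)
    # The REST endpoint expects forward slashes in the path.
    $cleanPath = ($Path -replace '\\', '/').TrimStart('/')
    "https://$StorageAccountName.dfs.core.windows.net/$FilesystemName/${cleanPath}?$Query"
}

# Hypothetical helper: creates an empty file (PUT with resource=file).
function New-AdlsFile {
    param([string]$BearerToken, [string]$StorageAccountName, [string]$FilesystemName, [string]$Path)
    $headers = @{
        'Authorization' = "Bearer $BearerToken"
        'x-ms-version'  = '2018-11-09'   # assumed API version
        'If-None-Match' = '*'            # fail if the file already exists
    }
    $uri = Get-AdlsUri -StorageAccountName $StorageAccountName -FilesystemName $FilesystemName `
        -Path $Path -Query 'resource=file'
    Invoke-RestMethod -Method Put -Uri $uri -Headers $headers
}

# Usage with the example values from the summary:
# New-AdlsFile -BearerToken $BearerToken -StorageAccountName 'upload0example' `
#     -FilesystemName 'my-filesystem' -Path 'folder1/folder2/file.txt'
```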
 

Our file now exists and does not have any size (yet 🙂 )

If you have permission errors during the file creation, please check if:

  • your application is added to the ACL by its service principal id, not its application id
  • permissions on your destination folder are set to at least Write and Execute
  • every parent folder and the filesystem have the “Execute” permission granted to our application’s service principal id

 

So what if you want to create a file in a folder that you do not have access to?

 

Bazinga!

 

Uploading the content of a file


Ok, the most important phase of our process. I need to mention other important information:

  • Uploading a file does not make it uploaded yet! You need to flush the file (see next step)
  • My example is really, really simple and is not bulletproof.
  • I’m using single upload, my file is 50MB in size and I do not have any problems sending it and reading it after the upload.
  • There is a huge topic regarding proper process usage and content describing, like acquiring locks (lease) before file sending or giving in headers Content-Length, Content-MD5, x-ms-content-type, x-ms-content-encoding, x-ms-content-language etc. This is not required, but it should be done as part of good practice!
  • The ADLS Gen2 REST API is capable of receiving your file over parallel connections, as small chunks! This is why it asks us for the position within the file.
  • I’m not sure how big a file can be in a single upload. So consider yourself warned 🙂
  • And by the way, what if your 50GB file upload gets interrupted by a connection error during the last seconds of a one-hour upload?
  • In fact, this is why azcopy (which is used by Azure Storage Explorer) uploads data to ADLS Gen2 in chunks, parallelized over several connections. Check it yourself: find the .azcopy folder in your user home directory (Windows: c:\Users\your_account\.azcopy, Linux: /home/your_account/.azcopy). By default azcopy logs there in a really detailed manner 🙂

 

Ok, now the script. Use other variables from previous steps.
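A sketch of the single-request append, assuming the empty file already exists and the token is valid. `Send-AdlsFileContent` is a hypothetical helper name and the `x-ms-version` value is an assumption. It returns the local file details so that the size can be reused as the flush position.

```powershell
# Hypothetical helper: appends the whole local file at the given offset.
function Send-AdlsFileContent {
    param(
        [string]$BearerToken,
        [string]$StorageAccountName,
        [string]$FilesystemName,
        [string]$AdlsPath,        # forward slashes, e.g. folder1/folder2/file.txt
        [string]$LocalFilePath,
        [long]$Position = 0       # a single upload starts at offset zero
    )
    $headers = @{
        'Authorization' = "Bearer $BearerToken"
        'x-ms-version'  = '2018-11-09'   # assumed API version
    }
    $uri = "https://$StorageAccountName.dfs.core.windows.net/$FilesystemName/${AdlsPath}?action=append&position=$Position"
    # -InFile streams the file content as the request body.
    Invoke-RestMethod -Method Patch -Uri $uri -Headers $headers -InFile $LocalFilePath | Out-Null
    # The Length of the returned item is the position needed for the flush.
    Get-Item -LiteralPath $LocalFilePath
}

# Usage:
# $file_details = Send-AdlsFileContent -BearerToken $BearerToken -StorageAccountName 'upload0example' `
#     -FilesystemName 'my-filesystem' -AdlsPath 'folder1/folder2/file.txt' `
#     -LocalFilePath 'C:\tmp\My_50MB_file_that_i_want_to_upload.txt'
```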

The file still shows no size when you check it in Storage Explorer.

Currently the uploaded content exists only in a temporary cache of our ADLS service. If you want to finally finish the process, move on to the next step.

We also need the size of the file we have just sent, because it will be used as the final position when flushing the content from the cache. That’s why I’m storing the file details in the $file_details variable.

 

Flush the content of a file from cache to final destination


And now the final process:
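A sketch of the flush call, assuming the content has already been appended. `Get-FlushPosition` and `Complete-AdlsFileUpload` are hypothetical helper names; the flush position is simply the size of the local file in bytes.

```powershell
# Hypothetical helper: the flush position equals the number of bytes uploaded.
function Get-FlushPosition {
    param([string]$LocalFilePath)
    (Get-Item -LiteralPath $LocalFilePath).Length
}

# Hypothetical helper: commits the appended data up to the given position.
function Complete-AdlsFileUpload {
    param(
        [string]$BearerToken,
        [string]$StorageAccountName,
        [string]$FilesystemName,
        [string]$AdlsPath,     # forward slashes, e.g. folder1/folder2/file.txt
        [long]$FileSize
    )
    $headers = @{
        'Authorization' = "Bearer $BearerToken"
        'x-ms-version'  = '2018-11-09'   # assumed API version
    }
    $uri = "https://$StorageAccountName.dfs.core.windows.net/$FilesystemName/${AdlsPath}?action=flush&position=$FileSize"
    Invoke-RestMethod -Method Patch -Uri $uri -Headers $headers
}

# Usage:
# $size = Get-FlushPosition -LocalFilePath 'C:\tmp\My_50MB_file_that_i_want_to_upload.txt'
# Complete-AdlsFileUpload -BearerToken $BearerToken -StorageAccountName 'upload0example' `
#     -FilesystemName 'my-filesystem' -AdlsPath 'folder1/folder2/file.txt' -FileSize $size
```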

 

And finally our upload process is finished, the file has proper size.

 

File verification

Just a little check.

Is this exactly what we have just tried to send? 🙂 Download the file and compare it with the original using a checksum:
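A sketch of such a check, assuming the service principal has Read permission on the file. `Get-AdlsFile` and `Test-FileHashMatch` are hypothetical helper names, and SHA256 is just one reasonable choice of algorithm.

```powershell
# Hypothetical helper: downloads the file from ADLS Gen2 to a local path.
function Get-AdlsFile {
    param(
        [string]$BearerToken,
        [string]$StorageAccountName,
        [string]$FilesystemName,
        [string]$AdlsPath,          # forward slashes
        [string]$DestinationPath
    )
    $headers = @{
        'Authorization' = "Bearer $BearerToken"
        'x-ms-version'  = '2018-11-09'   # assumed API version
    }
    $uri = "https://$StorageAccountName.dfs.core.windows.net/$FilesystemName/$AdlsPath"
    Invoke-RestMethod -Method Get -Uri $uri -Headers $headers -OutFile $DestinationPath
}

# Hypothetical helper: returns $true when both files have the same SHA256 hash.
function Test-FileHashMatch {
    param([string]$OriginalPath, [string]$DownloadedPath)
    $original   = (Get-FileHash -LiteralPath $OriginalPath -Algorithm SHA256).Hash
    $downloaded = (Get-FileHash -LiteralPath $DownloadedPath -Algorithm SHA256).Hash
    $original -eq $downloaded
}

# Usage:
# Get-AdlsFile -BearerToken $BearerToken -StorageAccountName 'upload0example' `
#     -FilesystemName 'my-filesystem' -AdlsPath 'folder1/folder2/file.txt' `
#     -DestinationPath 'C:\tmp\downloaded.txt'
# Test-FileHashMatch -OriginalPath 'C:\tmp\My_50MB_file_that_i_want_to_upload.txt' `
#     -DownloadedPath 'C:\tmp\downloaded.txt'
```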

 

 

Success! Have a beer! 🙂

 

UPLOAD A FILE – The complete PowerShell script!


Everything from the examples above, placed in one script. Enjoy!
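What follows is a compact sketch of the whole flow under the same assumptions as before, not a hardened script: no chunking, no lease, no retry. `Get-AdlsFileUri` and `Publish-FileToAdls` are hypothetical names, and the `x-ms-version` value is an assumption.

```powershell
# Hypothetical helper: base URI of the target file (no query string).
function Get-AdlsFileUri {
    param([string]$StorageAccountName, [string]$FilesystemName, [string]$AdlsPath)
    $cleanPath = ($AdlsPath -replace '\\', '/').TrimStart('/')
    "https://$StorageAccountName.dfs.core.windows.net/$FilesystemName/$cleanPath"
}

# Hypothetical end-to-end helper: token, create, append, flush.
function Publish-FileToAdls {
    param(
        [string]$DirectoryID, [string]$ApplicationID, [string]$ApplicationSecret,
        [string]$StorageAccountName, [string]$FilesystemName,
        [string]$AdlsPath, [string]$LocalFilePath
    )
    # 1. Get the bearer token from the OAuth 2.0 endpoint.
    $tokenBody = @{
        client_id     = $ApplicationID
        client_secret = $ApplicationSecret
        scope         = 'https://storage.azure.com/.default'
        grant_type    = 'client_credentials'
    }
    $tokenUri = "https://login.microsoftonline.com/$DirectoryID/oauth2/v2.0/token"
    $token = (Invoke-RestMethod -Method Post -Uri $tokenUri -Body $tokenBody).access_token

    $headers = @{ 'Authorization' = "Bearer $token"; 'x-ms-version' = '2018-11-09' }
    $base = Get-AdlsFileUri -StorageAccountName $StorageAccountName `
        -FilesystemName $FilesystemName -AdlsPath $AdlsPath

    # 2. Create the empty file; If-None-Match makes this fail if it already exists.
    $createHeaders = $headers + @{ 'If-None-Match' = '*' }
    Invoke-RestMethod -Method Put -Uri "${base}?resource=file" -Headers $createHeaders | Out-Null

    # 3. Append the whole file at position zero (single upload).
    Invoke-RestMethod -Method Patch -Uri "${base}?action=append&position=0" `
        -Headers $headers -InFile $LocalFilePath | Out-Null

    # 4. Flush: the position is the file size in bytes.
    $size = (Get-Item -LiteralPath $LocalFilePath).Length
    Invoke-RestMethod -Method Patch -Uri "${base}?action=flush&position=$size" -Headers $headers | Out-Null

    $size   # number of bytes committed
}

# Usage (placeholders, fill in your own values):
# Publish-FileToAdls -DirectoryID '<tenant-guid>' -ApplicationID '<application-id>' `
#     -ApplicationSecret '<secret>' -StorageAccountName 'upload0example' `
#     -FilesystemName 'my-filesystem' -AdlsPath 'folder1/folder2/file.txt' `
#     -LocalFilePath 'C:\tmp\My_50MB_file_that_i_want_to_upload.txt'
```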

 

 
