Use blob index tags to manage and find data with Python
This article shows how to use blob index tags to manage and find data using the Azure Storage client library for Python.
To learn about setting blob index tags using asynchronous APIs, see Set blob index tags asynchronously.
Prerequisites
- Azure subscription - create a trial subscription
- Azure storage account - create a storage account
- Python 3.8+
Set up your environment
If you don't have an existing project, this section shows you how to set up a project to work with the Azure Blob Storage client library for Python. For more details, see Get started with Azure Blob Storage and Python.
To work with the code examples in this article, follow these steps to set up your project.
Install packages
Install the following packages using pip install:
pip install azure-storage-blob azure-identity
Add import statements
Add the following import statements:
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
Authorization
The authorization mechanism must have the necessary permissions to work with blob index tags. For authorization with Microsoft Entra ID (recommended), you need Azure RBAC built-in role Storage Blob Data Owner or higher. To learn more, see the authorization guidance for Get Blob Tags (REST API), Set Blob Tags (REST API), or Find Blobs by Tags (REST API).
Create a client object
To connect an app to Blob Storage, create an instance of BlobServiceClient. The following example shows how to create a client object using DefaultAzureCredential for authorization:
# TODO: Replace <storage-account-name> with your actual storage account name
account_url = "https://<storage-account-name>.blob.core.chinacloudapi.cn"
credential = DefaultAzureCredential()
# Create the BlobServiceClient object
blob_service_client = BlobServiceClient(account_url, credential=credential)
You can also create client objects for specific containers or blobs, either directly or from the BlobServiceClient object. To learn more about creating and managing client objects, see Create and manage client objects that interact with data resources.
About blob index tags
Blob index tags categorize data in your storage account using key-value tag attributes. These tags are automatically indexed and exposed as a searchable multi-dimensional index to easily find data. This article shows you how to set, get, and find data using blob index tags.
Blob index tags aren't supported for storage accounts with hierarchical namespace enabled. To learn more about the blob index tag feature along with known issues and limitations, see Manage and find Azure Blob data with blob index tags.
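Before you set tags, it can help to check them against the limits described in the blob index tags overview. The following sketch assumes the limits at the time of writing (up to 10 tags per blob, keys of 1 to 128 characters, values of 0 to 256 characters, and a restricted character set); verify them against the current documentation. validate_blob_tags is a hypothetical helper, not part of the client library.

```python
import re
from typing import Dict, List

# Assumed limits, as described in the blob index tags overview at the time of
# writing; verify them against the current documentation.
MAX_TAGS_PER_BLOB = 10
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256
# Letters, digits, space, and the characters + - . / : = _
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9 +\-./:=_]*")


def validate_blob_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems found in tags; an empty list means none were found."""
    problems: List[str] = []
    if len(tags) > MAX_TAGS_PER_BLOB:
        problems.append(f"too many tags: {len(tags)} (maximum {MAX_TAGS_PER_BLOB})")
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if not 1 <= len(key) <= MAX_KEY_LENGTH:
            problems.append(f"{key!r}: key must be 1-{MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"{key!r}: value must be 0-{MAX_VALUE_LENGTH} characters")
        # fullmatch (not match) so a trailing newline can't slip through
        if not _ALLOWED_CHARS.fullmatch(key) or not _ALLOWED_CHARS.fullmatch(value):
            problems.append(f"{key!r}: contains unsupported characters")
    return problems


print(validate_blob_tags({"Sealed": "false", "Content": "image", "Date": "2022-01-01"}))  # []
print(validate_blob_tags({"Owner": "it's"}))
```

Running a check like this before calling set_blob_tags surfaces problems locally instead of as a service error; the service remains the final authority on what it accepts.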
Set tags
You can set index tags if your code has authorized access to blob data through one of the following mechanisms:
- Security principal that is assigned an Azure RBAC role with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/write action. The Storage Blob Data Owner is a built-in role that includes this action.
- Shared Access Signature (SAS) with permission to access the blob's tags (t permission)
- Account key
For more information, see Setting blob index tags.
You can set tags by using the following method:
- BlobClient.set_blob_tags
The tags you pass to this method replace all existing tags on the blob. If you need to keep existing tag values, retrieve them first and include them in the call. The following example shows how to set tags:
def set_blob_tags(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # Get any existing tags for the blob if they need to be preserved
    tags = blob_client.get_blob_tags()
    # Add or modify tags
    updated_tags = {'Sealed': 'false', 'Content': 'image', 'Date': '2022-01-01'}
    tags.update(updated_tags)
    blob_client.set_blob_tags(tags)
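Because set_blob_tags replaces the entire tag set, a read-modify-write helper keeps unrelated tags intact. The following is a minimal sketch, not part of the client library: upsert_blob_tags and FakeTaggedBlob are hypothetical names. FakeTaggedBlob is an in-memory stand-in that mimics the replace behavior described above, so the sketch runs without a storage account; in an app, you'd pass a BlobClient instead.

```python
from typing import Dict, Optional


class FakeTaggedBlob:
    """Hypothetical in-memory stand-in for BlobClient (illustration only).

    Like set_blob_tags on a real blob, set_blob_tags here replaces every tag.
    """

    def __init__(self, tags: Optional[Dict[str, str]] = None):
        self._tags = dict(tags or {})

    def get_blob_tags(self) -> Dict[str, str]:
        return dict(self._tags)

    def set_blob_tags(self, tags: Dict[str, str]) -> None:
        self._tags = dict(tags)  # full replacement, not a merge


def upsert_blob_tags(blob_client, updates: Dict[str, str]) -> Dict[str, str]:
    """Add or overwrite the given tags while keeping the blob's other tags."""
    tags = blob_client.get_blob_tags()  # read the current tag set
    tags.update(updates)                # merge in the changes
    blob_client.set_blob_tags(tags)     # write the full set back
    return tags


blob = FakeTaggedBlob({"Project": "alpha"})
blob.set_blob_tags({"Content": "image"})
print(blob.get_blob_tags())  # {'Content': 'image'}  ("Project" was dropped)

blob = FakeTaggedBlob({"Project": "alpha"})
upsert_blob_tags(blob, {"Content": "image"})
print(blob.get_blob_tags())  # {'Project': 'alpha', 'Content': 'image'}
```

The read and the write are separate requests, so another writer could change the tags in between; this sketch doesn't guard against that.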
You can delete all tags by passing an empty dict object to the set_blob_tags method:
def clear_blob_tags(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # Pass in empty dict object to clear tags
    tags = dict()
    blob_client.set_blob_tags(tags)
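There's no separate call to delete a single tag in the examples above, but the same replace behavior suggests an approach: read the tags, drop one key, and write the rest back. The following sketch assumes blob_client exposes get_blob_tags() and set_blob_tags() as BlobClient does; without_tag and remove_blob_tag are hypothetical helpers.

```python
from typing import Dict


def without_tag(tags: Dict[str, str], key: str) -> Dict[str, str]:
    """Return a copy of tags with key removed; a missing key is ignored."""
    remaining = dict(tags)
    remaining.pop(key, None)
    return remaining


def remove_blob_tag(blob_client, key: str) -> Dict[str, str]:
    """Hypothetical helper: delete one tag by rewriting the remaining tag set.

    blob_client is expected to expose get_blob_tags() and set_blob_tags(),
    as azure.storage.blob.BlobClient does.
    """
    remaining = without_tag(blob_client.get_blob_tags(), key)
    blob_client.set_blob_tags(remaining)  # replaces all tags with the remainder
    return remaining


print(without_tag({"Sealed": "false", "Content": "image"}, "Sealed"))  # {'Content': 'image'}
```

If the removed key was the only tag, the remainder is an empty dict, which clears the tags just like the clear_blob_tags example.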
Get tags
You can get index tags if your code has authorized access to blob data through one of the following mechanisms:
- Security principal that is assigned an Azure RBAC role with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/read action. The Storage Blob Data Owner is a built-in role that includes this action.
- Shared Access Signature (SAS) with permission to access the blob's tags (t permission)
- Account key
For more information, see Getting and listing blob index tags.
You can get tags by using the following method:
- BlobClient.get_blob_tags
The following example shows how to retrieve and iterate over the blob's tags:
def get_blob_tags(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    tags = blob_client.get_blob_tags()
    print("Blob tags: ")
    for k, v in tags.items():
        print(k, v)
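Because get_blob_tags returns a plain dictionary of strings, you can check one blob's tags locally without an index query. The helper below is a hypothetical illustration; since tag keys and values are case-sensitive, it compares them exactly.

```python
from typing import Dict


def tags_match(tags: Dict[str, str], expected: Dict[str, str]) -> bool:
    """True if every expected key is present with exactly the expected value.

    The comparison is exact (case-sensitive), matching how tag keys and values
    are treated.
    """
    return all(tags.get(key) == value for key, value in expected.items())


tags = {"Sealed": "false", "Content": "image", "Date": "2022-01-01"}
print(tags_match(tags, {"Content": "image"}))  # True
print(tags_match(tags, {"Content": "Image"}))  # False
```

In an app, you'd pass the result of blob_client.get_blob_tags() as the first argument.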
Filter and find data with blob index tags
You can use index tags to find and filter data if your code has authorized access to blob data through one of the following mechanisms:
- Security principal that is assigned an Azure RBAC role with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/filter/action action. The Storage Blob Data Owner is a built-in role that includes this action.
- Shared Access Signature (SAS) with permission to filter blobs by tags (f permission)
- Account key
For more information, see Finding data using blob index tags.
Note
You can't use index tags to retrieve previous versions. Tags for previous versions aren't passed to the blob index engine. For more information, see Conditions and known issues.
You can find data by using the following method:
- ContainerClient.find_blobs_by_tags
The following example finds and lists all blobs tagged as an image:
def find_blobs_by_tags(self, blob_service_client: BlobServiceClient, container_name):
    container_client = blob_service_client.get_container_client(container=container_name)
    query = "\"Content\"='image'"
    blob_list = container_client.find_blobs_by_tags(filter_expression=query)
    print("Blobs tagged as images")
    for blob in blob_list:
        print(blob.name)
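The example above passes a hand-written expression. When queries are built from variables, a small helper keeps the quoting consistent: keys in double quotes, values in single quotes, and clauses joined with AND. This sketch assumes the comparison operators =, >, >=, <, and <=, plus the @container clause for account-level queries (for example, with BlobServiceClient.find_blobs_by_tags); check the Find Blobs by Tags reference for the full grammar. build_tag_filter is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple

# Comparison operators assumed to be accepted in tag filter expressions.
_OPERATORS = {"=", ">", ">=", "<", "<="}


def build_tag_filter(
    conditions: Sequence[Tuple[str, str, str]],
    container: Optional[str] = None,
) -> str:
    """Build an expression such as "Content"='image' AND "Date">='2022-01-01'.

    Each condition is (key, operator, value). The optional @container clause
    scopes an account-level query to one container.
    """
    clauses = []
    if container is not None:
        clauses.append(f"@container='{container}'")
    for key, op, value in conditions:
        if op not in _OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        if '"' in key or "'" in value:
            raise ValueError("quote characters aren't allowed in tag keys or values")
        clauses.append(f"\"{key}\"{op}'{value}'")  # key in "", value in ''
    return " AND ".join(clauses)


query = build_tag_filter([("Content", "=", "image"), ("Date", ">=", "2022-01-01")])
print(query)  # "Content"='image' AND "Date">='2022-01-01'
```

With a single ("Content", "=", "image") condition, the helper produces the same expression as the example above. Tag values are compared as strings, so range conditions on dates work best with a sortable format such as 2022-01-01.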
Set blob index tags asynchronously
The Azure Blob Storage client library for Python supports working with blob index tags asynchronously. To learn more about project setup requirements, see Asynchronous programming.
Follow these steps to set blob index tags using asynchronous APIs:
- Add the following import statements:
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient
- Add code to run the program using asyncio.run. This function runs the passed coroutine, main() in our example, and manages the asyncio event loop. Coroutines are declared with the async/await syntax. In this example, the main() coroutine first creates the top-level BlobServiceClient using async with, then calls the method that sets the blob index tags. Only the top-level client needs to use async with, because other clients created from it share the same connection pool.
async def main():
    # BlobSamples is the sample class that contains the methods shown in this article
    sample = BlobSamples()
    # TODO: Replace <storage-account-name> with your actual storage account name
    account_url = "https://<storage-account-name>.blob.core.chinacloudapi.cn"
    credential = DefaultAzureCredential()
    async with BlobServiceClient(account_url, credential=credential) as blob_service_client:
        await sample.set_blob_tags(blob_service_client, "sample-container")

if __name__ == '__main__':
    asyncio.run(main())
- Add code to set the blob index tags. The code is the same as the synchronous example, except that the method is declared with the async keyword and the await keyword is used when calling the get_blob_tags and set_blob_tags methods.
async def set_blob_tags(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # Get any existing tags for the blob if they need to be preserved
    tags = await blob_client.get_blob_tags()
    # Add or modify tags
    updated_tags = {'Sealed': 'false', 'Content': 'image', 'Date': '2022-01-01'}
    tags.update(updated_tags)
    await blob_client.set_blob_tags(tags)
With this basic setup in place, you can implement other examples in this article as coroutines using async/await syntax.
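As one example, the following sketch shows an asynchronous version of the find example. It assumes the container client comes from azure.storage.blob.aio, where find_blobs_by_tags returns an async iterable that you consume with async for. find_blob_names_by_tags and FakeAsyncContainerClient are hypothetical names; the fake yields preset results instead of evaluating the query, so the sketch runs without a storage account.

```python
import asyncio
from types import SimpleNamespace
from typing import List, Optional


async def find_blob_names_by_tags(container_client, query: str) -> List[str]:
    """Collect the names of blobs that match a tag filter expression.

    container_client is expected to behave like azure.storage.blob.aio.ContainerClient,
    whose find_blobs_by_tags returns an async iterable of results with a .name attribute.
    """
    names = []
    async for blob in container_client.find_blobs_by_tags(filter_expression=query):
        names.append(blob.name)
    return names


class FakeAsyncContainerClient:
    """Hypothetical in-memory stand-in, used only to run the sketch locally.

    It doesn't evaluate the query; it records it and yields preset blob names.
    """

    def __init__(self, matching_names: List[str]):
        self._matching_names = matching_names
        self.last_query: Optional[str] = None

    def find_blobs_by_tags(self, filter_expression: str):
        self.last_query = filter_expression
        return self._results()

    async def _results(self):
        for name in self._matching_names:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield SimpleNamespace(name=name)


async def main() -> None:
    client = FakeAsyncContainerClient(["photo-1.png", "photo-2.png"])
    names = await find_blob_names_by_tags(client, "\"Content\"='image'")
    print(names)  # ['photo-1.png', 'photo-2.png']


if __name__ == "__main__":
    asyncio.run(main())
```

In an app, you'd get the container client from the async BlobServiceClient created in main(), inside its async with block.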
Resources
To learn more about how to use index tags to manage and find data using the Azure Blob Storage client library for Python, see the following resources.
Code samples
- View synchronous or asynchronous code samples from this article (GitHub)
REST API operations
The Azure SDK for Python contains libraries that build on top of the Azure REST API, allowing you to interact with REST API operations through familiar Python paradigms. The client library methods for managing and using blob index tags use the following REST API operations:
- Get Blob Tags (REST API)
- Set Blob Tags (REST API)
- Find Blobs by Tags (REST API)
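Each of these operations is an HTTP request against the blob endpoint. The sketch below only builds request URLs to show the mapping. It assumes the documented URI forms: GET or PUT on the blob URL with comp=tags for Get Blob Tags and Set Blob Tags, and GET on the account URL with comp=blobs and a URL-encoded where expression for Find Blobs by Tags. A real request also needs authorization and x-ms-version headers, which the client library adds for you. The helper names are hypothetical.

```python
from urllib.parse import quote


def blob_tags_url(account_url: str, container: str, blob: str) -> str:
    """URL for Get Blob Tags (GET) and Set Blob Tags (PUT)."""
    # quote() keeps "/" so virtual directory paths in blob names stay intact
    return f"{account_url.rstrip('/')}/{quote(container)}/{quote(blob)}?comp=tags"


def find_blobs_by_tags_url(account_url: str, where: str) -> str:
    """URL for an account-level Find Blobs by Tags request (GET)."""
    # Encode every reserved character in the filter expression
    return f"{account_url.rstrip('/')}/?comp=blobs&where={quote(where, safe='')}"


account_url = "https://<storage-account-name>.blob.core.chinacloudapi.cn"
print(blob_tags_url(account_url, "sample-container", "sample-blob.txt"))
print(find_blobs_by_tags_url(account_url, "\"Content\"='image'"))
```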
Client library resources
See also
- Manage and find Azure Blob data with blob index tags
- Use blob index tags to manage and find data on Azure Blob Storage
Related content
- This article is part of the Blob Storage developer guide for Python. To learn more, see the full list of developer guide articles at Build your Python app.