Manage blob properties and metadata with Python
In addition to the data they contain, blobs support system properties and user-defined metadata. This article shows how to manage system properties and user-defined metadata using the Azure Storage client library for Python.
To learn about managing properties and metadata using asynchronous APIs, see Set blob metadata asynchronously.
Prerequisites
- Azure subscription - create one for trial
- Azure storage account - create a storage account
- Python 3.8+
Set up your environment
If you don't have an existing project, this section shows you how to set up a project to work with the Azure Blob Storage client library for Python. For more details, see Get started with Azure Blob Storage and Python.
To work with the code examples in this article, follow these steps to set up your project.
Install packages
Install the following packages using pip install:
pip install azure-storage-blob azure-identity
Add import statements
Add the following import statements:
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, ContentSettings
Authorization
The authorization mechanism must have the necessary permissions to work with blob properties or metadata. For authorization with Microsoft Entra ID (recommended), you need the Azure RBAC built-in role Storage Blob Data Reader or higher for the get operations, and Storage Blob Data Contributor or higher for the set operations. To learn more, see the authorization guidance for Set Blob Properties (REST API), Get Blob Properties (REST API), Set Blob Metadata (REST API), or Get Blob Metadata (REST API).
Create a client object
To connect an app to Blob Storage, create an instance of BlobServiceClient. The following example shows how to create a client object using DefaultAzureCredential for authorization:
# TODO: Replace <storage-account-name> with your actual storage account name
account_url = "https://<storage-account-name>.blob.core.chinacloudapi.cn"
credential = DefaultAzureCredential()
# Create the BlobServiceClient object
blob_service_client = BlobServiceClient(account_url, credential=credential)
You can also create client objects for specific containers or blobs, either directly or from the BlobServiceClient object. To learn more about creating and managing client objects, see Create and manage client objects that interact with data resources.
About properties and metadata
System properties: System properties exist on each Blob storage resource. Some of them can be read or set, while others are read-only. Under the covers, some system properties correspond to certain standard HTTP headers. The Azure Storage client library for Python maintains these properties for you.
User-defined metadata: User-defined metadata consists of one or more name-value pairs that you specify for a Blob storage resource. You can use metadata to store additional values with the resource. Metadata values are for your own purposes only, and don't affect how the resource behaves.
Metadata name/value pairs are valid HTTP headers and should adhere to all restrictions governing HTTP headers. For more information about metadata naming requirements, see Metadata names.
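Because an invalid name is only rejected when the request reaches the service, it can help to check names locally first. The following is a minimal sketch, not part of the client library: it assumes the naming guidance (ASCII names that follow C# identifier rules) can be approximated with Python's own identifier check, and the helper names is_valid_metadata_name and find_invalid_metadata_names are hypothetical.

```python
def is_valid_metadata_name(name: str) -> bool:
    """Approximate check (an assumption, not the service's exact rule):
    the name must be ASCII-only and a valid identifier, meaning it starts
    with a letter or underscore and contains only letters, digits, and
    underscores."""
    return name.isascii() and name.isidentifier()


def find_invalid_metadata_names(metadata: dict) -> list:
    """Return the metadata names that fail the approximate check above."""
    return [name for name in metadata if not is_valid_metadata_name(name)]
```

Calling find_invalid_metadata_names on a dictionary before sending it lets you report every problematic name at once. The service remains the final authority on what it accepts.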
Note
Blob index tags also provide the ability to store arbitrary user-defined key/value attributes alongside an Azure Blob storage resource. While similar to metadata, only blob index tags are automatically indexed and made searchable by the native blob service. Metadata cannot be indexed and queried unless you utilize a separate service such as Azure Search.
To learn more about this feature, see Manage and find data on Azure Blob storage with blob index (preview).
Set and retrieve properties
To set properties on a blob, use the BlobClient.set_http_headers method.
Any properties not explicitly set are cleared. To preserve any existing properties, you can first retrieve the blob properties, then use them to populate the headers that aren't being updated.
The following code example sets the content_type and content_language system properties on a blob, while preserving the existing properties:
def set_properties(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")

    # Get the existing blob properties
    properties = blob_client.get_blob_properties()

    # Set the content_type and content_language headers, and populate the remaining headers from the existing properties
    blob_headers = ContentSettings(content_type="text/plain",
                                   content_encoding=properties.content_settings.content_encoding,
                                   content_language="en-US",
                                   content_disposition=properties.content_settings.content_disposition,
                                   cache_control=properties.content_settings.cache_control,
                                   content_md5=properties.content_settings.content_md5)

    blob_client.set_http_headers(blob_headers)
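The preserve-then-update pattern can be factored into a small helper so that each call names only the headers it changes. This is a sketch under stated assumptions: HEADER_FIELDS and merged_header_kwargs are hypothetical names, and the helper only builds keyword arguments, leaving the ContentSettings construction and the service call to you.

```python
# The six header fields used by the ContentSettings example in this article.
HEADER_FIELDS = (
    "content_type",
    "content_encoding",
    "content_language",
    "content_disposition",
    "cache_control",
    "content_md5",
)


def merged_header_kwargs(existing, **overrides):
    """Combine existing header values with the ones being changed.

    `existing` is any object exposing the HEADER_FIELDS attributes, such as
    properties.content_settings. Fields not named in `overrides` keep their
    current value; passing None for a field clears it.
    """
    unknown = set(overrides) - set(HEADER_FIELDS)
    if unknown:
        raise ValueError(f"Unknown header fields: {sorted(unknown)}")
    return {
        name: overrides.get(name, getattr(existing, name, None))
        for name in HEADER_FIELDS
    }
```

With a blob client in hand, the result could be passed along as blob_client.set_http_headers(ContentSettings(**merged_header_kwargs(properties.content_settings, content_language="en-US"))).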
To retrieve properties on a blob, use the BlobClient.get_blob_properties method.
The following code example gets a blob's system properties and displays some of the values:
def get_properties(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")

    properties = blob_client.get_blob_properties()

    print(f"Blob type: {properties.blob_type}")
    print(f"Blob size: {properties.size}")
    print(f"Content type: {properties.content_settings.content_type}")
    print(f"Content language: {properties.content_settings.content_language}")
Set and retrieve metadata
You can specify metadata as one or more name-value pairs on a blob or container resource. To set metadata on a blob, pass a dictionary containing name-value pairs to the BlobClient.set_blob_metadata method.
The following code example sets metadata on a blob:
def set_metadata(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")

    # Retrieve existing metadata, if desired
    blob_metadata = blob_client.get_blob_properties().metadata

    more_blob_metadata = {'docType': 'text', 'docCategory': 'reference'}
    blob_metadata.update(more_blob_metadata)

    # Set metadata on the blob
    blob_client.set_blob_metadata(metadata=blob_metadata)
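Setting metadata replaces whatever metadata the blob already has, which is why the example merges new pairs into the retrieved dictionary first. A plain dict.update can leave two keys that differ only by case, such as DocType and docType. The sketch below assumes, following the REST documentation, that the service treats metadata names case-insensitively, and it drops the older spelling before adding the new pair. The helper name merge_metadata is hypothetical.

```python
def merge_metadata(existing, updates):
    """Return a new dict with `updates` applied on top of `existing`.

    Names are compared case-insensitively (an assumption about how the
    service treats them), so an update to "docType" replaces an existing
    "DocType" entry instead of sitting alongside it. Neither input dict
    is modified.
    """
    merged = dict(existing or {})
    for new_name, value in updates.items():
        # Remove any existing key that matches the new name ignoring case.
        for old_name in [k for k in merged if k.lower() == new_name.lower()]:
            del merged[old_name]
        merged[new_name] = value
    return merged
```

The result can then be passed as the metadata argument of set_blob_metadata in place of the dictionary updated in place.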
To retrieve metadata, call the get_blob_properties method on your blob to populate the metadata collection, then read the values, as shown in the example below. The get_blob_properties method retrieves blob properties and metadata by calling both the Get Blob Properties operation and the Get Blob Metadata operation.
The following code example reads metadata on a blob and prints each key/value pair:
def get_metadata(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")

    # Retrieve the blob's metadata
    blob_metadata = blob_client.get_blob_properties().metadata

    # Print each name-value pair
    for k, v in blob_metadata.items():
        print(k, v)
Set blob metadata asynchronously
The Azure Blob Storage client library for Python supports managing blob properties and metadata asynchronously. To learn more about project setup requirements, see Asynchronous programming.
Follow these steps to set blob metadata using asynchronous APIs:
- Add the following import statements:
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient
- Add code to run the program using asyncio.run. This function runs the passed coroutine, main() in our example, and manages the asyncio event loop. Coroutines are declared with the async/await syntax. In this example, the main() coroutine first creates the top-level BlobServiceClient using async with, then calls the method that sets the blob metadata. Note that only the top-level client needs to use async with, as other clients created from it share the same connection pool.
async def main():
    sample = BlobSamples()

    # TODO: Replace <storage-account-name> with your actual storage account name
    account_url = "https://<storage-account-name>.blob.core.chinacloudapi.cn"
    credential = DefaultAzureCredential()

    async with BlobServiceClient(account_url, credential=credential) as blob_service_client:
        await sample.set_metadata(blob_service_client, "sample-container")

if __name__ == '__main__':
    asyncio.run(main())
- Add code to set the blob metadata. The code is the same as the synchronous example, except that the method is declared with the async keyword and the await keyword is used when calling the get_blob_properties and set_blob_metadata methods.
async def set_metadata(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")

    # Retrieve existing metadata, if desired
    properties = await blob_client.get_blob_properties()
    blob_metadata = properties.metadata

    more_blob_metadata = {'docType': 'text', 'docCategory': 'reference'}
    blob_metadata.update(more_blob_metadata)

    # Set metadata on the blob
    await blob_client.set_blob_metadata(metadata=blob_metadata)
With this basic setup in place, you can implement other examples in this article as coroutines using async/await syntax.
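As one illustration, the metadata read example might be adapted as follows. This is a sketch rather than the article's published sample: it is written as a module-level coroutine instead of a BlobSamples method, it returns the metadata instead of printing it, and the blob_name parameter is an addition for illustration. It expects a BlobServiceClient from azure.storage.blob.aio, where get_blob_client is a regular call and get_blob_properties must be awaited.

```python
async def get_metadata(blob_service_client, container_name, blob_name="sample-blob.txt"):
    """Return a copy of the blob's metadata as a plain dict.

    `blob_service_client` is expected to be the async BlobServiceClient
    created with `async with` in main().
    """
    # Creating the blob client does not call the service, so it isn't awaited.
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)

    # The service call is a coroutine in the async client and must be awaited.
    properties = await blob_client.get_blob_properties()
    return dict(properties.metadata)
```

Inside main(), the call would read metadata = await get_metadata(blob_service_client, "sample-container").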
Resources
To learn more about how to manage system properties and user-defined metadata using the Azure Blob Storage client library for Python, see the following resources.
Code samples
- View synchronous or asynchronous code samples from this article (GitHub)
REST API operations
The Azure SDK for Python contains libraries that build on top of the Azure REST API, allowing you to interact with REST API operations through familiar Python paradigms. The client library methods for managing system properties and user-defined metadata use the following REST API operations:
- Set Blob Properties (REST API)
- Get Blob Properties (REST API)
- Set Blob Metadata (REST API)
- Get Blob Metadata (REST API)
Related content
- This article is part of the Blob Storage developer guide for Python. To learn more, see the full list of developer guide articles at Build your Python app.