Use summarization Docker containers on-premises

2025-10-16

Containers enable you to host the Summarization API on your own infrastructure. If you have security or data governance requirements that can't be fulfilled by calling Summarization remotely, then containers might be a good option.

Prerequisites

- If you don't have an Azure subscription, create a Trial.
- Docker installed on a host computer. Docker must be configured to allow the containers to connect with and send billing data to Azure.
  - On Windows, Docker must also be configured to support Linux containers.
  - You should have a basic understanding of Docker concepts.
- A Language resource with the free (F0) or standard (S) pricing tier.

Gather required parameters

All Azure AI containers require three primary parameters. The Microsoft Software License Terms must be present with a value of accept. An endpoint URI and an API key are also needed.

Endpoint URI

The {ENDPOINT_URI} value is available on the Azure portal Overview page of the corresponding Azure AI services resource. Go to the Overview page, hover over the endpoint, and a Copy to clipboard icon appears. Copy and use the endpoint where needed.

Screenshot that shows gathering the endpoint URI for later use.

Keys

The {API_KEY} value is used to start the container and is available on the Azure portal's Keys page of the corresponding Azure AI services resource. Go to the Keys page, and select the Copy to clipboard icon.

Screenshot that shows getting one of the two keys for later use.

Important

These subscription keys are used to access your Azure AI services API. Don't share your keys. Store them securely. For example, use Azure Key Vault. We also recommend that you regenerate these keys regularly. Only one key is necessary to make an API call. When you regenerate the first key, you can use the second key for continued access to the service.
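
One way to keep the key out of scripts and shell history is to read it from Azure Key Vault when you need it. The following is a minimal sketch, assuming the Azure CLI is installed and signed in and that the key was previously stored as a secret. The function name, vault name, and secret name are hypothetical.

```shell
# Fetch the API key from Azure Key Vault instead of hard-coding it.
# Assumptions: the Azure CLI (az) is installed and signed in; the vault
# and secret names passed as arguments are hypothetical placeholders.
get_api_key() {
    az keyvault secret show \
        --vault-name "$1" \
        --name "$2" \
        --query value \
        --output tsv
}

# Usage (run against a real vault):
# API_KEY=$(get_api_key my-vault language-api-key)
```

The value can then be passed to the container as the {API_KEY} parameter without appearing in a script file.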

Host computer requirements and recommendations

The host is an x64-based computer that runs the Docker container. It can be a computer on your premises or a Docker hosting service in Azure, such as:

Azure Kubernetes Service.
Azure Container Instances.
A Kubernetes cluster deployed to Azure Stack. For more information, see Deploy Kubernetes to Azure Stack.

The following table describes the recommended specifications for the summarization container skills. The listed CPU and memory combinations are for a 4,000-token input (for conversation summarization, the figures cover all aspects requested in the same request).

Container Type	Recommended number of CPU cores	Recommended memory	Notes
Summarization CPU container	16	48 GB
Summarization GPU container	2	24 GB	Requires an NVIDIA GPU that supports CUDA 11.8 with 16 GB VRAM.

CPU core and memory correspond to the --cpus and --memory settings, which are used as part of the docker run command.
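
As an illustration, the recommended values can be turned into docker run flags with a small helper. This is a sketch only: the function name is hypothetical, and the values are copied from the table above.

```shell
# Print the docker run resource flags for a container type, using the
# recommended values from the specifications table.
# resource_flags is a hypothetical helper name.
resource_flags() {
    case "$1" in
        cpu) echo "--cpus 16 --memory 48g" ;;
        gpu) echo "--cpus 2 --memory 24g" ;;
        *)   echo "unknown container type: $1" >&2; return 1 ;;
    esac
}

# Example: preview the flags for the CPU container.
resource_flags cpu
```

The printed flags go between docker run and the image name.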

Get the container image with `docker pull`

The Summarization container image can be found on the mcr.microsoft.com container registry syndicate. It resides within the azure-cognitive-services/textanalytics/ repository and is named summarization. The fully qualified container image name is mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization.

To use the latest version of the container, you can use the latest tag. You can also find a full list of tags on the MCR.

Use the docker pull command to download a container image from the Microsoft Container Registry.

For CPU containers:

docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu

For GPU containers:

docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:gpu

Tip

You can use the docker images command to list your downloaded container images. For example, the following command lists the ID, repository, and tag of each downloaded container image, formatted as a table:

docker images --format "table {{.ID}}\t{{.Repository}}\t{{.Tag}}"

IMAGE ID         REPOSITORY                TAG
<image-id>       <repository-path/name>    <tag-name>

Download the summarization container models

Before you run the summarization container, you must download the models. To do so, run one of the following commands, which use the CPU container image as an example:

docker run -v {HOST_MODELS_PATH}:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu downloadModels=ExtractiveSummarization billing={ENDPOINT_URI} apikey={API_KEY}
docker run -v {HOST_MODELS_PATH}:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu downloadModels=AbstractiveSummarization billing={ENDPOINT_URI} apikey={API_KEY}
docker run -v {HOST_MODELS_PATH}:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu downloadModels=ConversationSummarization billing={ENDPOINT_URI} apikey={API_KEY}

We don't recommend downloading the models for all skills into the same HOST_MODELS_PATH. The container loads every model it finds in HOST_MODELS_PATH, so doing so uses a large amount of memory. Instead, download only the model for the skill you need into a particular HOST_MODELS_PATH.

To ensure compatibility between the models and the container, download the models again whenever you create a container from a new image version.
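
A sketch of a one-directory-per-skill layout follows. It assumes that ENDPOINT_URI and API_KEY are set in the environment. MODELS_ROOT and the function name are hypothetical, chosen for illustration.

```shell
# Download the model for one skill into its own host directory, so that a
# container started against that directory loads only that model.
# Assumptions: MODELS_ROOT is a hypothetical base directory on the host;
# ENDPOINT_URI and API_KEY are set in the environment.
IMAGE="mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu"

download_models() {
    # $1 is ExtractiveSummarization, AbstractiveSummarization, or
    # ConversationSummarization.
    docker run -v "${MODELS_ROOT}/$1:/models" "$IMAGE" \
        "downloadModels=$1" "billing=${ENDPOINT_URI}" "apikey=${API_KEY}"
}

# Usage:
# MODELS_ROOT=/srv/summarization-models
# download_models ExtractiveSummarization
```

Each skill's directory can then be used as the {HOST_MODELS_PATH} value for the container that serves that skill.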

Run the container with `docker run`

After the Summarization container image and models are on the host computer, use the following docker run command to run the container. The container continues to run until you stop it. Replace the following placeholders with your own values:

Placeholder	Value	Format or example
{HOST_MODELS_PATH}	The host computer volume mount, which Docker uses to persist the model.	An example is c:\SummarizationModel where the c:\ drive is located on the host machine.
{ENDPOINT_URI}	The endpoint for accessing the summarization API. You can find it on your resource's Key and endpoint page, on the Azure portal.	`https://<your-custom-subdomain>.cognitiveservices.azure.cn`
{API_KEY}	The key for your Language resource. You can find it on your resource's Key and endpoint page, on the Azure portal.	`xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`

docker run -p 5000:5000 -v {HOST_MODELS_PATH}:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu eula=accept rai_terms=accept billing={ENDPOINT_URI} apikey={API_KEY}

If you're running a GPU container, use this command instead:

docker run -p 5000:5000 --gpus all -v {HOST_MODELS_PATH}:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:gpu eula=accept rai_terms=accept billing={ENDPOINT_URI} apikey={API_KEY}

If there is more than one GPU on the machine, replace --gpus all with --gpus device={DEVICE_ID}.
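
The GPU selection can be sketched as a small wrapper that falls back to all GPUs when no device is given. The function name is hypothetical, and HOST_MODELS_PATH, ENDPOINT_URI, and API_KEY are assumed to be set in the environment.

```shell
# Start the GPU container on a specific GPU, or on all GPUs when no
# device ID is passed.
# Assumptions: run_summarization_gpu is a hypothetical helper name;
# HOST_MODELS_PATH, ENDPOINT_URI, and API_KEY are set in the environment.
run_summarization_gpu() {
    device_id=${1:-}
    if [ -n "$device_id" ]; then
        gpus="device=${device_id}"
    else
        gpus="all"
    fi
    docker run -p 5000:5000 --gpus "$gpus" \
        -v "${HOST_MODELS_PATH}:/models" \
        mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:gpu \
        eula=accept rai_terms=accept \
        "billing=${ENDPOINT_URI}" "apikey=${API_KEY}"
}

# Usage:
# run_summarization_gpu 0    # use only GPU 0
# run_summarization_gpu      # use all GPUs
```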

Important

When a docker command is split across multiple lines, the backslash, \, is the line continuation character. Replace or remove it based on your host operating system's requirements.
The Eula, Billing, rai_terms and ApiKey options must be specified to run the container; otherwise, the container won't start. For more information, see Billing.

This command:

Runs a Summarization container from the container image.
Mounts the {HOST_MODELS_PATH} directory on the host at /models in the container.
Exposes TCP port 5000.
Accepts the terms with eula=accept and rai_terms=accept, and passes the billing endpoint and API key.

As written, the command doesn't set CPU or memory limits and doesn't remove the container after it exits. To do so, add the --cpus, --memory, and --rm options of docker run.

Run multiple containers on the same host

If you intend to run multiple containers with exposed ports, make sure to run each container with a different exposed port. For example, run the first container on port 5000 and the second container on port 5001.

You can have this container and a different Azure AI services container running on the host together. You can also have multiple instances of the same Azure AI services container running.
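
For example, two summarization containers for different skills could be started on ports 5000 and 5001. The following is a sketch under stated assumptions: MODELS_ROOT is a hypothetical directory that holds one subdirectory of models per skill, the function name is hypothetical, and ENDPOINT_URI and API_KEY are set in the environment.

```shell
# Start a detached container for one skill on its own host port.
# Assumptions: start_on_port and MODELS_ROOT are hypothetical names;
# MODELS_ROOT holds one model subdirectory per skill; ENDPOINT_URI and
# API_KEY are set in the environment.
start_on_port() {
    host_port=$1
    skill=$2
    # The container always listens on 5000; only the host port changes.
    docker run -d -p "${host_port}:5000" \
        -v "${MODELS_ROOT}/${skill}:/models" \
        mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu \
        eula=accept rai_terms=accept \
        "billing=${ENDPOINT_URI}" "apikey=${API_KEY}"
}

# Usage:
# start_on_port 5000 ExtractiveSummarization
# start_on_port 5001 AbstractiveSummarization
```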

Query the container's prediction endpoint

The container provides REST-based query prediction endpoint APIs.

Use the host, http://localhost:5000, for container APIs.

Validate that a container is running

There are several ways to validate that the container is running. Locate the external IP address and exposed port of the container, and open a web browser. Use the request URLs in the following table to validate that the container is running. The example URLs use http://localhost:5000, but your container's address might differ, so rely on its external IP address and exposed port.

Request URL	Purpose
`http://localhost:5000/`	The container provides a home page.
`http://localhost:5000/ready`	Requested with GET, this URL provides a verification that the container is ready to accept a query against the model. This request can be used for Kubernetes liveness and readiness probes.
`http://localhost:5000/status`	Also requested with GET, this URL verifies whether the API key used to start the container is valid without causing an endpoint query. This request can be used for Kubernetes liveness and readiness probes.
`http://localhost:5000/swagger`	The container provides a full set of documentation for the endpoints and a Try it out feature. With this feature, you can enter your settings into a web-based HTML form and make the query without having to write any code. After the query returns, an example cURL command is provided to demonstrate the HTTP headers and body format that are required.
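
The readiness check can also be scripted. The following sketch polls the /ready URL until it responds successfully. It assumes curl is installed; the function name, retry count, and two-second delay are illustrative choices, not values from the service.

```shell
# Poll the /ready endpoint until the container reports that it is ready.
# Assumptions: curl is installed; wait_until_ready, the default retry
# count, and the delay between attempts are illustrative.
wait_until_ready() {
    base_url=${1:-http://localhost:5000}
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --fail makes curl exit non-zero on an HTTP error response.
        if curl --silent --fail --output /dev/null "${base_url}/ready"; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "not ready after ${attempts} attempts" >&2
    return 1
}

# Usage:
# wait_until_ready http://localhost:5000 30
```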

Container's home page

Stop the container

To shut down the container, in the command-line environment where the container is running, select Ctrl+C.
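
If the container was started detached or from another terminal, Ctrl+C isn't available. A sketch of stopping it with the Docker CLI follows, assuming the container was started from the cpu image tag; the function name is hypothetical.

```shell
# Stop every running container that was started from the summarization
# CPU image.
# Assumptions: stop_summarization is a hypothetical helper name; the
# image tag matches the one used to start the container.
stop_summarization() {
    image="mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu"
    ids=$(docker ps --quiet --filter "ancestor=${image}")
    if [ -z "$ids" ]; then
        echo "no running summarization container found"
        return 0
    fi
    # Intentionally unquoted so that each container ID is its own argument.
    docker stop $ids
}

# Usage:
# stop_summarization
```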

Troubleshooting

If you run the container with an output mount and logging enabled, the container generates log files that are helpful to troubleshoot issues that happen while starting or running the container.
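
The following sketch shows what such a run might look like. It is an assumption based on the general Azure AI containers configuration settings, not something this article specifies: the /output mount target and the Logging:Disk:Format=json option should be confirmed against Configure containers for the summarization image. HOST_OUTPUT_PATH and the function name are hypothetical.

```shell
# Start the container with an output mount and JSON log files on disk.
# Assumptions: HOST_OUTPUT_PATH is a hypothetical host directory for log
# files; the /output target and the Logging:Disk:Format option come from
# the general Azure AI containers configuration settings and should be
# verified for this image. HOST_MODELS_PATH, ENDPOINT_URI, and API_KEY
# are set in the environment.
run_with_logging() {
    docker run -p 5000:5000 \
        -v "${HOST_MODELS_PATH}:/models" \
        --mount "type=bind,src=${HOST_OUTPUT_PATH},target=/output" \
        mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu \
        eula=accept rai_terms=accept \
        "billing=${ENDPOINT_URI}" "apikey=${API_KEY}" \
        Logging:Disk:Format=json
}

# Usage:
# HOST_OUTPUT_PATH=/srv/summarization-logs
# run_with_logging
```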

Tip

For more troubleshooting information and guidance, see Azure AI containers frequently asked questions (FAQ).

Billing

The summarization containers send billing information to Azure, using a Language resource on your Azure account.

Queries to the container are billed at the pricing tier of the Azure resource that's used for the ApiKey parameter.

Azure AI services containers aren't licensed to run without being connected to the metering or billing endpoint. You must enable the containers to communicate billing information with the billing endpoint at all times. Azure AI services containers don't send customer data, such as the image or text that's being analyzed, to Azure.

Connect to Azure

The container needs the billing argument values to run. These values allow the container to connect to the billing endpoint. The container reports usage about every 10 to 15 minutes. If the container doesn't connect to Azure within the allowed time window, the container continues to run but doesn't serve queries until the billing endpoint is restored. The connection is attempted 10 times at the same time interval of 10 to 15 minutes. If it can't connect to the billing endpoint within the 10 tries, the container stops serving requests.

Billing arguments

In addition to rai_terms=accept, which the summarization container requires as noted earlier, the docker run command starts the container only when all three of the following options are provided with valid values:

Option	Description
`ApiKey`	The API key of the Azure AI services resource that's used to track billing information. The value of this option must be set to an API key for the provisioned resource that's specified in `Billing`.
`Billing`	The endpoint of the Azure AI services resource that's used to track billing information. The value of this option must be set to the endpoint URI of a provisioned Azure resource.
`Eula`	Indicates that you accepted the license for the container. The value of this option must be set to accept.

For more information about these options, see Configure containers.
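
Because the container won't start without these values, a script can check for them before calling docker run. The following is a minimal sketch; the function name is hypothetical, and it assumes the endpoint and key are passed through the ENDPOINT_URI and API_KEY environment variables.

```shell
# Fail early when the billing endpoint or API key is missing.
# Assumptions: check_billing_args is a hypothetical helper name; the
# values are supplied through the ENDPOINT_URI and API_KEY variables.
check_billing_args() {
    missing=""
    [ -n "${ENDPOINT_URI:-}" ] || missing="${missing} ENDPOINT_URI"
    [ -n "${API_KEY:-}" ] || missing="${missing} API_KEY"
    if [ -n "$missing" ]; then
        echo "missing required values:${missing}" >&2
        return 1
    fi
    return 0
}

# Usage:
# check_billing_args && echo "billing arguments are set"
```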

Summary

In this article, you learned the concepts and workflow for downloading, installing, and running summarization containers. In summary:

Summarization provides Linux containers for Docker.
Container images are downloaded from the Microsoft Container Registry (MCR).
Container images run in Docker.
You must specify billing information when instantiating a container.

Important

This container is not licensed to run without being connected to Azure for metering. Customers need to enable the containers to communicate billing information with the metering service at all times. Azure AI containers do not send customer data (e.g. text that is being analyzed) to Microsoft.

Next steps

See Configure containers for configuration settings.
