Captions, subtitles and text tracks



Warning

Azure Media Services will be retired on June 30, 2024. For more information, see the AMS Retirement Guide.

You can provide captions, subtitles and other text tracks to the client player. This article discusses caption and subtitle formats.

Captions vs subtitles

In general, captions are text tracks used for closed captioning. They enable viewers who rely on accessibility features in a video player to follow what is being said and the sounds that occur in a video. Subtitles are text tracks that present the speech and sounds in a language other than the source language.

For example, if the source language of a video is English, then the English text track is considered a caption. If an additional language text track is provided, it's considered a subtitle. You can use one text track as both a caption and a subtitle, but implementation varies between video player clients.

The following caption and subtitle formats are supported by Media Services.

Captioning and subtitles

The following table lists the types of captions supported by Media Services.

| Standard | Notes |
|----------|-------|
| WebVTT | WebVTT is the W3C standard for displaying timed text with the HTML5 text track element. Media Services uses this standard for live transcription during live events, a capability provided by Azure Cognitive Services. You can also use this standard to easily add captions and subtitles to be consumed by a player client such as Azure Media Player. |
| TTML inside .ismt (Smooth Streaming text tracks) | Media Services dynamic packaging enables your clients to stream content in any of the following formats: DASH, HLS, or Smooth Streaming. However, if you ingest fragmented MP4 (Smooth Streaming) with captions inside .ismt (Smooth Streaming text tracks), you can deliver the stream only to Smooth Streaming clients. |
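
For reference, a WebVTT file is a plain-text document that begins with the WEBVTT header and lists time-coded cues. The following illustrative snippet (the timings and text are placeholders) shows one cue for dialogue and one describing a sound:

```
WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the overview of Media Services.

00:00:05.000 --> 00:00:08.500
[upbeat music playing]
```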

Security considerations for closed captions, subtitles, and timed-metadata delivery

The dynamic encryption and DRM features of Azure Media Services have limits to consider when you secure content delivery that includes live transcriptions, captions, subtitles, or timed metadata. The DRM subsystems, including PlayReady and FairPlay, do not support encryption and licensing of text tracks. The lack of DRM encryption for text tracks limits your ability to secure the contents of live transcriptions, manually inserted captions, uploaded subtitles, or timed-metadata signals that may be inserted as separate tracks.

To secure your captions, subtitles, or timed-metadata tracks, follow these guidelines:

Use AES-128 clear key encryption. When you enable AES-128 clear key encryption, the text tracks can be configured to use the same full "envelope" encryption pattern as the audio and video segments. A client application can then decrypt these segments after requesting the decryption key from the Media Services Key Delivery service with an authenticated JWT token. Azure Media Player supports this method, but it may not be supported on all devices and can require some client-side development work to make sure it succeeds on all platforms.
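
As an illustration, the following is a minimal sketch of configuring envelope (AES-128 clear key) protection with a JWT-restricted content key policy using the azure-mgmt-media Python SDK. The resource group, account, asset, issuer, audience, and signing key values are placeholders, and exact model names may vary by SDK version.

```python
# Sketch: envelope (AES-128 clear key) protection gated by a signed JWT.
# Resource names, issuer/audience values, and the signing key are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import (
    ContentKeyPolicy,
    ContentKeyPolicyOption,
    ContentKeyPolicyClearKeyConfiguration,
    ContentKeyPolicyTokenRestriction,
    ContentKeyPolicySymmetricTokenKey,
    StreamingLocator,
)

client = AzureMediaServices(DefaultAzureCredential(), "<subscription-id>")

# Content key policy: clear key delivery, restricted to clients presenting a valid JWT.
policy = ContentKeyPolicy(options=[
    ContentKeyPolicyOption(
        name="ClearKeyWithJwt",
        configuration=ContentKeyPolicyClearKeyConfiguration(),
        restriction=ContentKeyPolicyTokenRestriction(
            issuer="https://issuer.example.com",
            audience="https://audience.example.com",
            primary_verification_key=ContentKeyPolicySymmetricTokenKey(
                key_value=b"<256-bit-shared-secret>"),
            restriction_token_type="Jwt",
        ),
    )
])
client.content_key_policies.create_or_update(
    "<resource-group>", "<account-name>", "ClearKeyPolicy", policy)

# Streaming locator using the predefined clear key streaming policy, so audio,
# video, and text segments are all delivered with envelope encryption.
locator = StreamingLocator(
    asset_name="<asset-name>",
    streaming_policy_name="Predefined_ClearKey",
    default_content_key_policy_name="ClearKeyPolicy",
)
client.streaming_locators.create(
    "<resource-group>", "<account-name>", "<locator-name>", locator)
```

The client player then fetches the key from the Key Delivery endpoint by presenting a JWT signed with the same shared secret, so text segments are only readable after a successful, authenticated key request.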

Warning

If you do not follow the guidelines above, your subtitles, captions, or timed-metadata text will be delivered as unencrypted content that could be intercepted or shared outside of your intended client delivery path, which can result in leaked information. If you are concerned about the contents of captions or subtitles leaking in a secure delivery scenario, reach out to the Media Services support team for more information on the guidelines above for securing your content delivery.

Text tracks

Media Services allows you to provide text tracks in WebVTT or TTML format. You can update the manifest file to advertise the text tracks to the player by using the portal, or by using the Tracks API, which is available through REST or the SDKs.
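
For example, the following sketch uses the azure-mgmt-media Python SDK to register a WebVTT file, already uploaded to the asset's storage container, as a text track. The asset, track, and file names are placeholders, and the track model names may differ slightly by SDK version.

```python
# Sketch: register an uploaded WebVTT file as a text track on an asset.
# Asset, track, and file names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import AssetTrack, TextTrack

client = AzureMediaServices(DefaultAzureCredential(), "<subscription-id>")

# The .vtt file must already exist in the asset's storage container.
poller = client.tracks.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<account-name>",
    asset_name="<asset-name>",
    track_name="English-captions",
    parameters=AssetTrack(
        track=TextTrack(
            file_name="captions-en.vtt",
            display_name="English",
            player_visibility="Visible",
        )
    ),
)
poller.result()  # Wait for the track to be registered and the manifest updated.
```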

See the Tracks API article for more information about updating asset tracks programmatically.