Media Encoder Standard schema

This article describes some of the elements and types of the XML schema on which Media Encoder Standard presets are based. The article gives explanation of elements and their valid values.

Preset (root element)

Defines an encoding preset.

Elements

Name Type Description
Encoding Encoding Root element, indicates that the input sources are to be encoded.
Outputs Outputs Collection of desired output files.
StretchMode
minOccurs="0"
default="AutoSize
xs:string Control the output video frame size, padding, pixel, or display aspect ratio. StretchMode could be one of the following values: None, AutoSize (default), or AutoFit.

None: Strictly follow the output resolution (for example, the Width and Height in the preset) without considering the pixel aspect ratio or display aspect ratio of the input video. Use for scenarios such as cropping, where the output video has a different aspect ratio compared to the input.

AutoSize: The output resolution will fit inside the window (Width * Height) specified by preset. However, the encoder produces an output video that has square (1:1) pixel aspect ratio. Therefore, either output Width or output Height could be overridden in order to match the display aspect ratio of the input, without padding. For example, if the input is 1920x1080 and the encoding preset asks for 1280x1280, then the Height value in the preset is overridden, and the output will be at 1280x720, which maintains the input aspect ratio of 16:9.

AutoFit: If needed, pad the output video (with either letterbox or pillarbox) to honor the desired output resolution, while ensuring that the active video region in the output has the same aspect ratio as the input. For example, suppose the input is 1920x1080 and the encoding preset asks for 1280x1280. Then the output video will be at 1280x1280, but it will contain an inner 1280x720 rectangle of ‘active video’ with aspect ratio of 16:9, and letterbox regions 280 pixels high at the top and bottom. For another example, if the input is 1440x1080 and the encoding preset asks for 1280x720, then the output will be at 1280x720, which contains an inner rectangle of 960x720 at aspect ratio of 4:3, and pillar box regions 160 pixels wide at the left and right.

Attributes

Name Type Description
Version

Required
xs: decimal The preset version. The following restrictions apply: xs:fractionDigits value="1" and xs:minInclusive value="1" For example, version="1.0".

Encoding

Contains a sequence of the following elements:

Elements

Name Type Description
H264Video H264Video Settings for H.264 encoding of video.
AACAudio AACAudio Settings for AAC encoding of audio.
BmpImage BmpImage Settings for Bmp image.
PngImage PngImage Settings for Png image.
JpgImage JpgImage Settings for Jpg image.

H264Video

Elements

Name Type Description
TwoPass

minOccurs="0"
xs:boolean Currently, only one-pass encoding is supported.
KeyFrameInterval

minOccurs="0"

default="00:00:02"
xs:time Determines the fixed spacing between IDR frames in units of seconds. Also referred to as the GOP duration. See SceneChangeDetection for controlling whether the encoder can deviate from this value.
SceneChangeDetection

minOccurs="0"

default=”false”
xs: boolean If set to true, encoder attempts to detect scene change in the video and inserts an IDR frame.
Complexity

minOccurs="0"

default="Balanced"
xs:string Controls the trade-off between encode speed and video quality. Could be one of the following values: Speed, Balanced, or Quality

Default: Balanced
SyncMode

minOccurs="0"
Feature will be exposed in a future release.
H264Layers

minOccurs="0"
H264Layers Collection of output video layers.

Attributes

Name Type Description
Condition xs:string When the input has no video, you may want to force the encoder to insert a monochrome video track. To do that, use Condition="InsertBlackIfNoVideoBottomLayerOnly" (to insert a video at only the lowest bitrate) or Condition="InsertBlackIfNoVideo" (to insert a video at all output bitrates). For more information, see this article.

H264Layers

By default, if you send an input to the encoder that contains only audio, and no video, the output asset contains files with audio data only. Some players may not be able to handle such output streams. You can use the H264Video's InsertBlackIfNoVideo attribute setting to force the encoder to add a video track to the output in that scenario. For more information, see this article.

Elements

Name Type Description
H264Layer

minOccurs="0" maxOccurs="unbounded"
H264Layer A collection of H264 layers.

H264Layer

Note

Video limits are based on the values described in the H264 Levels table.

Elements

Name Type Description
Profile

minOccurs="0"

default=”Auto”
xs: string Could be of one of the following xs: string values: Auto, Baseline, Main, High.
Level

minOccurs="0"

default=”Auto”
xs: string
Bitrate

minOccurs="0"
xs:int The bitrate used for this video layer, specified in kbps.
MaxBitrate

minOccurs="0"
xs: int The maximum bitrate used for this video layer, specified in kbps.
BufferWindow

minOccurs="0"

default="00:00:05"
xs: time Length of the video buffer.
Width

minOccurs="0"
xs: int Width of the output video frame, in pixels.

Currently, you must specify both Width and Height. The Width and Height need to be even numbers.
Height

minOccurs="0"
xs:int Height of the output video frame, in pixels.

Currently, you must specify both Width and Height. The Width and Height need to be even numbers.
BFrames

minOccurs="0"
xs: int Number of B frames between reference frames.
ReferenceFrames

minOccurs="0"

default=”3”
xs:int Number of reference frames in a GOP.
EntropyMode

minOccurs="0"

default=”Cabac”
xs: string Could be one of the following values: Cabac and Cavlc.
FrameRate

minOccurs="0"
rational number Determines the frame rate of the output video. Use default of "0/1" to let the encoder use the same frame rate as the input video. Allowed values are expected to be common video frame rates. However, any valid rational is allowed. For example, 1/1 would be 1 fps and is valid.

- 12/1 (12 fps)

- 15/1 (15 fps)

- 24/1 (24 fps)

- 24000/1001 (23.976 fps)

- 25/1 (25 fps)

- 30/1 (30 fps)

- 30000/1001 (29.97 fps)

NOTE If you are creating a custom preset for multiple-bitrate encoding, then all layers of the preset must use the same value of FrameRate.
AdaptiveBFrame

minOccurs="0"
xs: boolean Copy from Azure media encoder
Slices

minOccurs="0"

default="0"
xs:int Determines how many slices a frame is divided into. Recommend using default.

AACAudio

Contains a sequence of the following elements and groups.

For more information about AAC, see AAC.

Elements

Name Type Description
Profile

minOccurs="0 "

default="AACLC"
xs: string Could be one of the following values: AACLC, HEAACV1, or HEAACV2.

Attributes

Name Type Description
Condition xs: string To force the encoder to produce an asset that contains a silent audio track when input has no audio, specify the "InsertSilenceIfNoAudio" value.

By default, if you send an input to the encoder that contains only video, and no audio, then the output asset contains files that contain only video data. Some players may not be able to handle such output streams. You can use this setting to force the encoder to add a silent audio track to the output in that scenario.

Groups

Reference Description
AudioGroup

minOccurs="0"
See description of AudioGroup to know the appropriate number of channels, sampling rate, and bit rate that could be set for each profile.

AudioGroup

For details about what values are valid for each profile, see the “Audio codec details” table that follows.

Elements

Name Type Description
Channels

minOccurs="0"
xs: int The number of audio channels encoded. The following are valid options: 1, 2, 5, 6, 8.

Default: 2.
SamplingRate

minOccurs="0"
xs: int The audio sampling rate, specified in Hz.
Bitrate

minOccurs="0"
xs: int The bitrate used when encoding the audio, specified in kbps.

Audio codec details

Audio Codec Details
AACLC 1:

- 11025: 8 <= bitrate < 16

- 12000: 8 <= bitrate < 16

- 16000: 8 <= bitrate <32

- 22050: 24 <= bitrate < 32

- 24000: 24 <= bitrate < 32

- 32000: 32 <= bitrate <= 192

- 44100: 56 <= bitrate <= 288

- 48000: 56 <= bitrate <= 288

- 88200 : 128 <= bitrate <= 288

- 96000 : 128 <= bitrate <= 288

2:

- 11025: 16 <= bitrate < 24

- 12000: 16 <= bitrate < 24

- 16000: 16 <= bitrate < 40

- 22050: 32 <= bitrate < 40

- 24000 : 32 <= bitrate < 40

- 32000: 40 <= bitrate <= 384

- 44100: 96 <= bitrate <= 576

- 48000 : 96 <= bitrate <= 576

- 88200: 256 <= bitrate <= 576

- 96000: 256 <= bitrate <= 576

5/6:

- 32000: 160 <= bitrate <= 896

- 44100: 240 <= bitrate <= 1024

- 48000: 240 <= bitrate <= 1024

- 88200: 640 <= bitrate <= 1024

- 96000: 640 <= bitrate <= 1024

8:

- 32000 : 224 <= bitrate <= 1024

- 44100 : 384 <= bitrate <= 1024

- 48000: 384 <= bitrate <= 1024

- 88200: 896 <= bitrate <= 1024

- 96000: 896 <= bitrate <= 1024
HEAACV1 1:

- 22050: bitrate = 8

- 24000: 8 <= bitrate <= 10

- 32000: 12 <= bitrate <= 64

- 44100: 20 <= bitrate <= 64

- 48000: 20 <= bitrate <= 64

- 88200: bitrate = 64

2:

- 32000: 16 <= bitrate <= 128

- 44100: 16 <= bitrate <= 128

- 48000: 16 <= bitrate <= 128

- 88200 : 96 <= bitrate <= 128

- 96000: 96 <= bitrate <= 128

5/6:

- 32000 : 64 <= bitrate <= 320

- 44100: 64 <= bitrate <= 320

- 48000: 64 <= bitrate <= 320

- 88200 : 256 <= bitrate <= 320

- 96000: 256 <= bitrate <= 320

8:

- 32000: 96 <= bitrate <= 448

- 44100: 96 <= bitrate <= 448

- 48000: 96 <= bitrate <= 448

- 88200: 384 <= bitrate <= 448

- 96000: 384 <= bitrate <= 448
HEAACV2 2:

- 22050: 8 <= bitrate <= 10

- 24000: 8 <= bitrate <= 10

- 32000: 12 <= bitrate <= 64

- 44100: 20 <= bitrate <= 64

- 48000: 20 <= bitrate <= 64

- 88200: 64 <= bitrate <= 64

Clip

Attributes

Name Type Description
StartTime xs:duration Specifies the start time of a presentation. The value of StartTime needs to match the absolute timestamps of the input video. For example, if the first frame of the input video has a timestamp of 12:00:10.000, then StartTime should be at least 12:00:10.000 or greater.
Duration xs:duration Specifies the duration of a presentation (for example, appearance of an overlay in the video).

Output

Attributes

Name Type Description
FileName xs:string The name of the output file.

You can use macros described in the following table to build the output file names. For example:

"Outputs": [ { "FileName": "{Basename}{Resolution}{Bitrate}.mp4", "Format": { "Type": "MP4Format" } } ]

Macros

Macro Description
{Basename} If you are doing VoD encoding, the {Basename} is the first 32 characters of the AssetFile.Name property of the primary file in the input asset. {Basename} is limited to 64 chars if CopyCodec (CopyAudio or CopyVideo) is used in the preset to avoid duplicated output file names.

If the input asset is a live archive, then the {Basename} is derived from the trackName attributes in the server manifest. If you are submitting a subclip job using the TopBitrate, as in: "<VideoStream>TopBitrate</VideoStream>", and the output file contains video, then the {Basename} is the first 32 characters of the trackName of the video layer with the highest bitrate.

If instead you are submitting a subclip job using all of the input bitrates, such as "<VideoStream>*</VideoStream>", and the output file contains video, then {Basename} is the first 32 characters of the trackName of the corresponding video layer.
{Codec} Maps to "H264" for video and "AAC" for audio.
{Bitrate} The target video bitrate if the output file contains video and audio, or target audio bitrate if the output file contains audio only. The value used is the bitrate in kbps.
{Channel} Audio channel count if the file contains audio.
{Width} Width of the video, in pixels, in the output file, if the file contains video.
{Height} Height of the video, in pixels, in the output file, if the file contains video.
{Extension} Inherits from the "Type" property for the output file. The output file name has an extension which is one of: "mp4", "ts", "jpg", "png", or "bmp".
{Index} Mandatory for thumbnail. Should only be present once.

Video (complex type inherits from Codec)

Attributes

Name Type Description
Start xs:string
Step xs:string
Range xs:string
PreserveResolutionAfterRotation xs:boolean For detailed explanation, see the following section: PreserveResolutionAfterRotation

PreserveResolutionAfterRotation

Use the PreserveResolutionAfterRotation flag in combination with resolution values expressed in percentage terms (Width="100%" , Height="100%").

By default, the encode resolution settings (Width, Height) in the Media Encoder Standard (MES) presets are targeted at videos with 0-degree rotation. For example, if your input video is 1280x720 with zero-degree rotation, then the default presets ensure that the output has the same resolution.

MESRoation1

If the input video has been captured with non-zero rotation (for example, a smartphone or tablet held vertically), then MES by default applies the encode resolution settings (Width, Height) to the input video, and then compensate for the rotation. For example, see the picture that follows. The preset uses Width = "100%", Height = "100%", which MES interprets as requiring the output to be 1280 pixels wide and 720 pixels tall. After rotating the video, it then shrinks the picture to fit into that window, leading to pillar-box areas on the left and right.

MESRoation2

Alternatively, you can make use of the PreserveResolutionAfterRotation flag and set it to "true" (default is "false"). So if your preset has Width = "100%", Height = "100%" and PreserveResolutionAfterRotation set to "true", an input video, which is 1280 pixels wide and 720 pixels tall with 90-degree rotation produces an output with zero-degree rotation, but 720 pixels wide and 1280 pixels tall. See the following picture:

MESRoation3

FormatGroup (group)

Elements

Name Type Description
BmpFormat BmpFormat
PngFormat PngFormat
JpgFormat JpgFormat

BmpLayer

Element

Name Type Description
Width

minOccurs="0"
xs:int
Height

minOccurs="0"
xs:int

Attributes

Name Type Description
Condition xs:string

PngLayer

Element

Name Type Description
Width

minOccurs="0"
xs:int
Height

minOccurs="0"
xs:int

Attributes

Name Type Description
Condition xs:string

JpgLayer

Element

Name Type Description
Width

minOccurs="0"
xs:int
Height

minOccurs="0"
xs:int
Quality

minOccurs="0"
xs:int Valid values: 1(worst)-100(best)

Attributes

Name Type Description
Condition xs:string

PngLayers

Elements

Name Type Description
PngLayer

minOccurs="0" maxOccurs="unbounded"
PngLayer

BmpLayers

Elements

Name Type Description
BmpLayer

minOccurs="0" maxOccurs="unbounded"
BmpLayer

JpgLayers

Elements

Name Type Description
JpgLayer

minOccurs="0" maxOccurs="unbounded"
JpgLayer

BmpImage (complex type inherits from Video)

Elements

Name Type Description
PngLayers

minOccurs="0"
PngLayers Png layers

JpgImage (complex type inherits from Video)

Elements

Name Type Description
PngLayers

minOccurs="0"
PngLayers Png layers

PngImage (complex type inherits from Video)

Elements

Name Type Description
PngLayers

minOccurs="0"
PngLayers Png layers

Examples

See examples of XML presets that are built based on this schema, see Task Presets for MES (Media Encoder Standard).