Azure AI Vision multimodal embeddings skill
Important
This skill is in public preview under Supplemental Terms of Use. The 2024-05-01-Preview REST API supports this feature.
The Azure AI Vision multimodal embeddings skill uses Azure AI Vision's multimodal embeddings API to generate embeddings for image or text input.
The skill is only supported in search services located in a region that supports the Azure AI Vision multimodal embeddings API. Currently, these regions are China East, France Central, Korea Central, and China North.
Note
This skill is bound to Azure AI services and requires a billable resource for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing Azure AI services pay-as-you-go price.
In addition, image extraction is billable by Azure AI Search.
@odata.type
Microsoft.Skills.Vision.VectorizeSkill
Data limits
Consider using the Text Split skill if you need data chunking for text inputs.
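As a minimal sketch of that pairing, the Python snippet below builds a two-skill list: a Text Split skill that chunks /document/content into pages, followed by this skill reading each chunk from /document/pages/*. The maximumPageLength value and the pages target name are illustrative assumptions, not recommended settings. Check the current input limits before choosing a chunk size.

```python
import json

# Assumption: text is chunked first, then each chunk is vectorized.
split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "context": "/document",
    "textSplitMode": "pages",
    "maximumPageLength": 300,  # illustrative value only
    "inputs": [{"name": "text", "source": "/document/content"}],
    # "pages" is an assumed target name; it determines the /document/pages/* path.
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

vectorize_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
    "context": "/document/pages/*",  # run once per chunk
    "modelVersion": "2023-04-15",
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "vector"}],
}

skills = [split_skill, vectorize_skill]
print(json.dumps(skills, indent=2))
```

Setting the vectorize skill's context to /document/pages/* is what makes it produce one vector per chunk rather than one per document.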
Skill parameters
Parameters are case-sensitive.
Parameter | Description |
---|---|
modelVersion | (Required) The model version to be passed to the Azure AI Vision multimodal embeddings API for generating embeddings. It's important that all embeddings stored in a given index field are generated using the same modelVersion. |
Skill inputs
Input | Description |
---|---|
text | The input text to be vectorized. If you're using data chunking, the source might be /document/pages/*. |
image | Complex Type. Currently only works with the "/document/normalized_images" field, produced by the Azure blob indexer when imageAction is set to a value other than none. |
url | The URL to download the image to be vectorized. |
queryString | The query string of the URL to download the image to be vectorized. Useful if you store the URL and SAS token in separate paths. |
Only one of text, image, or url/queryString can be configured for a single instance of the skill. If you want to vectorize both images and text within the same skillset, include two instances of this skill in the skillset definition, one for each input type you would like to use.
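The one-input-type rule can be checked before a skillset is submitted. The following Python sketch uses a hypothetical helper, input_mode, that inspects only the input names of a skill definition. Treating queryString as valid only alongside url is an assumption drawn from how the two inputs are grouped above.

```python
from typing import Any

# Input names taken from the skill inputs table.
TEXT, IMAGE, URL, QUERY_STRING = "text", "image", "url", "queryString"


def input_mode(skill: dict[str, Any]) -> str:
    """Return "text", "image", or "url" for a skill definition.

    Hypothetical helper: raises ValueError if the inputs mix types.
    """
    names = {entry["name"] for entry in skill.get("inputs", [])}
    unknown = names - {TEXT, IMAGE, URL, QUERY_STRING}
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    # Assumption: queryString only makes sense together with url.
    if QUERY_STRING in names and URL not in names:
        raise ValueError("queryString requires a url input")
    modes = [mode for mode in (TEXT, IMAGE, URL) if mode in names]
    if len(modes) != 1:
        raise ValueError(f"expected exactly one input type, got {modes}")
    return modes[0]


text_skill = {"inputs": [{"name": "text", "source": "/document/content"}]}
print(input_mode(text_skill))
```

A skillset that needs both text and image vectors would pass two separate definitions through this check, one per instance of the skill.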
Skill outputs
Output | Description |
---|---|
vector | Output embedding array of floats for the input text or image. |
Sample definition
For text input, consider a record that has the following fields:
{
    "content": "Microsoft released Windows 10."
}
Then your skill definition might look like this:
{
    "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
    "context": "/document",
    "modelVersion": "2023-04-15",
    "inputs": [
        {
            "name": "text",
            "source": "/document/content"
        }
    ],
    "outputs": [
        {
            "name": "vector"
        }
    ]
}
For image input, your skill definition might look like this:
{
    "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
    "context": "/document/normalized_images/*",
    "modelVersion": "2023-04-15",
    "inputs": [
        {
            "name": "image",
            "source": "/document/normalized_images/*"
        }
    ],
    "outputs": [
        {
            "name": "vector"
        }
    ]
}
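The image input depends on /document/normalized_images existing, which requires the indexer to extract images. A minimal sketch of the relevant indexer fragment, written as a Python dict and serialized to JSON, might look like the following. The indexer, data source, index, and skillset names are hypothetical, and generateNormalizedImages is used here as one imageAction value other than none.

```python
import json

# All names below are placeholders for illustration.
indexer = {
    "name": "my-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-index",
    "skillsetName": "my-skillset",
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            # Any value other than "none" makes the blob indexer
            # populate /document/normalized_images.
            "imageAction": "generateNormalizedImages",
        }
    },
}

print(json.dumps(indexer, indent=2))
```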
If you want to vectorize images directly from your blob storage data source, your skill definition might look like this:
{
    "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
    "context": "/document",
    "modelVersion": "2023-04-15",
    "inputs": [
        {
            "name": "url",
            "source": "/document/metadata_storage_path"
        },
        {
            "name": "queryString",
            "source": "/document/metadata_storage_sas_token"
        }
    ],
    "outputs": [
        {
            "name": "vector"
        }
    ]
}
Sample output
For the given input text, a vectorized embedding output is produced.
{
    "vector": [
        0.018990106880664825,
        -0.0073809814639389515,
        ....
        0.021276434883475304
    ]
}
The output resides in memory. To send this output to a field in the search index, you must define an outputFieldMapping that maps the vectorized embedding output (which is an array) to a vector field. Assuming the skill output resides in the document's vector node, and content_vector is the field in the search index, the outputFieldMappings in the indexer should look like this:
"outputFieldMappings": [
    {
        "sourceFieldName": "/document/vector/*",
        "targetFieldName": "content_vector"
    }
]
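The target of this mapping has to be a vector field whose dimensions match the length of the embedding array. As a hedged sketch, the hypothetical helper below derives the dimensions from a sample vector rather than hard-coding a number. The profile name is a placeholder, and the three-value sample is shortened for illustration, so a real embedding would yield a much larger dimensions value.

```python
import json
from typing import Any


def vector_field(name: str, sample_vector: list[float], profile: str) -> dict[str, Any]:
    """Build an index field definition sized to match a sample embedding."""
    if not sample_vector:
        raise ValueError("sample_vector must not be empty")
    return {
        "name": name,
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": len(sample_vector),
        "vectorSearchProfile": profile,  # assumed to be defined in the index
    }


# Shortened sample; a real skill output contains many more values.
sample = [0.018990106880664825, -0.0073809814639389515, 0.021276434883475304]
print(json.dumps(vector_field("content_vector", sample, "my-vector-profile"), indent=2))
```

Deriving the size from an actual skill output avoids a mismatch if the model version, and with it the vector length, changes.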
For mapping image embeddings to the index, you'll need to use the Index Projections feature. The payload for indexProjections might look something like this:
"indexProjections": {
    "selectors": [
        {
            "targetIndexName": "myTargetIndex",
            "parentKeyFieldName": "ParentKey",
            "sourceContext": "/document/normalized_images/*",
            "mappings": [
                {
                    "name": "content_vector",
                    "source": "/document/normalized_images/*/vector"
                }
            ]
        }
    ]
}