Azure OpenAI Embedding skill

Important

This feature is in public preview under Supplemental Terms of Use. The 2023-10-01-Preview REST API supports this feature.

The Azure OpenAI Embedding skill connects to a deployed embedding model on your Azure OpenAI resource to generate embeddings.

The Import and vectorize data wizard uses the Azure OpenAI Embedding skill to vectorize content. You can run the wizard and review the generated skillset to see how the wizard builds it.

Note

This skill is bound to Azure OpenAI and is charged at the existing Azure OpenAI Standard Pay-in-Advance Offer price.

@odata.type

Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill

Data limits

The maximum size of a text input is 8,000 tokens. If the input exceeds this limit, the model returns an invalid request error. For more information, see the tokens key concept in the Azure OpenAI documentation. If you need to chunk your data, consider using the Text Split skill.
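Inside a skillset, the Text Split skill is the supported way to chunk content. To illustrate the idea outside the pipeline, here is a minimal Python sketch that greedily packs words into chunks that stay under a token budget. It assumes a rough estimate of 4 characters per token, which is a heuristic and not the model's tokenizer, so the real count can differ. The helper names are hypothetical.

```python
import math

# Heuristic only: roughly 4 characters per token for English text.
# The service measures the limit with the model's tokenizer, not this estimate.
CHARS_PER_TOKEN = 4
MAX_TOKENS = 8000


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for text (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_for_embedding(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack whitespace-separated words into chunks under max_tokens.

    A single word longer than the budget still becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > max_tokens:
            # Adding this word would exceed the budget: close the current chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `split_for_embedding("Microsoft released Windows 10.")` returns the text as a single chunk, because it is far below the budget.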

Skill parameters

Parameters are case-sensitive.

Parameter Description
resourceUri The URI of a model provider, such as an Azure OpenAI resource or an OpenAI URL.
apiKey The secret key used to access the model. If you provide a key, leave authIdentity empty. If you set both the apiKey and authIdentity, the apiKey is used on the connection.
deploymentId The name of the deployed Azure OpenAI embedding model. The model should be an embedding model, such as text-embedding-ada-002. See the List of Azure OpenAI models for supported models.
authIdentity A user-managed identity used by the search service to connect to Azure OpenAI. You can use either a system-managed or a user-managed identity. To use a system-managed identity, leave apiKey and authIdentity blank; the system-managed identity is then used automatically. The managed identity must have the Cognitive Services OpenAI User role to send text to Azure OpenAI.
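The credential rules in the apiKey and authIdentity rows reduce to a simple precedence order. The following hypothetical Python helper summarizes that documented precedence; it is not service code, and for simplicity it represents authIdentity as a plain string rather than the identity object the REST API expects.

```python
from typing import Optional


def resolve_auth(api_key: Optional[str], auth_identity: Optional[str]) -> str:
    """Describe which credential the skill would use (hypothetical helper).

    Mirrors the documented rules:
    - apiKey set: the key is used, even if authIdentity is also set.
    - only authIdentity set: that user-managed identity is used.
    - neither set: the search service's system-managed identity is used.
    """
    if api_key:
        return "apiKey"
    if auth_identity:
        return "userManagedIdentity"
    return "systemManagedIdentity"
```

Setting both values is therefore never ambiguous, but it is redundant: the key always wins.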

Skill inputs

Input Description
text The input text to be vectorized. If you're using data chunking, the source might be /document/pages/*.
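When the text comes from chunking, the embedding skill typically runs once per chunk. The following Python sketch builds a two-skill list as plain dictionaries: a Text Split skill that writes chunks to /document/pages, and this skill with its context set to /document/pages/* so that each page gets its own embedding. The resource URI, deployment name, page length, and the "vector" target name are placeholders chosen for illustration.

```python
import json

# Placeholder values; substitute your own resource and deployment.
RESOURCE_URI = "https://my-demo-openai-chinaeast.openai.azure.com/"
DEPLOYMENT_ID = "my-text-embedding-ada-002-model"

split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "context": "/document",
    "textSplitMode": "pages",
    "maximumPageLength": 2000,  # illustrative chunk size, in characters
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    # Run once per chunk produced by the split skill.
    "context": "/document/pages/*",
    "resourceUri": RESOURCE_URI,
    "deploymentId": DEPLOYMENT_ID,
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    # Each chunk's vector is written to /document/pages/*/vector.
    "outputs": [{"name": "embedding", "targetName": "vector"}],
}

skills = [split_skill, embedding_skill]
print(json.dumps(skills, indent=2))
```

Setting the context to the chunk path keeps each vector next to the text it was computed from, which is what you usually want when indexing chunks.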

Skill outputs

Output Description
embedding Vectorized embedding for the input text.

Sample definition

Consider a record that has the following fields:

{
    "content": "Microsoft released Windows 10."
}

Then your skill definition might look like this:

{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "description": "Connects to a deployed embedding model.",
  "resourceUri": "https://my-demo-openai-chinaeast.openai.azure.com/",
  "deploymentId": "my-text-embedding-ada-002-model",
  "inputs": [
    {
      "name": "text",
      "source": "/document/content"
    }
  ],
  "outputs": [
    {
      "name": "embedding"
    }
  ]
}
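A skill definition like this one is deployed as part of a skillset. As a minimal sketch, the following Python code uses only the standard library to build, but not send, a Create or Update Skillset REST request (PUT on /skillsets/{name}) with the 2023-10-01-Preview API version. The search service name, skillset name, and admin key are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical names; replace with your own search service, skillset, and admin key.
SEARCH_SERVICE = "my-search-service"
SKILLSET_NAME = "my-embedding-skillset"
ADMIN_KEY = "<admin-api-key>"
API_VERSION = "2023-10-01-Preview"

skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "description": "Connects to a deployed embedding model.",
    "resourceUri": "https://my-demo-openai-chinaeast.openai.azure.com/",
    "deploymentId": "my-text-embedding-ada-002-model",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "embedding"}],
}


def build_skillset_request(skills: list) -> urllib.request.Request:
    """Build (but do not send) a Create or Update Skillset request."""
    url = (
        f"https://{SEARCH_SERVICE}.search.windows.net/skillsets/"
        f"{SKILLSET_NAME}?api-version={API_VERSION}"
    )
    body = json.dumps({"name": SKILLSET_NAME, "skills": skills}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": ADMIN_KEY},
    )


request = build_skillset_request([skill])
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before you touch a live service.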

Sample output

For the given input text, a vectorized embedding output is produced.

{
  "embedding": [
    0.018990106880664825,
    -0.0073809814639389515,
    ...
    0.021276434883475304
  ]
}

The output resides in memory. To send this output to a field in the search index, define an output field mapping in the indexer that maps the vectorized embedding output (an array) to a vector field. Assuming the skill output resides in the document's embedding node and content_vector is the vector field in the search index, the outputFieldMappings section of the indexer should look like this:

  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/embedding/*",
      "targetFieldName": "content_vector"
    }
  ]
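Because the mapping copies the whole array into content_vector, the vector field's dimensions must equal the length of the embedding the model returns; text-embedding-ada-002 returns 1,536 values. The Python sketch below is a hypothetical pre-check that compares a sample embedding with a field definition. The field dictionary is a simplified illustration and omits the vector search settings a real index field also needs.

```python
# Simplified vector field definition (hypothetical; a real index field also
# needs vector search settings, which are omitted here).
content_vector_field = {
    "name": "content_vector",
    "type": "Collection(Edm.Single)",
    "dimensions": 1536,  # text-embedding-ada-002 produces 1,536-dimensional vectors
}


def embedding_fits_field(embedding: list[float], field: dict) -> bool:
    """Return True if the embedding array can be mapped into the vector field."""
    return (
        field.get("type") == "Collection(Edm.Single)"
        and len(embedding) == field.get("dimensions")
    )


sample_embedding = [0.0] * 1536  # stand-in for a real skill output
print(embedding_fits_field(sample_embedding, content_vector_field))  # True
```

A dimension mismatch typically surfaces only when the indexer runs, so checking it against one sample output up front can save a failed indexing pass.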

Errors and warnings

Condition Result
Null or invalid URI Error
Null or invalid deploymentId Error
Text is empty Warning
Text is larger than 8,000 tokens Error
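You can catch most of these conditions on the client before running an indexer. The Python sketch below is a hypothetical pre-flight validator that mirrors this table; it is not part of the service. Two assumptions apply: any value other than an absolute https URL counts as an invalid URI, and the token count uses a rough 4-characters-per-token heuristic, so the service can still reject input that passes this check.

```python
import math
from typing import List, Optional, Tuple
from urllib.parse import urlparse

MAX_TOKENS = 8000


def estimate_tokens(text: str) -> int:
    # Heuristic: about 4 characters per token; not the model's tokenizer.
    return math.ceil(len(text) / 4)


def preflight(
    resource_uri: Optional[str],
    deployment_id: Optional[str],
    text: Optional[str],
) -> List[Tuple[str, str]]:
    """Return (result, condition) pairs mirroring the errors and warnings table."""
    issues: List[Tuple[str, str]] = []
    # Assumption: only absolute https URLs are treated as valid.
    parsed = urlparse(resource_uri or "")
    if parsed.scheme != "https" or not parsed.netloc:
        issues.append(("Error", "Null or invalid URI"))
    if not deployment_id:
        issues.append(("Error", "Null or invalid deploymentId"))
    if not text:
        issues.append(("Warning", "Text is empty"))
    elif estimate_tokens(text) > MAX_TOKENS:
        issues.append(("Error", "Text is larger than 8,000 tokens"))
    return issues
```

For example, `preflight("https://my-demo-openai-chinaeast.openai.azure.com/", "my-text-embedding-ada-002-model", "")` returns `[("Warning", "Text is empty")]`.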

See also