Applies to: ✅ Azure Data Explorer
The ai_embeddings plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin uses the Azure OpenAI Service embedding models and can be accessed using either a managed identity or the user's identity (impersonation).
Prerequisites
- An Azure OpenAI Service resource configured with at least the Cognitive Services OpenAI User role assigned to the identity being used.
- A Callout Policy configured to allow calls to AI services.
- When using managed identity to access Azure OpenAI Service, configure the Managed Identity Policy to allow communication with the service.
Syntax
```
evaluate ai_embeddings (text, connectionString [, options [, IncludeErrorMessages]])
```
Learn more about syntax conventions.
Parameters
| Name | Type | Required | Description | 
|---|---|---|---|
| text | string | ✔️ | The text to embed. The value can be a column reference or a constant scalar. | 
| connectionString | string | ✔️ | The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod>; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method, respectively. See the example connection string after this table. | 
| options | dynamic | | The options that control calls to the embedding model endpoint. See Options. | 
| IncludeErrorMessages | bool | | Indicates whether to output errors in a new column in the output table. Default value: false. | 
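For example, the following connection string follows that format. The deployment URI points to a hypothetical text-embedding-3-small deployment, and the authentication method is the system-assigned managed identity used in the examples later in this article:

```
// Hypothetical deployment URI; replace it with your own Azure OpenAI embeddings deployment.
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
```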
Options
The following table describes the options that control the way the requests are made to the embedding model endpoint.
| Name | Type | Description | 
|---|---|---|
| RecordsPerRequest | int | Specifies the number of records to process per request. Default value: 1. | 
| CharsPerRequest | int | Specifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters. | 
| RetriesOnThrottling | int | Specifies the number of retry attempts when throttling occurs. Default value: 0. | 
| GlobalTimeout | timespan | Specifies the maximum time to wait for a response from the embedding model. Default value: null. | 
| ModelParameters | dynamic | Parameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null. See the example after this table. | 
| ReturnSuccessfulOnly | bool | Indicates whether to return only the successfully processed items. Default value: false. If the IncludeErrorMessages parameter is set to true, this option is always set to false. | 
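As referenced in the table, the following sketch shows one way model-specific settings might be passed through ModelParameters. The dimensions and user fields are assumptions based on the Azure OpenAI embeddings API; whether they apply depends on the deployed model (for example, text-embedding-3 models accept dimensions):

```
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
// The "dimensions" value (a reduced embedding size) and the "user" value (an identifier for
// monitoring) are assumptions; support depends on the deployed embedding model.
let options = dynamic({
    "ModelParameters": { "dimensions": 256, "user": "kusto-embeddings-demo" }
});
evaluate ai_embeddings('Embed this text using AI', connectionString, options)
```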
Configure Callout Policy
The azure_openai callout policy enables external calls to Azure AI services.
To configure the callout policy to authorize the AI model endpoint domain:
````
.alter-merge cluster policy callout
```
[
  {
    "CalloutType": "azure_openai",
    "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.chinacloudapi\\.cn/.*",
    "CanCall": true
  }
]
```
````
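Optionally, verify the result with the corresponding .show management command, which lists the cluster's callout policy:

```
.show cluster policy callout
```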
Configure Managed Identity
When using managed identity to access Azure OpenAI Service, you must configure the Managed Identity policy to allow the system-assigned managed identity to authenticate to Azure OpenAI Service.
To configure the managed identity:
````
.alter-merge cluster policy managed_identity
```
[
  {
    "ObjectId": "system",
    "AllowedUsages": "AzureAI"
  }
]
```
````
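Optionally, verify the assignment with the corresponding .show management command:

```
.show cluster policy managed_identity
```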
Returns
Returns the following new embedding columns:
- A column with the _embedding suffix that contains the embedding values
- If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.
Depending on the input type, the plugin returns different results:
- Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
- Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.
Examples
The following example embeds the text "Embed this text using AI" using the Azure OpenAI Embedding model.
```
let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embeddings(expression, connectionString)
```
The following example embeds multiple texts using the Azure OpenAI Embedding model.
```
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
  "RecordsPerRequest": 10,
  "CharsPerRequest": 10000,
  "RetriesOnThrottling": 1,
  "GlobalTimeout": 2m
});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embeddings(TextData, connectionString, options, true)
```
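To tie this back to the semantic-search scenario mentioned in the introduction, the following sketch ranks previously stored document embeddings against a freshly embedded search phrase by using the built-in series_cosine_similarity() function. The Articles table and its Title and Embedding columns are hypothetical placeholders for content you embedded and stored earlier; the join on a constant key is just a compact way to pair the single query-embedding row with every document row.

```
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
// Embed the search phrase; a constant scalar input yields one row with an _embedding column.
evaluate ai_embeddings('How do I configure a callout policy?', connectionString)
| project QueryEmbedding = _embedding, JoinKey = 1
// Articles is a hypothetical table whose Embedding column holds previously computed embeddings.
| join kind=inner (Articles | extend JoinKey = 1) on JoinKey
| extend Similarity = series_cosine_similarity(Embedding, QueryEmbedding)
| top 5 by Similarity desc
| project Title, Similarity
```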
Best practices
Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.
To efficiently use the ai_embeddings plugin while minimizing throttling and costs, follow these best practices:
- Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
- Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn't lost on successful calls up to that point.
- Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).