Applies to: ✅ Azure Data Explorer
The ai_embeddings plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin uses the Azure OpenAI Service embedding models and can be accessed using either a managed identity or the user's identity (impersonation).
Prerequisites
- An Azure OpenAI Service configured with at least the Cognitive Services OpenAI User role assigned to the identity being used.
- A Callout Policy configured to allow calls to AI services.
- When using managed identity to access Azure OpenAI Service, configure the Managed Identity Policy to allow communication with the service.
Syntax
evaluate ai_embeddings (text, connectionString [, options [, IncludeErrorMessages]])
Learn more about syntax conventions.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| text | string | ✔️ | The text to embed. The value can be a column reference or a constant scalar. |
| connectionString | string | ✔️ | The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod>; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method respectively. |
| options | dynamic | | The options that control calls to the embedding model endpoint. See Options. |
| IncludeErrorMessages | bool | | Indicates whether to output errors in a new column in the output table. Default value: false. |
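As a rough illustration of the connection string format, the following sketch joins a deployment URI and an authentication method with a semicolon. The URI and the `managed_identity=system` method are the placeholder values used in the examples on this page; the variable names are only illustrative.

```kusto
// Illustrative variable names; the URI is a placeholder deployment endpoint.
let modelDeploymentUri = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01';
let authenticationMethod = 'managed_identity=system';
// Format: <ModelDeploymentUri>;<AuthenticationMethod>
print connectionString = strcat(modelDeploymentUri, ';', authenticationMethod)
```

The printed value has the same shape as the connection strings passed to the plugin in the Examples section.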
Options
The following table describes the options that control the way the requests are made to the embedding model endpoint.
| Name | Type | Description |
|---|---|---|
| RecordsPerRequest | int | Specifies the number of records to process per request. Default value: 1. |
| CharsPerRequest | int | Specifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters. |
| RetriesOnThrottling | int | Specifies the number of retry attempts when throttling occurs. Default value: 0. |
| GlobalTimeout | timespan | Specifies the maximum time to wait for a response from the embedding model. Default value: null. |
| ModelParameters | dynamic | Parameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null. |
| ReturnSuccessfulOnly | bool | Indicates whether to return only the successfully processed items. Default value: false. If the IncludeErrorMessages parameter is set to true, this option is always set to false. |
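When choosing a value for CharsPerRequest, it can help to estimate how many characters and tokens the input contains. The following sketch applies the approximate four-characters-per-token ratio mentioned above to a small sample table; the column names Chars and ApproxTokens are illustrative, and the ratio is only a rule of thumb, not an exact tokenizer count.

```kusto
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
// Character count per record.
| extend Chars = strlen(TextData)
// Rough token estimate, assuming about four characters per token.
| extend ApproxTokens = Chars / 4.0
| summarize TotalChars = sum(Chars), ApproxTotalTokens = sum(ApproxTokens)
```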
Configure Callout Policy
The azure_openai callout policy enables external calls to Azure AI services.
To configure the callout policy to authorize the AI model endpoint domain:
.alter-merge cluster policy callout
```
[
{
"CalloutType": "azure_openai",
"CalloutUriRegex": "https://[A-Za-z0-9-]{3,63}\\.(?:openai\\.chinacloudapi\\.cn|cognitiveservices\\.chinacloudapi\\.cn|cognitive\\.microsoft\\.com|services\\.ai\\.chinacloudapi\\.cn)(?:/.*)?",
"CanCall": true
}
]
```
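Before running the plugin, it may be worth checking that the model deployment URI is covered by the pattern in CalloutUriRegex. The following sketch tests a hypothetical endpoint against the pattern with the `matches regex` operator. The endpoint URI is an assumption made for illustration. Because `matches regex` looks for a match anywhere in the string, and this page doesn't state how the policy anchors the pattern, treat the result as an approximate check only.

```kusto
// Hypothetical endpoint URI; replace it with your own model deployment URI.
let endpoint = 'https://myaccount.openai.chinacloudapi.cn/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01';
// Same pattern as CalloutUriRegex, with the JSON escaping (\\) reduced to regex escaping (\).
let calloutUriRegex = @'https://[A-Za-z0-9-]{3,63}\.(?:openai\.chinacloudapi\.cn|cognitiveservices\.chinacloudapi\.cn|cognitive\.microsoft\.com|services\.ai\.chinacloudapi\.cn)(?:/.*)?';
print Endpoint = endpoint, MatchesPolicy = endpoint matches regex calloutUriRegex
```

If MatchesPolicy is false, the domain of the endpoint isn't one of those listed in the pattern, and the pattern would need to be adjusted for that endpoint.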
Configure Managed Identity
When using managed identity to access Azure OpenAI Service, you must configure the Managed Identity policy to allow the system-assigned managed identity to authenticate to Azure OpenAI Service.
To configure the managed identity:
.alter-merge cluster policy managed_identity
```
[
{
"ObjectId": "system",
"AllowedUsages": "AzureAI"
}
]
```
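To confirm that the policy was applied, one option is to display the current managed identity policy. This is a minimal sketch; it assumes the caller has sufficient permissions on the cluster to view policies.

```kusto
// Shows the managed identity policy currently set at the cluster level.
.show cluster policy managed_identity
```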
Returns
Returns the following new embedding columns:
- A column with the _embedding suffix that contains the embedding values.
- If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.
Depending on the input type, the plugin returns different results:
- Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
- Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.
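The following sketch shows one way the returned columns might be consumed when the input is a column reference. It assumes that the error column is an empty string for successful records and that the embedding column holds a dynamic array of numbers; neither detail is spelled out above, so verify both against your own output. The Succeeded and Dimensions column names are illustrative, and the connection string is a placeholder.

```kusto
// Placeholder connection string; replace it with your own deployment URI and authentication method.
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed"
]
| evaluate ai_embeddings(TextData, connectionString, dynamic({"RecordsPerRequest": 10}), true)
// Assumption: the error column is empty when the record was embedded successfully.
| extend Succeeded = isempty(TextData_embedding_error)
// Assumption: the embedding column is a dynamic array, so its length is the vector dimension.
| extend Dimensions = array_length(TextData_embedding)
| project TextData, Succeeded, Dimensions
```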
Examples
The following example embeds the text "Embed this text using AI" using the Azure OpenAI Embedding model.
let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embeddings(expression, connectionString)
The following example embeds multiple texts using the Azure OpenAI Embedding model.
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "RecordsPerRequest": 10,
    "CharsPerRequest": 10000,
    "RetriesOnThrottling": 1,
    "GlobalTimeout": 2m
});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embeddings(TextData, connectionString, options, true)
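Once embeddings are stored, a semantic search scenario typically ranks records by vector similarity to a query embedding. The following sketch is not part of the plugin; it uses the `series_cosine_similarity()` function on tiny hand-written vectors that stand in for real embeddings, so it can run without calling the model. The three-element vectors, the queryEmbedding name, and the Similarity column are all illustrative assumptions.

```kusto
// Toy vector standing in for the embedding of a search query.
let queryEmbedding = dynamic([1.0, 0.0, 0.0]);
// Toy vectors standing in for previously computed TextData_embedding values.
datatable(TextData: string, TextData_embedding: dynamic)
[
    "First text to embed", dynamic([1.0, 0.0, 0.0]),
    "Second text to embed", dynamic([0.0, 1.0, 0.0]),
    "Third text to embed", dynamic([0.7, 0.7, 0.0])
]
// Cosine similarity between each stored vector and the query vector.
| extend Similarity = series_cosine_similarity(TextData_embedding, queryEmbedding)
// Keep the two closest records.
| top 2 by Similarity desc
```

In a real query, both the stored vectors and the query vector would come from the same embedding model deployment so that the similarity scores are comparable.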
Best practices
Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.
To efficiently use the ai_embeddings plugin while minimizing throttling and costs, follow these best practices:
- Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
- Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn't lost on successful calls up to that point.
- Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).