Applies to: ✅ Azure Data Explorer
The ai_embeddings plugin embeds text using language models, enabling AI scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin uses Azure OpenAI Service embedding models and can be accessed using either a managed identity or the user's identity (impersonation).
Prerequisites
- An Azure OpenAI Service resource, with at least the Cognitive Services OpenAI User role assigned to the identity being used.
- A Callout Policy configured to allow calls to AI services.
- When using managed identity to access Azure OpenAI Service, configure the Managed Identity Policy to allow communication with the service.
Syntax
evaluate ai_embeddings(text, connectionString [, options [, IncludeErrorMessages]])
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
text | string | ✔️ | The text to embed. The value can be a column reference or a constant scalar. |
connectionString | string | ✔️ | The connection string for the language model, in the format <ModelDeploymentUri>;<AuthenticationMethod>. Replace <ModelDeploymentUri> with the AI model deployment URI and <AuthenticationMethod> with the authentication method. |
options | dynamic | | The options that control calls to the embedding model endpoint. See Options. |
IncludeErrorMessages | bool | | Indicates whether to output errors in a new column in the output table. Default value: false. |
Options
The following table describes the options that control the way the requests are made to the embedding model endpoint.
Name | Type | Description |
---|---|---|
RecordsPerRequest | int | Specifies the number of records to process per request. Default value: 1. |
CharsPerRequest | int | Specifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI measures requests in tokens, and one token corresponds to approximately four characters. |
RetriesOnThrottling | int | Specifies the number of retry attempts when throttling occurs. Default value: 0. |
GlobalTimeout | timespan | Specifies the maximum time to wait for a response from the embedding model. Default value: null. |
ModelParameters | dynamic | Parameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null. |
ReturnSuccessfulOnly | bool | Indicates whether to return only the successfully processed items. Default value: false. If the IncludeErrorMessages parameter is set to true, this option is always set to false. |
Configure Callout Policy
The azure_openai callout policy enables external calls to Azure AI services.
To configure the callout policy to authorize the AI model endpoint domain:
````kusto
.alter-merge cluster policy callout
```
[
    {
        "CalloutType": "azure_openai",
        "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.chinacloudapi\\.cn/.*",
        "CanCall": true
    }
]
```
````
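The regex above matches Azure OpenAI endpoints under openai.chinacloudapi.cn. The examples in this article use a deployment URI under openai.azure.com, which that regex doesn't match. If your Azure OpenAI resource uses that endpoint, the following is a sketch of an equivalent policy entry. It assumes the same account-name pattern and changes only the domain.

````kusto
.alter-merge cluster policy callout
```
[
    {
        "CalloutType": "azure_openai",
        "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.azure\\.com/.*",
        "CanCall": true
    }
]
```
````

Because the command merges rather than replaces, it can be run alongside an existing entry without removing it.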
Configure Managed Identity
When using managed identity to access Azure OpenAI Service, you must configure the Managed Identity policy to allow the system-assigned managed identity to authenticate to Azure OpenAI Service.
To configure the managed identity:
````kusto
.alter-merge cluster policy managed_identity
```
[
    {
        "ObjectId": "system",
        "AllowedUsages": "AzureAI"
    }
]
```
````
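After merging the policy, you can list the cluster-level managed identity policy to check that the system entry with the AzureAI usage is present. The following command is a suggested verification step, not a required part of the setup.

```kusto
.show cluster policy managed_identity
```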
Returns
Returns the following new embedding columns:
- A column with the _embedding suffix that contains the embedding values.
- If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.
Depending on the input type, the plugin returns different results:
- Column reference: Returns one or more records with additional columns prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
- Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.
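The following sketch shows these naming rules in practice. The input column is named TextData, and IncludeErrorMessages is true, so the output gains TextData_embedding and TextData_embedding_error. It assumes the embedding column holds a dynamic array of numbers, so array_length returns the vector dimension (1536 by default for text-embedding-3-small).

```kusto
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({"RetriesOnThrottling": 1});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed"
]
// The fourth argument (true) adds the TextData_embedding_error column
| evaluate ai_embeddings(TextData, connectionString, options, true)
// Count the vector elements and surface any per-row error string
| project TextData, Dimensions = array_length(TextData_embedding), TextData_embedding_error
```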
Examples
The following example embeds the text "Embed this text using AI" using the Azure OpenAI embedding model.

```kusto
let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embeddings(expression, connectionString)
```
The following example embeds multiple texts using the Azure OpenAI Embedding model.
```kusto
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "RecordsPerRequest": 10,
    "CharsPerRequest": 10000,
    "RetriesOnThrottling": 1,
    "GlobalTimeout": 2m
});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embeddings(TextData, connectionString, options, true)
```
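For the semantic search scenario mentioned at the start of this article, embeddings can be compared with the KQL function series_cosine_similarity(). The following is a minimal sketch that assumes the embedding columns are dynamic numeric arrays. The search phrase is a hypothetical placeholder.

```kusto
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
// Embed the search phrase once; a constant scalar input produces an unprefixed _embedding column
let searchEmbedding = toscalar(
    evaluate ai_embeddings('second text', connectionString)
    | project _embedding
);
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embeddings(TextData, connectionString)
// Score each row against the search phrase; 1.0 means the vectors point in the same direction
| extend Similarity = series_cosine_similarity(TextData_embedding, searchEmbedding)
| top 2 by Similarity desc
```

Each evaluation calls the embedding endpoint again, so this pattern counts against the same throttling limits described under Best practices.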
Best practices
Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.
To efficiently use the ai_embeddings plugin while minimizing throttling and costs, follow these best practices:
- Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
- Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn't lost on successful calls up to that point.
- Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).
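Measuring the input before you embed it can help you choose values for RecordsPerRequest and CharsPerRequest. The following sketch applies the rough four-characters-per-token rule from the Options table. The sample rows are only illustrative.

```kusto
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| extend Chars = strlen(TextData)
| summarize Rows = count(), TotalChars = sum(Chars), MaxChars = max(Chars)
// Rough token estimate: about four characters per token
| extend EstimatedTokens = TotalChars / 4
```

For example, a CharsPerRequest value of 10000, as in the second example above, corresponds to roughly 2,500 tokens per request.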