Tutorial: Choose embedding and chat models for RAG in Azure AI Search
A RAG solution built on Azure AI Search takes a dependency on embedding models for vectorization, and on chat models for conversational search over your data.
In this tutorial, you:
- Learn which models in the Azure cloud work with built-in integration
- Learn about the Azure models used for chat
- Deploy models and collect model information for your code
- Configure search engine access to Azure models
- Learn about custom skills and vectorizers for attaching non-Azure models
If you don't have an Azure subscription, create a trial subscription before you begin.
Prerequisites
The Azure portal, used to deploy models and configure role assignments in the Azure cloud.
An Owner role on your Azure subscription, necessary for creating role assignments. Your model provider has additional role requirements for deploying and accessing models, noted in the following steps.
A model provider, such as Azure OpenAI, Azure AI Vision via an Azure AI multi-service account, or Azure AI Studio.
We use Azure OpenAI in this tutorial. Other providers are listed so that you know your options for integrated vectorization.
Azure AI Search, Basic tier or higher. The Basic tier and higher provide a managed identity used in role assignments.
A shared region. To complete all of the tutorials in this series, the region must support both Azure AI Search and the model provider. Check the supported regions for Azure AI Search and for your model provider.
Azure AI Search currently has limited availability in some regions, such as China North and China North 2/3. To confirm region status, check the Azure AI Search region list.
Review models supporting built-in vectorization
Vectorized content improves the query results in a RAG solution. Azure AI Search supports a built-in vectorization action in an indexing pipeline. It also supports vectorization at query time, converting text or image inputs into embeddings for a vector search. In this step, identify an embedding model that works for your content and queries. If you're providing raw vector data and raw vector queries, or if your RAG solution doesn't include vector data, skip this step.
Vector queries that include a text-to-vector conversion step must use the same embedding model that was used during indexing. The search engine doesn't throw an error if you use different models, but you get poor results.
To meet the same-model requirement, choose embedding models that can be referenced through skills during indexing and through vectorizers during query execution. The following table lists the skill and vectorizer pairs. To see how the embedding models are used, skip ahead to Create an indexing pipeline for code that calls an embedding skill and a matching vectorizer.
Azure AI Search provides skill and vectorizer support for the following embedding models in the Azure cloud.
Client | Embedding models | Skill | Vectorizer |
---|---|---|---|
Azure OpenAI | text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small | AzureOpenAIEmbedding | AzureOpenAIEmbedding |
Azure AI Vision | multimodal 4.0 1 | AzureAIVision | AzureAIVision |
1 Supports image and text vectorization.
You can use other models besides those listed here. For more information, see Use non-Azure models for embeddings in this article.
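To make the same-model requirement concrete, here's a minimal sketch of a matched skill and vectorizer pair as REST-style payload fragments, expressed as Python dicts. The resource URI, deployment name, and source path are illustrative placeholders, not values from this tutorial's sample:

```python
# Sketch of a matched skill/vectorizer pair as REST-style payload fragments.
# The resource URI, deployment name, and source path are placeholders.
embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "resourceUri": "https://MY-FAKE-ACCOUNT.openai.azure.com",
    "deploymentId": "text-embedding-3-large",
    "modelName": "text-embedding-3-large",
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "embedding"}],
}

vectorizer = {
    "name": "my-vectorizer",
    "kind": "azureOpenAI",
    "azureOpenAIParameters": {
        "resourceUri": "https://MY-FAKE-ACCOUNT.openai.azure.com",
        "deploymentId": "text-embedding-3-large",
        "modelName": "text-embedding-3-large",
    },
}

# Same-model requirement: the query-time vectorizer references the
# same model that the skill used during indexing.
assert embedding_skill["modelName"] == vectorizer["azureOpenAIParameters"]["modelName"]
```

The point of the pairing is that both definitions name the same model, so text-to-vector conversion at query time matches the embeddings stored in the index.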
Note
Inputs to an embedding model are typically chunked data. In an Azure AI Search RAG pattern, chunking is handled in the indexer pipeline, covered in another tutorial in this series.
Review models used for generative AI at query time
Azure AI Search doesn't have integration code for chat models, so you should choose an LLM that you're familiar with and that meets your requirements. You can modify query code to try different models without having to rebuild an index or rerun any part of the indexing pipeline. Review Search and generate answers for code that calls the chat model.
The following models are commonly used for a chat search experience:
Client | Chat models |
---|---|
Azure OpenAI | GPT-35-Turbo, GPT-4, GPT-4o, GPT-4 Turbo |
GPT-35-Turbo and GPT-4 models are optimized to work with inputs formatted as a conversation.
We use GPT-4o in this tutorial. During testing, we found that it's less likely to supplement with its own training data. For example, given the query "how much of the earth is covered by water?", GPT-35-Turbo answered using its built-in knowledge of earth to state that 71% of the earth is covered by water, even though the sample data doesn't provide that fact. In contrast, GPT-4o responded (correctly) with "I don't know".
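Regardless of which chat model you pick, grounding behavior also depends on the prompt. A common pattern is to pass retrieved search results as sources and instruct the model to answer only from them. A minimal sketch, where the prompt wording and source text are illustrative:

```python
def build_grounded_messages(query: str, sources: str) -> list[dict]:
    """Build a chat-completions message list that restricts answers to sources."""
    # The system prompt steers the model away from its own training data.
    system_prompt = (
        "Answer the query using only the sources provided below. "
        "If the sources don't contain the answer, say you don't know."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Query: {query}\nSources:\n{sources}"},
    ]

messages = build_grounded_messages(
    "how much of the earth is covered by water?",
    "(search results from your index go here)",
)
```

A message list like this is what you'd pass to the chat completions call in the query step of this series.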
Deploy models and collect information
Models must be deployed and accessible through an endpoint. Both embedding-related skills and vectorizers need the number of dimensions and the model name.
This tutorial series uses the following models and model providers:
- Text-embedding-3-large on Azure OpenAI for embeddings
- GPT-4o on Azure OpenAI for chat completion
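Since skills and vectorizers need the model name and dimension count, it helps to collect that information in one place. The dimension counts below are the default output sizes for the Azure OpenAI embedding models; the endpoint is a placeholder:

```python
# Model information needed later by skills, vectorizers, and query code.
# The endpoint is a placeholder; replace it with your Azure OpenAI resource URI.
AZURE_OPENAI_ENDPOINT = "https://MY-FAKE-ACCOUNT.openai.azure.com"

# Default output dimensions for the Azure OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

EMBEDDING_DEPLOYMENT = "text-embedding-3-large"  # deployment name chosen at deploy time
CHAT_DEPLOYMENT = "gpt-4o"

dimensions = EMBEDDING_DIMENSIONS[EMBEDDING_DEPLOYMENT]
```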
You must have Cognitive Services OpenAI Contributor or higher to deploy models in Azure OpenAI.
Go to Azure OpenAI Studio.
Select Deployments on the left menu.
Select Deploy model > Deploy base model.
Select text-embedding-3-large from the dropdown list and confirm the selection.
Specify a deployment name. We recommend "text-embedding-3-large".
Accept the defaults.
Select Deploy.
Repeat the previous steps for gpt-4o.
Make a note of the model names and the endpoint. Embedding skills and vectorizers assemble the full endpoint internally, so you only need the resource URI. For example, given `https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2024-06-01`, the endpoint you should provide in skill and vectorizer definitions is `https://MY-FAKE-ACCOUNT.openai.azure.com`.
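If you only have a full deployment URL on hand, the resource URI is just its scheme and host. A small standard-library sketch:

```python
from urllib.parse import urlparse

def resource_uri(full_endpoint: str) -> str:
    """Strip the deployment path and query string, keeping scheme and host."""
    parts = urlparse(full_endpoint)
    return f"{parts.scheme}://{parts.netloc}"

full = (
    "https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/"
    "text-embedding-3-large/embeddings?api-version=2024-06-01"
)
print(resource_uri(full))  # → https://MY-FAKE-ACCOUNT.openai.azure.com
```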
Configure search engine access to Azure models
For pipeline and query execution, this tutorial uses Microsoft Entra ID for authentication and roles for authorization.
Assign yourself and the search service identity permissions on Azure OpenAI. The code for this tutorial runs locally. Requests to Azure OpenAI originate from your system. Also, search results from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI.
Sign in to the Azure portal and find your search service.
Configure Azure AI Search to use a system-managed identity.
Find your Azure OpenAI resource.
Select Access control (IAM) on the left menu.
Select Add role assignment.
Select Cognitive Services OpenAI User.
Select Managed identity and then select Members. Find the system-managed identity for your search service in the dropdown list.
Next, select User, group, or service principal and then select Members. Search for your user account and then select it from the dropdown list.
Select Review + assign to create the role assignments.
For access to models on Azure AI Vision, assign Cognitive Services OpenAI User. For Azure AI Studio, assign Azure AI Developer.
Use non-Azure models for embeddings
The pattern for integrating any embedding model is to wrap it in a custom skill and custom vectorizer. This section provides links to reference articles. For a code example that calls a non-Azure model, see custom-embeddings demo.
Client | Embedding models | Skill | Vectorizer |
---|---|---|---|
Any | Any | custom skill | custom vectorizer |
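The custom pair follows the same matched-pair pattern: a Custom Web API skill calls your embedding endpoint during indexing, and a customWebApi vectorizer calls the same endpoint at query time. A sketch of the two definitions as REST-style fragments, where the URI is a hypothetical placeholder for your own hosted model:

```python
# Both definitions point at the same hypothetical embedding endpoint so that
# indexing and queries use the same model.
EMBEDDING_API = "https://example.com/embed"  # placeholder for your hosted model

custom_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": EMBEDDING_API,
    "httpMethod": "POST",
    "timeout": "PT30S",
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "vector"}],
}

custom_vectorizer = {
    "name": "my-custom-vectorizer",
    "kind": "customWebApi",
    "customWebApiParameters": {
        "uri": EMBEDDING_API,
        "httpMethod": "POST",
        "timeout": "PT30S",
    },
}

# Same-model requirement applies to custom models too.
assert custom_skill["uri"] == custom_vectorizer["customWebApiParameters"]["uri"]
```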