Tutorial: Choose embedding and chat models for RAG in Azure AI Search
A RAG solution built on Azure AI Search takes a dependency on embedding models for vectorization, and on chat models for conversational search over your data.
In this tutorial, you:
- Learn which models in the Azure cloud work with built-in integration
- Learn about the Azure models used for chat
- Deploy models and collect model information for your code
- Configure search engine access to Azure models
- Learn about custom skills and vectorizers for attaching non-Azure models
If you don't have an Azure subscription, create a trial subscription before you begin.
Prerequisites
The Azure portal, used to deploy models and configure role assignments in the Azure cloud.
An Owner role on your Azure subscription, necessary for creating role assignments. Your model provider has additional role requirements for deploying and accessing models, noted in the following steps.
A model provider, such as Azure OpenAI, Azure AI Vision via an Azure AI multi-service account, or Azure AI Studio.
We use Azure OpenAI in this tutorial. Other providers are listed so that you know your options for integrated vectorization.
Azure AI Search, Basic tier or higher. The Basic tier and higher provide a managed identity used in role assignments.
A shared region. To complete all of the tutorials in this series, the region must support both Azure AI Search and the model provider. Check the supported regions for Azure AI Search and for your model provider.
Azure AI Search currently has limited availability in some regions, such as China North and China North 2/3. To confirm region status, check the Azure AI Search region list.
Review models supporting built-in vectorization
Vectorized content improves the query results in a RAG solution. Azure AI Search supports a built-in vectorization action in an indexing pipeline. It also supports vectorization at query time, converting text or image inputs into embeddings for a vector search. In this step, identify an embedding model that works for your content and queries. If you're providing raw vector data and raw vector queries, or if your RAG solution doesn't include vector data, skip this step.
Vector queries that include a text-to-vector conversion step must use the same embedding model that was used during indexing. The search engine doesn't throw an error if you use different models, but you get poor results.
To meet the same-model requirement, choose embedding models that can be referenced through skills during indexing and through vectorizers during query execution. The following table lists the skill and vectorizer pairs. To see how the embedding models are used, skip ahead to Create an indexing pipeline for code that calls an embedding skill and a matching vectorizer.
Azure AI Search provides skill and vectorizer support for the following embedding models in the Azure cloud.
Client | Embedding models | Skill | Vectorizer |
---|---|---|---|
Azure OpenAI | text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small | AzureOpenAIEmbedding | AzureOpenAIEmbedding |
Azure AI Vision | multimodal 4.0 1 | AzureAIVision | AzureAIVision |
1 Supports image and text vectorization.
You can use other models besides those listed here. For more information, see Use non-Azure models for embeddings in this article.
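To make the same-model requirement concrete, here's a minimal sketch of a matched skill and vectorizer pair as REST-style payload fragments, expressed as Python dicts. The resource URI, deployment name, and source path are illustrative placeholders, not values from this tutorial's sample:

```python
# Sketch of a matched skill/vectorizer pair as REST-style payload fragments.
# The resource URI, deployment name, and source path are placeholders.
embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "resourceUri": "https://MY-FAKE-ACCOUNT.openai.azure.com",
    "deploymentId": "text-embedding-3-large",
    "modelName": "text-embedding-3-large",
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "embedding"}],
}

vectorizer = {
    "name": "my-vectorizer",
    "kind": "azureOpenAI",
    "azureOpenAIParameters": {
        "resourceUri": "https://MY-FAKE-ACCOUNT.openai.azure.com",
        "deploymentId": "text-embedding-3-large",
        "modelName": "text-embedding-3-large",
    },
}

# Same-model requirement: the query-time vectorizer references the
# same model that the skill used during indexing.
assert embedding_skill["modelName"] == vectorizer["azureOpenAIParameters"]["modelName"]
```

The point of the pairing is that both definitions name the same model, so text-to-vector conversion at query time matches the embeddings stored in the index.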
Note
Inputs to an embedding model are typically chunked data. In an Azure AI Search RAG pattern, chunking is handled in the indexer pipeline, covered in another tutorial in this series.
Review models used for generative AI at query time
Azure AI Search doesn't have integration code for chat models, so you should choose an LLM that you're familiar with and that meets your requirements. You can modify query code to try different models without having to rebuild an index or rerun any part of the indexing pipeline. Review Search and generate answers for code that calls the chat model.
The following models are commonly used for a chat search experience:
Client | Chat models |
---|---|
Azure OpenAI | GPT-35-Turbo, GPT-4, GPT-4o, GPT-4 Turbo |
GPT-35-Turbo and GPT-4 models are optimized to work with inputs formatted as a conversation.
We use GPT-4o in this tutorial. During testing, we found that it's less likely to supplement with its own training data. For example, given the query "how much of the earth is covered by water?", GPT-35-Turbo answered using its built-in knowledge of earth to state that 71% of the earth is covered by water, even though the sample data doesn't provide that fact. In contrast, GPT-4o responded (correctly) with "I don't know".
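Regardless of which chat model you pick, grounding behavior also depends on the prompt. A common pattern is to pass retrieved search results as sources and instruct the model to answer only from them. A minimal sketch, where the prompt wording and source text are illustrative:

```python
def build_grounded_messages(query: str, sources: str) -> list[dict]:
    """Build a chat-completions message list that restricts answers to sources."""
    # The system prompt steers the model away from its own training data.
    system_prompt = (
        "Answer the query using only the sources provided below. "
        "If the sources don't contain the answer, say you don't know."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Query: {query}\nSources:\n{sources}"},
    ]

messages = build_grounded_messages(
    "how much of the earth is covered by water?",
    "(search results from your index go here)",
)
```

A message list like this is what you'd pass to the chat completions call in the query step of this series.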
Deploy models and collect information
Models must be deployed and accessible through an endpoint. Both embedding-related skills and vectorizers need the number of dimensions and the model name.
This tutorial series uses the following models and model providers:
- Text-embedding-3-large on Azure OpenAI for embeddings
- GPT-4o on Azure OpenAI for chat completion
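Since skills and vectorizers need the model name and dimension count, it helps to collect that information in one place. The dimension counts below are the default output sizes for the Azure OpenAI embedding models; the endpoint is a placeholder:

```python
# Model information needed later by skills, vectorizers, and query code.
# The endpoint is a placeholder; replace it with your Azure OpenAI resource URI.
AZURE_OPENAI_ENDPOINT = "https://MY-FAKE-ACCOUNT.openai.azure.com"

# Default output dimensions for the Azure OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

EMBEDDING_DEPLOYMENT = "text-embedding-3-large"  # deployment name chosen at deploy time
CHAT_DEPLOYMENT = "gpt-4o"

dimensions = EMBEDDING_DIMENSIONS[EMBEDDING_DEPLOYMENT]
```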
You must have Cognitive Services OpenAI Contributor or higher to deploy models in Azure OpenAI.
Go to Azure OpenAI Studio.
Select Deployments on the left menu.
Select Deploy model > Deploy base model.
Select text-embedding-3-large from the dropdown list and confirm the selection.
Specify a deployment name. We recommend "text-embedding-3-large".
Accept the defaults.
Select Deploy.
Repeat the previous steps for gpt-4o.
Make a note of the model names and the endpoint. Embedding skills and vectorizers assemble the full endpoint internally, so you only need the resource URI. For example, given `https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2024-06-01`, the endpoint you should provide in skill and vectorizer definitions is `https://MY-FAKE-ACCOUNT.openai.azure.com`.
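If you only have a full deployment URL on hand, the resource URI is just its scheme and host. A small standard-library sketch:

```python
from urllib.parse import urlparse

def resource_uri(full_endpoint: str) -> str:
    """Strip the deployment path and query string, keeping scheme and host."""
    parts = urlparse(full_endpoint)
    return f"{parts.scheme}://{parts.netloc}"

full = (
    "https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/"
    "text-embedding-3-large/embeddings?api-version=2024-06-01"
)
print(resource_uri(full))  # → https://MY-FAKE-ACCOUNT.openai.azure.com
```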
Configure search engine access to Azure models
For pipeline and query execution, this tutorial uses Microsoft Entra ID for authentication and roles for authorization.
Assign yourself and the search service identity permissions on Azure OpenAI. The code for this tutorial runs locally. Requests to Azure OpenAI originate from your system. Also, search results from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI.
Sign in to the Azure portal and find your search service.
Configure Azure AI Search to use a system-managed identity.
Find your Azure OpenAI resource.
Select Access control (IAM) on the left menu.
Select Add role assignment.
Select Cognitive Services OpenAI User.
Select Managed identity and then select Members. Find the system-managed identity for your search service in the dropdown list.
Next, select User, group, or service principal and then select Members. Search for your user account and then select it from the dropdown list.
Select Review + assign to create the role assignments.
For access to models on Azure AI Vision, assign Cognitive Services OpenAI User. For Azure AI Studio, assign Azure AI Developer.
Use non-Azure models for embeddings
The pattern for integrating any embedding model is to wrap it in a custom skill and custom vectorizer. This section provides links to reference articles. For a code example that calls a non-Azure model, see custom-embeddings demo.
Client | Embedding models | Skill | Vectorizer |
---|---|---|---|
Any | Any | custom skill | custom vectorizer |
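The custom pair follows the same matched-pair pattern: a Custom Web API skill calls your embedding endpoint during indexing, and a customWebApi vectorizer calls the same endpoint at query time. A sketch of the two definitions as REST-style fragments, where the URI is a hypothetical placeholder for your own hosted model:

```python
# Both definitions point at the same hypothetical embedding endpoint so that
# indexing and queries use the same model.
EMBEDDING_API = "https://example.com/embed"  # placeholder for your hosted model

custom_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": EMBEDDING_API,
    "httpMethod": "POST",
    "timeout": "PT30S",
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "vector"}],
}

custom_vectorizer = {
    "name": "my-custom-vectorizer",
    "kind": "customWebApi",
    "customWebApiParameters": {
        "uri": EMBEDDING_API,
        "httpMethod": "POST",
        "timeout": "PT30S",
    },
}

# Same-model requirement applies to custom models too.
assert custom_skill["uri"] == custom_vectorizer["customWebApiParameters"]["uri"]
```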