Query a knowledge base using the retrieve action or MCP endpoint

Note

This feature is currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Azure Previews.

In an agentic retrieval pipeline, the retrieve action invokes parallel query processing from a knowledge base. You can call the retrieve action directly using the Search Service REST APIs or an Azure SDK. Each knowledge base also exposes a Model Context Protocol (MCP) endpoint for consumption by MCP-compatible agents.

This article explains how to call both retrieval methods and interpret the three-pronged response.

Prerequisites

  • An Azure AI Search service with a knowledge base.

  • Permissions to query the knowledge base. Configure keyless authentication with the Search Index Data Reader role assigned to your user account (recommended) or use an API key.

  • If the knowledge base specifies an LLM, the search service must have a managed identity with Cognitive Services User permissions on the Azure AI services resource.

Call the retrieve action

You specify the retrieve action on a knowledge base. The input is chat conversation history in natural language, where the messages array contains the conversation. The agentic retrieval engine supports messages only if the retrieval reasoning effort is low or medium.

using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

// Create knowledge base retrieval client
var kbClient = new KnowledgeBaseRetrievalClient(
    endpoint: new Uri("<YOUR SEARCH SERVICE URL>"),
    knowledgeBaseName: "<YOUR KNOWLEDGE BASE NAME>",
    tokenCredential: new DefaultAzureCredential()
);

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "You can answer questions about the Earth at night. "
                + "Sources have a JSON format with a ref_id that must be cited in the answer. "
                + "If you do not have the answer, respond with 'I do not know'."
            )
        }
    ) { Role = "assistant" }
);
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "Why is the Phoenix nighttime street grid so sharply visible from space, "
                + "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
            )
        }
    ) { Role = "user" }
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

# Create knowledge base retrieval client
kb_client = KnowledgeBaseRetrievalClient(
    endpoint="<YOUR SEARCH SERVICE URL>",
    knowledge_base_name="<YOUR KNOWLEDGE BASE NAME>",
    credential=DefaultAzureCredential(),
)

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="assistant",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="You can answer questions about the Earth at night. "
                    "Sources have a JSON format with a ref_id that must be cited in the answer. "
                    "If you do not have the answer, respond with 'I do not know'."
                )
            ],
        ),
        KnowledgeBaseMessage(
            role="user",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="Why is the Phoenix nighttime street grid so sharply visible from space, "
                    "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
                )
            ],
        ),
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="earth-at-night-blob-ks",
        )
    ],
)

result = kb_client.retrieve(retrieval_request=request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

@search-url = <YOUR SEARCH SERVICE URL> // Example: https://my-service.search.azure.cn
@accessToken = <YOUR ACCESS TOKEN> // Run: az account get-access-token --scope https://search.azure.cn/.default --query accessToken -o tsv

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2025-11-01-preview
Content-Type: application/json
Authorization: Bearer {{accessToken}}

{
    "messages": [
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "You can answer questions about the Earth at night. Sources have a JSON format with a ref_id that must be cited in the answer. If you do not have the answer, respond with 'I do not know'."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Why is the Phoenix nighttime street grid so sharply visible from space, whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
                }
            ]
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "earth-at-night-blob-ks",
            "kind": "searchIndex"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Request parameters

Pass the following parameters to call the retrieve action.

Name Description Type Editable Required
messages Articulates the messages sent to an LLM. The message format is similar to Azure AI services APIs. Object Yes No
messages.role Defines where the message came from, such as assistant or user. The model you use determines which roles are valid. String Yes No
messages.content The message or prompt sent to the LLM. In this preview, it must be text. String Yes No
knowledgeSourceParams Overrides default retrieval settings per knowledge source. Useful for customizing the query or response at query time. Object Yes No

Retrieval from a search index

For knowledge sources that target a search index, all searchable fields are in scope for query execution. The implied query type is semantic, and there's no search mode.

If the index includes vector fields, you need a valid vectorizer definition so the agentic retrieval engine can vectorize query inputs. Otherwise, vector fields are ignored.

For more information, see Create an index for agentic retrieval.

Call the MCP endpoint

MCP is an open protocol that standardizes how AI applications connect to external data sources and tools.

In Azure AI Search, each knowledge base is a standalone MCP server that exposes the knowledge_base_retrieve tool.

MCP endpoint format

Each knowledge base has an MCP endpoint at the following URL:

https://<your-service-name>.search.azure.cn/knowledgebases/<your-knowledge-base-name>/mcp?api-version=2025-11-01-preview

Authenticate to the MCP endpoint

The MCP endpoint requires authentication via custom headers. You have two options:

  • (Recommended) Pass a bearer token in the Authorization header. The identity behind the token must have the Search Index Data Reader role assigned on the search service. This approach avoids storing keys in configuration files. For more information, see Connect your app to Azure AI Search using identities.

  • Pass an admin key in the api-key header. An admin key provides full read-write access to the search service, so use it with caution. For more information, see Connect to Azure AI Search using API keys.

Review the response

Successful retrieval returns a 200 OK status code. If the knowledge base fails to retrieve from one or more knowledge sources, a 206 Partial Content status code returns. The response only includes results from sources that succeeded. Details about the partial response appear as errors in the activity array.

The retrieve action returns three main components:

Extracted response

The extracted response is a single unified string that you typically pass to an LLM. The LLM consumes it as grounding data and uses it to formulate a response. Your API call to the LLM includes the unified string and instructions for the model, such as whether to use the grounding exclusively or as a supplement.

The body of the response is also structured in the chat message style format. In this preview, the content is serialized JSON.

"response": [
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "[{\"ref_id\":0,\"title\":\"Urban Structure\",\"terms\":\"Location of Phoenix, Grid of City Blocks, Phoenix Metropolitan Area at Night\",\"content\":\"<content chunk redacted>\"}]"
            }
        ]
    }
]

Key points:

  • content.text is a JSON array. It's a single string composed of the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This array is your grounding data that a chat completion model uses to formulate a response to the user's question.

    This portion of the response consists of 200 chunks or fewer, excluding any results that fail to meet the minimum threshold of a 2.5 reranker score.

    The string starts with the reference ID of the chunk (used for citation purposes), and any fields specified in the semantic configuration of the target index. In this example, assume the semantic configuration in the target index has a "title" field, a "terms" field, and a "content" field.

  • In this preview, content.type has one valid value: text.

  • The maxOutputSize property on the knowledge base determines the length of the string.

Activity array

The activity array outputs the query plan, which provides operational transparency for tracking operations, billing implications, and resource invocations. It also includes subqueries sent to the retrieval pipeline and errors for any retrieval failures, such as inaccessible knowledge sources.

The output includes the following components:

Section Description
modelQueryPlanning For knowledge bases that use an LLM for query planning, this section reports on the token counts used for input, and the token count for the subqueries.
source-specific activity For each knowledge source included in the query, this section reports on elapsed time and which arguments were used in the query, including semantic ranker. Knowledge source types include searchIndex, azureBlob, and other supported knowledge sources.
agenticReasoning This section reports on token consumption for agentic reasoning during retrieval, which depends on the specified retrieval reasoning effort.
modelAnswerSynthesis For knowledge bases that use answer synthesis, this section reports on the token count for formulating the answer, and the token count of the answer output.

Here's an example of the activity array:

  "activity": [
    {
      "type": "modelQueryPlanning",
      "id": 0,
      "inputTokens": 2302,
      "outputTokens": 109,
      "elapsedMs": 2396
    },
    {
      "type": "searchIndex",
      "id": 1,
      "knowledgeSourceName": "demo-financials-ks",
      "queryTime": "2025-11-04T19:25:23.683Z",
      "count": 26,
      "elapsedMs": 1137,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "searchIndex",
      "id": 2,
      "knowledgeSourceName": "demo-healthcare-ks",
      "queryTime": "2025-11-04T19:25:24.186Z",
      "count": 17,
      "elapsedMs": 494,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "agenticReasoning",
      "id": 3,
      "retrievalReasoningEffort": {
        "kind": "low"
      },
      "reasoningTokens": 103368
    },
    {
      "type": "modelAnswerSynthesis",
      "id": 4,
      "inputTokens": 5821,
      "outputTokens": 344,
      "elapsedMs": 3837
    }
  ]

References array

The references array comes directly from the underlying grounding data. It includes the sourceData used to generate the response. It consists of every document that the agentic retrieval engine finds and semantically ranks. Fields in the sourceData include an id and semantic fields: title, terms, and content.

The id acts as a reference ID for an item within a specific response. It's not the document key in the search index. You use it for providing citations.

The purpose of this array is to provide a chat message style structure for easy integration. For example, if you want to serialize the results into a different structure or you require some programmatic manipulation of the data before you returned it to the user.

You can also get the structured data from the source data object in the references array to manipulate it however you see fit.

Here's an example of the references array:

  "references": [
    {
      "type": "AzureSearchDoc",
      "id": "0",
      "activitySource": 2,
      "docKey": "earth_at_night_508_page_104_verbalized",
      "sourceData": null
    },
    {
      "type": "AzureSearchDoc",
      "id": "1",
      "activitySource": 2,
      "docKey": "earth_at_night_508_page_105_verbalized",
      "sourceData": null
    }
  ]

Examples

The following examples illustrate different ways to call the retrieve action:

Override default reasoning effort and set request limits

This example specifies answer synthesis, so retrievalReasoningEffort must be "low" or "medium".

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What companies are in the financial sector?")
        }
    ) { Role = "user" }
);
retrievalRequest.RetrievalReasoningEffort = new KnowledgeRetrievalLowReasoningEffort();
retrievalRequest.OutputMode = "answerSynthesis";
retrievalRequest.MaxRuntimeInSeconds = 30;
retrievalRequest.MaxOutputSize = 6000;

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import KnowledgeRetrievalLowReasoningEffort

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What companies are in the financial sector?")],
        )
    ],
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort(),
    output_mode="answerSynthesis",
    max_runtime_in_seconds=30,
    max_output_size=6000,
)

result = kb_client.retrieve(retrieval_request=request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/kb-override/retrieve?api-version={{api-version}}
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What companies are in the financial sector?" }
            ]
        }
    ],
    "retrievalReasoningEffort": { "kind": "low" },
    "outputMode": "answerSynthesis",
    "maxRuntimeInSeconds": 30,
    "maxOutputSize": 6000
}

Reference: Knowledge Retrieval - Retrieve

Set references for each knowledge source

This example uses the default reasoning effort specified in the knowledge base. The focus of this example is specification of how much information to include in the response.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What companies are in the financial sector?")
        }
    ) { Role = "user" }
);
retrievalRequest.IncludeActivity = true;
// Knowledge source params are configured per source on the request

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import SearchIndexKnowledgeSourceParams

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What companies are in the financial sector?")],
        )
    ],
    include_activity=True,
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-financials-ks",
            include_references=True,
            include_reference_source_data=True,
        ),
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-communicationservices-ks",
            include_references=False,
            include_reference_source_data=False,
        ),
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-healthcare-ks",
            include_references=True,
            include_reference_source_data=False,
            always_query_source=True,
        ),
    ],
)

result = kb_client.retrieve(retrieval_request=request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, SearchIndexKnowledgeSourceParams

POST {{search-url}}/knowledgebases/kb-medium-example/retrieve?api-version={{api-version}}
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What companies are in the financial sector?" }
            ]
        }
    ],
    "includeActivity": true,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "demo-financials-ks",
            "kind": "searchIndex",
            "includeReferences": true,
            "includeReferenceSourceData": true
        },
        {
            "knowledgeSourceName": "demo-communicationservices-ks",
            "kind": "searchIndex",
            "includeReferences": false,
            "includeReferenceSourceData": false
        },
        {
            "knowledgeSourceName": "demo-healthcare-ks",
            "kind": "searchIndex",
            "includeReferences": true,
            "includeReferenceSourceData": false,
            "alwaysQuerySource": true
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Note

For indexed OneLake or indexed SharePoint knowledge sources, set includeReferenceSourceData to true to include source document URLs in citations.

Use minimal reasoning effort

In this example, there's no LLM for intelligent query planning or answer synthesis. The query string goes to the agentic retrieval engine for keyword search or hybrid search.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Intents.Add(
    new KnowledgeRetrievalSemanticIntent("what is a brokerage")
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeRetrievalSemanticIntent,
)

request = KnowledgeBaseRetrievalRequest(
    intents=[
        KnowledgeRetrievalSemanticIntent(
            search="what is a brokerage",
        )
    ]
)

result = kb_client.retrieve(retrieval_request=request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/kb-minimal/retrieve?api-version={{api-version}}
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "intents": [
        {
            "type": "semantic",
            "search": "what is a brokerage"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Troubleshoot empty responses

A document can be found during the search step but still be omitted from the final response if its grounded content exceeds the maxOutputSize output budget. When this happens, the activity array shows that matches were found, but the references array and grounded response content are empty for that document. No truncation warning or explicit error is returned.

To avoid this behavior, index large source documents as smaller chunks with stable identifiers and source metadata. This applies especially to long manuals, policies, or knowledge base articles.