What is a knowledge source?

Note

This feature is currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Azure Previews.

A knowledge source specifies the content used for agentic retrieval. It either encapsulates a search index that's populated from an external data source, or it connects to a remote source, such as Bing or SharePoint, that's queried directly. A knowledge source is a required definition in a knowledge base.

Create a knowledge source as a top-level resource on your search service. Each knowledge source points to exactly one data structure, either a search index that meets the criteria for agentic retrieval or a supported external resource.
Reference one or more knowledge sources in a knowledge base. In an agentic retrieval pipeline, it's possible to query against multiple knowledge sources in a single request. Subqueries are generated for each knowledge source. Top results are returned in the retrieval response.
For certain knowledge sources, you can use a knowledge source definition to generate a full indexer pipeline (data source, skillset, indexer, and index) that works for agentic retrieval. Instead of creating multiple objects manually, information in the knowledge source is used to generate all objects, including a populated, chunked, and searchable index.

Make sure you have at least one knowledge source before creating a knowledge base. The full specification of a knowledge source and a knowledge base can be found in the preview REST API reference.

Working with a knowledge source

Creation path: first create a knowledge source, then create a knowledge base.
Deletion path: update or delete knowledge bases to remove references to a knowledge source, and then delete the knowledge source last.
A knowledge source, its index, and the knowledge base must all exist on the same search service. External content is accessed either over the public internet (Bing) or in an Azure tenant (remote SharePoint).
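
The ordering rules above can be illustrated with a small in-memory model. This is a sketch only: SearchServiceModel and its methods are hypothetical names invented for illustration, not part of the REST API or any Azure SDK. The model rejects out-of-order operations to make the dependency visible; how the real service responds in those cases isn't described here.

```python
class SearchServiceModel:
    """Hypothetical in-memory stand-in for one search service (not an Azure API)."""

    def __init__(self):
        self.knowledge_sources = set()
        self.knowledge_bases = {}  # knowledge base name -> set of knowledge source names

    def create_knowledge_source(self, name):
        self.knowledge_sources.add(name)

    def create_knowledge_base(self, name, source_names):
        # Creation path: every referenced knowledge source must already exist.
        if not source_names:
            raise ValueError("a knowledge base needs at least one knowledge source")
        missing = set(source_names) - self.knowledge_sources
        if missing:
            raise ValueError(f"create these knowledge sources first: {sorted(missing)}")
        self.knowledge_bases[name] = set(source_names)

    def delete_knowledge_base(self, name):
        del self.knowledge_bases[name]

    def delete_knowledge_source(self, name):
        # Deletion path: remove references from knowledge bases before deleting.
        users = sorted(kb for kb, refs in self.knowledge_bases.items() if name in refs)
        if users:
            raise ValueError(f"still referenced by knowledge bases: {users}")
        self.knowledge_sources.discard(name)


service = SearchServiceModel()
service.create_knowledge_source("hotels-ks")
service.create_knowledge_base("travel-kb", ["hotels-ks"])

try:
    service.delete_knowledge_source("hotels-ks")  # rejected: still referenced
except ValueError as err:
    print(err)

service.delete_knowledge_base("travel-kb")
service.delete_knowledge_source("hotels-ks")  # succeeds once nothing references it
```

The sample names (hotels-ks, travel-kb) are placeholders.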

Supported knowledge sources

Here are the knowledge sources you can create in this preview:

"searchIndex" API wraps an existing index.
"azureBlob" API generates an indexer pipeline that pulls from a blob container.
"webParameters" API retrieves real-time grounding data from Azure Bing.

Creating knowledge sources

Knowledge sources are created as standalone objects and then specified in a knowledge base in a "knowledgeSources" array.
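
The relationship between the two definitions can be sketched as Python dictionaries serialized to JSON. Only the "knowledgeSources" array and the "searchIndex" kind come from this article. The other property names ("name", "kind", "description"), the shape of each array entry, and the sample values are illustrative assumptions, so check the preview REST API reference for the actual schema.

```python
import json


def build_knowledge_source(name, kind, description):
    """Return a knowledge source definition as a plain dict (assumed shape)."""
    return {"name": name, "kind": kind, "description": description}


def build_knowledge_base(name, knowledge_sources):
    """Return a knowledge base that references knowledge sources by name."""
    if not knowledge_sources:
        raise ValueError("a knowledge base needs at least one knowledge source")
    return {
        "name": name,
        # "knowledgeSources" is the array named in the text; the entry shape is assumed.
        "knowledgeSources": [{"name": ks["name"]} for ks in knowledge_sources],
    }


# The knowledge source is defined first, as a standalone object.
hotels_ks = build_knowledge_source("hotels-ks", "searchIndex", "Hotel listings and reviews")

# The knowledge base then refers to it by name.
travel_kb = build_knowledge_base("travel-kb", [hotels_ks])
print(json.dumps(travel_kb, indent=2))
```

Keeping the knowledge source as its own object means the same definition can be referenced from more than one knowledge base without being duplicated.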

You must have Search Service Contributor permissions to create objects on a search service. You also need Search Index Data Contributor permissions to load an index if you're using a knowledge source that creates an indexer pipeline. As an alternative to roles, you can use an admin API key.

You can use the REST API or an Azure SDK preview package to create a knowledge source. Azure portal support is available for select knowledge sources.

After the knowledge source is created, you can reference it in a knowledge base.

Using knowledge sources

Knowledge source usage is either explicitly controlled, such as when you set alwaysQuery on the knowledge source definition, or subject to selection logic during query planning. Query planning occurs when you use a low or medium retrieval reasoning effort. For a minimal reasoning effort, all knowledge sources listed in the knowledge base are in scope for every query. For low and medium, the knowledge base and the LLM can determine at query time which knowledge sources are likely to provide the best search corpus.

Knowledge source selection logic is based on these factors:

Is alwaysQuery set? If yes, the knowledge source is always used on every query.
The name of the knowledge source.
The description of an index, assuming an indexed knowledge source.
The retrievalInstructions specified in the retrieve action or in the knowledge base definition, which provide guidance that includes or excludes a knowledge source. Retrieval instructions are similar to a prompt. You can also use them to specify brevity, tone, and formatting.
outputMode on a knowledge base also affects query output and what goes in the response.
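
The scoping rules described in this section can be sketched as a small function. This is a hypothetical model, not service code: KnowledgeSourceRef, sources_in_scope, and the planner_picks argument (a stand-in for whatever the LLM selects during query planning) are invented for illustration. The actual selection happens inside the service and also weighs names, descriptions, and retrieval instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeSourceRef:
    name: str
    always_query: bool = False  # mirrors the alwaysQuery setting described above


def sources_in_scope(effort, sources, planner_picks=()):
    """Return the names of the knowledge sources queried for one request."""
    if effort == "minimal":
        # No query planning: every listed knowledge source is in scope.
        return [s.name for s in sources]
    if effort in ("low", "medium"):
        # Query planning: always-query sources plus whatever the planner selected.
        picks = set(planner_picks)
        return [s.name for s in sources if s.always_query or s.name in picks]
    raise ValueError(f"unknown retrieval reasoning effort: {effort!r}")


sources = [
    KnowledgeSourceRef("hotels-ks", always_query=True),
    KnowledgeSourceRef("flights-ks"),
    KnowledgeSourceRef("web-ks"),
]

print(sources_in_scope("minimal", sources))
# ['hotels-ks', 'flights-ks', 'web-ks']
print(sources_in_scope("low", sources, planner_picks=["web-ks"]))
# ['hotels-ks', 'web-ks']
```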

Use a retrieval reasoning effort to control LLM usage

Not all solutions benefit from LLM query planning and execution. If simplicity and speed outweigh the benefits of LLM query planning and context engineering, you can specify a minimal reasoning effort to prevent LLM processing in your pipeline.

For low and medium, the level of LLM processing is either a balanced or maximal approach that improves relevance. For more information, see Set the retrieval reasoning effort.