Add a filter to a vector query in Azure AI Search

Note

prefilter and postfilter are generally available in the latest stable REST API version.

In Azure AI Search, you can use a filter expression to add inclusion or exclusion criteria to a vector query. You can also specify a filtering mode that applies the filter:

  • Before query execution, known as prefiltering.
  • After query execution, known as postfiltering.

This article uses REST for illustration. For code samples in other languages and end-to-end solutions that include vector queries, see the azure-search-vector-samples GitHub repository.

You can also use Search Explorer in the Azure portal to query vector content. If you use the JSON view, you can add filters and specify the filter mode.

How filtering works in vector queries

Filters apply to filterable nonvector fields, either string or numeric, to include or exclude search documents based on filter criteria. Although vector fields aren't filterable, you can use filters on nonvector fields in the same index to include or exclude documents that contain vector fields you're searching on.

If your index lacks suitable text or numeric fields, check for document metadata that might be useful in filtering, such as LastModified or CreatedBy properties.
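As a rough illustration of filtering on metadata, the following sketch builds an OData filter string in Python. The field names CreatedBy and LastModified are assumed to exist in the index as filterable Edm.String and Edm.DateTimeOffset fields, and the helper functions are hypothetical, not part of any SDK.

```python
from datetime import datetime, timezone


def odata_string(value: str) -> str:
    """Quote a string literal for an OData filter; single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def odata_datetime(value: datetime) -> str:
    """Format a timezone-aware datetime as a UTC Edm.DateTimeOffset literal."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Assumed filterable fields: CreatedBy (Edm.String), LastModified (Edm.DateTimeOffset).
author = odata_string("O'Brien")
modified_since = odata_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc))
metadata_filter = f"CreatedBy eq {author} and LastModified ge {modified_since}"
print(metadata_filter)
```

The resulting string can be passed as the filter value of a vector query, provided both fields are marked filterable in the index.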

The vectorFilterMode parameter controls when the filter is applied in the vector search process, with k setting the maximum number of nearest neighbors to return. Depending on the filter mode and how selective your filter is, fewer than k results might be returned.
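To see why a query can return fewer than k results, consider this toy simulation. It uses brute-force Euclidean search over a handful of made-up two-dimensional vectors. It is only a conceptual sketch and doesn't reflect how the search engine implements filtering internally.

```python
import math


def nearest(query, docs, k):
    """Return up to k docs ordered by Euclidean distance to the query (brute force)."""
    return sorted(docs, key=lambda d: math.dist(query, d["vector"]))[:k]


def prefilter_search(query, docs, predicate, k):
    # Filter first, then pick the top k from the surviving candidates.
    return nearest(query, [d for d in docs if predicate(d)], k)


def postfilter_search(query, docs, predicate, k):
    # Pick the top k from all documents, then drop the ones that fail the filter.
    return [d for d in nearest(query, docs, k) if predicate(d)]


def is_database(doc):
    return doc["category"] == "Databases"


# Made-up documents for illustration only.
docs = [
    {"id": "1", "category": "Databases", "vector": [0.0, 0.1]},
    {"id": "2", "category": "Storage", "vector": [0.15, 0.0]},
    {"id": "3", "category": "Storage", "vector": [0.2, 0.2]},
    {"id": "4", "category": "Databases", "vector": [0.9, 0.9]},
    {"id": "5", "category": "Databases", "vector": [1.0, 1.0]},
]
query = [0.0, 0.0]

pre_ids = [d["id"] for d in prefilter_search(query, docs, is_database, k=2)]
post_ids = [d["id"] for d in postfilter_search(query, docs, is_database, k=2)]
print(pre_ids)   # ['1', '4']
print(post_ids)  # ['1']
```

With k set to 2, the prefiltered search returns two matching documents, while the postfiltered search returns only one because the second-nearest neighbor fails the filter.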

Define a filter

Filters determine the scope of vector queries and are defined using Documents - Search Post (REST API). Unless you want to use a preview feature, use the latest stable version of the Search Service REST APIs to formulate the request.

This REST API provides:

  • filter for the criteria.
  • vectorFilterMode for pre-query or post-query filtering. For supported modes, see the next section.

POST https://{search-endpoint}/indexes/{index-name}/docs/search?api-version={api-version}
Content-Type: application/json
api-key: {admin-api-key}

    {
        "count": true,
        "select": "title, content, category",
        "filter": "category eq 'Databases'",
        "vectorFilterMode": "preFilter",
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": [
                    -0.009154141,
                    0.018708462,
                    . . . // Trimmed for readability
                    -0.02178128,
                    -0.00086512347
                ],
                "exhaustive": true,
                "fields": "contentVector",
                "k": 5
            }
        ]
    }

In this example, the vector embedding targets the contentVector field, and the filter criteria apply to category, a filterable text field. Because the preFilter mode is used, the filter is applied before the search engine runs the query, so only documents in the Databases category are considered during the vector search.
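If you assemble the request in code, the same body can be built as a Python dictionary and serialized to JSON. The following is a minimal sketch: the function name build_filtered_vector_query is hypothetical, the two-element vector is a placeholder, and the mode check covers only the two modes discussed in this article.

```python
import json


def build_filtered_vector_query(vector, filter_expr, mode="preFilter", k=5):
    """Return the JSON body for a filtered vector query as a Python dict."""
    if mode not in ("preFilter", "postFilter"):
        raise ValueError(f"unsupported vectorFilterMode: {mode}")
    return {
        "count": True,
        "select": "title, content, category",
        "filter": filter_expr,
        "vectorFilterMode": mode,
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(vector),
                "exhaustive": True,
                "fields": "contentVector",
                "k": k,
            }
        ],
    }


# Placeholder embedding; a real query vector must match the field's dimensions.
body = build_filtered_vector_query(
    [-0.009154141, 0.018708462], "category eq 'Databases'"
)
payload = json.dumps(body)
print(payload)
```

The serialized payload would then be sent as the body of the POST request shown earlier, using the HTTP client of your choice.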

Set the filter mode

The vectorFilterMode parameter determines when and how the filter is applied relative to vector query execution. This article covers two modes:

  • preFilter (default)
  • postFilter

Prefiltering applies filters before query execution, which reduces the candidate set for the vector search algorithm. The top-k results are then selected from this filtered set.

In a vector query, preFilter is the default mode because it favors recall and quality over latency.

Diagram of prefilters.

Postfiltering applies filters after query execution, so the filter narrows the top-k results and fewer than k results might be returned.

Benchmark testing of prefiltering and postfiltering

Important

This section applies to prefiltering and postfiltering, not strict postfiltering.

To understand the conditions under which one filter mode performs better than the other, we ran a series of tests to evaluate query outcomes over small, medium, and large indexes.

  • Small (100,000 documents, 2.5-GB index, 1,536 dimensions)
  • Medium (1 million documents, 25-GB index, 1,536 dimensions)
  • Large (1 billion documents, 1.9-TB index, 96 dimensions)

For the small and medium workloads, we used a Standard 2 (S2) service with one partition and one replica. For the large workload, we used a Standard 3 (S3) service with 12 partitions and one replica.

Indexes had an identical construction: one key field, one vector field, one text field, and one numeric filterable field. The following index is defined using the 2023-11-01 syntax.

def get_index_schema(self, index_name, dimensions):
    # Schema used for all three benchmark indexes; only the name and dimensions vary.
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "searchable": True},
            {"name": "content_vector", "type": "Collection(Edm.Single)", "dimensions": dimensions,
             "searchable": True, "retrievable": True, "filterable": False, "facetable": False, "sortable": False,
             "vectorSearchProfile": "defaulthnsw"},
            {"name": "text", "type": "Edm.String", "searchable": True, "filterable": False, "retrievable": True,
             "sortable": False, "facetable": False},
            {"name": "score", "type": "Edm.Double", "searchable": False, "filterable": True,
             "retrievable": True, "sortable": True, "facetable": True}
        ],
        "vectorSearch": {
            "algorithms": [
                {
                    "name": "defaulthnsw",
                    "kind": "hnsw",
                    "hnswParameters": {"metric": "euclidean"}
                }
            ],
            "profiles": [
                {
                    "name": "defaulthnsw",
                    "algorithm": "defaulthnsw"
                }
            ]
        }
    }

In queries, we used an identical filter for both prefilter and postfilter operations. We used a simple filter to ensure that variations in performance were due to filtering mode, not filter complexity.

Outcomes were measured in queries per second (QPS).
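For readers who want to take a similar measurement, the following sketch times a batch of sequential queries and reports QPS. It isn't the harness used for these benchmarks: measure_qps and fake_query are hypothetical names, and fake_query sleeps briefly in place of a real search request.

```python
import time


def measure_qps(run_query, num_queries):
    """Run run_query() num_queries times sequentially and return queries per second."""
    start = time.perf_counter()
    for _ in range(num_queries):
        run_query()
    elapsed = time.perf_counter() - start
    return num_queries / elapsed


def fake_query():
    # Stand-in for a real search request; sleeps briefly to simulate latency.
    time.sleep(0.001)


qps = measure_qps(fake_query, num_queries=50)
print(f"{qps:.1f} queries per second")
```

Running the same loop once with preFilter and once with postFilter, using an identical filter, gives the two QPS values needed for a comparison. A single-threaded loop like this measures latency-bound throughput only.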

Takeaways

  • Prefiltering is almost always slower than postfiltering, except on small indexes where performance is approximately equal.

  • On larger datasets, prefiltering is orders of magnitude slower.

  • Why is prefilter the default if it's almost always slower? Prefiltering guarantees that k results are returned if they exist in the index. The default favors recall and precision over speed.

  • Use postfiltering if you:

    • Value speed over selection (postfiltering can return fewer than k results).

    • Use filters that aren't overly selective.

    • Have indexes of sufficient size such that prefiltering performance is unacceptable.

Details

  • Given a dataset with 100,000 vectors at 1,536 dimensions:

    • When filtering more than 30% of the dataset, prefiltering and postfiltering were comparable.

    • When filtering less than 0.1% of the dataset, prefiltering was about 50% slower than postfiltering.

  • Given a dataset with 1 million vectors at 1,536 dimensions:

    • When filtering more than 30% of the dataset, prefiltering was about 30% slower.

    • When filtering less than 2% of the dataset, prefiltering was about seven times slower.

  • Given a dataset with 1 billion vectors at 96 dimensions:

    • When filtering more than 5% of the dataset, prefiltering was about 50% slower.

    • When filtering less than 10% of the dataset, prefiltering was about seven times slower.

The following graph shows prefilter relative QPS, computed as prefilter QPS divided by postfilter QPS.

Chart showing QPS performance for small, medium, and large indexes for relative QPS.

The vertical axis represents the relative performance of prefiltering compared to postfiltering, expressed as a ratio of QPS (queries per second). For example:

  • A value of 0.0 means prefiltering is 100% slower than postfiltering.
  • A value of 0.5 means prefiltering is 50% slower.
  • A value of 1.0 means prefiltering and postfiltering are equivalent.

The horizontal axis represents the filtering rate, or the percentage of candidate documents after applying the filter. For example, a rate of 1.00% means the filter criteria selected one percent of the search corpus.
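
The arithmetic behind the vertical axis can be sketched as follows. The QPS numbers are hypothetical and chosen only to illustrate the ratio. The percent_slower helper follows the reading given above, where a ratio of 0.5 corresponds to 50% slower.

```python
def relative_qps(prefilter_qps, postfilter_qps):
    """Prefilter QPS divided by postfilter QPS."""
    return prefilter_qps / postfilter_qps


def percent_slower(ratio):
    """Convert a relative QPS ratio into the 'percent slower' reading."""
    return (1.0 - ratio) * 100.0


# Hypothetical QPS numbers, not benchmark results.
half_speed = relative_qps(prefilter_qps=50.0, postfilter_qps=100.0)
print(half_speed, percent_slower(half_speed))  # 0.5 50.0

# "Seven times slower" corresponds to a ratio of roughly 1/7.
seven_times = relative_qps(prefilter_qps=10.0, postfilter_qps=70.0)
print(round(seven_times, 3))  # 0.143
```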