Deploy models for batch inference and prediction
This article describes what Databricks recommends for batch inference.
Important
This feature is in Public Preview.
Databricks recommends using ai_query with Model Serving for batch inference. ai_query is a built-in Databricks SQL function that allows you to query existing model serving endpoints using SQL. It has been verified to reliably and consistently process datasets in the range of billions of tokens.
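As a minimal sketch, the following query applies ai_query to every row of a table. The endpoint, table, and column names are placeholders for illustration; substitute the names from your own workspace.

```sql
-- Score each row of a table with an existing model serving endpoint.
-- `my-serving-endpoint`, `product_reviews`, and `review_text` are
-- hypothetical names; replace them with your own.
SELECT
  review_text,
  ai_query(
    'my-serving-endpoint',
    CONCAT('Summarize the following review in one sentence: ', review_text)
  ) AS summary
FROM product_reviews;
```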
For quick experimentation, ai_query can be used for batch LLM inference with pay-per-token endpoints, which are pre-configured on your workspace.
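For example, a quick experiment might point ai_query at a pay-per-token endpoint on a small sample of rows. The endpoint name below is one example of a pay-per-token Foundation Model APIs endpoint; check the Serving page in your workspace for the names available to you, and note that `support_tickets` is a hypothetical table.

```sql
-- Quick experiment: classify a small sample with a pay-per-token endpoint.
-- The endpoint name is an example; `support_tickets` is a hypothetical table.
SELECT
  ticket_text,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('Classify the sentiment of this ticket as positive, negative, or neutral: ', ticket_text)
  ) AS sentiment
FROM support_tickets
LIMIT 10;
```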
When you are ready to run batch LLM inference on large or production data, Databricks recommends using provisioned throughput endpoints for faster performance.
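For a production-scale run, the same pattern can persist results to a table, with ai_query pointed at a provisioned throughput endpoint. This is a minimal sketch; the endpoint and table names are hypothetical and stand in for your own.

```sql
-- Batch inference at scale: write results to a table.
-- `llama-provisioned-throughput` and `documents` are hypothetical names.
CREATE OR REPLACE TABLE document_summaries AS
SELECT
  doc_id,
  ai_query(
    'llama-provisioned-throughput',
    CONCAT('Summarize: ', doc_text)
  ) AS summary
FROM documents;
```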
For a traditional ML model batch inference example, see the following notebook: