Manage model lifecycle in Unity Catalog
Important
- This article documents Models in Unity Catalog, which Databricks recommends for governing and deploying models. If your workspace is not enabled for Unity Catalog, the functionality on this page is not available. Instead, see Manage model lifecycle using the Workspace Model Registry (legacy). For guidance on how to upgrade from the Workspace Model Registry to Unity Catalog, see Migrate workflows and models to Unity Catalog.
This article describes how to use Models in Unity Catalog as part of your machine learning workflow to manage the full lifecycle of ML models. Databricks provides a hosted version of MLflow Model Registry in Unity Catalog. Models in Unity Catalog extends the benefits of Unity Catalog to ML models, including centralized access control, auditing, lineage, and model discovery across workspaces. Models in Unity Catalog is compatible with the open-source MLflow Python client.
Key features of models in Unity Catalog include:
- Namespacing and governance for models, so you can group and govern models at the environment, project, or team level ("Grant data scientists read-only access to production models").
- Chronological model lineage (which MLflow experiment and run produced the model at a given time).
- Model versioning.
- Model deployment via aliases. For example, mark the "Champion" version of a model within your prod catalog.
If your workspace's default catalog is configured to a catalog in Unity Catalog, models registered using MLflow APIs such as mlflow.<model-type>.log_model(..., registered_model_name) or mlflow.register_model(model_uri, name) are registered to Unity Catalog by default.
This article includes instructions for both the Models in Unity Catalog UI and API.
For an overview of Model Registry concepts, see ML lifecycle management using MLflow.
Requirements
Unity Catalog must be enabled in your workspace. See Get started using Unity Catalog to create a Unity Catalog Metastore, enable it in a workspace, and create a catalog. If Unity Catalog is not enabled, you can still use the classic workspace model registry.
Your workspace must be attached to a Unity Catalog metastore that supports privilege inheritance. This is true for all metastores created after August 25, 2022. If your workspace is attached to an older metastore, follow the documentation to upgrade it.
You must have access to run commands on a cluster with access to Unity Catalog.
To create new registered models, you need the CREATE_MODEL privilege on a schema, in addition to the USE SCHEMA and USE CATALOG privileges on the schema and its enclosing catalog. CREATE_MODEL is a new schema-level privilege that you can grant using the Catalog Explorer UI or the SQL GRANT command, as shown below.

GRANT CREATE_MODEL ON SCHEMA <schema-name> TO <principal>
Upgrade training workloads to Unity Catalog
This section includes instructions to upgrade existing training workloads to Unity Catalog.
Install MLflow Python client
You can use Models in Unity Catalog on Databricks Runtime 11.3 LTS and above by installing the latest version of the MLflow Python client in your notebook, using the code below.
%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()
Configure MLflow client to access models in Unity Catalog
By default, the MLflow Python client creates models in the Databricks workspace model registry. To upgrade to models in Unity Catalog, configure the MLflow client:
import mlflow
mlflow.set_registry_uri("databricks-uc")
Note
If your workspace's default catalog is in Unity Catalog (rather than hive_metastore) and you are running a cluster using Databricks Runtime 13.3 LTS or above (Databricks Runtime 15.0 or above in Azure operated by 21Vianet regions), models are automatically created in and loaded from the default catalog, with no configuration required. There is no change in behavior for other Databricks Runtime versions. A small number of workspaces, where both the default catalog was configured to a catalog in Unity Catalog prior to January 2024 and the workspace model registry was used prior to January 2024, are exempt from this behavior.
Train and register Unity Catalog-compatible models
Permissions required: To create a new registered model, you need the CREATE_MODEL and USE SCHEMA privileges on the enclosing schema, and USE CATALOG privilege on the enclosing catalog. To create new model versions under a registered model, you must be the owner of the registered model and have USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
ML model versions in UC must have a model signature. If you're not already logging MLflow models with signatures in your model training workloads, you can either:
- Use Databricks autologging, which automatically logs models with signatures for many popular ML frameworks. See supported frameworks in the MLflow docs.
- With MLflow 2.5.0 and above, you can specify an input example in your mlflow.<flavor>.log_model call, and the model signature is automatically inferred. For further information, refer to the MLflow documentation.
Then, pass the three-level name of the model to MLflow APIs, in the form <catalog>.<schema>.<model>.
The examples in this section create and access models in the ml_team schema under the prod catalog.

The model training examples in this section create a new model version and register it in the prod catalog. Using the prod catalog doesn't necessarily mean that the model version serves production traffic. The model version's enclosing catalog, schema, and registered model reflect its environment (prod) and associated governance rules (for example, privileges can be set up so that only admins can delete from the prod catalog), but not its deployment status. To manage the deployment status, use model aliases.
Register a model to Unity Catalog using autologging
import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Train a sklearn model on the iris dataset.
# Databricks autologging automatically creates an MLflow run and logs the
# fitted model (with a signature) during training.
X, y = datasets.load_iris(return_X_y=True, as_frame=True)
clf = RandomForestClassifier(max_depth=7)
clf.fit(X, y)
# Note that the UC model name follows the pattern
# <catalog_name>.<schema_name>.<model_name>, corresponding to
# the catalog, schema, and registered model name
# in Unity Catalog under which to create the version
# The registered model will be created if it doesn't already exist
autolog_run = mlflow.last_active_run()
model_uri = "runs:/{}/model".format(autolog_run.info.run_id)
mlflow.register_model(model_uri, "prod.ml_team.iris_model")
Register a model to Unity Catalog with automatically inferred signature
Support for automatically inferred signatures is available in MLflow version 2.5.0 and above, and is supported in Databricks Runtime 11.3 LTS ML and above. To use automatically inferred signatures, use the following code to install the latest MLflow Python client in your notebook:
%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()
The following code shows an example of an automatically inferred signature.
import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    # Take the first row of the training dataset as the model input example.
    input_example = X.iloc[[0]]
    # Log the model and register it as a new version in UC.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )
Track the data lineage of a model in Unity Catalog
Note
Support for table to model lineage in Unity Catalog is available in MLflow 2.11.0 and above.
When you train a model on a table in Unity Catalog, you can track the lineage of the model to the upstream dataset(s) it was trained and evaluated on. To do this, use mlflow.log_input. This saves the input table information with the MLflow run that generated the model. Data lineage is also automatically captured for models logged using feature store APIs. See Feature governance and lineage.
When you register the model to Unity Catalog, lineage information is automatically saved and is visible in the Lineage tab of the model version UI in Catalog Explorer.
The following code shows an example.
import mlflow
import pandas as pd
import pyspark.pandas as ps
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
# Write a table to Unity Catalog
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df.rename(
    columns={
        'sepal length (cm)': 'sepal_length',
        'sepal width (cm)': 'sepal_width',
        'petal length (cm)': 'petal_length',
        'petal width (cm)': 'petal_width'},
    inplace=True
)
iris_df['species'] = iris.target
ps.from_pandas(iris_df).to_table("prod.ml_team.iris", mode="overwrite")
# Load a Unity Catalog table, train a model, and log the input table
dataset = mlflow.data.load_delta(table_name="prod.ml_team.iris", version="0")
pd_df = dataset.df.toPandas()
X = pd_df.drop("species", axis=1)
y = pd_df["species"]
with mlflow.start_run():
    clf = RandomForestRegressor(n_estimators=100)
    clf.fit(X, y)
    mlflow.log_input(dataset, "training")
View models in the UI
Permissions required: To view a registered model and its model versions in the UI, you need EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can view and manage registered models and model versions in Unity Catalog using the Catalog Explorer.
Control access to models
In Unity Catalog, registered models are a subtype of the FUNCTION securable object. To grant access to a model registered in Unity Catalog, you use GRANT ON FUNCTION. For details, see Unity Catalog privileges and securable objects. For best practices on organizing models across catalogs and schemas, see Organize your data.

You can configure model permissions programmatically using the Grants REST API. When you configure model permissions, set securable_type to "FUNCTION" in REST API requests. For example, use PATCH /api/2.1/unity-catalog/permissions/function/{full_name} to update registered model permissions.
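As a minimal sketch of how that call might look from Python using the requests library: the workspace URL, token, and principal below are placeholders, and the request body shape is an assumption about the Grants REST API rather than a definitive reference.

import requests

# Placeholder values for illustration only.
workspace_url = "https://<databricks-instance>"
token = "<personal-access-token>"
model_full_name = "prod.ml_team.iris_model"

# Assumed request shape: the Grants API takes a list of permission changes to apply.
response = requests.patch(
    f"{workspace_url}/api/2.1/unity-catalog/permissions/function/{model_full_name}",
    headers={"Authorization": f"Bearer {token}"},
    json={"changes": [{"principal": "data-scientists", "add": ["EXECUTE"]}]},
)
response.raise_for_status()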
Deploy and organize models with aliases and tags
Model aliases and tags help you organize and manage models in Unity Catalog.
Model aliases allow you to assign a mutable, named reference to a particular version of a registered model. You can use aliases to indicate the deployment status of a model version. For example, you could allocate a "Champion" alias to the model version currently in production and target this alias in workloads that use the production model. You can then update the production model by reassigning the "Champion" alias to a different model version.
Tags are key-value pairs that you associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key "task" and value "question-answering" (displayed in the UI as task:question-answering) to registered models intended for question answering tasks. At the model version level, you could tag versions undergoing pre-deployment validation with validation_status:pending and those cleared for deployment with validation_status:approved.
See the following sections for how to use aliases and tags.
Set and delete aliases on models
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can set, update, and remove aliases for models in Unity Catalog by using Catalog Explorer. You can manage aliases across a registered model in the model details page and configure aliases for a specific model version in the model version details page.
To set, update, and delete aliases using the MLflow Client API, see the examples below:
from mlflow import MlflowClient
client = MlflowClient()
# create "Champion" alias for version 1 of model "prod.ml_team.iris_model"
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 1)
# reassign the "Champion" alias to version 2
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 2)
# get a model version by alias
client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")
# delete the alias
client.delete_registered_model_alias("prod.ml_team.iris_model", "Champion")
Set and delete tags on models
Permissions required: Owner of the registered model or APPLY_TAG privilege on it, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
See Add and update tags using Catalog Explorer on how to set and delete tags using the UI.
To set and delete tags using the MLflow Client API, see the examples below:
from mlflow import MlflowClient
client = MlflowClient()
# Set registered model tag
client.set_registered_model_tag("prod.ml_team.iris_model", "task", "classification")
# Delete registered model tag
client.delete_registered_model_tag("prod.ml_team.iris_model", "task")
# Set model version tag
client.set_model_version_tag("prod.ml_team.iris_model", "1", "validation_status", "approved")
# Delete model version tag
client.delete_model_version_tag("prod.ml_team.iris_model", "1", "validation_status")
Both registered model and model version tags must meet the platform-wide constraints.
For more details on alias and tag client APIs, see the MLflow API documentation.
Load models for inference
Consume model versions by alias in inference workloads
Permissions required: EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can write batch inference workloads that reference a model version by alias. For example, the snippet below loads and applies the "Champion" model version for batch inference. If the "Champion" version is updated to reference a new model version, the batch inference workload automatically picks it up on its next execution. This allows you to decouple model deployments from your batch inference workloads.
import mlflow.pyfunc
model_version_uri = "models:/prod.ml_team.iris_model@Champion"
champion_version = mlflow.pyfunc.load_model(model_version_uri)
champion_version.predict(test_x)
You can also write deployment workflows to get a model version by alias and update a model serving endpoint to serve that version, using the model serving REST API:
import mlflow
import requests
client = mlflow.tracking.MlflowClient()
champion_version = client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")
# Invoke the model serving REST API to update endpoint to serve the current "Champion" version
model_name = champion_version.name
model_version = champion_version.version
requests.request(...)
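Continuing from the snippet above, here is a hedged sketch of what that update call might look like. The endpoint name is hypothetical, and the request path and body shape are assumptions about the model serving REST API; consult the serving API reference for the exact schema.

# Placeholder values for illustration only.
endpoint_name = "iris-model-endpoint"
workspace_url = "https://<databricks-instance>"
token = "<personal-access-token>"

# Assumed request shape: update the endpoint config to serve the fetched "Champion" version.
response = requests.put(
    f"{workspace_url}/api/2.0/serving-endpoints/{endpoint_name}/config",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "served_entities": [
            {
                "entity_name": model_name,
                "entity_version": model_version,
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ]
    },
)
response.raise_for_status()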
Consume model versions by version number in inference workloads
You can also load model versions by version number:
import mlflow.pyfunc
# Load version 1 of the model "prod.ml_team.iris_model"
model_version_uri = "models:/prod.ml_team.iris_model/1"
first_version = mlflow.pyfunc.load_model(model_version_uri)
first_version.predict(test_x)
Share models across workspaces
Share models with users in the same region
As long as you have the appropriate privileges, you can access models in Unity Catalog from any workspace that is attached to the metastore containing the model. For example, you can access models from the prod catalog in a dev workspace, to facilitate comparing newly-developed models to the production baseline.
To collaborate with other users (share write privileges) on a registered model you created, you must grant ownership of the model to a group containing yourself and the users you'd like to collaborate with. Collaborators must also have the USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the model. See Unity Catalog privileges and securable objects for details.
Share models with users in another region or account
To share models with users in other regions or accounts, use the Delta Sharing Databricks-to-Databricks sharing flow. See Add models to a share (for providers) and Get access in the Databricks-to-Databricks model (for recipients). As a recipient, after you create a catalog from a share, you access models in that shared catalog the same way as any other model in Unity Catalog.
Promote a model across environments
Databricks recommends that you deploy ML pipelines as code. This eliminates the need to promote models across environments, as all production models can be produced through automated training workflows in a production environment.
However, in some cases, it may be too expensive to retrain models across environments. Instead, you can copy model versions across registered models in Unity Catalog to promote them across environments.
You need the following privileges to execute the example code below:
- USE CATALOG on the staging and prod catalogs.
- USE SCHEMA on the staging.ml_team and prod.ml_team schemas.
- EXECUTE on staging.ml_team.fraud_detection.

In addition, you must be the owner of the registered model prod.ml_team.fraud_detection.
The following code snippet uses the copy_model_version MLflow Client API, available in MLflow version 2.8.0 and above.
import mlflow
mlflow.set_registry_uri("databricks-uc")
client = mlflow.tracking.MlflowClient()
src_model_name = "staging.ml_team.fraud_detection"
src_model_version = "1"
src_model_uri = f"models:/{src_model_name}/{src_model_version}"
dst_model_name = "prod.ml_team.fraud_detection"
copied_model_version = client.copy_model_version(src_model_uri, dst_model_name)
After the model version is in the production environment, you can perform any necessary pre-deployment validation. Then, you can mark the model version for deployment using aliases.
client = mlflow.tracking.MlflowClient()
client.set_registered_model_alias(name="prod.ml_team.fraud_detection", alias="Champion", version=copied_model_version.version)
In the example above, only users who can read from the staging.ml_team.fraud_detection registered model and write to the prod.ml_team.fraud_detection registered model can promote staging models to the production environment. The same users can also use aliases to manage which model versions are deployed within the production environment. You don't need to configure any other rules or policies to govern model promotion and deployment.
You can customize this flow to promote the model version across multiple environments that match your setup, such as dev, qa, and prod. Access control is enforced as configured in each environment.
Annotate a model or model version
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can provide information about a model or model version by annotating it. For example, you may want to include an overview of the problem or information about the methodology and algorithm used.
Annotate a model or model version using the UI
See Add comments to data.
Annotate a model or model version using the API
To update a registered model description, use the MLflow Client API update_registered_model() method:
client = MlflowClient()
client.update_registered_model(
    name="<model-name>",
    description="<description>"
)
To update a model version description, use the MLflow Client API update_model_version() method:
client = MlflowClient()
client.update_model_version(
    name="<model-name>",
    version=<model-version>,
    description="<description>"
)
Rename a model
Permissions required: Owner of the registered model, CREATE_MODEL privilege on the schema containing the registered model, and USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
To rename a registered model, use the MLflow Client API rename_registered_model() method:
client=MlflowClient()
client.rename_registered_model("<full-model-name>", "<new-model-name>")
Delete a model or model version
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can delete a registered model or a model version within a registered model using the Catalog Explorer UI or the API.
Delete a model version or model using the API
Warning
You cannot undo this action. When you delete a model, all model artifacts stored by Unity Catalog and all the metadata associated with the registered model are deleted.
Delete a model version
To delete a model version, use the MLflow Client API delete_model_version() method:
# Delete versions 1, 2, and 3 of the model
client = MlflowClient()
versions = [1, 2, 3]
for version in versions:
    client.delete_model_version(name="<model-name>", version=version)
Delete a model
To delete a model, use the MLflow Client API delete_registered_model() method:
client = MlflowClient()
client.delete_registered_model(name="<model-name>")
List and search models
You can list registered models in Unity Catalog with MLflow's search_registered_models() Python API:
client=MlflowClient()
client.search_registered_models()
You can also search for a specific model name and list its version details using the search_model_versions() method:
from pprint import pprint
client=MlflowClient()
[pprint(mv) for mv in client.search_model_versions("name='<model-name>'")]
Note
Not all search API fields and operators are supported for models in Unity Catalog. See Limitations for details.
Register a model version
You can register a model version from an experiment run in Unity Catalog.
Register a model version using the UI
Follow these steps:
From the experiment run page, click Register model in the upper-right corner of the UI.
In the dialog, select Unity Catalog, and select a destination model from the drop down list.
Click Register.
Registering a model can take time. To monitor progress, navigate to the destination model in Unity Catalog and refresh periodically.
Register a model version using the API
To register a model version, use the MLflow register_model() API. See mlflow.register_model.
mlflow.register_model(
    "runs:/<run_uuid>/model",
    "<catalog_name>.<schema>.<model_name>"
)
Copy a model version
You can copy a model version from one model to another in Unity Catalog.
Copy a model version using the UI
Follow these steps:
From the model version page, click Copy this version in the upper-right corner of the UI.
Select a destination model from the drop down list and click Copy.
Copying a model can take time. To monitor progress, navigate to the destination model in Unity Catalog and refresh periodically.
Copy a model version using the API
To copy a model version, use MLflow's copy_model_version() Python API:
client = MlflowClient()
client.copy_model_version(
    "models:/<source-model-name>/<source-model-version>",
    "<destination-model-name>",
)
Download model files (advanced use case)
In most cases, to load models, you should use MLflow APIs like mlflow.pyfunc.load_model or mlflow.<flavor>.load_model (for example, mlflow.transformers.load_model for HuggingFace models).
In some cases you may need to download model files to debug model behavior or model loading issues. You can download model files using mlflow.artifacts.download_artifacts, as follows:
import mlflow
mlflow.set_registry_uri("databricks-uc")
model_uri = f"models:/{model_name}/{version}" # reference model by version or alias
destination_path = "/local_disk0/model"
mlflow.artifacts.download_artifacts(artifact_uri=model_uri, dst_path=destination_path)
Example
This example illustrates how to use Models in Unity Catalog to build a machine learning application.
Models in Unity Catalog example
Migrate workflows and models to Unity Catalog
Databricks recommends using Models in Unity Catalog for improved governance, easy sharing across workspaces and environments, and more flexible MLOps workflows. The table compares the capabilities of the Workspace Model Registry and Unity Catalog.
Capability | Workspace Model Registry (legacy) | Models in Unity Catalog (recommended) |
---|---|---|
Reference model versions by named aliases | Model Registry Stages: Move model versions into one of four fixed stages to reference them by that stage. Cannot rename or add stages. | Model Registry Aliases: Create up to 10 custom and reassignable named references to model versions for each registered model. |
Create access-controlled environments for models | Model Registry Stages: Use stages within one registered model to denote the environment of its model versions, with access controls for only two of the four fixed stages (Staging and Production). | Registered Models: Create a registered model for each environment in your MLOps workflow, utilizing three-level namespaces and permissions of Unity Catalog to express governance. |
Promote models across environments (deploy model) | Use the transition_model_version_stage() MLflow Client API to move a model version to a different stage, potentially breaking workflows that reference the previous stage. | Use the copy_model_version() MLflow Client API to copy a model version from one registered model to another. |
Access and share models across workspaces | Manually export and import models across workspaces, or configure connections to remote model registries using personal access tokens and workspace secret scopes. | Out of the box access to models across workspaces in the same account. No configuration required. |
Configure permissions | Set permissions at the workspace level. | Set permissions at the account level, which applies consistent governance across workspaces. |
Access models in Databricks Marketplace | Unavailable. | Load models from Databricks Marketplace into your Unity Catalog metastore and access them across workspaces. |
The articles linked below describe how to migrate workflows (model training and batch inference jobs) and models from the Workspace Model Registry to Unity Catalog.
Limitations
- Stages are not supported for models in Unity Catalog. Databricks recommends using the three-level namespace in Unity Catalog to express the environment a model is in, and using aliases to promote models for deployment. See Promote a model across environments for details.
- Webhooks are not supported for models in Unity Catalog. See suggested alternatives in the upgrade guide.
- Some search API fields and operators are not supported for models in Unity Catalog. This can be mitigated by calling the search APIs using supported filters and scanning the results (see the sketch after this list). Following are some examples:
  - The order_by parameter is not supported in the search_model_versions or search_registered_models client APIs.
  - Tag-based filters (tags.mykey = 'myvalue') are not supported for search_model_versions or search_registered_models.
  - Operators other than exact equality (for example, LIKE, ILIKE, !=) are not supported for search_model_versions or search_registered_models.
  - Searching registered models by name (for example, MlflowClient().search_registered_models(filter_string="name='main.default.mymodel'")) is not supported. To fetch a particular registered model by name, use get_registered_model.
- Email notifications and comment discussion threads on registered models and model versions are not supported in Unity Catalog.
- The activity log is not supported for models in Unity Catalog. However, you can track activity on models in Unity Catalog using audit logs.
- search_registered_models might return stale results for models shared through Delta Sharing. To ensure the most recent results, use the Databricks CLI or SDK to list the models in a schema.
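As referenced above, here is a minimal sketch of the scan-and-filter workaround for the unsupported search filters. It assumes the prod.ml_team.iris_model model and validation_status tags used earlier in this article; exact equality on name is supported, so the remaining filtering and ordering is done client-side.

from mlflow import MlflowClient

client = MlflowClient()

# Fetch all versions by exact model name (supported), then filter and sort
# client-side, since tag filters and order_by are not supported for models
# in Unity Catalog.
versions = client.search_model_versions("name='prod.ml_team.iris_model'")
approved = [v for v in versions if v.tags.get("validation_status") == "approved"]
newest_first = sorted(approved, key=lambda v: int(v.version), reverse=True)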