Models in Unity Catalog example

This example illustrates how to use Models in Unity Catalog to build a machine learning application that forecasts the daily power output of a wind farm. The example shows how to:

  • Track and log models with MLflow
  • Register models to Unity Catalog
  • Describe models and deploy them for inference using aliases
  • Integrate registered models with production applications
  • Search and discover models in Unity Catalog
  • Archive and delete models

The article describes how to perform these steps using the MLflow Tracking and Models in Unity Catalog UIs and APIs.

Requirements

Make sure you meet the general requirements for using Models in Unity Catalog. In addition, the code examples in this article assume that you have the following privileges:

  • USE CATALOG privilege on the main catalog.
  • CREATE MODEL and USE SCHEMA privileges on the main.default schema.
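If you administer the catalog yourself, the privileges listed above could be granted with SQL statements along the following lines. This is only a sketch: `grant_statements` is an illustrative helper (not part of any Databricks API) and `data-scientists` is a hypothetical group name. In a notebook, each statement would typically be passed to `spark.sql()`.

```python
def grant_statements(principal, catalog="main", schema="default"):
    """Build the GRANT statements for the privileges this example assumes."""
    target = f"`{principal}`"  # backticks quote principals that contain special characters
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {target}",
        f"GRANT CREATE MODEL ON SCHEMA {catalog}.{schema} TO {target}",
    ]


statements = grant_statements("data-scientists")  # hypothetical group name
for statement in statements:
    print(statement)  # in a notebook: spark.sql(statement)
```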

Notebook

All of the code in this article is provided in the following notebook.

Models in Unity Catalog example notebook


Install MLflow Python client

This example requires the MLflow Python client version 2.5.0 or above and TensorFlow. Add the following commands at the top of your notebook to install these dependencies.

%pip install --upgrade "mlflow-skinny[databricks]>=2.5.0" tensorflow
dbutils.library.restartPython()
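After the restart, you may want to confirm that the installed client meets the 2.5.0 minimum. The following is a minimal sketch using only the Python standard library. The helper names are illustrative, and the comparison only looks at the leading numeric part of each version component, so it is not a full version-specifier parser.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '2.5.0' or '2.10.1rc1' into a tuple of ints such as (2, 10, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric components such as 'dev0'
        parts.append(int(digits))
    # Pad short versions ('2.5') so they compare equal to '2.5.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum="2.5.0"):
    return version_tuple(installed) >= version_tuple(minimum)


def installed_mlflow_version():
    """Return the installed MLflow version string, or None if it is missing."""
    for dist in ("mlflow", "mlflow-skinny"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


found = installed_mlflow_version()
if found is None:
    print("MLflow is not installed")
elif meets_minimum(found):
    print(f"MLflow {found} is recent enough")
else:
    print(f"MLflow {found} is older than 2.5.0")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.10.1" sorts before "2.5.0".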

Load dataset, train model, and register to Unity Catalog

This section shows how to load the wind farm dataset, train a model, and register the model to Unity Catalog. The training run and its metrics are tracked as an MLflow experiment run.

Load dataset

The following code loads a dataset containing weather data and power output information for a wind farm in the United States. The dataset contains wind direction, wind speed, and air temperature features sampled every eight hours (once at 00:00, once at 08:00, and once at 16:00), as well as daily aggregate power output (power), over several years.

import pandas as pd
wind_farm_data = pd.read_csv("https://github.com/dbczumar/model-registry-demo-notebook/raw/master/dataset/windfarm_data.csv", index_col=0)

def get_training_data():
  training_data = pd.DataFrame(wind_farm_data["2014-01-01":"2018-01-01"])
  X = training_data.drop(columns="power")
  y = training_data["power"]
  return X, y

def get_validation_data():
  validation_data = pd.DataFrame(wind_farm_data["2018-01-01":"2019-01-01"])
  X = validation_data.drop(columns="power")
  y = validation_data["power"]
  return X, y

def get_weather_and_forecast():
  format_date = lambda pd_date : pd_date.date().strftime("%Y-%m-%d")
  today = pd.Timestamp('today').normalize()
  week_ago = today - pd.Timedelta(days=5)
  week_later = today + pd.Timedelta(days=5)

  past_power_output = pd.DataFrame(wind_farm_data)[format_date(week_ago):format_date(today)]
  weather_and_forecast = pd.DataFrame(wind_farm_data)[format_date(week_ago):format_date(week_later)]
  if len(weather_and_forecast) < 10:
    past_power_output = pd.DataFrame(wind_farm_data).iloc[-10:-5]
    weather_and_forecast = pd.DataFrame(wind_farm_data).iloc[-10:]

  return weather_and_forecast.drop(columns="power"), past_power_output["power"]

Configure MLflow client to access models in Unity Catalog

By default, the MLflow Python client creates models in the workspace model registry on Azure Databricks. To upgrade to models in Unity Catalog, configure the client to access models in Unity Catalog:

import mlflow
mlflow.set_registry_uri("databricks-uc")

Train and register model

The following code trains a neural network using TensorFlow Keras to predict power output based on the weather features in the dataset and uses MLflow APIs to register the fitted model to Unity Catalog.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

MODEL_NAME = "main.default.wind_forecasting"

def train_and_register_keras_model(X, y):
  with mlflow.start_run():
    model = Sequential()
    model.add(Dense(100, input_shape=(X.shape[-1],), activation="relu", name="hidden_layer"))
    model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")

    model.fit(X, y, epochs=100, batch_size=64, validation_split=.2)
    example_input = X[:10].to_numpy()
    mlflow.tensorflow.log_model(
        model,
        artifact_path="model",
        input_example=example_input,
        registered_model_name=MODEL_NAME
    )
  return model

X_train, y_train = get_training_data()
model = train_and_register_keras_model(X_train, y_train)

View the model in the UI

You can view and manage registered models and model versions in Unity Catalog using the Catalog Explorer. Look for the model you just created under the main catalog and default schema.

Registered model page
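The registered model name used in this example follows the three-level `<catalog>.<schema>.<model>` pattern, which is why the model appears under the main catalog and default schema. A small sketch of splitting such a name is shown below. `parse_model_name` is an illustrative helper, and it assumes that none of the three identifiers contains a dot.

```python
from typing import NamedTuple


class UCModelName(NamedTuple):
    catalog: str
    schema: str
    model: str


def parse_model_name(full_name):
    """Split '<catalog>.<schema>.<model>' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<model>, got {full_name!r}")
    return UCModelName(*parts)


name = parse_model_name("main.default.wind_forecasting")
print(name.catalog, name.schema, name.model)
```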

Deploy a model version for inference

Models in Unity Catalog support aliases for model deployment. Aliases provide mutable, named references (for example, “Champion” or “Challenger”) to a particular version of a registered model. You can reference and target model versions using these aliases in downstream inference workflows.

Once you’ve navigated to the registered model in Catalog Explorer, click under the Aliases column to assign the “Champion” alias to the latest model version, and press “Continue” to save changes.

Set registered model alias
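To make the idea of a mutable, named reference concrete, here is a toy in-memory sketch of alias bookkeeping. `ToyRegistry` is purely illustrative and is not an MLflow or Unity Catalog class. It only models the behavior described above: an alias points to exactly one version, and assigning it again moves it.

```python
class ToyRegistry:
    """In-memory stand-in for alias bookkeeping; not an MLflow class."""

    def __init__(self):
        self.aliases = {}  # alias name -> version number

    def set_alias(self, alias, version):
        # Assigning an alias that already exists moves it to the new version.
        self.aliases[alias] = version

    def resolve(self, alias):
        return self.aliases[alias]

    def aliases_for(self, version):
        return sorted(a for a, v in self.aliases.items() if v == version)


registry = ToyRegistry()
registry.set_alias("Champion", 1)
registry.set_alias("Challenger", 2)
registry.set_alias("Champion", 2)  # reassigning moves the alias from version 1 to 2

print(registry.resolve("Champion"))
print(registry.aliases_for(1))
print(registry.aliases_for(2))
```

Because downstream code asks for an alias rather than a version number, moving the alias is enough to change which version that code loads.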

Load model versions using the API

The MLflow Models component defines functions for loading models from several machine learning frameworks. For example, mlflow.tensorflow.load_model() is used to load TensorFlow models that were saved in MLflow format, and mlflow.sklearn.load_model() is used to load scikit-learn models that were saved in MLflow format.

These functions can load models from Models in Unity Catalog.

import mlflow.pyfunc

model_version_uri = "models:/{model_name}/1".format(model_name=MODEL_NAME)

print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_version_uri))
model_version_1 = mlflow.pyfunc.load_model(model_version_uri)

model_champion_uri = "models:/{model_name}@Champion".format(model_name=MODEL_NAME)

print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_champion_uri))
champion_model = mlflow.pyfunc.load_model(model_champion_uri)
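The two URI forms used above differ only in their suffix: `/<version>` pins a specific version, while `@<alias>` follows whatever version the alias currently points to. A small illustrative helper (not part of MLflow) that builds either form could look like this:

```python
def model_uri(model_name, version=None, alias=None):
    """Build a 'models:/' URI for either a fixed version or an alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if version is not None:
        return f"models:/{model_name}/{version}"
    return f"models:/{model_name}@{alias}"


print(model_uri("main.default.wind_forecasting", version=1))
print(model_uri("main.default.wind_forecasting", alias="Champion"))
```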

Forecast power output with the champion model

In this section, the champion model is used to evaluate weather forecast data for the wind farm. The forecast_power() application loads the version of the forecasting model that the specified alias points to and uses it to forecast power production over the next five days.

from mlflow.tracking import MlflowClient

def plot(model_name, model_alias, model_version, power_predictions, past_power_output):
  import matplotlib.dates as mdates
  from matplotlib import pyplot as plt
  index = power_predictions.index
  fig = plt.figure(figsize=(11, 7))
  ax = fig.add_subplot(111)
  ax.set_xlabel("Date", size=20, labelpad=20)
  ax.set_ylabel("Power\noutput\n(MW)", size=20, labelpad=60, rotation=0)
  ax.tick_params(axis='both', which='major', labelsize=17)
  ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
  ax.plot(index[:len(past_power_output)], past_power_output, label="True", color="red", alpha=0.5, linewidth=4)
  ax.plot(index, power_predictions.squeeze(), "--", label="Predicted by '%s'\nwith alias '%s' (Version %d)" % (model_name, model_alias, model_version), color="blue", linewidth=3)
  ax.set_ylim(ymin=0, ymax=max(3500, int(max(power_predictions.values) * 1.3)))
  ax.legend(fontsize=14)
  plt.title("Wind farm power output and projections", size=24, pad=20)
  plt.tight_layout()
  display(plt.show())

def forecast_power(model_name, model_alias):
  import pandas as pd
  client = MlflowClient()
  model_version = client.get_model_version_by_alias(model_name, model_alias).version
  # Use the model_name argument (not the global MODEL_NAME) so the function works for any model
  model_uri = "models:/{model_name}@{model_alias}".format(model_name=model_name, model_alias=model_alias)
  model = mlflow.pyfunc.load_model(model_uri)
  weather_data, past_power_output = get_weather_and_forecast()
  power_predictions = pd.DataFrame(model.predict(weather_data))
  power_predictions.index = pd.to_datetime(weather_data.index)
  print(power_predictions)
  plot(model_name, model_alias, int(model_version), power_predictions, past_power_output)

forecast_power(MODEL_NAME, "Champion")

Add model and model version descriptions using the API

The code in this section shows how you can add model and model version descriptions using the MLflow API.

client = MlflowClient()
client.update_registered_model(
  name=MODEL_NAME,
  description="This model forecasts the power output of a wind farm based on weather data. The weather data consists of three features: wind speed, wind direction, and air temperature."
)

client.update_model_version(
  name=MODEL_NAME,
  version=1,
  description="This model version was built using TensorFlow Keras. It is a feed-forward neural network with one hidden layer."
)

Create a new model version

Classical machine learning techniques are also effective for power forecasting. The following code trains a random forest model using scikit-learn and registers it to Unity Catalog using the mlflow.sklearn.log_model() function.

import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

with mlflow.start_run():
  n_estimators = 300
  mlflow.log_param("n_estimators", n_estimators)

  rand_forest = RandomForestRegressor(n_estimators=n_estimators)
  rand_forest.fit(X_train, y_train)

  val_x, val_y = get_validation_data()
  # scikit-learn's signature is (y_true, y_pred); MSE is symmetric, so the value is unchanged
  mse = mean_squared_error(val_y, rand_forest.predict(val_x))
  print("Validation MSE: %d" % mse)
  mlflow.log_metric("mse", mse)

  example_input = val_x.iloc[[0]]

  # Specify the `registered_model_name` parameter of the `mlflow.sklearn.log_model()`
  # function to register the model to Unity Catalog. This automatically
  # creates a new model version.
  mlflow.sklearn.log_model(
    sk_model=rand_forest,
    artifact_path="sklearn-model",
    input_example=example_input,
    registered_model_name=MODEL_NAME
  )

Fetch the new model version number

The following code shows how to retrieve the latest model version number for a model name.

client = MlflowClient()
model_version_infos = client.search_model_versions("name = '%s'" % MODEL_NAME)
# Cast to int so that versions compare numerically (as strings, "9" sorts above "10")
new_model_version = max(int(model_version_info.version) for model_version_info in model_version_infos)
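Depending on the MLflow release, the `version` attribute may come back as a string. The short sketch below shows why it is safer to compare versions as integers once a model has ten or more versions. `latest_version` is an illustrative helper, not an MLflow function.

```python
def latest_version(versions):
    """Return the numerically largest version from strings or ints."""
    return max(int(v) for v in versions)


versions_as_strings = ["1", "2", "9", "10"]

# Strings compare character by character, so "9" ranks above "10".
print(max(versions_as_strings))

# Casting to int gives the intended numeric ordering.
print(latest_version(versions_as_strings))
```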

Add a description to the new model version

client.update_model_version(
  name=MODEL_NAME,
  version=new_model_version,
  description="This model version is a random forest containing 300 decision trees that was trained in scikit-learn."
)

Mark new model version as Challenger and test the model

Before deploying a model to serve production traffic, it is a best practice to test it on a sample of production data. Previously, you used the “Champion” alias to denote the model version serving the majority of production workloads. The following code assigns the “Challenger” alias to the new model version and evaluates its performance.

client.set_registered_model_alias(
  name=MODEL_NAME,
  alias="Challenger",
  version=new_model_version
)

forecast_power(MODEL_NAME, "Challenger")

Deploy the new model version as the Champion model version

After you verify that the new model version performs well in tests, assign the “Champion” alias to it. The following code does so and then uses the exact same application code from the Forecast power output with the champion model section to produce a power forecast.

client.set_registered_model_alias(
  name=MODEL_NAME,
  alias="Champion",
  version=new_model_version
)

forecast_power(MODEL_NAME, "Champion")
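The "performs well in tests" check can also be made explicit in code before the alias is moved. The sketch below gates the promotion on a lower validation MSE. It is one possible policy, not a prescribed workflow: `promote_challenger_if_better` and `StubClient` are hypothetical names, the MSE values are made-up inputs, and the stub exists only so that the example runs without a workspace. With a real `MlflowClient`, the function relies on `get_model_version_by_alias` and `set_registered_model_alias`, both of which are used earlier in this article.

```python
from types import SimpleNamespace


def promote_challenger_if_better(client, model_name, champion_mse, challenger_mse):
    """Reassign "Champion" to the "Challenger" version only if its MSE is lower.

    `client` is expected to behave like mlflow's MlflowClient.
    Returns the promoted version, or None if the aliases were left untouched.
    """
    if challenger_mse >= champion_mse:
        return None
    challenger = client.get_model_version_by_alias(model_name, "Challenger")
    client.set_registered_model_alias(
        name=model_name, alias="Champion", version=challenger.version
    )
    return challenger.version


class StubClient:
    """Hypothetical in-memory stand-in for MlflowClient (illustration only)."""

    def __init__(self):
        self.aliases = {"Champion": "1", "Challenger": "2"}

    def get_model_version_by_alias(self, name, alias):
        return SimpleNamespace(name=name, version=self.aliases[alias])

    def set_registered_model_alias(self, name, alias, version):
        self.aliases[alias] = version


stub = StubClient()
# Made-up metrics: the challenger is worse in the first call and better in the second.
kept = promote_challenger_if_better(stub, "main.default.wind_forecasting", 100.0, 120.0)
promoted = promote_challenger_if_better(stub, "main.default.wind_forecasting", 100.0, 80.0)
print(kept, promoted, stub.aliases)
```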

There are now two versions of the forecasting model: the version trained with Keras and the version trained with scikit-learn. The “Challenger” alias remains assigned to the new scikit-learn model version, so any downstream workloads that target the “Challenger” model version continue to run successfully:

Product model versions

Archive and delete models

When a model version is no longer being used, you can delete it. You can also delete an entire registered model; this removes all associated model versions. Note that deleting a model version clears any aliases assigned to the model version.

Delete Version 1 using the MLflow API

client.delete_model_version(
   name=MODEL_NAME,
   version=1,
)
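Because deleting a model version also clears its aliases, you may want to check for aliases first so that a workload targeting one of them does not break. The following is a hedged sketch of such a guard. `delete_version_if_unaliased` and `StubClient` are hypothetical names, and the code assumes that the object returned by `get_model_version` exposes an `aliases` attribute that is populated in your MLflow version. The stub lets the example run without a workspace.

```python
from types import SimpleNamespace


def delete_version_if_unaliased(client, model_name, version):
    """Delete a model version only when no alias points to it.

    `client` is expected to behave like mlflow's MlflowClient.
    Returns True if the version was deleted, False if it was left in place.
    """
    info = client.get_model_version(model_name, str(version))
    aliases = list(getattr(info, "aliases", None) or [])
    if aliases:
        print(f"Version {version} still has aliases {aliases}; not deleting.")
        return False
    client.delete_model_version(name=model_name, version=str(version))
    return True


class StubClient:
    """Hypothetical in-memory stand-in for MlflowClient (illustration only)."""

    def __init__(self):
        # version -> aliases, mirroring the state after the promotion step
        self.versions = {"1": [], "2": ["Challenger", "Champion"]}

    def get_model_version(self, name, version):
        return SimpleNamespace(name=name, version=version, aliases=self.versions[version])

    def delete_model_version(self, name, version):
        del self.versions[version]


stub = StubClient()
deleted_v1 = delete_version_if_unaliased(stub, "main.default.wind_forecasting", 1)
deleted_v2 = delete_version_if_unaliased(stub, "main.default.wind_forecasting", 2)
print(deleted_v1, deleted_v2, sorted(stub.versions))
```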

Delete the model using the MLflow API

client = MlflowClient()
client.delete_registered_model(name=MODEL_NAME)