Publish features to an online store
This article describes how to publish features to an online store for real-time serving.
Databricks Feature Store supports these online stores:
| Online store provider | Publish with Feature Engineering in Unity Catalog | Publish with Workspace Feature Store | Feature lookup in Legacy MLflow Model Serving | Feature lookup in Model Serving |
|---|---|---|---|---|
| Azure Cosmos DB [1] | X | X (Feature Store client v0.5.0 and above) | X | X |
| Azure MySQL (Single Server) | | X | X | |
| Azure SQL Server | | X | | |
Cosmos DB compatibility notes
This section covers important considerations for using Databricks Feature Store with Cosmos DB.
Unity Catalog-enabled workspaces
In Databricks Runtime 12.2 LTS ML and below, the Cosmos DB online store provider is not compatible with Unity Catalog-enabled workspaces. Both Unity Catalog and the official Cosmos DB Spark connector modify Spark catalogs. When you publish features to Cosmos DB from a Unity Catalog-enabled workspace on a cluster running Databricks Runtime 12.2 LTS ML or below, there might be a write conflict that causes the Feature Store publish to Cosmos DB to fail.
To use Cosmos DB in a Unity Catalog-enabled workspace, you must use a cluster running Databricks Runtime 13.0 ML or above, or a cluster running Databricks Runtime 11.3 LTS ML or above with the cluster policy Unrestricted or Shared Compute.
Spark connector
To use Azure Cosmos DB, the account must be created with the Core (SQL) API and the networking connectivity method must be set to All networks. The appropriate Azure Cosmos DB Spark 3 OLTP Connector for SQL API must be installed on the cluster. Databricks recommends that you install the latest connector version for Spark 3.2 until a connector for Spark 3.3 is released.
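For example, you can attach the connector to the cluster as a Maven library. The coordinate below is illustrative: the artifact name follows the connector's Spark 3.2 / Scala 2.12 naming, and the version is a placeholder, so check Maven Central for the latest Spark 3.2 build before installing:

```
com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:<latest-version>
```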
Do not manually create a database or container - use publish_table()
The Cosmos DB online store uses a different schema than the offline store. Specifically, in the online store, primary keys are stored as a combined key in the column `_feature_store_internal__primary_keys`.

To ensure that Feature Store can access the Cosmos DB online store, you must create the table in the online store by using `publish_table()`. Do not manually create a database or container inside Cosmos DB. `publish_table()` does that for you automatically.
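As a rough illustration of why a hand-created container will not line up with what Feature Store expects, the sketch below shows one way several primary key values could be folded into a single lookup key column. The helper name and the joining scheme are hypothetical; the actual encoding behind `_feature_store_internal__primary_keys` is internal to Feature Store and may differ:

```python
# Hypothetical sketch: Feature Store's real key encoding is internal and may differ.
def combine_primary_keys(values):
    """Fold multiple primary key values into one combined lookup key string."""
    return "#".join(str(v) for v in values)

# A row keyed by two primary key columns ends up with one combined key column
row = {"customer_id": 42, "region": "us-west"}
combined = combine_primary_keys(row.values())
print(combined)  # one combined key column instead of separate primary key columns
```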
Publish batch-computed features to an online store
You can create and schedule a Databricks job to regularly publish updated features. This job can also include the code to calculate the updated features, or you can create and run separate jobs to calculate and publish feature updates.
For SQL stores, the following code assumes that an online database named "recommender_system" already exists in the online store and matches the name of the offline store. If there is no table named "customer_features" in the database, this code creates one. It also assumes that features are computed each day and stored with the partition column `_dt`.
The following code assumes that you have created secrets to access this online store.
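As a sketch, the scope and secrets might be provisioned with the unified Databricks CLI as shown below. The scope and prefix names here are placeholders; the `-user`/`-password` key suffixes (SQL stores) and `-authorization-key` suffix (Cosmos DB) follow the naming convention the online store spec expects. Older CLI versions use `--scope`/`--key` flags instead, so adjust for your CLI:

```shell
# Create a scope to hold the online store credentials
databricks secrets create-scope feature-store

# SQL stores expect <prefix>-user and <prefix>-password under the scope
databricks secrets put-secret feature-store online-fs-user --string-value '<db-username>'
databricks secrets put-secret feature-store online-fs-password --string-value '<db-password>'

# Cosmos DB expects <prefix>-authorization-key
databricks secrets put-secret feature-store online-fs-authorization-key --string-value '<cosmos-key>'

# The matching spec argument would then be:
#   read_secret_prefix='feature-store/online-fs'
```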
Cosmos DB
Cosmos DB support is available in all versions of the Feature Engineering in Unity Catalog client, and in Feature Store client v0.5.0 and above.
```python
import datetime
from databricks.feature_engineering.online_store_spec import AzureCosmosDBSpec
# or databricks.feature_store.online_store_spec for Workspace Feature Store

online_store = AzureCosmosDBSpec(
    account_uri='<account-uri>',
    read_secret_prefix='<read-scope>/<prefix>',
    write_secret_prefix='<write-scope>/<prefix>'
)

fe.publish_table(  # or fs.publish_table for Workspace Feature Store
    name='ml.recommender_system.customer_features',
    online_store=online_store,
    filter_condition=f"_dt = '{str(datetime.date.today())}'",
    mode='merge'
)
```
SQL stores
```python
import datetime
from databricks.feature_engineering.online_store_spec import AzureMySqlSpec
# or databricks.feature_store.online_store_spec for Workspace Feature Store

online_store = AzureMySqlSpec(
    hostname='<hostname>',
    port='<port>',
    read_secret_prefix='<read-scope>/<prefix>',
    write_secret_prefix='<write-scope>/<prefix>'
)

fs.publish_table(
    name='recommender_system.customer_features',
    online_store=online_store,
    filter_condition=f"_dt = '{str(datetime.date.today())}'",
    mode='merge'
)
```
Publish streaming features to an online store
To continuously stream features to the online store, set `streaming=True`.
```python
fe.publish_table(  # or fs.publish_table for Workspace Feature Store
    name='ml.recommender_system.customer_features',
    online_store=online_store,
    streaming=True
)
```
Publish selected features to an online store
To publish only selected features to the online store, use the `features` argument to specify the feature name(s) to publish. Primary keys and timestamp keys are always published. If you do not specify the `features` argument, or if its value is None, all features from the offline feature table are published.
Note
The entire offline table must be a valid feature table even if you are publishing only a subset of features to an online store. If the offline table contains unsupported data types, you cannot publish a subset of features from that table to an online store.
```python
fe.publish_table(  # or fs.publish_table for Workspace Feature Store
    name='ml.recommender_system.customer_features',
    online_store=online_store,
    features=["total_purchases_30d"]
)
```
Publish a feature table to a specific database
In the online store spec, specify the database name (`database_name`) and the table name (`table_name`). If you do not specify these parameters, the offline database name and feature table name are used. `database_name` must already exist in the online store.
```python
online_store = AzureMySqlSpec(
    hostname='<hostname>',
    port='<port>',
    database_name='<database-name>',
    table_name='<table-name>',
    read_secret_prefix='<read-scope>/<prefix>',
    write_secret_prefix='<write-scope>/<prefix>'
)
```
Overwrite an existing online feature table or specific rows
Use `mode='overwrite'` in the `publish_table` call. The online table is completely overwritten by the data in the offline table.
Note
Azure Cosmos DB does not support overwrite mode.
```python
fs.publish_table(
    name='recommender_system.customer_features',
    online_store=online_store,
    mode='overwrite'
)
```
To overwrite only certain rows, use the `filter_condition` argument:
```python
fs.publish_table(
    name='recommender_system.customer_features',
    online_store=online_store,
    filter_condition=f"_dt = '{str(datetime.date.today())}'",
    mode='merge'
)
```
Delete a published table from an online store
With Feature Store client v0.12.0 and above, you can use `drop_online_table` to delete a published table from an online store. When you delete a published table with `drop_online_table`, the table is deleted from your online store provider and its online store metadata is removed from Databricks.
```python
fe.drop_online_table(  # or fs.drop_online_table for Workspace Feature Store
    name='recommender_system.customer_features',
    online_store=online_store
)
```
Note
- `drop_online_table` deletes the published table from the online store. It does not delete the feature table in Databricks.
- Before you delete a published table, ensure that the table is not used for Model Serving feature lookup and has no other downstream dependencies. The delete is irreversible and might cause dependencies to fail.
- To check for dependencies, consider rotating the keys for the published table you plan to delete for a day before you execute `drop_online_table`.