AutoML: Improve forecasting with covariates (external regressors)
This article shows you how to use covariates, also known as external regressors, to improve Azure Databricks AutoML forecasting models.
Covariates are additional variables outside the target time series that can improve forecasting models. For example, if you're forecasting hotel occupancy rates, knowing if it's the weekend could help predict customer behavior.
In this example, you:
- Create a randomized time-series dataset.
- Perform basic feature engineering work.
- Store the dataset as a FeatureStore table.
- Use the FeatureStore as covariates in an AutoML forecasting experiment.
Create the data
This example uses randomly generated time series data for hotel occupancy rates in January 2024. You then use AutoML to predict the occupancy_rate for the first day of February 2024.
Run the following code to generate the sample data.
df = spark.sql("""
    SELECT
        explode(sequence(to_date('2024-01-01'), to_date('2024-01-31'), interval 1 day)) AS date,
        rand() AS occupancy_rate
    FROM (SELECT 1 AS id) tmp
    ORDER BY date
""")
display(df)
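For prototyping outside a Spark session, the same shape of data can be sketched in plain Python. This is only an illustrative stand-in for the SQL above, not part of the AutoML workflow: it produces one row per day in January 2024 with a uniformly random occupancy_rate in [0, 1). The seed value is an assumption added for reproducibility.

```python
import random
from datetime import date, timedelta

random.seed(0)  # assumed seed, used only to make this sketch reproducible

start = date(2024, 1, 1)
# One row per day of January 2024, mirroring the columns of the Spark DataFrame.
rows = [
    {"date": start + timedelta(days=i), "occupancy_rate": random.random()}
    for i in range(31)
]

print(rows[0]["date"], rows[-1]["date"], len(rows))
```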
Feature engineering
Use the sample dataset to engineer a feature called is_weekend, a binary indicator of whether a date falls on a weekend.
from pyspark.sql.functions import dayofweek, when

def compute_hotel_weekend_features(df):
    '''Compute the is_weekend feature. Returns a DataFrame with 'date' as the primary key.'''
    # Spark's dayofweek returns 1 for Sunday through 7 for Saturday.
    return df.select("date").withColumn(
        "is_weekend",
        when(dayofweek("date").isin(2, 3, 4, 5, 6), 0)  # Monday through Friday
        .when(dayofweek("date").isin(1, 7), 1)  # Sunday and Saturday
    )

hotel_weekend_feature_df = compute_hotel_weekend_features(df)
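Spark's dayofweek numbers days from 1 (Sunday) through 7 (Saturday), which differs from Python's datetime conventions. The following plain-Python sketch, with no Spark dependency, can be used to cross-check which January 2024 dates should be flagged as weekends. The helper names spark_dayofweek and is_weekend are illustrative and are not part of any Databricks API.

```python
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    """Mimic Spark's dayofweek numbering: 1 = Sunday ... 7 = Saturday."""
    # isoweekday() returns 1 = Monday ... 7 = Sunday, so shift by one day.
    return d.isoweekday() % 7 + 1


def is_weekend(d: date) -> int:
    """Return 1 for Saturday or Sunday, otherwise 0."""
    return 1 if spark_dayofweek(d) in (1, 7) else 0


january = [date(2024, 1, 1) + timedelta(days=i) for i in range(31)]
weekend_dates = [d.isoformat() for d in january if is_weekend(d)]
print(weekend_dates)
```

January 1, 2024 is a Monday, so the first weekend in the output is January 6 and 7.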
Create the Feature Store
To use covariates in AutoML, you must use Feature Store to join one or more covariate feature tables with the primary training data.
Store the DataFrame hotel_weekend_feature_df as a Feature Store table.
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
hotel_weekend_feature_table = fe.create_table(
    name='ml.default.hotel_weekend_features',  # change to desired location
    primary_keys=['date'],
    df=hotel_weekend_feature_df,
    description='Hotel is_weekend features table'
)
Note
This example uses the Python FeatureEngineeringClient to create and write tables. However, you can also use SQL or Delta Live Tables to create and write tables. See Work with feature tables for more options.
Configure the AutoML experiment
Use the feature_store_lookups parameter to pass the Feature Store table to AutoML. feature_store_lookups takes a list of dictionaries, each with two fields: table_name and lookup_key.
hotel_weekend_feature_lookup = {
    "table_name": "ml.default.hotel_weekend_features",  # change to location set above
    "lookup_key": ["date"]
}
feature_lookups = [hotel_weekend_feature_lookup]
Note
feature_store_lookups can contain multiple feature table lookups.
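As a sketch of the multi-table case, the list below adds a second lookup next to the weekend features. The table ml.default.hotel_holiday_features is hypothetical and is not created in this article. The validate_lookups helper is also illustrative: it only checks that each entry carries the table_name and lookup_key fields before the list is handed to AutoML.

```python
REQUIRED_FIELDS = {"table_name", "lookup_key"}


def validate_lookups(lookups):
    """Raise ValueError if any lookup dictionary is missing a required field."""
    for i, lookup in enumerate(lookups):
        missing = REQUIRED_FIELDS - set(lookup)
        if missing:
            raise ValueError(f"lookup {i} is missing fields: {sorted(missing)}")
    return lookups


feature_lookups = validate_lookups([
    {"table_name": "ml.default.hotel_weekend_features", "lookup_key": ["date"]},
    # Hypothetical second covariate table, shown only to illustrate the shape.
    {"table_name": "ml.default.hotel_holiday_features", "lookup_key": ["date"]},
])
```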
Run the AutoML experiment
Use the following code to pass feature_lookups to an AutoML experiment API call.
from databricks import automl

summary = automl.forecast(
    dataset=df,
    target_col="occupancy_rate",
    time_col="date",
    frequency="d",
    horizon=1,
    timeout_minutes=30,
    identity_col=None,
    feature_store_lookups=feature_lookups
)
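To make the frequency and horizon arguments concrete: with daily frequency and a horizon of 1, the forecast covers one period after the last observed date. The sketch below computes those dates in plain Python. forecast_dates is an illustrative helper, not an AutoML function, and it assumes daily data.

```python
from datetime import date, timedelta


def forecast_dates(last_observed: date, horizon: int) -> list:
    """Return the `horizon` daily dates that follow `last_observed`."""
    return [last_observed + timedelta(days=step) for step in range(1, horizon + 1)]


# The training data ends on January 31, 2024, so a horizon of 1 targets February 1.
print(forecast_dates(date(2024, 1, 31), horizon=1))
# [datetime.date(2024, 2, 1)]
```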