September 2020
These features and Azure Databricks platform improvements were released in September 2020.
Note
The release dates and content listed below correspond, in most cases, to actual deployments in the Azure public cloud. They are provided as a reference history of the Azure Databricks service in the Azure public cloud and may not apply to Azure operated by 21Vianet.
Note
Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.
Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics are now GA
September 24, 2020
Databricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now generally available. They bring many features and improvements, including:
- Delta Lake performance optimizations significantly reduce overhead
- Clone metrics
- Delta Lake MERGE INTO improvements
- Specify the initial position for Delta Lake Structured Streaming
- Auto Loader improvements
- Adaptive query execution
- Azure Synapse Analytics connector column length control
- Improved behavior of dbutils.credentials.showRoles
- Simplified pandas to Spark DataFrame conversion
- New maxResultSize in toPandas() call
- Debuggability of pandas and PySpark UDFs
- (ML only) Conda activation on workers
- (Genomics only) Support for reading BGEN files with uncompressed or zstd-compressed genotypes
- Library upgrades
For more information, see Databricks Runtime 7.3 LTS (EoS) and Databricks Runtime 7.3 LTS for Machine Learning (EoS).
Single Node clusters (Public Preview)
September 23-29, 2020: Version 3.29
A Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. In contrast, Standard mode clusters require at least one Spark worker to run Spark jobs. Single Node mode clusters are helpful in the following situations:
- Running single node machine learning workloads that need Spark to load and save data
- Lightweight exploratory data analysis (EDA)
For details, see Single-node or multi-node compute.
DBFS REST API rate limiting
September 23-29, 2020: Version 3.29
To ensure high quality of service under heavy load, Azure Databricks is now enforcing API rate limits for DBFS API calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available using Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.
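Automatic retries like those in Databricks CLI 0.12.0+ can also be approximated in your own client code. The following is a minimal sketch of exponential backoff on 429 responses; the helper and its callable argument are hypothetical, not part of any Databricks library:

```python
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn while it is rate limited, backing off exponentially.

    request_fn is any zero-argument callable returning an object with a
    .status_code attribute -- for example, a wrapper around an HTTP GET
    against a DBFS API endpoint.
    """
    for attempt in range(max_retries):
        response = request_fn()
        if response.status_code != 429:
            return response
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```

Production code would typically also honor a Retry-After header when the service returns one.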
New sidebar icons
September 23-29, 2020
We've updated the sidebar in the Azure Databricks workspace UI. No big deal, but we think the new icons look pretty nice.
Running jobs limit increase
September 23-29, 2020: Version 3.29
The concurrent job run limit has been increased from 150 to 1000 per workspace. Runs above the limit are no longer queued in a pending state. Instead, when you request a run that cannot be started immediately, a 429 Too Many Requests response is returned. This limit increase was rolled out gradually and is now available in all workspaces in all regions.
Artifact access control lists (ACLs) in MLflow
September 23-29, 2020: Version 3.29
MLflow Experiment permissions are now enforced on artifacts in MLflow Tracking, enabling you to easily control access to your models, datasets, and other files. By default, when you create a new experiment, its run artifacts are now stored in an MLflow-managed location. The four MLflow Experiment permissions levels (NO PERMISSIONS, CAN READ, CAN EDIT, and CAN MANAGE) automatically apply to run artifacts stored in MLflow-managed locations as follows:
- CAN EDIT or CAN MANAGE permissions are required to log run artifacts to an experiment.
- CAN READ permissions are required to list and download run artifacts from an experiment.
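The permission-to-operation mapping above can be sketched as a simple lookup. This is illustrative only; the actual enforcement happens server-side in MLflow Tracking:

```python
# Illustrative mapping of MLflow Experiment permission levels to the
# artifact operations they allow (enforcement is server-side in practice).
ARTIFACT_OPS = {
    "NO PERMISSIONS": set(),
    "CAN READ": {"list", "download"},
    "CAN EDIT": {"list", "download", "log"},
    "CAN MANAGE": {"list", "download", "log"},
}

def can_perform(permission_level, operation):
    """Return True if the given permission level allows the artifact operation."""
    return operation in ARTIFACT_OPS.get(permission_level, set())
```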
For more information, see MLflow experiment ACLs.
MLflow usability improvements
September 23-29, 2020: Version 3.29
This release includes the following MLflow usability improvements:
- The MLflow Experiment and Registered Models pages now have tips to help new users get started.
- The model version table now shows the description text for a model version. A new column shows the first 32 characters or the first line (whichever is shorter) of the description.
New Azure Databricks Power BI connector (Public Preview)
September 22, 2020
Power BI Desktop version 2.85.681.0 includes a new Azure Databricks Power BI connector that makes the integration between Azure Databricks and Power BI far more seamless and reliable. The new connector comes with the following improvements:
- Simple connection configuration: the new Power BI Azure Databricks connector is integrated into Power BI, and you configure it using a simple dialog with a couple of clicks.
- Authentication based on Microsoft Entra ID credentials—no more need for administrators to configure PAT tokens.
- Faster imports and optimized metadata calls, thanks to the new Azure Databricks ODBC driver, which comes with significant performance improvements.
- Access to Azure Databricks data through Power BI respects Azure Databricks table access control and Azure storage account permissions associated with your Microsoft Entra ID identity.
For more information, see Connect Power BI to Azure Databricks.
Use customer-managed keys for DBFS root (Public Preview)
September 15, 2020
You can now use your own encryption key in Azure Key Vault to encrypt the DBFS storage account. See Customer-managed keys for DBFS root.
New JDBC and ODBC drivers bring faster and lower latency BI
September 15, 2020
We have released new versions of the Databricks JDBC and ODBC drivers (download) with the following improvements:
- Performance: Reduced connection and short-query latency, improved result transfer speed based on Apache Arrow serialization, and improved metadata retrieval performance.
- User experience: Authentication using Microsoft Entra ID OAuth2 access tokens, improved error messages and auto-retry when connecting to a shutdown cluster, more robust handling of retries on intermittent network errors.
- Support for connections using HTTP proxy.
For more information about connecting to BI tools using JDBC and ODBC, see Databricks ODBC and JDBC Drivers.
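As a rough illustration of an ODBC connection setup, the sketch below only builds the connection string; the host, HTTP path, and token are placeholders, and the key names follow common Simba Spark ODBC driver conventions, so verify them against the documentation for your installed driver version:

```python
def databricks_odbc_connection_string(host, http_path, token):
    """Build an ODBC connection string for the Databricks (Simba Spark) driver.

    Key names follow common Simba Spark ODBC driver conventions; check them
    against the driver documentation for your installed version.
    """
    return (
        "Driver=Simba Spark ODBC Driver;"
        f"Host={host};Port=443;"
        f"HTTPPath={http_path};"
        "SSL=1;ThriftTransport=2;"
        "AuthMech=3;UID=token;"  # AuthMech=3: username/password, with a PAT as the password
        f"PWD={token}"
    )

# The resulting string would be passed to a connector such as pyodbc.connect(...).
```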
MLflow Model Serving (Public Preview)
September 9-15, 2020: Version 3.28
MLflow Model Serving is now available in Public Preview. MLflow Model Serving allows you to deploy an MLflow model registered in the Model Registry as a REST API endpoint hosted and managed by Azure Databricks. When you enable model serving for a registered model, Azure Databricks creates a cluster and deploys all non-archived versions of that model.
You can query all model versions by REST API requests with standard Azure Databricks authentication. Model access rights are inherited from the Model Registry — anyone with read rights for a registered model can query any of the deployed model versions. While this service is in preview, we recommend its use for low throughput and non-critical applications.
For more information, see Legacy MLflow Model Serving on Azure Databricks.
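Querying a served model version is a plain HTTPS POST with standard Databricks authentication. The sketch below only builds the request; the workspace URL, model name, and token are placeholders, and the /model/&lt;name&gt;/&lt;version&gt;/invocations path should be checked against the legacy serving documentation:

```python
import json

def build_scoring_request(workspace_url, model_name, version, token, records):
    """Build the URL, headers, and body for querying a legacy serving endpoint.

    The path format (https://<workspace>/model/<model-name>/<version>/invocations)
    follows the legacy MLflow Model Serving docs; verify it for your workspace.
    """
    url = f"{workspace_url}/model/{model_name}/{version}/invocations"
    headers = {
        "Authorization": f"Bearer {token}",  # standard Databricks PAT auth
        "Content-Type": "application/json",
    }
    body = json.dumps(records)  # e.g. a list of feature dictionaries
    return url, headers, body

# The three values would be passed to an HTTP client,
# e.g. requests.post(url, headers=headers, data=body).
```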
Clusters UI improvements
September 9-15, 2020: Version 3.28
The Clusters page now has separate tabs for All-Purpose Clusters and Job Clusters. The list on each tab is now paginated. In addition, we have fixed the delay that sometimes occurred between creating a cluster and being able to see it in the UI.
Visibility controls for jobs, clusters, notebooks, and other workspace objects
September 9-15, 2020: Version 3.28
By default, any user can see all jobs, clusters, notebooks, and folders in their workspace displayed in the Azure Databricks UI and can list them using the Databricks API, even when access control is enabled for those objects and the user has no permissions on them.
Now any Azure Databricks admin can enable visibility controls for notebooks and folders (workspace objects), clusters, and jobs to ensure that users can view only those objects that they have been given access to through workspace, cluster, or jobs access control.
See Access control lists can no longer be disabled.
Ability to create tokens no longer permitted by default
September 9-15, 2020: Version 3.28
For workspaces created after the release of Azure Databricks platform version 3.28, users no longer have the ability to generate personal access tokens by default. Admins must explicitly grant those permissions, whether to the entire users group or on a user-by-user or group-by-group basis. Workspaces created before the release of 3.28 maintain the permissions that were already in place.
See Monitor and manage access to personal access tokens.
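Admins typically grant token permissions through the workspace Permissions API. The sketch below only constructs the request target and body; the endpoint path and permission-level name follow the Databricks Permissions API documentation as I understand it, so verify both against your workspace's API version before use:

```python
import json

def token_permission_request(group_name="users"):
    """Build the PATCH target and body for granting token-creation permission.

    Endpoint and field names follow the Databricks Permissions API docs
    (verify against your workspace's API version before relying on them).
    """
    endpoint = "/api/2.0/permissions/authorization/tokens"
    body = {
        "access_control_list": [
            {"group_name": group_name, "permission_level": "CAN_USE"}
        ]
    }
    return endpoint, json.dumps(body)

# The endpoint and body would be sent as an authenticated PATCH request
# to the workspace, e.g. via an HTTP client of your choice.
```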
MLflow Model Registry supports sharing of models across workspaces
September 9, 2020
Azure Databricks now supports access to the model registry from multiple workspaces. You can now register models, track model runs, and load models across workspaces. Multiple teams can now share access to models, and organizations can use multiple workspaces to handle the different stages of development. For details, see Share models across workspaces.
This functionality requires MLflow Python client version 1.11.0 or above.
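Pointing an MLflow client at a remote workspace's registry uses a registry URI of the form databricks://&lt;scope&gt;:&lt;prefix&gt;, where the secret scope holds the remote workspace's host and token under &lt;prefix&gt;-host and &lt;prefix&gt;-token keys. The sketch below only constructs the URI; the scope and prefix names are placeholders:

```python
def remote_registry_uri(secret_scope, key_prefix):
    """Build a cross-workspace MLflow registry URI.

    The "databricks://<scope>:<prefix>" form is what you would pass to
    mlflow.set_registry_uri(); <scope> is a Databricks secret scope holding
    the remote workspace's host and token under <prefix>-host / <prefix>-token.
    """
    return f"databricks://{secret_scope}:{key_prefix}"

# e.g. mlflow.set_registry_uri(remote_registry_uri("modelregistry", "prod"))
```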
Databricks Runtime 7.3 (Beta)
September 3, 2020
Databricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now available as Beta releases.
For information, see Databricks Runtime 7.3 LTS (EoS) and Databricks Runtime 7.3 LTS for Machine Learning (EoS).
Azure Databricks workload type name change
September 1, 2020
The names of the workload types used by your clusters have been changed:
- Data Engineering -> Jobs Compute
- Data Engineering Light -> Jobs Light Compute
- Data Analytics -> All-purpose Compute
These new names will appear on invoices and in the EA portal in combination with your pricing plan (for example, "Premium - Jobs Compute - DBU"). For details, see Azure Databricks Meters.
The user interface has also changed in platform version 3.27 (targeted for staged release between Aug 25 - Sept 3):
On the Clusters page, the list headings have changed:
- Interactive Clusters -> All-Purpose Clusters
- Automated Clusters -> Job Clusters
When you configure a cluster for a job, the Cluster Type options have changed:
- New Automated Cluster -> New Job Cluster
- Existing Interactive Cluster -> Existing All-Purpose Cluster