February 2019

These features and Azure Databricks platform improvements were released in February 2019.

Note

The release dates and content listed below correspond, in most cases, only to the actual deployment on the Azure Public Cloud.

They record the evolution history of the Azure Databricks service on the Azure Public Cloud for your reference, and may not be consistent with the actual deployment on Azure operated by 21Vianet.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Databricks Light generally available

February 26 - March 5, 2019: Version 2.92

Databricks Light (also known as Data Engineering Light) is now available. Databricks Light is the Databricks packaging of the open source Apache Spark runtime. It provides a runtime option for jobs that don't need the advanced performance, reliability, or autoscaling benefits provided by Databricks Runtime. You can select Databricks Light only when you create a cluster to run a JAR, Python, or spark-submit job; you cannot select this runtime for clusters on which you run interactive or notebook job workloads. See Databricks Light.
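As a rough illustration of the restriction above, the following sketch builds a job definition that runs a Python script on a new Databricks Light cluster. It assumes the Jobs API `new_cluster` and `spark_python_task` request fields. The runtime key in `LIGHT_VERSION_KEY`, the node type, and the script path are placeholders chosen for this example, not values taken from this release note, so check the runtime versions listed in your workspace before using them.

```python
import json

# Assumption: placeholder runtime key for Databricks Light; confirm the
# exact key against the runtime versions available in your workspace.
LIGHT_VERSION_KEY = "apache-spark-2.4.x-scala2.11"


def build_light_job(name, python_file, num_workers=2):
    """Return a job definition that runs a Python script on Databricks Light.

    Only JAR, Python, and spark-submit tasks are eligible, so this sketch
    uses a Python task and never emits a notebook task.
    """
    return {
        "name": name,
        "new_cluster": {
            "spark_version": LIGHT_VERSION_KEY,
            "node_type_id": "Standard_DS3_v2",  # hypothetical node type
            "num_workers": num_workers,
        },
        "spark_python_task": {"python_file": python_file},
    }


if __name__ == "__main__":
    # Hypothetical script location, used only for illustration.
    job = build_light_job("nightly-etl", "dbfs:/jobs/etl.py")
    print(json.dumps(job, indent=2))
```

The resulting JSON could be sent as the body of a job creation request. A notebook task is deliberately left out because Databricks Light cannot run notebook job workloads.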

Managed MLflow on Azure Databricks Public Preview

February 26 - March 5, 2019: Version 2.92

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

  • Tracking experiments to record and compare parameters and results.
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms.
  • Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.

Azure Databricks now provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Azure Databricks workspace features such as experiment management, run management, and notebook revision capture. MLflow on Azure Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. By using managed MLflow on Azure Databricks, you get the advantages of both platforms, including:

  • Workspaces: Collaboratively track and organize experiments and results within Azure Databricks Workspaces with a hosted MLflow Tracking Server and integrated experiment UI. When you use MLflow in notebooks, Azure Databricks automatically captures notebook revisions so you can reproduce the same code and runs later.
  • Security: Take advantage of one common security model for the entire ML lifecycle via ACLs.
  • Jobs: Run MLflow projects as Azure Databricks jobs remotely and directly from Azure Databricks notebooks.

Here's a demo of a tracking workflow in an Azure Databricks Workspace:

Track runs and organize experiment workflow

For details, see Track model development using MLflow.
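A minimal sketch of the tracking workflow, assuming the open source `mlflow` Python package is available in the notebook or cluster environment. The toy model, the `evaluate` and `track_run` helpers, and the parameter and metric names are hypothetical and exist only to show the shape of a tracked run.

```python
def evaluate(alpha, xs, ys):
    """Mean squared error of the toy model y = alpha * x."""
    return sum((alpha * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def track_run(alpha, xs, ys):
    """Record one run's parameter and metric with MLflow Tracking."""
    # Imported lazily so evaluate() stays usable without MLflow installed.
    import mlflow

    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)  # hypothetical parameter name
        mse = evaluate(alpha, xs, ys)
        mlflow.log_metric("mse", mse)  # hypothetical metric name
    return mse


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
    # One run per candidate value, so the runs can be compared afterwards.
    for alpha in (1.5, 2.0, 2.5):
        track_run(alpha, xs, ys)
```

Each call to `track_run` produces a separate run, which is what makes it possible to compare parameters and results side by side in the experiment UI.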

Azure Data Lake Storage connector is generally available

February 15, 2019