Use open-source machine learning libraries and platforms with Azure Machine Learning

2023-06-30

In this article, learn about open-source Python machine learning libraries and platforms you can use with Azure Machine Learning. Train, deploy, and manage the end-to-end machine learning process using the open-source projects you prefer. Use development tools, like Jupyter Notebooks and Visual Studio Code, to reuse your existing models and scripts in Azure Machine Learning.

Train open-source machine learning models

The machine learning training process involves the application of algorithms to your data in order to achieve a task or solve a problem. Depending on the problem, you may choose different algorithms that best fit the task and your data. For more information on what you can solve with machine learning, see the deep learning vs machine learning article and the machine learning algorithm cheat sheet.

Classical machine learning: scikit-learn

For training tasks that involve classical machine learning algorithms, such as classification, clustering, and regression, you might use a library like scikit-learn. To learn how to train a flower classification model, see the how to train with scikit-learn article.
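As a rough local illustration of a flower classification task, the sketch below trains a scikit-learn classifier on the Iris dataset that ships with the library. It assumes scikit-learn is installed and does not use any Azure Machine Learning API; the choice of logistic regression, the 25% test split, and the function name train_iris_classifier are assumptions made for this example, not the method used in the linked article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_iris_classifier(random_state=0):
    """Train a flower classifier and return (model, held-out accuracy)."""
    # The Iris dataset is bundled with scikit-learn, so no download is needed.
    X, y = load_iris(return_X_y=True)

    # Hold out a quarter of the rows to estimate accuracy on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Logistic regression is an arbitrary choice of classical algorithm here.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    model, accuracy = train_iris_classifier()
    print(f"Held-out accuracy: {accuracy:.2f}")
```

The same script could serve as a starting point for a training job, with the dataset and algorithm swapped for your own.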

Neural networks: PyTorch, TensorFlow, Keras

Neural networks, a subset of machine learning algorithms, are used to build deep learning models, which you can train in Azure Machine Learning with open-source frameworks.

Open-source deep learning frameworks you can use include PyTorch, TensorFlow, and Keras.

Transfer learning

Training a deep learning model from scratch often requires large amounts of time, data, and compute resources. You can shortcut the training process by using transfer learning. Transfer learning is a technique that applies knowledge gained from solving one problem to a different but related problem. This means you can take an existing model and repurpose it. See the deep learning vs machine learning article to learn more about transfer learning.

Reinforcement learning: Ray RLlib

Reinforcement learning is an artificial intelligence technique that trains models using actions, states, and rewards. Reinforcement learning agents learn to take a set of predefined actions that maximize the specified rewards based on the current state of their environment.
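To make actions, states, and rewards concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, written with only the Python standard library. The five-cell corridor environment, the hyperparameter values, and the function names are assumptions invented for this illustration; the sketch does not use Ray or Azure Machine Learning.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # predefined actions: move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values q[state][action_index]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, or when both values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the preferred action for every non-terminal state."""
    return [ACTIONS[0] if left > right else ACTIONS[1] for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy environment the only reward is at the right end of the corridor, so the learned policy should prefer moving right in every cell. Real workloads replace the table with a neural network and run many environments in parallel, which is where a scalable library helps.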

The Ray RLlib project has a set of features that allow for high scalability throughout the training process. The iterative process is both time- and resource-intensive as reinforcement learning agents try to learn the optimal way of achieving a task. Ray RLlib also natively supports deep learning frameworks like TensorFlow and PyTorch.

To learn how to use Ray RLlib with Azure Machine Learning, see the how to train a reinforcement learning model article.

Monitor model performance: TensorBoard

Training one or more models requires visualizing and inspecting metrics to make sure each model performs as expected. You can use TensorBoard in Azure Machine Learning to track and visualize experiment metrics.

Responsible AI: Privacy and fairness

Preserve data privacy with differential privacy

To train a machine learning model, you need data. Sometimes that data is sensitive, and it's important to make sure that the data is secure and private. Differential privacy is a technique for preserving the confidentiality of information in a dataset. To learn more, see the article on preserving data privacy.

Open-source differential privacy toolkits like SmartNoise help you preserve the privacy of data in Azure Machine Learning solutions.
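As a rough illustration of the idea, the sketch below applies the Laplace mechanism, one common way to achieve differential privacy, to a counting query. It uses only the Python standard library and is not the SmartNoise API; the function name private_count, the sample ages, and the epsilon value are assumptions made for this example.

```python
import random


def private_count(values, predicate, epsilon, rng):
    """Return a count of matching values with Laplace noise added.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    rate = epsilon  # rate = 1 / scale, with scale = sensitivity / epsilon
    # The difference of two independent exponential samples with the same
    # rate follows a zero-centered Laplace distribution with scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random()
    ages = [23, 35, 41, 52, 67, 29, 71]  # made-up sample data
    noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
    print(f"Noisy count of people aged 40 or over: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy, at the cost of less accurate answers. A production toolkit also tracks the total privacy budget spent across queries, which this sketch leaves out.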

Frameworks for interpretable and fair models

Machine learning systems are used in different areas of society such as banking, education, and healthcare. As such, it's important for these systems to be accountable for the predictions and recommendations they make to prevent unintended consequences.

Open-source frameworks like InterpretML and Fairlearn (https://github.com/fairlearn/fairlearn) work with Azure Machine Learning to create more transparent and equitable machine learning models.
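One simple way to quantify fairness is to compare how often a model predicts the positive outcome for each group, a measure often called the demographic parity difference. The sketch below computes it by hand with the Python standard library; it is an illustration only, not Fairlearn's implementation, and the function names, predictions, and group labels are assumptions made for this example.

```python
def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Return the gap between the highest and lowest group selection rates."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Made-up loan approvals (1 = approved) for two hypothetical groups.
    predictions = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(predictions, groups))
    print(demographic_parity_difference(predictions, groups))
```

A difference of 0 means every group receives positive predictions at the same rate. Larger values indicate a disparity that may be worth investigating with a dedicated toolkit.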

For more information on how to build fair and interpretable models, see the following articles:

Model deployment

Once models are trained and ready for production, you have to choose how to deploy them. Azure Machine Learning provides various deployment targets. For more information, see the where and how to deploy article.

Standardize model formats with ONNX

After training, the contents of the model, such as learned parameters, are serialized and saved to a file. Each framework has its own serialization format, so when you work with different frameworks and tools, you have to deploy models according to each framework's requirements. To standardize this process, you can use the Open Neural Network Exchange (ONNX) format. ONNX is an open-source format for artificial intelligence models that supports interoperability between frameworks. This means you can train a model in one of the many popular machine learning frameworks like PyTorch, convert it into ONNX format, and consume the ONNX model in a different framework like ML.NET.

For more information on ONNX and how to consume ONNX models, see the following articles:

Package and deploy models as containers

Container technologies such as Docker are one way to deploy models as web services. Containers provide a platform- and resource-agnostic way to build and orchestrate reproducible software environments. With these core technologies, you can use preconfigured environments, preconfigured container images, or custom ones to deploy your machine learning models to targets such as Kubernetes clusters. For GPU-intensive workflows, you can use tools like NVIDIA Triton Inference Server to make predictions using GPUs.

Secure deployments with homomorphic encryption

Securing deployments is an important part of the deployment process. To deploy encrypted inferencing services, use the encrypted-inference open-source Python library. The encrypted inferencing package provides bindings based on Microsoft SEAL, a homomorphic encryption library.

Machine learning operations (MLOps)

Machine Learning Operations (MLOps), commonly thought of as DevOps for machine learning, allows you to build more transparent, resilient, and reproducible machine learning workflows. See the what is MLOps article to learn more about MLOps.

Using DevOps practices like continuous integration (CI) and continuous deployment (CD), you can automate the end-to-end machine learning lifecycle and capture governance data around it. You can define your machine learning CI/CD pipeline in GitHub Actions to run Azure Machine Learning training and deployment tasks.

Capturing software dependencies, metrics, metadata, and data and model versions is an important part of the MLOps process for building transparent, reproducible, and auditable pipelines. For this task, you can use MLflow in Azure Machine Learning as well as when training machine learning models in Azure Databricks. You can also deploy MLflow models as an Azure web service.