Machine learning and data science tools on Azure Data Science Virtual Machines

Azure Data Science Virtual Machines (DSVMs) have a rich set of tools and libraries for machine learning available in popular languages, such as Python, R, and Julia.

Here are some of the machine-learning tools and libraries on DSVMs.

Azure Machine Learning SDK for Python

See the full reference for the Azure Machine Learning SDK for Python.

| Category | Value |
| --- | --- |
| What is it? | Azure Machine Learning is a cloud service that you can use to develop and deploy machine-learning models. You can track your models as you build, train, scale, and manage them by using the Python SDK. Deploy models as containers and run them in the cloud, on-premises, or on Azure IoT Edge. |
| Supported editions | Windows (conda environment: AzureML), Linux (conda environment: py36) |
| Typical uses | General machine-learning platform |
| How is it configured or installed? | Installed with GPU support |
| How to use or run it | As a Python SDK and in the Azure CLI. Activate the AzureML conda environment on the Windows edition, or the py36 conda environment on the Linux edition. |
| Link to samples | Sample Jupyter notebooks are included in the AzureML directory under notebooks. |
| Related tools | Visual Studio Code, Jupyter |
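The tracking workflow described above can be sketched as follows. This is a minimal sketch, not the documented procedure: it assumes the v1 SDK (`azureml.core`) is the one installed in the AzureML or py36 conda environment, that a workspace `config.json` is available locally, and the experiment name `dsvm-demo` and the metric value are hypothetical.

```python
def log_demo_run(experiment_name="dsvm-demo", accuracy=0.9):
    """Log a single metric to an Azure Machine Learning experiment."""
    # Imported lazily so this module can still be loaded outside the
    # conda environment that contains the SDK.
    from azureml.core import Experiment, Workspace

    # Assumption: a config.json describing the workspace exists locally.
    workspace = Workspace.from_config()
    experiment = Experiment(workspace=workspace, name=experiment_name)

    # Start an interactive run, record one metric, and mark the run finished.
    run = experiment.start_logging()
    run.log("accuracy", accuracy)
    run.complete()
    return run.id
```

Calling `log_demo_run()` from the activated conda environment should create one completed run under the named experiment, provided the workspace configuration is valid.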

H2O

| Category | Value |
| --- | --- |
| What is it? | An open-source AI platform that supports in-memory, distributed, fast, and scalable machine learning. |
| Supported versions | Linux |
| Typical uses | General-purpose distributed, scalable machine learning |
| How is it configured or installed? | H2O is installed in `/dsvm/tools/h2o`. |
| How to use or run it | Connect to the VM by using X2Go. Start a new terminal, and run `java -jar /dsvm/tools/h2o/current/h2o.jar`. Then start a web browser and connect to http://localhost:54321. |
| Link to samples | Samples are available on the VM in Jupyter under the h2o directory. |
| Related tools | Apache Spark, MXNet, XGBoost, Sparkling Water, Deep Water |
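The launch steps in the H2O entry above can also be scripted. The following is a minimal sketch using only the Python standard library; the jar path and URL come from the entry above, while the function names, timeout, and polling interval are illustrative assumptions.

```python
import subprocess
import time
import urllib.error
import urllib.request

H2O_JAR = "/dsvm/tools/h2o/current/h2o.jar"  # jar path given in the H2O entry
H2O_URL = "http://localhost:54321"           # URL given in the H2O entry


def h2o_command(jar_path=H2O_JAR):
    """Return the H2O launch command as an argument list."""
    return ["java", "-jar", jar_path]


def wait_until_up(url=H2O_URL, timeout_s=60.0, poll_s=2.0):
    """Poll the URL until it answers, or return False once the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(poll_s)
    return False


def start_h2o():
    """Start H2O in the background and report whether its web UI is reachable."""
    server = subprocess.Popen(h2o_command())
    return server, wait_until_up()
```

`start_h2o()` returns the process handle so the caller can stop the server later with `server.terminate()`.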

There are several other machine-learning libraries on DSVMs, such as the popular scikit-learn package that's part of the Anaconda Python distribution for DSVMs. To check out the list of packages available in Python, R, and Julia, run the respective package managers.
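As an alternative to running a package manager, the Python packages visible to the active interpreter can be listed from within Python. This is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`); the helper names are illustrative.

```python
from importlib import metadata


def installed_packages():
    """Return {name: version} for every distribution visible to this interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    return packages


def has_package(name):
    """Check whether a distribution such as 'scikit-learn' is installed."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False
```

For example, `has_package("scikit-learn")` reports whether scikit-learn is available in the currently activated environment.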

LightGBM

| Category | Value |
| --- | --- |
| What is it? | A fast, distributed, high-performance gradient-boosting (GBDT, GBRT, GBM, or MART) framework based on decision tree algorithms. It's used for ranking, classification, and many other machine-learning tasks. |
| Supported versions | Windows, Linux |
| Typical uses | General-purpose gradient-boosting framework |
| How is it configured or installed? | On Windows, LightGBM is installed as a Python package. On Linux, the command-line executable is in `/opt/LightGBM/lightgbm`, and the R and Python packages are installed. |
| Link to samples | LightGBM guide |
| Related tools | MXNet, XGBoost |
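The Linux command-line executable mentioned above is typically driven by a plain-text configuration file. The sketch below writes such a file and builds the matching command. The executable path comes from the entry above; the file names and parameter values are hypothetical, and the configuration keys are standard LightGBM CLI parameters as far as I know, so check them against the LightGBM guide for the installed version.

```python
from pathlib import Path

LIGHTGBM_EXE = "/opt/LightGBM/lightgbm"  # Linux path given in the LightGBM entry


def write_train_config(path, data_file, model_file="model.txt"):
    """Write a small LightGBM CLI training config and return its path."""
    lines = [
        "task = train",
        "objective = binary",        # illustrative: binary classification
        f"data = {data_file}",       # hypothetical training file
        "num_trees = 50",
        "learning_rate = 0.1",
        f"output_model = {model_file}",
    ]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def train_command(config_path, exe=LIGHTGBM_EXE):
    """Return the argument list that runs LightGBM with the given config file."""
    return [exe, f"config={config_path}"]
```

The returned list can be passed to `subprocess.run` on a Linux DSVM once the training data file exists.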

Rattle

| Category | Value |
| --- | --- |
| What is it? | A graphical user interface for data mining by using R. |
| Supported editions | Windows, Linux |
| Typical uses | General UI data-mining tool for R |
| How to use or run it | As a UI tool. On Windows, start a command prompt, run R, and then inside R, run `rattle()`. On Linux, connect with X2Go, start a terminal, run R, and then inside R, run `rattle()`. |
| Link to samples | Rattle |
| Related tools | LightGBM, Weka, XGBoost |

Vowpal Wabbit

| Category | Value |
| --- | --- |
| What is it? | A fast, open-source, out-of-core learning system library |
| Supported editions | Windows, Linux |
| Typical uses | General machine-learning library |
| How is it configured or installed? | Windows: msi installer. Linux: apt-get. |
| How to use or run it | As an on-path command-line tool (`C:\Program Files\VowpalWabbit\vw.exe` on Windows, `/usr/bin/vw` on Linux) |
| Link to samples | Vowpal Wabbit samples |
| Related tools | LightGBM, MXNet, XGBoost |
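Because Vowpal Wabbit is used as a command-line tool, a common first step is to convert data into its text input format and then call `vw`. The sketch below formats one example per line as `label | name:value ...` and builds a training command. The helper names and file names are hypothetical; `-d` (input data) and `-f` (output model) are the usual `vw` options.

```python
def to_vw_line(label, features):
    """Format one example in VW text format: 'label | name:value ...'.

    Feature names must not contain spaces, colons, or '|' characters.
    """
    parts = [f"{name}:{value}" for name, value in sorted(features.items())]
    return f"{label} | " + " ".join(parts)


def vw_train_command(data_file, model_file, exe="vw"):
    """Return the argument list for a basic training run."""
    # -d names the input data file; -f names the file that receives the model.
    return [exe, "-d", data_file, "-f", model_file]
```

For example, `to_vw_line(1, {"price": 0.23, "sqft": 0.25})` yields `1 | price:0.23 sqft:0.25`, and the command list can be passed to `subprocess.run` because `vw` is on the path.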

Weka

| Category | Value |
| --- | --- |
| What is it? | A collection of machine-learning algorithms for data-mining tasks. The algorithms can be either applied directly to a data set or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. |
| Supported editions | Windows, Linux |
| Typical uses | General machine-learning tool |
| How to use or run it | On Windows, search for Weka on the Start menu. On Linux, sign in with X2Go, and then go to Applications > Development > Weka. |
| Link to samples | Weka samples |
| Related tools | LightGBM, Rattle, XGBoost |

XGBoost

| Category | Value |
| --- | --- |
| What is it? | A fast, portable, and distributed gradient-boosting (GBDT, GBRT, or GBM) library for Python, R, Java, Scala, C++, and more. It runs on a single machine, and on Apache Hadoop and Spark. |
| Supported editions | Windows, Linux |
| Typical uses | General machine-learning library |
| How is it configured or installed? | Installed with GPU support |
| How to use or run it | As a Python library (2.7 and 3.5), an R package, and an on-path command-line tool (`C:\dsvm\tools\xgboost\bin\xgboost.exe` on Windows, `/dsvm/tools/xgboost/xgboost` on Linux) |
| Link to samples | Samples are included on the VM, in `/dsvm/tools/xgboost/demo` on Linux and `C:\dsvm\tools\xgboost\demo` on Windows. |
| Related tools | LightGBM, MXNet |

Apache Drill

| Category | Value |
| --- | --- |
| What is it? | Open-source SQL query engine on big data |
| Supported DSVM versions | Windows 2019, Linux |
| How is it configured and installed on the DSVM? | Installed in `/dsvm/tools/drill*` in embedded mode only |
| Typical uses | For in-place data exploration without requiring extract, transform, load (ETL). Query different data sources and formats, including CSV, JSON, relational tables, and Hadoop. |
| How to use and run it | Desktop shortcut. See "Get started with Drill in 10 minutes." |
| Related tools on the DSVM | Rattle, Weka, SQL Server Management Studio |
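Besides the desktop shortcut, a running Drill instance can usually be queried over its REST interface. The sketch below is an illustration under assumptions, not a DSVM-specific procedure: it assumes Drill is already running in embedded mode and exposes its default web endpoint at `http://localhost:8047/query.json`. The helper names are hypothetical.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047/query.json"  # assumed default Drill REST endpoint


def build_query_request(sql, url=DRILL_URL):
    """Build a POST request that submits one SQL statement to Drill."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.load(response).get("rows", [])
```

For example, ``run_query("SELECT * FROM cp.`employee.json` LIMIT 3")`` queries the sample data set bundled with Drill, which illustrates in-place exploration of a JSON file without an ETL step.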