Data Science Virtual Machine data ingestion tools

One of the first technical steps in a data science or AI project is to identify the datasets you will use and bring them into your analytics environment. The Data Science Virtual Machine (DSVM) provides tools and libraries to bring data from different sources into analytical data storage locally on the DSVM, or into a data platform either in the cloud or on-premises.

Here are some of the data movement tools available on the DSVM.

AdlCopy

| Category | Value |
| --- | --- |
| What is it? | A tool to copy data from Azure Blob storage into Azure Data Lake Store. It can also copy data between two Azure Data Lake Store accounts. |
| Supported DSVM versions | Windows |
| Typical uses | Importing multiple blobs from Azure Blob storage into Azure Data Lake Store. |
| How to use / run it? | Open a command prompt and type `adlcopy` to get help. |
| Links to samples | Using AdlCopy |
| Related tools on the DSVM | AzCopy, Azure CLI |
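
As an illustration of the typical use above, here is a minimal Python sketch that assembles an AdlCopy command line for copying blobs into a Data Lake Store folder. The helper name `build_adlcopy_command`, the account names, and the URLs are hypothetical placeholders; the `/Source`, `/Dest`, `/SourceKey`, and `/Pattern` switches are the ones AdlCopy documents, but check `adlcopy` help output on your DSVM before relying on them.

```python
def build_adlcopy_command(source_url, dest_url, source_key, pattern=None):
    """Assemble the argument list for an AdlCopy blob-to-Data-Lake-Store copy."""
    args = [
        "adlcopy",
        "/Source", source_url,      # blob or container URL to copy from
        "/Dest", dest_url,          # Data Lake Store folder to copy into
        "/SourceKey", source_key,   # access key of the source storage account
    ]
    if pattern:
        # Restrict the copy to blobs whose names match the pattern.
        args += ["/Pattern", pattern]
    return args


# Placeholder values: substitute your own accounts and key.
cmd = build_adlcopy_command(
    "https://mystorage.blob.core.windows.net/mycontainer/",
    "swebhdfs://mydatalakestore.azuredatalakestore.net/myfolder/",
    "<storage-account-key>",
    pattern="*.csv",
)
print(" ".join(cmd))
```

On a Windows DSVM the list could then be handed to `subprocess.run(cmd, check=True)`; the sketch only builds and prints it so that nothing is copied by accident.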

Azure CLI

| Category | Value |
| --- | --- |
| What is it? | A management tool for Azure. It also contains commands to move data to and from Azure data platforms such as Azure Blob storage and Azure Data Lake Store. |
| Supported DSVM versions | Windows, Linux |
| Typical uses | Importing and exporting data to and from Azure Storage and Azure Data Lake Store. |
| How to use / run it? | Open a command prompt and type `az` to get help. |
| Links to samples | Using Azure CLI |
| Related tools on the DSVM | AzCopy, AdlCopy |
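
To make the import/export use concrete, the following sketch builds `az storage blob upload` and `az storage blob download` commands from Python. The function names, account, container, and file names are illustrative assumptions, and the sketch assumes authentication is already set up (for example through `az login` or a storage key in the environment).

```python
def az_blob_upload(account, container, local_path, blob_name):
    """Argument list for uploading one local file to a blob container."""
    return [
        "az", "storage", "blob", "upload",
        "--account-name", account,
        "--container-name", container,
        "--file", local_path,   # local file to send
        "--name", blob_name,    # name the blob will have in the container
    ]


def az_blob_download(account, container, blob_name, local_path):
    """Argument list for downloading one blob to a local file."""
    return [
        "az", "storage", "blob", "download",
        "--account-name", account,
        "--container-name", container,
        "--name", blob_name,
        "--file", local_path,   # where to write the downloaded blob
    ]


# Placeholder names for illustration only.
upload_cmd = az_blob_upload("mystorage", "datasets", "train.csv", "raw/train.csv")
download_cmd = az_blob_download("mystorage", "datasets", "raw/train.csv", "train_copy.csv")
print(" ".join(upload_cmd))
print(" ".join(download_cmd))
```

Either list can be executed with `subprocess.run(..., check=True)` on a Windows or Linux DSVM where `az` is on the path.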

AzCopy

| Category | Value |
| --- | --- |
| What is it? | A tool to copy data between local files and Azure Blob, File, and Table storage. |
| Supported DSVM versions | Windows |
| Typical uses | Copying files to Azure Blob storage and copying blobs between accounts. |
| How to use / run it? | Open a command prompt and type `azcopy` to get help. |
| Links to samples | AzCopy on Windows |
| Related tools on the DSVM | AdlCopy |
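
The sketch below shows what copying a local folder to a blob container might look like. It assumes the classic AzCopy on Windows syntax (`/Source:`, `/Dest:`, `/DestKey:`, `/S`), which matches the tool that also handles tables; newer AzCopy releases use a different `azcopy copy <source> <destination>` form, so check which version your DSVM has. The helper name, folder, URL, and key are hypothetical.

```python
def build_azcopy_upload(local_dir, container_url, dest_key, recursive=True):
    """Argument list for a classic (Windows) AzCopy folder-to-container upload."""
    args = [
        "azcopy",
        f"/Source:{local_dir}",      # local folder to read from
        f"/Dest:{container_url}",    # destination blob container URL
        f"/DestKey:{dest_key}",      # access key of the destination account
    ]
    if recursive:
        # /S copies the folder contents recursively.
        args.append("/S")
    return args


# Placeholder values: substitute your own folder, container, and key.
cmd = build_azcopy_upload(
    r"C:\data\images",
    "https://mystorage.blob.core.windows.net/images",
    "<storage-account-key>",
)
print(" ".join(cmd))
```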

Azure Cosmos DB Data Migration tool

| Category | Value |
| --- | --- |
| What is it? | A tool to import data from various sources into Azure Cosmos DB, a NoSQL database in the cloud. These sources include JSON files, CSV files, SQL, MongoDB, Azure Table storage, Amazon DynamoDB, and Azure Cosmos DB SQL API collections. |
| Supported DSVM versions | Windows |
| Typical uses | Importing files from a VM into Azure Cosmos DB, importing data from Azure Table storage into Azure Cosmos DB, and importing data from a Microsoft SQL Server database into Azure Cosmos DB. |
| How to use / run it? | To use the command-line version, open a command prompt and type `dt`. To use the GUI tool, open a command prompt and type `dtui`. |
| Links to samples | Azure Cosmos DB: Import data |
| Related tools on the DSVM | AzCopy, AdlCopy |
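
For the command-line version, a JSON file import could be assembled roughly as follows. This is a sketch under assumptions: the `/s:JsonFile`, `/s.Files:`, `/t:DocumentDBBulk`, `/t.ConnectionString:`, and `/t.Collection:` options follow the pattern shown in the tool's published examples, while the helper name, file name, collection name, and connection string placeholders are made up for illustration. Confirm the options against the `dt` help output before use.

```python
def build_dt_json_import(json_path, connection_string, collection):
    """Argument list for importing a JSON file into a Cosmos DB collection with dt."""
    return [
        "dt",
        "/s:JsonFile",                              # source adapter: JSON file(s)
        f"/s.Files:{json_path}",                    # file to import
        "/t:DocumentDBBulk",                        # target adapter: bulk import
        f"/t.ConnectionString:{connection_string}",
        f"/t.Collection:{collection}",              # destination collection
    ]


# Placeholder connection string: fill in your own endpoint, key, and database.
conn = "AccountEndpoint=<endpoint>;AccountKey=<key>;Database=<database>;"
cmd = build_dt_json_import(r".\sessions.json", conn, "Sessions")
for part in cmd:
    print(part)
```

Passing the list to `subprocess.run` avoids shell quoting problems with the semicolons in the connection string.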

Azure Storage Explorer

| Category | Value |
| --- | --- |
| What is it? | A graphical user interface for interacting with files stored in the Azure cloud. |
| Supported DSVM versions | Windows |
| Typical uses | Importing and exporting data to and from the DSVM. |
| How to use / run it? | Search for "Azure Storage Explorer" in the Start menu. |
| Links to samples | Azure Storage Explorer |

bcp

| Category | Value |
| --- | --- |
| What is it? | A SQL Server tool to copy data between SQL Server and a data file. |
| Supported DSVM versions | Windows |
| Typical uses | Importing a CSV file into a SQL Server table and exporting a SQL Server table to a file. |
| How to use / run it? | Open a command prompt and type `bcp` to get help. |
| Links to samples | bcp utility |
| Related tools on the DSVM | SQL Server, sqlcmd |
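
Both typical uses can be sketched as command builders. The table, file, server, and database names below are hypothetical, and the sketch assumes Windows authentication (`-T`) and character-mode, comma-delimited data (`-c -t ,`). Note that bcp splits on the delimiter only, so CSV files with quoted fields containing commas need extra handling.

```python
def build_bcp_import(table, data_file, server, database, skip_header=True):
    """Argument list for loading a comma-delimited file into a SQL Server table."""
    args = [
        "bcp", table, "in", data_file,
        "-S", server,      # SQL Server instance
        "-d", database,    # target database
        "-T",              # trusted connection (Windows authentication)
        "-c",              # character data
        "-t", ",",         # field terminator
    ]
    if skip_header:
        # Start at row 2 so a header line is not loaded as data.
        args += ["-F", "2"]
    return args


def build_bcp_export(table, data_file, server, database):
    """Argument list for writing a SQL Server table to a comma-delimited file."""
    return [
        "bcp", table, "out", data_file,
        "-S", server, "-d", database, "-T", "-c", "-t", ",",
    ]


# Placeholder names for illustration only.
import_cmd = build_bcp_import("dbo.Sales", r"C:\data\sales.csv", "localhost", "Analytics")
export_cmd = build_bcp_export("dbo.Sales", r"C:\data\sales_out.csv", "localhost", "Analytics")
print(" ".join(import_cmd))
print(" ".join(export_cmd))
```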

blobfuse

| Category | Value |
| --- | --- |
| What is it? | A tool to mount an Azure Blob storage container in the Linux file system. |
| Supported DSVM versions | Linux |
| Typical uses | Reading and writing to blobs in a container. |
| How to use / run it? | Run `blobfuse` at a terminal. |
| Links to samples | blobfuse on GitHub |
| Related tools on the DSVM | Azure CLI |
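
Mounting a container usually takes a small connection config file plus a mount command. The sketch below generates both, assuming the blobfuse v1 conventions from its GitHub README (`accountName`, `accountKey`, and `containerName` entries; `--tmp-path` and `--config-file` options). The helper names, paths, account, and container are hypothetical, and the timeout values are only example settings.

```python
import shlex


def blobfuse_config(account, key, container):
    """Contents of a blobfuse connection config file."""
    return (
        f"accountName {account}\n"
        f"accountKey {key}\n"
        f"containerName {container}\n"
    )


def build_blobfuse_mount(mount_point, tmp_path, config_file):
    """Argument list for mounting a container at mount_point."""
    return [
        "blobfuse", mount_point,
        f"--tmp-path={tmp_path}",        # local cache directory
        f"--config-file={config_file}",  # file produced by blobfuse_config
        "-o", "attr_timeout=240",
        "-o", "entry_timeout=240",
        "-o", "negative_timeout=120",
    ]


# Placeholder values: substitute your own account, key, and paths.
config_text = blobfuse_config("mystorage", "<storage-account-key>", "datasets")
mount_cmd = build_blobfuse_mount(
    "/mnt/datasets", "/mnt/resource/blobfusetmp", "/home/user/fuse_connection.cfg"
)
print(config_text)
print(shlex.join(mount_cmd))
```

Because the config file holds an account key, it is worth restricting its permissions (for example `chmod 600`) before mounting. Once mounted, blobs in the container can be read and written like ordinary files under the mount point.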