使用 SSIS 连接器将数据移入或移出 Azure Blob 存储Move data to or from Azure Blob Storage using SSIS connectors

用于 Azure 的 SQL Server Integration Services 功能包提供了可用于连接到 Azure,在 Azure 和本地数据源之间传输数据以及处理存储在 Azure 中的数据的组件。The SQL Server Integration Services Feature Pack for Azure provides components to connect to Azure, transfer data between Azure and on-premises data sources, and process data stored in Azure.

此菜单链接到可用于将数据移入和移出 Azure Blob 存储的技术:This menu links to technologies you can use to move data to and from Azure Blob storage:

客户将本地数据移到云中后,便可以从任何 Azure 服务访问其数据,以利用 Azure 技术套件的完整功能。Once customers have moved on-premises data into the cloud, they can access their data from any Azure service to leverage the full power of the suite of Azure technologies. 这些数据可以随后使用,例如,用在 Azure 机器学习或 HDInsight 群集中。The data may be subsequently used, for example, in Azure Machine Learning or on an HDInsight cluster.

有关使用这些 Azure 资源的示例,请参阅 SQLHDInsight 演练。Examples for using these Azure resources are in the SQL and HDInsight walkthroughs.

若要深入了解使用 SSIS 完成混合数据集成方案中常见的业务需求的规范方案讨论,请参阅Doing more with SQL Server Integration Services Feature Pack for Azure(使用用于 Azure 的 SQL Server Integration Services 功能包执行更多操作)博客。For a discussion of canonical scenarios that use SSIS to accomplish business needs common in hybrid data integration scenarios, see Doing more with SQL Server Integration Services Feature Pack for Azure blog.

备注

有关 Azure Blob 存储的完整介绍,请参阅 Azure Blob 基本知识Azure Blob 服务For a complete introduction to Azure blob storage, refer to Azure Blob Basics and to Azure Blob Service.

先决条件Prerequisites

若要执行本文所述任务,必须设置 Azure 订阅和 Azure 存储帐户。To perform the tasks described in this article, you must have an Azure subscription and an Azure Storage account set up. 若要上传或下载数据,需要 Azure 存储的帐户名和帐户密钥。You need the Azure Storage account name and account key to upload or download data.

若要使用 SSIS 连接器,则必须下载:To use the SSIS connectors, you must download:

备注

SSIS 随 SQL Server 一起安装,但并不包括在 Express 版本中。SSIS is installed with SQL Server, but is not included in the Express version. 若要深入了解 SQL Server 各种版本中包含哪些应用程序,请参阅 SQL Server Editions(SQL Server 版本)For information on what applications are included in various editions of SQL Server, see SQL Server Editions

有关 SSIS 的培训资料,请参阅 Hands On Training for SSIS(SSIS 培训指导)For training materials on SSIS, see Hands On Training for SSIS

有关如何使用 SISS 生成简单的提取、转换和加载 (ETL) 包进行启动并运行的信息,请参阅 SSIS 教程:创建简单的 ETL 包For information on how to get up-and-running using SISS to build simple extraction, transformation, and load (ETL) packages, see SSIS Tutorial: Creating a Simple ETL Package.

下载 NYC 出租车数据集Download NYC Taxi dataset

此处所述的示例使用公开发布的数据集 - NYC 出租车行程The example described here use a publicly available dataset -- the NYC Taxi Trips dataset. 此数据集包含 2013 年纽约市内约 1.73 亿次出租车行程。The dataset consists of about 173 million taxi rides in NYC in the year 2013. 有两种类型的数据:行程详细信息数据和费用数据。There are two types of data: trip details data and fare data. 由于每个月都有一个文件,因此共有 24 个文件,每个文件大约 2 GB,未压缩。As there is a file for each month, we have 24 files, each of which is about 2 GB uncompressed.

将数据上传到 Azure Blob 存储Upload data to Azure blob storage

要使用 SSIS 功能包将数据从本地移动到 Azure Blob 存储,使用 Azure Blob 上传任务的实例,如下所示:To move data using the SSIS feature pack from on-premises to Azure blob storage, we use an instance of the Azure Blob Upload Task, shown here:

configure-data-science-vm

以下是任务使用的参数:The parameters that the task uses are described here:

字段Field 说明Description
AzureStorageConnectionAzureStorageConnection 指定现有 Azure 存储连接管理器或新建一个 Azure 存储连接管理器,该管理器引用指向 blob 文件托管位置的 Azure 存储帐户。Specifies an existing Azure Storage Connection Manager or creates a new one that refers to an Azure Storage account that points to where the blob files are hosted.
BlobContainerBlobContainer 指定 blob 容器的名称,该容器将上传的文件保存为 blob。Specifies the name of the blob container that holds the uploaded files as blobs.
BlobDirectoryBlobDirectory 指定将上载的文件作为块 blob 存储的 blob 目录。Specifies the blob directory where the uploaded file is stored as a block blob. 该 blob 目录是一个虚拟层次结构。The blob directory is a virtual hierarchical structure. 如果 blob 已存在,其会被替代。If the blob already exists, it ia replaced.
LocalDirectoryLocalDirectory 指定包含要上传的文件的本地目录。Specifies the local directory that contains the files to be uploaded.
FileNameFileName 指定名称筛选器以选择具有指定名称模式的文件。Specifies a name filter to select files with the specified name pattern. 例如,MySheet*.xls* 包括 MySheet001.xls 和 MySheetABC.xlsx 等文件For example, MySheet*.xls* includes files such as MySheet001.xls and MySheetABC.xlsx
TimeRangeFrom/TimeRangeToTimeRangeFrom/TimeRangeTo 指定时间范围筛选器。Specifies a time range filter. 将包括在 TimeRangeFrom 之后以及 TimeRangeTo 之前修改的文件。Files modified after TimeRangeFrom and before TimeRangeTo are included.

备注

AzureStorageConnection 凭据必须正确,且在尝试进行传输之前,BlobContainer 必须存在。The AzureStorageConnection credentials need to be correct and the BlobContainer must exist before the transfer is attempted.

从 Azure Blob 存储下载数据Download data from Azure blob storage

要使用 SSIS 将数据从 Azure Blob 存储下载到本地存储,请使用 Azure Blob 下载任务的实例。To download data from Azure blob storage to on-premises storage with SSIS, use an instance of the Azure Blob Download Task.

更高级的 SSIS-Azure 方案More advanced SSIS-Azure scenarios

SSIS 功能包能够通过将任务一起打包来处理更复杂的流。The SSIS feature pack allows for more complex flows to be handled by packaging tasks together. 例如,blob 数据可以直接传输到 HDInsight 群集,可将此群集的输出下载回 blob,再下载到本地存储。For example, the blob data could feed directly into an HDInsight cluster, whose output could be downloaded back to a blob and then to on-premises storage. SSIS 可使用附加的 SSIS 连接器在 HDInsight 群集上运行 Hive 和 Pig 作业:SSIS can run Hive and Pig jobs on an HDInsight cluster using additional SSIS connectors: