复制活动中的架构映射Schema mapping in copy activity

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

本文介绍在执行数据复制操作时,Azure 数据工厂复制活动如何将架构和数据类型从源数据映射到接收器数据。This article describes how the Azure Data Factory copy activity does schema mapping and data type mapping from source data to sink data when executing the data copy.

架构映射Schema mapping

将数据从源复制到接收器时,适用列映射。Column mapping applies when copying data from source to sink. 复制活动默认按列名称将源数据映射到接收器 。By default, copy activity map source data to sink by column names. 可以指定显式映射,根据需要自定义列映射。You can specify explicit mapping to customize the column mapping based on your need. 更具体地说,复制活动包含以下内容:More specifically, copy activity:

  1. 从源中读取数据并确定源架构Read the data from source and determine the source schema
  2. 使用默认列映射按名称映射列,或者应用显式列映射(如果已指定)。Use default column mapping to map columns by name, or apply explicit column mapping if specified.
  3. 将数据写入接收器Write the data to sink

显式映射Explicit mapping

可以指定要在复制活动中映射的列 -> translator -> mappings 属性。You can specify the columns to map in copy activity -> translator -> mappings property. 以下示例在管道中定义一个复制活动,可将数据从带分隔符的文本复制到 Azure SQL 数据库。The following example defines a copy activity in a pipeline to copy data from delimited text to Azure SQL Database.

{
    "name": "CopyActivity",
    "type": "Copy",
    "inputs": [{
        "referenceName": "DelimitedTextInput",
        "type": "DatasetReference"
    }],
    "outputs": [{
        "referenceName": "AzureSqlOutput",
        "type": "DatasetReference"
    }],
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "SqlSink" },
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {
                    "source": {
                        "name": "UserId",
                        "type": "Guid"
                    },
                    "sink": {
                        "name": "MyUserId"
                    }
                }, 
                {
                    "source": {
                        "name": "Name",
                        "type": "String"
                    },
                    "sink": {
                        "name": "MyName"
                    }
                }, 
                {
                    "source": {
                        "name": "Group",
                        "type": "String"
                    },
                    "sink": {
                        "name": "MyGroup"
                    }
                }
            ]
        }
    }
}

以下属性在 translator -> mappings 下带 sourcesink 的对象中受支持:The following properties are supported under translator -> mappings -> object with source and sink:

属性Property 说明Description 必须Required
namename 源或接收器列的名称。Name of the source or sink column. Yes
序号ordinal 列索引。Column index. 从 1 开始。Start with 1.
在使用带分隔符的文本但没有标头行时应用,为必填项。Apply and required when using delimited text without header line.
No
pathpath 要提取或映射的每个字段的 JSON 路径表达式。JSON path expression for each field to extract or map. 适用于分层数据,例如 MongoDB/REST。Apply for hierarchical data e.g. MongoDB/REST.
对于根对象下的字段,JSON 路径以根 $ 开头;对于 collectionReference 属性选择的数组内的字段,JSON 路径从数组元素开始。For fields under the root object, the JSON path starts with root $; for fields inside the array chosen by collectionReference property, JSON path starts from the array element.
No
typetype 源或接收器列的数据工厂临时数据类型。Data Factory interim data type of the source or sink column. No
cultureculture 源或接收器列的区域性。Culture of the source or sink column.
当类型为 DatetimeDatetimeoffset 时应用。Apply when type is Datetime or Datetimeoffset. 默认为 en-usThe default is en-us.
No
formatformat 当类型为 DatetimeDatetimeoffset 时要使用的格式字符串。Format string to be used when type is Datetime or Datetimeoffset. 请参阅自定义日期和时间格式字符串,了解如何设置日期时间格式。Refer to Custom Date and Time Format Strings on how to format datetime. No

以下属性在 translator -> mappings 下带 sourcesink 的对象中受支持:The following properties are supported under translator -> mappings in addition to object with source and sink:

属性Property 说明Description 必须Required
collectionReferencecollectionReference 仅当分层数据(例如 MongoDB/REST)为源时,才支持此项。Supported only when hierarchical data e.g. MongoDB/REST is source.
若要进行迭代操作,以同一模式从数组字段中的对象提取数据并按行和对象进行转换,请指定要进行交叉应用的该数组的 JSON 路径。If you want to iterate and extract data from the objects inside an array field with the same pattern and convert to per row per object, specify the JSON path of that array to do cross-apply.
No

备用列映射Alternative column mapping

可以指定复制活动 -> translator -> columnMappings,在表格形式的数据之间进行映射。You can specify copy activity -> translator -> columnMappings to map between tabular-shaped data . 在这种情况下,输入和输出数据集都需要“structure”节。In this case, the "structure" section is required for both input and output datasets. 列映射支持将源数据集“structure”中的所有列或列子集接收器数据集“structure”中的所有列 。Column mapping supports mapping all or subset of columns in the source dataset "structure" to all columns in the sink dataset "structure". 以下是导致异常的错误条件:The following are error conditions that result in an exception:

  • 源数据存储查询结果中没有输入数据集“structure”部分中指定的列名称。Source data store query result does not have a column name that is specified in the input dataset "structure" section.
  • 接收器数据存储(如果具有预定义架构)没有输出数据集“structure”部分中指定的列名称。Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
  • 接收器数据集“structure”中的列数量多于或少于映射中指定的数量。Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
  • 重复的映射。Duplicate mapping.

在以下示例中,输入数据集有一个结构,并且它指向本地 Oracle 数据库中的表。In the following example, the input dataset has a structure, and it points to a table in an on-premises Oracle database.

{
    "name": "OracleDataset",
    "properties": {
        "structure":
         [
            { "name": "UserId"},
            { "name": "Name"},
            { "name": "Group"}
         ],
        "type": "OracleTable",
        "linkedServiceName": {
            "referenceName": "OracleLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "SourceTable"
        }
    }
}

在本示例中,输出数据集有一个结构,并指向 Salesfoce 中的表。In this sample, the output dataset has a structure and it points to a table in Salesfoce.

{
    "name": "SalesforceDataset",
    "properties": {
        "structure":
        [
            { "name": "MyUserId"},
            { "name": "MyName" },
            { "name": "MyGroup"}
        ],
        "type": "SalesforceObject",
        "linkedServiceName": {
            "referenceName": "SalesforceLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "SinkTable"
        }
    }
}

以下 JSON 定义管道中的复制活动。The following JSON defines a copy activity in a pipeline. 使用 translator -> columnMappings 属性将源中的列映射到接收器中的列。The columns from source mapped to columns in sink by using the translator -> columnMappings property.

{
    "name": "CopyActivity",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "OracleDataset",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "SalesforceDataset",
            "type": "DatasetReference"
        }
    ],
    "typeProperties":    {
        "source": { "type": "OracleSource" },
        "sink": { "type": "SalesforceSink" },
        "translator":
        {
            "type": "TabularTranslator",
            "columnMappings":
            {
                "UserId": "MyUserId",
                "Group": "MyGroup",
                "Name": "MyName"
            }
        }
    }
}

如果使用语法 "columnMappings": "UserId: MyUserId, Group: MyGroup, Name: MyName" 指定列映射,则仍然按原样支持它。If you are using the syntax of "columnMappings": "UserId: MyUserId, Group: MyGroup, Name: MyName" to specify column mapping, it is still supported as-is.

备用架构映射Alternative schema mapping

可以指定复制活动 -> translator -> schemaMapping,以便在分层数据和表格形式的数据之间进行映射(例如,将数据从 MongoDB/REST 复制到文本文件以及从 Oracle 复制到 Auzre Cosmos DB 的 API for MongoDB)。You can specify copy activity -> translator -> schemaMapping to map between hierarchical-shaped data and tabular-shaped data, e.g. copy from MongoDB/REST to text file and copy from Oracle to Azure Cosmos DB's API for MongoDB. 复制活动 translator 部分支持以下属性:The following properties are supported in copy activity translator section:

属性Property 说明Description 必须Required
typetype 复制活动转换器的 type 属性必须设置为:TabularTranslatorThe type property of the copy activity translator must be set to: TabularTranslator Yes
schemaMappingschemaMapping 键值对的集合,代表从源端到接收器端的映射关系。A collection of key-value pairs, which represents the mapping relation from source side to sink side.
- 键: 代表源。- Key: represents source. 对于表格源,指定数据集结构中定义的列名称;对于分层源,指定要提取和映射的每个字段的 JSON 路径表达式。For tabular source, specify the column name as defined in dataset structure; for hierarchical source, specify the JSON path expression for each field to extract and map.
- 值: 代表接收器。- Value: represents sink. 对于表格接收器,指定数据集结构中定义的列名称;对于分层接收器,指定要提取和映射的每个字段的 JSON 路径表达式。For tabular sink, specify the column name as defined in dataset structure; for hierarchical sink, specify the JSON path expression for each field to extract and map.
在使用分层数据时,对于根对象下的字段,JSON 路径以根 $ 开头;对于按 collectionReference 属性选择的数组中的字段,JSON 路径以数组元素开头。In the case of hierarchical data, for fields under root object, JSON path starts with root $; for fields inside the array chosen by collectionReference property, JSON path starts from the array element.
Yes
collectionReferencecollectionReference 若要进行迭代操作,以同一模式从数组字段中的对象提取数据并按行和对象进行转换,请指定要进行交叉应用的该数组的 JSON 路径。If you want to iterate and extract data from the objects inside an array field with the same pattern and convert to per row per object, specify the JSON path of that array to do cross-apply. 仅当分层数据为源时,才支持此属性。This property is supported only when hierarchical data is source. No

示例:从 MongoDB 复制到 Oracle:Example: copy from MongoDB to Oracle:

例如,如果 MongoDB 文档的内容如下:For example, if you have MongoDB document with the following content:

{
    "id": {
        "$oid": "592e07800000000000000000"
    },
    "number": "01",
    "date": "20170122",
    "orders": [
        {
            "prod": "p1",
            "price": 23
        },
        {
            "prod": "p2",
            "price": 13
        },
        {
            "prod": "p3",
            "price": 231
        }
    ],
    "city": [ { "name": "Seattle" } ]
}

而你需要按以下格式通过平展数组中数据(order_pd 和 order_price)的方式将其复制到 Azure SQL 表中,并使用常见的根信息(数字、日期和城市) 进行交叉联接: and you want to copy it into an Azure SQL table in the following format, by flattening the data inside the array (order_pd and order_price) and cross join with the common root info (number, date, and city):

orderNumberorderNumber orderDateorderDate order_pdorder_pd order_priceorder_price citycity
0101 2017012220170122 P1P1 2323 西雅图Seattle
0101 2017012220170122 P2P2 1313 西雅图Seattle
0101 2017012220170122 P3P3 231231 西雅图Seattle

将架构映射规则配置为以下复制活动 JSON 示例:Configure the schema-mapping rule as the following copy activity JSON sample:

{
    "name": "CopyFromMongoDBToOracle",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "MongoDbV2Source"
        },
        "sink": {
            "type": "OracleSink"
        },
        "translator": {
            "type": "TabularTranslator",
            "schemaMapping": {
                "$.number": "orderNumber",
                "$.date": "orderDate",
                "prod": "order_pd",
                "price": "order_price",
                "$.city[0].name": "city"
            },
            "collectionReference":  "$.orders"
        }
    }
}

数据类型映射Data type mapping

复制活动按照以下 2 个步骤执行源类型到接收器类型的映射:Copy activity performs source types to sink types mapping with the following 2-step approach:

  1. 从本机源类型转换为 Azure 数据工厂临时数据类型Convert from native source types to Azure Data Factory interim data types
  2. 从 Azure 数据工厂临时数据类型转换为本机接收器类型Convert from Azure Data Factory interim data types to native sink type

复制活动支持以下临时数据类型:Copy activity supports the following interim data types:

  • Byte[]Byte[]
  • 布尔Boolean
  • datetimeDatetime
  • DatetimeoffsetDatetimeoffset
  • 小数Decimal
  • DoubleDouble
  • GuidGuid
  • Int16Int16
  • Int32Int32
  • Int64Int64
  • SingleSingle
  • StringString
  • TimespanTimespan

后续步骤Next steps

请参阅其他复制活动文章:See the other Copy Activity articles: