Schema and data type mapping in copy activity

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

This article describes how the Azure Data Factory copy activity performs schema mapping and data type mapping from source data to sink data.

Schema mapping

Default mapping

By default, copy activity maps source data to sink by column names in a case-sensitive manner. If the sink doesn't exist, for example, when writing to files, the source field names are persisted as the sink names. This default mapping supports flexible schemas and schema drift from source to sink across executions: all the data returned by the source data store can be copied to the sink.

If your source is a text file without a header line, explicit mapping is required because the source doesn't contain column names.

Explicit mapping

You can also specify explicit mapping to customize the column/field mapping from source to sink based on your needs. With explicit mapping, you can copy only part of the source data to the sink, map source data to sink fields with different names, or reshape tabular/hierarchical data. Copy activity:

  1. Reads the data from the source and determines the source schema.
  2. Applies your defined mapping.
  3. Writes the data to the sink.

Learn more about:

  • Tabular source to tabular sink
  • Hierarchical source to tabular sink
  • Tabular/hierarchical source to hierarchical sink

You can configure the mapping on the Data Factory authoring UI -> copy activity -> Mapping tab, or programmatically specify the mapping in the copy activity -> translator property. The following properties are supported in the translator -> mappings array -> objects -> source and sink, which point to the specific column/field to map data.

Property | Description | Required
name | Name of the source or sink column/field. Applies to tabular sources and sinks. | Yes
ordinal | Column index, starting from 1. Applies to, and is required for, delimited text without a header line. | No
path | JSON path expression for each field to extract or map. Applies to hierarchical sources and sinks, for example, Cosmos DB, MongoDB, or REST connectors. For fields under the root object, the JSON path starts with root $; for fields inside the array chosen by the collectionReference property, the JSON path starts from the array element without $. | No
type | Data Factory interim data type of the source or sink column. In general, you don't need to specify or change this property. Learn more about data type mapping. | No
culture | Culture of the source or sink column. Applies when type is Datetime or Datetimeoffset. The default is en-us. In general, you don't need to specify or change this property. | No
format | Format string to use when type is Datetime or Datetimeoffset. Refer to Custom Date and Time Format Strings for how to format datetime values. In general, you don't need to specify or change this property. | No
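Putting the per-field properties together, here is a sketch of a translator that uses name, type, culture, and format on a datetime column (the column names are illustrative, echoing the Salesforce example later in this article; a real mapping only needs these properties when the defaults don't fit):

```json
{
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": {
                "name": "LastModifiedDate",
                "type": "DateTime",
                "culture": "en-us",
                "format": "yyyy-MM-dd HH:mm:ss"
            },
            "sink": {
                "name": "ModifiedDate",
                "type": "DateTime"
            }
        }
    ]
}
```

For a delimited text source without a header line, the source object would carry an ordinal such as "ordinal": "1" instead of a name.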

mappings 外,translator 下还支持以下属性:The following properties are supported under translator in addition to mappings:

Property | Description | Required
collectionReference | Applies when copying data from a hierarchical source, for example, Cosmos DB, MongoDB, or REST connectors. If you want to iterate and extract data from the objects inside an array field with the same pattern, converting to one row per object, specify the JSON path of that array to cross-apply. | No

Tabular source to tabular sink

For example, to copy data from Salesforce to Azure SQL Database and explicitly map three columns:

  1. On the copy activity -> Mapping tab, click the Import schema button to import both the source and sink schemas.

  2. Map the needed fields and exclude/delete the rest.

(Screenshot: map from tabular source to tabular sink)

The same mapping can be configured as the following in the copy activity payload (see translator):

{
    "name": "CopyActivityTabularToTabular",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "SalesforceSource" },
        "sink": { "type": "SqlSink" },
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {
                    "source": { "name": "Id" },
                    "sink": { "name": "CustomerID" }
                },
                {
                    "source": { "name": "Name" },
                    "sink": { "name": "LastName" }
                },
                {
                    "source": { "name": "LastModifiedDate" },
                    "sink": { "name": "ModifiedDate" }
                }
            ]
        }
    },
    ...
}

To copy data from delimited text files without a header line, the columns are represented by ordinal instead of by name:

{
    "name": "CopyActivityTabularToTabular",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "SqlSink" },
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {
                    "source": { "ordinal": "1" },
                    "sink": { "name": "CustomerID" }
                }, 
                {
                    "source": { "ordinal": "2" },
                    "sink": { "name": "LastName" }
                }, 
                {
                    "source": { "ordinal": "3" },
                    "sink": { "name": "ModifiedDate" }
                }
            ]
        }
    },
    ...
}

Hierarchical source to tabular sink

When copying data from a hierarchical source to a tabular sink, copy activity supports the following capabilities:

  • Extract data from objects and arrays.
  • Cross-apply multiple objects with the same pattern from an array, converting one JSON object into multiple records in the tabular result.

For example, suppose you have a source MongoDB document with the following content:

{
    "id": {
        "$oid": "592e07800000000000000000"
    },
    "number": "01",
    "date": "20170122",
    "orders": [
        {
            "prod": "p1",
            "price": 23
        },
        {
            "prod": "p2",
            "price": 13
        },
        {
            "prod": "p3",
            "price": 231
        }
    ],
    "city": [ { "name": "Seattle" } ]
}

And you want to copy it into a text file with a header line, in the following format, by flattening the data inside the array (order_pd and order_price) and cross-joining with the common root info (number, date, and city):

orderNumber | orderDate | order_pd | order_price | city
01 | 20170122 | P1 | 23 | Seattle
01 | 20170122 | P2 | 13 | Seattle
01 | 20170122 | P3 | 231 | Seattle

You can define such a mapping on the Data Factory authoring UI:

  1. On the copy activity -> Mapping tab, click the Import schema button to import both the source and sink schemas. Because Data Factory samples only the top few objects when importing the schema, if any field doesn't show up, you can add it to the correct layer in the hierarchy: hover over an existing field name and choose to add a node, an object, or an array.

  2. Select the array from which you want to iterate and extract data. It will be auto-populated as the Collection reference. Note that only a single array is supported for this operation.

  3. Map the needed fields to the sink. Data Factory automatically determines the corresponding JSON paths for the hierarchical side.

(Screenshot: map from hierarchical source to tabular sink by using the UI)

You can also switch to the Advanced editor, in which case you can directly see and edit the fields' JSON paths. If you choose to add a new mapping in this view, specify the JSON path.

(Screenshot: map from hierarchical source to tabular sink by using the advanced editor)

The same mapping can be configured as the following in the copy activity payload (see translator):

{
    "name": "CopyActivityHierarchicalToTabular",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "MongoDbV2Source" },
        "sink": { "type": "DelimitedTextSink" },
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {
                    "source": { "path": "$['number']" },
                    "sink": { "name": "orderNumber" }
                },
                {
                    "source": { "path": "$['date']" },
                    "sink": { "name": "orderDate" }
                },
                {
                    "source": { "path": "['prod']" },
                    "sink": { "name": "order_pd" }
                },
                {
                    "source": { "path": "['price']" },
                    "sink": { "name": "order_price" }
                },
                {
                    "source": { "path": "$['city'][0]['name']" },
                    "sink": { "name": "city" }
                }
            ],
            "collectionReference": "$['orders']"
        }
    },
    ...
}

Tabular/hierarchical source to hierarchical sink

The user experience flow is similar to that of hierarchical source to tabular sink.

When copying data from a tabular source to a hierarchical sink, writing to an array inside an object is not supported.

When copying data from a hierarchical source to a hierarchical sink, you can also preserve an entire layer's hierarchy by selecting the object/array and mapping it to the sink without touching the inner fields.
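As a hedged sketch of that hierarchy-preserving mapping (the connector types and field names here are assumptions, reusing the MongoDB document from earlier), the orders array is mapped as a whole, without listing its inner prod or price fields:

```json
{
    "name": "CopyActivityHierarchicalToHierarchical",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "MongoDbV2Source" },
        "sink": { "type": "CosmosDbMongoDbApiSink" },
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {
                    "source": { "path": "$['number']" },
                    "sink": { "path": "$['orderNumber']" }
                },
                {
                    "source": { "path": "$['orders']" },
                    "sink": { "path": "$['orders']" }
                }
            ]
        }
    },
    ...
}
```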

Parameterize mapping

If you want to create a templatized pipeline to copy a large number of objects dynamically, determine whether you can use the default mapping or whether you need to define explicit mappings for the respective objects.

If explicit mapping is needed, you can:

  1. Define a parameter with the object type at the pipeline level, for example, mapping.

  2. Parameterize the mapping: on the copy activity -> Mapping tab, choose to add dynamic content and select the parameter defined above. The activity payload would be as follows:

    {
        "name": "CopyActivityHierarchicalToTabular",
        "type": "Copy",
        "typeProperties": {
            "source": {...},
            "sink": {...},
            "translator": {
                "value": "@pipeline().parameters.mapping",
                "type": "Expression"
            },
            ...
        }
    }
    
  3. Construct the value to pass into the mapping parameter. It should be the entire translator definition object; refer to the samples in the explicit mapping section. For example, for a tabular-source-to-tabular-sink copy, the value should be {"type":"TabularTranslator","mappings":[{"source":{"name":"Id"},"sink":{"name":"CustomerID"}},{"source":{"name":"Name"},"sink":{"name":"LastName"}},{"source":{"name":"LastModifiedDate"},"sink":{"name":"ModifiedDate"}}]}.
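To make step 1 concrete, here is a sketch of the pipeline-level parameter definition (the pipeline name is hypothetical; the defaultValue reuses a shortened form of the tabular-to-tabular translator shown earlier, and at run time you can pass a different translator object per copied table):

```json
{
    "name": "TemplatizedCopyPipeline",
    "properties": {
        "parameters": {
            "mapping": {
                "type": "object",
                "defaultValue": {
                    "type": "TabularTranslator",
                    "mappings": [
                        {
                            "source": { "name": "Id" },
                            "sink": { "name": "CustomerID" }
                        }
                    ]
                }
            }
        },
        "activities": [ ... ]
    }
}
```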

Data type mapping

Copy activity performs mapping from source types to sink types with the following flow:

  1. Convert from source native data types to Azure Data Factory interim data types.
  2. Automatically convert the interim data type as needed to match the corresponding sink type, applicable for both default mapping and explicit mapping.
  3. Convert from Azure Data Factory interim data types to sink native data types.

Copy activity currently supports the following interim data types: Boolean, Byte, Byte array, Datetime, DatetimeOffset, Decimal, Double, GUID, Int16, Int32, Int64, SByte, Single, String, Timespan, UInt16, UInt32, and UInt64.

The following data type conversions are supported between the interim types from source to sink.

Source\Sink | Boolean | Byte array | Decimal | Date/Time (1) | Float-point (2) | GUID | Integer (3) | String | TimeSpan
Boolean
Byte array
Date/Time
Decimal
Float-point
GUID
Integer
String
TimeSpan

(1) Date/Time includes DateTime and DateTimeOffset.

(2) Float-point includes Single and Double.

(3) Integer includes SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, and UInt64.

Note

  • Currently, such data type conversion is supported when copying between tabular data. Hierarchical sources/sinks are not supported, which means there is no system-defined data type conversion between the source and sink interim types.
  • This feature works with the latest dataset model. If you don't see this option in the UI, try creating a new dataset.

The following properties are supported in copy activity for data type conversion (under the translator section for programmatic authoring):

Property | Description | Required
typeConversion | Enables the new data type conversion experience. The default value is false for backward compatibility. For new copy activities created via the Data Factory authoring UI since late June 2020, this data type conversion is enabled by default for the best experience, and you can see the type conversion settings on the copy activity -> Mapping tab for applicable scenarios. To create a pipeline programmatically, you need to explicitly set the typeConversion property to true to enable it. For existing copy activities created before this feature was released, you won't see the type conversion options on the Data Factory authoring UI, for backward compatibility. | No
typeConversionSettings | A group of type conversion settings. Applies when typeConversion is set to true. The following properties are all under this group. | No

Under typeConversionSettings:

Property | Description | Required
allowDataTruncation | Allows data truncation when converting source data to a sink with a different type during copy, for example, from Decimal to Integer or from DatetimeOffset to Datetime. The default value is true. | No
treatBooleanAsNumber | Treats booleans as numbers, for example, true as 1. The default value is false. | No
dateTimeFormat | Format string when converting between dates without a time zone offset and strings, for example, yyyy-MM-dd HH:mm:ss.fff. Refer to Custom Date and Time Format Strings for detailed information. | No
dateTimeOffsetFormat | Format string when converting between dates with a time zone offset and strings, for example, yyyy-MM-dd HH:mm:ss.fff zzz. Refer to Custom Date and Time Format Strings for detailed information. | No
timeSpanFormat | Format string when converting between time spans and strings, for example, dd\.hh\:mm. Refer to Custom TimeSpan Format Strings for detailed information. | No
culture | Culture information to use when converting types, for example, en-us or fr-fr. | No

Example:

{
    "name": "CopyActivity",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "ParquetSource"
        },
        "sink": {
            "type": "SqlSink"
        },
        "translator": {
            "type": "TabularTranslator",
            "typeConversion": true,
            "typeConversionSettings": {
                "allowDataTruncation": true,
                "treatBooleanAsNumber": true,
                "dateTimeFormat": "yyyy-MM-dd HH:mm:ss.fff",
                "dateTimeOffsetFormat": "yyyy-MM-dd HH:mm:ss.fff zzz",
                "timeSpanFormat": "dd\.hh\:mm",
                "culture": "en-gb"
            }
        }
    },
    ...
}

Legacy models

Note

The following models for mapping source columns/fields to the sink are still supported as-is for backward compatibility. We suggest that you use the new model described in Schema mapping. The Data Factory authoring UI has switched to generating the new model.

Alternative column-mapping (legacy model)

You can specify copy activity -> translator -> columnMappings to map between tabular-shaped data. In this case, the "structure" section is required for both the input and output datasets. Column mapping supports mapping all or a subset of the columns in the source dataset "structure" to all columns in the sink dataset "structure". The following are error conditions that result in an exception:

  • The source data store query result does not have a column name that is specified in the input dataset "structure" section.
  • The sink data store (if with a pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
  • The "structure" of the sink dataset has either fewer or more columns than are specified in the mapping.
  • Duplicate mapping.

In the following example, the input dataset has a structure, and it points to a table in an on-premises Oracle database.

{
    "name": "OracleDataset",
    "properties": {
        "structure":
         [
            { "name": "UserId"},
            { "name": "Name"},
            { "name": "Group"}
         ],
        "type": "OracleTable",
        "linkedServiceName": {
            "referenceName": "OracleLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "SourceTable"
        }
    }
}

In this sample, the output dataset has a structure, and it points to a table in Salesforce.

{
    "name": "SalesforceDataset",
    "properties": {
        "structure":
        [
            { "name": "MyUserId"},
            { "name": "MyName" },
            { "name": "MyGroup"}
        ],
        "type": "SalesforceObject",
        "linkedServiceName": {
            "referenceName": "SalesforceLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "SinkTable"
        }
    }
}

The following JSON defines a copy activity in a pipeline. The columns from the source are mapped to columns in the sink by using the translator -> columnMappings property.

{
    "name": "CopyActivity",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "OracleDataset",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "SalesforceDataset",
            "type": "DatasetReference"
        }
    ],
    "typeProperties":    {
        "source": { "type": "OracleSource" },
        "sink": { "type": "SalesforceSink" },
        "translator":
        {
            "type": "TabularTranslator",
            "columnMappings":
            {
                "UserId": "MyUserId",
                "Group": "MyGroup",
                "Name": "MyName"
            }
        }
    }
}

If you are using the syntax "columnMappings": "UserId: MyUserId, Group: MyGroup, Name: MyName" to specify column mapping, it is still supported as-is.
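Embedded in the copy activity translator, that string form would look like the following sketch (equivalent to the object form in the example above; the surrounding activity framing is elided):

```json
"translator": {
    "type": "TabularTranslator",
    "columnMappings": "UserId: MyUserId, Group: MyGroup, Name: MyName"
}
```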

Alternative schema-mapping (legacy model)

You can specify copy activity -> translator -> schemaMapping to map between hierarchical-shaped data and tabular-shaped data, for example, to copy from MongoDB/REST to a text file, or from Oracle to Azure Cosmos DB's API for MongoDB. The following properties are supported in the copy activity translator section:

Property | Description | Required
type | The type property of the copy activity translator must be set to TabularTranslator. | Yes
schemaMapping | A collection of key-value pairs that represents the mapping relation from the source side to the sink side. Key: represents the source; for a tabular source, specify the column name as defined in the dataset structure, and for a hierarchical source, specify the JSON path expression for each field to extract and map. Value: represents the sink; for a tabular sink, specify the column name as defined in the dataset structure, and for a hierarchical sink, specify the JSON path expression for each field to extract and map. In the case of hierarchical data, for fields under the root object, the JSON path starts with root $; for fields inside the array chosen by the collectionReference property, the JSON path starts from the array element. | Yes
collectionReference | If you want to iterate and extract data from the objects inside an array field with the same pattern, converting to one row per object, specify the JSON path of that array to cross-apply. This property is supported only when hierarchical data is the source. | No

Example: copy from MongoDB to Oracle

For example, if you have a MongoDB document with the following content:

{
    "id": {
        "$oid": "592e07800000000000000000"
    },
    "number": "01",
    "date": "20170122",
    "orders": [
        {
            "prod": "p1",
            "price": 23
        },
        {
            "prod": "p2",
            "price": 13
        },
        {
            "prod": "p3",
            "price": 231
        }
    ],
    "city": [ { "name": "Seattle" } ]
}

And you want to copy it into an Oracle table in the following format, by flattening the data inside the array (order_pd and order_price) and cross-joining with the common root info (number, date, and city):

orderNumber | orderDate | order_pd | order_price | city
01 | 20170122 | P1 | 23 | Seattle
01 | 20170122 | P2 | 13 | Seattle
01 | 20170122 | P3 | 231 | Seattle

Configure the schema-mapping rule as in the following copy activity JSON sample:

{
    "name": "CopyFromMongoDBToOracle",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "MongoDbV2Source"
        },
        "sink": {
            "type": "OracleSink"
        },
        "translator": {
            "type": "TabularTranslator",
            "schemaMapping": {
                "$.number": "orderNumber",
                "$.date": "orderDate",
                "prod": "order_pd",
                "price": "order_price",
                "$.city[0].name": "city"
            },
            "collectionReference":  "$.orders"
        }
    }
}

Next steps

See the other copy activity articles: