Parse transformation in mapping data flow

Applies to: Azure Data Factory, Azure Synapse Analytics

Use the Parse transformation to parse columns in your data that are in document form. The currently supported types of embedded documents that can be parsed are JSON and delimited text.
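For example, a string column might hold an embedded JSON document such as the following (the value shown is illustrative); the Parse transformation can convert that string into a complex column whose schema you define, exposing trade as a boolean and customers as a string array:

    {"trade": true, "customers": ["contoso", "fabrikam"]}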

Configuration

In the parse transformation configuration panel, you first pick the type of data contained in the columns that you wish to parse inline. The parse transformation also contains the following configuration settings.

Parse settings

Column

Similar to derived columns and aggregates, this is where you either modify an existing column by selecting it from the drop-down picker or type in the name of a new column. ADF stores the parsed source data in this column.
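For instance, in the example script later in this article, the new column json on the left side of the assignment receives the parsed output of the existing jsonString column:

    parse(json = jsonString ? (trade as boolean,
            customers as string[]),
        format: 'json',
        documentForm: 'arrayOfDocuments') ~> ParseJson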

Expression

Use the expression builder to set the source for your parsing. This can be as simple as selecting the source column with the self-contained data that you wish to parse, or you can create complex expressions to parse.
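As a hedged sketch, assuming a hypothetical incoming column named rawPayload whose JSON value is padded with whitespace, a trim expression could clean the value before it is parsed (the column and stream names here are illustrative):

    parse(json = trim(rawPayload) ? (trade as boolean,
            customers as string[]),
        format: 'json',
        documentForm: 'singleDocument') ~> ParseTrimmedPayload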

Output column type

This is where you configure the target output schema from the parsing, which will be written into a single column.
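The output schema is expressed as a complex type definition. For example, the delimited-text example later in this article targets the following output type:

    (id as integer, name as string, year as string)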

Parse example

In this example, we have defined parsing of the incoming field "jsonString", which is plain text but formatted as a JSON structure. We're going to store the parsed results as JSON in a new column called "json" with this schema:

(trade as boolean, customers as string[])

Refer to the inspect tab and data preview to verify that your output is mapped properly.

Examples

source(output(
        name as string,
        location as string,
        satellites as string[],
        goods as (trade as boolean, customers as string[], orders as (orderId as string, orderTotal as double, shipped as (orderItems as (itemName as string, itemQty as string)[]))[])
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false,
    documentForm: 'documentPerLine') ~> JsonSource
source(output(
        movieId as string,
        title as string,
        genres as string
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> CsvSource
JsonSource derive(jsonString = toString(goods)) ~> StringifyJson
StringifyJson parse(json = jsonString ? (trade as boolean,
        customers as string[]),
    format: 'json',
    documentForm: 'arrayOfDocuments') ~> ParseJson
CsvSource derive(csvString = 'Id|name|year\n\'1\'|\'test1\'|\'1999\'') ~> CsvString
CsvString parse(csv = csvString ? (id as integer,
        name as string,
        year as string),
    format: 'delimited',
    columnNamesAsHeader: true,
    columnDelimiter: '|',
    nullValue: '',
    documentForm: 'documentPerLine') ~> ParseCsv
ParseJson select(mapColumn(
        jsonString,
        json
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> KeepStringAndParsedJson
ParseCsv select(mapColumn(
        csvString,
        csv
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> KeepStringAndParsedCsv

Data flow script

Syntax
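As a general shape, inferred from the examples that follow (placeholders in angle brackets are illustrative, not literal syntax):

    <incomingStream> parse(<parsedColumnName> = <expression> ? (<complexOutputType>),
        format: '<json or delimited>',
        columnNamesAsHeader: <true/false>,
        columnDelimiter: '<delimiter>',
        nullValue: '<nullValue>',
        documentForm: '<singleDocument/documentPerLine/arrayOfDocuments>') ~> <parseTransformationName>

In the examples below, the delimited-specific settings (columnNamesAsHeader, columnDelimiter, and nullValue) are used only when the format is 'delimited'.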

Examples

parse(json = jsonString ? (trade as boolean,
                                customers as string[]),
                format: 'json',
                documentForm: 'singleDocument') ~> ParseJson

parse(csv = csvString ? (id as integer,
                                name as string,
                                year as string),
                format: 'delimited',
                columnNamesAsHeader: true,
                columnDelimiter: '|',
                nullValue: '',
                documentForm: 'documentPerLine') ~> ParseCsv

Next steps