Delete Activity in Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

You can use the Delete activity in Azure Data Factory to delete files or folders from on-premises or cloud storage stores. Use this activity to clean up or archive files when they are no longer needed.

Warning

Deleted files or folders cannot be restored unless the storage has soft-delete enabled. Be cautious when using the Delete activity to delete files or folders.

Best practices

Here are some recommendations for using the Delete activity:

  • Back up your files before deleting them with the Delete activity, in case you need to restore them later.

  • Make sure that Data Factory has write permission on the storage store, which it needs in order to delete folders or files.

  • Make sure you are not deleting files that are being written at the same time.

  • To delete files or folders from an on-premises system, use a self-hosted integration runtime with a version greater than 3.14.

Supported data stores

Syntax

{
    "name": "DeleteActivity",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "<dataset name>",
            "type": "DatasetReference"
        },
        "storeSettings": {
            "type": "<source type>",
            "recursive": true/false,
            "maxConcurrentConnections": <number>
        },
        "enableLogging": true/false,
        "logStorageSettings": {
            "linkedServiceName": {
                "referenceName": "<name of linked service>",
                "type": "LinkedServiceReference"
            },
            "path": "<path to save log file>"
        }
    }
}

Type properties

Property | Description | Required
dataset | Provides the dataset reference that determines which files or folders are deleted. | Yes
recursive | Indicates whether files are deleted recursively from subfolders or only from the specified folder. | No. The default is false.
maxConcurrentConnections | The number of concurrent connections to the storage store used to delete folders or files. | No. The default is 1.
enableLogging | Indicates whether to record the names of the folders and files that have been deleted. If true, you must also provide a storage account to hold the log file, so that you can track the behavior of the Delete activity by reading the log file. | No
logStorageSettings | Only applicable when enableLogging = true. A group of storage properties that specify where to save the log file containing the names of the folders and files deleted by the Delete activity. | No
linkedServiceName | Only applicable when enableLogging = true. The Azure Storage or Azure Data Lake Storage Gen2 linked service that stores the log file containing the names of the folders and files deleted by the Delete activity. It must be configured with the same type of integration runtime as the one the Delete activity uses to delete files. | No
path | Only applicable when enableLogging = true. The path in your storage account where the log file is saved. If you do not provide a path, the service creates a container for you. | No
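The dependency between enableLogging and logStorageSettings can be made explicit when an activity definition is generated from code. The following is a minimal Python sketch, not part of any Azure SDK: the helper name build_delete_activity and its parameters are hypothetical, and it only assembles the JSON shape shown in the Syntax section.

```python
import json


def build_delete_activity(dataset_name, store_type, recursive=False,
                          max_connections=1, log_linked_service=None,
                          log_path=None):
    """Return a Delete activity definition as a plain dict (hypothetical helper)."""
    type_properties = {
        "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
        "storeSettings": {
            "type": store_type,
            "recursive": recursive,                      # default: false
            "maxConcurrentConnections": max_connections,  # default: 1
        },
        # Logging is switched on only when a linked service is supplied,
        # because logStorageSettings applies only when enableLogging is true.
        "enableLogging": log_linked_service is not None,
    }
    if log_linked_service is not None:
        log_settings = {
            "linkedServiceName": {
                "referenceName": log_linked_service,
                "type": "LinkedServiceReference",
            }
        }
        if log_path is not None:  # path is optional
            log_settings["path"] = log_path
        type_properties["logStorageSettings"] = log_settings
    return {"name": "DeleteActivity", "type": "Delete",
            "typeProperties": type_properties}


activity = build_delete_activity(
    "PartitionedFolder", "AzureBlobStorageReadSettings",
    recursive=True, log_linked_service="BloblinkedService",
    log_path="mycontainer/log")
print(json.dumps(activity, indent=4))
```

Calling the helper without log_linked_service produces a definition with enableLogging set to false and no logStorageSettings block.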

Monitoring

There are two places where you can see and monitor the results of the Delete activity:

  • The output of the Delete activity.
  • The log file.

Sample output of the Delete activity

{ 
  "datasetName": "AmazonS3",
  "type": "AmazonS3Object",
  "prefix": "test",
  "bucketName": "adf",
  "recursive": true,
  "isWildcardUsed": false,
  "maxConcurrentConnections": 2,  
  "filesDeleted": 4,
  "logPath": "https://sample.blob.core.chinacloudapi.cn/mycontainer/5c698705-a6e2-40bf-911e-e0a927de3f07",
  "effectiveIntegrationRuntime": "MyAzureIR (China East 2)",
  "executionDuration": 650
}
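If the activity output is collected by a script, the fields of interest can be read with a standard JSON parser. This is a small Python sketch under that assumption; the function name summarize is hypothetical, and the input is the sample output above with no additional fields.

```python
import json

# Copied from the sample output above.
sample_output = """
{
  "datasetName": "AmazonS3",
  "type": "AmazonS3Object",
  "prefix": "test",
  "bucketName": "adf",
  "recursive": true,
  "isWildcardUsed": false,
  "maxConcurrentConnections": 2,
  "filesDeleted": 4,
  "logPath": "https://sample.blob.core.chinacloudapi.cn/mycontainer/5c698705-a6e2-40bf-911e-e0a927de3f07",
  "effectiveIntegrationRuntime": "MyAzureIR (China East 2)",
  "executionDuration": 650
}
"""


def summarize(output_json):
    """Return a one-line summary of a Delete activity output (hypothetical helper)."""
    output = json.loads(output_json)
    return (f"{output['filesDeleted']} file(s) deleted via dataset "
            f"{output['datasetName']}; log: {output['logPath']}")


print(summarize(sample_output))
```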

Sample log file of the Delete activity

Name | Category | Status | Error
test1/yyy.json | File | Deleted |
test2/hello789.txt | File | Deleted |
test2/test3/hello000.txt | File | Deleted |
test2/test3/zzz.json | File | Deleted |
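The log can be checked programmatically to confirm that every entry was deleted. The sketch below assumes the log file is comma-separated with the four columns shown above; the on-disk format is an assumption here, so adjust the parsing to match the actual file.

```python
import csv
import io

# Assumed CSV rendering of the sample log shown above.
log_text = """Name,Category,Status,Error
test1/yyy.json,File,Deleted,
test2/hello789.txt,File,Deleted,
test2/test3/hello000.txt,File,Deleted,
test2/test3/zzz.json,File,Deleted,
"""

rows = list(csv.DictReader(io.StringIO(log_text)))

# Names of entries that were removed, and any rows with another status.
deleted = [row["Name"] for row in rows if row["Status"] == "Deleted"]
not_deleted = [row for row in rows if row["Status"] != "Deleted"]

print(f"{len(deleted)} deleted, {len(not_deleted)} with another status")
```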

Examples of using the Delete activity

Delete specific folders or files

The store has the following folder structure:

Root/
    Folder_A_1/
        1.txt
        2.txt
        3.csv
    Folder_A_2/
        4.txt
        5.csv
        Folder_B_1/
            6.txt
            7.csv
        Folder_B_2/
            8.txt

The Delete activity removes different folders or files depending on the combination of property values from the dataset and the Delete activity:

folderPath | fileName | recursive | Output (items deleted)
Root/Folder_A_2 | NULL | False | 4.txt and 5.csv in Folder_A_2. The subfolders Folder_B_1 and Folder_B_2 and their files remain.
Root/Folder_A_2 | NULL | True | Folder_A_2 and everything under it: 4.txt, 5.csv, Folder_B_1 (6.txt, 7.csv), and Folder_B_2 (8.txt).
Root/Folder_A_2 | *.txt | False | 4.txt only.
Root/Folder_A_2 | *.txt | True | 4.txt, 6.txt, and 8.txt.

Folder_A_1 and its files are untouched in every case.
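The way folderPath, fileName, and recursive combine can be modeled in a few lines of Python. This is an illustrative model of the selection rules described for the recursive property and wildcard file names, not the service's implementation; the function files_to_delete is hypothetical, and the model tracks files only, not the folders themselves.

```python
from fnmatch import fnmatchcase

# File paths from the folder structure above, relative to the store root.
FILES = [
    "Root/Folder_A_1/1.txt",
    "Root/Folder_A_1/2.txt",
    "Root/Folder_A_1/3.csv",
    "Root/Folder_A_2/4.txt",
    "Root/Folder_A_2/5.csv",
    "Root/Folder_A_2/Folder_B_1/6.txt",
    "Root/Folder_A_2/Folder_B_1/7.csv",
    "Root/Folder_A_2/Folder_B_2/8.txt",
]


def files_to_delete(folder_path, file_name=None, recursive=False):
    """Return the files a delete would select (illustrative model only)."""
    prefix = folder_path.rstrip("/") + "/"
    selected = []
    for path in FILES:
        if not path.startswith(prefix):
            continue  # outside the target folder
        relative = path[len(prefix):]
        if "/" in relative and not recursive:
            continue  # inside a subfolder, skipped unless recursive
        name = relative.rsplit("/", 1)[-1]
        # No fileName means every file matches; otherwise apply the wildcard.
        if file_name is None or fnmatchcase(name, file_name):
            selected.append(path)
    return selected


for file_name in (None, "*.txt"):
    for recursive in (False, True):
        print(file_name, recursive,
              files_to_delete("Root/Folder_A_2", file_name, recursive))
```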

Periodically clean up time-partitioned folders or files

You can create a pipeline that periodically cleans up time-partitioned folders or files. For example, suppose the folder structure looks like /mycontainer/2018/12/14/*.csv. You can use the ADF system variable from a schedule trigger to identify which folder or files to delete in each pipeline run.
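To see how a trigger time maps to a partition folder, the following Python sketch mirrors the 'yyyy/MM/dd' formatting that the sample pipeline applies to the trigger time. The function partition_folder and the example timestamp are hypothetical; the container name comes from the example path above.

```python
from datetime import datetime


def partition_folder(trigger_time_iso, container="mycontainer"):
    """Map an ISO 8601 trigger time to a yyyy/MM/dd folder path (illustrative)."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    scheduled = datetime.fromisoformat(trigger_time_iso.replace("Z", "+00:00"))
    return f"/{container}/{scheduled.strftime('%Y/%m/%d')}"


print(partition_folder("2018-12-14T23:59:00Z"))  # /mycontainer/2018/12/14
```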

Sample pipeline

{
    "name":"cleanup_time_partitioned_folder",
    "properties":{
        "activities":[
            {
                "name":"DeleteOneFolder",
                "type":"Delete",
                "dependsOn":[

                ],
                "policy":{
                    "timeout":"7.00:00:00",
                    "retry":0,
                    "retryIntervalInSeconds":30,
                    "secureOutput":false,
                    "secureInput":false
                },
                "userProperties":[

                ],
                "typeProperties":{
                    "dataset":{
                        "referenceName":"PartitionedFolder",
                        "type":"DatasetReference",
                        "parameters":{
                            "TriggerTime":{
                                "value":"@formatDateTime(pipeline().parameters.TriggerTime, 'yyyy/MM/dd')",
                                "type":"Expression"
                            }
                        }
                    },
                    "logStorageSettings":{
                        "linkedServiceName":{
                            "referenceName":"BloblinkedService",
                            "type":"LinkedServiceReference"
                        },
                        "path":"mycontainer/log"
                    },
                    "enableLogging":true,
                    "storeSettings":{
                        "type":"AzureBlobStorageReadSettings",
                        "recursive":true
                    }
                }
            }
        ],
        "parameters":{
            "TriggerTime":{
                "type":"string"
            }
        },
        "annotations":[

        ]
    }
}

Sample dataset

{
    "name":"PartitionedFolder",
    "properties":{
        "linkedServiceName":{
            "referenceName":"BloblinkedService",
            "type":"LinkedServiceReference"
        },
        "parameters":{
            "TriggerTime":{
                "type":"string"
            }
        },
        "annotations":[

        ],
        "type":"Binary",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "folderPath":{
                    "value":"@dataset().TriggerTime",
                    "type":"Expression"
                },
                "container":{
                    "value":"mycontainer",
                    "type":"Expression"
                }
            }
        }
    }
}

Sample trigger

{
    "name": "DailyTrigger",
    "properties": {
        "runtimeState": "Started",
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "cleanup_time_partitioned_folder",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "TriggerTime": "@trigger().scheduledTime"
                }
            }
        ],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2018-12-13T00:00:00.000Z",
                "timeZone": "UTC",
                "schedule": {
                    "minutes": [
                        59
                    ],
                    "hours": [
                        23
                    ]
                }
            }
        }
    }
}

Clean up expired files that were last modified before January 1, 2018

You can create a pipeline that cleans up old or expired files by using the "LastModified" file attribute filter, which the sample below sets through modifiedDatetimeEnd.

Sample pipeline

{
    "name":"CleanupExpiredFiles",
    "properties":{
        "activities":[
            {
                "name":"DeleteFilebyLastModified",
                "type":"Delete",
                "dependsOn":[

                ],
                "policy":{
                    "timeout":"7.00:00:00",
                    "retry":0,
                    "retryIntervalInSeconds":30,
                    "secureOutput":false,
                    "secureInput":false
                },
                "userProperties":[

                ],
                "typeProperties":{
                    "dataset":{
                        "referenceName":"BlobFilesLastModifiedBefore201811",
                        "type":"DatasetReference"
                    },
                    "logStorageSettings":{
                        "linkedServiceName":{
                            "referenceName":"BloblinkedService",
                            "type":"LinkedServiceReference"
                        },
                        "path":"mycontainer/log"
                    },
                    "enableLogging":true,
                    "storeSettings":{
                        "type":"AzureBlobStorageReadSettings",
                        "recursive":true,
                        "modifiedDatetimeEnd":"2018-01-01T00:00:00.000Z"
                    }
                }
            }
        ],
        "annotations":[

        ]
    }
}

Sample dataset

{
    "name":"BlobFilesLastModifiedBefore201811",
    "properties":{
        "linkedServiceName":{
            "referenceName":"BloblinkedService",
            "type":"LinkedServiceReference"
        },
        "annotations":[

        ],
        "type":"Binary",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "fileName":"*",
                "folderPath":"mydirectory",
                "container":"mycontainer"
            }
        }
    }
}
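The effect of the last-modified filter can be previewed locally before running a destructive pipeline. The sketch below is a Python model only: the file names and timestamps are hypothetical, and it assumes the modifiedDatetimeEnd bound is exclusive, so a file modified exactly at the bound is kept.

```python
from datetime import datetime, timezone

# Same bound as modifiedDatetimeEnd in the sample pipeline.
MODIFIED_DATETIME_END = datetime(2018, 1, 1, tzinfo=timezone.utc)

# Hypothetical files and their last-modified times.
files = {
    "mydirectory/old_report.csv": datetime(2017, 6, 30, 8, 0, tzinfo=timezone.utc),
    "mydirectory/year_end.csv": datetime(2017, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
    "mydirectory/new_year.csv": datetime(2018, 1, 1, tzinfo=timezone.utc),
    "mydirectory/current.csv": datetime(2018, 3, 15, 12, 0, tzinfo=timezone.utc),
}


def expired(candidates, end):
    """Return names modified strictly before `end` (exclusive bound assumed)."""
    return sorted(name for name, modified in candidates.items() if modified < end)


print(expired(files, MODIFIED_DATETIME_END))
```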

Move files by chaining the Copy activity and the Delete activity

You can move a file by using a Copy activity to copy the file and then a Delete activity to delete it, in the same pipeline. To move multiple files, you can chain the GetMetadata, Filter, ForEach, Copy, and Delete activities, as in the following sample.

Note

Be very careful if you want to move an entire folder by defining a dataset that contains only a folder path and then referencing that same dataset from both a Copy activity and a Delete activity. You must make sure that no new files arrive in the folder between the copy operation and the delete operation. If a new file arrives after the Copy activity has finished but before the Delete activity has started, the Delete activity may delete that file, which has not yet been copied to the destination, when it deletes the entire folder.
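The race described in this note can be reproduced with a small in-memory model. In the Python sketch below, plain dictionaries stand in for the source and destination folders, and the function names are hypothetical. It contrasts deleting the whole folder after a bulk copy with the per-file pattern, where only files that were listed and copied are deleted.

```python
def move_folder(source, destination, arrivals):
    """Copy everything, then delete the whole folder (the risky pattern)."""
    destination.update(source)   # the copy step finishes
    source.update(arrivals)      # a new file lands before the delete starts
    source.clear()               # the delete step removes the entire folder


def move_per_file(source, destination, arrivals):
    """List once, then copy and delete each listed file individually."""
    listed = list(source)        # file list captured once, up front
    source.update(arrivals)      # a new file lands after the listing
    for name in listed:
        destination[name] = source[name]  # copy one file
        del source[name]                  # delete only that file


late = {"late.csv": b"3"}

src, dst = {"a.csv": b"1", "b.csv": b"2"}, {}
move_folder(src, dst, late)
print("folder move, lost:", sorted(set(late) - set(dst) - set(src)))

src, dst = {"a.csv": b"1", "b.csv": b"2"}, {}
move_per_file(src, dst, late)
print("per-file move, still in source:", sorted(src))
```

In the folder-level move, late.csv ends up in neither location. In the per-file move it stays in the source and is picked up by the next run.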

Sample pipeline

{
    "name":"MoveFiles",
    "properties":{
        "activities":[
            {
                "name":"GetFileList",
                "type":"GetMetadata",
                "dependsOn":[

                ],
                "policy":{
                    "timeout":"7.00:00:00",
                    "retry":0,
                    "retryIntervalInSeconds":30,
                    "secureOutput":false,
                    "secureInput":false
                },
                "userProperties":[

                ],
                "typeProperties":{
                    "dataset":{
                        "referenceName":"OneSourceFolder",
                        "type":"DatasetReference",
                        "parameters":{
                            "Container":{
                                "value":"@pipeline().parameters.SourceStore_Location",
                                "type":"Expression"
                            },
                            "Directory":{
                                "value":"@pipeline().parameters.SourceStore_Directory",
                                "type":"Expression"
                            }
                        }
                    },
                    "fieldList":[
                        "childItems"
                    ],
                    "storeSettings":{
                        "type":"AzureBlobStorageReadSettings",
                        "recursive":true
                    },
                    "formatSettings":{
                        "type":"BinaryReadSettings"
                    }
                }
            },
            {
                "name":"FilterFiles",
                "type":"Filter",
                "dependsOn":[
                    {
                        "activity":"GetFileList",
                        "dependencyConditions":[
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties":[

                ],
                "typeProperties":{
                    "items":{
                        "value":"@activity('GetFileList').output.childItems",
                        "type":"Expression"
                    },
                    "condition":{
                        "value":"@equals(item().type, 'File')",
                        "type":"Expression"
                    }
                }
            },
            {
                "name":"ForEachFile",
                "type":"ForEach",
                "dependsOn":[
                    {
                        "activity":"FilterFiles",
                        "dependencyConditions":[
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties":[

                ],
                "typeProperties":{
                    "items":{
                        "value":"@activity('FilterFiles').output.value",
                        "type":"Expression"
                    },
                    "batchCount":20,
                    "activities":[
                        {
                            "name":"CopyAFile",
                            "type":"Copy",
                            "dependsOn":[

                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[

                            ],
                            "typeProperties":{
                                "source":{
                                    "type":"BinarySource",
                                    "storeSettings":{
                                        "type":"AzureBlobStorageReadSettings",
                                        "recursive":false,
                                        "deleteFilesAfterCompletion":false
                                    },
                                    "formatSettings":{
                                        "type":"BinaryReadSettings"
                                    },
                                    "recursive":false
                                },
                                "sink":{
                                    "type":"BinarySink",
                                    "storeSettings":{
                                        "type":"AzureBlobStorageWriteSettings"
                                    }
                                },
                                "enableStaging":false,
                                "dataIntegrationUnits":0
                            },
                            "inputs":[
                                {
                                    "referenceName":"OneSourceFile",
                                    "type":"DatasetReference",
                                    "parameters":{
                                        "Container":{
                                            "value":"@pipeline().parameters.SourceStore_Location",
                                            "type":"Expression"
                                        },
                                        "Directory":{
                                            "value":"@pipeline().parameters.SourceStore_Directory",
                                            "type":"Expression"
                                        },
                                        "filename":{
                                            "value":"@item().name",
                                            "type":"Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs":[
                                {
                                    "referenceName":"OneDestinationFile",
                                    "type":"DatasetReference",
                                    "parameters":{
                                        "Container":{
                                            "value":"@pipeline().parameters.DestinationStore_Location",
                                            "type":"Expression"
                                        },
                                        "Directory":{
                                            "value":"@pipeline().parameters.DestinationStore_Directory",
                                            "type":"Expression"
                                        },
                                        "filename":{
                                            "value":"@item().name",
                                            "type":"Expression"
                                        }
                                    }
                                }
                            ]
                        },
                        {
                            "name":"DeleteAFile",
                            "type":"Delete",
                            "dependsOn":[
                                {
                                    "activity":"CopyAFile",
                                    "dependencyConditions":[
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[

                            ],
                            "typeProperties":{
                                "dataset":{
                                    "referenceName":"OneSourceFile",
                                    "type":"DatasetReference",
                                    "parameters":{
                                        "Container":{
                                            "value":"@pipeline().parameters.SourceStore_Location",
                                            "type":"Expression"
                                        },
                                        "Directory":{
                                            "value":"@pipeline().parameters.SourceStore_Directory",
                                            "type":"Expression"
                                        },
                                        "filename":{
                                            "value":"@item().name",
                                            "type":"Expression"
                                        }
                                    }
                                },
                                "logStorageSettings":{
                                    "linkedServiceName":{
                                        "referenceName":"BloblinkedService",
                                        "type":"LinkedServiceReference"
                                    },
                                    "path":"container/log"
                                },
                                "enableLogging":true,
                                "storeSettings":{
                                    "type":"AzureBlobStorageReadSettings",
                                    "recursive":true
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "parameters":{
            "SourceStore_Location":{
                "type":"String"
            },
            "SourceStore_Directory":{
                "type":"String"
            },
            "DestinationStore_Location":{
                "type":"String"
            },
            "DestinationStore_Directory":{
                "type":"String"
            }
        },
        "annotations":[

        ]
    }
}

Sample datasets

Dataset used by the GetMetadata activity to enumerate the file list.

{
    "name":"OneSourceFolder",
    "properties":{
        "linkedServiceName":{
            "referenceName":"AzureStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "parameters":{
            "Container":{
                "type":"String"
            },
            "Directory":{
                "type":"String"
            }
        },
        "annotations":[

        ],
        "type":"Binary",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "folderPath":{
                    "value":"@{dataset().Directory}",
                    "type":"Expression"
                },
                "container":{
                    "value":"@{dataset().Container}",
                    "type":"Expression"
                }
            }
        }
    }
}

Dataset for the data source, used by the Copy activity and the Delete activity.

{
    "name":"OneSourceFile",
    "properties":{
        "linkedServiceName":{
            "referenceName":"AzureStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "parameters":{
            "Container":{
                "type":"String"
            },
            "Directory":{
                "type":"String"
            },
            "filename":{
                "type":"string"
            }
        },
        "annotations":[

        ],
        "type":"Binary",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "fileName":{
                    "value":"@dataset().filename",
                    "type":"Expression"
                },
                "folderPath":{
                    "value":"@{dataset().Directory}",
                    "type":"Expression"
                },
                "container":{
                    "value":"@{dataset().Container}",
                    "type":"Expression"
                }
            }
        }
    }
}

Dataset for the data destination, used by the Copy activity.

{
    "name":"OneDestinationFile",
    "properties":{
        "linkedServiceName":{
            "referenceName":"AzureStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "parameters":{
            "Container":{
                "type":"String"
            },
            "Directory":{
                "type":"String"
            },
            "filename":{
                "type":"string"
            }
        },
        "annotations":[

        ],
        "type":"Binary",
        "typeProperties":{
            "location":{
                "type":"AzureBlobStorageLocation",
                "fileName":{
                    "value":"@dataset().filename",
                    "type":"Expression"
                },
                "folderPath":{
                    "value":"@{dataset().Directory}",
                    "type":"Expression"
                },
                "container":{
                    "value":"@{dataset().Container}",
                    "type":"Expression"
                }
            }
        }
    }
}

Azure Data Factory also provides a template for moving files.

Known limitations

  • The Delete activity does not support deleting a list of folders described by a wildcard.

  • When you use the file attribute filters modifiedDatetimeStart and modifiedDatetimeEnd in the Delete activity to select the files to delete, make sure to also set "wildcardFileName": "*" in the Delete activity.

Next steps

Learn more about moving files in Azure Data Factory.