Quickstart: Create an Azure Data Factory using Azure CLI
This quickstart describes how to use the Azure CLI to create an Azure Data Factory. The pipeline you create in this data factory copies data from one folder to another folder in Azure Blob storage. For information on how to transform data using Azure Data Factory, see Transform data in Azure Data Factory.
For an introduction to the Azure Data Factory service, see Introduction to Azure Data Factory.
If you don't have an Azure subscription, create a trial account before you begin.
If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Sign in with the Azure CLI.
When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use extensions with the Azure CLI.
Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.
Note
To create Data Factory instances, the user account that you use to sign in to Azure must be a member of the contributor or owner role, or an administrator of the Azure subscription. For more information, see Azure roles.
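If you're not sure which role your account holds, you can list its role assignments. This is a sketch; the `--assignee` value shown is a hypothetical sign-in name that you'd replace with your own:

```shell
# List role assignments for an account (replace the UPN with your own sign-in name).
az role assignment list --assignee someuser@contoso.com --output table
```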
Prepare a container and test file
This quickstart uses an Azure Storage account, which includes a container with a file.
To create a resource group named `ADFQuickStartRG`, use the az group create command:

```shell
az group create --name ADFQuickStartRG --location chinanorth2
```
Create a storage account by using the az storage account create command:

```shell
az storage account create --resource-group ADFQuickStartRG \
    --name adfquickstartstorage --location chinanorth2
```
Create a container named `adftutorial` by using the az storage container create command:

```shell
az storage container create --resource-group ADFQuickStartRG --name adftutorial \
    --account-name adfquickstartstorage --auth-mode key
```
In the local directory, create a file named `emp.txt` to upload. If you're working in Azure Cloud Shell, you can find the current working directory by using the `echo $PWD` Bash command. You can use standard Bash commands, like `cat`, to create a file:

```shell
cat > emp.txt
This is text.
```

Use Ctrl+D to save your new file.
To upload the new file to your Azure storage container, use the az storage blob upload command:

```shell
az storage blob upload --account-name adfquickstartstorage --name input/emp.txt \
    --container-name adftutorial --file emp.txt --auth-mode key
```

This command uploads the file to a new folder named `input`.
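To confirm the upload, you can list the blobs in the container. This is a sketch using the same account and container names as the steps above; `input/emp.txt` should appear in the output:

```shell
# List the blobs in the adftutorial container in table form.
az storage blob list --account-name adfquickstartstorage \
    --container-name adftutorial --auth-mode key --output table
```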
Create a data factory
To create an Azure data factory, run the az datafactory create command:

```shell
az datafactory create --resource-group ADFQuickStartRG \
    --factory-name ADFTutorialFactory
```
Important

Replace `ADFTutorialFactory` with a globally unique data factory name, for example, ADFTutorialFactorySP1127.
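One way to get a likely-unique name is to append a numeric suffix in the shell. This is a hypothetical convention, not something the service requires; here the shell's process ID serves as the suffix:

```shell
# Build a likely-unique factory name by appending the shell's process ID.
ADF_NAME="ADFTutorialFactory$$"
echo "$ADF_NAME"
```

You can then pass `--factory-name "$ADF_NAME"` to the commands that follow.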
You can see the data factory that you created by using the az datafactory show command:

```shell
az datafactory show --resource-group ADFQuickStartRG \
    --factory-name ADFTutorialFactory
```
Create a linked service and datasets
Next, create a linked service and two datasets.
Get the connection string for your storage account by using the az storage account show-connection-string command:

```shell
az storage account show-connection-string --resource-group ADFQuickStartRG \
    --name adfquickstartstorage --key primary
```
In your working directory, create a JSON file with this content, which includes your own connection string from the previous step. Name the file `AzureStorageLinkedService.json`:

```json
{
    "type": "AzureBlobStorage",
    "typeProperties": {
        "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>;EndpointSuffix=core.chinacloudapi.cn"
    }
}
```
Create a linked service named `AzureStorageLinkedService` by using the az datafactory linked-service create command:

```shell
az datafactory linked-service create --resource-group ADFQuickStartRG \
    --factory-name ADFTutorialFactory --linked-service-name AzureStorageLinkedService \
    --properties AzureStorageLinkedService.json
```
In your working directory, create a JSON file with this content, named `InputDataset.json`:

```json
{
    "linkedServiceName": {
        "referenceName": "AzureStorageLinkedService",
        "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "Binary",
    "typeProperties": {
        "location": {
            "type": "AzureBlobStorageLocation",
            "fileName": "emp.txt",
            "folderPath": "input",
            "container": "adftutorial"
        }
    }
}
```
Create an input dataset named `InputDataset` by using the az datafactory dataset create command:

```shell
az datafactory dataset create --resource-group ADFQuickStartRG \
    --dataset-name InputDataset --factory-name ADFTutorialFactory \
    --properties InputDataset.json
```
In your working directory, create a JSON file with this content, named `OutputDataset.json`:

```json
{
    "linkedServiceName": {
        "referenceName": "AzureStorageLinkedService",
        "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "Binary",
    "typeProperties": {
        "location": {
            "type": "AzureBlobStorageLocation",
            "folderPath": "output",
            "container": "adftutorial"
        }
    }
}
```
Create an output dataset named `OutputDataset` by using the az datafactory dataset create command:

```shell
az datafactory dataset create --resource-group ADFQuickStartRG \
    --dataset-name OutputDataset --factory-name ADFTutorialFactory \
    --properties OutputDataset.json
```
Create and run the pipeline
Finally, create and run the pipeline.
In your working directory, create a JSON file with this content, named `Adfv2QuickStartPipeline.json`:

```json
{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "OutputDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}
```
Create a pipeline named `Adfv2QuickStartPipeline` by using the az datafactory pipeline create command:

```shell
az datafactory pipeline create --resource-group ADFQuickStartRG \
    --factory-name ADFTutorialFactory --name Adfv2QuickStartPipeline \
    --pipeline Adfv2QuickStartPipeline.json
```
Run the pipeline by using the az datafactory pipeline create-run command:

```shell
az datafactory pipeline create-run --resource-group ADFQuickStartRG \
    --name Adfv2QuickStartPipeline --factory-name ADFTutorialFactory
```
This command returns a run ID. Copy it for use in the next command.
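Instead of copying the run ID by hand, you can capture it into a shell variable with a JMESPath `--query`. This is a sketch that assumes `runId` is the property name in the command's JSON output:

```shell
# Start the pipeline run and store its ID for the follow-up status check.
runId=$(az datafactory pipeline create-run --resource-group ADFQuickStartRG \
    --name Adfv2QuickStartPipeline --factory-name ADFTutorialFactory \
    --query runId --output tsv)
echo "$runId"
```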
Verify that the pipeline run succeeded by using the az datafactory pipeline-run show command, replacing the example run ID with the one you copied:

```shell
az datafactory pipeline-run show --resource-group ADFQuickStartRG \
    --factory-name ADFTutorialFactory --run-id 00000000-0000-0000-0000-000000000000
```
You can also verify that your pipeline ran as expected by using the Azure portal. For more information, see Review deployed resources.
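You can also inspect the copied file from the CLI by downloading it. This is a sketch that assumes the sink kept the source file name, so the copied blob is `output/emp.txt`:

```shell
# Download the copied blob to a local file and print its contents.
az storage blob download --account-name adfquickstartstorage \
    --container-name adftutorial --name output/emp.txt \
    --file emp-copy.txt --auth-mode key
cat emp-copy.txt
```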
Clean up resources
All of the resources in this quickstart are part of the same resource group. To remove them all, use the az group delete command:

```shell
az group delete --name ADFQuickStartRG
```
If you're using this resource group for anything else, delete the individual resources instead. For instance, to remove the linked service, use the az datafactory linked-service delete command.
In this quickstart, you created the following JSON files:
- AzureStorageLinkedService.json
- InputDataset.json
- OutputDataset.json
- Adfv2QuickStartPipeline.json
Delete them by using standard Bash commands.
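For example, a single `rm` call removes all four files, assuming they're in the current directory:

```shell
# -f suppresses the error if a file was already deleted.
rm -f AzureStorageLinkedService.json InputDataset.json \
    OutputDataset.json Adfv2QuickStartPipeline.json
```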