Create pipelines and activities
In this section, we will learn how to create pipelines and activities. We will create a pipeline with a copy activity that uses the input and output datasets. The copy activity copies data from the file specified in the input dataset settings to the file specified in the output dataset settings.
- First, create a JSON file named Adfv2QuickStartPipeline.json in the C:\ADFv2QuickStartPSH folder to hold the pipeline definition.
- Second, run the Set-AzDataFactoryV2Pipeline cmdlet to create the pipeline Adfv2QuickStartPipeline.
PowerShell
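# Create (or update) the pipeline from the JSON definition file.
# The trailing backtick continues the command on the next line.
$DFPipeLine = Set-AzDataFactoryV2Pipeline `
    -DataFactoryName $DataFactory.DataFactoryName `
    -ResourceGroupName $ResGrp.ResourceGroupName `
    -Name "Adfv2QuickStartPipeline" `
    -DefinitionFile ".\Adfv2QuickStartPipeline.json"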
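The contents of Adfv2QuickStartPipeline.json are not reproduced in the steps above. As a minimal sketch, a pipeline with a single copy activity could look like the following. The activity name CopyFromBlobToBlob, the dataset names InputDataset and OutputDataset, and the binary source and sink types are assumptions for illustration. Substitute the names of the datasets you created earlier.

```json
{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "description": "Illustrative sketch: activity name, dataset names, and source/sink types are assumed",
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "InputDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "OutputDataset", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "BinarySource" },
                    "sink": { "type": "BinarySink" }
                }
            }
        ]
    }
}
```

The inputs and outputs arrays are where the copy activity is tied to the input and output datasets; the -DefinitionFile parameter of Set-AzDataFactoryV2Pipeline points at this file.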
Pipelines and activities in Azure Data Factory
A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The pipeline allows you to manage the activities as a set instead of managing each one individually. You deploy and schedule the pipeline rather than the individual activities.
Data Factory has three groupings of activities: data movement activities, data transformation activities, and control activities. An activity can take zero or more input datasets and produce one or more output datasets. The following diagram shows the relationship between pipeline, activity, and dataset in Data Factory:
Here, an input dataset represents the input for an activity in the pipeline, and an output dataset represents the output for the activity. Datasets identify data within different data stores, such as tables, files, folders, and documents. After you create a dataset, you can use it with activities in a pipeline.
Data transformation activities
Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity.
Activity policy
Policies affect the run-time behavior of an activity and provide configuration options, such as how long an activity may run and how often it is retried. Activity policies are available only for execution activities.
Activity policy JSON definition
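As a rough sketch, a policy is attached to an activity through a policy object placed next to typeProperties. The activity name, dataset names, and the specific values below are assumptions chosen for illustration, not recommended settings.

```json
{
    "name": "MyCopyActivity",
    "description": "Illustrative sketch: names and policy values are assumed",
    "type": "Copy",
    "inputs": [
        { "referenceName": "InputDataset", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "OutputDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "BinarySource" },
        "sink": { "type": "BinarySink" }
    },
    "policy": {
        "timeout": "00:10:00",
        "retry": 1,
        "retryIntervalInSeconds": 60,
        "secureOutput": true
    }
}
```

Here timeout is a timespan that limits how long the activity may run, retry is the maximum number of retry attempts, retryIntervalInSeconds is the delay between attempts, and secureOutput keeps the activity output out of the logs.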
Control activity
Control activities have the following top-level structure:
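A control activity is described by a name, an optional description, a type, type-specific typeProperties, and an optional dependsOn list. The sketch below fills in that structure using the Wait activity as one example; the activity name and the 30-second value are hypothetical.

```json
{
    "name": "WaitBeforeCopy",
    "description": "Illustrative control activity (hypothetical name): pauses the pipeline for 30 seconds",
    "type": "Wait",
    "typeProperties": {
        "waitTimeInSeconds": 30
    },
    "dependsOn": []
}
```

Other control activities follow the same outline and differ mainly in the type value and in what typeProperties contains.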
Multiple activities in a pipeline
A pipeline can contain more than one activity. If subsequent activities do not depend on previous activities, the activities may run in parallel. You can chain two activities by using an activity dependency, which defines how a subsequent activity depends on a previous one.
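A dependency between two activities can be sketched as follows. The pipeline and activity names are hypothetical, and Wait activities are used only to keep the example short. SecondStep starts only after FirstStep succeeds, because its dependsOn entry names FirstStep with the Succeeded condition.

```json
{
    "name": "ChainedActivitiesPipeline",
    "properties": {
        "description": "Illustrative sketch: pipeline and activity names are hypothetical",
        "activities": [
            {
                "name": "FirstStep",
                "type": "Wait",
                "typeProperties": { "waitTimeInSeconds": 5 }
            },
            {
                "name": "SecondStep",
                "type": "Wait",
                "typeProperties": { "waitTimeInSeconds": 5 },
                "dependsOn": [
                    {
                        "activity": "FirstStep",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ]
            }
        ]
    }
}
```

Removing the dependsOn block from SecondStep would leave the two activities independent, so they could run in parallel. Besides Succeeded, the dependency conditions are Failed, Skipped, and Completed.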
Scheduling pipelines
Pipelines are scheduled by triggers. There are different types of triggers, including the Scheduler trigger, which runs pipelines on a wall-clock schedule, and the manual trigger, which runs pipelines on demand.
To have a trigger kick off a pipeline run, you must include a reference to that pipeline in the trigger definition. Pipelines and triggers have a many-to-many (n-m) relationship: multiple triggers can kick off a single pipeline, and the same trigger can kick off multiple pipelines. Once the trigger is defined, you must start it before it begins triggering the pipeline.
For example, say you have a Scheduler trigger, “Trigger A,” that you want to kick off your pipeline, “MyCopyPipeline.” You define the trigger as shown in the following example:
Trigger A definition
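A minimal sketch of such a definition is shown here, assuming an hourly recurrence. The trigger name TriggerA, the recurrence values, and the start time are placeholders chosen for illustration; only the pipeline name MyCopyPipeline comes from the example above.

```json
{
    "name": "TriggerA",
    "properties": {
        "description": "Illustrative sketch: recurrence values and start time are assumed",
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "MyCopyPipeline"
                }
            }
        ]
    }
}
```

Because pipelines is an array, adding further pipelineReference entries lets the same trigger kick off several pipelines. The trigger does nothing until it is started, for example with the Start-AzDataFactoryV2Trigger cmdlet.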
Reference: Microsoft Documentation, Documentation 2