Azure Data Factory (ADF) is a cloud-based data integration service that lets you design, schedule, and manage data pipelines. One of its most fundamental tasks is creating a pipeline with a Copy Data activity, which transfers data seamlessly between different systems.
This guide provides a step-by-step process for creating a pipeline and copying data in ADF.
Introduction to Pipelines and the Copy Data Activity in ADF
Before learning how to create a pipeline and copy data in ADF, let’s clarify what a pipeline is.
A pipeline in Azure Data Factory is a logical grouping of activities that together perform a task. It orchestrates the movement and transformation of data across various sources and destinations.
Examples of pipeline use cases include:
- Copying data from an on-premises SQL Server to Azure Blob Storage.
- Moving files between folders in Azure Data Lake Storage.
Each activity in a pipeline has a specific purpose. The Copy Data activity is commonly used for efficient data transfer.
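To make that structure concrete, here is a rough sketch of the JSON shape behind a pipeline with a single Copy Data activity (in ADF Studio you can view the real definition of any pipeline via the code `{}` button). It is written as a Python dict, and every name in it is an illustrative placeholder:

```python
# Illustrative sketch (as a Python dict) of the JSON shape of an ADF pipeline
# holding one Copy Data activity. All names are placeholders, not real resources.
pipeline_definition = {
    "name": "CopyDataPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySQLtoBlob",  # the Copy Data activity
                "type": "Copy",
                "inputs": [{"referenceName": "SourceSqlDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkBlobDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}
```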
Access the Azure Data Factory Portal
- Sign in to the Azure Portal.
- Navigate to Data Factories and select your ADF instance.
- Click Launch Studio (Open Azure Data Factory Studio) to open the ADF Studio interface.
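The remaining steps use ADF Studio, but each one also has a programmatic equivalent. The sketches in this guide use the azure-mgmt-datafactory Python SDK and assume you already have a factory; the subscription ID, resource group, and factory name below are placeholders:

```python
# Programmatic equivalent of opening the factory: connect a management client.
# The three values below are placeholders; replace them with your own.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RG_NAME = "<resource-group>"
DF_NAME = "<data-factory-name>"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Sanity check: fetch the factory to confirm the client can reach it.
factory = adf_client.factories.get(RG_NAME, DF_NAME)
print(factory.name, factory.location)
```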
Step 1: Create a New Pipeline
- In the left navigation pane, click the Author (pencil) icon.
- Under Factory Resources, open the actions menu (…) next to Pipelines and choose New pipeline.
- Enter a descriptive name for your pipeline (e.g., CopyDataPipeline).
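If you prefer code to the canvas, the same step looks roughly like this with the Python SDK (reusing adf_client, RG_NAME, and DF_NAME from the setup sketch above):

```python
# Create (or update) an empty pipeline named CopyDataPipeline.
# Assumes adf_client, RG_NAME, DF_NAME from the setup sketch above.
from azure.mgmt.datafactory.models import PipelineResource

pipeline = PipelineResource(activities=[], description="Copies data from SQL to Blob")
adf_client.pipelines.create_or_update(RG_NAME, DF_NAME, "CopyDataPipeline", pipeline)
```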

Step 2: Add a Copy Data Activity
- In the Activities pane, search for Copy data.
- Drag the Copy Data activity onto the pipeline canvas.
- Assign a clear name, such as CopySQLtoBlob.
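In SDK terms, the Copy Data activity is a CopyActivity object wired to input and output dataset references. A sketch, assuming the SourceSqlDataset and SinkBlobDataset datasets that Steps 3 and 4 will create:

```python
# Add a Copy Data activity (CopySQLtoBlob) to the pipeline.
# Assumes adf_client, RG_NAME, DF_NAME from the setup sketch, plus the
# SourceSqlDataset / SinkBlobDataset datasets defined in Steps 3 and 4.
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference,
    AzureSqlSource, DelimitedTextSink,
)

copy_activity = CopyActivity(
    name="CopySQLtoBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceSqlDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkBlobDataset")],
    source=AzureSqlSource(),   # read rows from Azure SQL
    sink=DelimitedTextSink(),  # write them out as delimited text (CSV)
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(RG_NAME, DF_NAME, "CopyDataPipeline", pipeline)
```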

Step 3: Set Up the Source Dataset
- Select the Copy Data activity and open the Source tab.
- Click + New to create a new dataset and select your source type (e.g., Azure SQL Database).
- Enter the connection details (server name, database, and authentication); ADF saves these as a linked service. Test the connection before continuing.
- Specify the source table or provide a query.
- If you have an existing dataset, select the one pointing to your input container (e.g., a CSV file) and preview the data to verify correctness.
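Programmatically, this step creates two objects: a linked service holding the connection, and a dataset on top of it. A sketch, with a placeholder connection string and table name:

```python
# Register the source connection (a linked service) and the dataset on top of it.
# Assumes adf_client, RG_NAME, DF_NAME from the setup sketch; the connection
# string and table name are placeholders.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureSqlDatabaseLinkedService, SecureString,
    DatasetResource, AzureSqlTableDataset, LinkedServiceReference,
)

sql_ls = AzureSqlDatabaseLinkedService(
    connection_string=SecureString(value="<azure-sql-connection-string>")
)
adf_client.linked_services.create_or_update(
    RG_NAME, DF_NAME, "AzureSqlLinkedService", LinkedServiceResource(properties=sql_ls)
)

source_ds = AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureSqlLinkedService"
    ),
    table_name="dbo.Customers",  # placeholder table
)
adf_client.datasets.create_or_update(
    RG_NAME, DF_NAME, "SourceSqlDataset", DatasetResource(properties=source_ds)
)
```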

Step 4: Set Up the Sink Dataset
- Switch to the Sink tab of the Copy Data activity.
- Click + New to create a destination dataset and select the sink type (e.g., Azure Blob Storage).
- Provide the necessary connection information (storage account, container, folder, filename) and choose the file format (CSV, Parquet, JSON, etc.).
- If a dataset already exists, select the one pointing to your output container where the copied file will be stored.
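The sink side mirrors the source: a Blob Storage linked service plus a delimited-text (CSV) dataset. Again a sketch with placeholder connection details:

```python
# Register the destination (Blob Storage) linked service and a CSV sink dataset.
# Assumes adf_client, RG_NAME, DF_NAME from the setup sketch; the connection
# string, container, folder, and file name are placeholders.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService, SecureString,
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation,
    LinkedServiceReference,
)

blob_ls = AzureBlobStorageLinkedService(
    connection_string=SecureString(value="<storage-connection-string>")
)
adf_client.linked_services.create_or_update(
    RG_NAME, DF_NAME, "BlobLinkedService", LinkedServiceResource(properties=blob_ls)
)

sink_ds = DelimitedTextDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="BlobLinkedService"
    ),
    location=AzureBlobStorageLocation(
        container="output", folder_path="copied", file_name="customers.csv"
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(
    RG_NAME, DF_NAME, "SinkBlobDataset", DatasetResource(properties=sink_ds)
)
```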

Step 5: Validate, Debug, and Publish
- Click Validate All to check for configuration errors.
- Use Debug to test-run the pipeline.
- Once testing is successful, click Publish All to deploy your pipeline.

Step 6: Trigger and Monitor the Pipeline
- After publishing, select Add trigger > Trigger now to start the data transfer (or attach a schedule or event trigger for automated runs).
- Track the run on the Monitor tab; once it completes, verify the output file at the destination (e.g., Azure Blob Storage).
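Trigger now also has a direct SDK equivalent: start a run and poll its status. A sketch reusing the earlier placeholders:

```python
# Start a pipeline run and poll until it reaches a terminal state.
# Assumes adf_client, RG_NAME, DF_NAME from the setup sketch.
import time

run = adf_client.pipelines.create_run(RG_NAME, DF_NAME, "CopyDataPipeline", parameters={})
print("Started run:", run.run_id)

while True:
    pipeline_run = adf_client.pipeline_runs.get(RG_NAME, DF_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print("Run finished with status:", pipeline_run.status)  # e.g., Succeeded or Failed
```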

Best Practices for Pipelines and Copy Data in ADF
To get the most out of pipelines and the Copy Data activity in ADF, follow these best practices:
- Use parameterized datasets for reusability (see the sketch after this list).
- Implement logging and monitoring for troubleshooting.
- Secure sensitive credentials using Azure Key Vault.
- Use debug mode before publishing pipelines.
- Organize activities with meaningful names for easy maintenance.
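To illustrate the first point, a dataset can declare parameters and reference them through ADF expressions, so one definition serves many files. A sketch that parameterizes the file name of the sink dataset from Step 4 (the parameter name outputFileName is illustrative):

```python
# Parameterize the sink dataset's file name so one definition serves many files.
# Assumes adf_client, RG_NAME, DF_NAME and the BlobLinkedService from earlier
# sketches; the parameter name outputFileName is illustrative.
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobStorageLocation,
    LinkedServiceReference, ParameterSpecification,
)

sink_ds = DelimitedTextDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="BlobLinkedService"
    ),
    parameters={"outputFileName": ParameterSpecification(type="String")},
    location=AzureBlobStorageLocation(
        container="output",
        # ADF expression: resolve the file name from the dataset parameter at run time.
        file_name={"value": "@dataset().outputFileName", "type": "Expression"},
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(
    RG_NAME, DF_NAME, "SinkBlobDataset", DatasetResource(properties=sink_ds)
)
```

The Copy Data activity can then supply a different outputFileName value on each run, instead of you maintaining one dataset per file.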
Conclusion
Creating and configuring a pipeline with a Copy Data activity in Azure Data Factory involves these key steps:
- Create a new pipeline.
- Add and configure a Copy Data activity.
- Define source and sink datasets.
- Validate, debug, and publish your pipeline.
- Trigger and monitor the data transfer.
This approach provides a flexible and scalable way to securely move data between different cloud and on-premises systems.
📌 Watch the full video here: https://www.youtube.com/watch?v=O0O_iz2jnlg&t=2s