Azure Data Factory (ADF) is Microsoft’s cloud-based data integration service that lets you build, orchestrate, and manage scalable data pipelines across multiple sources and destinations. One of its most powerful features for data engineers and architects is the ability to create a Dynamic Dataset in ADF: a dataset with parameterized file paths and names instead of hardcoded values. By learning how to create a Dynamic Dataset in ADF, you can design pipelines that are reusable, maintainable, and highly adaptable to evolving business requirements.
Why Are Dynamic Datasets Important in ADF?
When building data pipelines, hardcoding file paths, filenames, or even folder structures can quickly become a maintenance nightmare. Imagine you need to ingest files from multiple directories, or your filenames change daily based on a timestamp—a static dataset would require you to create new datasets for each scenario or frequently update the existing ones. This approach is not only inefficient, but it also increases risk and overhead.
Dynamic datasets solve these challenges by using parameters. With parameters, you can pass in values for directory paths, file names, or other dataset properties at runtime.
Reusability
A single dataset can be used across multiple pipelines or activities by simply supplying different parameter values.
Maintainability
Changes in folder structure or file naming conventions do not require you to modify datasets repeatedly.
Automation
You can easily incorporate pipeline variables, system variables (like dates), or outputs from other activities to dynamically select which files to process each run.
Scalability
You can process hundreds or thousands of files or folders without creating a dataset for each one.
Step-by-Step Guide to Create a Dynamic Dataset in ADF
Step 1: Build a Dynamic Input Dataset
- In the ADF Studio, navigate to Author → Datasets → New dataset.
- Choose your data store, such as Azure Blob Storage, and select the file format (for example, Delimited Text for CSV files).
- On the Parameters tab, add parameters such as folderPath and fileName. These will allow you to define the location and name of the file dynamically at runtime.
- In the file path configuration section:
  - For Directory (or Folder Path), enter: @dataset().folderPath
  - For File Name, enter: @dataset().fileName
By referencing parameters in these fields, the dataset will retrieve files based on the values passed in during pipeline execution, rather than relying on static paths.
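If you open the dataset’s JSON (code) view in ADF Studio, a parameterized input dataset looks roughly like the sketch below. The linked service name AzureBlobStorageLS, the container name data, and the CSV settings are illustrative assumptions; substitute the values from your own environment.

```json
{
  "name": "DynamicCsvInput",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderPath": { "type": "string" },
      "fileName": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "data",
        "folderPath": { "value": "@dataset().folderPath", "type": "Expression" },
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```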
Step 2: Build a Dynamic Output Dataset
The process for creating a dynamic output dataset is almost identical:
- Go to Author → Datasets → New dataset.
- Choose the same or another data store as your output target (e.g., Azure Blob Storage), and select the appropriate file format.
- Under the Parameters tab, add parameters for folderPath and fileName.
- Configure the directory and file name fields to use these parameters, just as in the input dataset:
  - Directory: @dataset().folderPath
  - File Name: @dataset().fileName
This setup allows your pipeline to write output files to dynamic locations, controlled by parameter values supplied at runtime.
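The JSON behind the output dataset mirrors the input dataset; only the dataset name and the target container or folder conventions change. A minimal sketch, again assuming the illustrative AzureBlobStorageLS linked service and an output container:

```json
{
  "name": "DynamicCsvOutput",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderPath": { "type": "string" },
      "fileName": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "output",
        "folderPath": { "value": "@dataset().folderPath", "type": "Expression" },
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```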
Practical Example
Suppose your organization receives daily sales data files, and the data is stored in folders according to the date. For example:
/sales/2025/08/28/sales_data.csv
/sales/2025/08/29/sales_data.csv
With a dynamic input dataset, you can pass the folder path (/sales/2025/08/28/) and the file name (sales_data.csv) as parameters. Using pipeline variables or system variables like @utcnow('yyyy/MM/dd'), you can automate the selection of the correct file each day, removing the need for manual intervention or dataset updates.
Similarly, you can direct processed output data to corresponding date-based folders simply by passing the appropriate folderPath and fileName parameters.
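To make the date-based selection concrete, here is a sketch of how a dataset reference in a pipeline activity could derive the folder path from the current UTC date. The dataset name DynamicCsvInput is a placeholder for whatever you named your input dataset:

```json
{
  "referenceName": "DynamicCsvInput",
  "type": "DatasetReference",
  "parameters": {
    "folderPath": {
      "value": "@concat('/sales/', utcnow('yyyy/MM/dd'), '/')",
      "type": "Expression"
    },
    "fileName": "sales_data.csv"
  }
}
```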
How to Use Dynamic Datasets in Pipelines
Once you’ve defined your parameterized datasets, you can link them to pipeline activities (such as Copy Data or Data Flow activities) and supply the parameter values from pipeline parameters, system variables, or activity outputs. For example, in a Copy Data activity, you can set the dataset parameters to expressions like:
- For input:
  - folderPath = @concat('/sales/', pipeline().parameters.processDate, '/')
  - fileName = 'sales_data.csv'
This approach keeps your pipeline logic clean and centralized, and any changes to directory structures or filenames only require updates to parameter values rather than underlying datasets.
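Put together, a Copy Data activity wired to the dynamic datasets might look roughly like the sketch below. The dataset names, the /processed/sales/ output prefix, and the processDate pipeline parameter are assumptions for illustration, not a fixed convention:

```json
{
  "name": "CopyDailySales",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "DynamicCsvInput",
      "type": "DatasetReference",
      "parameters": {
        "folderPath": {
          "value": "@concat('/sales/', pipeline().parameters.processDate, '/')",
          "type": "Expression"
        },
        "fileName": "sales_data.csv"
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "DynamicCsvOutput",
      "type": "DatasetReference",
      "parameters": {
        "folderPath": {
          "value": "@concat('/processed/sales/', pipeline().parameters.processDate, '/')",
          "type": "Expression"
        },
        "fileName": "sales_data.csv"
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "DelimitedTextSink" }
  }
}
```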
Key Benefits and Best Practices
Centralize Your Dataset Definitions
By using dynamic datasets, you reduce duplication and ensure consistency across your pipelines.
Leverage System Variables
Take advantage of built-in variables (such as the current date) for fully automated, time-based file selection.
Document Your Parameters
Provide clear descriptions for each parameter so pipeline developers know how to use them effectively.
Test Thoroughly
Validate your parameter values at runtime to catch errors early, especially when constructing dynamic paths or filenames.
Conclusion
Learning how to create a Dynamic Dataset in ADF is a best practice for building robust and scalable data pipelines. By parameterizing file paths and names, you minimize manual effort, enhance maintainability, and unlock true automation in your data integration workflows. Whether you are handling daily file loads, managing multi-environment deployments, or orchestrating complex folder structures, the ability to create a Dynamic Dataset in ADF is essential for modern, enterprise-scale data engineering.