Creating an ADLS Gen2 account for Azure Data Factory gives you a scalable data platform that supports advanced analytics and enterprise-grade processing. Azure Data Lake Storage Gen2 (ADLS Gen2) combines the scalability and cost-effectiveness of Azure Blob Storage with file system capabilities. This makes it a common choice for organizations building data analytics and processing platforms. Its hierarchical namespace organizes data into real directories, which streamlines big data workloads. It also integrates with other Azure services such as Data Factory and Synapse Analytics.
This guide shows how to create an ADLS Gen2 account in the Azure Portal and prepare it for large-scale analytics and integration with Azure Data Factory.
Step 1: Access the Azure Portal
Begin by navigating to the Azure Portal, which serves as your central hub for managing all Azure resources. After logging in, use the global search bar at the top to search for “Storage Accounts.” Select the Storage Accounts service from the search results to proceed to the management dashboard. This dashboard is where you’ll create and manage all your storage resources.
Step 2: Create a Storage Account
To initiate the creation process, click the Create button within the Storage Accounts section. This launches a multi-tab setup wizard. On the Basics tab, carefully fill in the required details:
- Subscription: Choose the Azure subscription that will be billed for this storage account.
- Resource Group: Select an existing resource group or create a new one to logically group related resources.
- Storage Account Name: Enter a globally unique name that follows Azure’s naming rules (3 to 24 characters, lowercase letters and numbers only).
- Region: Select a geographic location close to your users or data sources to minimize latency and maximize performance.
- Performance & Redundancy: Standard performance with locally redundant storage (LRS) is sufficient for many scenarios. You can adjust both based on your organization’s needs, for example by choosing geo-redundant storage (GRS) for higher resiliency.
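If you script account creation or want to pre-check names before opening the portal, the naming rule above is easy to encode. The sketch below is a minimal local check in Python. The function name `is_valid_storage_account_name` is hypothetical. The check cannot confirm global uniqueness, which only Azure can verify when you create the account.

```python
import re

# Storage account naming rule: 3 to 24 characters, lowercase letters and digits only.
# Global uniqueness is not checked here; only Azure can confirm that.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the local storage account naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["datalakeprod01", "DataLake", "dl", "data-lake-prod"]:
    print(candidate, is_valid_storage_account_name(candidate))
```

Using `fullmatch` rather than `match` ensures the whole name is checked, so a valid prefix followed by a hyphen or uppercase letter is still rejected.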
Step 3: Enable Hierarchical Namespace (ADLS Gen2 Features)
After completing the basics, move to the Advanced tab. Under the Data Lake Storage Gen2 section, check the box labelled Enable hierarchical namespace. This step turns your storage account into an ADLS Gen2 account. It unlocks the directory structure and access controls that enterprise analytics workloads, including Azure Data Factory pipelines, build on.
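You may later want to deploy the same account with an ARM template rather than the portal. In that case, the Enable hierarchical namespace checkbox corresponds to the `isHnsEnabled` property on a `StorageV2` (general-purpose v2) account. The Python sketch below builds that resource as a dictionary. The helper name and the `apiVersion` value are assumptions, so check them against the API versions available in your environment.

```python
import json


def storage_account_resource(name: str, location: str, sku: str = "Standard_LRS") -> dict:
    """Build an ARM template resource for a StorageV2 account with the
    hierarchical namespace enabled (hypothetical helper; apiVersion assumed)."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2023-01-01",
        "name": name,
        "location": location,
        "kind": "StorageV2",   # general-purpose v2 account
        "sku": {"name": sku},  # e.g. Standard_LRS or Standard_GRS
        "properties": {
            # The "Enable hierarchical namespace" checkbox maps to this property.
            "isHnsEnabled": True,
        },
    }


print(json.dumps(storage_account_resource("datalakeprod01", "westeurope"), indent=2))
```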
Benefits of Hierarchical Namespace:
- Directory-level Access Control Lists (ACLs): Grant fine-grained permissions at the directory or file level, improving security and compliance.
- Enhanced Performance: Because directories are real objects, renaming or deleting a directory is an atomic metadata operation rather than a file-by-file copy. This speeds up analytics jobs that move or clean up large folders.
- Seamless Integration: Benefit from compatibility with Azure Synapse Analytics, Databricks, HDInsight, and other Azure data services.
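Directory-level ACLs in ADLS Gen2 use a POSIX-style short form such as `user::rwx,group::r-x,other::---`. Named users or groups are identified by their Microsoft Entra object ID, and a `default:` prefix marks default ACLs that new child items inherit. The Python sketch below assembles and validates such a string. The helper names are hypothetical and the object ID is a placeholder. A string in this form is what you would pass, for example, as the `acl` argument of `set_access_control` in the `azure-storage-file-datalake` SDK.

```python
import re

# Permission triplet: read, write, execute (for directories, x means traverse).
_PERM = re.compile(r"[r-][w-][x-]")


def acl_entry(scope: str, qualifier: str, perms: str, default: bool = False) -> str:
    """Format one ACL entry, e.g. 'user:<object-id>:r-x' (hypothetical helper)."""
    if scope not in {"user", "group", "other", "mask"}:
        raise ValueError(f"unknown scope: {scope}")
    if not _PERM.fullmatch(perms):
        raise ValueError(f"permissions must look like 'rwx' or 'r-x', got {perms!r}")
    prefix = "default:" if default else ""
    return f"{prefix}{scope}:{qualifier}:{perms}"


def build_acl(entries: list[tuple[str, str, str]], default: bool = False) -> str:
    """Join (scope, qualifier, perms) tuples into a comma-separated ACL string."""
    return ",".join(acl_entry(s, q, p, default) for s, q, p in entries)


# Owner: full access; owning group: read and traverse; others: nothing;
# plus read and traverse for one named principal (placeholder object ID).
acl = build_acl([
    ("user", "", "rwx"),
    ("group", "", "r-x"),
    ("other", "", "---"),
    ("user", "00000000-0000-0000-0000-000000000000", "r-x"),
])
print(acl)
```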
Continue through the remaining tabs:
- Networking: Configure network access rules according to your organization’s security standards.
- Data Protection: Set up soft delete, versioning, or other data protection features as needed.
- Encryption: Choose your encryption options (default is Microsoft-managed keys).
- Tags: Optionally, add tags to categorize and manage your resource for reporting and automation.
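If you codify these tabs instead of clicking through them, the settings map onto ARM template properties. Networking maps to `networkAcls`, encryption maps to `encryption`, and soft delete maps to retention policies on the blob service. The Python sketch below groups illustrative values for each tab. The function name, the grouping keys, the 7-day retention, and the sample IP address and tags are all assumptions, not recommendations.

```python
import json


def advanced_settings(allowed_ips: list[str], tags: dict[str, str], retention_days: int = 7) -> dict:
    """Group illustrative ARM values for the Networking, Data protection,
    Encryption, and Tags tabs (hypothetical helper and grouping keys)."""
    account_properties = {
        # Networking tab: deny by default, allow listed IPs and trusted Azure services.
        "networkAcls": {
            "defaultAction": "Deny",
            "bypass": "AzureServices",
            "ipRules": [{"value": ip, "action": "Allow"} for ip in allowed_ips],
        },
        # Encryption tab: Microsoft-managed keys.
        "encryption": {
            "keySource": "Microsoft.Storage",
            "services": {"blob": {"enabled": True}},
        },
    }
    # Data protection tab: soft delete is configured on the child resource
    # Microsoft.Storage/storageAccounts/blobServices, not on the account itself.
    blob_service_properties = {
        "deleteRetentionPolicy": {"enabled": True, "days": retention_days},
        "containerDeleteRetentionPolicy": {"enabled": True, "days": retention_days},
    }
    return {
        "account_properties": account_properties,
        "blob_service_properties": blob_service_properties,
        "tags": tags,  # Tags tab: free-form key/value pairs
    }


settings = advanced_settings(["203.0.113.10"], {"environment": "dev", "owner": "data-platform"})
print(json.dumps(settings, indent=2))
```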
Step 4: Review and Create
Once all settings are configured, click Review + Create at the bottom of the wizard. Azure will validate your choices; if all settings are correct, you’ll see a summary page. Review your configurations one last time, then click Create to deploy your storage account. The deployment process typically takes a few minutes, after which you’ll receive a notification confirming completion.
Step 5: Set Up Containers and Upload Data
With your storage account ready, navigate to it from the Storage Accounts dashboard. In the left-hand menu, select Containers and create a new container. In an ADLS Gen2 account, a container (also called a file system) serves as the root directory of your data lake. Containers help you organize data for different departments, projects, or workloads. Once the container exists, upload a few sample files, such as CSV, JSON, or Parquet, to confirm that everything works. This lets you verify permissions, folder organization, and compatibility with downstream analytics tools before you build pipelines.
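Once data is in the container, tools address it through the Data Lake (DFS) endpoint. Over HTTPS the address is `https://<account>.dfs.core.windows.net/<container>/<path>`. From Spark-based tools such as Databricks and Synapse it is `abfss://<container>@<account>.dfs.core.windows.net/<path>`. The Python sketch below builds both forms and applies a hypothetical department/project/date folder convention. The names `datalakeprod01` and `raw`, as well as the layout itself, are illustrative assumptions.

```python
from datetime import date
from pathlib import PurePosixPath

SAMPLE_FORMATS = {".csv", ".json", ".parquet"}


def dfs_url(account: str, container: str, path: str = "") -> str:
    """HTTPS URL on the Data Lake (DFS) endpoint for a path in a container."""
    return f"https://{account}.dfs.core.windows.net/{container}/{path.strip('/')}"


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """abfss:// URI, the form Spark-based tools use for ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def landing_folder(department: str, project: str, day: date) -> str:
    """Hypothetical layout convention: <department>/<project>/<yyyy>/<mm>/<dd>."""
    return f"{department}/{project}/{day:%Y/%m/%d}"


def is_sample_format(filename: str) -> bool:
    """True for the sample formats mentioned above (CSV, JSON, Parquet)."""
    return PurePosixPath(filename).suffix.lower() in SAMPLE_FORMATS


folder = landing_folder("sales", "forecasting", date(2024, 6, 1))
print(abfss_uri("datalakeprod01", "raw", f"{folder}/orders.csv"))
print(dfs_url("datalakeprod01", "raw", folder))
```

A date-partitioned layout like this is one common choice because downstream tools can read a single day or month by folder prefix. Adapt it to however your teams already divide their data.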
Step 6: Prepare for Azure Data Factory Integration
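With a properly configured ADLS Gen2 account, you can now integrate it with Azure Data Factory (ADF). Within ADF, create a linked service of type Azure Data Lake Storage Gen2 that points to your storage account. ADF supports several authentication methods for this connector, including account key, service principal, and managed identity. If you use the factory's managed identity, grant it a data-plane role such as Storage Blob Data Contributor on the storage account. With the linked service in place, you can build pipelines for extracting, transforming, and loading (ETL) data. Using ADF's connectors, you can orchestrate data flows between ADLS Gen2 and other data sources or sinks, automate data movement, and support advanced analytics scenarios.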
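In ADF's JSON view, an ADLS Gen2 linked service uses the connector type `AzureBlobFS` with the account's DFS URL, and datasets point into it with an `AzureBlobFSLocation`. The Python sketch below generates minimal definitions for two objects. The first is a linked service that relies on the factory's system-assigned managed identity (only `url` is set). The second is a CSV dataset. The object names are hypothetical, and it is worth comparing the output with the JSON that ADF Studio generates for your factory.

```python
import json


def adls_gen2_linked_service(name: str, account: str) -> dict:
    """Minimal ADLS Gen2 linked service (hypothetical helper). With only `url`
    set, ADF authenticates with the factory's system-assigned managed identity,
    which needs a role such as Storage Blob Data Contributor on the account."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",  # ADF connector type for ADLS Gen2
            "typeProperties": {"url": f"https://{account}.dfs.core.windows.net"},
        },
    }


def csv_dataset(name: str, linked_service: str, container: str, folder: str, file_name: str) -> dict:
    """Minimal DelimitedText (CSV) dataset pointing at one file (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"},
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": container,  # the container created in Step 5
                    "folderPath": folder,
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


linked_service = adls_gen2_linked_service("AdlsGen2LinkedService", "datalakeprod01")
dataset = csv_dataset("SalesOrdersCsv", "AdlsGen2LinkedService", "raw", "sales/2024", "orders.csv")
print(json.dumps(linked_service, indent=2))
print(json.dumps(dataset, indent=2))
```

Relying on the managed identity keeps credentials out of the definition entirely. Access is then governed by the role assignment on the storage account rather than by a stored key.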
Conclusion: Why You Should Create ADLS Gen2 for Azure Data Factory
Establishing an ADLS Gen2 account in Azure is a fundamental step toward building a modern, scalable data architecture. By enabling the hierarchical namespace, configuring containers, and integrating with Azure Data Factory, you lay the groundwork for analytics, ETL pipelines, and enterprise data management. Organizations that pair ADLS Gen2 with Azure Data Factory gain a streamlined, cost-effective foundation for big data workloads and analytics.