Welcome back to our blog series where we continue to explore Microsoft’s newest data and analytics offering, Fabric! Today, we'll be delving into some of the key aspects of Fabric we introduced in the previous blog, specifically OneLake and Data Engineering. Along the way, we'll address the following questions that were raised in our first post, shedding light on the transformative power of Microsoft Fabric.
At the heart of Microsoft Fabric lies OneLake, which acts as a single data lake for an entire organization and serves as the sole storage space for Fabric. In reality, OneLake is an abstraction that consists of many different data lakes within Azure that appear as a single unified lake. The intention of this abstraction is to centralize an organization's various data assets within a single entity, increasing the discoverability and usability of the data.
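To make the "single unified lake" idea concrete, here is a minimal sketch of how data in different workspaces can be addressed through one shared endpoint. It assumes OneLake's ADLS-compatible URI scheme; the workspace, Lakehouse, and file names are hypothetical, and the helper `onelake_uri` is our own illustration rather than part of any Fabric SDK.

```python
from posixpath import join

# Assumption: OneLake exposes one ADLS-compatible endpoint for the whole tenant,
# and items are addressed as "<item name>.<item type>" under a workspace.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an abfss:// URI for a path inside a Lakehouse (illustrative helper)."""
    item = f"{lakehouse}.Lakehouse"
    return f"abfss://{workspace}@{ONELAKE_HOST}/" + join(item, relative_path.lstrip("/"))


# Hypothetical workspaces owned by two different departments.
sales = onelake_uri("Sales", "SalesLakehouse", "Files/raw/orders.parquet")
hr = onelake_uri("HR", "PeopleLakehouse", "Tables/employees")

print(sales)
print(hr)
```

Both URIs share the same host, which is the point: different teams keep their own workspaces, yet everything is reachable through one logical lake.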
Fabric also introduces a concept called Shortcuts, which act as pointers to data. These pointers allow users to reference the original data source without any movement of the data. This eliminates the need for tedious data movement or data duplication, and it equips users with the freshest source data.
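The pointer idea can be sketched in a few lines of plain Python. This is a toy model, not the Fabric API: the `Shortcut` class, the `storage` dictionary, and the bucket name are all hypothetical stand-ins chosen to show that reads follow the pointer back to the one original copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    """A named pointer to data that lives somewhere else (toy model)."""
    name: str
    target: str  # location of the original data


# Hypothetical storage: exactly one physical copy of the data, keyed by location.
storage = {"s3://vendor-bucket/orders": ["order-1", "order-2"]}


def read(shortcut: Shortcut) -> list:
    """Follow the pointer and read from the original location."""
    return storage[shortcut.target]


orders = Shortcut(name="orders", target="s3://vendor-bucket/orders")
before = list(read(orders))  # snapshot of what the shortcut sees now

# The source system appends a record; no copy or sync job runs.
storage["s3://vendor-bucket/orders"].append("order-3")
after = read(orders)

print(before)
print(after)
```

Because the shortcut stores only a location, the second read sees the new record immediately, and there is still only one copy of the data to govern.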
Together, OneLake and Shortcuts give users easier access to data in different sectors of the company. This can eliminate data silos and remove the need for data migration efforts that result in duplicate and outdated copies of data. The result is streamlined data delivery and more effective use of an organization’s data.
The accumulation of all of an organization’s data into OneLake brings strong benefits of data reusability and broader data accessibility, but it also emphasizes the importance of access control. Fabric allows a user to administer access at different levels of granularity. Access can be provisioned for an entire workspace, for an individual Fabric item such as a Lakehouse, or for an individual data item such as a Parquet file. This access is managed using roles, which can be assigned to individual users, security groups, Microsoft Entra groups, and distribution lists. This means that even though all of a company’s data is stored together within OneLake, an organization is still able to provision or limit access at the same level of granularity that is available within Azure Data Lake Storage today. For instance, a user can only view, or create shortcuts to, data or tables that they have been provisioned access to, meaning that certain data elements can remain hidden from users who should not have access. By leveraging proper access controls, OneLake empowers an organization to use its centralized data assets more effectively without sacrificing security and control.
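The three levels of granularity can be pictured with a simplified model. The sketch below is an illustration only and does not reflect how Fabric actually evaluates roles; the principals, paths, and the `can_view` helper are hypothetical. It assumes a grant on a workspace or item also covers everything beneath it.

```python
# Hypothetical grants: (principal, resource path). A shorter path is a broader scope.
grants = {
    ("analysts", "Sales"),                                      # entire workspace
    ("finance", "Sales/SalesLakehouse"),                        # one Fabric item
    ("auditor", "Sales/SalesLakehouse/Files/ledger.parquet"),   # one data item
}


def can_view(principal: str, resource: str) -> bool:
    """Return True if the principal holds a grant on the resource or any ancestor."""
    parts = resource.split("/")
    # Build every ancestor path segment by segment, so "Sales" never matches "SalesArchive".
    ancestors = {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return any((principal, path) in grants for path in ancestors)


print(can_view("analysts", "Sales/SalesLakehouse/Files/ledger.parquet"))
print(can_view("finance", "Sales/OtherLakehouse"))
print(can_view("auditor", "Sales/SalesLakehouse/Files/other.parquet"))
```

In this toy model the auditor can open one file but nothing else in the same Lakehouse, which mirrors the idea that centralized storage does not have to mean centralized visibility.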
In Microsoft Fabric, the Data Engineering component offers a comprehensive suite of tools, including Lakehouses, Warehouses, Spark Notebooks, Spark Jobs, and Pipelines, to work with the data stored within OneLake. The Notebooks and Pipelines function very similarly to Notebooks and Pipelines within Azure Synapse or Azure Data Factory, with additional features to enhance development, such as co-editing support in notebooks. One current downside of Microsoft Fabric is that some of the data connectors available in Synapse Pipelines are not yet available in Fabric Pipelines; however, we expect those to be added with time.
Can I leverage the data I already have, if it’s stored somewhere other than OneLake?
Yes! Shortcuts can point not only to data stored within Fabric but also to data stored in AWS S3 buckets, ADLS Gen 2 Storage accounts, and Microsoft Dataverse. This means that users are not required to move the data into Fabric, or create a new copy in Fabric, to use Fabric's offerings on their data.
Can I create a traditional medallion architecture within Fabric?
Yes. Fabric's flexibility extends to traditional medallion architectures. Shortcuts within OneLake allow for highly customizable medallion structures, enabling the creation of bronze, silver, and gold layers within a single Lakehouse or across multiple workspaces.
This flexibility extends to sourcing data as well. Fabric allows for the incorporation of bronze or silver data from sources outside the platform, such as S3 Buckets or ADLS Gen 2 Storage. This means you're not locked into a closed ecosystem; you can seamlessly integrate existing data sources into your Fabric architecture.
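For readers new to the medallion pattern, the flow from bronze to silver to gold can be sketched as follows. In Fabric this would typically be done in Spark notebooks over Delta tables; here plain Python lists stand in for tables, and the sample records and function names are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records as landed, including a duplicate and an unparseable amount.
bronze = [
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "2", "region": "west", "amount": "n/a"},
    {"order_id": "3", "region": "east", "amount": "4.50"},
]


def to_silver(rows):
    """Clean and deduplicate: drop rows with bad amounts, keep one row per order."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # skip duplicate orders
        seen.add(row["order_id"])
        clean.append({"order_id": int(row["order_id"]),
                      "region": row["region"],
                      "amount": amount})
    return clean


def to_gold(rows):
    """Aggregate cleaned rows into a business-level summary: revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Each layer reads only from the one before it, so the bronze input could just as well be a shortcut to data that lives outside Fabric.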
How does Fabric change the analytics infrastructure I have today?
Fabric's impact on your analytics infrastructure can be significant. By reducing the need for data movement and simplifying access to data through shortcuts, Fabric streamlines data engineering workflows. This means less time spent on mundane tasks like data migration, and more time focusing on value-add activities.
Moreover, Fabric's use of the Delta Parquet format in both Lakehouses and Warehouses ensures greater usability and flexibility in working with data. Whether you're more comfortable with SQL or PySpark, Fabric accommodates your preferred tools and languages.
Microsoft Fabric's OneLake and Data Engineering capabilities herald a new era of data integration and engineering. By providing a unified data lake solution, customizable medallion architectures, and streamlined workflows, Fabric empowers organizations to harness the full potential of their data assets.
Stay tuned for our next exploration as we continue our journey into the depths of Microsoft Fabric!
If you have any questions or would like to discuss how to leverage Microsoft Fabric's OneLake and Data Engineering capabilities, do not hesitate to contact us.