Are you a data architect, data engineer, or decision maker eager to understand the Data Lakehouse concept that’s taking the data world by storm? This session is tailored just for you.
Discover the power of decoupling storage and compute, where storage is cheap, compute is expensive, and you can easily scale compute for specific tasks. Learn about OneLake and Delta Lake and why mastering PySpark makes sense for tasks like data cleaning, handling semi-structured data, and integrating with API-based sources like event streams.
Data pipelines are essential for moving and transforming data between different systems. However, managing a large number of data pipelines can be challenging and time-consuming. How can you ensure that your data pipelines are efficient, reliable, and consistent?
In this session, you will learn how to use a metadata-driven approach to manage your data pipelines and notebooks in Microsoft Fabric. Metadata is data about data, such as source, destination, schema, and format.
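To make "metadata-driven" concrete, here is a hypothetical sketch (table names, fields, and the `run_ingestion` helper are illustrative, not the session's actual framework): pipeline definitions live in a metadata document, and one generic loop drives every copy instead of a hand-built pipeline per source.

```python
import json

# Hypothetical metadata: one entry per source table. An orchestrating
# pipeline or notebook reads this instead of hard-coding each source.
PIPELINE_METADATA = json.loads("""
[
  {"source": "sales.orders",    "destination": "bronze/orders",    "format": "parquet"},
  {"source": "sales.customers", "destination": "bronze/customers", "format": "parquet"}
]
""")

def run_ingestion(copy_activity):
    """Apply the same copy logic to every entry in the metadata."""
    for item in PIPELINE_METADATA:
        copy_activity(item["source"], item["destination"], item["format"])

# Usage: plug in the real copy step (e.g. a Fabric pipeline Copy activity
# or a Spark read/write); here we only print the plan.
run_ingestion(lambda src, dst, fmt: print(f"copy {src} -> {dst} ({fmt})"))
```

Adding a new source then means adding one metadata entry, not building a new pipeline.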
We will show you how to implement a data ingestion and processing framework based on the Medallion Lakehouse architecture. We will also share the key learnings, best practices, and patterns we have discovered from applying this framework in our own work.