DevOps and Beyond, Workshops

VSF02 Workshop: Modern Data DevOps in the Azure Cloud

11/18/2022

8:00am - 5:00pm

Level: Intermediate

Ed Freeman

Software Engineer I

endjin

Ian Griffiths

Technical Fellow

endjin

Modern data platforms deserve the same love and attention that web applications get when it comes to modern DevOps approaches. Data platforms often lack the rigor and formalized DevOps processes that web applications enjoy: test coverage is minimal or non-existent, development and collaboration techniques are inconsistent and sporadic, and the feedback loop for addressing end-user issues and ensuring quality across all areas of the solution is less efficient and less formalized. The move to the cloud also means new practices must be adopted to keep delivering high-quality solutions. Applying these practices to modern data platforms can be a struggle, so this workshop demonstrates tried-and-tested strategies that will set you up for success on your next cloud data project.

We'll start the workshop by talking about modern cloud data services (and the myriad options Azure offers in this space), introducing DevOps principles, and discussing the considerations organizations need to make when embarking on a data project. In particular, we'll discuss the importance of developing with a shared-responsibility mindset and working on value streams as a cross-functional team, and how technical requirements such as testing, automated deployments, and security frameworks can be layered on top of this core concept.

We'll then examine what "traditional" data architectures look like in the cloud and the implications such architectures have for DevOps. Maybe your on-prem SQL Server is turning into an Azure SQL Database, or your Parallel Data Warehouse (PDW) is turning into an Azure Synapse dedicated SQL pool. We'll demonstrate some candidate "traditional" data architectures implemented in the cloud and discuss what changes on the DevOps front, such as how to make schema changes without disruption and how Infrastructure as Code (IaC) changes the way solution components are deployed and managed.

In the third session, we'll take a deep dive into modern data architectures, using Azure Synapse Analytics to drive the demos. We'll show the power of lake databases (also known as lakehouses) as an alternative to traditional relational data warehouses. We'll show off services such as serverless SQL pools, Spark, and Delta Lake alongside other advanced features, and how they can form a logical data warehouse over data stored in Azure Data Lake Storage Gen2. We'll look at how Synapse Studio brings cross-functional collaboration under one roof, and we'll cover advanced development techniques, such as using Docker and VS Code dev containers when developing Spark-based Synapse solutions.

The final session will focus on administration, governance, and other operational considerations for building efficient, high-quality modern data platforms. We'll discuss the importance of practices such as monitoring and alerting, site reliability engineering (SRE), and data governance, along with the Azure products that can help with each. We'll also cover security considerations across all elements of a modern data platform, including permissions, access management, and network configuration. Finally, and everyone's favorite topic, we'll discuss the typical costs of a modern data platform and what can be done to manage and optimize them.

You will learn:

  • How to embed and apply DevOps principles to a modern data platform
  • The differences between traditional and modern data architectures in the cloud
  • How to administer and govern modern data platforms to deliver efficient, high-quality solutions