Duration: 40 hours / 5 days
10 Lessons
Day 1: Introduction to Databricks and Apache Spark
Day 2: Databricks Architecture and Components
Day 3: Spark RDD Transformations and Actions / Spark Fundamentals / Transformations (Hands-On Session)
Day 4: Actions (Hands-On Session) / Hands-On Session: Basic Word Count Application
Day 5: Spark SQL and Databricks SQL / Data Manipulation and Processing
Day 6: NSE Case Study
Day 7-8: Spark Streaming and Structured Streaming
Day 9: Delta Lake Case Study
Day 10: Memory Optimization or Performance Optimization
Duration: 5 Days
19 Lessons
This bootcamp is designed for data engineers seeking advanced skills in building and managing data pipelines using Databricks. Participants will learn how to leverage Databricks’ Unified Analytics Platform to perform data engineering tasks efficiently, including data ingestion, transformation, orchestration, and optimization.
DAY 1: INTRODUCTION TO DATABRICKS AND SPARK
Module 1: Introduction to Databricks
Module 2: Introduction to Apache Spark
Module 3: Data Ingestion in Databricks
Module 4: Data Exploration and Transformation
DAY 2: ADVANCED DATA ENGINEERING TECHNIQUES
Module 1: Advanced Spark SQL Operations
Module 2: Data Partitioning and Optimization
Module 3: Introduction to Delta Lake
Module 4: Orchestrating Workflows with Databricks Jobs
DAY 3: DATA ENGINEERING BEST PRACTICES
Module 1: Introduction to Structured Streaming
Module 2: Managing Data Pipelines with MLflow
Module 3: Data Engineering Best Practices
Module 4: Real-World Data Engineering Use Cases
DAY 4: ADVANCED TOPICS IN DATABRICKS
Module 1: Introduction to Machine Learning Pipelines in Databricks
Module 2: Advanced Databricks Features
Module 3: Scaling Data Engineering Workloads
Module 4: Monitoring and Performance Tuning
DAY 5: CAPSTONE PROJECT AND FINAL ASSESSMENT
Module 1: Capstone Project
Module 2: Project Presentations and Evaluation
Module 3: Course Conclusion and Certification