Lessons: 6

Duration: 3 days

Language: English

OBJECTIVES:

Course features:

PRE-REQUISITES:

Software Pre-requisites:

Target Audience:

Learning Path

  • What is Apache Spark
  • Spark Jobs and APIs
  • Spark Architecture
  • Installation and Configuration
  • Internal workings of an RDD
  • Creating RDDs
  • Global versus local scope
  • Transformations
  • Actions
  • Hands-on Session on RDD and Spark
  • Assignments 1
  • Best Practices 1
  • Python to RDD communications
  • Catalyst Optimizer refresh
  • Speeding up PySpark with DataFrames
  • Creating DataFrames
  • Simple DataFrame queries
  • Interoperating with RDDs
  • Querying with the DataFrame API
  • Hands-on Session on Pandas DataFrame and PySpark
  • Assignments 2
  • Checking for duplicates, missing observations, and outliers
  • Getting familiar with your data
  • Visualization
  • Hands-on Session on Data Modeling
  • Assignments 3
  • Overview of the package
  • Loading and transforming the data
  • Getting to know your data
  • Creating the final dataset
  • Predicting infant survival
  • Hands-on Session using PySpark MLlib
  • Assignments 4
  • Overview of the package
  • Predicting the chances of infant survival with ML
  • Parameter hyper-tuning
  • Other features of PySpark ML in action
  • Implementation of ML Algorithms
  • Random Forest
  • Regression
  • K-means
  • Conclusion and Summary