Lessons: 8
Duration: 5 days
Language: English

Learning Path

  • Introduction to Big Data
  • Hadoop Architecture
  • Mapper and Reducer
  • What is Apache Spark?
  • Spark Jobs and APIs
  • Spark 3.0 architecture
  • Using Anaconda and Jupyter Notebook
  • Installation and Configuration
  • Python Introduction
  • Python Objects
    • Complex
    • Boolean
  • Python Data Structures
    • list
    • list methods
    • tuple
    • string
    • string methods
    • dictionary
    • dictionary methods with examples
  • Control Structures
  • Functions
  • Global Variables
  • Variable Arguments: *args, **kwargs
  • Built-in Functions
    • range
    • lambda
    • filter
    • map
    • reduce (functools.reduce)
    • set
    • zip
  • Conclusion and Summary
  • File Handling
  • Exception Handling
  • List Comprehension
  • Dictionary Comprehension
  • Modules
    • User-Defined Modules
    • Built-in Modules
      • os
      • sys
      • system
      • glob
  • Class
  • Methods
  • Inheritance
  • Case Study
  • Iterator
  • Generator
  • Regular Expressions (re)
  • File Handling and Exception Handling
  • Conclusion and Summary
  • Hands-on Session
  • Array Manipulation
  • Matrix Manipulation
  • pandas
  • Hands-on Session
  • Data Series
  • DataFrame
  • Case Study
  • Data Visualisation
  • Matplotlib
  • Case Study
  • Internal workings of an RDD
  • Creating RDDs
  • Global versus local scope
  • Transformation Functions
  • Action Functions
  • Hands-on Session on RDDs and Spark
  • Assignments 1
  • Best Practices 1
  • Project Discussion Using PySpark
  • Conclusion and Summary
  • Python-to-RDD Communication
  • Catalyst Optimizer Refresher
  • Speeding up PySpark with DataFrames
  • Creating DataFrames
  • Simple DataFrame queries
  • Interoperating with RDDs
  • Querying with the DataFrame API
  • Hands-on Session on pandas DataFrames and PySpark
  • Assignments 2
  • Checking for duplicates, missing observations, and outliers
  • Assignments 3
  • Conclusion and Summary
  • Group Project Presentation