Lessons: 8
Duration: 5 days
Language: English
Objectives:
- Learn about Apache Spark and the Spark 3.0 architecture
- Learn Big Data concepts
- Understand the Hadoop ecosystem
- Apply Python to data science tasks
- Build and interact with Spark DataFrames using Spark SQL
Course Features:
- Practical hands-on exercises
- Lab sessions
- Training by experienced faculty
Pre-requisites:
- Big Data and Hadoop
- Basic Python data structures
- Basic knowledge of pandas DataFrames and SQL
- Entry-level Data Science
Software Pre-requisites:
- Apache Spark (Downloadable from http://spark.apache.org/downloads.html)
- A Python distribution containing IPython, pandas, and scikit-learn
- PySpark
- Anaconda Python 3.x (Downloadable from www.anaconda.com)
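
Before Day 1, a quick sanity check along the following lines can confirm the software prerequisites are in place (assuming the packages were installed through Anaconda or pip; PySpark is also installable with `pip install pyspark`):

```python
# check_env.py -- verify the course's software prerequisites are importable.
import sys

import pandas
import sklearn
import pyspark

print("Python :", sys.version.split()[0])
print("pandas :", pandas.__version__)
print("sklearn:", sklearn.__version__)
print("PySpark:", pyspark.__version__)
```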
Learning Path
- Day 1
- Introduction to Big Data
- Hadoop Architecture
- Mapper and Reducer
- What is Apache Spark?
- Spark Jobs and APIs
- Spark 3.0 architecture
- Using Anaconda and Jupyter Notebook
- Installation and Configuration
- Python Introduction
- Python Objects
- Complex
- Boolean
- Python Data Structures
- list
- list methods
- tuple
- string
- string methods
- dictionary
- dictionary methods with examples
- Control Structures
- Functions
- global variables
- Variable arguments: *args, **kwargs
- Built-in Functions (see the sketch after this list)
- range
- lambda
- filter
- map
- reduce
- set
- zip
- Conclusion and Summary
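
As a taste of the built-in functions listed above for Day 1, here is a minimal sketch; the numbers and names are invented purely for illustration:

```python
# Minimal sketch of the Day 1 built-ins: range, lambda, filter, map, reduce, set, zip.
from functools import reduce  # reduce lives in functools in Python 3

nums = list(range(1, 7))                            # [1, 2, 3, 4, 5, 6]

evens = list(filter(lambda n: n % 2 == 0, nums))    # [2, 4, 6]
squares = list(map(lambda n: n * n, nums))          # [1, 4, 9, 16, 25, 36]
total = reduce(lambda a, b: a + b, nums)            # 21
unique = set([1, 2, 2, 3])                          # {1, 2, 3}

names = ["spark", "hadoop"]
pairs = dict(zip(names, range(len(names))))         # {'spark': 0, 'hadoop': 1}

print(evens, squares, total, unique, pairs)
```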
- Day 2
- File Handling
- Exception Handling
- List Comprehension
- Dictionary Comprehension
- Modules
- User-defined Modules
- Built in Modules
- os (including os.system)
- sys
- glob
- Classes (see the sketch after this list)
- Methods
- Inheritance
- Case Study
- Iterator
- Generator
- Regular Expressions (re)
- File Handling and Exception Handling
- Conclusion and Summary
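
A minimal sketch of the Day 2 class, inheritance, and generator topics; all class names and values are invented for illustration:

```python
# Sketch of Day 2 topics: classes, methods, inheritance, and a generator.

class Vehicle:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a vehicle"


class Car(Vehicle):                       # inheritance: Car extends Vehicle
    def describe(self):                   # method overriding
        return f"{self.name} is a car"


def countdown(n):
    """Generator: lazily yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


print(Car("Model S").describe())          # Model S is a car
print(list(countdown(3)))                 # [3, 2, 1]
```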
- Day 3
- Hands-on Session
- Array Manipulation
- Matrix Manipulation
- pandas
- Hands-on Session
- Series
- DataFrames (see the sketch after this list)
- Case Study
- Data Visualisation
- Matplotlib
- Case Study
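
A minimal sketch of the Day 3 pandas and Matplotlib topics; the figures below are invented for illustration:

```python
# Sketch of Day 3 topics: a pandas Series, a DataFrame, and a Matplotlib plot.
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])                                # label-based access -> 20

df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "sales": [120, 95, 140],                 # invented sample data
})
print(df.describe())                         # summary statistics
print(df[df["sales"] > 100])                 # boolean-mask filtering

df.plot(x="month", y="sales", kind="bar")    # pandas delegates to Matplotlib
plt.title("Monthly sales (illustrative data)")
plt.show()
```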
- Day 4
- Internal workings of an RDD
- Creating RDDs
- Global versus local scope
- Transformation Functions
- Action Functions
- Hands-on Session on RDDs and Spark (see the sketch after this list)
- Assignments 1
- Best Practices 1
- Project Discussion using PySpark
- Conclusion and Summary
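
A minimal sketch of the Day 4 RDD workflow, assuming a local PySpark installation; the data is invented for illustration:

```python
# Sketch of Day 4 topics: creating an RDD, transformations, and actions.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")   # local mode, all cores

rdd = sc.parallelize(range(1, 11))            # create an RDD from a Python range

evens = rdd.filter(lambda n: n % 2 == 0)      # transformation: lazy, returns a new RDD
squares = evens.map(lambda n: n * n)          # transformation: still nothing computed

print(squares.collect())                      # action: runs the job -> [4, 16, 36, 64, 100]
print(squares.reduce(lambda a, b: a + b))     # action: 220

sc.stop()
```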
- Day 5
- Python to RDD communications
- Catalyst Optimizer refresh
- Speeding up PySpark with DataFrames
- Creating DataFrames
- Simple DataFrame queries
- Interoperating with RDDs
- Querying with the DataFrame API
- Hands-on Session on pandas DataFrames and PySpark (see the sketch after this list)
- Assignments 2
- Checking for duplicates, missing observations, and outliers
- Assignments 3
- Conclusion and Summary
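
A minimal sketch of the Day 5 DataFrame topics: creating a DataFrame, querying it through the API and Spark SQL, and checking for duplicates and missing values. It assumes a local PySpark installation, and the rows are invented for illustration:

```python
# Sketch of Day 5 topics: Spark DataFrames, Spark SQL, and basic data checks.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, "alice", 34), (2, "bob", None), (2, "bob", None)],  # invented rows
    ["id", "name", "age"],
)

df.filter(df.age > 30).show()                  # DataFrame API query

df.createOrReplaceTempView("people")           # register the DataFrame for SQL
spark.sql("SELECT name, age FROM people WHERE age IS NOT NULL").show()

print("duplicate rows:", df.count() - df.dropDuplicates().count())
print("missing ages  :", df.filter(df.age.isNull()).count())

spark.stop()
```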
- Group Project Presentation