APACHE HADOOP & SPARK
DESCRIPTION
Course Content
APACHE HADOOP & SPARK TRAINING
Introduction to HADOOP and Big Data Ecosystem
Distributed computing and cloud computing
Big Data Basics and the Need for Parallel Processing
How Hadoop Works
Introduction to HDFS and MapReduce
Hadoop Architecture Details
NameNode (master) Details
DataNode and Storage
Secondary NameNode, FSImage, and the Edit Log
JobTracker and TaskTracker
Safe Mode Details and Configuration
HDFS (Hadoop Distributed File System)
Background: the Google File System (GFS) and HDFS Design
Data Replication – Static and Dynamic configuration
Data Storage – Block Size details
Additional HDFS commands
HDFS API for Automation (a real-world project need; see the sketch below)
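As a taste of the automation topic above, here is a minimal sketch that drives HDFS from Python by shelling out to the standard hdfs dfs commands. It assumes a configured Hadoop client on the PATH; the helper name and the /user/demo paths are illustrative only.

    import subprocess

    def hdfs(*args):
        # Illustrative helper: run an 'hdfs dfs' subcommand and return its stdout.
        result = subprocess.run(["hdfs", "dfs", *args],
                                capture_output=True, text=True, check=True)
        return result.stdout

    hdfs("-mkdir", "-p", "/user/demo/input")             # hypothetical target directory
    hdfs("-put", "-f", "local_data.txt", "/user/demo/input/")
    print(hdfs("-ls", "/user/demo/input"))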
MapReduce Programming
MapReduce Background
Writing MapReduce Programs
Input Format, Output Format
JobConf and JobClient API
Number of Mappers and Reducers
Pre-built Mappers and Reducers
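Before writing real Hadoop jobs (which the course does against the Java JobConf/JobClient API), the model itself is easy to see in plain Python. The word-count sketch below simulates the map, shuffle, and reduce phases in-process; it illustrates the model only and does not touch the Hadoop API.

    from collections import defaultdict

    def mapper(line):
        # Map phase: emit a (word, 1) pair for every word in the input line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(word, counts):
        # Reduce phase: sum every count emitted for one key.
        return word, sum(counts)

    lines = ["the quick brown fox", "the lazy dog"]

    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)

    print([reducer(w, c) for w, c in sorted(groups.items())])
    # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]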
Hadoop Streaming
Introduction to Hadoop Streaming
Streaming API details and use cases
Streaming Lab
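Hadoop Streaming runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout, so the word-count job becomes two small Python scripts. The file names are illustrative.

    # mapper.py -- emits "word<TAB>1" for every word on stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

    # reducer.py -- Streaming sorts by key, so counts for a word arrive together
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

A job like this is submitted with the hadoop-streaming jar, passing -input, -output, -mapper, and -reducer (plus -file to ship the scripts); the exact jar path depends on the installation.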
Apache Sqoop
Introduction and Basics
Sqoop Installation with Oracle DB/MySQL
Sqoop Export and Import features
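Sqoop itself is a command-line tool; a typical import invocation, wrapped here in Python for scripting, looks like the sketch below. The JDBC URL, credentials, and table name are all hypothetical.

    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/sales",   # hypothetical database
        "--username", "demo",
        "--password-file", "/user/demo/.sqoop_pwd",      # safer than --password on the CLI
        "--table", "orders",                             # hypothetical source table
        "--target-dir", "/user/demo/orders",             # HDFS destination
        "--num-mappers", "4",
    ], check=True)

Export runs the other way: sqoop export with --export-dir pointing at the HDFS data and --table naming the destination table.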
Apache Hive
Hive Installation
Hive Shell Description
Metastore Details
Hive QL Basics
Working with Tables, Databases, etc.
Hive JDBC programming
Hands-on Exercises and Assignments
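The JDBC topic is taught against Hive's Java JDBC driver; the Python equivalent goes through HiveServer2, for example with the third-party PyHive package (an assumption here, not part of the course kit). Host, port, and table are illustrative.

    from pyhive import hive   # third-party: pip install 'pyhive[hive]'

    # Hypothetical HiveServer2 endpoint and database.
    conn = hive.Connection(host="localhost", port=10000, database="default")
    cursor = conn.cursor()

    cursor.execute("CREATE TABLE IF NOT EXISTS employees (id INT, name STRING)")
    cursor.execute("SELECT id, name FROM employees LIMIT 10")
    for row in cursor.fetchall():
        print(row)

    cursor.close()
    conn.close()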
Introduction to Spark
What is Spark?
Review: From Hadoop MapReduce to Spark
Introduction: HDFS
Introduction: YARN and Mesos
Spark Architecture Details
Spark Modules
Spark and Scala Installation
Apache Spark Installation (version 2.x)
Scala Installation and Configuration
Using the Spark Shell – Scala
Using the PySpark Shell
Spark Labs and Exercises
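The pyspark shell pre-creates a SparkSession as spark and a SparkContext as sc; in a standalone script you build them yourself. A minimal sketch for a local Spark 2.x install:

    from pyspark.sql import SparkSession

    # Equivalent to what the pyspark shell hands you as `spark`.
    spark = (SparkSession.builder
             .appName("demo")
             .master("local[*]")      # run locally on all cores
             .getOrCreate())

    sc = spark.sparkContext
    print(spark.version)                       # e.g. 2.x.y
    print(sc.parallelize(range(10)).sum())     # 45

    spark.stop()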
Resilient Distributed Datasets (RDD)
Working with RDDs in Spark
Creating RDDs
Accumulators and Broadcast variables
RDD – Transformations
RDD – Actions
RDD Labs
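A compact sketch of the RDD topics above, assuming an existing SparkContext sc (e.g., from the pyspark shell); the sample words are made up.

    rdd = sc.parallelize(["spark", "hadoop", "spark", "hive"])

    # Transformations are lazy; nothing executes until an action is called.
    counts = rdd.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

    # Actions trigger execution.
    print(counts.collect())    # [('spark', 2), ('hadoop', 1), ('hive', 1)] -- order may vary
    print(rdd.count())         # 4

    # Accumulator: a write-only counter that tasks update and the driver reads.
    seen = sc.accumulator(0)
    rdd.foreach(lambda w: seen.add(1))
    print(seen.value)          # 4

    # Broadcast variable: read-only data shipped once to every executor.
    stop = sc.broadcast({"hive"})
    print(rdd.filter(lambda w: w not in stop.value).collect())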
Spark SQL and DataFrames
Spark SQL and the SQL Context
Creating DataFrames
Transforming and Querying DataFrames
DataFrames and RDDs
Comparing Spark SQL, Impala and Hive-on-Spark
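A short sketch of the DataFrame topics above, assuming an existing SparkSession spark; the sample rows are made up.

    df = spark.createDataFrame(
        [(1, "alice", 34), (2, "bob", 45), (3, "cara", 29)],
        ["id", "name", "age"],
    )

    # DataFrame API: transform and query without writing SQL.
    df.filter(df.age > 30).select("name", "age").show()

    # Or register a temporary view and query through the SQL interface.
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30 ORDER BY age").show()

    # DataFrames and RDDs interoperate: .rdd exposes the underlying Row objects.
    print(df.rdd.map(lambda row: row.name).collect())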
Spark MLlib (Machine Learning)
Basic Principles of Machine Learning
Spark ML Setup
Transformation and Correlation Algorithms
Example: K-means
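A minimal K-means sketch with the DataFrame-based pyspark.ml API, assuming an existing SparkSession spark; the four 2-D points are made up so the two clusters are obvious.

    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    data = spark.createDataFrame(
        [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)],
        ["x", "y"],
    )

    # MLlib expects a single vector column; assemble the raw columns into it.
    features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(data)

    model = KMeans(k=2, seed=1).fit(features)
    print(model.clusterCenters())    # two centers, near (0.5, 0.5) and (8.5, 8.5)
    model.transform(features).select("x", "y", "prediction").show()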
Study Materials and Labs
1) A complete virtual machine is shared with students, with Java, Oracle DB, Mozilla
Firefox, and other components pre-installed.
2) The VM can be used even after the training is done. Note that it is NOT a remote-lab
environment: you keep the VM and all labs after the training is completed.
3) A certification question dump for Hadoop will be provided.
4) Interview questions on Spark, Hadoop, MapReduce, and other ecosystem components
are included.
5) The course is 30 hours in duration. All materials are shared via Google Drive.