This course introduces distributed data processing and shows how to use MapReduce to process large amounts of data. It focuses on practical, hands-on exercises: students will learn to write MapReduce programs, and advanced features of MapReduce are covered as well.

Course Schedule

This is a weekend course held July 20 – August 11, 2019, US Pacific Time. Class sessions are held every Saturday and Sunday, 9:00-11:00 AM US Pacific Time each day. Please check your local date and time for the first session.

Prerequisite

Desired but not required – exposure to, or working proficiency in, Java and SQL.

Course Features

- 4 weeks, 8 sessions, 16 hours of total live instruction
- Training material, instructor handouts and access to useful resources on the cloud provided
- Practical hands-on lab exercises on cloud workstations provided
- Actual code and scripts provided
- Real-life scenarios

Course Outline

1. Introduction to MapReduce
   - MapReduce Overview
   - MapReduce in Hadoop
   - History of MapReduce
   - MapReduce Applications
   - Data Flow in MapReduce
   - Map and Reduce Operations
   - Job Submission Flow of MapReduce
   - Map Operation
   - Job Initialization
   - Task Assignment
   - Job Completion
   - Job Scheduling
   - Job Failures
   - Shuffle and Sort
   - Word Count Problem, Flow and Solution
   - MapReduce Algorithms

2. MapReduce Types and Formats
   - Data Types
   - File Formats
   - Input Formats
   - Output Formats
   - Explanation of the Driver, Mapper and Reducer Code
   - Configuring the Development Environment – Eclipse
   - Writing Unit Tests
   - Running Locally
   - Running on a Cluster

3. Understanding MapReduce
   - Data Flow in MapReduce
   - MapReduce Example
   - MapReduce Daemons
   - JobTracker
   - TaskTracker
   - Other Phases in MapReduce
   - Data Flow with a Single Reduce Task, Multiple Reduce Tasks and No Reduce Task

4. MapReduce with YARN
   - Hadoop Architecture
   - Problems with Hadoop 1.x, Hadoop 2.x Features, YARN
   - MapReduce Application Execution Flow
   - YARN Workflow
   - Anatomy of a MapReduce Program

5. Advanced MapReduce
   - Counters
   - Sorting
   - Input Splits in MapReduce
   - MapReduce Combiner
   - MapReduce Partitioner
   - MapReduce Distributed Cache
   - MRUnit
   - Reduce Join
   - Joins – Map Side and Reduce Side
   - Custom Input Format
   - Sequence Input Format
   - Side Data Distribution

Refund Policy

A 100% refund is available if the request is initiated at least 24 hours before the first course session. If a class is rescheduled or cancelled by the organizer, registered students will be offered a credit towards any future course or a 100% refund.