MapReduce - The Heart of Hadoop

In this article, we will learn:
    • What MapReduce is
    • A few interesting facts about MapReduce
    • MapReduce components and architecture
    • How MapReduce works in Hadoop

MapReduce is a programming model used to process large data sets in a batch-processing manner.
A MapReduce program is composed of:
    • a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and
    • a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
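The student-sorting example above can be sketched in a few lines of Python. This is a minimal illustration of the programming model, not Hadoop's actual Java API; the function names map_phase and reduce_phase are our own:

```python
from collections import defaultdict

def map_phase(records):
    """Map(): emit a (key, value) pair for each record -- here, (first_name, 1)."""
    for name in records:
        yield (name, 1)

def reduce_phase(pairs):
    """Reduce(): group the pairs by key and sum the values, yielding name frequencies."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

students = ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"]
frequencies = reduce_phase(map_phase(students))
print(frequencies)  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```

In real Hadoop the map and reduce phases run on different machines, and the framework, not the programmer, performs the grouping-by-key (the "shuffle") between them.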

A Few Important Facts about MapReduce:
    • Apache Hadoop MapReduce is an open-source implementation of Google's MapReduce framework.
    • Although several other MapReduce implementations have been developed for distributed systems, such as Dryad from Microsoft and Disco from Nokia, Hadoop is the most popular among them, offering an open-source implementation of the MapReduce framework.
    • The Hadoop MapReduce framework works on a master/slave architecture.

MapReduce Architecture:


Hadoop 1.x MapReduce is composed of two components:
    1. The Job Tracker, which plays the role of master and runs on the master node (NameNode)
    2. The Task Trackers, which play the role of slaves, one per data node, and run on the DataNodes
Job Tracker:

    1. The Job Tracker is the component to which client applications submit MapReduce programs (jobs).
    2. The Job Tracker schedules client jobs and allocates tasks to the slave Task Trackers running on individual worker machines (data nodes).
    3. The Job Tracker manages the overall execution of a MapReduce job.
    4. The Job Tracker manages the resources of the cluster, which includes:
        • managing the data nodes, i.e. the Task Trackers;
        • keeping track of consumed and available resources;
        • keeping track of already running tasks and providing fault tolerance for them.

Task Tracker:

    1. Each Task Tracker is responsible for executing and managing the individual tasks assigned by the Job Tracker.
    2. The Task Tracker also handles the data motion between the map and reduce phases.
    3. One prime responsibility of the Task Tracker is to constantly communicate the status of its tasks to the Job Tracker.
    4. If the Job Tracker fails to receive a heartbeat from a Task Tracker within a specified amount of time, it assumes the Task Tracker has crashed and resubmits the corresponding tasks to other nodes in the cluster.
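The heartbeat-timeout logic in point 4 can be sketched as follows. This is a simplified illustration, not Hadoop's actual implementation; the class name, the 10-second timeout, and the tracker IDs are all made up for the example (the real timeout is configurable):

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value, configurable in real Hadoop

class JobTrackerMonitor:
    """Sketch of how a Job Tracker might detect dead Task Trackers via heartbeats."""
    def __init__(self):
        self.last_heartbeat = {}  # tracker_id -> timestamp of last heartbeat

    def record_heartbeat(self, tracker_id, now=None):
        self.last_heartbeat[tracker_id] = now if now is not None else time.time()

    def dead_trackers(self, now=None):
        """Trackers silent for longer than the timeout are presumed crashed."""
        now = now if now is not None else time.time()
        return [tid for tid, ts in self.last_heartbeat.items()
                if now - ts > HEARTBEAT_TIMEOUT]

monitor = JobTrackerMonitor()
monitor.record_heartbeat("tracker-1", now=0.0)
monitor.record_heartbeat("tracker-2", now=8.0)
# At t=12s, tracker-1 has been silent for 12s (> timeout) and is presumed dead;
# its tasks would now be resubmitted to other nodes.
print(monitor.dead_trackers(now=12.0))  # ['tracker-1']
```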

How the MapReduce Engine Works:

Let us understand how exactly a MapReduce program gets executed in Hadoop, and what the relationships are between the different entities involved in this process.

The entire process can be listed as follows:
    1. Client applications submit jobs to the Job Tracker.
    2. The Job Tracker talks to the NameNode to determine the location of the data.
    3. The Job Tracker locates Task Tracker nodes with available slots at or near the data.
    4. The Job Tracker submits the work to the chosen Task Tracker nodes.
    5. The Task Tracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different Task Tracker.
    6. A Task Tracker notifies the Job Tracker when a task fails. The Job Tracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the Task Tracker as unreliable.
    7. When the work is completed, the Job Tracker updates its status.
    8. Client applications can poll the Job Tracker for information.
Let us see these steps in more detail.
    1. Client submits a MapReduce job to the Job Tracker:
      Whenever a client/user submits a MapReduce job, it goes straight to the Job Tracker. The client program contains all the necessary information, such as the map, combine, and reduce functions, and the input and output paths of the data.


    2. Job Tracker manages and controls the job:
        • The Job Tracker puts the job in a queue of pending jobs and then executes them on a FCFS (first come, first served) basis.
        • The Job Tracker first determines the number of splits from the input path and assigns different map and reduce tasks to the Task Trackers in the cluster. There will be one map task for each split.
        • The Job Tracker talks to the NameNode to determine the location of the data, i.e. to determine which data nodes contain the data.
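The "one map task per split" rule means the number of map tasks follows directly from the input size. A small sketch, assuming splits equal the HDFS block size (64 MB was the Hadoop 1.x default; real split computation also considers mapred.min.split.size and file boundaries):

```python
import math

def num_splits(file_size_bytes, split_size_bytes=64 * 1024 * 1024):
    """One map task per input split: how many splits (and map tasks) a file yields."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / split_size_bytes)

# A 300 MB input file with 64 MB splits yields 5 splits, hence 5 map tasks.
print(num_splits(300 * 1024 * 1024))  # 5
```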


    3. Task assignment to Task Trackers by the Job Tracker:
        • Each Task Tracker is pre-configured with a number of slots, which indicates how many tasks the Task Tracker can accept. For example, a Task Tracker may be able to run two map tasks and two reduce tasks simultaneously.
        • When the Job Tracker tries to schedule a task, it looks for an empty slot in the Task Tracker running on the same server that hosts the data node where the data for that task resides. If none is found, it looks for a machine in the same rack. There is no consideration of system load during this allocation.
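This locality-first slot search can be sketched as a simple preference order: node-local, then rack-local, then any free slot. A simplified illustration, not Hadoop's scheduler code; the tracker records and node/rack names are invented for the example:

```python
def choose_tracker(task_data_node, task_data_rack, trackers):
    """Pick a Task Tracker with a free slot, preferring data locality:
    same node first, then same rack, then any tracker with a free slot.
    `trackers` is a list of dicts: {'node', 'rack', 'free_slots'}.
    System load is deliberately ignored, matching Hadoop 1.x slot scheduling."""
    candidates = [t for t in trackers if t['free_slots'] > 0]
    for t in candidates:
        if t['node'] == task_data_node:
            return t  # node-local: data is on this very machine
    for t in candidates:
        if t['rack'] == task_data_rack:
            return t  # rack-local: data is one switch hop away
    return candidates[0] if candidates else None

trackers = [
    {'node': 'node-a', 'rack': 'rack-1', 'free_slots': 0},
    {'node': 'node-b', 'rack': 'rack-1', 'free_slots': 2},
    {'node': 'node-c', 'rack': 'rack-2', 'free_slots': 1},
]
# The data lives on node-a (rack-1), but node-a has no free slot,
# so the scheduler falls back to a rack-local tracker: node-b.
print(choose_tracker('node-a', 'rack-1', trackers)['node'])  # node-b
```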


    4. Task execution by the Task Tracker:
        • Once a task is assigned to a Task Tracker, the Task Tracker creates a local environment to run the task.
        • The Task Tracker needs resources to run the job. Hence it copies any files the application needs from the distributed cache to the local disk, and localizes all the job JARs by copying them from the shared file system to the Task Tracker's own file system.
        • A Task Tracker can also spawn multiple JVMs to handle many map or reduce tasks in parallel.
        • The Task Tracker actually initiates the map or reduce tasks and reports progress back to the Job Tracker.


    5. Send notification to the Job Tracker:
        • When all the map tasks are done, the Task Trackers notify the Job Tracker. The Job Tracker then asks the selected Task Trackers to begin the reduce phase.


    6. Task recovery in a failover situation:
        • Although there is a single Task Tracker on each node, the Task Tracker spawns a separate Java Virtual Machine process for each task. This prevents the Task Tracker itself from failing if a running task crashes its JVM due to bugs in the user-written map or reduce function.
    7. Monitoring the Task Trackers:
        • The Task Tracker nodes are monitored. A heartbeat is sent from each Task Tracker to the Job Tracker every few seconds to report its status.
        • If a Task Tracker does not submit heartbeat signals often enough, it is deemed to have failed and its work is scheduled on a different Task Tracker.
        • A Task Tracker notifies the Job Tracker when a task fails. The Job Tracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the Task Tracker as unreliable.
    8. Job completion:
        • When the work is completed, the Job Tracker updates its status.
        • Client applications can poll the Job Tracker for information.
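The client-side polling in step 8 amounts to a simple loop: ask the Job Tracker for the job's status until it reaches a terminal state. A minimal sketch; the function name, the status strings, and the get_status callback are assumptions for illustration, not Hadoop's client API:

```python
import time

def poll_job_status(get_status, interval=1.0, max_polls=10):
    """Poll a status source until the job reaches a terminal state.

    get_status: callable returning the job's current status string,
    standing in for a status request to the Job Tracker.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in ("SUCCEEDED", "FAILED", "KILLED"):
            return status
        time.sleep(interval)
    return "TIMEOUT"

# Simulate a job that finishes on the third poll.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(poll_job_status(lambda: next(statuses), interval=0.0))  # SUCCEEDED
```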

Continue reading:
Top 10 Myths about Hadoop
Hadoop Basic Course

Support us by sharing this article.



© 2017 SapHanaTutorial.Com. All rights reserved.  Privacy Policy