How YARN Overcomes MapReduce Limitations in Hadoop 2.0


In this article we discuss two new terms introduced in Hadoop 2.0: YARN and MR2.
    • What is YARN?
    • Why was YARN (Yet Another Resource Negotiator), a new framework, needed in Hadoop 2.0?
    • What are the benefits of the YARN framework over the earlier MapReduce framework of Hadoop 1.0?
    • What is the difference between MR1 in Hadoop 1.0 and MR2 in Hadoop 2.0?

Prerequisite:
This article is easier to follow if you have basic knowledge of Hadoop and MapReduce. If you are not familiar with Hadoop and MapReduce, the article below may help you.
Introduction of the new YARN layer in Hadoop 2.0:
YARN (Yet Another Resource Negotiator) is a new component added in Hadoop 2.0.
Let’s look at how the Hadoop architecture changed from Hadoop 1.0 to Hadoop 2.0.

[Figure: YARN in Hadoop, comparing the Hadoop 1.0 and Hadoop 2.0 architecture]

As shown, Hadoop 2.0 introduces a new layer between HDFS and MapReduce.
This is the YARN framework, which is responsible for cluster resource management.

Cluster Resource Management:
Cluster resource management means managing the resources of the Hadoop cluster, such as memory and CPU.

YARN took over the task of cluster resource management from MapReduce, leaving MapReduce to focus on data processing, which is what it does best.


Why was YARN needed?
Before we look at the need for YARN, we should understand how cluster resource management was done in Hadoop 1.0 and what the problems with that approach were.

Cluster Resource Management in Hadoop 1.0:
In Hadoop 1.0, cluster resource management is tightly coupled with the MapReduce programming model.
The JobTracker, which does the resource management, is part of the MapReduce framework.


In the MapReduce framework, a MapReduce job (MapReduce application) is divided into a number of tasks called mappers and reducers. Each task runs on one of the machines (DataNodes) of the cluster, and each machine has a limited number of predefined slots (map slots and reduce slots) for running tasks concurrently.

Here, the JobTracker is responsible both for managing the cluster's resources and for driving the execution of the MapReduce job. It reserves and schedules slots for all tasks; configures, runs and monitors each task; and, if a task fails, allocates a new slot and reattempts the task. After a task finishes, the JobTracker cleans up temporary resources and releases the task's slot to make it available for other jobs.
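
The JobTracker cycle described above (reserve a slot, run and monitor the task, retry on failure, release the slot) can be sketched in a few lines of Python. This is only an illustrative toy model: the class `ToyJobTracker`, its fields and the retry limit are hypothetical names chosen for this sketch and are not part of any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJobTracker:
    """Hypothetical stand-in for the Hadoop 1.0 JobTracker (not a real Hadoop API)."""
    free_slots: dict                      # e.g. {"map": 2, "reduce": 1}
    max_attempts: int = 2                 # assumed retry limit for this sketch
    log: list = field(default_factory=list)

    def run_task(self, name, kind, attempt_fn):
        """Reserve a slot of the right kind, run the task, and retry on failure."""
        for attempt in range(1, self.max_attempts + 1):
            if self.free_slots[kind] == 0:
                self.log.append(f"{name}: no free {kind} slot")
                return False
            self.free_slots[kind] -= 1          # reserve a slot
            try:
                ok = attempt_fn(attempt)        # run and monitor the task
            finally:
                self.free_slots[kind] += 1      # clean up and release the slot
            self.log.append(f"{name}: attempt {attempt} {'ok' if ok else 'failed'}")
            if ok:
                return True
        return False


tracker = ToyJobTracker(free_slots={"map": 2, "reduce": 1})
# A map task that fails on its first attempt and succeeds on the second.
flaky_ok = tracker.run_task("map-0", "map", lambda attempt: attempt >= 2)
print(flaky_ok, tracker.log)
```

Note that a single object does everything here: slot bookkeeping, scheduling, monitoring and retries. That concentration of duties in one place is what the following section identifies as the problem.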

Problems with this approach in Hadoop 1.0:
    1. It limits scalability: The JobTracker runs on a single machine and performs several tasks, such as
        • Resource management
        • Job and task scheduling and
        • Monitoring
      Although the cluster has many machines (DataNodes), none of them can share this work, so the single JobTracker becomes a bottleneck as the cluster grows. This limits scalability.
    2. Availability issue: In Hadoop 1.0, the JobTracker is a single point of failure. If the JobTracker fails, all jobs must restart.
    3. Problem with resource utilization: In Hadoop 1.0, each TaskTracker has a predefined number of map slots and reduce slots. Utilization suffers because the map slots might be full while the reduce slots are empty (and vice versa). Compute resources reserved for reduce slots can sit idle even when there is an immediate need for them as map slots.
    4. Limitation in running non-MapReduce applications: In Hadoop 1.0, the JobTracker is tightly integrated with MapReduce, so only applications that follow the MapReduce programming model can run on Hadoop.
      Let’s try to understand point 4 in more detail.

      The Hadoop Distributed File System (HDFS) makes it cheap to store large amounts of data, and the scalable MapReduce analysis engine makes it possible to extract insights from that data. MapReduce performs batch-driven data analysis: the input data is partitioned into smaller batches that can be processed in parallel across many machines in the Hadoop cluster. But MapReduce, while powerful enough to express many data analysis algorithms, is not always the optimal programming paradigm. It is often desirable to run other computation paradigms in the Hadoop cluster. Here are some examples.
        • Problem in performing real-time analysis: MapReduce is batch driven. What if I want to perform real-time analysis instead of batch processing (where results may be available only after several hours)?

          Many applications, such as fraud detection algorithms, need results in real time. Real-time engines like Apache Storm work better in this case. But in Hadoop 1.0, because of the tight coupling, these engines cannot run independently on the cluster.
        • Problem in running the message-passing approach: In this approach, a stateful process runs on each node of a distributed network. The processes communicate with each other by sending messages, and alter their state based on the messages they receive. This is not possible in MapReduce.
        • Problem in running ad-hoc queries: Many users like to query their big data using SQL. Apache Hive can execute a SQL query as a series of MapReduce jobs, but it has shortcomings in terms of performance.
          Newer approaches such as Apache Tajo, Facebook's Presto and Cloudera's Impala drastically improve performance, but they need to run services in a form other than MapReduce.
          Such non-MapReduce jobs cannot run natively on a Hadoop 1.0 cluster. They have to "disguise" themselves as mappers and reducers in order to run on Hadoop 1.0.
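
The resource utilization problem from point 3 above can be made concrete with a little arithmetic. The sketch below compares fixed map/reduce slots with a single shared pool of capacity. The function names and the slot counts are hypothetical, chosen only for illustration, and the "shared pool" is a deliberate simplification of what a general-purpose resource manager does.

```python
def schedule_fixed_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1.0 style: a task may only use a slot of its own kind."""
    running = min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)
    idle = (map_slots + reduce_slots) - running
    waiting = (map_tasks + reduce_tasks) - running
    return {"running": running, "idle": idle, "waiting": waiting}


def schedule_shared_pool(map_tasks, reduce_tasks, total_capacity):
    """Simplified shared pool: any task may use any free unit of capacity."""
    demand = map_tasks + reduce_tasks
    running = min(demand, total_capacity)
    return {"running": running, "idle": total_capacity - running, "waiting": demand - running}


# Hypothetical node with 4 map slots and 4 reduce slots during a map-heavy phase.
fixed = schedule_fixed_slots(map_tasks=8, reduce_tasks=0, map_slots=4, reduce_slots=4)
shared = schedule_shared_pool(map_tasks=8, reduce_tasks=0, total_capacity=8)
print(fixed)   # {'running': 4, 'idle': 4, 'waiting': 4}
print(shared)  # {'running': 8, 'idle': 0, 'waiting': 0}
```

With fixed slots, four map tasks wait while four reduce slots sit idle. With the same total capacity in one pool, all eight tasks run at once.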

Hadoop 2.0 solves all these problems with YARN:


YARN took over the task of cluster resource management from MapReduce, leaving MapReduce to focus on data processing, which is what it does best.

YARN has a central resource manager component that manages the cluster's resources and allocates them to applications. Multiple applications can run on Hadoop via YARN, and all of them share common resource management.
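
As a rough illustration of a central resource manager shared by several applications, here is a minimal Python sketch. `ToyResourceManager` and `Container` are hypothetical classes written for this article, not the YARN API, and the memory and CPU figures are made up. The real ResourceManager also handles queues, scheduling policies and node-level placement, all of which are left out here.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """A granted bundle of resources (hypothetical, simplified)."""
    app: str
    memory_mb: int
    vcores: int


class ToyResourceManager:
    """Simplified model of a central resource manager (not the YARN API)."""

    def __init__(self, memory_mb, vcores):
        self.free_memory_mb = memory_mb
        self.free_vcores = vcores
        self.containers = []

    def allocate(self, app, memory_mb, vcores):
        """Grant a container if the cluster still has enough free memory and CPU."""
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            return None
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(app, memory_mb, vcores)
        self.containers.append(container)
        return container

    def release(self, container):
        """Return a finished container's resources to the shared pool."""
        self.containers.remove(container)
        self.free_memory_mb += container.memory_mb
        self.free_vcores += container.vcores


rm = ToyResourceManager(memory_mb=8192, vcores=8)
batch = rm.allocate("mapreduce-job", memory_mb=4096, vcores=4)
stream = rm.allocate("streaming-app", memory_mb=2048, vcores=2)
too_big = rm.allocate("sql-query", memory_mb=4096, vcores=1)   # only 2048 MB free
rm.release(batch)                                              # batch job finishes
retry = rm.allocate("sql-query", memory_mb=4096, vcores=1)     # now it fits
print(too_big, retry)
```

The point of the sketch is that requests are expressed in generic resources (memory, CPU) rather than map or reduce slots, so different kinds of applications can draw from the same pool.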

Advantages of YARN:
    1. YARN utilizes resources efficiently.
      There are no more fixed map and reduce slots. YARN provides a central resource manager. With YARN, you can run multiple applications in Hadoop, all sharing a common pool of resources.
    2. YARN can even run applications that do not follow the MapReduce model.
      YARN decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying and streaming data applications simultaneously with MapReduce batch jobs. This also streamlines MapReduce to do what it does best: process data.

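A Few Important Notes about YARN:
    1. YARN is backward compatible.
      This means that existing MapReduce jobs can generally run on Hadoop 2.0 without any change.
    2. No more JobTracker and TaskTracker in Hadoop 2.0
      The JobTracker and TaskTracker have disappeared. YARN splits the two major functions of the JobTracker, i.e. resource management and job scheduling/monitoring, into separate daemons (components).
        • Resource Manager (central, one per cluster): manages and allocates the cluster's resources
        • Node Manager (node specific): launches and monitors the tasks running on its node
        • Application Master (one per application): handles scheduling and monitoring of that application's tasks
      The central Resource Manager and the node-specific Node Managers together constitute the YARN platform on which applications run.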


MapReduce: Difference between MR1 and MR2:
The earlier version of the MapReduce framework, in Hadoop 1.0, is called MR1. The new version of MapReduce is known as MR2.

With the introduction of YARN in Hadoop 2.0, the JobTracker and TaskTracker are no longer needed and the terms have disappeared. MapReduce is now streamlined to perform only data processing.

The new model is more isolated and scalable than the earlier MR1 system. MR2 is one kind of distributed application: it runs the MapReduce framework on top of YARN. MapReduce performs data processing via YARN, and other tools can also perform data processing via YARN. Hence the YARN execution model is more generic than the earlier MapReduce model.

MR1 could not do this. It could run only MapReduce applications.
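
The idea that MapReduce becomes just one application type among many can be sketched as follows. Everything here is a hypothetical illustration: `ClusterApp`, `WordCountMapReduce`, `KeywordQuery` and `submit` are invented names, and real YARN applications interact with the cluster through a much richer protocol.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ClusterApp(ABC):
    """Hypothetical interface: anything that can run in containers granted by the cluster."""

    @abstractmethod
    def run(self, containers, data):
        ...


class WordCountMapReduce(ClusterApp):
    """MapReduce as just one kind of application (the MR2 idea)."""

    def run(self, containers, data):
        # Map phase: one chunk of lines per container, counted independently.
        chunks = [data[i::containers] for i in range(containers)]
        partials = [Counter(w for line in chunk for w in line.split()) for chunk in chunks]
        # Reduce phase: merge the partial counts into one result.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return dict(total)


class KeywordQuery(ClusterApp):
    """A non-MapReduce application that uses the same submission path."""

    def __init__(self, keyword):
        self.keyword = keyword

    def run(self, containers, data):
        return [line for line in data if self.keyword in line]


def submit(app, containers, data):
    """Stand-in for submitting an application to the cluster."""
    return app.run(containers, data)


lines = ["yarn runs mapreduce", "yarn runs other engines too"]
counts = submit(WordCountMapReduce(), containers=2, data=lines)
matches = submit(KeywordQuery("engines"), containers=1, data=lines)
print(counts["yarn"], matches)
```

In an MR1-style design, `submit` would accept only the MapReduce class. Making the submission path generic is the simplified equivalent of what the article attributes to YARN.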







54 thoughts on “How YARN Overcomes MapReduce Limitations in Hadoop 2.0”

  1. RakeshGupta says:

    Awesome explanation!! Very well written and communicated, exactly what I was looking for. It resolved many of my confusions.

  2. sreenivas says:

    Very good explanation yar great…..thanks

  3. Bhasker says:

    Agree with the earlier comments….crisp and to the point. Nice write-up.

  4. Mansi says:

    Very well formulated article, great help in my learnings.

  5. Avinash says:

    Great explanation. Very easy to understand and good job done.

  6. Ajit Singh says:

    Gr8 explanation; easy to understand and compare. Good work!!! Thanks for sharing.

  7. suresh says:

    will you provide any training on hadoop

  8. prathap says:

    This is what i have been looking for. Awesome explanation. very well done

  9. Ravi says:

    Very well organized and explained to the point. Very easy to understand. Thanks a lot for this article.

  10. Atul says:

    well scripted article. Looking forward to reading more articles written this way.

  11. Hi,

    This is very much understandable, simple & good article.
    Thanks for sharing such a important info. Keep it up.

    Thanks
    Pradeep Deokar.

  12. dhruv says:

    Nice and clear explanation. Thanks

  13. kamal Mondal says:

    Good differentiation between Hadoop1 & Hadoop2.

  14. Kanakasabapathy says:

    AMAZING!! WOW IT IS THE BEST EXPLANATION WHICH I HAVE EVER SEEN ON MR1 AND MR2 AND YARN AND THE DIFFERENCE

  15. Mostafa says:

    Very good article

  16. skylos mavros says:

    Clear and concise!
    Thanks

  17. Sridhar says:

    Excellent ! This is the best explanation I have got so far. Would really be awesome if you can dwell into the details of comparision taking an example.

  18. Sreekanth Reddy says:

    Awesome, Its very useful and very simple to know the difference between HP 1.0 & HP 2.0 for every one.
    Thanks a lot for Your support.

  19. Satya says:

    Awesome explanation. Thanks

  20. Prashant Basa says:

    Very well explained. Great work!!

  21. Adil Wasi says:

    Hi,
    Very good explanation. Great Work.

    Can you please upload post on pictorially work flow diagram for Hadoop2 with yarn with explanation as you have mentioned here for haddop1 with Namenode,JobTracker,DataNode & Task tracker.

    I am waiting for your reply.

    Thanks & Regards
    Aadil

  22. Adil Wasi says:

    Hi,
    As I was going through your content on your following link, but all example are in hadoop1

    http://saphanatutorial.com/hadoop-online-training-hadoop-basics-5-2/
    http://saphanatutorial.com/hadoop-online-training-hadoop-basics-5-3/

    Kindly upload such tutorial or post for hadoop2 also using yarn

    • Admin says:

      Hello,

      Yes, we would surely plan for Hadoop 2.0, It’s been quite a long time from when the contents were posted.

      We are glad to receive your request, we surely act upon it.

      Thanks for your interest and patience, have a great day !!!!

      Thanks,
      Admin

  23. VEERA says:

    Hi team,

    Really a great post where we can understand clearly about basic things.

    Thanks
    veera

  24. chinmayesri says:

    nice post ,thank you….

  25. Bharat Gupta says:

    Great! very good post, Thank you.
    Please cover more topics.

  26. Sujit says:

    Great explanation in simple and understandable language, cleared all my confusions…thanks a lot 🙂

  27. Sujit says:

    Is there any similar post for HBase?….thanks in advance

  28. DEEPASUMATHI says:

    i can’t able to understand what the below line refers …help me..!!!
    Problem with Resource Utilization in Hadoop:
    “Resource Utilization issues occur because maps slots might be ‘full’ while reduce slots is empty (and vice-versa).”

    • Admin says:

      Hello,

      Please send us screenshots at “admin@saphanatutorial.com” with the steps you were following.
      The screenshots would help us to understand where did you face the issue and what is the issue.

      Thanks,
      Admin

    • Abhimanyu says:

      As per my understanding, fixed slots means Hadoop 1.0 has predefined fixed slots for mapper tasks and reduce tasks, no say your mapper slots are full due to the mapper tasks that are being performed, now if there are more mapper tasks waiting for execution, it will have to wait until one of the mapper slot is cleaned by the Job tracker. Even though there might be reduce slots vacant hadoop 1.0 doesn’t provide the provision to use it for mapper tasks.
      I hope this was useful to you.
      @site admin: Please correct if I am wrong

  29. Abhimanyu says:

    WOW, I am so in love with this site, point to point explanation for every topic and is very easy to comprehend to.
    Kudos.

    • Admin says:

      Hello,

      Thanks for the appreciation, it would inspire us to do further. 🙂

      Thanks,
      Admin

      • Abhimanyu says:

        @admin: in hadoop installation course under section 3.3 it is mentioned that: We will run wordcount MapReduce job available in %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar to count number of words in input file.

        I am not able to find this directory, also till 3.3 section every thing i have done and is working fine, all nodes started everything conigured.
        Please help me on section 3.3 to run map reduce programs

  30. praveen says:

    Basically I am lazy to give feedback, but when I impressed more then I dont forgot

    Excellent explanation…..
    This helps to come out of chaos

  31. K Raj Kiran says:

    Great Explanation . . .

  32. siva says:

    hii,,man done a great job this is very useful in my research work dude and neatly explained line to line.

  33. poornima palha says:

    it is really very helpful…Very easy to understand. Thanks a lot for this article.

  34. Ananthi says:

    i learnt new information about Hadoop which really helpful to develop my knowledge and cracking the interview easily.. This concept explanation are very clear so easy to understand..


© 2017 saphanatutorial.com. All rights reserved.