5.3 MapReduce (Processing of Data)

MapReduce is a programming model for efficient, distributed computation over large data sets. In contrast to traditional parallelism, which moves data to the computation, MapReduce brings the computation to the location of the data.

Conceptually, MapReduce works in two steps:
    1. First is the mapper phase, where the map job takes a collection of data and converts it into another set of data, in which individual elements are broken into <key, value> pairs.
    2. Next is the reducer phase, where the reduce job takes the output of the map as its input and combines those <key, value> pairs into a smaller set of <key, value> pairs as the output.
Input and output types of a MapReduce job:

(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)

Confused? Let's understand with the help of an example.

MapReduce Example:

Take a simple word count example.
Objective: Count the number of occurrences of each word in a set of input files.
Input: The input can be a collection of thousands of files or documents. For now, take a small set of 3 files.


First file content: "Hello Bob, How are you?"
Second file content: "I see you Bob."
Third file content: "I want to talk to you."

Now let's see how the MapReduce approach works:
Step 1: Each line is distributed to an individual mapper instance.
"Hello Bob, How are you?" - to mapper instance 1
"I see you Bob." - to mapper instance 2
"I want to talk to you." - to mapper instance 3

Step 2: In the map job, each sentence is split into words, which form the initial key-value pairs:
<Hello, 1>
<Bob, 1>
<How, 1>
<are, 1>
<you, 1>

<I, 1>
<see, 1>
<you, 1>
<Bob, 1>

<I, 1>
<want, 1>
<to, 1>
<talk, 1>
<to, 1>
<you, 1>

Step 3: In the reduce phase, the keys are grouped together, and the values for identical keys are added.
So the result of the reducer phase would be as below:
<Bob, 2>
<Hello, 1>
<How, 1>
<I, 2>
<are, 1>
<see, 1>
<talk, 1>
<to, 2>
<want, 1>
<you, 3>

You see, this gives the number of occurrences of each word in the input files.
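
To make the three steps concrete, below is a minimal plain-Java sketch (no Hadoop involved; the class and variable names are ours, purely for illustration) that simulates the map, group, and reduce phases on the three sample files:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSimulation {
    public static void main(String[] args) {
        // Step 1: each line goes to its own "mapper instance".
        String[] lines = {
            "Hello Bob, How are you?",
            "I see you Bob.",
            "I want to talk to you."
        };

        // Step 2: map phase - split each line into words and emit <word, 1> pairs.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }

        // Step 3: reduce phase - group identical keys and add their values.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }

        // Prints <Bob, 2>, <Hello, 1>, ... <you, 3>, matching the result derived above.
        counts.forEach((word, count) -> System.out.println("<" + word + ", " + count + ">"));
    }
}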
Have a look at the image below to understand the flow of the mapper and reducer with one more example.

[Image: flow of mapper and reducer with another example]

You can develop MapReduce applications in Java or any other JVM-based language.
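
As a sketch of what that looks like, here are a word count mapper and reducer written against the standard org.apache.hadoop.mapreduce API. The class names are our own, and this is an illustrative outline rather than production code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: for each input line, emit a <word, 1> pair per word.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\W+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // e.g. <Hello, 1>
            }
        }
    }
}

// Reduce phase: sum the 1s for each word, yielding <word, total>.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));  // e.g. <you, 3>
    }
}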

Components of MapReduce:

The main components of MapReduce are described below:

[Image: Components of MapReduce]



How exactly a MapReduce job works (workflow):

Let's try to understand the workflow of MapReduce in a bit more detail.

[Image: MapReduce job workflow]

    1. Client applications submit MapReduce jobs to the JobTracker (a minimal driver sketch follows this list).
    2. The JobTracker talks to the NameNode to determine the location of the data.
    3. The JobTracker locates TaskTracker nodes with available slots at or near the data.
    4. The JobTracker submits the work to the chosen TaskTracker nodes.
    5. The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
    6. When the work is completed, the JobTracker updates its status.
    7. Client applications can poll the JobTracker for information.
The JobTracker is a single point of failure for the Hadoop MapReduce service. If it goes down, all running jobs are halted.
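
For step 1 of the list above, a client application typically bundles the job configuration into a driver class. Below is a minimal driver sketch using the standard org.apache.hadoop.mapreduce.Job API; it reuses the illustrative WordCountMapper and WordCountReducer classes from earlier, and the input/output paths are placeholders taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        // Optional combine step from the flow shown earlier: pre-aggregate on the map side.
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Placeholder HDFS paths for the input files and the result directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submitting the job hands it to the cluster's job scheduler
        // (the JobTracker in classic MapReduce); true = print progress.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Here the reducer doubles as the combiner, which is valid for word count because addition is associative and commutative.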