Hadoop Ecosystem and their Components - Explained in simple words

In previous articles, we explained what Hadoop is and what its core components are.

In this article, we will explain the Hadoop Ecosystem.

Hadoop Ecosystem

So far, we have talked only about the core components of Hadoop - HDFS and MapReduce.
These core components are good at storing and processing data. But the Apache Software Foundation (the organization behind Hadoop) later added many new components to enhance Hadoop's functionality.


These new components make up the Hadoop Ecosystem and make Hadoop very powerful.
Let's have a look at the Hadoop Ecosystem.


Categorization of Hadoop Components

All these components have different purposes and roles to play in the Hadoop Ecosystem.


Data Storage Layer

HDFS (Hadoop Distributed File System)

HDFS is a distributed file system that stores data on multiple machines in the cluster.
It is suitable for storing huge files.
Note that HDFS does not provide a tabular form of storage.
HDFS chunks the data into blocks and distributes those blocks across different machines.
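The chunk-and-distribute idea can be sketched in a few lines of Python. This is a toy illustration, not real HDFS: the block size, replication factor, and node names here are invented for the example (real HDFS defaults to 128 MB blocks and a replication factor of 3).

```python
# Toy sketch of HDFS-style chunking and placement (not real HDFS).

BLOCK_SIZE = 4                         # real HDFS default: 128 MB
REPLICATION = 2                        # real HDFS default: 3
MACHINES = ["node1", "node2", "node3"]

def chunk_and_place(data: bytes):
    """Split data into fixed-size blocks and assign each block's replicas
    to machines in a rotating fashion."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        # pick REPLICATION distinct machines, rotating the starting node
        holders = [MACHINES[(idx + r) % len(MACHINES)] for r in range(REPLICATION)]
        placement[idx] = (block, holders)
    return placement

placement = chunk_and_place(b"hello world!")
for idx, (block, holders) in placement.items():
    print(idx, block, holders)
```

If one machine fails, every block it held still exists on another machine - which is exactly why HDFS replicates blocks instead of storing a file in one place.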


HBase

HBase is a column-oriented database that runs on top of HDFS to provide a structured data model. It stores data in tabular form.
You can think of HDFS as a local file system and HBase as a database management system that stores data in tables. Internally, the DBMS writes that logical tabular data to the physical file system.
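HBase's row/column model can be sketched as a nested map. This is a toy illustration of the data model only, not the real HBase API; the class, table, and column names ("info:name" etc.) are invented for the example. Note the "family:qualifier" column naming, which mirrors how HBase addresses columns.

```python
# Toy sketch of HBase's data model (not the real HBase API):
# a table maps row key -> {"columnfamily:qualifier": value}.

class MiniHBaseTable:
    def __init__(self):
        self.rows = {}  # row key -> {"cf:qualifier": value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row, column):
        # missing rows/columns simply return None, like an empty cell
        return self.rows.get(row, {}).get(column)

users = MiniHBaseTable()
users.put("user1", "info:name", "Alice")
users.put("user1", "info:city", "Berlin")
print(users.get("user1", "info:name"))   # Alice
```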

HCatalog

HCatalog provides a standard view of the data stored in HDFS, so that different processing tools (like Pig, Hive, etc.) can read and write data more easily.
HCatalog presents a relational, tabular view of the data, so users need not worry about where or in what format their data is stored in HDFS, be it RCFiles, text files, or sequence files.


Data Processing Layer

This is the layer where data is processed and analyzed.
In Hadoop 1.0 this layer consisted of MapReduce alone; Hadoop 2.0 introduced YARN.

MapReduce

MapReduce is a programming model used to process large data sets in parallel.
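The classic illustration of the model is word count: the map step emits (word, 1) pairs, a shuffle step groups the pairs by word, and the reduce step sums each group. The sketch below runs in plain Python with no cluster; the function names are ours, not a real MapReduce API.

```python
# Word count in the MapReduce style (pure Python, no cluster).
from collections import defaultdict

def map_phase(line):
    # map: emit a (word, 1) pair for every word in the line
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # shuffle: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the grouped values for each key
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big cluster", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)   # {'big': 3, 'data': 2, 'cluster': 1}
```

On a real cluster, each machine runs the map step over its local chunk of data, and the shuffle moves pairs across the network - that is what makes the model scale.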

YARN


YARN (Yet Another Resource Negotiator) is a new component added in Hadoop 2.0.
YARN took over the task of managing the Hadoop cluster from MapReduce, leaving MapReduce streamlined to do what it does best: data processing.

Data Access Layer

This layer is used to access data from the Data Processing Layer.
Writing MapReduce programs requires Java skills, and you need to write a lot of Java code. Pig and Hive are high-level languages that sit on top of the MapReduce layer.
At run time, Pig and Hive scripts are converted into MapReduce jobs.


Pig

Pig is a tool used to analyze large amounts of data.
Using the Pig Latin scripting language, data analysis and processing can be done easily.
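To give a feel for the dataflow style, the comments below show a hypothetical Pig Latin script (the relation and field names are invented), and the Python underneath computes the same result - filter the records, group them, and aggregate each group.

```python
# What a short Pig Latin dataflow computes, mimicked in plain Python.
# Hypothetical Pig Latin for comparison:
#   logs   = LOAD 'logs' AS (user, bytes);
#   big    = FILTER logs BY bytes > 100;
#   groups = GROUP big BY user;
#   totals = FOREACH groups GENERATE group, SUM(big.bytes);

records = [("alice", 50), ("bob", 200), ("alice", 300)]

big = [(user, b) for user, b in records if b > 100]   # FILTER
totals = {}
for user, b in big:                                    # GROUP + SUM
    totals[user] = totals.get(user, 0) + b
print(totals)   # {'bob': 200, 'alice': 300}
```

Each Pig Latin statement transforms one relation into another; at run time Pig compiles this chain of transformations into MapReduce jobs.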


Hive

Hive provides a SQL-like interface for data stored in Hadoop. Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements.
HQL statements are broken down by the Hive service into MapReduce jobs and executed across the Hadoop cluster.
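To see what "broken down into MapReduce jobs" means, here is a sketch in which sqlite3 stands in for Hive's SQL side and plain Python loops stand in for the map and reduce steps. The table and column names are invented; this is an analogy, not how Hive is implemented.

```python
# The same GROUP BY, expressed two ways: as SQL (what the analyst writes)
# and as map/reduce steps (what runs on the cluster).
import sqlite3
from collections import defaultdict

emp = [("alice", "eng"), ("bob", "eng"), ("carol", "sales")]

# SQL view - sqlite3 standing in for Hive here
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
db.executemany("INSERT INTO emp VALUES (?, ?)", emp)
sql_result = dict(db.execute("SELECT dept, COUNT(*) FROM emp GROUP BY dept"))

# MapReduce view of the same query
counts = defaultdict(int)
for _name, dept in emp:      # map: emit (dept, 1) per row
    counts[dept] += 1        # shuffle + reduce: sum per dept
print(sql_result)            # {'eng': 2, 'sales': 1}
```

Both views produce the same answer; Hive's job is to perform that translation automatically so developers can stay in SQL.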


Sqoop

Sqoop is a tool to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
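A toy version of such an import is sketched below. sqlite3 stands in for the RDBMS, and the function name, table, and columns are invented; real Sqoop runs the transfer as parallel MapReduce tasks and, by default, writes comma-delimited text files into HDFS.

```python
# Toy Sqoop-style import: read rows from an RDBMS and render them as the
# delimited text records that would land in HDFS (not the real Sqoop tool).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Alice"), (2, "Bob")])

def sqoop_like_import(conn, table):
    """Dump every row of `table` as comma-delimited text, one record per line."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return "\n".join(",".join(str(field) for field in row) for row in rows)

print(sqoop_like_import(db, "customers"))
# 1,Alice
# 2,Bob
```

The export direction works the same way in reverse: read delimited records from HDFS and INSERT them back into the database.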


Mahout

Mahout provides a library of scalable machine learning algorithms useful for big data analysis based on Hadoop or other storage systems.
Mahout also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections.

Avro

Avro is a remote procedure call and data serialization framework developed within the Hadoop project. It uses JSON to define data types and protocols, and serializes data in a compact binary format.
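The core idea can be sketched in a much-simplified form: a JSON schema describes the fields, and the binary payload contains only the values, in schema order, with no field names. This is not the real Avro wire format (Avro uses variable-length zig-zag integer encoding, among other things); the schema and record here are invented for illustration.

```python
# Simplified sketch of Avro's schema-driven serialization
# (NOT the real Avro wire format).
import json
import struct

schema = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "int"},
            {"name": "age", "type": "int"}]}
""")

def encode(record, schema):
    # pack each int field as 4 big-endian bytes, in schema order;
    # field names never appear in the payload - the schema carries them
    return b"".join(struct.pack(">i", record[f["name"]])
                    for f in schema["fields"])

def decode(data, schema):
    fields = schema["fields"]
    values = struct.unpack(">" + "i" * len(fields), data)
    return {f["name"]: v for f, v in zip(fields, values)}

blob = encode({"id": 7, "age": 30}, schema)
print(len(blob), decode(blob, schema))   # 8 {'id': 7, 'age': 30}
```

Because the names live in the schema rather than in every record, the payload stays compact - two ints cost just 8 bytes here, versus far more as JSON text.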

Management Layer

This is the layer that meets the user. Users access the system through this layer, which has various components; a few of them are described below.

Ambari

Ambari is a web-based tool for provisioning, managing, and monitoring Hadoop clusters.

Chukwa

Chukwa is a data collection system for monitoring distributed systems, and more specifically Hadoop clusters.

Zookeeper

ZooKeeper is an open-source project that maintains configuration information and provides naming, distributed synchronization, and group services for various distributed applications. It implements these protocols on the cluster so that applications need not implement them on their own.

If you have a question or doubt, please post it in the comments.



© 2017 saphanatutorial.com. All rights reserved.