What is Hadoop?



Hadoop gets a lot of buzz these days in the database world. This open source software platform, managed by the Apache Software Foundation, has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently. But many people in the industry still don't really know what exactly Hadoop is.

In this article we will explore what Hadoop is, what makes it so special, and how it can best be applied.

Hadoop Overview:

Hadoop (also known as Apache Hadoop) is an open source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Hadoop is designed to be robust, in that your Big Data applications will continue to run even when individual servers in the cluster fail.

Hadoop is not a database:
Hadoop is built around an efficient distributed file system; it is not a database. It is designed specifically for information that comes in many forms, such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository.

A Brief History of Hadoop:

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.

What problems can Hadoop solve?

The Hadoop platform was designed to solve problems where you have a lot of data (perhaps a mixture of complex and structured data) that doesn't fit nicely into tables. It's for situations where you want to run analytics that are deep and computationally intensive, like clustering and targeting. That's exactly what Google was doing when it was indexing the web and examining user behavior to improve performance algorithms.

Hadoop applies to a bunch of markets. In finance, if you want to do accurate portfolio evaluation and risk analysis, you can build sophisticated models that are hard to jam into a database engine. But Hadoop can handle it. In online retail, if you want to deliver better search answers to your customers so they're more likely to buy the thing you show them, that sort of problem is well addressed by the platform Google built.

Hadoop is used for:
    • Search - Yahoo, Amazon, Zvents
    • Log processing - Facebook, Yahoo
    • Data Warehouse - Facebook, AOL
    • Video and Image Analysis - New York Times, Eyealike

Hadoop Architecture:

Hadoop is designed to run on a large number of machines that don't share any memory or disks. That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one. When you load your organization's data into Hadoop, the software busts that data into pieces and spreads them across your different servers. There's no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides. And because multiple copies of each piece are stored, data on a server that goes offline or dies can be automatically re-replicated from a known good copy.
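
To make the idea of busting data into pieces and spreading copies around more concrete, here is a minimal Python sketch. It is not Hadoop code: the function names, the tiny block size, and the round-robin placement are assumptions chosen for illustration, and real HDFS uses its own placement policy and much larger blocks.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an illustrative assumption, not HDFS's policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes to hold that many distinct copies")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=2)
    for block_id, holders in placement.items():
        print(block_id, blocks[block_id], holders)
```

The placement table plays the role of the bookkeeping described above: nobody talks to "all of the data" in one place, but the system always knows which servers hold which piece.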

Architecturally, the reason you're able to deal with lots of data is because Hadoop spreads it out. And the reason you're able to ask complicated computational questions is because you've got all of these processors, working in parallel, harnessed together.

Components of Hadoop:

The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper. MapReduce and HDFS are the main components of Hadoop.

MapReduce:
MapReduce is the framework that understands and assigns work to the nodes in a cluster.
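
The programming model behind MapReduce can be sketched in a few lines of plain Python. This is a single-process toy, not the Hadoop API: the function names are made up for this example, and in a real cluster each input split would be handled by a map task on a different node.

```python
from __future__ import annotations

from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Run map over every split, then shuffle, then reduce."""
    mapped = []
    for split in splits:  # in a cluster, each split goes to a different node
        for line in split:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job([["big data big"], ["data hadoop"]]))
```

The point of the split into map and reduce is that neither function needs to see the whole data set, which is what lets the framework hand the pieces out to many machines.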

Hadoop distributed file system (HDFS):
HDFS is the file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
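
As a rough illustration of how replication gives reliability, the sketch below restores the copy count after a node dies. It is a simplified model in Python, not HDFS internals: the `re_replicate` function and the node names are hypothetical, and the choice of the first available node as the new home for a copy is an assumption.

```python
from __future__ import annotations


def re_replicate(
    placement: dict[int, list[str]],
    live_nodes: list[str],
    failed_node: str,
    replication: int,
) -> dict[int, list[str]]:
    """Return a new block-to-nodes table after `failed_node` is lost.

    Blocks that dropped below `replication` copies get new copies on other
    live nodes, sourced from a surviving replica.
    """
    new_placement = {}
    for block_id, holders in placement.items():
        survivors = [node for node in holders if node != failed_node]
        if not survivors:
            raise RuntimeError(f"block {block_id} has no surviving replica")
        candidates = [
            node for node in live_nodes
            if node != failed_node and node not in survivors
        ]
        # Copy from a known good replica until the target count is restored
        # or no spare node is left.
        while len(survivors) < replication and candidates:
            survivors.append(candidates.pop(0))
        new_placement[block_id] = survivors
    return new_placement


if __name__ == "__main__":
    before = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "a"]}
    after = re_replicate(before, live_nodes=["a", "c", "d"], failed_node="b", replication=2)
    print(after)
```

A block is only truly lost if every node holding a copy fails before a new copy can be made, which is why storing more than one copy matters.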


Advantages of Hadoop:

It's Scalable:
New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.

It's Cost effective:
Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.

It's Flexible:
Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
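
A small Python sketch can show what "schema-less" means in practice: the files are stored as they are, and structure is applied only when a job reads them. The sample log lines, the CSV content, and all function names below are invented for illustration and are not part of Hadoop.

```python
from __future__ import annotations

import csv
import io
from collections import Counter

# Two hypothetical raw sources, stored as plain files with no shared schema.
RAW_LOGS = [
    "2017-01-05 u1 /home",
    "2017-01-05 u2 /cart",
    "2017-01-06 u1 /cart",
]
RAW_USERS_CSV = "user_id,country\nu1,IN\nu2,US\n"


def parse_log(line: str) -> dict[str, str]:
    """Apply structure to a log line at read time."""
    date, user_id, path = line.split()
    return {"date": date, "user_id": user_id, "path": path}


def parse_users(text: str) -> dict[str, str]:
    """Read the CSV source into a user_id -> country lookup."""
    return {row["user_id"]: row["country"] for row in csv.DictReader(io.StringIO(text))}


def hits_by_country(log_lines: list[str], users_csv: str) -> dict[str, int]:
    """Join the two sources on user_id and aggregate page hits per country."""
    users = parse_users(users_csv)
    counts = Counter()
    for line in log_lines:
        record = parse_log(line)
        counts[users.get(record["user_id"], "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    print(hits_by_country(RAW_LOGS, RAW_USERS_CSV))
```

Because the parsing lives in the job rather than in the storage layer, a different job could read the same files with a different interpretation.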

It's Fault tolerant:
When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
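
The redirect described here can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not Hadoop's scheduler: the function names and node names are hypothetical, and picking the first live replica holder is an arbitrary choice made for the example.

```python
from __future__ import annotations


def pick_node(block_id: int, placement: dict[int, list[str]], live_nodes: set[str]) -> str:
    """Choose a live node that holds a copy of the block."""
    for node in placement[block_id]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"no live replica for block {block_id}")


def schedule(placement: dict[int, list[str]], live_nodes: set[str]) -> dict[int, str]:
    """Assign the work for every block to a live node that has the data."""
    return {block_id: pick_node(block_id, placement, live_nodes) for block_id in placement}


if __name__ == "__main__":
    placement = {0: ["n1", "n2"], 1: ["n2", "n3"]}
    print(schedule(placement, {"n1", "n2", "n3"}))
    # Simulate losing n2: work for block 1 moves to the other copy on n3.
    print(schedule(placement, {"n1", "n3"}))
```

The job only fails outright if every node holding a given block is down at the same time.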



22 thoughts on “What is Hadoop?”

  1. Hemant Janrao says:

    Really nice tutorial 🙂 keep it up

  2. Surendra says:

    it’s very nice article and it will become even better if there is something about hadoop ecosystem also.

  3. Gyanesh Prakash says:

    Do we store data in json format like we do in MongoDB?

  4. suresh says:

    Hi ,
    Is there any training provided by you

  5. Hanunmanth says:

    it’s very good tutorial for Beginners.

  6. Hanunmanth says:

    Pls Upload the study material for Advance Hadoop Development.

    • Admin says:

      Hi Hanunmanth,
      We are working on providing more content on Hadoop. Soon we will publish some more articles on it.

  7. Jags says:

    Thanks for the nice article.

  8. Hasectic says:

    very nice keep it up!
    i m learning and working on big data and hadoop, so it was and ll your tutorial were helpful

  9. deven khandekar says:

    HI

    MY NAME IS DEVEN AND I HAVE DONE BA FROM PUNE UNIVERSITY, I WANT TO LEARN HADOOP COURSE, PLEASE GUIDE ME AND PLEASE TELL ME AFTER DONE THIS COURSE AM I GET THE JOB IN IT FIELD ?

  10. santosh says:

    Nice tutorial, love to study.

  11. salman noorani says:

    Thanks for the info.

  12. Satish says:

    Hello Admin,

    I’m planning to learn Hadoop but I don’t have any Java stuff. But how can I learn Hadoop? Please advise me on the same

  13. Omer Ahmed says:

    To be honest I never read such a great article.


© 2017 saphanatutorial.com. All rights reserved.