A Few Interesting Facts about Apache Hadoop
- Hadoop is a top-level Apache project whose development was initiated and led by Yahoo.
- Hadoop is one of the most popular environments for working with Big Data and solving Big Data problems.
- Hadoop is not a database but a file system. Hold on!! It is not just a distributed file system either, but a complete open-source framework.
- Hadoop's core components are HDFS (Hadoop Distributed File System) and MapReduce (see the word-count sketch after this list).
- The Hadoop framework itself is mostly written in Java, with some native code in C and command-line utilities written as shell scripts.
- All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework.
- A standard 10-node cluster can have 1 TB of total RAM and perform distributed, in-memory processing using MapReduce 2 / YARN.
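To give a flavour of the MapReduce programming model mentioned above, here is a minimal word-count sketch using Hadoop's standard Java MapReduce API. The input and output paths are hypothetical and would normally point at HDFS directories.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word found in its input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums all the counts received for a given word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner cuts down shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mapper emits a (word, 1) pair for each token and the reducer (also used here as a combiner) sums the counts per word; YARN schedules these map and reduce tasks across the nodes of the cluster.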
A few real-life examples: how industries are implementing Hadoop and solving Big Data problems:
And the list goes on and on..
Companies that offer services on or based around Hadoop:
Many companies, from IBM and Amazon Web Services to Microsoft and Teradata, have packaged Hadoop into more easily consumable distributions or services. Each company has its own strategy, but the key differentiator for all of them is Hadoop's ability to distribute workloads across potentially thousands of servers, "making big data manageable."
How would you know if your company really needs Hadoop?
In simple words,
- When you have a dataset so large that it cannot fit on a single machine (see the HDFS sketch below)
- When you have a very large amount of data that must be processed within a reasonable amount of time, and a single machine cannot achieve this
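As an illustration of the first point, here is a minimal sketch of copying a local file into HDFS with Hadoop's Java FileSystem API, where it is split into blocks and replicated across the cluster's DataNodes. The NameNode URI and the file paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address; in practice this is usually read from core-site.xml
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS, where it is stored as distributed,
    // replicated blocks rather than on any single machine's disk.
    fs.copyFromLocalFile(new Path("/tmp/clickstream.log"),
                         new Path("/data/raw/clickstream.log"));

    fs.close();
  }
}
```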