SapHanaTutorial.Com

Hive and Pig - Introduction and key differences between them

In our previous articles, What is Hadoop?, MapReduce - The Heart of Hadoop, and Hadoop Cluster - Architecture, Core Components and Work-flow, we explained Hadoop, MapReduce and the Hadoop ecosystem.
In this article we will talk about two important high-level programming languages for Hadoop, Hive and Pig, which sit on top of the MapReduce layer in the Hadoop ecosystem.

Why do we need Hive and Pig?

Once you have all your big data loaded into Hadoop, the next step is to work with the data: process it, analyze it and, once you get the output, make business decisions based on the analysis.
In Hadoop, you write MapReduce programs to do data analysis, but writing MapReduce programs requires Java skills and many lines of plain Java code.
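
To give a feel for the MapReduce model that Hive and Pig hide from you, here is a minimal sketch of a word count written as explicit map, shuffle and reduce steps. It is plain Python running in a single process, not Hadoop's Java API, and the function names and sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical input standing in for files stored in HDFS.
sample_lines = ["big data on hadoop", "hadoop runs mapreduce", "big big data"]
counts = word_count(sample_lines)
print(counts)
```

Even this toy version needs three functions for one simple aggregation. A real Java MapReduce job adds mapper and reducer classes plus job configuration on top, which is the overhead Hive and Pig aim to remove.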

Hive, a SQL-like language integrated into the Hadoop ecosystem, solves this problem to a large extent.
Pig Latin (also called Pig) and Hive sit one layer above MapReduce in the Hadoop ecosystem and provide high-level languages for using the MapReduce library:
  • Pig
  • Hive

What is Pig?

Pig or Apache Pig is a high-level platform for creating programs that run on Hadoop. The language for this platform is called Pig Latin. Pig Latin is a scripting language.

Pig was developed at Yahoo for analyzing large data sets and to reduce the time spent writing MapReduce programs.
Pig has 2 core components:
  1. A data flow scripting language, called Pig Latin
  2. The runtime environment in which Pig Latin programs are executed

What are the differences between Pig and MapReduce? What are the advantages of Pig?

  1. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig.
  2. Pig internally creates a sequence of MapReduce jobs that are run by the Hadoop cluster.
  3. Writing a Pig script takes roughly 5% of the time needed to write the equivalent MapReduce program in Java, but run-time performance drops by about 50%, because hand-written Java MapReduce programs run natively.
  4. What if something goes wrong in between? Because Pig is a procedural language, you can easily debug each intermediate step.
  5. For scenarios with complex jobs, Pig is the best language to work with because of its debugging capability.
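
The step-by-step style described in the points above can be sketched as follows. This is plain Python, not Pig Latin, and the records and variable names are hypothetical. Each step produces a named intermediate result, loosely mirroring how a Pig Latin script assigns the output of statements such as FILTER and GROUP to aliases that can be inspected one by one.

```python
from collections import defaultdict

# Hypothetical input: (user, page, seconds_spent) records.
visits = [
    ("alice", "home", 12),
    ("bob", "home", 3),
    ("alice", "cart", 40),
    ("carol", "cart", 25),
    ("bob", "cart", 1),
]

# Step 1: keep only visits of at least 5 seconds (a filter step).
long_visits = [v for v in visits if v[2] >= 5]
print("after filter:", long_visits)  # intermediate result is inspectable

# Step 2: group the remaining visits by page (a group step).
by_page = defaultdict(list)
for user, page, seconds in long_visits:
    by_page[page].append(seconds)
print("after group:", dict(by_page))

# Step 3: compute the average time per page (an aggregate step).
avg_seconds = {page: sum(secs) / len(secs) for page, secs in by_page.items()}
print("average seconds per page:", avg_seconds)
```

Because every step has its own name, a wrong final answer can be traced back by checking each intermediate result in turn, which is the debugging advantage described above.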

What is Hive?

Hive or Apache Hive is a data warehouse system for querying and analyzing large datasets stored in Hadoop files.
Hive provides a SQL-like query language known as HiveQL, an interface in which users can write a limited set of SQL-like commands. This is why Hive is often referred to as Hive SQL.
Initially developed by Facebook, Apache Hive is now used and developed by many other companies.

  1. Hive provides a data warehouse view of the data loaded in HDFS.
  2. With Hive you can do ad hoc analysis and perform joins on the data sets.
  3. Remember that Hive only deals with structured data.
  4. Hive also allows you to write custom mappers and reducers to extend the capabilities of the query language.
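
To illustrate the kind of ad hoc, SQL-style analysis with joins mentioned above, here is a small sketch. It does not use Hive: Python's built-in sqlite3 module stands in for it, and the tables and values are made up. HiveQL syntax differs in places, so treat this only as an example of the declarative style, not as a HiveQL script.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a Hive warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT, country TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'US'), (3, 'Chen', 'IN');
INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 70.0), (3, 30.0);
""")

# Ad hoc question: what is the total order amount per country?
query = """
SELECT c.country, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.country
ORDER BY c.country
"""
revenue_by_country = conn.execute(query).fetchall()
print(revenue_by_country)
conn.close()
```

The query states what result is wanted rather than how to compute it. In Hive, a query written in this style is turned into jobs that run on the cluster, so the analyst does not write any map or reduce code.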

Which one shall I use - Hive or Pig? Difference between Hive and Pig?

From a technical point of view, both Pig and Hive are feature complete, so you can do the same tasks in either tool. However, you will find that different groups that use Apache Hadoop prefer one tool or the other. The good part is that they have a choice, and both tools work together.

Because of its SQL-like query language, Hive is often used as the interface to an Apache Hadoop based data warehouse. Hive is considered friendlier and more familiar to users who are used to querying data with SQL.
Pig fits in through its data flow strengths: it takes on the tasks of bringing data into Apache Hadoop and working with it to get it into the form needed for querying.
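
The division of labour described above can be sketched in a few lines. Again this is an illustration in plain Python with made-up data: the prepare function plays the role a Pig script might play (cleaning raw records into a structured form), and an in-memory SQLite table stands in for a Hive table that is then queried with SQL.

```python
import sqlite3

# Hypothetical raw log lines in the form user|action|page.
raw_lines = [
    "alice|view|home",
    "bob|VIEW|cart",
    "malformed line without separators",
    "alice|buy|cart",
    "carol|view|home",
]


def prepare(lines):
    """Data flow step: split fields, drop malformed records, normalise case."""
    rows = []
    for line in lines:
        parts = line.split("|")
        if len(parts) != 3:
            continue  # skip records that do not have exactly three fields
        user_name, action, page = (p.strip().lower() for p in parts)
        rows.append((user_name, action, page))
    return rows


clean_rows = prepare(raw_lines)

# Query step: load the structured rows into a table and ask a question in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_name TEXT, action TEXT, page TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", clean_rows)
views_per_page = conn.execute(
    "SELECT page, COUNT(*) FROM events "
    "WHERE action = 'view' GROUP BY page ORDER BY page"
).fetchall()
conn.close()
print(views_per_page)
```

The first half deals with messy input, the second half only ever sees structured rows, which matches the point that Hive works with structured data while Pig helps get the data into that shape.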

There is no simple way to compare Pig and Hive. However, the comparison below gives a fair idea.
[Figure: comparison of Hive and Pig]


 © 2017 :, All rights reserved.  Privacy Policy