This app is an all-in-one package that provides everything for HANA lovers:
1. Courses on SAP HANA - Basics, Modeling and Administration
2. Multiple quizzes on Overview, Modeling, Architecture, and Administration
3. The most popular articles on SAP HANA
4. A series of interview questions to brush up your HANA skills
The term "Big Data" describes collections of data sets so large and complex that it is difficult to capture, process, store, search and analyze them using conventional database management tools and traditional database management systems.
Where does Big Data come from?
Now the next question is: where does this Big Data originate, and what makes up Big Data? Basically, the data comes from everywhere, for example:
sensors used to gather climate information
posts to social media sites
digital pictures and videos
software logs, cameras
scans of government documents
purchase transaction records
cell phone GPS signals
and many more.
All these together constitute Big Data.
Some interesting facts about Big Data:
Big data includes both structured and unstructured data.
Looking at the current trend, every day we create 2.5 quintillion bytes of data - so much that 90% of the data in the world today was created in the last two years alone. Big data deals with data at the petabyte and exabyte scale.
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages since it requires "massively parallel software running on tens, hundreds, or even thousands of servers".
Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach.
Why is Big Data so important for any business organization and for today's society?
The image below explains it in three simple sentences.
The value of big data to an organization falls into two categories: analytical use and enabling new products.
Analytical use - Big data analytics can reveal insights previously hidden by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers' transactions together with social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports.
Enabling new products - The past decade's successful web startups are prime examples of big data used as an enabler of new products and services.
For example, by combining a large number of signals from a user's actions and those of their friends, Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business. It's no coincidence that the lion's share of ideas and tools underpinning big data has emerged from Google, Yahoo, Amazon and Facebook.
Characterization of Big Data - Volume, Velocity and Variety (3Vs):
As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as the 3Vs of big data: volume, velocity and variety.
Volume: The benefit gained from the ability to process large amounts of information is the main attraction of big data analytics. More data leads to more accurate analysis. Example: If you could run that forecast taking into account 300 factors rather than 6, could you predict demand better?
Turn 12 terabytes of Tweets created each day into improved product sentiment analysis.
Convert 350 billion annual meter readings to better predict power consumption.
Velocity: Sometimes two minutes is too late. For time-critical applications where time is the core factor, such as catching fraud, detecting hackers, or tracking the running status of a train, big data must be processed as it streams into your enterprise in order to maximize its value. Not only is the volume of data large, it is arriving ever more rapidly. Example: Scrutinize 5 million trade events created each day to identify potential fraud.
Analyze 500 million daily call detail records in real time to predict customer churn faster.
Other examples include "machine data" generated on the factory floor and trading data generated by financial markets.
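To make the velocity idea concrete, here is a minimal sketch in plain Python (with invented event data, not any vendor's API): events are counted in a sliding time window as they stream in, so that a fraud-like spike can be flagged immediately rather than after a nightly batch load.

```python
from collections import deque

def sliding_window_counter(events, window_seconds=60, threshold=5):
    """Flag accounts whose event count inside the window exceeds threshold.

    `events` is an iterable of (timestamp, account_id) pairs, already
    ordered by timestamp - a simplification of a real event stream.
    """
    window = deque()   # (timestamp, account_id) pairs currently in the window
    counts = {}        # account_id -> number of its events in the window
    alerts = []
    for ts, account in events:
        window.append((ts, account))
        counts[account] = counts.get(account, 0) + 1
        # Evict events that have fallen out of the time window
        while window and window[0][0] <= ts - window_seconds:
            _, old_account = window.popleft()
            counts[old_account] -= 1
        if counts[account] > threshold:
            alerts.append((ts, account))
    return alerts

# Ten rapid-fire events from one hypothetical account trigger alerts
stream = [(t, "acct-1") for t in range(10)]
print(sliding_window_counter(stream, window_seconds=60, threshold=5))
```

The point of the sketch is that each event is handled the moment it arrives; nothing waits for the full data set to be loaded first.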
Variety: Big data includes both structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when these data types are analyzed together. Example: Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
All these types of data can have a significant effect on a business. Finding out quickly what the data means and understanding its importance provides a business with an ongoing advantage as well as the opportunity to realize competitive benefits.
4th Challenge - Veracity:
There is a fourth challenge: "veracity."
Trusting the data - its accuracy and its sources - is a common problem: any business needs a basic understanding of both before making decisions in which data is a factor. Big Data obviously amplifies and extends this problem because of the 3Vs, since it contains such a variety and volume of data.
3 Vs present challenges - Problems with Traditional Disk based RDBMS:
The three Vs present challenges for conventional disk-based relational databases:
Traditional databases are not designed to handle the insert/update rates required to support the Velocity at which Big Data arrives or needs to be analyzed.
Traditional databases require the database schema to be created in ADVANCE, defining what the data will look like, which makes it harder to handle Variety.
Traditional databases can't analyze data from social media, videos, or sensors, as this kind of data grows at very high speed and is also unstructured.
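The schema problem above can be illustrated with a small sketch in plain Python (the records are invented for illustration): instead of declaring every column in advance, each record is parsed at read time and routed by whatever shape it turns out to have - the "schema-on-read" approach that Big Data tools favor over the relational "schema-on-write".

```python
import json

# Records arriving with different shapes - a sensor reading, a social
# post, a purchase. A fixed relational schema declared in advance would
# reject two of these three rows.
raw = [
    '{"type": "sensor", "temp_c": 21.5}',
    '{"type": "post", "text": "loving the new phone"}',
    '{"type": "purchase", "item": "phone", "amount": 499.0}',
]

# Schema-on-read: parse each record first, then decide how to handle it,
# grouping by whatever structure is discovered at read time.
records = [json.loads(line) for line in raw]
by_type = {}
for rec in records:
    by_type.setdefault(rec["type"], []).append(rec)

print(sorted(by_type))  # record types discovered, not declared in advance
```

Nothing here fails when a new record shape appears tomorrow; a relational table would need an ALTER TABLE (or a new table) first.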
A few initiatives to handle the Big Data challenges:
Some RDBMS vendors have tried to solve these challenges; one example is SAP Sybase IQ. It uses column-store technology, enabling it to compress data efficiently, and a parallel processing approach across multiple servers to handle multi-petabyte data stores.
Database appliances have also been built to address these problems, including the SAP HANA database and SAP HANA software, which have demonstrated a greater than 20x compression rate: a 100 TB, five-year sales and distribution data set was reduced to 3.78 TB, and analytic queries on the entire data set ran in under four seconds.
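Why column stores like Sybase IQ and HANA compress so well can be illustrated with run-length encoding, one of the simplest columnar compression schemes (this sketch is only an illustration of the principle, not SAP's actual algorithm):

```python
def run_length_encode(column):
    """Compress one column into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs

# A column of sales regions: few distinct values stored contiguously,
# which is exactly the layout a column store creates - long runs of
# repeated values that collapse into a handful of pairs.
region_column = ["EMEA"] * 4 + ["APJ"] * 3 + ["AMER"] * 5
print(run_length_encode(region_column))
```

Twelve stored values shrink to three (value, count) pairs; in a row store the same values would be interleaved with other columns and the runs would vanish.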
Big Data and Hadoop:
As Big Data overwhelms traditional databases and storage, companies are looking to exploit new tools like Hadoop. The reality is that you want to avoid single-device bottlenecks, since they inhibit scaling. The other approach is therefore to use a non-relational data store such as Hadoop, which uses MapReduce to spread analytical processing across armies of commodity servers.
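The MapReduce idea behind Hadoop can be sketched in a few lines of plain Python (a toy word-count simulation, not Hadoop's actual API): a map function emits intermediate key/value pairs for each document, and a reduce function sums the pairs per key; Hadoop's contribution is running the map calls in parallel across commodity servers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# In Hadoop the map calls would run on many machines in parallel;
# here we simulate them sequentially over a small document list.
documents = ["big data big insight", "big servers"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(mapped))
```

Because each map call touches only its own document, the work partitions cleanly across machines, which is exactly how the single-device bottleneck is avoided.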