Characterization of Big Data - Volume, Velocity and Variety (3Vs):
As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as the three Vs: volume, velocity and variety [1].
- Volume: The main attraction of big data analytics is the ability to process large amounts of information. With more data, an analysis can take more factors into account and is often more accurate.
Example: If you could run a demand forecast taking into account 300 factors rather than 6, could you predict demand better?
Turn 12 terabytes of Tweets created each day into improved product sentiment analysis.
Convert 350 billion annual meter readings to better predict power consumption.
- Velocity: Sometimes 2 minutes is too late. For time-critical applications such as catching fraud, detecting hackers, or reporting the running status of a train, big data must be used as it streams into your enterprise in order to maximize its value. Not only is the volume of data large, it is also arriving ever more rapidly.
Example: Scrutinize 5 million trade events created each day to identify potential fraud.
- Variety: Big data includes both structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
Example: Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
All these types of data can have a significant effect on a business. Finding out quickly what the data means and understanding its importance provides a business with an ongoing advantage as well as the opportunity to realize competitive benefits.
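To make the Velocity point concrete, here is a minimal sketch of handling trade events one at a time as they arrive, rather than storing them and analysing them later. Everything in it is an assumption for illustration: the function name `flag_bursts`, the `(timestamp, account)` event shape, and the rule "more than 3 trades in 60 seconds is suspicious" are hypothetical and are not taken from any real fraud system. It also assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


def flag_bursts(events, window_seconds=60, max_trades=3):
    """Yield (account, timestamp) each time an account exceeds
    max_trades within the last window_seconds.

    `events` is any iterable of (timestamp, account) pairs in time
    order, so it can be a live stream as well as a list.
    """
    recent = defaultdict(deque)  # account -> timestamps inside the window
    for timestamp, account in events:
        window = recent[account]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_trades:
            yield account, timestamp


# Hypothetical sample stream: account "A" trades four times in 30 seconds.
events = [(0, "A"), (10, "A"), (15, "B"), (20, "A"), (30, "A"), (200, "A")]
alerts = list(flag_bursts(events))
print(alerts)
```

Because the function is a generator, an alert is produced as soon as the offending event is seen, which is the property the Velocity requirement is about. A production system would use a stream-processing platform rather than a single Python process.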
4th Challenge - Veracity:
There is a 4th challenge: veracity, meaning how far the data can be trusted.
Trust in data depends on two things: the accuracy of the data and the reliability of its sources.
Any business that uses data as a factor in its decisions should have at least a basic idea of both. Big data amplifies and extends this problem because of the 3Vs: when the data is both vast and varied, its accuracy and its sources are harder to check.
3 Vs present challenges - Problems with Traditional Disk based RDBMS:
The three Vs present challenges for conventional disk-based relational databases:
- Traditional databases are not designed to handle the insert/update rates required to support the Velocity at which big data arrives or needs to be analysed.
- Traditional databases require the schema to be created in ADVANCE to define what the data will look like, which makes it harder to handle Variety.
- Traditional databases are poorly suited to analyzing data from social media, videos and sensors, because this data is unstructured and grows very quickly.
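The schema-in-advance point can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names, field names, and helper functions below are hypothetical and chosen only for this example. The sketch shows a fixed-schema table rejecting a record that carries a field nobody planned for, and then one possible workaround: storing each record as a JSON document without declaring its fields up front.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed schema declared in advance: only these two columns exist.
conn.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")


def insert_fixed(record):
    """Insert a dict into the fixed-schema table.

    Column names are interpolated directly for illustration only;
    do not do this with untrusted input.
    """
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO readings ({columns}) VALUES ({placeholders})",
        list(record.values()),
    )


insert_fixed({"sensor_id": "s1", "temperature": 21.5})

# A new kind of sensor reports humidity, which the schema does not know.
schema_rejected = False
try:
    insert_fixed({"sensor_id": "s2", "humidity": 0.4})
except sqlite3.OperationalError:
    schema_rejected = True

# Workaround sketch: keep the whole record as JSON text in one column.
conn.execute("CREATE TABLE raw_events (payload TEXT)")


def insert_flexible(record):
    """Store any dict, whatever fields it has, as a JSON document."""
    conn.execute(
        "INSERT INTO raw_events (payload) VALUES (?)", (json.dumps(record),)
    )


insert_flexible({"sensor_id": "s1", "temperature": 21.5})
insert_flexible({"sensor_id": "s2", "humidity": 0.4})

stored = [json.loads(row[0]) for row in conn.execute("SELECT payload FROM raw_events")]
print(schema_rejected, len(stored))
```

The second table accepts both records, but the database no longer knows their structure, so the fields have to be interpreted when the data is read. That trade-off is one reason variety is awkward for schema-first systems.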