“Big Data” and its technologies are the “new kid” on the block (at least, for most of us!) and this post will seek to explain the fundamentals of Big Data.
Communication is at an all-time high today, and data is generated through cell phones, televisions, credit cards, airplanes, social media, trains and so on. Big Data has become a huge factor in the last three years; “data scientists” are in great demand, and most of them command mind-boggling salaries.
“Necessity is the mother of invention”, goes the saying, and the need to process, analyze and store huge amounts of data in the social media age has paved the way for ‘Big Data’ and its technologies, such as the open source project ‘Hadoop’.
Issues With Traditional Systems:
We all remember the days of database systems, normalization and the different ‘normal forms’. These traditional systems falter under the current avalanche of data. The difficulty of setting up, maintaining and scaling legacy systems is a crucial reason why Big Data and its technologies are gaining importance today. Traditional systems also need a common data format: they cannot process pictures, emails, DBMS data and so on all in the same place.
How Much Data Is Being Generated?
According to a report from Webopedia, 2.5 quintillion bytes of data are generated every day. Another fact, perhaps less surprising: most of the world's data has been generated in the last two years alone. Among other statistics:
- 51% of the data that is generated is structured
- There are a billion social media posts every two days
- 27% of the data that is generated is unstructured
- 22% of the data that is generated is semi-structured
- By 2015, 4.4 million jobs related to Big Data will be created
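For a rough sense of scale, the arithmetic behind these figures can be sketched in Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 exabyte = 10^18 bytes), and it assumes the percentage split quoted above applies to the daily volume, neither of which the report states. All names in the code are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the statistics quoted above.
# Assumption: "quintillion" is short scale (10**18) and units are decimal (SI).
BYTES_PER_DAY = 25 * 10**17  # 2.5 quintillion bytes

EXABYTE = 10**18
TERABYTE = 10**12

# Percentage split by data type, as quoted in the list above.
SHARES_PERCENT = {"structured": 51, "unstructured": 27, "semi-structured": 22}


def bytes_to_exabytes(n_bytes):
    """Convert a byte count to exabytes (decimal)."""
    return n_bytes / EXABYTE


def terabyte_drives_needed(n_bytes):
    """Number of full 1 TB drives a byte count would occupy."""
    return n_bytes // TERABYTE


def bytes_per_category(total_bytes, shares_percent):
    """Split a byte count by percentage share.

    Assumption: the quoted shares apply to the daily volume.
    """
    return {name: total_bytes * pct // 100 for name, pct in shares_percent.items()}


if __name__ == "__main__":
    print(bytes_to_exabytes(BYTES_PER_DAY), "exabytes per day")
    print(terabyte_drives_needed(BYTES_PER_DAY), "one-terabyte drives per day")
    for name, n in bytes_per_category(BYTES_PER_DAY, SHARES_PERCENT).items():
        print(name, bytes_to_exabytes(n), "exabytes per day")
```

In other words, under these assumptions the daily figure works out to 2.5 exabytes, or about 2.5 million one-terabyte drives filled every day.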
What Is Structured Data, Unstructured Data And Semi-structured Data?
A few years ago, nearly all data was “structured”. With the passage of time and advancements in technology, unstructured data and semi-structured data crept in.
Data in an RDBMS (Relational Database Management System) or a spreadsheet is structured data. Structured data is anything that follows a common format and can be grouped into “chunks”.
Anything that cannot be put into rows and columns is “unstructured data”.
From the picture below, we can understand that unstructured data does not follow any particular pattern and cannot be grouped into “chunks”. It varies widely; examples include videos, emails, wikis, PPTs, Word files and so on. (Semi-Structured Data)
Semi-structured data is anything between structured data and unstructured data. It is neither raw data (like videos or emails) nor data that is laid out neatly in columns and rows. It does not fit a rigid table layout, but it carries tags or other markers that identify its elements, as XML and JSON documents do.
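The three categories can be made concrete with a small Python sketch. The customer records, field names and email text below are invented purely for illustration; the point is only to show how the shape of the data differs in each case.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has exactly the same fields.
# (Hypothetical sample data.)
structured_csv = "id,name,city\n1,Asha,Pune\n2,Ben,Leeds\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys "tag" each value, but records may differ in shape
# and may nest other values.
semi_structured_json = """
[
  {"id": 1, "name": "Asha", "phones": ["555-0100"]},
  {"id": 2, "name": "Ben", "address": {"city": "Leeds"}}
]
"""
records = json.loads(semi_structured_json)

# Unstructured: free text with no columns or tags to rely on.
unstructured_email = (
    "Hi team, Ben from Leeds called about his order. Please call him back."
)


def field_names(items):
    """Return the sorted field names used by each record."""
    return [sorted(item.keys()) for item in items]


if __name__ == "__main__":
    print(field_names(rows))     # same fields in every row
    print(field_names(records))  # fields vary from record to record
    print(len(unstructured_email.split()), "words, no fields at all")
```

The structured rows all share one set of columns, the JSON records are self-describing but not uniform, and the email has to be read (or mined) as plain text before anything can be extracted from it.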
Big Data And Organizations:
Analyzing complex data and turning it into profits is one of the reasons why organizations embrace Big Data technologies. Companies like GE and UPS are embracing Big Data technologies to optimize their businesses. GE estimates that a 1% reduction in fuel use, achieved by using big data from aircraft engines, would save the commercial airline industry $30 billion over 15 years. (Big Data in Big Companies)
In the coming years, Big Data will work its magic on every segment of business, including retail and healthcare.
15 Important Big Data Facts for IT Professionals. (2014, Feb 4). Retrieved from Webopedia.com: http://www.webopedia.com/quick_ref/important-big-data-facts-for-it-professionals.html
Big Data in Big Companies. (n.d.). Retrieved from International Institute for Analytics: http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/bigdata-bigcompanies-106461.pdf
Semi-Structured Data. (n.d.). Retrieved from http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/notes.html