Sunday, April 23, 2017

Difference Between BigData and Hadoop


The difference between Big Data and Hadoop is that there is no real difference to speak of. 😉 They complement each other.

Let's explain this!

The amount of data produced across the world is increasing exponentially and is currently doubling in size every two years. It is estimated that by the year 2020, the data available will reach 44 zettabytes (44 trillion gigabytes). 
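To make the scale concrete, here is a small sketch of the arithmetic. It assumes decimal (SI) units, where 1 zettabyte is 10^21 bytes and 1 gigabyte is 10^9 bytes; the function names are made up for this illustration.

```python
ZETTABYTE = 10**21  # bytes, SI (decimal) definition
GIGABYTE = 10**9    # bytes, SI (decimal) definition


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZETTABYTE // GIGABYTE


def growth_factor(years, doubling_period=2):
    """How many times larger the data gets if it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


print(zettabytes_to_gigabytes(44))  # 44000000000000, i.e. 44 trillion gigabytes
print(growth_factor(10))            # 32.0, i.e. 32 times more data after ten years
```

In other words, doubling every two years means roughly 32 times as much data after a decade.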

Where does Big Data come from?

Mobile devices, remote sensing technologies, software logs, cameras, microphones, radio-frequency identification, wireless sensors, weather satellites and sensors, scientific experiments, social networks, internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce, all contribute.

Big Data is a concept, not a product. It refers to the large sets of data that businesses and other parties put together to reach specific goals. It can include many different kinds of data in many different formats, both structured and unstructured. These data sets are so large and complex that traditional data processing applications cannot deal with them.

Data that does not fit neatly into the rows and columns of a traditional database is known as unstructured or semi-structured data. A collection of such data so large that conventional software cannot capture, manage, and process it within an acceptable amount of time is known as “Big Data”.

Big Data is generally described as having three properties: volume, velocity, and variety.

On the other hand...

Hadoop is a technology that helps you store and process Big Data. It is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs.

Hadoop is one of the tools designed to handle Big Data and is maintained by a global community of users. You can easily grow your system to handle more data simply by adding nodes. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computing does not fail.
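The failover idea can be illustrated with a toy sketch. This is not Hadoop's actual scheduler or API: the node names, task names, and both functions are hypothetical, and the round-robin placement is an assumption chosen only to keep the example short.

```python
def assign_tasks(tasks, nodes):
    """Spread tasks over nodes round-robin; returns {node: [tasks]}."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment


def handle_failure(assignment, failed_node):
    """Move the failed node's tasks onto the surviving nodes."""
    orphaned = assignment.pop(failed_node)
    survivors = sorted(assignment)
    if not survivors:
        raise RuntimeError("no nodes left to run the tasks")
    for i, task in enumerate(orphaned):
        assignment[survivors[i % len(survivors)]].append(task)
    return assignment


# Hypothetical three-node cluster running six tasks.
cluster = assign_tasks(
    ["t1", "t2", "t3", "t4", "t5", "t6"],
    ["node-a", "node-b", "node-c"],
)

# node-b goes down: its tasks (t2, t5) are redirected to the other nodes.
cluster = handle_failure(cluster, "node-b")
print(cluster)
```

Growing the system works the same way in this sketch: passing a longer list of nodes to `assign_tasks` spreads the same work over more machines.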

The Hadoop framework has two main components:

HDFS: The Hadoop Distributed File System (HDFS) is the distributed file system that stores the big data across the different machines in a cluster, where Hadoop then processes it.

MapReduce: MapReduce is the framework used to process the data stored in HDFS.
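To give a feel for the MapReduce model, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use Hadoop at all; real Hadoop jobs are typically written in Java and run the map and reduce steps in parallel across the nodes of a cluster. The function names are chosen for this illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data needs hadoop", "hadoop stores big data"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'stores': 1}
```

The point of splitting the work this way is that each line can be mapped independently and each word can be reduced independently, which is what lets a framework like Hadoop spread the job over many machines.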

That said, you can see that there is really no difference to draw between Big Data and Hadoop: one is the raw material in large quantities, and the other is the tool that stores and processes it.