Discuss the advantages of Hadoop technology and distributed

Discuss the advantages of Hadoop technology and distributed data file systems.

How is an Hadoop Distributed File System different from a Relational Database system?

What organizational issues are best solved using Hadoop technology? Give examples of the type of data they will analyze.

What companies currently use Hadoopo related technologies.

Solution

Hadoop is a highly scalable storage platform.. It can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.

There are five advantages of Hadoop technology..

1. Scalable

Unlike traditional relational database systems (RDBMS) that can\'t scale to process large amounts of data,

Hadoop enables businesses to run applications on thousands of nodes involving thousands of terabytes of data.

2. Cost effective

Hadoop also offers a cost effective storage solution for businesses\' exploding data sets. The problem with traditional relational database management systems is

that it is extremely cost prohibitive to scale to such a degree in order to process such massive volumes of data

3. Flexible

Hadoop enables businesses to easily access new data sources and tap into different types of data (both structured and unstructured) to generate value from that data.

This means businesses can use Hadoop to derive valuable business insights from data sources such as social media, email conversations or clickstream data.

4. Fast

If you\'re dealing with large volumes of unstructured data, Hadoop is able to efficiently process terabytes of data in just minutes, and petabytes in hours.

5. Resilient to Failure

A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is also replicated to other nodes in the cluster,

which means that in the event of failure, there is another copy available for use.

--> Difference between hadoop and relational database system

basically the difference is that Hadoop isn\'t a database at all. Hadoop is basically a distributed file system (HDFS) -

it lets you store large amount of file data on a cloud of machines, handling data redundancy etc

On top of that distributed file system, Hadoop provides an API for processing all that stored data - Map-Reduce.

The basic idea is that since the data is stored in many nodes, you\'re better off processing it in a distributed manner where each node can process the data stored on it rather than spend a lot of time moving it over the network.

Unlike RDMS that you can query in realtime, the map-reduce process takes time and doesnt produce immediate results.

On top of this basic scheme you can build a Column Database, like HBase.

A column-database is basically a hashtable that allows realtime queries on rows.

--> Organizations are making critical predictions by sorting through and analyzing Big Data.

Formatting unstructured data makes it suitable for data mining and analysis. Hadoop is the core platform for structuring Big Data.

It also solves the problem of formatting it for analytic purposes.

Hadoop uses a distributed computing architecture consisting of many servers using commodity hardware.

This intern makes it inexpensive to scale and support massive data stores. Hadoop has some challenges.

Discuss the advantages of Hadoop technology and distributed data file systems. How is an Hadoop Distributed File System different from a Relational Database sys
Discuss the advantages of Hadoop technology and distributed data file systems. How is an Hadoop Distributed File System different from a Relational Database sys

Get Help Now

Submit a Take Down Notice

Tutor
Tutor: Dr Jack
Most rated tutor on our site