Discuss the advantages of Hadoop technology and distributed data file systems. How is a Hadoop Distributed File System different from a relational database system? What organizational issues are best solved using Hadoop technology? Give examples of the type of data they will analyze. What companies currently use Hadoop-related technologies?
Solution
The advantages of Hadoop technology
1. Scalable
Hadoop is a highly scalable storage platform because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. Traditional relational database management systems (RDBMS) are difficult to scale out to such volumes, whereas Hadoop enables businesses to run applications on thousands of nodes involving thousands of terabytes of data.
2. Cost effective
Hadoop also offers a cost-effective storage solution for businesses' exploding data sets. The problem with traditional relational database management systems is that scaling them to process such massive volumes of data is cost-prohibitive. To reduce costs, many companies in the past had to down-sample their data and classify it based on assumptions about which data was most valuable. The raw data was then deleted because it was too expensive to keep. This approach may have worked in the short term, but when business priorities changed, the complete raw data set was no longer available. Hadoop, on the other hand, is designed as a scale-out architecture that can affordably store all of a company's data for later use. The cost savings can be substantial: instead of costing thousands to tens of thousands of pounds per terabyte, Hadoop offers computing and storage capabilities for hundreds of pounds per terabyte.
3. Flexible
Hadoop enables businesses to easily access new data sources and tap into different types of data (both structured and unstructured) to generate value from that data. This means businesses can use Hadoop to derive business insights from data sources such as social media, email conversations or clickstream data. In addition, Hadoop can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, marketing campaign analysis and fraud detection.
4. Fast
Hadoop's storage method is based on a distributed file system that keeps track of where each piece of data is located on the cluster. The tools for data processing usually run on the same servers where the data is stored, so the computation moves to the data instead of the data moving across the network, which results in much faster processing. When dealing with large volumes of unstructured data, a sufficiently large Hadoop cluster can process terabytes of data in minutes and petabytes in hours.
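The idea of running the processing on the server that already holds the data can be illustrated with a small sketch. This is not Hadoop's actual scheduler; the function name assign_tasks, the block ids and the node names are all made up for illustration. Each block's task is given to one of the nodes that stores a copy of that block, choosing the least busy one.

```python
from collections import defaultdict


def assign_tasks(block_locations):
    """Assign each block's processing task to a node that already stores it.

    block_locations maps a block id to the list of nodes holding a copy.
    Among those nodes, the one with the fewest tasks so far is chosen.
    """
    load = defaultdict(int)  # number of tasks given to each node so far
    assignment = {}
    for block in sorted(block_locations):
        replicas = block_locations[block]
        # Ties are broken by node name so the result is deterministic.
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment


# Hypothetical cluster layout: four blocks spread over three nodes.
block_locations = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-a", "node-c"],
    "blk_4": ["node-a", "node-b"],
}

assignment = assign_tasks(block_locations)
for block, node in assignment.items():
    print(block, "->", node)
```

Every task ends up on a node that already has the block locally, so no block needs to be copied over the network before it is processed.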
5. Resilient to failure
A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is also replicated to other nodes in the cluster, which means that in the event of failure, there is another copy available for use.
The MapR distribution goes further by eliminating the NameNode and replacing it with a distributed "No NameNode" architecture, which MapR describes as providing true high availability and protection from both single and multiple failures.
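The replication described under "Resilient to failure" can be sketched as follows. This is a simplified model, not HDFS code: the round-robin placement, the function names place_blocks and readable_blocks, and the node names are assumptions made for illustration (real HDFS placement also takes racks into account). The replication factor of 3 matches the HDFS default.

```python
def place_blocks(blocks, nodes, replication=3):
    """Place each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one copy on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


# Hypothetical cluster: four nodes, six blocks, three copies of each block.
nodes = ["n1", "n2", "n3", "n4"]
blocks = ["b0", "b1", "b2", "b3", "b4", "b5"]
placement = place_blocks(blocks, nodes)

# With three copies per block, losing any two nodes loses no data.
after_two_failures = readable_blocks(placement, failed={"n1", "n2"})
print(len(after_two_failures), "of", len(blocks), "blocks still readable")

# Losing three nodes at once can remove every copy of some blocks.
after_three_failures = readable_blocks(placement, failed={"n1", "n2", "n3"})
print(sorted(after_three_failures))
```

In this model a block only becomes unreadable when all of the nodes holding its copies fail at the same time.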
The advantages of distributed data file systems:
Sharing data:
A user at one site can access data that resides at other sites.
Autonomy:
Because data is shared by distributing it, each site retains a degree of control over the data that is stored locally.
In a distributed system there is a global database administrator who is responsible for the entire system. Part of the global database administrator's responsibilities is delegated to a local database administrator at each site. Depending on the design of the distributed database, each local database administrator may have a different degree of local autonomy.
Availability:
If one site fails in a distributed system, the remaining sites may be able to continue operating. Thus the failure of one site does not necessarily imply the shutdown of the whole system.
How is a Hadoop Distributed File System different from a relational database system?
HDFS stores files as they arrive, so the Hadoop framework works well with both structured and unstructured data and supports a variety of formats such as XML, JSON and text-based flat files. The structure is applied when the data is read rather than when it is written. An RDBMS, by contrast, works best with structured data: an entity-relationship (ER) model has to be defined up front, and without a well-defined model the database schema can grow unmanaged. Hadoop is therefore a good choice when big data has to be processed and the data does not have consistent relationships. When the data is too large for complex processing, or the relationships between data items are not easy to define, it becomes difficult to store the extracted information coherently in an RDBMS.
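The difference can be shown with a small sketch that contrasts the two approaches. It uses Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain list of strings as a stand-in for files in HDFS; the orders table, the record layouts and the parse function are made up for illustration.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# RDBMS side: the table structure must be declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (1, "alice", 19.99))

try:
    # A record without a customer violates the declared schema and is rejected.
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (2, None, 5.00))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Hadoop side: raw records are kept as they arrived and interpreted when read.
raw_records = [
    '{"order_id": 3, "customer": "bob", "total": 42.5}',
    '<order id="4"><customer>carol</customer><total>7.25</total></order>',
    "5|dave|12.00",
]


def parse(record):
    """Turn one raw record (JSON, XML or pipe-delimited text) into a tuple."""
    if record.startswith("{"):
        data = json.loads(record)
        return int(data["order_id"]), data["customer"], float(data["total"])
    if record.startswith("<"):
        elem = ET.fromstring(record)
        return int(elem.get("id")), elem.findtext("customer"), float(elem.findtext("total"))
    order_id, customer, total = record.split("|")
    return int(order_id), customer, float(total)


parsed = [parse(record) for record in raw_records]
print("RDBMS rejected the incomplete row:", rejected)
print(parsed)
```

The database refuses data that does not fit its schema at write time, while the raw records in three different formats are all accepted for storage and only given a common structure when they are read.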
What organizational issues are best solved using Hadoop technology?
1. E-commerce challenges
2. Big data problems
3. Banking
4. Use cases in retail
Examples of the type of data they will analyze:
1. Analyze life-threatening risks
2. Identify warning signs of security breaches
3. Prevent hardware failure
4. Understand what people think about your company
5. Understand when to sell certain products
6. Find your ideal prospects
7. Gain insight from your log files
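As an illustration of the last item, the sketch below counts HTTP status codes in web server logs using the map and reduce pattern that Hadoop applies across a cluster. It runs on a single machine, and the log format, the sample lines and the function names are assumptions made for illustration. Each inner list stands for the part of the log stored on one node.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count status codes within one partition of log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # skip malformed lines
            counts[fields[2]] += 1  # third field is the status code in this toy format
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial counts from two partitions."""
    return left + right


# Hypothetical log partitions, one per node.
partitions = [
    ["10.0.0.1 /home 200", "10.0.0.2 /cart 500"],
    ["10.0.0.3 /home 200", "10.0.0.1 /login 404", "malformed"],
]

totals = reduce(reduce_counts, map(map_partition, partitions), Counter())
print(dict(totals))
```

Because each partition is counted independently, the map step could run on many nodes at once, and only the small partial counts have to be combined at the end.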
Companies that currently use Hadoop-related technologies:

