Understanding Hadoop - Continued

Continued from previous post

What is Hive?

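Hive is a data warehouse built on top of Hadoop. It lets you query data with an SQL-like language, HiveQL.

We can create tables in Hive using the "CREATE TABLE" command. The "CREATE EXTERNAL TABLE" variant defines a table over files that already exist in HDFS, so Hive uses those HDFS files as the input to the table.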
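To make the external-table idea concrete, here is a minimal Python sketch that only builds the HiveQL text for such a table. The helper name external_table_ddl, the table name web_logs, its columns, and the HDFS path /data/web_logs are all made up for illustration; the sketch does not connect to Hive or run anything.

```python
def external_table_ddl(table, columns, hdfs_location, delimiter=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement as a plain string.

    `columns` is a list of (name, hive_type) pairs.
    """
    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({column_list}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE "
        f"LOCATION '{hdfs_location}'"
    )


# Hypothetical table, columns, and HDFS directory (assumptions, not from the post).
ddl = external_table_ddl(
    "web_logs",
    [("ip", "STRING"), ("hits", "INT")],
    "/data/web_logs",
)
print(ddl)
```

The LOCATION clause is what ties the table to a directory of files in HDFS; the generated statement would still have to be submitted through a Hive client of your choice.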

What is HBase?

HBase is a NoSQL database developed by Apache. It is a column-oriented database: every row is identified by a row key, and each column value is stored against that row key.
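The row key and column relationship can be pictured with a small in-memory model. This is only a sketch of the data model, not the HBase client API: the class TinyTable and its methods are hypothetical, values are plain Python objects rather than bytes, and cell versioning is left out.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a column-oriented table: row key -> column -> value."""

    def __init__(self):
        # Columns are named "family:qualifier", e.g. "info:name".
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store one cell under the given row key."""
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Return one cell, or the whole row when no column is given."""
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self):
        """Yield rows in row-key order, the order in which HBase keeps them."""
        for row_key in sorted(self.rows):
            yield row_key, dict(self.rows[row_key])


table = TinyTable()
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:name", "Alice")
table.put("user#001", "stats:logins", 7)

for row_key, cells in table.scan():
    print(row_key, cells)
```

Rows do not need to share the same set of columns: here user#002 simply has no stats:logins cell, which is typical of how sparse data is handled in this kind of store.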

What is Mahout?

A set of machine learning libraries that can run on top of MapReduce.
Ex: k-means clustering, collaborative filtering, etc.
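As an illustration of what k-means clustering does, here is a minimal single-machine sketch in plain Python. It is not Mahout's implementation and uses none of its APIs; the function name kmeans, the sample points, and the starting centroids are assumptions chosen for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Cluster `points` (tuples of floats) around the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Made-up sample data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

In a MapReduce setting the assignment step is a natural fit for the map phase and the averaging step for the reduce phase, which is roughly why this algorithm suits the model.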

What is Pig?

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
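A data analysis program in this style is usually a short chain of steps: load, filter, group, aggregate. The Python sketch below mimics that chain on a small in-memory list. It is not Pig Latin and does not run on Hadoop; the sample records and the function names are invented for the example.

```python
from collections import defaultdict

# Made-up records: (user, page, http_status).
records = [
    ("alice", "/home", 200),
    ("bob", "/cart", 500),
    ("alice", "/cart", 200),
    ("carol", "/home", 200),
    ("bob", "/home", 200),
]


def filter_ok(rows):
    """Keep only successful requests (comparable to a FILTER step)."""
    return [row for row in rows if row[2] == 200]


def group_by_user(rows):
    """Collect rows per user (comparable to a GROUP step)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return groups


def count_per_group(groups):
    """Reduce each group to a count (comparable to FOREACH ... GENERATE COUNT)."""
    return {user: len(rows) for user, rows in groups.items()}


hits = count_per_group(group_by_user(filter_ok(records)))
print(hits)
```

With Pig, each of these steps would be one statement in the high-level language, and the platform would take care of running them over the cluster.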

What is Flume?

Flume is a service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.


A typical Hadoop architecture is shown below:

