Posts

Showing posts from March, 2014

Understanding Hadoop - Continued

Image
Continued from the previous post.

What is Hive? Hive is a data warehouse built on top of Hadoop. We can create tables in Hive using the "CREATE EXTERNAL TABLE" command. Hive uses files stored in HDFS as the input to its tables.

What is HBase? HBase is a NoSQL database and an Apache project. It is a column-oriented database in which every column value is associated with a row key.

What is Mahout? Mahout is a set of machine learning libraries that can be used with MapReduce programming. Examples: k-means clustering, collaborative filtering, etc.

What is Pig? Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

What is Flume? Flume is a service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.

A typical Hadoop architecture is shown in the image accompanying this post.
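To make the k-means clustering mentioned among the Mahout examples a little more concrete, here is a minimal sketch of the algorithm in plain Python. It is not Mahout code and does not run on MapReduce; the function name kmeans, the sample points, and the starting centroids are all made up for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain-Python k-means on a list of equal-length numeric tuples."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two obvious groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

With this sample data the three points near (1, 1) end up in one cluster and the three points near (8.5, 9) in the other. A library such as Mahout applies the same assign-and-update idea, but distributes the work across a cluster of machines.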

Understanding Hadoop

Image
Everyone says Hadoop is a new technology that deals with Big Data, and that it contains many frameworks and applications, such as Hadoop HDFS, Hive, HBase, Pig, Mahout, ZooKeeper, Oozie, Flume, and Sqoop. Confusing?

Technology has the highest power today and is leading us towards greater innovations. Everyone in this world uses technology to minimize personal effort, for connectivity, for communication, and the list goes on. Day by day technology is changing, and people need to be more innovative in terms of growing their business, saving time, saving costs, etc.

Let's understand what Big Data is all about. Put 'big' aside and let's talk about 'data'. What is data? Data is raw, unorganized facts that need to be processed. Basically, it is useless until it is organized properly. A universal example: a log file. In a log file you see some of the following: A series of events or actions, each prefixed with a timestamp. A set of Exceptions/Err