Testing Strategies: Big Data and Hadoop

Well, I'm currently researching how we can test Big Data applications. After browsing many sites, I have gathered some useful information on how we can test Hadoop-based Big Data systems.

Big Data testing basically involves the analytics side: BI testing, ETL testing, and running job flows through third-party tools implemented on Hadoop.

BI:

It is purely report-based testing, where we have an analysed data store and showcase report-based results through third-party reporting tools like Cognos, MS BI, etc.

ETL testing:

Traditional ETL tools are extending their capability to integrate with Hadoop HDFS (Hadoop Distributed File System).
Ex: In IBM DataStage 8.7, there is a new stage called BDFS (Big Data File Stage) which acts as a connector to Hadoop HDFS.
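
To give a rough idea of what an ETL verification check can look like on the Hadoop side, here is a minimal sketch in Java, assuming a hypothetical output path /data/etl_output/customers.txt written by the BDFS stage and the cluster configuration files on the classpath. It only confirms the file landed in HDFS and is non-empty; this is an illustration, not the DataStage tool's own validation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EtlOutputCheck {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical path written by the ETL tool's HDFS connector
            Path output = new Path("/data/etl_output/customers.txt");

            if (!fs.exists(output)) {
                throw new AssertionError("ETL output file is missing in HDFS: " + output);
            }
            long bytes = fs.getFileStatus(output).getLen();
            if (bytes == 0) {
                throw new AssertionError("ETL output file is empty: " + output);
            }
            System.out.println("ETL output present, size = " + bytes + " bytes");
        }
    }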


Running MapReduce/Hive job flows:

This is pure Hadoop testing. We will be given a front-end tool to configure:

the number of instances,
the input file to process,
the output file to store the result,
and the processing application.

Once we configure a job flow, the next step is running and monitoring it.
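
For reference, the same pieces the front-end tool asks for (input, output, processing application, degree of parallelism) map onto a plain MapReduce driver. Below is a minimal word-count sketch using the standard Hadoop Java API; the class names and the reducer count of 4 are just assumptions for illustration.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobFlowDriver {

        // Emits (word, 1) for every token in the input file
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    ctx.write(word, ONE);
                }
            }
        }

        // Sums the counts emitted for each word
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word-count-flow");      // the "processing application"
            job.setJarByClass(JobFlowDriver.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setNumReduceTasks(4);                                // rough analogue of "number of instances"
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input file to process
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output location for results
            System.exit(job.waitForCompletion(true) ? 0 : 1);        // run and monitor until completion
        }
    }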

Hadoop test cases could be:
1. Checking and monitoring available and used space in HDFS clusters (a sample check follows this list).
2. Cluster failure handling.
3. Data redundancy in the output file when pausing/stopping/re-executing due to network issues, etc.
4. Data integrity and redundancy on task failure.
5. Data integrity in end-to-end flows.
6. Performance test cases on cluster nodes.
7. Correctness of the data transformation logic.
8. Data type checking between input and output files.
9. Exporting/importing of data to HDFS.
10. Negative cases, e.g., when the input file size is more than the block size.
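
As an example of test case 1, here is a minimal sketch, assuming the Hadoop configuration files are on the classpath, that reads cluster capacity and usage through the FileSystem API; the 80% threshold is an arbitrary value chosen only for illustration. The same figures can also be cross-checked from the command line with hdfs dfsadmin -report.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class HdfsSpaceCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FsStatus status = fs.getStatus();   // aggregate capacity/usage reported by the NameNode

            long capacity = status.getCapacity();
            long used = status.getUsed();
            long remaining = status.getRemaining();

            double usedPct = 100.0 * used / capacity;
            System.out.printf("Capacity: %d bytes, Used: %d bytes (%.1f%%), Remaining: %d bytes%n",
                    capacity, used, usedPct, remaining);

            // Arbitrary threshold for illustration: flag the cluster if more than 80% full
            if (usedPct > 80.0) {
                throw new AssertionError("HDFS usage above threshold: " + usedPct + "%");
            }
        }
    }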
