Sample Hands-on Projects in Big Data

I would recommend using the Cloudera VM for these projects. Alternatively, you can use Amazon Elastic MapReduce (EMR) on AWS.

Data Mining (Text Mining)

General Strategy

The basic data (text) mining project outline is as follows:

  1. Stack Exchange posts are available as a massive XML dump.
  2. Parse the XML dump and store the results in HDFS.
  3. Run analyses on the parsed posts, such as the samples below.
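A minimal sketch of step 2, assuming the usual Stack Exchange dump layout in which each post is a `<row>` element in `Posts.xml` with attributes such as `Id`, `PostTypeId` (1 = question, 2 = answer), `ParentId`, `CreationDate`, `ViewCount` and `Tags`. It streams the file with `xml.etree.ElementTree.iterparse`, so the dump never has to fit in memory, and writes one tab-separated line per post, a flat format that MapReduce jobs or Hive tables can read. The column subset and the function name `posts_to_tsv` are illustrative choices, not part of the original project.

```python
import xml.etree.ElementTree as ET

# Columns kept from each <row>; an illustrative subset, not the full schema.
FIELDS = ["Id", "PostTypeId", "ParentId", "CreationDate", "ViewCount", "Tags"]


def posts_to_tsv(xml_source, out):
    """Stream <row> elements from a Posts.xml file and write TSV lines to `out`.

    Returns the number of rows written.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # Missing attributes (e.g. ParentId on questions) become empty strings.
        values = [elem.get(name, "") for name in FIELDS]
        # Tabs/newlines inside values would break the TSV layout, so flatten them.
        values = [v.replace("\t", " ").replace("\n", " ") for v in values]
        out.write("\t".join(values) + "\n")
        count += 1
        elem.clear()  # release the attributes of rows already written
    return count
```

Run it over the real dump with `open("Posts.xml", "rb")` as the source and a text file as `out`; the resulting file can then be copied into HDFS with the standard `hdfs dfs -put posts.tsv /user/cloudera/stackexchange/` command (the target directory is only an example).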

Posts Mining: sample labs/exercises on the parsed Stack Exchange ‘Posts’

  1. Find all questions that do not have any answers.
  2. Find the most viewed questions in each category.
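The two Posts exercises can be sketched in plain Python over rows shaped like the dump's `<row>` attributes (all values are strings). Unanswered questions are found by collecting the `ParentId` of every answer and keeping the questions whose `Id` is not in that set, which is the same join a MapReduce or Hive job would perform on the HDFS copy. The original does not define "category"; this sketch assumes it is a question's first tag, and the helper names are hypothetical.

```python
import re


def first_tag(tags):
    """Hypothetical 'category': the first tag of a question, or None if untagged."""
    # Older dumps encode tags as "<a><b>", newer ones as "|a|b|"; handle both.
    found = re.findall(r"[^<>|]+", tags or "")
    return found[0] if found else None


def unanswered_questions(posts):
    """Ids of questions (PostTypeId 1) that no answer (PostTypeId 2) points to."""
    answered = {p.get("ParentId") for p in posts if p.get("PostTypeId") == "2"}
    return [p["Id"] for p in posts
            if p.get("PostTypeId") == "1" and p["Id"] not in answered]


def most_viewed_per_category(posts):
    """Map each category to (question Id, ViewCount) of its most viewed question."""
    best = {}
    for p in posts:
        if p.get("PostTypeId") != "1":
            continue
        category = first_tag(p.get("Tags"))
        views = int(p.get("ViewCount") or 0)
        if category is None:
            continue
        if category not in best or views > best[category][1]:
            best[category] = (p["Id"], views)
    return best
```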

Sample exercises on Stack Exchange ‘Users’

  1. Find the user profile with the most views.
  2. Find the top users by reputation points.
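A sketch of the two Users queries, assuming rows from `Users.xml` that carry the dump's `Id`, `DisplayName`, `Views` and `Reputation` attributes as strings. `heapq.nlargest` keeps only the top k entries while scanning, which helps when the user table is large; `k=10` is an arbitrary default, and the function names are hypothetical.

```python
import heapq


def max_views_user(users):
    """The user row with the largest Views count (None for an empty input)."""
    return max(users, key=lambda u: int(u.get("Views") or 0), default=None)


def top_by_reputation(users, k=10):
    """The k user rows with the highest Reputation, highest first."""
    return heapq.nlargest(k, users, key=lambda u: int(u.get("Reputation") or 0))
```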

Sample mining of Comments

  1. Find the question posts that have the highest number of comments, etc.
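Counting comments per question is a classic MapReduce aggregation: the map step emits `(PostId, 1)` for each row of `Comments.xml`, and the reduce step sums the ones per key. The sketch below simulates those phases in memory (the sort stands in for Hadoop's shuffle) and then keeps only ids that belong to questions. The function names and the `question_ids` argument (for example, the question `Id`s collected from `Posts.xml`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_comments(comments):
    """Map phase: emit (PostId, 1) for every comment row."""
    for c in comments:
        yield c["PostId"], 1


def reduce_counts(pairs):
    """Reduce phase: sum counts per key; sorting mimics Hadoop's shuffle/sort."""
    for post_id, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield post_id, sum(count for _key, count in group)


def most_commented_questions(comments, question_ids, k=1):
    """The k question ids with the most comments, as (PostId, count), highest first."""
    counts = [(pid, n) for pid, n in reduce_counts(map_comments(comments))
              if pid in question_ids]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:k]
```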

Predictive Analysis (on StackExchange Posts)

Sample Exercise: Find the average time between a question appearing and an answer being posted

General Strategy

  • For each posted question, only the fastest reply was taken into consideration, and the time difference between posting the question and getting that first reply was calculated.
  • These differences were averaged over all posts belonging to a category, giving a predictor of how active (how quickly answered) a post in that category is likely to be.
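The strategy above can be sketched as follows, again over dump-style rows with string values. For each question it keeps the earliest answer `CreationDate`, subtracts the question's own `CreationDate`, and averages those delays (in hours) per category. As before, "category" is assumed to be the question's first tag, and questions with no answers are simply skipped, since the original does not say how they were treated. Timestamps are parsed with `datetime.fromisoformat`, which accepts the dump's `2008-07-31T21:42:52.667` style; the function names are hypothetical.

```python
import re
from datetime import datetime
from statistics import mean


def first_tag(tags):
    """Hypothetical 'category': a question's first tag, or None if untagged."""
    found = re.findall(r"[^<>|]+", tags or "")
    return found[0] if found else None


def avg_first_answer_hours(posts):
    """Average hours from question to its first answer, per category."""
    questions = {p["Id"]: p for p in posts if p.get("PostTypeId") == "1"}
    first_answer = {}  # question Id -> earliest answer datetime
    for p in posts:
        if p.get("PostTypeId") == "2" and p.get("ParentId") in questions:
            t = datetime.fromisoformat(p["CreationDate"])
            qid = p["ParentId"]
            if qid not in first_answer or t < first_answer[qid]:
                first_answer[qid] = t
    delays = {}  # category -> list of first-answer delays in hours
    for qid, answered_at in first_answer.items():
        q = questions[qid]
        category = first_tag(q.get("Tags"))
        if category is None:
            continue
        asked_at = datetime.fromisoformat(q["CreationDate"])
        hours = (answered_at - asked_at).total_seconds() / 3600
        delays.setdefault(category, []).append(hours)
    return {category: mean(hours) for category, hours in delays.items()}
```

Only the first reply counts per question, so a question with many slow follow-up answers is not penalized; whether that matches the intended notion of "activity" is a modeling choice.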


Specializing in high-volume web and cloud application architecture, Anuj Varma has a customer base that includes Fortune 100 companies (British Petroleum, Schlumberger).

All content on this site is original and owned by AdverSite Web Holdings, Inc. – the parent company of No part of it may be reproduced without EXPLICIT consent from the owner of the content.

Anuj Varma – who has written posts on Anuj Varma, Technology Architect.
