There’s a perception out there that BigData is only for Big Companies, with needs in the terabytes, petabytes and exabytes of data. If, for instance, you are the NSA, monitoring millions of voice and data calls every second of every day, you might need BigData. Or if you are Google, analyzing, in real time, everything that everyone is searching for across the world.

Or Facebook, trying to recognize a face among a billion other faces. Sure, all of that falls within BigData’s data-crunching portfolio and capabilities. What a lot of companies miss, however, is that those aren’t the only uses of BigData. Small to midsized companies can also make a killing by using BigData, provided they ask the right questions!

Enter IoT, and a 2 Trillion Dollar Opportunity

Cars are no longer merely transportation vehicles; they are internet endpoints, capable of two-way (duplex) communication. A vehicle can produce upwards of 560 GB of data per day. The same can be said of almost any device with electronic components. Your fridge? Another internet endpoint. What about your shower, with arguably no electronic components? It is still a potential IoT device, with a market just waiting to be exploited. Of course, anything that ‘moves’, whether it is your heart, your pulse, or an iPhone on a manufacturing line, is a candidate for generating IoT data.
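To put the per-vehicle figure in perspective, here is a back-of-the-envelope calculation. It takes the 560 GB/day figure above at face value and uses decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB); the fleet sizes are purely hypothetical.

```python
# Back-of-the-envelope estimate of daily telemetry volume for a vehicle fleet.
# 560 GB/vehicle/day is the figure quoted in the text; fleet sizes are hypothetical.

GB_PER_VEHICLE_PER_DAY = 560


def fleet_volume_tb_per_day(vehicles: int, gb_per_vehicle: float = GB_PER_VEHICLE_PER_DAY) -> float:
    """Return total data produced per day, in terabytes (decimal: 1 TB = 1,000 GB)."""
    return vehicles * gb_per_vehicle / 1_000


for fleet in (1_000, 100_000, 1_000_000):
    tb = fleet_volume_tb_per_day(fleet)
    print(f"{fleet:>9,} vehicles -> {tb:>10,.0f} TB/day ({tb / 1_000:,.2f} PB/day)")
```

At that rate, a fleet of one million connected vehicles would produce on the order of 560 PB per day, squarely in the petabyte territory usually associated with "Big Companies".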

Say, for example, you are Toyota or Honda and sell millions of cars every year, worldwide. Now, say you want crash data from every Toyota that gets into a crash, every day; i.e., Toyota would like immediate notification of each crash, along with a ‘crash dump’ of all on-board data. This is a problem that BigData (combined with IoT) can solve. This lethal IoT-plus-BigData combo can be applied as easily to home appliance breakdowns, power blackouts or even termite infestations as to car crashes. Any product (a pill or a pill container, a soda can or a bag of potato chips) that spends any amount of time on a manufacturing line is a candidate for IoT and BigData.
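A minimal sketch of the crash-notification idea, assuming telemetry arrives as a time-ordered stream of readings. The `Reading` fields, the 4 g deceleration threshold and the 10-second dump window are all hypothetical choices for illustration, not any manufacturer's actual data format or crash criteria.

```python
from collections import defaultdict, deque
from dataclasses import asdict, dataclass


# Hypothetical telemetry record; real on-board data formats differ by manufacturer.
@dataclass
class Reading:
    vehicle_id: str
    t: float                 # seconds since some reference time
    speed_kmh: float
    accel_g: float           # longitudinal acceleration in g (negative = braking)
    airbag_deployed: bool = False


CRASH_DECEL_G = 4.0          # hypothetical threshold, not an industry standard
DUMP_WINDOW_S = 10.0         # seconds of history to include in each crash dump


def detect_crashes(readings):
    """Scan a time-ordered stream and emit one crash report (with recent history) per crash."""
    history = defaultdict(deque)   # per-vehicle rolling buffer of recent readings
    last_report = {}               # vehicle_id -> time of the last crash reported
    reports = []
    for r in readings:
        buf = history[r.vehicle_id]
        buf.append(r)
        # Keep only readings inside the dump window.
        while r.t - buf[0].t > DUMP_WINDOW_S:
            buf.popleft()
        if r.airbag_deployed or r.accel_g <= -CRASH_DECEL_G:
            last = last_report.get(r.vehicle_id)
            # Suppress duplicate reports for readings from the same crash.
            if last is None or r.t - last > DUMP_WINDOW_S:
                last_report[r.vehicle_id] = r.t
                reports.append({
                    "vehicle_id": r.vehicle_id,
                    "crash_time": r.t,
                    "crash_dump": [asdict(x) for x in buf],
                })
    return reports


stream = [
    Reading("car-1", 0.0, 60.0, 0.1),
    Reading("car-2", 0.0, 50.0, 0.0),
    Reading("car-1", 1.0, 58.0, -0.3),
    Reading("car-1", 2.0, 5.0, -6.5, airbag_deployed=True),
    Reading("car-1", 2.5, 0.0, -1.0, airbag_deployed=True),
]
reports = detect_crashes(stream)
for report in reports:
    print(report["vehicle_id"], report["crash_time"], len(report["crash_dump"]))
```

Keeping a small rolling buffer per vehicle is what makes the ‘crash dump’ cheap in this sketch: the recent history is already in memory when the crash is detected, so nothing has to be re-read from storage.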

Anywhere you can capture and transmit data (via IoT devices) is a potential BigData use case!

That is why most companies who think BigData isn’t for them may simply not be asking the right questions. IoT combined with BigData is opening up opportunities where none existed. Gartner has estimated the value IoT adds across all industries at a staggering $1.9 trillion.

Image: AWS IoT (source: Gartner)

Hadoop or Spark?

The two best-known large-scale data processing engines are Hadoop and Spark. Hadoop is the industry standard and the canonical open-source implementation of MapReduce, the famous programming model for processing large datasets in parallel across a cluster of machines.
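The three MapReduce phases (map, shuffle, reduce) can be sketched on a single machine with the classic word-count example. This is plain Python, not Hadoop's API: in a real Hadoop job the map and reduce functions run in parallel on many nodes, intermediate results are written to disk, and the shuffle moves (key, value) pairs across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit (word, 1) for every word in one document.

    doc_id is unused here; it stands in for the input key in the MapReduce model.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (Hadoop does this across the cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


docs = {
    "page1": "big data is not only for big companies",
    "page2": "small companies can use big data too",
}
counts = map_reduce(docs)
print(counts["big"], counts["companies"])  # 3 2
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over as many machines as the data requires; only the shuffle needs to bring matching keys together.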

Spark’s big claim to fame is in-memory processing, which enables near real-time (streaming) workloads, as compared to MapReduce’s disk-bound, batch processing engine. On Hadoop’s project page, Spark is listed as a Hadoop-related project.

Spark has its own page because, while it can run in Hadoop clusters through YARN (Yet Another Resource Negotiator), it also has a standalone mode. Because it can run both on Hadoop and as a standalone solution, it is tricky to compare the two directly. However, as time goes on, some big data scientists expect Spark to diverge from, and perhaps replace, Hadoop, especially where faster access to processed data is critical.

Given that most IoT devices transmit large volumes of streaming data, I foresee Spark taking the lead over Hadoop.
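To illustrate why streaming suits IoT data, here is a minimal sketch of incremental processing in fixed (tumbling) time windows, which is roughly the idea behind Spark's micro-batch streaming model. It is plain Python and does not use Spark; the record format, device IDs and 5-second window are hypothetical, and it assumes readings arrive in timestamp order.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Hypothetical record format: (device_id, timestamp in seconds, measured value)
SensorReading = Tuple[str, float, float]


def tumbling_window_averages(
    stream: Iterable[SensorReading], window_s: float = 5.0
) -> Iterator[Tuple[float, dict]]:
    """Yield (window_start, {device_id: average_value}) each time a window closes.

    Assumes readings arrive in timestamp order; real streaming engines must
    also cope with late and out-of-order data.
    """
    current_window = None
    sums = defaultdict(float)
    counts = defaultdict(int)
    for device_id, ts, value in stream:
        window = (ts // window_s) * window_s  # start time of this reading's window
        if current_window is not None and window != current_window:
            # A newer window has started, so the previous one is complete: emit it.
            yield current_window, {d: sums[d] / counts[d] for d in sums}
            sums.clear()
            counts.clear()
        current_window = window
        sums[device_id] += value
        counts[device_id] += 1
    if current_window is not None:
        # Flush the last (possibly partial) window when the stream ends.
        yield current_window, {d: sums[d] / counts[d] for d in sums}


readings = [
    ("sensor-a", 0.0, 20.0),
    ("sensor-b", 1.0, 30.0),
    ("sensor-a", 3.0, 22.0),
    ("sensor-a", 6.0, 25.0),
    ("sensor-b", 7.0, 31.0),
]
results = list(tumbling_window_averages(readings))
for start, averages in results:
    print(f"window starting at t={start:.0f}s: {averages}")
```

Each window's result is available as soon as the next window starts, rather than after an entire day's data has been collected, which is the practical advantage streaming has over a nightly batch job for IoT telemetry.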

Who will win the BigData War?

Of course, it is hard to predict; however, these companies have a head start:

  • Google (BigQuery, TensorFlow and ML)
  • Cloudera

Who will win the IoT war?

There are more IoT startups than you can throw a brick at. Some have distinguished themselves:

  • Arduino, Raspberry Pi
  • Autonomo (solar panel and long battery life)
  • Wio Link

Addendum 1 – Data Mining versus Predictive Analysis

Hadoop’s origins lie in text searches (think of searching the HTML content of a gazillion webpages for specific keywords). To this day, a large proportion of ‘data mining’ relies on text mining. However, image searches are gaining in popularity: Facebook can immediately recognize your face in a picture uploaded by a friend, using Machine Learning and Image Recognition.
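The core data structure behind keyword search over many pages is an inverted index: a map from each word to the pages that contain it. At web scale, building such an index is a natural MapReduce job (map emits (word, page) pairs, reduce collects each word's page list). The sketch below is a single-machine illustration with hypothetical page names and a deliberately crude HTML tag stripper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric words, ignoring HTML tags."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping, adequate for a sketch
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, html in pages.items():
        for word in tokenize(html):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages containing every keyword in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {
    "a.html": "<h1>IoT sensors</h1><p>Streaming data from cars</p>",
    "b.html": "<p>Cars and crash data</p>",
    "c.html": "<p>Recipes for potato chips</p>",
}
index = build_index(pages)
print(sorted(search(index, "cars data")))  # ['a.html', 'b.html']
```

Answering a query then touches only the index entries for the query's words, instead of rescanning every page.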

Addendum 2 – BigData – AI, Machine Learning & Deep Learning

Today, in some scenarios, image recognition by machines trained via deep learning is better than human recognition, in tasks ranging from spotting cats to identifying indicators of cancer in blood and tumors in MRI scans. Google’s AlphaGo learned the game and trained for its Go match by playing against itself over and over and over, tuning its neural network along the way.

Addendum 3 – Where do Teradata, Exadata products fit in?

For structured data, if a company had a choice between Massively Parallel Processing (MPP) systems (Teradata, Exadata, etc.) and a BigData / Hadoop environment, the MPPs would most likely win: first, on the maturity of the platforms, and second, on the development team required to maintain a BigData environment (neither trivial nor cheap).

Anuj holds professional certifications in Google Cloud and AWS, as well as in Docker and application performance tools such as New Relic. He specializes in Cloud Security, Data Encryption and Container Technologies.


Anuj Varma – Hands-On Technology Architect, Clean Air Activist.