Getting Started With Microsoft StreamInsight

Introduction

Imagine a technology that lets you respond immediately to variations in energy or water consumption, or one that monitors real-time stock data at up to 100,000 events per second. Imagine the same technology monitoring the floor activities of an entire manufacturing plant and raising events or alerts when something goes wrong. These are just some of the applications of Microsoft StreamInsight, a powerful platform for developing complex event processing (CEP) applications. In this post, I will provide an overview of how StreamInsight can be used to monitor IIS log files for events, something that is useful for high-volume ASP.NET websites hosted on multi-server farms.

Steps in Creating an End-to-End StreamInsight Application

A typical StreamInsight application needs an event source, an event stream, a query that reads the event stream, and an output sink that handles or displays the query results.

  1. Define your input event source (an IIS log file, for example) and write an input adapter for it.
  2. Create an event stream (inputStream) from the event source; the events can carry custom data types.
  3. Define the query to run against inputStream, and start the query.
  4. Specify an output sink and write an output adapter for it.
  5. Bind the query to the output sink; the results should then start appearing in your output file or other sink.
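
The five steps above can be sketched in plain C#. This is not the StreamInsight adapter API: an in-memory sequence stands in for the event source and a delegate stands in for the output sink, and every name here (LogEvent, PipelineSketch, ServerErrors, Bind) is hypothetical. It is only meant to show how source, stream, query, and sink fit together.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event payload; the fields are assumptions for illustration.
public sealed class LogEvent
{
    public string Url { get; set; }
    public int Status { get; set; }
}

public static class PipelineSketch
{
    // Steps 1-2: the "input adapter" exposes the source as a lazy stream of typed events.
    public static IEnumerable<LogEvent> InputStream()
    {
        yield return new LogEvent { Url = "/default.aspx", Status = 200 };
        yield return new LogEvent { Url = "/checkout.aspx", Status = 500 };
        yield return new LogEvent { Url = "/missing.aspx", Status = 404 };
    }

    // Step 3: the query is declared once and evaluated lazily as events flow through it.
    public static IEnumerable<LogEvent> ServerErrors(IEnumerable<LogEvent> input)
    {
        return from e in input where e.Status >= 500 select e;
    }

    // Steps 4-5: binding pulls every query result and hands it to the sink.
    public static void Bind(IEnumerable<LogEvent> query, Action<LogEvent> sink)
    {
        foreach (var e in query) sink(e);
    }

    public static void Main()
    {
        // The sink here is the console; a file writer would be bound the same way.
        Bind(ServerErrors(InputStream()),
             e => Console.WriteLine(e.Status + " " + e.Url));
    }
}
```

In a real StreamInsight application the adapters push events into the engine as they occur, rather than the sink pulling them as in this sketch.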

An Example – Using StreamInsight to Query IIS Log Files

If we need to query the IIS logs (text files), the process is as follows:

The event source is the IIS log file. We write a text file adapter that reads each line of the log file and creates an ‘event’ based on the contents of the line. We can also put custom data types inside the event: for example, if the log file contains a customer’s first name and last name as strings, you could create a ‘Customer’ data type from these strings and embed it in the event.
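
As a rough illustration of the line-to-event step, the sketch below turns lines of a W3C extended log file (the default IIS format) into typed objects. It is plain C#, not a StreamInsight input adapter. The IisRequest and IisLogReader names are hypothetical, and the code assumes the log was configured to include the cs-method, cs-uri-stem, and sc-status fields. Column positions are read from the ‘#Fields:’ directive because the set of logged fields is configurable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical payload type: one request taken from an IIS log line.
public sealed class IisRequest
{
    public string Method { get; set; }
    public string UriStem { get; set; }
    public int Status { get; set; }
}

public static class IisLogReader
{
    // Yields one IisRequest per data line; directive and malformed lines are skipped.
    // Assumption: cs-method, cs-uri-stem and sc-status are among the logged fields.
    public static IEnumerable<IisRequest> Read(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            if (line.StartsWith("#Fields:", StringComparison.Ordinal))
            {
                // Rebuild the column map each time the directive appears.
                index.Clear();
                var names = line.Substring("#Fields:".Length)
                                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                for (int i = 0; i < names.Length; i++) index[names[i]] = i;
                continue;
            }
            if (line.Length == 0 || line[0] == '#' || index.Count == 0) continue;

            var cols = line.Split(' ');
            if (cols.Length != index.Count) continue; // skip malformed rows

            yield return new IisRequest
            {
                Method = cols[index["cs-method"]],
                UriStem = cols[index["cs-uri-stem"]],
                Status = int.Parse(cols[index["sc-status"]])
            };
        }
    }

    public static void Main()
    {
        var sample = new[]
        {
            "#Fields: date time cs-method cs-uri-stem sc-status",
            "2010-08-10 12:00:01 GET /default.aspx 200"
        };
        foreach (var r in Read(sample))
            Console.WriteLine(r.Method + " " + r.UriStem + " " + r.Status);
    }
}
```

A real adapter would also attach the timestamp from the date and time columns to each event, since the engine's temporal operators depend on it.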

All these events are sent as part of an ‘input stream’ which looks something like:

Code Snippet
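  // Creates a stream of point events; MyInputAdapterFactory supplies the adapter that reads the source.
  var inputstream = CepStream<Customer>.Create("inputStream", typeof(MyInputAdapterFactory), new InputAdapterConfig { someFlag = true }, EventShape.Point);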

Now we are ready to query this stream. A simple example is to query the stream for all customers whose first name is ‘Anuj’:

Code Snippet
  // Standing LINQ query: passes through only the events whose FirstName is "Anuj".
  var filtered = from e in inputstream where e.FirstName == "Anuj" select e;

Now that the query is defined, we are ready to ‘start’ it. Starting instantiates a ‘query’ object that stays running for the duration of the application, which effectively binds the query to the input stream.

The only remaining step is getting the results of the query. For this, we need an ‘output’ adapter. The output can be written to a text file, a CSV file, and so on.
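
To make the idea of a started query that "stays running" more concrete, here is a minimal plain C# sketch. It does not use the StreamInsight query or output adapter classes. The StandingQuery type, its Start, Push, and Stop methods, and the Person payload are all hypothetical stand-ins. A background task plays the role of the running query, and a TextWriter plays the role of a CSV output adapter.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical payload, mirroring the Customer example in the text.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical stand-in for a started query bound to an output sink.
public sealed class StandingQuery
{
    private readonly BlockingCollection<Person> _input = new BlockingCollection<Person>();
    private readonly Func<Person, bool> _filter;
    private readonly TextWriter _sink;
    private Task _worker;

    public StandingQuery(Func<Person, bool> filter, TextWriter sink)
    {
        _filter = filter;
        _sink = sink;
    }

    // Starts a background loop that evaluates the filter on every incoming event.
    public void Start()
    {
        _worker = Task.Run(() =>
        {
            foreach (var p in _input.GetConsumingEnumerable())
                if (_filter(p)) _sink.WriteLine(p.FirstName + "," + p.LastName);
        });
    }

    // Called by the event source whenever a new event occurs.
    public void Push(Person p) { _input.Add(p); }

    // Signals end of input, waits for pending events to drain, and flushes the sink.
    public void Stop()
    {
        _input.CompleteAdding();
        if (_worker != null) _worker.Wait();
        _sink.Flush();
    }
}

public static class StandingQueryDemo
{
    public static void Main()
    {
        var query = new StandingQuery(p => p.FirstName == "Anuj", Console.Out);
        query.Start();
        query.Push(new Person { FirstName = "Anuj", LastName = "Varma" });
        query.Push(new Person { FirstName = "Jane", LastName = "Doe" });
        query.Stop();
    }
}
```

The design point is that the query is defined once and then evaluated against each event as it arrives, instead of being re-run against stored data.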

Difference between SQL Server Notification Services (SQLNS) and StreamInsight

SQL Server Notification Services also works with event sources and event targets, but the similarity ends there. In SQLNS, events are stored in the database and matched against subscriptions to produce notifications. The focus in SQLNS is on storing events, matching them, and formatting notifications. Until an event is stored, it is visible only to the event provider, not to the infrastructure.

Event Driven Applications versus Database Applications

                  | Database Applications            | Event-Driven Applications
Query Paradigm    | Ad hoc queries or requests       | Continuous, ‘always-on’ queries
Latency           | Seconds, hours, days             | Milliseconds or less
Data Rate         | Hundreds of events per second    | Tens of thousands of events per second
Query Semantics   | Declarative relational analytics | Declarative relational and temporal analytics

In StreamInsight, the focus is on processing and querying the data in the input stream, in real-time. The stream processing architecture is lightweight, and almost all of the use cases presented in the docs name “the ability to handle up to 100,000 events per second for a large number of devices” as a goal.

Real-Life CEP (StreamInsight) Usage Scenarios

The need for high-throughput, low-latency processing of event streams is common to the following business scenarios:

  • Manufacturing process monitoring and control

  • Clickstream analysis

  • Financial services

  • Power utilities

  • Health care

  • IT monitoring

  • Logistics

  • Telecom

Manufacturing Process Monitoring and Control

Manufacturing companies require low-latency data collection and analysis of plant-floor devices and sensors. The typical manufacturing scenario includes the following requirements:

  • Asset-based monitoring and aggregation of machine-born data.

  • Sensor-based observation of plant floor activities and output.

  • Observation and reaction through device controllers.

  • Ability to handle up to 10,000 data events per second.

  • Event and alert generation the moment something goes wrong.

  • Proactive, condition-based maintenance on key equipment.

  • Low-latency analysis of aggregated data (windowed and log-scales).
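
Windowed aggregation, mentioned in the last requirement above, can be illustrated with a tumbling (fixed-size, non-overlapping) window. The sketch below is plain C# over an in-memory list, not StreamInsight's own window operators. The Reading, WindowAverage, and TumblingWindow names are hypothetical, and the 10-second window size in Main is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sensor reading with an event timestamp.
public sealed class Reading
{
    public DateTime Time { get; set; }
    public double Value { get; set; }
}

// One aggregate result per window.
public sealed class WindowAverage
{
    public DateTime WindowStart { get; set; }
    public double Average { get; set; }
    public int Count { get; set; }
}

public static class TumblingWindow
{
    // Assigns each reading to the window containing its timestamp,
    // then averages the values in each window, ordered by window start.
    public static IEnumerable<WindowAverage> Average(IEnumerable<Reading> readings, TimeSpan size)
    {
        return readings
            .GroupBy(r => r.Time.Ticks / size.Ticks)   // integer division gives the window number
            .OrderBy(g => g.Key)
            .Select(g => new WindowAverage
            {
                WindowStart = new DateTime(g.Key * size.Ticks),
                Average = g.Average(r => r.Value),
                Count = g.Count()
            });
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 8, 10, 12, 0, 0);
        var readings = new[]
        {
            new Reading { Time = t0.AddSeconds(1), Value = 10 },
            new Reading { Time = t0.AddSeconds(9), Value = 20 },
            new Reading { Time = t0.AddSeconds(12), Value = 30 }
        };
        foreach (var w in Average(readings, TimeSpan.FromSeconds(10)))
            Console.WriteLine(w.WindowStart.ToString("HH:mm:ss") + " avg=" + w.Average + " n=" + w.Count);
    }
}
```

This batch version needs all readings up front. A streaming engine instead emits each window's result as soon as time advances past the end of that window, which is what keeps latency low.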

Clickstream Analysis

An optimal customer experience from a commercial Web site requires low-latency processing of user behavior and interactions at the site. The typical click stream analysis application includes the following requirements:

  • Ability to drive page layout, navigation, and presentation based on low-latency click stream analysis.

  • Ability to handle up to 100,000 data events per second during peak traffic times.

  • Immediate click-stream pattern detection and response with targeted advertising.

Algorithmic Trading in a Financial Services Environment

Algorithmic trading, with its high volume data processing needs, typically has the following requirements:

  • Ability to handle up to 100,000 data events per second.

  • Time-critical query processing.

  • Monitoring and capitalizing on current market conditions with very short windows of opportunity.

  • Smart filtering of input data.

  • Ability to define patterns over multiple data sources and over time to automatically trigger buy/sell/hold decisions for assets in a portfolio.

Power Utilities

The utility sector requires an efficient infrastructure for managing electric grids and other utilities. These systems typically have the following requirements:

  • Immediate response to variations in energy or water consumption, to minimize or avoid outages or other disruptions of service.

  • Gaining operational and environmental efficiencies by moving to smart grids.

  • Multiple levels of aggregation along the grid.

  • Ability to handle up to 100,000 events per second from millions of data sources.

StreamInsight Server Architecture

The run-time component of Microsoft StreamInsight is the StreamInsight server. It consists of the core engine and the adapter framework. The adapter framework allows developers to create interfaces to event sources such as Web servers, devices or sensors, and stock tickers or news feeds; and to event sinks such as pagers, monitoring devices, KPI dashboards, trading stations, or databases. Incoming events are continuously streamed into standing queries in the StreamInsight server, which process and transform the data according to the logic defined in each query. The query result at the output can then be used to trigger specific actions.

The following illustration presents a high-level overview of the StreamInsight architecture.

[Figure: high-level overview of the StreamInsight architecture]

Conclusion

Using Microsoft’s StreamInsight platform for complex event processing, one can develop robust event-driven applications with high performance and scalability. This post was meant to serve as an overview of the capabilities of the technology and to provide a real-life example (querying IIS log files across a server farm) using StreamInsight.

References : Raising Events from a PUSH Source

http://blogs.msdn.com/b/masimms/archive/2010/08/10/building-your-first-end-to-end-streaminsight-application.aspx


Specializing in high volume web and cloud application architecture, Anuj Varma’s customer base includes Fortune 100 companies (dell.com, British Petroleum, Schlumberger).

All content on this site is original and owned by AdverSite Web Holdings, Inc. – the parent company of anujvarma.com. No part of it may be reproduced without EXPLICIT consent from the owner of the content.
