Imagine a technology that lets you immediately respond to variations in energy or water consumption – or one that allows monitoring of real-time stock data (up to 100,000 events per second). Imagine the same technology allowing monitoring of an entire manufacturing plant’s floor activities – and raising events/alerts when something goes wrong. These are just some of the applications of Microsoft StreamInsight – a powerful platform for developing complex event processing (CEP) applications. In this post, I will provide an overview of how StreamInsight can be used to monitor IIS log files for events – something that can be used by high-volume ASP.NET websites running across multi-server farms.
Steps in Creating an End-to-End StreamInsight Application
A typical StreamInsight application needs an event source, an event stream, a query that runs over the stream, and an output sink to handle/display the query results.
- Define your input event source (an IIS log file, etc.) – and write an input adapter for it
- Create an event stream (inputStream) from the event source; the events can carry custom data types
- Define the query to run against the inputStream – and start the query
- Specify an output sink – and write an output adapter for it
- Bind the query to the output sink – and you should start seeing the output in your output file, etc.
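The steps above can be sketched end-to-end in plain Python. This is a conceptual illustration only – StreamInsight itself is a .NET platform and its adapters and queries are written in C#/LINQ – and all names here (`log_source`, `query`, `sink`, the event fields) are invented for the sketch:

```python
# Conceptual sketch of source -> stream -> query -> sink.
# Not StreamInsight API; all names are illustrative.

def log_source(lines):
    """'Input adapter': turn each raw log line into an event (a dict)."""
    for line in lines:
        ip, user = line.split()
        yield {"ip": ip, "user": user}

def query(events):
    """'Standing query': keep only events for a given user."""
    return (e for e in events if e["user"] == "anuj")

def sink(events):
    """'Output adapter': collect (or print/write) the query results."""
    return list(events)

raw = ["10.0.0.1 anuj", "10.0.0.2 jane", "10.0.0.3 anuj"]
results = sink(query(log_source(raw)))
```

Because the source is a generator, events flow through the query one at a time – the same push/pull, always-on shape a StreamInsight standing query has.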
An Example – Using StreamInsight to Query IIS Log Files
If we need to query the IIS logs (txt files), then the process is as follows:
The event source would be the IIS log file – and we would write a text file adapter, which reads each line of the log file and creates an ‘event’ from the contents of that line. In addition, we can embed custom data types inside the event (for example, if our log file contains a customer’s first name and last name as strings, we could create a ‘Customer’ data type from those strings – and embed it inside the event).
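The line-to-event step can be sketched as follows. This is illustrative Python, not the StreamInsight adapter API, and the log line format (timestamp, first name, last name, URL) is an assumption for the example:

```python
# Sketch: parse a log line into an event carrying a custom 'Customer' type.
# Field layout is hypothetical; real IIS (W3C) logs have many more fields.
from dataclasses import dataclass

@dataclass
class Customer:
    first_name: str
    last_name: str

@dataclass
class LogEvent:
    timestamp: str
    customer: Customer
    url: str

def parse_line(line):
    ts, first, last, url = line.split()
    return LogEvent(ts, Customer(first, last), url)

event = parse_line("2010-05-01T10:00:00 Anuj Varma /default.aspx")
```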
All of these events are sent as part of an ‘input stream’.
Now we are ready to query this stream. A simple example would be querying the stream for all customers with firstName = ‘Anuj’
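In StreamInsight the standing query is written as a LINQ expression in C#; the equivalent filter, sketched in Python over an iterable of events (field names assumed for illustration), looks like this:

```python
# Sketch of a continuous filter query: yield matching events as they arrive.
# Not StreamInsight/LINQ syntax; a conceptual equivalent only.
def customers_named(events, first_name):
    for e in events:
        if e["firstName"] == first_name:
            yield e

stream = [{"firstName": "Anuj", "lastName": "Varma"},
          {"firstName": "Jane", "lastName": "Doe"}]
matches = list(customers_named(stream, "Anuj"))
```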
- Now that the query is defined, we are ready to ‘start’ the query. This essentially instantiates a ‘query’ object which stays running for the duration of the application. This effectively ‘binds’ the query to the input stream.
- The only remaining step is getting the results of the query. For this, we need an ‘output’ adapter. The output can be written to a text file, a CSV file, etc.
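A minimal output sink that writes query results to CSV can be sketched as below; again this is a conceptual stand-in for a StreamInsight output adapter, and the column names are assumed:

```python
# Sketch of a CSV output sink for query results (column names assumed).
import csv
import io

def write_csv_sink(events, fileobj):
    writer = csv.writer(fileobj)
    writer.writerow(["firstName", "lastName"])  # header row
    for e in events:
        writer.writerow([e["firstName"], e["lastName"]])

buf = io.StringIO()
write_csv_sink([{"firstName": "Anuj", "lastName": "Varma"}], buf)
```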
Difference between SQL Server Notification Services (SQLNS) and StreamInsight
SQL Server Notification Services also works with event sources and event targets, but the similarity ends there. In SQL Notification Services, events are stored in the database and matched against subscriptions to produce notifications. The focus in SQLNS is on storing events, matching, and formatting notifications. Until the event is stored, it’s not visible to the infrastructure, only to the event provider.
Event Driven Applications versus Database Applications
| | Database Applications | Event-Driven Applications |
|---|---|---|
| Query paradigm | Ad-hoc queries or requests | Continuous, ‘always-on’ queries |
| Latency | Seconds, hours, days | Milliseconds or less |
| Data rate | Hundreds of events/second | Tens of thousands of events/second |
| Query semantics | Declarative relational analytics | Declarative relational and temporal analytics |
In StreamInsight, the focus is on processing and querying the data in the input stream, in real-time. The stream processing architecture is lightweight, and almost all of the use cases presented in the docs name “the ability to handle up to 100,000 events per second for a large number of devices” as a goal.
Real-Life CEP (StreamInsight) Usage Scenarios
The need for high-throughput, low-latency processing of event streams is common to the following business scenarios:
Manufacturing Process Monitoring and Control
Manufacturing companies require low-latency data collection and analysis of plant-floor devices and sensors. The typical manufacturing scenario includes the following requirements:
Asset-based monitoring and aggregation of machine-born data.
Sensor-based observation of plant floor activities and output.
Observation and reaction through device controllers.
Ability to handle up to 10,000 data events per second.
Event and alert generation the moment something goes wrong.
Proactive, condition-based maintenance on key equipment.
Low-latency analysis of aggregated data (windowed and log-scales).
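Windowed, low-latency aggregation – the kind of analysis listed above – can be illustrated with a tumbling-window count. This is a conceptual Python sketch, not StreamInsight’s window operators; the window size and the `(timestamp, value)` event shape are assumptions:

```python
# Sketch: count sensor events per fixed (tumbling) time window.
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=5):
    """events: iterable of (timestamp_seconds, value) pairs."""
    counts = defaultdict(int)
    for ts, _value in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(0, 1.0), (1, 2.0), (6, 3.0), (7, 4.0), (12, 5.0)]
counts = tumbling_window_counts(events)
# counts == {0: 2, 5: 2, 10: 1}
```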
Web Analytics (Click-Stream Analysis)
An optimal customer experience on a commercial Web site requires low-latency processing of user behavior and interactions at the site. The typical click-stream analysis application includes the following requirements:
Ability to drive page layout, navigation, and presentation based on low-latency click stream analysis.
Ability to handle up to 100,000 data events per second during peak traffic times.
Immediate click-stream pattern detection and response with targeted advertising.
Algorithmic Trading in a Financial Services Environment
Algorithmic trading, with its high volume data processing needs, typically has the following requirements:
Ability to handle up to 100,000 data events per second.
Time-critical query processing.
Monitoring and capitalizing on current market conditions with very short windows of opportunity.
Smart filtering of input data.
Ability to define patterns over multiple data sources and over time to automatically trigger buy/sell/hold decisions for assets in a portfolio.
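One classic time-based pattern that could trigger a buy decision is a moving-average crossover. The sketch below is an illustration of the idea in Python, not a StreamInsight query; the window lengths and price series are assumptions:

```python
# Sketch: detect when a fast moving average crosses above a slow one.
def moving_average(prices, n):
    """Trailing average of the last n prices at each tick (None until warm)."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - n:i + 1]) / n)
    return out

def crossover_signals(prices, fast=2, slow=3):
    """Return tick indices where the fast average crosses above the slow one."""
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    signals = []
    for i in range(1, len(prices)):
        if None in (fast_ma[i], slow_ma[i], fast_ma[i - 1], slow_ma[i - 1]):
            continue
        if fast_ma[i - 1] <= slow_ma[i - 1] and fast_ma[i] > slow_ma[i]:
            signals.append(i)
    return signals

signals = crossover_signals([3, 2, 1, 2, 3])
# signals == [4]
```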
Power Utilities
The utility sector requires an efficient infrastructure for managing electric grids and other utilities. These systems typically have the following requirements:
Immediate response to variations in energy or water consumption, to minimize or avoid outages or other disruptions of service.
Gaining operational and environmental efficiencies by moving to smart grids.
Multiple levels of aggregation along the grid.
Ability to handle up to 100,000 events per second from millions of data sources.
StreamInsight Server Architecture
The run-time component of Microsoft StreamInsight is the StreamInsight server. It consists of the core engine and the adapter framework. The adapter framework allows developers to create interfaces to event stores such as Web servers, devices or sensors, and stock tickers or news feeds; and to event sinks such as pagers, monitoring devices, KPI dashboards, trading stations, or databases. Incoming events are continuously streamed into standing queries in the StreamInsight server, which process and transform the data according to the logic defined in each query. The query result at the output can then be used to trigger specific actions.
The following illustration presents a high-level overview of the StreamInsight architecture.
Using Microsoft’s StreamInsight platform for complex event processing, one can develop robust, high-performance, scalable event-driven applications. This post was meant to serve as an overview of the technology’s capabilities – and also to provide a real-life example (querying IIS log files across a server farm) using StreamInsight.