While Austinites were enjoying sunny skies this weekend, I had my head stuck in a cloud. More specifically, Microsoft Azure Cloud.

Saturday morning, as I am easing into my weekend, I get a call from a startup in Seattle. They have an important customer demo coming up, and their cloud-based application needs to scale to support 15,000 users. Currently, a 2,000 user load brings the site to a crawl. They have a regular n-Tier app, with a slight twist: on their service layer, they are using Azure (Microsoft’s Service Bus) for various tasks, including session storage. Oh, and they would like to address all performance issues, and scale up to 15,000 users, by the end of this weekend.

I am a sucker for nail-biting finishes, so I took them up on the challenge. For those who want to skip to the executive summary, here it is:

  1. Azure (AppFabric) turned out to be remarkably scalable. This was my first experience in scaling Azure up to support 10K plus users – and it was a positive one.
  2. Bottlenecks in YOUR application code (for example, long-running WCF or ASMX service requests) will end up propagating to AppFabric, and can make it look like Azure is the source of the bottleneck.
  3. If you are going to use Azure, your biggest challenge is going to be ensuring network bandwidth (between your deployed production servers and Azure). If this bandwidth suffers even occasional blips, you are in for a lot of heartache – including lost user sessions, unresponsive pages and even a fully hosed IIS.
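To make point 2 concrete, here is a rough back-of-the-envelope sketch based on Little's Law (requests in progress = arrival rate × average latency). It is not taken from the client's system: the function names, the 256 worker slots and the 0.5 requests per second per user are all made-up numbers, chosen only to show how a slow service call eats capacity no matter how fast the cache behind it is.

```python
import math


def in_flight_requests(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's Law: average number of requests in progress at any moment."""
    return arrival_rate_per_s * avg_latency_s


def users_supported(worker_slots: int,
                    requests_per_user_per_s: float,
                    avg_latency_s: float) -> int:
    """Rough ceiling on concurrent users before every worker slot is busy.

    Each user keeps (request rate * latency) slots occupied on average,
    so the ceiling is the slot count divided by that figure.
    """
    slots_per_user = requests_per_user_per_s * avg_latency_s
    return math.floor(worker_slots / slots_per_user)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    slots = 256
    rate = 0.5  # requests per second issued by one active user

    fast = users_supported(slots, rate, avg_latency_s=0.25)
    slow = users_supported(slots, rate, avg_latency_s=2.0)
    print(f"fast service calls (0.25 s): about {fast} users")
    print(f"slow service calls (2.0 s):  about {slow} users")
```

With these invented figures, stretching the average service call from 0.25 seconds to 2 seconds cuts the ceiling by a factor of eight. Everything that shares those request threads, including calls to the cache, then queues up behind the slow requests, which is why the cloud layer can look guilty when it is not.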

For the ‘more detail’ junkies amongst you (I know who you are), here’s a more technical, step-by-step guide to the troubleshooting that was performed.

Summary

The Microsoft Cloud offering (aka Azure aka AppFabric) is something to reckon with for two reasons:

a) Ease of configurability (I doubt if any cloud offering can make it any easier than tweaking app.config files)

b) Scalability and Performance: while there are a few things Microsoft needs to address (such as better handling of the AppFabric cache’s ‘session locking’), overall, if your application doesn’t have any slow, unresponsive pieces, then AppFabric itself will enable you to scale in hitherto impossible ways.
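For readers who have not run into ‘session locking’ before, here is a toy model of the general idea: when a session store hands out an exclusive lock per session, concurrent requests that belong to the same session run one after another. This is only a sketch with Python threads; the `SessionStore` class and `run_requests` helper are hypothetical names, and none of it reflects how the AppFabric cache is actually implemented.

```python
import threading
import time
from collections import defaultdict


class SessionStore:
    """Toy store that hands out one exclusive lock per session id."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dictionary of locks

    def lock_for(self, session_id):
        with self._guard:
            return self._locks[session_id]


def run_requests(session_ids, work_s=0.05):
    """Run one simulated request per entry in session_ids, all at once.

    Returns (peak, elapsed): peak maps each session id to the highest
    number of its requests that were ever running at the same moment,
    and elapsed is the wall-clock time for the whole batch in seconds.
    """
    store = SessionStore()
    active = defaultdict(int)
    peak = defaultdict(int)
    stats_guard = threading.Lock()

    def handle(session_id):
        # The request holds the session lock for its whole duration.
        with store.lock_for(session_id):
            with stats_guard:
                active[session_id] += 1
                peak[session_id] = max(peak[session_id], active[session_id])
            time.sleep(work_s)  # stand-in for the real page or service work
            with stats_guard:
                active[session_id] -= 1

    threads = [threading.Thread(target=handle, args=(sid,)) for sid in session_ids]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(peak), time.perf_counter() - start


if __name__ == "__main__":
    peak, elapsed = run_requests(["user-1"] * 4)
    print("same session:", peak, f"{elapsed:.2f}s")
    peak, elapsed = run_requests(["user-1", "user-2", "user-3", "user-4"])
    print("different sessions:", peak, f"{elapsed:.2f}s")
```

In this model, four requests on one session take roughly four times as long as four requests spread across four sessions, even though the store itself does no slow work. A single slow request can therefore stall every other request on that user's session.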

Update

I am happy to report that, while they are still running more load tests, the application seems to be holding up well under a 10,000 user load. This is a fivefold increase from the original 2,000 that was bringing the website to a crawl.

Anuj holds professional certifications in Google Cloud, AWS as well as certifications in Docker and App Performance Tools such as New Relic. He specializes in Cloud Security, Data Encryption and Container Technologies.

Anuj Varma – who has written posts on Anuj Varma, Hands-On Technology Architect, Clean Air Activist.