Thoughts on Co-WIN website

Before we begin: this post is based on my limited research and observations, so it might not be entirely accurate.

The Co-WIN portal started accepting registrations from people aged 18 and above on 28th April at 4 PM. Within just a few hours, the portal received over 13 million registrations.

Normally, government applications deployed on government-managed servers time out and become unavailable when they receive a few thousand concurrent requests (remember the NEET results).

Doing some research

Inspecting the source in the browser's developer console, I found what looks like a front-end web application (possibly React) that consumes a lot of APIs. This suggests the backend is built on a microservice architecture.

Running nslookup gives the following results:

We can see a bunch of CNAME records. The elb.amazonaws.com domain looks like the endpoint of an Elastic Load Balancer (ELB) in AWS, and the K8s in the record names hints at a Kubernetes backend. This suggests that the front-end web application is deployed on Kubernetes clusters behind an HTTP(S) load balancer.
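The reasoning above is essentially pattern matching on the hostnames in the CNAME chain. Here is a minimal sketch of that heuristic, assuming the chain has already been collected (for example from nslookup output). The hostnames in the usage example are made up for illustration and are not Co-WIN's real records; the `infer_hosting` helper is hypothetical, and a matching name only hints at a service, it does not prove how the backend is built.

```python
def infer_hosting(cname_chain: list[str]) -> set[str]:
    """Return rough hosting hints based on hostname patterns in a CNAME chain.

    Heuristic only: naming conventions suggest a service, they don't prove it.
    """
    hints: set[str] = set()
    for name in cname_chain:
        host = name.lower().rstrip(".")  # normalise case and trailing root dot
        if host.endswith(".elb.amazonaws.com"):
            hints.add("AWS Elastic Load Balancer")
        if "k8s" in host:
            # Load balancers created from Kubernetes often carry a "k8s" prefix.
            hints.add("Kubernetes (by naming convention)")
        if host.endswith(".cloudfront.net"):
            hints.add("Amazon CloudFront")
    return hints


# Hypothetical chain, invented for this example.
chain = [
    "www.example.gov.in",
    "k8s-example-ingress-0123456789.ap-south-1.elb.amazonaws.com",
]
print(sorted(infer_hosting(chain)))
# ['AWS Elastic Load Balancer', 'Kubernetes (by naming convention)']
```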

These are the asynchronous cross-origin requests:

Each registration and appointment booking sends around 30 API requests.
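To get a feel for what that means for the backend, here is a back-of-the-envelope calculation combining the two numbers from this post: over 13 million registrations and roughly 30 API requests each. The length of the window is an assumption: "a few hours" is taken as 3 hours, and treating every registration as one full flow of ~30 requests is a simplification.

```python
REGISTRATIONS = 13_000_000       # from the post: over 13 million registrations
REQUESTS_PER_REGISTRATION = 30   # from the post: around 30 API requests each
WINDOW_HOURS = 3                 # ASSUMPTION: "a few hours" taken as 3 hours


def average_requests_per_second(registrations: int, per_registration: int,
                                hours: float) -> float:
    """Average API request rate if the load were spread evenly over the window."""
    total_requests = registrations * per_registration
    return total_requests / (hours * 3600)


total_requests = REGISTRATIONS * REQUESTS_PER_REGISTRATION
rate = average_requests_per_second(REGISTRATIONS, REQUESTS_PER_REGISTRATION,
                                   WINDOW_HOURS)
print(f"total API requests: {total_requests:,}")    # 390,000,000
print(f"average rate: {rate:,.0f} requests/second")  # 36,111
```

This is only an average. Real traffic is bursty (most people try right at 4 PM), so the peak rate would likely be much higher.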

Running nslookup on the API domain:

CloudFront is AWS's edge caching service (CDN), so it is likely being used to temporarily cache responses from the API backends, including dynamic data.
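Why does edge caching help so much? If thousands of people request the same data (say, the list of states) within a short time, the edge can answer most of them without touching the backend. Below is a toy model of a time-to-live (TTL) cache showing that effect. It is not how CloudFront is implemented (CloudFront's behaviour is controlled by cache policies and headers such as Cache-Control), the 30-second TTL is an assumption, and the `/states` path is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy model of an edge cache: reuse a stored response until it is
    older than `ttl` seconds, then fetch a fresh copy from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (fetched_at, body)
        self.origin_hits = 0  # how many requests actually reached the backend

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: the origin is not contacted
        # Miss or expired entry: go to the origin and remember the result.
        self.origin_hits += 1
        body = self.fetch_from_origin(path)
        self.store[path] = (now, body)
        return body


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
cache = TTLCache(lambda path: f"response for {path}", ttl=30,
                 clock=lambda: fake_now[0])

for _ in range(1000):      # 1,000 clients ask for the same data
    cache.get("/states")   # hypothetical API path
print(cache.origin_hits)   # 1

fake_now[0] = 31.0         # the TTL has elapsed
cache.get("/states")
print(cache.origin_hits)   # 2
```

A thousand requests cost the backend a single call until the entry expires. The trade-off is staleness: a longer TTL protects the backend more but serves older data.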

How did it handle millions of concurrent requests without crashing?

From the above details, we can see that the portal and all of its APIs are deployed on the cloud (AWS), and that optimization techniques such as edge caching are used.

Most likely, it did not crash thanks to one of the most powerful features of the cloud: Auto Scaling.

To explain Auto Scaling in simple terms, let's assume a single server instance can handle 1,000 concurrent requests (usually connections). On a normal day, you would have about 10,000 concurrent requests, which would require 10 instances. When traffic surges (like when registrations open) and you end up with ~1 million concurrent requests, you would need 1,000 server instances to handle them.

Auto Scaling is the cloud's ability to automatically add instances when traffic increases (for the few hours after registrations open) and remove them when traffic decreases. This keeps the service available during surges and cuts costs once traffic drops.
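The arithmetic behind that example can be written as a tiny sizing function. This is a simplified sketch, assuming the post's figure of 1,000 concurrent requests per instance; the minimum and maximum group sizes are hypothetical. Real AWS scaling policies (for example target tracking on a metric such as CPU or request count) work in roughly this spirit but also involve health checks, warm-up time and cooldowns, which this ignores.

```python
import math


def instances_needed(concurrent_requests: int,
                     capacity_per_instance: int = 1000,
                     min_instances: int = 1,
                     max_instances: int = 2000) -> int:
    """Smallest instance count that keeps each instance at or below its
    assumed capacity, clamped to the group's (hypothetical) min/max size."""
    needed = math.ceil(concurrent_requests / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Normal day -> registrations open -> traffic dies down again.
for load in (10_000, 1_000_000, 10_000):
    print(f"{load:>9,} concurrent requests -> {instances_needed(load)} instances")
```

With these numbers the sketch scales from 10 instances to 1,000 and back to 10. The `max_instances` cap matters in practice: it bounds cost, but if it is set too low the service can still be overwhelmed during a surge.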

If you want to know more about the APIs used here, you can check my post on Co-WIN open APIs.