Scaling Websockets

Someone recently asked me about scaling web applications that use websockets, and I realized I wasn’t sure how that could be architected. I recognized the challenges, and the word “Redis” popped into my brain, but I didn’t have a solution at hand. So this post explores why websockets need a bit of extra consideration before attempting to scale horizontally (adding more application instances).

Often, to scale a web application across multiple instances, it is sufficient to deploy another application instance, configure a load balancer to point to each of the instances, and away you go. (I’ve done this a bunch of times.) But, because websocket connections are more “persistent” than “normal” HTTP sessions, a few adjustments are needed on both the load balancer (or proxy) and the application itself.

Environment

For this example I am using the following architecture:

A Docker container running nginx, configured as a load balancer.
Two Docker containers, each running an instance of a Python web application built using Flask and Flask-SocketIO.
A Docker container running Redis.
Two Docker containers, each running a Python script which acts like a “client” and connects to the application via the proxy.
This is all spun up and managed inside of Docker, via docker-compose

Proxy Configuration

Load Balancing nginx

A simple example of load balancing with nginx can be configured as such:

upstream websockets-webapp {
    server webapp-1;
    server webapp-2;
}

In this case, HTTP requests are distributed round-robin across the configured servers. But, per the nginx docs:

Please note that with round-robin or least-connected load balancing, each subsequent client’s request can be potentially distributed to a different server. There is no guarantee that the same client will be always directed to the same server.

This is fine for many applications, but won’t work for websockets. A Socket.IO session is established over several HTTP requests (the handshake, plus long-polling before the upgrade to a websocket), and each backend only knows about the sessions it created. A client request may therefore land on a server that has no established session for it and be rejected as invalid. So, we need to make the sessions persistent, or “sticky”, using the ip_hash mechanism in nginx.

With ip-hash, the client’s IP address is used as a hashing key to determine what server in a server group should be selected for the client’s requests. This method ensures that the requests from the same client will always be directed to the same server except when this server is unavailable.

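An alternative is to use the hash $remote_addr mechanism. This post describes the merits of either. I am using hash $remote_addr to better spread the “clients” across the backends for testing and demonstration purposes. ip_hash keys on only the first three octets of an IPv4 address, so clients on the same Docker network can all land on one backend.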

So, now the proxy is load balancing NEW clients across all instances.
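
To make the difference concrete, here is a minimal Python sketch of the two selection strategies. It is a toy model, not nginx’s actual hashing algorithm: the function names, the SHA-256 hash, and the 172.18.0.x addresses are all assumptions made for illustration.

```python
import hashlib
from itertools import cycle

BACKENDS = ["webapp-1", "webapp-2"]


def round_robin(backends):
    """Return a picker that ignores the client and rotates through backends."""
    rotation = cycle(backends)
    return lambda client_ip: next(rotation)


def sticky(backends, key=lambda ip: ip):
    """Return a picker that maps the same key to the same backend every time."""
    def pick(client_ip):
        # Illustrative hash only; nginx uses its own hashing internally.
        digest = hashlib.sha256(key(client_ip).encode()).digest()
        return backends[int.from_bytes(digest[:8], "big") % len(backends)]
    return pick


def first_three_octets(ip):
    """Key on the first three octets, as ip_hash does for IPv4 addresses."""
    return ".".join(ip.split(".")[:3])


rr = round_robin(BACKENDS)
by_addr = sticky(BACKENDS)                            # like hash $remote_addr
by_prefix = sticky(BACKENDS, key=first_three_octets)  # like ip_hash

# Four requests from one (hypothetical) client address.
rr_picks = [rr("172.18.0.5") for _ in range(4)]
addr_picks = [by_addr("172.18.0.5") for _ in range(4)]
```

With round-robin, the four requests alternate between the two backends, while the sticky picker returns the same backend every time. Keying on the first three octets also sends 172.18.0.5 and 172.18.0.6 to the same backend, which is why hashing the full address spreads same-subnet test clients more evenly.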

Processing Websockets with nginx

In order for nginx to process websockets, we need to update the configuration with the following lines:

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;

proxy_http_version 1.1;: Ensure HTTP/1.1 is used for the proxy connection (vs the default 1.0)
proxy_set_header Upgrade $http_upgrade;: Forward the client’s Upgrade header value (websocket, for a websocket handshake) to the backend. Per the nginx documentation:

.. “Upgrade” is a hop-by-hop header, it is not passed from a client to proxied server

proxy_set_header Connection "Upgrade";: Per the documentation:

hop-by-hop headers including “Upgrade” and “Connection” are not passed from a client to proxied server, therefore in order for the proxied server to know about the client’s intention to switch a protocol to WebSocket, these headers have to be passed explicitly.

proxy_set_header Host $host;: Pass the client’s original Host header through to the backend. By default nginx would send the name of the proxied server (here, the upstream group name) instead.

If you check out the demo code, you can view the /headers endpoint to see the headers coming across for the requests.
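
As a rough illustration of why those headers matter, the following Python sketch models what a backend might check before agreeing to switch protocols. It is a simplified model with hypothetical names (is_websocket_upgrade, client_request), not code from the demo repository and not how nginx or Flask-SocketIO implement the check.

```python
def is_websocket_upgrade(headers):
    """Return True if the headers ask to switch the connection to WebSocket."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    upgrade = normalized.get("upgrade", "").strip().lower()
    # Connection may hold a comma-separated list, e.g. "keep-alive, Upgrade".
    connection_tokens = [
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
    ]
    return upgrade == "websocket" and "upgrade" in connection_tokens


# What the client sends to the proxy (hypothetical values).
client_request = {
    "Host": "localhost",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
}

# Hop-by-hop headers are not passed on to the proxied server by default.
HOP_BY_HOP = {"upgrade", "connection"}
without_proxy_headers = {
    name: value
    for name, value in client_request.items()
    if name.lower() not in HOP_BY_HOP
}

# The two proxy_set_header lines put them back explicitly.
with_proxy_headers = {
    **without_proxy_headers,
    "Upgrade": client_request["Upgrade"],
    "Connection": "Upgrade",
}
```

In this model, is_websocket_upgrade(without_proxy_headers) is False and is_websocket_upgrade(with_proxy_headers) is True, which mirrors the documentation’s point that the backend cannot see the client’s intent unless the proxy passes these headers explicitly.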

Application Configuration

Message Queue

In order to cross-communicate in a pub/sub construct between application instances, we can use an in-memory key-value store like Redis, or a message streaming tool like Kafka. This is outlined in the Flask-SocketIO docs. In this example, we can use the Redis container to act as a message queue. This doc from socket.io has a great diagram visualizing how the external queue is used to communicate between instances, and how it enables additional external services to communicate with the websockets.
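
The pattern can be sketched in plain Python without any dependencies. In the toy model below, InMemoryBroker is a stand-in for Redis pub/sub and AppInstance for one web application container; both classes and the client ids are hypothetical. In a real Flask-SocketIO application, the equivalent wiring is done by passing a message_queue URL when constructing the SocketIO object.

```python
class InMemoryBroker:
    """Stand-in for Redis pub/sub: every subscriber sees every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in list(self._subscribers):
            callback(message)


class AppInstance:
    """One application instance that holds only its own client connections."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.inboxes = {}  # client id -> messages delivered to that client
        broker.subscribe(self._deliver_locally)

    def connect(self, client_id):
        self.inboxes[client_id] = []

    def emit(self, data):
        # Publish to the shared broker instead of delivering directly, so
        # clients connected to other instances receive the message too.
        self.broker.publish({"from": self.name, "data": data})

    def _deliver_locally(self, message):
        for inbox in self.inboxes.values():
            inbox.append(message["data"])


broker = InMemoryBroker()
webapp_1 = AppInstance("webapp-1", broker)
webapp_2 = AppInstance("webapp-2", broker)

webapp_1.connect("client-a")
webapp_2.connect("client-b")

# The message is emitted on webapp-1 but also reaches client-b on webapp-2.
webapp_1.emit("hello")
```

The key design point is that no instance ever talks to another instance directly. Each one only publishes to, and listens on, the shared queue, so adding a third instance requires no changes to the first two.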

Failover

A bit out of scope for this post, but worth mentioning, is application instance failure. When an instance fails, its connections drop, and the load balancer routes the reconnecting clients to another backend instance. But this backend instance will not have the OTHER instance’s session information, such as room joins. So, any client joins that occurred previously are now gone. I haven’t quite cracked the nut on this yet, but with the pub/sub capabilities of Redis, I am cautiously optimistic that I can persist join information in such a way that re-connections automatically rejoin the rooms. OPPORTUNITY FOR FUTURE POST!
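
One possible shape for that idea, sketched in plain Python, is below. It is an untested assumption, not a solution from the demo code: RoomStore stands in for a shared store (for example, Redis sets rather than pub/sub), and it assumes each client presents a stable, hypothetical client id on reconnect, since a per-connection session id would change.

```python
class RoomStore:
    """Stand-in for a shared store, keyed by a stable client id."""

    def __init__(self):
        self._rooms = {}

    def add(self, client_id, room):
        self._rooms.setdefault(client_id, set()).add(room)

    def rooms_for(self, client_id):
        # Return a copy so callers cannot mutate the shared state by accident.
        return set(self._rooms.get(client_id, ()))


class Instance:
    """One application instance; local_rooms is lost if the instance dies."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.local_rooms = {}  # client id -> rooms joined on this instance

    def join(self, client_id, room):
        self.local_rooms.setdefault(client_id, set()).add(room)
        # Also record the join outside the instance so it survives a failure.
        self.store.add(client_id, room)

    def on_connect(self, client_id):
        # On (re)connect, rejoin whatever rooms the shared store remembers.
        self.local_rooms[client_id] = self.store.rooms_for(client_id)


store = RoomStore()
webapp_1 = Instance("webapp-1", store)
webapp_2 = Instance("webapp-2", store)

webapp_1.join("client-a", "lobby")

# webapp-1 fails; client-a reconnects and is routed to webapp-2.
webapp_2.on_connect("client-a")
```

After the reconnect, webapp_2.local_rooms["client-a"] contains "lobby" even though the join originally happened on webapp-1. A real implementation would also need to handle leaves, disconnect cleanup, and expiry of stale entries.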

Conclusion

Websockets are awesome, but more complex than old-school HTTP or even AJAX-type polling. The complexity adds a bit of fragility that needs some extra attention and consideration. Luckily, websockets have been around long enough now that packages and patterns exist for the popular languages and platforms. With a bit of investigation and thoughtfulness, it appears to be fairly straightforward to architect scalable solutions to support applications with websockets.

Code

Check out https://github.com/briangreunke/load-balanced-websockets for a working code example of the details I discussed previously!