Real-time Web with SignalR

For the last couple of years, one of my teams has been building a complete suite of translation technology tools (e.g. file conversion, translation memory, terminology databases) and business process management tools (workflow, invoicing, payment).

To keep the Web-based UI responsive to server events rather than waiting for a polling timer to fire, we use a library called SignalR. It abstracts a number of real-time Web technologies, including WebSockets, Server-Sent Events and long polling, and picks between them depending on server and client support. These techniques are commonly grouped together under the name “Comet”.

When a Web browser or .NET client connects to a SignalR endpoint, the server can “push” events to that client as they happen, rather than waiting for the client to ask for them.

For my teams, normal use of this technology involves converting messages from our messaging infrastructure into SignalR messages for delivery to the relevant connected clients. We use SignalR for all sorts of things that we would have used polling for in the past: percentage completion of background processing tasks, task completion statuses, new chat messages, real-time previews and so on.
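
As a rough illustration of that bridging step, the sketch below forwards a progress message to the clients that are interested in one task. The hub name ProgressHub, the TaskProgressMessage type, the group naming scheme and the client-side taskProgress method are all hypothetical and are not taken from our codebase; only GetHubContext, Groups.Add and Clients.Group come from the SignalR 2 server API.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

// Hypothetical shape of a message arriving from the messaging infrastructure.
public class TaskProgressMessage
{
    public Guid TaskId { get; set; }
    public int PercentComplete { get; set; }
}

// Hypothetical naming scheme: one SignalR group per background task.
public static class TaskGroups
{
    public static string For(Guid taskId)
    {
        return "task-" + taskId.ToString("N");
    }
}

// Hypothetical hub: a client calls Watch to receive updates for one task.
public class ProgressHub : Hub
{
    public Task Watch(Guid taskId)
    {
        return Groups.Add(Context.ConnectionId, TaskGroups.For(taskId));
    }
}

// Called by whatever consumes the message queue (not shown here).
public class ProgressForwarder
{
    public void Forward(TaskProgressMessage message)
    {
        // Code running outside a hub reaches connected clients via the hub context.
        IHubContext hub = GlobalHost.ConnectionManager.GetHubContext<ProgressHub>();

        // taskProgress is a dynamic call; it is resolved by name on each client.
        hub.Clients.Group(TaskGroups.For(message.TaskId))
           .taskProgress(message.TaskId, message.PercentComplete);
    }
}
```

Grouping by task is only one possible design; it means a message is delivered only to clients that asked for it, rather than being broadcast to every connection.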

SignalR with Load Balancing

If you have load-balanced Web servers as per the diagram below, a client can connect to any one of them, so every server needs to be kept up to date with the SignalR messages received by the other Web servers in the cluster. To do this, SignalR uses a “backplane”, a system that relays each message to the other servers. There are a number of different choices, each with strengths and weaknesses.

SQL Server

Requires Service Broker to be running on the database system.
More complexity in database infrastructure.
Rejected - Web servers would need to be able to connect to the database cluster, which isn’t possible in our security model.

RabbitMQ

Requires the use of a custom RabbitMQ exchange type to increment message ids, so setup is more complicated.
Code at https://github.com/mdevilliers/SignalR.RabbitMq
I contacted the author with a number of questions about why it was worth taking on the complexity of writing a custom RabbitMQ exchange in Erlang when another RabbitMQ backplane didn’t use one, and I’m now convinced that it’s the correct decision. Implementations of RabbitMQ backplanes which don’t use this design are likely to drop messages.
Rejected - Deployment and support are more complex due to the RabbitMQ exchange installation requirement.

Redis

A single instance of Redis becomes a single point of failure, so a cluster may need to be set up using Sentinel, depending on how critical real-time updates are.
There is some basic resilience built in, since SignalR will attempt to reconnect to Redis on failure; see the source code at https://github.com/SignalR/SignalR/blob/master/src/Microsoft.AspNet.SignalR.Redis/RedisMessageBus.cs
Normally runs on Linux, but can also run on Windows, since Microsoft supports it through its Open Source bridge: http://msopentech.com/opentech-projects/redis/
Chosen - Developed by the SignalR team, easy to deploy, high performance.
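
For reference, wiring up the Redis backplane is a small amount of code. The following is a minimal sketch based on the UseRedis extension method from the Microsoft.AspNet.SignalR.Redis NuGet package; the server name, port, password and application name are placeholders rather than our real configuration.

```csharp
using Microsoft.AspNet.SignalR;
using Microsoft.Owin;
using Owin;

[assembly: OwinStartup(typeof(Example.Startup))]

namespace Example
{
    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            // Placeholder values: substitute your own server, port and password.
            // The last argument names the Redis pub/sub channel, so every Web
            // server belonging to the same application must use the same value.
            GlobalHost.DependencyResolver.UseRedis(
                "redis.example.internal", 6379, "redis-password", "ExampleApp");

            // Configure the backplane first, then map the hubs.
            app.MapSignalR();
        }
    }
}
```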

Troubleshooting

Detecting Failure

SignalR negotiates a method of communication depending on the availability of server and client support. If the browser fails to negotiate an appropriate connection, a failure occurs which is visible to the end-user as a lack of expected UI updates.

It’s visible to the Web application developer / tester as a failed HTTP request in the Browser Developer Tools. Check out the failed call to the “/signalr” endpoint. (In this scenario, it’s usually that the SignalR endpoint failed to initialise.)

The SignalR API has connection lifecycle events, so depending on the application, it may be worth checking that a failure to connect to SignalR produces some user-visible result, e.g. notification windows being grayed out.
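
As a sketch of what hooking those events can look like, the helper below uses the .NET client's StateChanged, ConnectionSlow and Error events. The ConnectionWatcher class and the onAvailabilityChanged callback are hypothetical names chosen for this example; the JavaScript client exposes comparable callbacks on $.connection.hub (disconnected, reconnecting, reconnected, stateChanged).

```csharp
using System;
using Microsoft.AspNet.SignalR.Client;

public static class ConnectionWatcher
{
    // onAvailabilityChanged is a hypothetical callback: it receives true while
    // live updates are flowing and false when the UI should show them as stale.
    public static void Watch(Connection connection, Action<bool> onAvailabilityChanged)
    {
        // Fires on every transition between Connecting, Connected,
        // Reconnecting and Disconnected, including a failed start.
        connection.StateChanged += change =>
            onAvailabilityChanged(change.NewState == ConnectionState.Connected);

        // Early warning: raised when keep-alives stop arriving on time.
        connection.ConnectionSlow += () =>
            Console.Error.WriteLine("SignalR connection is slow.");

        // Log the underlying exception for diagnosis.
        connection.Error += ex =>
            Console.Error.WriteLine("SignalR error: " + ex.Message);
    }
}
```

Call Watch on a HubConnection before calling Start, so that a failure during negotiation is reported through the same callback as a later disconnection.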

Monitoring

For monitoring availability of SignalR endpoints, I use a .NET client test harness which looks a bit like this:

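// Requires the Microsoft.AspNet.SignalR.Client NuGet package
// (namespace Microsoft.AspNet.SignalR.Client).
async Task Main()
{
    using (var connection = new HubConnection("https://xxxxxxxxxxxxx/signalr"))
    {
        var proxy = connection.CreateHubProxy("tracehub");

        // Client-side methods that the server is expected to call.
        string[] functions = new string[] {
            "ping",
        };

        // Register handlers before starting so that no callbacks are missed.
        foreach (var function in functions)
        {
            proxy.On(function, () => Console.WriteLine(function));
        }

        await connection.Start();

        // Invoke the hub's requestPing method; the hub is expected to call "ping" back.
        await proxy.Invoke("requestPing", new Guid("xxxxxxxxxxxx-xxxxxxxx-xxx"));

        // Keep the connection open so that callbacks can arrive.
        while (true)
        {
            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}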

Troubleshooting Tips

Enable logging.

The overview at [0] tells you everything you need to know. For Web applications, follow the steps in the section “Logging server events to text files”.

http://www.asp.net/signalr/overview/testing-and-debugging/enabling-signalr-tracing [0]

Do check the initialization logs after an app restart. SignalR uses the OWIN lifecycle, not the ASP.NET lifecycle, so it’s possible that SignalR never started up because of missing libraries, incorrectly configured dependency injection and so on.

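Enable tracing in the JavaScript client as well, by setting $.connection.hub.logging = true before starting the connection. The trace output appears in the browser console.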

Minimise Complexity.

Create a simple test harness to use, like the SignalR chat sample application, to check whether your SignalR code is at fault, or if it’s the surrounding infrastructure.

Disable any nodes you can, e.g. switch off all but one of the Web servers.

Rule out load balancers by bypassing them: point the hostname directly at a single Web server in your hosts file.

Test your Redis backplane.

If you’re using Redis, ping Redis using the command line tools.

Subscribe to the hub using the command line, as described at the end of this post:

http://www.asp.net/signalr/overview/performance/scaleout-with-redis

Test scaleout against a new single-node Redis server in the same environment, rather than a Redis cluster.
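
If the command line tools aren’t available on the machine you’re testing from, a similar check can be made from .NET. The console program below is a minimal sketch that assumes the StackExchange.Redis NuGet package, which is my choice for the example and not necessarily the client your SignalR version uses internally. The default address and the channel name are placeholders. It pings the server and then checks that a published message comes back on a subscription.

```csharp
using System;
using System.Threading;
using StackExchange.Redis;

public static class RedisCheck
{
    public static void Main(string[] args)
    {
        // Placeholder address; pass the real host:port as the first argument.
        string address = args.Length > 0 ? args[0] : "localhost:6379";

        using (ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(address))
        {
            // Equivalent of "redis-cli ping".
            TimeSpan latency = redis.GetDatabase().Ping();
            Console.WriteLine("PING round trip: " + latency.TotalMilliseconds + " ms");

            // Pub/sub round trip on a throwaway channel, so that real
            // backplane traffic is not disturbed.
            var channel = new RedisChannel(
                "backplane-check-" + Guid.NewGuid().ToString("N"),
                RedisChannel.PatternMode.Literal);

            using (var received = new ManualResetEventSlim())
            {
                ISubscriber subscriber = redis.GetSubscriber();
                subscriber.Subscribe(channel, (ch, value) => received.Set());
                subscriber.Publish(channel, "hello");

                bool ok = received.Wait(TimeSpan.FromSeconds(5));
                Console.WriteLine(ok ? "Pub/sub OK" : "Pub/sub message not received");

                subscriber.Unsubscribe(channel);
            }
        }
    }
}
```

Running it from each Web server in turn helps to separate network or firewall problems from problems in the SignalR configuration.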

Performance

Backplane Performance

The Redis benchmarks don’t test the pub/sub functions of Redis, so I created a test harness using the Redis Message Bus code to guide implementation:

https://github.com/SignalR/SignalR/blob/master/src/Microsoft.AspNet.SignalR.Redis/RedisMessageBus.cs

You can download the test harness at:

http://share.linqpad.net/jvt9xh.linq

HTTP Performance

When we implemented SignalR, our support team started to receive complaints from some users about slow system performance which we couldn’t reproduce or see any evidence of in our monitoring. This turned out to be users who had a large number of browser tabs open at once.

Web browsers limit the number of concurrent HTTP connections to a single domain, and each tab was holding a connection open to the SignalR endpoint. Each additional tab was therefore a bit slower than the last until, eventually, the browser refused to make any more connections. The best workaround is to put SignalR traffic onto a subdomain of its own. It’s also worth hooking into the connection lifecycle events in the SignalR JavaScript API to detect connection failures, and considering a warning to the user that updates will not be received.
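
Serving SignalR from its own subdomain makes it a cross-domain connection, so the server has to allow it. The sketch below follows the cross-domain pattern from the SignalR 2 documentation and assumes the Microsoft.Owin.Cors NuGet package. AllowAll is used only to keep the example short; a real deployment would restrict the allowed origins to the main site.

```csharp
using Microsoft.AspNet.SignalR;
using Microsoft.Owin.Cors;
using Owin;

namespace Example
{
    // In a real application this would be merged into the existing OWIN Startup class.
    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            // Branch the pipeline so that CORS applies to the SignalR path only.
            app.Map("/signalr", map =>
            {
                // Must run before SignalR. AllowAll is for illustration only;
                // restrict the origins in production.
                map.UseCors(CorsOptions.AllowAll);

                // RunSignalR rather than MapSignalR, because the path is
                // already set by the Map call above.
                map.RunSignalR(new HubConfiguration());
            });
        }
    }
}
```

On the browser side, the JavaScript client is pointed at the subdomain by setting $.connection.hub.url (for example to https://signalr.example.com/signalr, a placeholder hostname) before the connection is started.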