WCF - Client Proxy Creation Performance with Ninject

This week, I’ve been working on improving the performance of a Web application as we scale it out to more users.

I’ve always used ANTS Profiler for this kind of work, but I decided to try out JetBrains’ dotTrace, because it has better support for attaching to remote systems (and it’s cheaper).

Aside from the usual things like caching, simplifying database queries and using multi-threading wherever possible, I spotted a large volume of CPU cycles attributed to the DI framework (Ninject), which turned out to be spent on creating WCF clients.

At thebigword, we publish internal libraries up to an internal (private) NuGet server. As part of service development, we also publish the service interface as part of each package, so that other developers in the team can find, use and update them easily.

When an MVC Web application consumes an internal WCF service, we normally use constructor injection in Ninject [0] to inject WCF clients into Manager classes, which are then, in turn, injected into MVC controllers. This allows us to test interactions between application layers using mocks (we mostly use NSubstitute for this) and makes for a simpler build process than using SvcUtil.exe to generate a class which inherits from ClientBase<T> to create a channel.

https://github.com/ninject/Ninject/wiki/Injection-Patterns [0]
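
As a rough illustration of that layering, here is a minimal sketch with hypothetical names (ITranslationService, TranslationManager and FakeTranslationService are invented for this example, not taken from our codebase). The test double is hand-written so the snippet has no dependencies; in practice NSubstitute would generate the substitute and Ninject would supply the real WCF client.

```csharp
using System;

// Hypothetical service contract, of the kind published alongside a service in a NuGet package.
public interface ITranslationService
{
    string Translate(string text, string targetLanguage);
}

// Manager class: receives the client through its constructor, so it never
// knows whether it is talking to a real WCF channel or a test double.
public class TranslationManager
{
    private readonly ITranslationService service;

    public TranslationManager(ITranslationService service)
    {
        if (service == null)
        {
            throw new ArgumentNullException("service");
        }
        this.service = service;
    }

    public string TranslateToFrench(string text)
    {
        return service.Translate(text, "fr");
    }
}

// Hand-written test double that records the last call it received.
public class FakeTranslationService : ITranslationService
{
    public string LastText { get; private set; }
    public string LastLanguage { get; private set; }

    public string Translate(string text, string targetLanguage)
    {
        LastText = text;
        LastLanguage = targetLanguage;
        return "[" + targetLanguage + "] " + text;
    }
}
```

A test can then construct TranslationManager with the fake and check which arguments reached the service layer, without any WCF infrastructure running.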

However, as Wenlong Dong’s blog points out [1], ClientBase<T> has the additional benefit of caching the ChannelFactory object to improve performance.

http://blogs.msdn.com/b/wenlong/archive/2007/10/27/performance-improvement-of-wcf-client-proxy-creation-and-best-practices.aspx [1]

To replicate this, I tried using a singleton ChannelFactory<T> and creating channels as I needed them in Ninject by binding to a method, but the RAM consumption went crazy: Ninject never disposed the channels, and the singleton ChannelFactory<T>, which was still in scope, retained a reference to each channel.

I needed a different approach.

Luís Gonçalves’s blog has a great series of posts on his approach to solving the same problem [2], but it was written for a different version of Ninject and only supported WCF configuration set in Web.config, which I don’t use, so I made a few amendments and added a test harness. Not being that familiar with the internals of Ninject, I also found the code quite hard to understand, so I chopped it down to a simple solution which uses a couple of extension methods to Ninject and a cache. Here are the supporting classes:

https://luisfsgoncalves.wordpress.com/2012/02/28/mixin-up-ninject-castle-dynamic-proxy-and-wcf-part-iii/ [2]

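// Modified from examples at luisfsgoncalves.wordpress.com/2012/02/28/mixin-up-ninject-castle-dynamic-proxy-and-wcf-part-iii/
// to use a channel factory cache.
using System;
using System.Collections.Generic;
using System.Reflection;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using Castle.DynamicProxy;
using Ninject;
using Ninject.Syntax;
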
public static class ToWcfClientExtensions
{
    public static IBindingWhenInNamedWithOrOnSyntax<T> ToWcfClient<T>(this IBindingToSyntax<T> syntax) where T : class
    {
        return syntax.ToMethod(ctx => ctx.Kernel
            .Get<ProxyGenerator>()
            .CreateInterfaceProxyWithoutTarget<T>(new WcfProxyWithDisposalInterceptor<T>()));
    }
}

public class WcfProxyWithDisposalInterceptor<TInterface> : IInterceptor where TInterface : class
{
    void IInterceptor.Intercept(IInvocation invocation)
    {
        if (invocation.Method.Name.Equals("Dispose", StringComparison.Ordinal))
        {
            throw new InvalidOperationException("Dispose invoked on WcfProxyWithDisposalInterceptor");
        }
    
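        // I use a global ChannelFactory cache instead of caching using the Configuration System.
        // A new channel is created for every call and disposed as soon as the call returns.
        // Note that disposing a WCF channel calls Close(), which throws if the channel has faulted.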
        using (var channel = (IDisposable)ChannelFactoryCache.CreateChannel<TInterface>())
        {
            invocation.ReturnValue = InvokeMethod(invocation, channel, invocation.Arguments);
        }
    }

    private static object InvokeMethod(IInvocation invocation, object channel, object[] arguments)
    {
        try
        {
            return invocation.Method.Invoke(channel, arguments);
        }
        catch (TargetInvocationException ex)
        {
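            // Preserve the inner exception's stack trace by writing to a private framework field.
            // On .NET 4.5 and later, ExceptionDispatchInfo.Capture(ex.InnerException).Throw()
            // achieves the same result without reflection.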
            var stackTrace = typeof(Exception).GetField("_remoteStackTraceString", BindingFlags.Instance | BindingFlags.NonPublic);
            if (stackTrace != null)
            {
                stackTrace.SetValue(ex.InnerException, ex.InnerException.StackTrace + Environment.NewLine);
            }
            throw ex.InnerException;
        }
    }
}

public static class ChannelFactoryCache
{
    // Populated once at startup (see RegisterServices). Dictionary<TKey, TValue> is not safe for
    // concurrent writes, so call Add before any channels are requested.
    private static readonly Dictionary<Type, Tuple<EndpointAddress, object>> cache = new Dictionary<Type, Tuple<EndpointAddress, object>>();

    public static void Add<TInterface>(Uri uri, Binding binding, List<IEndpointBehavior> behaviors)
    {
        // The first registration wins; skip the expensive factory construction for duplicates.
        if (cache.ContainsKey(typeof(TInterface)))
        {
            return;
        }

        var endpointAddress = new EndpointAddress(uri);

        var factory = new ChannelFactory<TInterface>(binding);

        if (behaviors != null)
        {
            foreach (var behavior in behaviors)
            {
                factory.Endpoint.Behaviors.Add(behavior);
            }
        }

        cache.Add(typeof(TInterface), new Tuple<EndpointAddress, object>(endpointAddress, factory));
    }

    public static TInterface CreateChannel<TInterface>() where TInterface : class
    {
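        // Throws KeyNotFoundException if Add<TInterface> was never called for this contract.
        var data = cache[typeof(TInterface)];

        // A direct cast fails with InvalidCastException on a mismatched entry, rather than
        // the NullReferenceException an "as" cast would produce.
        return ((ChannelFactory<TInterface>)data.Item2).CreateChannel(data.Item1);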
    }
}
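
One caveat with wrapping the channel in a using block: if the channel has faulted, the Close() call made by Dispose throws, and that exception can replace the one from the failed service call. A common alternative is to close the channel on success and abort it on failure. The helper below is a minimal sketch of that pattern; ChannelCloser and CloseOrAbort are hypothetical names, not part of the code above, and the snippet assumes a reference to System.ServiceModel.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: releases a WCF channel without letting a faulted
// channel throw from Close().
public static class ChannelCloser
{
    public static void CloseOrAbort(ICommunicationObject channel)
    {
        if (channel == null)
        {
            return;
        }

        try
        {
            if (channel.State == CommunicationState.Faulted)
            {
                // A faulted channel cannot be closed gracefully.
                channel.Abort();
            }
            else
            {
                channel.Close();
            }
        }
        catch (CommunicationException)
        {
            // Close() failed part-way through; tear the channel down immediately.
            channel.Abort();
        }
        catch (TimeoutException)
        {
            channel.Abort();
        }
    }
}
```

In the interceptor, this would mean casting the created channel to ICommunicationObject and calling the helper from a finally block instead of relying on using.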

And here’s how you’d configure Ninject:

private static void RegisterServices(IKernel kernel)
{
    // The ProxyGenerator is part of Castle and will emit code at runtime.  This code needs to be cached
    // or there will be zero improvement in performance.  It's cached by default, per instance of the 
    // ProxyGenerator.
    kernel.Bind<ProxyGenerator>()
        .ToConstant(new ProxyGenerator());
    
    // The ChannelFactoryCache will cache the creation of the ChannelFactory, which is slow because it 
    // requires the use of reflection.
    ChannelFactoryCache.Add<IWebServiceInterface>(endpoint, GetBinding(endpoint), null);
    
    // Setup Ninject to provide instances of a dynamically generated class which will create instances of 
    // a WCF channel on the fly.  See the ToWcfClient extension method.
    kernel.Bind<IWebServiceInterface>()
        .ToWcfClient();
}

Just to give you an idea of what to expect from this change, I’ve graphed the performance of creating 5000 instances of a WCF client using the approach we had before, and using the cached version. Black is cached, red isn’t. I was surprised by just how slow creating a client for the toy WCF contract in the test harness actually is.

You can see it took about 120ms to create 5000 proxies on my machine with caching, and over 1600ms without.

The LinqPad script test harness is at:

http://share.linqpad.net/xnjcgc.linq
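
For readers who can’t open the script, the shape of such a measurement can be sketched as follows. This is not the linked LinqPad script: IPingService, ProxyCreationTimer and the endpoint address are hypothetical, and the sketch assumes the uncached path builds a new ChannelFactory<T> for every client while the cached path reuses one factory. It needs a reference to System.ServiceModel, and timings will vary by machine.

```csharp
using System;
using System.Diagnostics;
using System.ServiceModel;

// Hypothetical toy contract, used only to have something to build a proxy for.
[ServiceContract]
public interface IPingService
{
    [OperationContract]
    string Ping(string message);
}

public static class ProxyCreationTimer
{
    // Placeholder address: no connection is made, because the channels are never opened.
    private static readonly EndpointAddress Address =
        new EndpointAddress("http://localhost:8080/ping");

    // Builds a new ChannelFactory for every client.
    public static TimeSpan TimeUncached(int count)
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < count; i++)
        {
            var factory = new ChannelFactory<IPingService>(new BasicHttpBinding(), Address);
            var channel = factory.CreateChannel();
            ((IClientChannel)channel).Abort();
            factory.Abort();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Builds one ChannelFactory and reuses it; the one-off construction cost is included.
    public static TimeSpan TimeCached(int count)
    {
        var stopwatch = Stopwatch.StartNew();
        var factory = new ChannelFactory<IPingService>(new BasicHttpBinding(), Address);
        for (var i = 0; i < count; i++)
        {
            var channel = factory.CreateChannel();
            ((IClientChannel)channel).Abort();
        }
        factory.Abort();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling ProxyCreationTimer.TimeUncached(5000) and ProxyCreationTimer.TimeCached(5000) and printing TotalMilliseconds for each gives two figures to compare.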