Scaling Out Wolverine: What I Learned Coming from Rebus and NServiceBus

by Brad Jolicoeur
04/12/2026

I've been exploring Wolverine as a potential alternative to my usual messaging stack: Rebus for most projects, NServiceBus when the budget allows it. The integration with Marten alone makes Wolverine compelling, but before I commit to using it in production, I need to understand how it scales. Not the marketing speak about "massively scalable cloud-native architectures," but the practical reality: when do I add more instances, what knobs do I turn, and how do I know it's working?

Coming from NServiceBus and Rebus, I have a mental model for how message-based systems scale. Both use competing consumers on queues, both let you tune concurrency, and both give you metrics (if you pay for them or wire them up yourself). Wolverine takes a different approach in some interesting ways. Some better, some just different. This article is what I've learned so far.

The key insight: Wolverine gives you more control than Rebus or NServiceBus, but that flexibility becomes a trap if you don't understand the layered parallelism model. The framework will tell you exactly when you need more capacity, if you let it.

When Should You Actually Scale Out?

Let's start with the obvious question: when do you even need to think about scaling out your Wolverine application?

"When things get slow" is too vague an answer. Here's what I actually look for:

Message queue depth is consistently growing. If your inbox or external queue (RabbitMQ, Azure Service Bus, whatever) has messages piling up faster than you can process them, you have a throughput problem. One instance can't keep up with the incoming rate.

CPU or memory pressure on your application nodes. If your handlers are CPU-bound (heavy computation, lots of serialization) or memory-bound (large object graphs, big datasets), adding horizontal instances spreads that load.

Latency targets are slipping. You promised the business that orders get processed within 5 seconds, but you're averaging 12 seconds. Time to scale.

You're hitting database connection limits. This one surprised me with Wolverine + Marten. If your handlers are doing a lot of database work and you're maxing out your connection pool, adding instances can help. But only if your database can handle the additional concurrent connections. Don't blindly scale out and then wonder why PostgreSQL is rejecting connections.

The key insight: don't scale prematurely. A single Wolverine instance with proper concurrency configuration can handle significant load. I've seen a single instance process thousands of messages per second with the right setup. Add complexity (more instances, orchestration, health checks) only when you have evidence you need it.

How Wolverine Scales: The Core Mechanisms

Wolverine's scaling model is built on a few core concepts. Understanding these is critical before you start turning knobs.

Competing Consumers

This is the foundation. Multiple instances of your Wolverine application can all listen to the same external queue (RabbitMQ, Azure Service Bus, SQS, etc.). The transport broker ensures that each message is delivered to only one instance. This is standard message bus behavior: Rebus and NServiceBus do the same thing.

Where Wolverine gets interesting is its local queue model.

Local Queues and Parallelism

Every Wolverine application has internal, in-memory local queues. When a message arrives from an external transport or is published in-process, it lands in a local queue before being dispatched to a handler. These local queues are TPL Dataflow queues under the hood, and they support parallel execution.

By default, Wolverine allows parallel processing on local queues, with the maximum number of parallel threads set to the number of processors on the machine. For a 4-core machine, that's 4 messages potentially executing in parallel.

You can tune this per queue:

builder.Services.AddWolverine(opts =>
{
    opts.LocalQueue("high-throughput")
        .MaximumParallelMessages(20);
    
    opts.LocalQueue("sequential-processing")
        .Sequential(); // Force single-threaded execution
});

This is different from NServiceBus's MaximumConcurrencyLevel or Rebus's MaxParallelism, but it accomplishes the same thing: controlling how many messages execute simultaneously.
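For comparison, here's roughly what that single knob looks like in the other two frameworks. This is a sketch from memory (in recent NServiceBus versions the setting is exposed as LimitMessageProcessingConcurrencyTo rather than the older MaximumConcurrencyLevel), so check the current docs for exact signatures:

```csharp
// Rebus: one global concurrency setting per bus instance
Configure.With(activator)
    .Transport(t => t.UseRabbitMq("amqp://rabbitmq", "orders"))
    .Options(o => o.SetMaxParallelism(10)) // at most 10 messages in flight
    .Start();

// NServiceBus: same idea, one setting per endpoint
var endpointConfiguration = new EndpointConfiguration("Orders");
endpointConfiguration.LimitMessageProcessingConcurrencyTo(10);
```

One number, one lever. Wolverine splits this into two levers, which is the subject of the next section.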

Listener Counts (Transport-Level Parallelism)

Here's where Wolverine diverges from my Rebus mental model. With external transports like RabbitMQ or Azure Service Bus, Wolverine lets you configure multiple parallel listeners per queue. This is distinct from the local queue parallelism.

A listener pulls messages from the external broker. If you configure .ListenerCount(5), Wolverine spins up 5 concurrent listeners, each pulling messages independently. This increases throughput from the broker to your application.

builder.Services.AddWolverine(opts =>
{
    opts.UseRabbitMq("host=rabbitmq")
        .AutoProvision();
    
    opts.ListenToRabbitQueue("orders")
        .PreFetchCount(100)
        .ListenerCount(5) // 5 parallel listeners
        .MaximumParallelMessages(10); // Up to 10 messages executing in parallel per listener
});

This layered parallelism (listener count at the transport level, MaximumParallelMessages at the local queue level) gives you fine-grained control. It's powerful, but it's also easy to misconfigure. Too many listeners with high prefetch counts can overwhelm your local queues or your database.

Wolverine vs. Rebus vs. NServiceBus: Scaling Model Differences

Let me frame this in terms I understand from my Rebus and NServiceBus experience.

Rebus

Rebus has a simple mental model: MaxParallelism controls how many messages execute concurrently on a single instance. If you need more throughput, you scale out by running more instances. Rebus uses competing consumers on the transport (RabbitMQ, Azure Service Bus, SQL Server queues).

Rebus doesn't have Wolverine's transparent, built-in inbox/outbox. You can implement the pattern yourself with sagas and custom handlers, but it's not woven into the framework. That makes scaling simpler (there's no database polling to coordinate across instances), but it also means you don't get exactly-once processing guarantees without extra work.

Wolverine's advantage: The durable inbox/outbox is built-in and transparent. You get exactly-once semantics without custom handler logic. The tradeoff is that the database becomes a scaling bottleneck if you're not careful.

NServiceBus

NServiceBus has MaximumConcurrencyLevel (similar to Wolverine's MaximumParallelMessages) and supports competing consumers across multiple instances. It also has an optional outbox for exactly-once processing, which uses the same database as your business data (like Wolverine + Marten).

NServiceBus's scaling model is mature. ServiceControl and ServicePulse give you real-time monitoring, and the documentation is exhaustive. You pay for it with licensing costs and a steeper learning curve.

Wolverine's advantage: It's open source, it integrates seamlessly with Marten for event sourcing, and the configuration is leaner. The disadvantage is that observability tooling is still catching up. You need to wire up OpenTelemetry yourself.

Wolverine's Unique Angle: Layered Parallelism

Both Rebus and NServiceBus have a single concurrency setting. Wolverine gives you two: listener count (transport-level) and MaximumParallelMessages (local queue-level). This is more flexible, but it's also easier to misconfigure.

For example, if you set .ListenerCount(10) and .MaximumParallelMessages(20), you could theoretically have 200 messages in flight (10 listeners × 20 parallel executions). If your handlers are doing database writes, you might saturate your connection pool or create lock contention in PostgreSQL.

The lesson I learned: start conservative. Use ListenerCount(1) and tune MaximumParallelMessages first. Once you've maxed out a single listener's throughput, add more listeners. Understanding this layered control is what separates Wolverine from the simpler concurrency models I'm used to. You get precision, but only if you know what each layer actually does.
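Before I deploy any listener configuration now, I run a quick back-of-envelope check. The formula below is my own worst-case estimate, not something Wolverine computes for you:

```csharp
// Worst-case work held by ONE instance for ONE queue:
// each listener can have up to MaximumParallelMessages executing,
// plus up to PreFetchCount messages buffered behind them.
static int WorstCaseInFlight(int listenerCount, int maxParallel, int preFetch)
    => listenerCount * (maxParallel + preFetch);

// .ListenerCount(10) + .MaximumParallelMessages(20) + .PreFetchCount(100)
// => 10 * (20 + 100) = 1200 messages claimed by a single instance
```

If that number is bigger than your database connection pool, or bigger than a downstream API's rate limit, you've found your next incident before it happens.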

Durable Inbox/Outbox and Scale

Wolverine's transactional inbox and outbox are critical for production systems. They guarantee exactly-once processing semantics even if your application crashes mid-handler. But they also impact how you scale.

The durable inbox means messages are persisted to your database (PostgreSQL with Marten, SQL Server, or others) before being processed. The durability agent (a background service within each Wolverine instance) polls the inbox for unprocessed messages and dispatches them to handlers.

When you scale out to multiple instances, each instance runs its own durability agent. They all query the same inbox tables. Wolverine uses row-level locking to ensure only one instance picks up a given message. This is competing consumers at the database level.

The implication: your database becomes a coordination point. If you scale out to 10 instances, you now have 10 durability agents polling the inbox. Make sure your database can handle it.
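I haven't read Wolverine's internals line by line, so treat this as a conceptual sketch of database-level competing consumers rather than Wolverine's actual implementation (the table and column names here are made up for illustration). The classic shape is claim-then-process with PostgreSQL's FOR UPDATE SKIP LOCKED, which lets N pollers share one table without claiming the same row twice:

```csharp
// Conceptual sketch of a durability-agent polling loop.
// SKIP LOCKED means rows already claimed by another instance's
// transaction are silently skipped instead of blocked on.
const string claimSql = """
    SELECT id, body
    FROM incoming_envelopes          -- hypothetical table name
    WHERE status = 'incoming'
    ORDER BY id
    LIMIT 100
    FOR UPDATE SKIP LOCKED;
    """;

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();
await using var cmd = new NpgsqlCommand(claimSql, conn, tx);
await using var reader = await cmd.ExecuteReaderAsync();
// ... dispatch each claimed envelope to its handler, mark it handled,
// then commit the transaction to release the row locks
```

The point of the sketch: every additional instance is another client running this loop against the same table. The locks keep it correct, but the polling load scales linearly with instance count.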

For high-throughput scenarios, consider using external transports (RabbitMQ, Azure Service Bus) for the actual message delivery and reserve the durable inbox for critical workflows where exactly-once semantics are non-negotiable.

Configuration Deep Dive: The Knobs You Actually Turn

Let's get practical. Here are the configuration options I've found most useful when tuning Wolverine for scale.

RabbitMQ Example: High-Throughput Order Processing

builder.Services.AddWolverine(opts =>
{
    opts.UseRabbitMq("host=rabbitmq")
        .AutoProvision();
    
    // High-volume queue with multiple listeners
    opts.ListenToRabbitQueue("orders")
        .PreFetchCount(100)          // Pull 100 messages at a time from RabbitMQ
        .ListenerCount(5)             // 5 parallel listeners
        .MaximumParallelMessages(10)  // Up to 10 messages in parallel per listener
        .CircuitBreaker(cb =>
        {
            cb.PauseTime = 1.Minutes();
            cb.FailurePercentageThreshold = 25;
        });
    
    // Critical queue with durability
    opts.ListenToRabbitQueue("payments")
        .UseDurableInbox()            // Persist to database for exactly-once
        .ListenerCount(3)             // Fewer listeners, we prioritize correctness
        .MaximumParallelMessages(5);  // Lower concurrency to reduce database contention
});

Azure Service Bus Example: Durable Listeners

builder.Services.AddWolverine(opts =>
{
    opts.UseAzureServiceBus(connectionString)
        .AutoProvision()
        .ConfigureListeners(listener =>
        {
            // Apply to all Azure Service Bus listeners
            listener.UseDurableInbox(new BufferingLimits(500, 100));
        });
    
    opts.ListenToAzureServiceBusQueue("shipments")
        .ListenerCount(3)
        .MaximumParallelMessages(8);
});

Local Queues: In-Process Message Routing

builder.Services.AddWolverine(opts =>
{
    opts.LocalQueue("background-jobs")
        .MaximumParallelMessages(20)
        .UseDurableInbox(); // Persist local messages for exactly-once
    
    opts.LocalQueue("reporting")
        .Sequential() // Single-threaded for ordered processing
        .UseDurableInbox();
    
    // Route specific message types to specific queues
    opts.PublishMessage<GenerateReport>()
        .ToLocalQueue("reporting");
    
    opts.PublishMessage<ProcessImage>()
        .ToLocalQueue("background-jobs");
});

Conventional Configuration Across All Listeners

If you have many queues and want to apply consistent configuration, use conventional policies:

builder.Services.AddWolverine(opts =>
{
    opts.UseRabbitMq("host=rabbitmq").AutoProvision();
    
    // Apply to all listeners based on message namespace
    opts.Policies.ConfigureListeners((listener, context) =>
    {
        if (context.MessageType.IsInNamespace("MyApp.Messages.Important"))
        {
            listener.UseDurableInbox().ListenerCount(5);
        }
        else
        {
            listener.ListenerCount(3);
        }
    });
});

This is one of those Wolverine features that feels like magic until you understand it. The conventions eliminate repetitive configuration across dozens of message types.

Observability: Knowing When to Scale

Here's where I hit my first real frustration with Wolverine: out-of-the-box observability isn't as rich as NServiceBus's monitoring tools. NServiceBus ships with ServicePulse, a web UI that shows you queue depths, processing times, failure rates, and historical trends. It's a paid product, but it's also a solved problem.

Wolverine doesn't have an equivalent yet. But it does have solid OpenTelemetry support, and that's actually better for most production scenarios.

OpenTelemetry Integration

Wolverine emits OpenTelemetry traces and metrics for message execution. If you're already using an observability stack (Jaeger, Zipkin, Application Insights, Datadog, Grafana), you can plug Wolverine in:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddSource("Wolverine") // Adds Wolverine spans
            .AddOtlpExporter(options =>
            {
                options.Endpoint = new Uri("http://otel-collector:4317");
            });
    })
    .WithMetrics(metrics =>
    {
        metrics
            .AddAspNetCoreInstrumentation()
            .AddMeter("Wolverine") // Adds Wolverine metrics
            .AddPrometheusExporter();
    });

With this configuration, every message handler execution shows up as a span in your distributed traces. You get:

  • Message execution time: the first thing I check when latency spikes
  • Handler name and message type: critical when I'm diagnosing which handlers are causing bottlenecks
  • Success/failure status: tells me if I have a correctness problem or a throughput problem
  • Correlation IDs propagated across service boundaries: I use these to trace a single order through multiple services
  • Database query times (if you're using Marten with OTel integration): shows me when the database is the bottleneck, not the handler logic
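You can also enrich Wolverine's spans with your own attributes from inside a handler using plain System.Diagnostics. This assumes Wolverine's activity is current while your handler executes, which matches how its OTel tracing behaved in my testing; the handler and message types here are hypothetical:

```csharp
using System;
using System.Diagnostics;

public record OrderPlaced(Guid OrderId, decimal Total);

public class OrderPlacedHandler
{
    public void Handle(OrderPlaced message)
    {
        // Tag the span Wolverine opened for this message execution,
        // so traces are searchable by business identifiers.
        Activity.Current?.SetTag("order.id", message.OrderId);
        Activity.Current?.SetTag("order.total", message.Total);

        // ... actual handler logic
    }
}
```

A span that only says "handler X took 400ms" is half the story; a span that says which order took 400ms is what you actually grep for during an incident.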

With OpenTelemetry configured, you can track metrics such as:

  • Messages processed per second
  • Message execution duration (histogram)
  • Queue depth (for durable inbox)
  • Failure rates
  • Circuit breaker status

These metrics can feed into Prometheus and then be visualized in Grafana dashboards. This is the production-grade observability I need to make scaling decisions.

Metrics I Actually Watch

When I'm deciding whether to scale out, here's what I look at:

Inbox/Outbox queue depth over time. If it's consistently growing, I'm not keeping up with the incoming rate. Time to scale horizontally or tune concurrency.

Handler execution time (p95, p99). If the 95th percentile is creeping up, my handlers are getting slower. This could be database contention, CPU saturation, or garbage collection pressure. Scaling out might help, but profiling the handler is usually the better first step.

Database connection pool utilization. If I'm maxing out the connection pool, adding more instances without increasing the pool size on each instance will just make things worse. I need to either increase the pool size or reduce the MaximumParallelMessages setting.

CPU and memory usage per instance. If I'm under 50% CPU utilization, I probably don't need more instances; increasing local queue parallelism is the cheaper first move.

Circuit breaker trips. If circuit breakers are firing frequently, I have a downstream dependency problem, not a scaling problem. Adding instances won't help.

Autoscaling with ECS Fargate

I wanted to know: can I autoscale ECS Fargate tasks based on queue depth?

Yes, but only with external transports. The local transport doesn't expose queue depth externally, so there's nothing to scale on.

With ECS Fargate, you use Application Auto Scaling with CloudWatch metrics. If your transport is SQS, AWS gives you ApproximateNumberOfMessagesVisible out of the box in CloudWatch. Wire that to a step scaling policy targeting your Fargate service and you're done. Here's a CDK snippet for step scaling on SQS queue depth:

const scalableTarget = service.autoScaleTaskCount({
  minCapacity: 1,
  maxCapacity: 10,
});

scalableTarget.scaleOnMetric('ScaleOnQueueDepth', {
  metric: queue.metricApproximateNumberOfMessagesVisible(),
  scalingSteps: [
    { upper: 0, change: -1 },
    { lower: 50, change: +1 },
    { lower: 200, change: +3 },
  ],
  adjustmentType: AdjustmentType.CHANGE_IN_CAPACITY,
});

For RabbitMQ or Azure Service Bus, Wolverine publishes metrics via OpenTelemetry. Route those through an OTel Collector to CloudWatch (the AWS Distro for OpenTelemetry collector handles this), create a custom CloudWatch alarm, and wire it to Application Auto Scaling the same way.

The caveat: Wolverine's durable inbox is database-backed. If you want to scale on inbox queue depth, you need to publish that metric yourself, either as an OTel metric from within the application or via a scheduled Lambda that queries the database and puts a custom metric to CloudWatch. Once it's in CloudWatch, scaling works the same as any other metric.
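Here's a sketch of that scheduled publisher, assuming PostgreSQL persistence and the AWS SDK for .NET. The table name wolverine_incoming_envelopes and the status filter match what the Postgres persistence created in my database, but verify both against your own schema before trusting the number; the metric namespace is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;
using Npgsql;

public static class InboxDepthPublisher
{
    // Run on a schedule (Lambda, cron container): read inbox depth
    // from the database and publish it as a custom CloudWatch metric.
    public static async Task PublishAsync(string connectionString)
    {
        await using var conn = new NpgsqlConnection(connectionString);
        await conn.OpenAsync();

        await using var cmd = new NpgsqlCommand(
            "SELECT count(*) FROM wolverine_incoming_envelopes " +
            "WHERE status = 'Incoming'", conn);
        var depth = Convert.ToDouble(await cmd.ExecuteScalarAsync());

        using var cloudWatch = new AmazonCloudWatchClient();
        await cloudWatch.PutMetricDataAsync(new PutMetricDataRequest
        {
            Namespace = "MyApp/Wolverine", // hypothetical namespace
            MetricData = new List<MetricDatum>
            {
                new MetricDatum
                {
                    MetricName = "InboxDepth",
                    Value = depth,
                    Unit = StandardUnit.Count,
                    TimestampUtc = DateTime.UtcNow,
                },
            },
        });
    }
}
```

Once InboxDepth exists in CloudWatch, the step-scaling wiring is identical to the SQS example above, just pointed at the custom metric.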

When to Reach for Each Scaling Lever

After digging through the docs and running some experiments, here's my mental model for when to use each scaling option:

Increase MaximumParallelMessages (local queue parallelism) when:

  • You have spare CPU on your instances
  • Your handlers are independent and can safely execute in parallel
  • You're not saturating your database connection pool

Increase ListenerCount (transport-level parallelism) when:

  • You've maxed out a single listener's throughput
  • Your transport (RabbitMQ, Azure Service Bus) can handle the additional connections
  • You need to pull messages faster from the broker

Scale out horizontally (add more instances) when:

  • You're maxing out CPU or memory on existing instances
  • Increasing concurrency on a single instance isn't helping (database bottleneck, external API rate limits)
  • You need redundancy for high availability

Use durable inbox when:

  • Exactly-once processing is critical (payments, inventory updates)
  • You can tolerate the database overhead
  • You're okay with the coordination cost across multiple instances

Use external transports without durable inbox when:

  • You need maximum throughput
  • At-least-once semantics are acceptable (handlers are idempotent)
  • You don't want the database to be in the critical path

Use local queues when:

  • Messages are published in-process (no external transport needed)
  • You want to route different message types to different execution queues
  • You need ordered processing for specific message types (use .Sequential())
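The "handlers are idempotent" bullet deserves a concrete example, because it's the thing that makes skipping the durable inbox safe. A common trick is a processed-message ledger keyed on a unique message identifier. This is a hand-rolled sketch, not a Wolverine feature; the processed_messages table is hypothetical, and I'm relying on Wolverine's method injection to hand the handler an open NpgsqlConnection:

```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

public record ShipOrder(Guid OrderId);

public class ShipOrderHandler
{
    // At-least-once delivery means this can run twice for the same
    // message. Gate the side effect on a dedup ledger so the second
    // delivery becomes a no-op.
    public async Task Handle(ShipOrder message, NpgsqlConnection conn)
    {
        // ON CONFLICT DO NOTHING affects 0 rows if we've seen this ID
        await using var dedup = new NpgsqlCommand(
            "INSERT INTO processed_messages (message_id) VALUES (@id) " +
            "ON CONFLICT (message_id) DO NOTHING", conn);
        dedup.Parameters.AddWithValue("id", message.OrderId);

        if (await dedup.ExecuteNonQueryAsync() == 0)
            return; // duplicate delivery, already handled

        // ... actual shipping side effect goes here, ideally in the
        // same transaction as the ledger insert
    }
}
```

With this in place, the broker's at-least-once guarantee composes into effectively-once behavior without putting the database in the critical path of every message.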

Conclusion

Wolverine's scaling model is powerful, but it requires understanding the layered architecture: competing consumers at the transport level, multiple listeners per queue, local queue parallelism, and the durable inbox/outbox coordination.

Coming from Rebus and NServiceBus, the biggest mental shift was realizing that Wolverine gives me more knobs to turn. Listener count and local queue parallelism: that's flexibility, but it's also complexity. Start simple: one listener, tune MaximumParallelMessages, and scale out horizontally when you need it. You get more control than the simpler frameworks, but misunderstanding the layered model means you'll either over-provision infrastructure or create database contention without realizing why.

The observability story is solid if you invest in OpenTelemetry. Wolverine doesn't ship with a monitoring UI like NServiceBus's ServicePulse, but the OTel integration means I can use the same observability stack I'm already running for the rest of my .NET services.

For autoscaling on ECS Fargate, SQS gives you queue depth metrics in CloudWatch out of the box. Other transports like RabbitMQ or Azure Service Bus need a bit more wiring through OpenTelemetry, but it's doable.

Would I use Wolverine for a high-scale production system? Yes, but I'd start with a single instance, instrument everything with OpenTelemetry, and scale out only when the metrics tell me to. Premature scaling just adds operational complexity without measurable benefit.

Wolverine's integration with Marten, the built-in inbox/outbox, and the flexible concurrency model make it compelling. The learning curve is real, but the payoff is a message bus that feels like it was designed for .NET developers who care about both performance and correctness.

I came in asking "when do I scale?" The answer turned out to be simpler than I expected: instrument, measure, and scale only when the data tells you to. Wolverine gives you the knobs. The hard part is resisting the urge to turn them prematurely.
