Comprehensive patterns for rate limiting, circuit breakers, bulkhead isolation, retry policies, and request queuing to handle 500+ concurrent users with graceful degradation.
ReqVise must handle 500+ concurrent users with complex procurement workflows, real-time auctions, and heavy report generation. Without proper resilience patterns, a single service failure can cascade and bring down the entire system.
Rate limiting protects APIs from abuse and ensures fair resource distribution across users and tenants.
| Scope | Limit | Window | Action on Exceed |
|---|---|---|---|
| Per User | 100 requests | 1 minute | HTTP 429 Too Many Requests |
| Per Tenant | 1,000 requests | 1 minute | HTTP 429 + Alert |
| Per IP (Unauthenticated) | 20 requests | 1 minute | HTTP 429 + Captcha |
| Reverse Auction Bids | 10 bids | 10 seconds | Throttle with warning |
| Report Generation | 5 requests | 1 minute | Queue request |
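```csharp
// Program.cs - Rate Limiting Configuration (.NET 10)
using System.Globalization;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // Per-user rate limit: keyed by user name, falling back to the client IP
    // for unauthenticated callers.
    options.AddPolicy("PerUser", context =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: context.User?.Identity?.Name
                ?? context.Connection.RemoteIpAddress?.ToString()
                ?? "anonymous", // the partition key must not be null
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1),
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 10
            }));

    // Per-tenant rate limit
    options.AddPolicy("PerTenant", context =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: context.User?.FindFirst("tenant_id")?.Value ?? "anonymous",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 1000,
                Window = TimeSpan.FromMinutes(1)
            }));

    options.OnRejected = async (context, token) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // BR-SCALE-002: send Retry-After whenever the limiter reports a wait time.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo);
        }

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please try again later.", token);
    };
});
```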
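To make the fixed-window behaviour in the table concrete, here is a minimal in-memory sketch. It is not the ASP.NET Core limiter itself: `FixedWindowCounter` and `TryAcquire` are hypothetical names introduced for illustration, the class is not thread-safe, and the caller is assumed to pass in the current time. It shows how a wait time suitable for a Retry-After header could be derived from the time left in the window.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a fixed-window limit; not thread-safe.
public sealed class FixedWindowCounter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Dictionary<string, (DateTimeOffset Start, int Count)> _windows = new();

    public FixedWindowCounter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true when the request is allowed. When it is rejected,
    // retryAfter holds the time remaining until the current window ends.
    public bool TryAcquire(string key, DateTimeOffset now, out TimeSpan retryAfter)
    {
        retryAfter = TimeSpan.Zero;

        // Start a fresh window if none exists for this key or the old one expired.
        if (!_windows.TryGetValue(key, out var window) || now - window.Start >= _window)
        {
            window = (now, 0);
        }

        if (window.Count >= _limit)
        {
            retryAfter = window.Start + _window - now;
            return false;
        }

        _windows[key] = (window.Start, window.Count + 1);
        return true;
    }
}
```

With the reverse-auction limit from the table (10 bids per 10 seconds), an eleventh bid arriving 4 seconds into the window would be rejected with a wait of 6 seconds.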
Circuit breakers prevent cascading failures by stopping requests to a failing service and giving it time to recover.
| External Service | Failure Threshold | Break Duration | Fallback |
|---|---|---|---|
| Brevo (Email/SMS) | 5 failures | 30 seconds | Queue for retry |
| Bank API | 3 failures | 60 seconds | Return cached status |
| CRM Webhook | 5 failures | 30 seconds | Queue sync-back |
| Gift Card API | 3 failures | 60 seconds | Pending fulfillment |
```csharp
// Polly Circuit Breaker with HttpClient
// Requires the Microsoft.Extensions.Http.Polly package.
using Polly;
using Polly.Extensions.Http;
using Serilog;

builder.Services.AddHttpClient("BrevoClient", client =>
{
    client.BaseAddress = new Uri("https://api.brevo.com");
})
// The first handler added is the outermost, so the retry policy wraps the
// circuit breaker and every attempt counts toward the breaker's threshold.
.AddPolicyHandler(HttpPolicyExtensions
    .HandleTransientHttpError() // HttpRequestException, 5xx, and 408 responses
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)))) // 2s, 4s, 8s
.AddPolicyHandler(HttpPolicyExtensions
    .HandleTransientHttpError()
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (result, duration) =>
        {
            Log.Warning("Circuit OPEN for Brevo. Duration: {Duration}", duration);
        },
        onReset: () =>
        {
            Log.Information("Circuit CLOSED for Brevo. Service recovered.");
        }));
```
When the circuit is OPEN, email/SMS notifications MUST be queued to Azure Service Bus for later delivery instead of being dropped.
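The states behind the thresholds and break durations in the table can be sketched as a small state machine. This is an illustrative model only, not Polly's implementation: `SimpleCircuitBreaker` and its members are hypothetical names, the class is not thread-safe, and unlike a production breaker it lets any number of trial calls through while half-open.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical consecutive-failure circuit breaker; not thread-safe.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    // Call before each request. Returns false while the circuit is open,
    // in which case the caller should use its fallback (for example, queue the work).
    public bool AllowRequest(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
        {
            State = CircuitState.HalfOpen; // break elapsed: allow trial calls
        }
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;

        // A failed trial call reopens the circuit immediately; otherwise the
        // circuit opens once the consecutive-failure threshold is reached.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

Configured like the Bank API row (3 failures, 60 seconds), the third consecutive failure opens the circuit, requests are refused for the next 60 seconds, and a single success during the half-open phase closes it again.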
Bulkhead isolation separates parts of the system so that a failure in one area cannot exhaust resources needed by the others.
| Bulkhead | Max Concurrent | Queue Size | Purpose |
|---|---|---|---|
| Report Generation | 10 | 50 | Prevent CPU exhaustion |
| Excel Import | 5 | 20 | Memory protection |
| External API Calls | 20 | 100 | Connection limits |
| SignalR Broadcasts | 50 | 200 | Real-time capacity |
```csharp
// Bulkhead for Report Generation
private static readonly SemaphoreSlim _reportSemaphore = new(10, 10);

public async Task<ReportResult> GenerateReportAsync(ReportRequest request)
{
    // Wait up to 30 seconds for one of the 10 slots, then fail fast.
    if (!await _reportSemaphore.WaitAsync(TimeSpan.FromSeconds(30)))
    {
        throw new ServiceUnavailableException("Report generation queue full. Please try again.");
    }

    try
    {
        return await _reportService.GenerateAsync(request);
    }
    finally
    {
        // Always free the slot, even when generation throws.
        _reportSemaphore.Release();
    }
}
```
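The table pairs each concurrency limit with a queue size, which a bare semaphore with a timeout does not enforce. The following is a minimal sketch of one way to bound both, under stated assumptions: `Bulkhead` and `ExecuteAsync` are hypothetical names, and `InvalidOperationException` stands in for whatever application-specific exception the service would map to a 503 response.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical bulkhead that limits both running and waiting callers.
public sealed class Bulkhead
{
    private readonly SemaphoreSlim _slots;
    private readonly int _maxQueued;
    private int _queued;

    public Bulkhead(int maxConcurrent, int maxQueued)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
        _maxQueued = maxQueued;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, CancellationToken ct = default)
    {
        // Fast path: take a free slot without joining the queue.
        if (!_slots.Wait(0))
        {
            // No free slot: join the queue unless it is already full.
            if (Interlocked.Increment(ref _queued) > _maxQueued)
            {
                Interlocked.Decrement(ref _queued);
                throw new InvalidOperationException("Bulkhead queue full.");
            }

            try
            {
                await _slots.WaitAsync(ct);
            }
            finally
            {
                // Leave the queue whether the wait succeeded or was cancelled.
                Interlocked.Decrement(ref _queued);
            }
        }

        try
        {
            return await action();
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

For the Report Generation row this would be constructed as `new Bulkhead(maxConcurrent: 10, maxQueued: 50)`: the eleventh concurrent request waits, and the sixty-first is rejected immediately instead of piling up.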
Retry with exponential backoff handles transient failures by retrying with increasing delays, so that recovering services are not overwhelmed.
| Operation | Max Retries | Backoff Strategy | Max Delay |
|---|---|---|---|
| Database Connection | 3 | Exponential (1s, 2s, 4s) | 4 seconds |
| Email/SMS Send | 5 | Exponential (2s, 4s, 8s, 16s, 30s) | 30 seconds |
| Webhook Delivery | 5 | Exponential + Jitter | 60 seconds |
| Redis Cache | 2 | Immediate, then 1s | 1 second |
```csharp
// Retry Policy with Jitter (prevents thundering herd)
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.ServiceUnavailable)
    .WaitAndRetryAsync(
        retryCount: 5,
        sleepDurationProvider: (retryAttempt, context) =>
        {
            var baseDelay = TimeSpan.FromSeconds(Math.Pow(2, retryAttempt));
            var jitter = TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000));
            var totalDelay = baseDelay + jitter;

            // Cap the delay at 30 seconds.
            return totalDelay > TimeSpan.FromSeconds(30)
                ? TimeSpan.FromSeconds(30)
                : totalDelay;
        },
        onRetry: (outcome, timespan, retryAttempt, context) =>
        {
            Log.Warning("Retry {Attempt} after {Delay}ms", retryAttempt, timespan.TotalMilliseconds);
        });
```
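The backoff schedules listed in the table follow from a single formula: double the base delay on each attempt, optionally add jitter, and cap the result. A small sketch of that calculation, independent of Polly, is shown below; `Backoff.Delay` and its parameters are hypothetical names chosen for illustration.

```csharp
using System;

public static class Backoff
{
    // attempt is 1-based, so the delay before the first retry equals baseDelay.
    // maxJitterMs = 0 disables jitter, which makes the schedule deterministic.
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, int maxJitterMs = 0)
    {
        double exponentialMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        int jitterMs = maxJitterMs > 0 ? Random.Shared.Next(0, maxJitterMs) : 0;

        // Apply the cap after adding jitter so the limit is never exceeded.
        double cappedMs = Math.Min(exponentialMs + jitterMs, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(cappedMs);
    }
}
```

For the Email/SMS row (base 2 seconds, cap 30 seconds, no jitter), attempts 1 through 5 yield 2s, 4s, 8s, 16s, and 30s: the fifth delay would be 32 seconds uncapped, so the cap takes effect.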
Request queuing absorbs burst traffic by deferring work to background processing instead of rejecting requests.
| Queue | Message Type | Processing Rate | Retention |
|---|---|---|---|
| email-queue | Email notifications | 100/batch | 7 days |
| sms-queue | SMS notifications | 50/batch | 3 days |
| report-queue | Report generation | 5/minute | 1 day |
| webhook-queue | CRM sync-back | 20/minute | 7 days |
| import-queue | Excel imports | 2/minute | 1 day |
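```csharp
// Azure Service Bus Producer
public class EmailQueueService
{
    private readonly ServiceBusSender _sender;

    public EmailQueueService(ServiceBusSender sender) => _sender = sender;

    public async Task QueueEmailAsync(EmailMessage email)
    {
        var message = new ServiceBusMessage(JsonSerializer.Serialize(email))
        {
            MessageId = Guid.NewGuid().ToString(),
            ContentType = "application/json",
            Subject = email.TemplateCode,
            ScheduledEnqueueTime = email.ScheduledTime ?? DateTimeOffset.UtcNow
        };

        await _sender.SendMessageAsync(message);
    }
}

// Azure Service Bus Consumer (Hangfire Worker)
public class EmailQueueProcessor
{
    private readonly ServiceBusReceiver _receiver;
    private readonly IBrevoService _brevoService; // interface name is an assumption

    public EmailQueueProcessor(ServiceBusReceiver receiver, IBrevoService brevoService)
    {
        _receiver = receiver;
        _brevoService = brevoService;
    }

    [AutomaticRetry(Attempts = 5, DelaysInSeconds = new[] { 10, 30, 60, 120, 300 })]
    public async Task ProcessBatchAsync()
    {
        var messages = await _receiver.ReceiveMessagesAsync(maxMessages: 100);

        foreach (var message in messages)
        {
            // Body is BinaryData; deserialize it from its JSON payload.
            var email = message.Body.ToObjectFromJson<EmailMessage>();
            await _brevoService.SendAsync(email);

            // Complete only after a successful send so failures are redelivered.
            await _receiver.CompleteMessageAsync(message);
        }
    }
}
```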
| Rule ID | Category | Description |
|---|---|---|
| BR-SCALE-001 | Circuit Breaker | Failed notifications MUST be queued, not dropped |
| BR-SCALE-002 | Rate Limiting | HTTP 429 response MUST include Retry-After header |
| BR-SCALE-003 | Bulkhead | Report generation limited to 10 concurrent requests |
| BR-SCALE-004 | Retry | Maximum retry delay MUST NOT exceed 30 seconds |
| BR-SCALE-005 | Queue | Messages MUST be idempotent (safe to replay) |
| BR-SCALE-006 | Monitoring | Circuit breaker state changes MUST trigger alerts |
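BR-SCALE-005 requires that replayed messages cause no duplicate side effects. One common way to meet such a rule is to record the identifier of every message already handled and skip repeats. The sketch below is illustrative only: `IdempotentHandler` and `TryHandle` are hypothetical names, the in-memory set stands in for what would need to be a durable store shared by all workers, and it assumes each logical message carries a stable identifier across redeliveries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical de-duplicating wrapper; in-memory and not thread-safe.
public sealed class IdempotentHandler
{
    private readonly HashSet<string> _processedIds = new();

    // Runs handler at most once per messageId.
    // Returns false when the message was already handled and is skipped.
    public bool TryHandle(string messageId, Action handler)
    {
        if (!_processedIds.Add(messageId))
        {
            return false; // duplicate delivery: safe to acknowledge and ignore
        }

        try
        {
            handler();
            return true;
        }
        catch
        {
            // Forget the id so a later redelivery can try again.
            _processedIds.Remove(messageId);
            throw;
        }
    }
}
```

A consumer would wrap its send call in `TryHandle` and complete the message in both the handled and the skipped case, so a replayed message is acknowledged without sending the notification twice.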