
Provider Pool

The provider pool manages multiple LLM providers with tag-based routing, adaptive freeze/thaw failover, priority scheduling, and concurrency control.

Architecture

Routing Flow

Entry: ProviderPoolImpl::chat_completion() in y-provider/src/pool.rs

Route Selection Strategy

The TagBasedRouter supports 5 selection strategies for choosing among eligible providers:

| Strategy | Algorithm | Best For |
|---|---|---|
| Priority (default) | First candidate in sorted order | Predictable provider preference |
| Random | Time-seeded random selection | Load distribution |
| LeastLoaded | Most available semaphore permits | Balanced utilization |
| RoundRobin | Atomic counter modulo candidates | Even distribution |
| CostOptimized | Minimum cost_per_1k_input | Cost minimization |
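Below is a minimal sketch of how the five strategies could pick from a candidate list that has already been filtered and sorted by priority. `Candidate`, `select`, `time_seed`, and the caller-supplied `seed` are hypothetical names used only for illustration. They are not the actual `TagBasedRouter` API. The `Random` arm uses `seed % len` as a simple stand-in for time-seeded randomness.

```rust
use std::cmp::Reverse;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical candidate record; field names are illustrative only.
#[derive(Debug, Clone)]
struct Candidate {
    id: &'static str,
    available_permits: usize,
    cost_per_1k_input: f64,
}

#[derive(Debug, Clone, Copy)]
enum RoutingStrategy {
    Priority,
    Random,
    LeastLoaded,
    RoundRobin,
    CostOptimized,
}

/// Cheap time-based seed (assumption: stand-in for the router's time seeding).
fn time_seed() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// Picks one candidate. `candidates` is assumed to be sorted by priority already.
fn select<'a>(
    candidates: &'a [Candidate],
    strategy: RoutingStrategy,
    rr_counter: &AtomicUsize,
    seed: u64,
) -> Option<&'a Candidate> {
    if candidates.is_empty() {
        return None;
    }
    match strategy {
        // First candidate in sorted order.
        RoutingStrategy::Priority => candidates.first(),
        // Seed-based index in place of a real RNG.
        RoutingStrategy::Random => candidates.get((seed % candidates.len() as u64) as usize),
        // Most free permits; min_by_key keeps the earliest candidate on ties.
        RoutingStrategy::LeastLoaded => candidates.iter().min_by_key(|c| Reverse(c.available_permits)),
        // Shared atomic counter modulo the candidate count.
        RoutingStrategy::RoundRobin => {
            let n = rr_counter.fetch_add(1, Ordering::Relaxed);
            candidates.get(n % candidates.len())
        }
        // Cheapest input cost; total_cmp gives a total order over f64.
        RoutingStrategy::CostOptimized => candidates
            .iter()
            .min_by(|a, b| a.cost_per_1k_input.total_cmp(&b.cost_per_1k_input)),
    }
}
```

Ties under LeastLoaded go to the earlier, higher-priority candidate, because `min_by_key` returns the first minimum it finds.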

RouteRequest

```rust
struct RouteRequest {
    preferred_provider_id: Option<ProviderId>,  // exact provider match
    preferred_model: Option<String>,            // preferred model name
    required_tags: Vec<String>,                 // all must match
    priority: Priority,                         // Critical | Normal | Idle
    strategy: Option<RoutingStrategy>,          // override default strategy
}
```
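Here is one plausible way an eligibility filter could apply a `RouteRequest`. It uses simplified types, with provider IDs as plain `String`s, and applies four rules:

- Frozen providers are skipped.
- Every required tag must be present.
- `preferred_provider_id` narrows the list to an exact match.
- `preferred_model` is treated as a soft preference that only reorders candidates.

`ProviderInfo`, `eligible`, and the soft-preference behavior are assumptions made for illustration.

```rust
/// Hypothetical, simplified provider view used only by this sketch.
#[derive(Debug, Clone)]
struct ProviderInfo {
    id: String,
    model: String,
    tags: Vec<String>,
    frozen: bool,
}

/// Simplified request: ProviderId is modeled as String here.
struct RouteRequest {
    preferred_provider_id: Option<String>,
    preferred_model: Option<String>,
    required_tags: Vec<String>,
}

fn eligible<'a>(providers: &'a [ProviderInfo], req: &RouteRequest) -> Vec<&'a ProviderInfo> {
    let mut out: Vec<&ProviderInfo> = providers
        .iter()
        // Frozen providers never take traffic.
        .filter(|p| !p.frozen)
        // All required tags must match.
        .filter(|p| req.required_tags.iter().all(|t| p.tags.contains(t)))
        // An explicit provider ID is an exact-match filter.
        .filter(|p| req.preferred_provider_id.as_ref().map_or(true, |id| &p.id == id))
        .collect();
    if let Some(model) = &req.preferred_model {
        // Stable sort: matching models move to the front, otherwise order is kept.
        out.sort_by_key(|p| p.model != *model);
    }
    out
}
```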

Freeze/Thaw Failover

Error Classification

The error_classifier categorizes provider errors into StandardError types:

| Error Type | Freeze Duration | Permanent |
|---|---|---|
| Quota | 60s, escalating | No |
| RateLimit | 30s, escalating | No |
| Network | 15s | No |
| ServerError | 30s | No |
| Authentication | n/a | Yes |
| InvalidRequest | n/a | No (not frozen) |
| Unknown | 15s | No |
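The table can be encoded as a lookup from error type to freeze policy. This sketch only mirrors the documented values. The `FreezePolicy` enum and the `freeze_policy` function are hypothetical names, not the `error_classifier` API.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StandardError {
    Quota,
    RateLimit,
    Network,
    ServerError,
    Authentication,
    InvalidRequest,
    Unknown,
}

#[derive(Debug, PartialEq, Eq)]
enum FreezePolicy {
    /// The request itself was at fault; the provider stays available.
    NotFrozen,
    /// Frozen for `base`; `escalates` marks types listed as escalating.
    Timed { base: Duration, escalates: bool },
    /// Frozen with no automatic expiry.
    Permanent,
}

fn freeze_policy(err: StandardError) -> FreezePolicy {
    use StandardError::*;
    match err {
        Quota => FreezePolicy::Timed { base: Duration::from_secs(60), escalates: true },
        RateLimit => FreezePolicy::Timed { base: Duration::from_secs(30), escalates: true },
        Network | Unknown => FreezePolicy::Timed { base: Duration::from_secs(15), escalates: false },
        ServerError => FreezePolicy::Timed { base: Duration::from_secs(30), escalates: false },
        Authentication => FreezePolicy::Permanent,
        InvalidRequest => FreezePolicy::NotFrozen,
    }
}
```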

Adaptive duration: consecutive failures lengthen the freeze. The duration grows with the number of consecutive errors, capped at a configurable maximum.
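The exact escalation formula is not specified here. The sketch below assumes that the base duration doubles for each consecutive failure and that the result is capped at a configurable maximum. `adaptive_freeze` is a hypothetical helper.

```rust
use std::time::Duration;

/// Assumed escalation: base * 2^(n-1), capped at `max`.
fn adaptive_freeze(base: Duration, consecutive_failures: u32, max: Duration) -> Duration {
    if consecutive_failures == 0 {
        return Duration::ZERO;
    }
    // Clamp the exponent so the shift cannot overflow a u32.
    let exp = (consecutive_failures - 1).min(16);
    // On multiplication overflow, fall back to the cap.
    base.checked_mul(1u32 << exp).map_or(max, |d| d.min(max))
}
```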

Freeze/Thaw API

```rust
// Programmatic freeze/thaw
pool.freeze(provider_id, duration);
pool.thaw(provider_id);

// Query status
pool.provider_statuses(); // -> Vec<ProviderStatus>
```
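The sketch below shows the bookkeeping behind freeze/thaw, assuming each provider stores a "frozen until" deadline and that expired freezes thaw lazily on the next status query. `FreezeRegistry` and `ProviderState` are hypothetical; the real pool's `ProviderStatus` may carry different fields. Timestamps are passed in explicitly so that the behavior can be tested.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
enum ProviderState {
    Available,
    Frozen { remaining: Duration },
}

#[derive(Default)]
struct FreezeRegistry {
    frozen_until: HashMap<String, Instant>,
}

impl FreezeRegistry {
    /// Freezes (or re-freezes) a provider until `now + duration`.
    fn freeze(&mut self, id: &str, duration: Duration, now: Instant) {
        self.frozen_until.insert(id.to_string(), now + duration);
    }

    /// Manual thaw: removes any pending freeze immediately.
    fn thaw(&mut self, id: &str) {
        self.frozen_until.remove(id);
    }

    /// Expired deadlines count as available (lazy thaw).
    fn state(&self, id: &str, now: Instant) -> ProviderState {
        match self.frozen_until.get(id) {
            Some(&until) if until > now => ProviderState::Frozen { remaining: until - now },
            _ => ProviderState::Available,
        }
    }
}
```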

Priority Scheduling

Three priority tiers with reserved capacity:

| Tier | Filter Rule | Use Case |
|---|---|---|
| Critical | Always passes, reserved slots guaranteed | User-facing requests, error recovery |
| Normal | Passes if available permits > max_permits / 5 | Standard agent operations |
| Idle | Passes if any permits available | Background tasks, prefetch, analytics |

Concurrency Control

Two-level semaphore system:

```text
Global Semaphore (optional)
  |
  +-- Provider A Semaphore (max_concurrent per provider)
  |     +-- Active Request Counter (atomic)
  |
  +-- Provider B Semaphore
  |     +-- Active Request Counter
  |
  +-- Provider C Semaphore
        +-- Active Request Counter
```

  • Global semaphore: System-wide concurrency cap across all providers
  • Per-provider semaphore: Limits concurrent requests to each individual provider
  • Active request counter: AtomicUsize for real-time monitoring (no lock overhead)
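The two-level acquisition can be sketched with std atomics standing in for real semaphores. An async runtime's semaphore would normally fill this role; that, and every name below, is an assumption. The acquisition proceeds as follows:

1. A request takes a global permit first, if a global cap is configured.
2. It then takes a provider permit. If the provider is saturated, the global permit is rolled back.
3. Dropping the returned `Slot` releases both permits and decrements the active-request counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Minimal non-blocking permit pool (stand-in for a semaphore).
struct Permits {
    available: AtomicUsize,
}

impl Permits {
    fn new(n: usize) -> Arc<Self> {
        Arc::new(Self { available: AtomicUsize::new(n) })
    }
    /// Takes one permit if any are left; never goes below zero.
    fn try_take(&self) -> bool {
        self.available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }
    fn give_back(&self) {
        self.available.fetch_add(1, Ordering::AcqRel);
    }
}

/// Holds an optional global permit, one provider permit, and an active-count slot.
struct Slot {
    global: Option<Arc<Permits>>,
    provider: Arc<Permits>,
    active: Arc<AtomicUsize>,
}

impl Drop for Slot {
    fn drop(&mut self) {
        self.provider.give_back();
        if let Some(g) = &self.global {
            g.give_back();
        }
        self.active.fetch_sub(1, Ordering::Relaxed);
    }
}

fn try_acquire(
    global: Option<&Arc<Permits>>,
    provider: &Arc<Permits>,
    active: &Arc<AtomicUsize>,
) -> Option<Slot> {
    if let Some(g) = global {
        if !g.try_take() {
            return None; // system-wide cap reached
        }
    }
    if !provider.try_take() {
        // Roll back the global permit so it is not leaked.
        if let Some(g) = global {
            g.give_back();
        }
        return None;
    }
    active.fetch_add(1, Ordering::Relaxed);
    Some(Slot { global: global.cloned(), provider: Arc::clone(provider), active: Arc::clone(active) })
}
```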

Streaming Guard

For streaming responses, the concurrency counter must be decremented when the stream ends, not when the initial response arrives. The ActiveRequestGuard implements Drop so that cleanup happens even when the stream terminates abnormally.
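The pattern is sketched below using a synchronous `Iterator` as a stand-in for an async stream. `ActiveRequestGuard` here is a simplified reimplementation, and `GuardedStream` is a hypothetical wrapper. The guard is released as soon as the inner stream yields `None`, and its `Drop` impl covers streams that are dropped before completion.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Increments the active counter on creation and decrements it on drop.
struct ActiveRequestGuard {
    counter: Arc<AtomicUsize>,
}

impl ActiveRequestGuard {
    fn new(counter: Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::Relaxed);
        Self { counter }
    }
}

impl Drop for ActiveRequestGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Hypothetical wrapper: the guard lives as long as the stream, not the first chunk.
struct GuardedStream<I> {
    inner: I,
    guard: Option<ActiveRequestGuard>,
}

impl<I> GuardedStream<I> {
    fn new(inner: I, counter: Arc<AtomicUsize>) -> Self {
        Self { inner, guard: Some(ActiveRequestGuard::new(counter)) }
    }
}

impl<I: Iterator> Iterator for GuardedStream<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        if item.is_none() {
            // Normal end of stream: release the slot now.
            self.guard = None;
        }
        item
    }
}
```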

Provider Configuration

```toml
[[providers]]
id = "anthropic-main"
provider_type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
model = "claude-sonnet-4-6"
tags = ["fast", "coding"]
max_concurrent = 10
priority = 1

[providers.defaults]
temperature = 0.7
max_tokens = 4096

[providers.proxy]
url = "http://proxy.internal:8080"
```
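The `api_key = "${ANTHROPIC_API_KEY}"` value suggests that environment variables are substituted at load time. Below is a minimal sketch of `${NAME}` expansion, assuming unresolved placeholders are left verbatim; the real loader may reject them instead. `expand_vars` is a hypothetical helper. It takes the lookup as a closure, so it can be tested without touching the process environment.

```rust
/// Replaces each `${NAME}` with `lookup(NAME)`.
/// Unresolved or unterminated placeholders are copied through unchanged.
fn expand_vars(input: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match lookup(name) {
                    Some(value) => out.push_str(&value),
                    // Keep the literal "${NAME}" text.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: copy the remainder as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

In a loader, the lookup would typically be `|name| std::env::var(name).ok()`.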

Supported Provider Types

| Type String | Provider | Notes |
|---|---|---|
| openai | OpenAiProvider | Standard OpenAI API |
| openai-compat / openai_compatible / custom | OpenAiProvider | OpenAI-compatible APIs |
| anthropic | AnthropicProvider | Anthropic Claude API |
| gemini | GeminiProvider | Google Gemini API |
| ollama | OllamaProvider | Local Ollama inference |
| azure | AzureOpenAiProvider | Azure OpenAI deployments |
| deepseek | OpenAiProvider | DeepSeek API (OpenAI-compatible) |
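The type-string table maps naturally onto an exact-match lookup. In the sketch below, `ProviderKind` and `provider_kind` are hypothetical names. Five type strings resolve to the same OpenAI client.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderKind {
    OpenAi,
    Anthropic,
    Gemini,
    Ollama,
    AzureOpenAi,
}

/// Exact, case-sensitive match on the configured provider_type (assumption).
fn provider_kind(type_string: &str) -> Option<ProviderKind> {
    match type_string {
        // DeepSeek and custom endpoints reuse the OpenAI-compatible client.
        "openai" | "openai-compat" | "openai_compatible" | "custom" | "deepseek" => {
            Some(ProviderKind::OpenAi)
        }
        "anthropic" => Some(ProviderKind::Anthropic),
        "gemini" => Some(ProviderKind::Gemini),
        "ollama" => Some(ProviderKind::Ollama),
        "azure" => Some(ProviderKind::AzureOpenAi),
        _ => None,
    }
}
```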

Multi-Level Proxy Config

Proxy configuration follows a precedence chain: provider-level > tag-level > global.
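The precedence chain reduces to chained `Option` fallbacks. The sketch below rests on two assumptions that are not stated above:

- Tag-level proxies are keyed by tag and checked in configuration order.
- The first configured tag that the provider carries wins.

`ProxyConfig` and `resolve_proxy` are hypothetical names.

```rust
#[derive(Debug, Default)]
struct ProxyConfig {
    /// Fallback proxy for every provider.
    global: Option<String>,
    /// (tag, proxy URL) pairs, checked in order.
    by_tag: Vec<(String, String)>,
}

/// provider-level > tag-level > global
fn resolve_proxy<'a>(
    cfg: &'a ProxyConfig,
    provider_proxy: Option<&'a str>,
    provider_tags: &[String],
) -> Option<&'a str> {
    provider_proxy
        .or_else(|| {
            cfg.by_tag
                .iter()
                .find(|(tag, _)| provider_tags.contains(tag))
                .map(|(_, url)| url.as_str())
        })
        .or(cfg.global.as_deref())
}
```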

Metrics & Observability

ProviderMetrics tracks per-provider:

| Metric | Type |
|---|---|
| Request count | Counter |
| Success count | Counter |
| Error count | Counter |
| Latency | Histogram |
| Token usage (input/output) | Counter |
| Cost (USD) | Counter |
| Consecutive failures | Gauge |
| Current active requests | Gauge |

Metrics are persisted to SqliteProviderMetricsStore for historical analysis and exposed via ObservabilityService::provider_snapshots().
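The sketch below shows how these counters might be kept lock-free with atomics. Everything in it is an assumption made for illustration:

- the struct layout;
- the latency bucket bounds;
- storing cost as integer micro-USD so that it fits in an `AtomicU64`;
- resetting the consecutive-failures gauge on success.

The active-requests gauge is omitted because it works like the active request counter under Concurrency Control.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Latency bucket upper bounds in ms; a final bucket catches everything slower.
const BUCKET_BOUNDS_MS: [u64; 3] = [100, 500, 2000];

/// Hypothetical per-provider metrics.
#[derive(Default)]
struct ProviderMetrics {
    requests: AtomicU64,
    successes: AtomicU64,
    errors: AtomicU64,
    input_tokens: AtomicU64,
    output_tokens: AtomicU64,
    cost_micro_usd: AtomicU64,
    consecutive_failures: AtomicU64,
    latency_buckets: [AtomicU64; 4],
}

impl ProviderMetrics {
    fn record_success(&self, latency: Duration, input: u64, output: u64, cost_micro_usd: u64) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        self.successes.fetch_add(1, Ordering::Relaxed);
        self.input_tokens.fetch_add(input, Ordering::Relaxed);
        self.output_tokens.fetch_add(output, Ordering::Relaxed);
        self.cost_micro_usd.fetch_add(cost_micro_usd, Ordering::Relaxed);
        // Gauge: a success ends the failure streak.
        self.consecutive_failures.store(0, Ordering::Relaxed);
        self.observe_latency(latency);
    }

    /// Returns the new failure streak length (e.g. to feed an adaptive freeze).
    fn record_error(&self, latency: Duration) -> u64 {
        self.requests.fetch_add(1, Ordering::Relaxed);
        self.errors.fetch_add(1, Ordering::Relaxed);
        self.observe_latency(latency);
        self.consecutive_failures.fetch_add(1, Ordering::Relaxed) + 1
    }

    fn observe_latency(&self, latency: Duration) {
        let ms = latency.as_millis() as u64;
        let idx = BUCKET_BOUNDS_MS
            .iter()
            .position(|&b| ms <= b)
            .unwrap_or(BUCKET_BOUNDS_MS.len());
        self.latency_buckets[idx].fetch_add(1, Ordering::Relaxed);
    }
}
```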

Cost Limits

Providers support automatic freeze on budget exhaustion:

```toml
[providers.cost_limit]
daily_max_usd = 50.0
monthly_max_usd = 500.0
action = "freeze"  # freeze | warn | deny
```

When a cost limit is reached, the provider is frozen until the limit resets (daily at midnight UTC, monthly on the 1st). The CostService in y-service aggregates costs and generates daily summaries.
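The limit check and the daily reset arithmetic are sketched below. `CostLimit`, `check_cost`, and `secs_until_utc_midnight` are hypothetical. The check treats a limit as reached when spend is greater than or equal to the maximum, and it only returns the configured action name; what `warn` and `deny` do downstream is not modeled. The monthly reset on the 1st needs calendar arithmetic, for example via a date library such as `chrono`, and is left out.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CostAction {
    Freeze,
    Warn,
    Deny,
}

struct CostLimit {
    daily_max_usd: f64,
    monthly_max_usd: f64,
    action: CostAction,
}

/// Returns the configured action once either budget is reached, otherwise None.
fn check_cost(limit: &CostLimit, daily_spent_usd: f64, monthly_spent_usd: f64) -> Option<CostAction> {
    if daily_spent_usd >= limit.daily_max_usd || monthly_spent_usd >= limit.monthly_max_usd {
        Some(limit.action)
    } else {
        None
    }
}

/// Seconds until the next 00:00 UTC from a Unix timestamp.
/// Unix time has no leap seconds, so every day is exactly 86,400 s.
fn secs_until_utc_midnight(unix_secs: u64) -> u64 {
    const DAY: u64 = 86_400;
    DAY - unix_secs % DAY
}
```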

Released under the MIT License.