Request Lifecycle
This document traces the end-to-end journey of a user message through the y-agent system, from input to response.
High-Level Flow
Phase 1: Turn Preparation
Entry: ChatService::send_message() in y-service/src/chat.rs
- Load session from SessionManager using session_id
- Append user message to both transcripts:
  - Context transcript (LLM-facing, subject to compaction)
  - Display transcript (UI-facing, immutable)
- Resolve system prompt from PromptContext (rendered template with mode overlays)
- Load conversation history from the context transcript
- Filter tool definitions via AgentService::filter_tool_definitions() -- respects agent allowed_tools allowlist
- Build AgentExecutionConfig with session_id, messages, system_prompt, tool_definitions, max_iterations, trust_tier
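
The dual-transcript idea can be illustrated with a minimal sketch. Everything here is hypothetical (the `Message` and `DualTranscript` types, the `compact_context` helper, and the toy summary-based compaction); it only shows the property described above: both transcripts receive every message, while compaction rewrites the context transcript alone.

```rust
/// Hypothetical message type; the real one likely carries more metadata.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: &'static str,
    text: String,
}

/// Two views of the same conversation (illustrative, not the y-agent types).
#[derive(Default)]
struct DualTranscript {
    /// LLM-facing view; may be rewritten by compaction.
    context: Vec<Message>,
    /// UI-facing view; only ever appended to in this sketch.
    display: Vec<Message>,
}

impl DualTranscript {
    /// Append one message to both transcripts.
    fn append(&mut self, msg: Message) {
        self.context.push(msg.clone());
        self.display.push(msg);
    }

    /// Toy compaction (an assumption): replace everything except the last
    /// `keep` context messages with one summary message. The display
    /// transcript is never touched.
    fn compact_context(&mut self, keep: usize, summary: &str) {
        if self.context.len() <= keep {
            return;
        }
        let tail = self.context.split_off(self.context.len() - keep);
        self.context.clear();
        self.context.push(Message {
            role: "system",
            text: summary.to_string(),
        });
        self.context.extend(tail);
    }
}
```

After compaction the UI can still show the full conversation, while the LLM sees only the summary plus the most recent messages.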
Phase 2: Context Assembly
Entry: ContextPipeline::assemble_with_request() in y-context/src/pipeline.rs
The pipeline iterates registered ContextProvider implementations in priority order. Each provider appends ContextItem entries to the AssembledContext.
Each ContextItem carries:
- category -- SystemPrompt, Bootstrap, Memory, Knowledge, Skills, Tools, History, Status
- content -- the actual text injected into the prompt
- token_estimate -- estimated token count for budget tracking
- priority -- ordering weight within the category
Fail-open design: If any provider errors, the pipeline logs a warning and continues. Partial context is better than no context.
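
A minimal sketch of the fail-open loop follows. The trait shape, the `StaticProvider` helper, the `warnings` field (standing in for log output), and the rule that lower priority values run first are all assumptions made for illustration; only the names ContextProvider, ContextItem, and AssembledContext come from the description above.

```rust
/// Subset of the categories listed above, enough for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Category {
    SystemPrompt,
    Memory,
    History,
}

#[derive(Debug, Clone, PartialEq)]
struct ContextItem {
    category: Category,
    content: String,
    token_estimate: usize,
    priority: i32,
}

#[derive(Default)]
struct AssembledContext {
    items: Vec<ContextItem>,
    /// Stand-in for log output (hypothetical field).
    warnings: Vec<String>,
}

/// Assumed trait shape; the real trait is probably async.
trait ContextProvider {
    fn name(&self) -> &str;
    fn priority(&self) -> i32;
    fn provide(&self, ctx: &mut AssembledContext) -> Result<(), String>;
}

/// Hypothetical provider that yields one fixed item or a fixed error.
struct StaticProvider {
    name: &'static str,
    priority: i32,
    result: Result<ContextItem, String>,
}

impl ContextProvider for StaticProvider {
    fn name(&self) -> &str {
        self.name
    }
    fn priority(&self) -> i32 {
        self.priority
    }
    fn provide(&self, ctx: &mut AssembledContext) -> Result<(), String> {
        let item = self.result.clone()?;
        ctx.items.push(item);
        Ok(())
    }
}

/// Run providers in priority order (assumption: lower value runs first).
/// A failing provider is recorded and skipped, never fatal.
fn assemble(mut providers: Vec<Box<dyn ContextProvider>>) -> AssembledContext {
    providers.sort_by_key(|p| p.priority());
    let mut ctx = AssembledContext::default();
    for p in providers.iter() {
        if let Err(e) = p.provide(&mut ctx) {
            ctx.warnings.push(format!("provider {} failed: {}", p.name(), e));
        }
    }
    ctx
}
```

With three providers where the middle one fails, the assembled context still contains the items from the other two, plus one recorded warning.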
Phase 3: Agent Execution Loop
Entry: AgentService::execute() in y-service/src/agent_service/mod.rs
Initialization
- Set up DiagnosticsContext and trace scope (if tracing enabled)
- Build working_history from assembled context + conversation messages
- Initialize ToolExecContext with iteration counters, token accumulators, cancellation token
Loop Body (each iteration)
Intra-Turn Pruning
Between iterations, the system applies three pruning strategies:
- IntraTurnPruner::prune_working_history() -- removes failed tool call branches (error results that the LLM has already seen and reacted to)
- pruning::prune_old_tool_results() -- truncates or removes stale tool outputs from earlier iterations
- pruning::strip_historical_thinking() -- removes reasoning_content from previous turns (only the current turn's thinking is preserved)
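
The third strategy is the simplest to sketch. The version below is an illustration, not the actual pruning::strip_historical_thinking() implementation: the `Msg` type, its `turn` field, and the function signature are assumptions chosen so that "previous turn" has a concrete meaning.

```rust
/// Hypothetical message shape with a turn index.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    turn: u32,
    content: String,
    reasoning_content: Option<String>,
}

/// Drop reasoning_content from every message that belongs to a turn
/// earlier than `current_turn`. Returns how many messages were stripped.
fn strip_historical_thinking(history: &mut [Msg], current_turn: u32) -> usize {
    let mut stripped = 0;
    for m in history.iter_mut() {
        // `take()` only runs for older turns because `&&` short-circuits.
        if m.turn < current_turn && m.reasoning_content.take().is_some() {
            stripped += 1;
        }
    }
    stripped
}
```

The message content itself is left intact; only the thinking attached to older turns is discarded, which reduces prompt size without changing the visible conversation.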
Phase 4: LLM Communication
Entry: llm::call_llm() in y-service/src/agent_service/llm.rs
- Build ChatRequest with model, temperature, max_tokens, tools, thinking config
- Build RouteRequest with preferred provider/model, required tags, priority tier
- Provider pool selects a provider via TagBasedRouter (see Provider Pool)
- Call provider.chat_completion() or chat_completion_stream()
- On success: accumulate TokenUsage into cumulative counters
- On error: provider pool classifies error and may freeze the provider
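
Provider selection might look roughly like the sketch below. It is a guess at the general shape, not the TagBasedRouter algorithm: the `Provider` and `RouteRequest` fields, the `select` function, and the rule "preferred provider first, then the first eligible one" are assumptions, and the priority tier is left out entirely.

```rust
/// Hypothetical pool entry.
struct Provider {
    name: &'static str,
    tags: Vec<&'static str>,
    frozen: bool,
}

/// Hypothetical, reduced form of a route request.
struct RouteRequest {
    preferred: Option<&'static str>,
    required_tags: Vec<&'static str>,
}

/// Pick a provider that is not frozen and carries every required tag.
/// The preferred provider wins if it is eligible; otherwise fall back to
/// the first eligible provider in pool order.
fn select<'p>(pool: &'p [Provider], req: &RouteRequest) -> Option<&'p Provider> {
    let eligible = |p: &Provider| {
        !p.frozen && req.required_tags.iter().all(|t| p.tags.contains(t))
    };
    if let Some(name) = req.preferred {
        if let Some(p) = pool.iter().find(|p| p.name == name && eligible(*p)) {
            return Some(p);
        }
    }
    pool.iter().find(|p| eligible(*p))
}
```

A frozen preferred provider is skipped silently here; whether the real pool reports that fallback to the caller is not stated in this document.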
Phase 5: Tool Execution
Entry: tool_handling::handle_native_tool_calls() in y-service/src/agent_service/tool_handling.rs
Meta-Tool Interception
Before reaching the registry, certain tool names are intercepted and dispatched to specialized orchestrators:
| Tool Name | Orchestrator | Purpose |
|---|---|---|
| ToolSearch | ToolSearchOrchestrator | Activates tools into the LRU set |
| Task | TaskDelegationOrchestrator | Spawns sub-agent with isolated session |
| Plan | PlanOrchestrator | Structured planning via agent delegation |
| WorkflowCreate/List/... | WorkflowOrchestrator | DAG workflow CRUD |
| ScheduleCreate/List/... | WorkflowOrchestrator | Schedule management |
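
The interception step amounts to a name-based dispatch ahead of the registry lookup. The sketch below assumes a hypothetical `Dispatch` enum and `route_tool` function, and it matches the Workflow and Schedule families by prefix because the table elides their full name lists; the real code may match exact names instead.

```rust
/// Hypothetical dispatch targets, one per orchestrator in the table,
/// plus the ordinary tool registry.
#[derive(Debug, PartialEq)]
enum Dispatch {
    ToolSearch,
    TaskDelegation,
    Plan,
    Workflow,
    Registry,
}

/// Decide where a tool call goes before the registry is consulted.
fn route_tool(name: &str) -> Dispatch {
    match name {
        "ToolSearch" => Dispatch::ToolSearch,
        "Task" => Dispatch::TaskDelegation,
        "Plan" => Dispatch::Plan,
        // Assumption: prefix match covers the elided Workflow*/Schedule* names.
        n if n.starts_with("Workflow") || n.starts_with("Schedule") => Dispatch::Workflow,
        _ => Dispatch::Registry,
    }
}
```

Anything that is not a meta-tool falls through to the registry unchanged.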
Permission Model
The permission check follows a layered evaluation:
- PermissionModel::evaluate(tool_name, is_dangerous) -- config-based rules (allow/notify/ask/deny)
- session_permission_mode() -- session-level override (BypassPermissions converts Ask -> Allow, but never overrides Deny)
- Built-in trust auto-allow: if trust_tier == BuiltIn && agent_allowed_tools.contains(name) -> Allow
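
The three layers can be folded into one function, sketched below under stated assumptions. The enum names other than BypassPermissions and BuiltIn, the `resolve` signature, and above all the rule that a config-level Deny is final even for built-in agents are guesses; the document only states that the session override never overrides Deny.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Allow,
    Notify,
    Ask,
    Deny,
}

/// `Normal` is a hypothetical name for "no session override".
#[derive(Clone, Copy, PartialEq)]
enum SessionMode {
    Normal,
    BypassPermissions,
}

/// `UserDefined` is a hypothetical name for any non-built-in tier.
#[derive(Clone, Copy, PartialEq)]
enum TrustTier {
    BuiltIn,
    UserDefined,
}

/// `config` is the layer-1 result (what PermissionModel::evaluate returned).
fn resolve(
    config: Decision,
    mode: SessionMode,
    tier: TrustTier,
    allowed_tools: &[&str],
    tool: &str,
) -> Decision {
    // Assumption: Deny is final across all later layers.
    if config == Decision::Deny {
        return Decision::Deny;
    }
    let mut decision = config;
    // Layer 2: the session override upgrades Ask to Allow.
    if mode == SessionMode::BypassPermissions && decision == Decision::Ask {
        decision = Decision::Allow;
    }
    // Layer 3: built-in agents auto-allow tools on their own allowlist.
    if tier == TrustTier::BuiltIn && allowed_tools.contains(&tool) {
        decision = Decision::Allow;
    }
    decision
}
```

Keeping Deny as an early return makes the "never overrides Deny" guarantee easy to audit, since no later layer can see a denied call.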
Phase 6: Response Delivery
After the loop exits (no tool calls in the LLM response):
- build_final_result() constructs AgentExecutionResult with:
  - content -- the assistant's text response
  - new_messages -- all messages generated during the turn (assistant + tool results)
  - cumulative_usage -- total token counts across all iterations
  - cumulative_cost -- total cost in USD
  - iterations -- number of loop iterations
- ChatService appends the assistant reply to dual transcripts
- Emits TurnEvent::Complete through the progress channel
- Presentation layer renders the response to the user
Streaming Flow
For streaming responses, the flow differs at Phase 4:
The provider pool wraps the stream with an ActiveRequestGuard (RAII) to ensure the per-provider concurrency counter is decremented even if the consumer aborts mid-stream.
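
The RAII pattern described here can be sketched with the standard library alone. The struct layout, the `acquire` constructor, and the `GuardedStream` wrapper (a synchronous iterator standing in for an async stream) are assumptions; only the name ActiveRequestGuard and its purpose come from the text above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Increments a per-provider counter on creation, decrements on drop.
struct ActiveRequestGuard {
    counter: Arc<AtomicUsize>,
}

impl ActiveRequestGuard {
    fn acquire(counter: &Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::SeqCst);
        Self {
            counter: Arc::clone(counter),
        }
    }
}

impl Drop for ActiveRequestGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Wrapper that owns the guard for as long as the stream is alive.
/// An iterator is used here in place of an async stream.
struct GuardedStream<I> {
    inner: I,
    _guard: ActiveRequestGuard,
}

impl<I: Iterator> Iterator for GuardedStream<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }
}
```

Because the guard is a field of the wrapper, dropping the wrapper at any point, including after reading only part of the stream, runs `Drop` and releases the concurrency slot.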
HITL (Human-in-the-Loop) Interrupts
Two types of HITL interrupts can pause the turn loop:
Permission Request
When a tool requires user approval (PermissionDecision::Ask):
- A oneshot::channel() is created
- The Sender is inserted into ctx.pending_permissions
- TurnEvent::PermissionRequest is emitted to the presentation layer
- Execution blocks on receiver.await
- User responds with Approve / AllowAllForSession / Deny
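
The handshake above can be sketched without an async runtime by substituting a std::sync::mpsc channel for the oneshot channel, with a blocking `recv()` in place of `receiver.await`. The `TurnContext` type, the `u64` request id used as the map key, and both method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq)]
enum PermissionResponse {
    Approve,
    AllowAllForSession,
    Deny,
}

/// Hypothetical stand-in for the per-turn context.
#[derive(Default)]
struct TurnContext {
    pending_permissions: HashMap<u64, Sender<PermissionResponse>>,
    next_id: u64,
}

impl TurnContext {
    /// Agent side: register a pending request. The caller then emits the
    /// permission event and blocks on the returned receiver.
    fn request_permission(&mut self) -> (u64, Receiver<PermissionResponse>) {
        let (tx, rx) = channel();
        let id = self.next_id;
        self.next_id += 1;
        self.pending_permissions.insert(id, tx);
        (id, rx)
    }

    /// Presentation side: deliver the user's answer. Returns false when
    /// the id is unknown or the waiting side has already gone away.
    fn respond(&mut self, id: u64, response: PermissionResponse) -> bool {
        match self.pending_permissions.remove(&id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}
```

Removing the sender from the map when responding gives each request exactly one answer, which mirrors the single-use nature of a oneshot channel.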
AskUser Tool
When the agent invokes the AskUser tool:
- The tool output is delivered to the user
- A oneshot::channel() is inserted into ctx.pending_interactions
- Execution blocks until the user provides a response
- The response becomes the tool result, and the loop continues
Error Recovery
| Error Type | Behavior |
|---|---|
| LLM quota/rate-limit | Provider frozen, pool retries with next available provider |
| LLM auth error | Provider frozen permanently |
| Tool execution error | Error string returned as tool result, LLM sees it and adapts |
| Tool loop limit exceeded | Turn ends with ToolLoopLimitExceeded, partial results preserved |
| Cancellation | Turn ends with Cancelled, partial results and messages preserved |
| Context overflow | ContextWindowGuard triggers compaction or pruning |
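
The first two rows of the table can be read as a classification step followed by failover. The sketch below is one possible reading, with several assumptions: the `LlmError` and `Freeze` enums, the 60-second cooldown, failover after an auth error, and the choice not to retry unclassified errors are all hypothetical.

```rust
use std::time::Duration;

/// Hypothetical error categories.
#[derive(Debug, PartialEq)]
enum LlmError {
    RateLimited,
    Auth,
    Other,
}

/// Hypothetical freeze outcomes.
#[derive(Debug, PartialEq)]
enum Freeze {
    NotFrozen,
    Cooldown(Duration),
    Permanent,
}

fn classify(err: &LlmError) -> Freeze {
    match err {
        // Assumption: a fixed cooldown; the real value is not documented here.
        LlmError::RateLimited => Freeze::Cooldown(Duration::from_secs(60)),
        LlmError::Auth => Freeze::Permanent,
        LlmError::Other => Freeze::NotFrozen,
    }
}

/// Try providers in order. A freeze-worthy error freezes that provider and
/// moves on to the next one; any other error stops the attempt (assumption).
/// Returns the reply, if any, plus the providers frozen along the way.
fn call_with_failover<F>(providers: &[&str], mut call: F) -> (Option<String>, Vec<(String, Freeze)>)
where
    F: FnMut(&str) -> Result<String, LlmError>,
{
    let mut frozen = Vec::new();
    for &name in providers {
        match call(name) {
            Ok(reply) => return (Some(reply), frozen),
            Err(e) => match classify(&e) {
                Freeze::NotFrozen => return (None, frozen),
                f => frozen.push((name.to_string(), f)),
            },
        }
    }
    (None, frozen)
}
```

Returning the frozen list alongside the reply lets the caller update pool state in one place rather than inside the retry loop.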
