
Architecture

Audience: Engineers orienting to the monorepo for design or review work.
Outcome: After reading, you know which package owns which concern, how they communicate, and where to dive deeper.
Last verified: 2026-04-22

This document is the umbrella. It does not re-derive system internals — it points at the existing per-package architecture docs and the ADRs that ratify each decision. If you are new to the repo, read this in full first; if you are in a specific area, jump to the per-package link.


1. System at a glance

graph TB
    subgraph Clients["Clients"]
        CLI["agentic CLI<br/>Typer"]
        UI["React 19 Dashboard<br/>Vite 6 · @xyflow/react"]
        EXT["External callers<br/>REST · WS · SSE"]
    end

    subgraph Runtime["agentic-workflows-v2/"]
        API["FastAPI Server<br/>REST · WebSocket · SSE · 500-event replay"]
        ADR["AdapterRegistry<br/>singleton"]
        NATIVE["Native DAG Engine<br/>Kahn's algorithm · asyncio.wait FIRST_COMPLETED"]
        LG["LangGraph Engine<br/>StateGraph · checkpointing"]
        ROUTER["SmartModelRouter<br/>tier routing · circuit breakers · stats persistence"]
        RAG["RAG Pipeline<br/>chunk · embed · index · retrieve · assemble"]
        AGENTS["Agents<br/>Base + Coder + Reviewer + Architect + Orchestrator"]
    end

    subgraph Eval["agentic-v2-eval/"]
        SCORER["Scorer<br/>rubric-based"]
        RUNNERS["Batch · Streaming · AsyncStreaming Runners"]
        REPORTER["Reporters<br/>json · markdown · html"]
    end

    subgraph Shared["tools/ (agentic-tools)"]
        LLM["LLMClient<br/>10 providers"]
        BENCH["Benchmarks"]
        CACHE["Response cache"]
    end

    CLI --> ADR
    UI -->|REST · WS| API
    EXT -->|REST · WS · SSE| API
    API --> ADR
    ADR --> NATIVE
    ADR --> LG
    NATIVE --> AGENTS
    LG --> AGENTS
    AGENTS --> ROUTER
    AGENTS --> RAG
    ROUTER -->|provider calls| LLM
    API -->|scores| SCORER
    SCORER --> RUNNERS
    RUNNERS --> REPORTER

    classDef rt fill:#4a90d9,stroke:#2c5f8a,color:#fff
    classDef ev fill:#00b894,stroke:#008060,color:#fff
    classDef sh fill:#fdcb6e,stroke:#c8a034,color:#333
    class API,ADR,NATIVE,LG,ROUTER,RAG,AGENTS rt
    class SCORER,RUNNERS,REPORTER ev
    class LLM,BENCH,CACHE sh

The three Python packages never import one another's source trees. They communicate via:

  • tools/ is published as a wheel (agentic-tools) — the runtime and eval packages consume it like any other library.
  • The runtime exposes agentic-v2-eval through its REST API (POST /runs/:id/evaluation, GET /runs/:id/evaluation) — the eval framework does not import runtime internals.

2. Per-package deep dives

| Package | Entry point | Deep dive |
| --- | --- | --- |
| Runtime | agentic-workflows-v2/agentic_v2/ | architecture-runtime.md |
| UI | agentic-workflows-v2/ui/src/ | architecture-ui.md |
| Evaluation | agentic-v2-eval/src/agentic_v2_eval/ | architecture-eval.md |
| Shared tools | tools/ | architecture-tools.md |
| Cross-package integration | | integration-architecture.md |

Additional supporting documents:


3. The five load-bearing mechanisms

These are the places where a change ripples across the system. Understand these before proposing architectural work.

3.1 Adapter registry

AdapterRegistry is a process-wide singleton in agentic_v2/adapters/registry.py. Engines register with a name (native, langchain), the CLI resolves --adapter <name> at runtime, and tests reset the singleton via an autouse fixture to prevent cross-test leakage.

  • Why it exists: ADR-001 — dual execution engine.
  • Current default: langchain (configurable per run).
  • Direction of travel: ADR-013 — native DAG as the single long-term engine.
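
The registration, resolution, and reset behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in agentic_v2/adapters/registry.py: the method names `instance`, `register`, `resolve`, and `reset` and the factory-based design are assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class AdapterRegistry:
    """Illustrative process-wide registry mapping adapter names to engine factories."""

    _instance: Optional["AdapterRegistry"] = None

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    @classmethod
    def instance(cls) -> "AdapterRegistry":
        # Lazily create the single shared registry.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def reset(cls) -> None:
        # Drop the shared registry so the next caller starts from a clean slate.
        cls._instance = None

    def register(self, name: str, factory: Callable[[], object]) -> None:
        if name in self._factories:
            raise ValueError(f"adapter {name!r} is already registered")
        self._factories[name] = factory

    def resolve(self, name: str) -> object:
        # Look up the factory by name and build an engine from it.
        try:
            factory = self._factories[name]
        except KeyError:
            known = ", ".join(sorted(self._factories)) or "none"
            raise KeyError(f"unknown adapter {name!r}; registered: {known}") from None
        return factory()
```

Under these assumptions, the CLI would call something like `AdapterRegistry.instance().resolve(adapter_name)` with the value of --adapter, and an autouse test fixture would call `AdapterRegistry.reset()` around each test so registrations made by one test are not visible to the next.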

3.2 Typed execution-event wire format

contracts/events.py defines a Pydantic discriminated union covering workflow_start, step_start, step_end, step_complete, step_error, workflow_end, evaluation_start, evaluation_complete. WebSocket and SSE broadcasts validate before emit. TypeScript interfaces in ui/src/api/types.ts mirror this union by hand — drift is detected by convention, not yet by automation.

  • Ratified by: ADR-014.
  • Related: the 500-event replay buffer in server/websocket.py — clients reconnecting mid-run receive missed events.
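
To make the replay behaviour concrete, here is a small sketch of a bounded buffer that lets a reconnecting client catch up. It is not the implementation in server/websocket.py: the `ReplayBuffer` class, the per-event sequence number, and the `since` method are hypothetical, chosen only to illustrate the idea.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ReplayBuffer:
    """Keeps only the most recent `capacity` events, each tagged with a sequence number."""

    def __init__(self, capacity: int = 500) -> None:
        # deque(maxlen=...) silently discards the oldest entry once full.
        self._events: Deque[Tuple[int, Dict]] = deque(maxlen=capacity)
        self._next_seq = 0

    def append(self, event: Dict) -> int:
        """Store an event and return the sequence number assigned to it."""
        seq = self._next_seq
        self._events.append((seq, event))
        self._next_seq += 1
        return seq

    def since(self, last_seen_seq: int) -> List[Dict]:
        """Return buffered events newer than the last one the client saw."""
        return [event for seq, event in self._events if seq > last_seen_seq]
```

A client that disconnected after sequence number 41 would, in this sketch, ask for `buffer.since(41)` on reconnect. Events that have already been evicted from the buffer cannot be replayed.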

3.3 SLO gates in git

Time-to-first-span p95 and nightly flake rate are stored as rolling windows in git — measurements are appended to JSON artifacts committed on each CI run, and the gate reads the window, not a fresh sample. This keeps the signal stable across single bad runs.
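
As a rough illustration of why gating on the window is more stable than gating on a fresh sample, the sketch below appends a measurement to a JSON file and evaluates a nearest-rank p95 over the retained window. The file layout (a flat JSON list), the window size, and the function names are assumptions for the example, not the repository's actual artifact format.

```python
import json
from pathlib import Path
from typing import List


def append_measurement(path: Path, value: float, window_size: int = 30) -> List[float]:
    """Append one measurement and keep only the newest `window_size` entries."""
    window = json.loads(path.read_text()) if path.exists() else []
    window.append(value)
    window = window[-window_size:]
    path.write_text(json.dumps(window))
    return window


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not values:
        raise ValueError("p95 of an empty window is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def gate_passes(window: List[float], threshold: float) -> bool:
    """The gate evaluates the whole window, never a single fresh sample."""
    return p95(window) <= threshold
```

With twenty measurements in the window, one outlier occupies only the top rank and leaves the nearest-rank p95 unchanged, which is the stability property described above.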

3.4 SmartModelRouter

Maps a tier (for example tier3_analyst) to a capability, and then to the best available model at runtime. Selection is health-weighted and combines exponential cooldowns, circuit breakers, stats persisted across restarts, and awareness of the Retry-After header.

  • Ratified by: ADR-002.
  • Provider default for CI: GitHub Models via GITHUB_TOKEN — see ADR-016.
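
The interplay of circuit breakers, exponential cooldowns, and Retry-After can be sketched for a single model as below. This is a simplified illustration rather than SmartModelRouter's code: the `CircuitBreaker` class, its thresholds, and the doubling schedule are all assumptions, and health-weighted selection and stats persistence are left out.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Per-model breaker: opens after repeated failures, with a cooldown that doubles per trip."""

    def __init__(
        self,
        failure_threshold: int = 3,
        base_cooldown: float = 1.0,
        max_cooldown: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._base_cooldown = base_cooldown
        self._max_cooldown = max_cooldown
        self._clock = clock
        self._consecutive_failures = 0
        self._trips = 0
        self._open_until = 0.0

    def allow(self) -> bool:
        """True when the model may be called (breaker closed or cooldown elapsed)."""
        return self._clock() >= self._open_until

    def record_success(self) -> None:
        # A success clears both the failure streak and the backoff level.
        self._consecutive_failures = 0
        self._trips = 0

    def record_failure(self, retry_after: Optional[float] = None) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures < self._failure_threshold:
            return
        # Exponential cooldown: base, 2x base, 4x base, ... capped at max_cooldown.
        cooldown = min(self._base_cooldown * (2 ** self._trips), self._max_cooldown)
        # Assumption: a server-provided Retry-After hint wins when it is longer.
        if retry_after is not None:
            cooldown = max(cooldown, retry_after)
        self._open_until = self._clock() + cooldown
        self._trips += 1
        self._consecutive_failures = 0
```

In this sketch a router would skip any model whose breaker returns False from `allow()` and fall through to the next candidate for the tier. The injectable `clock` exists only to make the behaviour testable without sleeping.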

3.5 RAG pipeline

Thirteen modules in agentic_v2/rag/: loader → recursive chunker → embedder (content-hash dedup) → cosine vectorstore + BM25 keyword index → RRF hybrid retriever → token-budget assembler. Full OTEL tracing. Memory backed by MemoryStoreProtocol (InMemoryStore or RAGMemoryStore).
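
The RRF hybrid retrieval step merges the vector ranking and the BM25 ranking by rank position rather than by raw score. A minimal sketch of reciprocal rank fusion follows; the function name, the string document IDs, and the constant k = 60 (a commonly used default) are assumptions and may differ from what agentic_v2/rag/ does.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result of a list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]  # hypothetical cosine-similarity ranking
bm25_hits = ["b", "c", "d"]    # hypothetical BM25 keyword ranking
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists ("b" and "c" here) rise above documents that appear in only one, even when a single-list document was ranked first by its retriever.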


4. The decision record

| ADR | Domain | Status |
| --- | --- | --- |
| 001 | Dual execution engine | Accepted (superseded by 013) |
| 002 | SmartModelRouter circuit breakers | Accepted |
| 003 | Deep research supervisor | Superseded → 007 |
| 007 | Multidimensional classification + stop policy | Proposed |
| 008 | Test value taxonomy | Accepted |
| 009 | Scoring enhancements | Accepted |
| 010 | Commit-driven A/B eval harness | Proposed |
| 011 | Eval harness API design | Proposed |
| 012 | UI evaluation hub | Proposed |
| 013 | Native DAG as single engine | Accepted |
| 014 | Pydantic wire format for execution events | Accepted |
| 015 | SLO rolling window in git | Accepted |
| 016 | GitHub Models as default E2E provider | Accepted |

ADRs 004–006 are intentionally unused — the gap is documented in adr/ADR-INDEX.md and should not be reclaimed.


5. What this document is not