# Agentic Runtimes
Production-grade multi-agent AI orchestration with DAG execution, circuit-breaker model routing, and rubric-based evaluation.
Build workflows where agents occupy specific roles — planner, researcher, coder, reviewer — and coordinate through declarative YAML files.
## Quick Example

```python
import asyncio

from agentic_v2.workflows import run_workflow

async def main() -> None:
    # Run a code review workflow with parallel analysis
    result = await run_workflow(
        "code_review",
        inputs={"code_path": "src/api/handlers.py"},
    )
    print(result.outputs["final_report"])  # Consolidated review
    print(result.metadata["agents_used"])  # ["architect", "reviewer"]
    print(result.cost)                     # Token usage across all LLM calls

asyncio.run(main())
```
Works with 8+ LLM providers and automatic failover, with no config changes required:

```text
# Tiered routing: gpt-4o → claude-sonnet → gemini-2.5-flash
# If one fails or rate-limits, the next is tried automatically
```
## Core Features

- **DAG Executor**: Kahn's algorithm with `asyncio` parallel dispatch, conditional branching, and cascade failure propagation
- **Tiered Model Router**: Health-weighted selection, adaptive cooldowns, circuit breakers across 8+ providers
- **Evaluation Framework**: YAML-defined rubrics, multidimensional scoring, LLM-as-judge integration
- **Zero-credential dev mode**: `AGENTIC_NO_LLM=1` runs end-to-end with placeholder backends — all 379 tests pass
## How it works
A workflow definition flows through a deterministic pipeline — YAML loader, graph compiler, DAG executor, model router — before reaching an LLM provider. Every stage emits OpenTelemetry traces and Pydantic-validated artifacts.
```mermaid
flowchart LR
    A[YAML Workflow] --> B[Loader<br/>Pydantic v2 validate]
    B --> C[Graph Compiler<br/>Kahn topo sort]
    C --> D[DAG Executor<br/>asyncio fan-out / fan-in]
    D --> E[Model Router<br/>tier · health · circuit breaker]
    E --> F[(Provider)]
    F -.observation.-> D
    D --> G[Artifacts<br/>Pydantic contracts]
    style A fill:#1e3a8a,color:#fff,stroke:#1e40af
    style F fill:#0e7490,color:#fff,stroke:#155e75
    style G fill:#16a34a,color:#fff,stroke:#15803d
```
## Quick Start in 60 seconds

No API keys required. The runtime ships with a deterministic placeholder backend that exercises every code path the real models do.

```bash
# 1. Clone and install
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes/agentic-workflows-v2
pip install -e ".[dev,server]"

# 2. Enable zero-credential mode
export AGENTIC_NO_LLM=1   # Windows: $env:AGENTIC_NO_LLM=1

# 3. Run a workflow
agentic run test_deterministic --input '{"task":"hello"}'
```

In about a minute you should see the DAG executor emit a structured run record, including step timings, tool calls, and a final scored artifact. Detailed walkthrough →
## Four capabilities that matter

**DAG Executor** — Kahn's algorithm scheduling with `asyncio` parallel dispatch, conditional branching, and cascade failure propagation. Parallel branches execute concurrently; the fan-in waits for all required predecessors before releasing the next step.

**Tiered Model Router** — Maps workflow steps to capability tiers, not specific models. Health-weighted selection, adaptive cooldowns, and circuit breakers across 8+ providers (OpenAI, Anthropic, Gemini, Azure OpenAI, Azure Foundry, GitHub Models, Ollama, local ONNX).

**Evaluation Framework** — YAML-defined rubrics, multidimensional scoring (coverage / quality / agreement / recency), LLM-as-judge integration, and 0.0–10.0 scoring. Production gating is driven by `coverage_score >= 0.80` — not pass/fail unit tests.

**Zero-credential dev mode** — `AGENTIC_NO_LLM=1` runs end-to-end on placeholder backends. All 379 tests pass in this mode; CI runs exclusively in it.
## By the numbers

| Metric | Value |
|---|---|
| Python source lines | ~187,000 |
| Test files | 100+ |
| Tests passing | 379 |
| Workflow definitions | 6 YAML |
| LLM providers | 8+ |
| ADRs | 17 |
## Project Structure

```text
agentic-runtimes/
├── agentic-workflows-v2/   # Core runtime (Python 3.11+)
│   ├── agentic_v2/         # Execution engine, agents, models, RAG
│   ├── ui/                 # React 19 dashboard
│   └── tests/              # 100+ test files
├── agentic-v2-eval/        # Evaluation framework
├── tools/                  # Shared LLM client + utilities
└── docs/                   # Architecture, guides, ADRs
```
## What's Different

### Why DAG over Pipeline?

Multi-agent workflows rarely execute linearly. After planning, two specialist analysts run in parallel over the same evidence. Their outputs merge into verification, which conditionally triggers another research round. A pipeline would serialize these stages unnecessarily; a DAG with `asyncio.wait(FIRST_COMPLETED)` releases each downstream step the moment its dependencies finish, maximizing throughput.
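Below is a minimal sketch of that scheduling discipline, not the project's actual executor: a toy dict-based DAG dispatched with Kahn-style in-degree tracking and `asyncio.wait(FIRST_COMPLETED)`. The names `DAG`, `run_step`, and `execute` are illustrative, not the runtime's API.

```python
import asyncio

# Illustrative DAG: step -> set of prerequisite steps (not the project's schema)
DAG = {"plan": set(), "analyze_a": {"plan"}, "analyze_b": {"plan"},
       "verify": {"analyze_a", "analyze_b"}}

async def run_step(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an agent/LLM call
    return name

async def execute(dag: dict[str, set[str]]) -> None:
    pending_deps = {step: set(deps) for step, deps in dag.items()}
    running: dict[asyncio.Task[str], str] = {}

    def launch_ready() -> None:
        # Kahn-style: start every step whose prerequisites are all done
        for step, deps in list(pending_deps.items()):
            if not deps:
                del pending_deps[step]
                running[asyncio.create_task(run_step(step))] = step

    launch_ready()  # "plan" launches first; both analysts fan out after it
    while running:
        done, _ = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            finished = running.pop(task)
            for deps in pending_deps.values():
                deps.discard(finished)  # decrement in-degree of dependents
        launch_ready()  # release newly unblocked steps immediately

asyncio.run(execute(DAG))
```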
### Why Tiered Model Routing?

Model names change, endpoints go down, pricing shifts. Each agent is assigned a capability tier (e.g., `tier3_analyst`). The router resolves the tier to the best available model at runtime, walking a fallback chain when providers fail:
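As a hedged sketch of that resolution step (the tier names, fallback chains, and `OPEN_CIRCUITS` set below are invented for illustration, not the project's actual configuration or API):

```python
# Hypothetical fallback chains per capability tier
TIER_CHAINS = {
    "tier1_fast": ["gemini-2.5-flash", "gpt-4o-mini", "ollama/llama3"],
    "tier3_analyst": ["gpt-4o", "claude-sonnet", "gemini-2.5-flash"],
}

# Models currently tripped by their circuit breaker (tracked dynamically in practice)
OPEN_CIRCUITS = {"gpt-4o"}

def resolve(tier: str) -> str:
    """Return the first healthy model in the tier's fallback chain."""
    for model in TIER_CHAINS[tier]:
        if model not in OPEN_CIRCUITS:  # breaker closed: candidate is usable
            return model
    raise RuntimeError(f"No healthy model available for tier {tier!r}")

print(resolve("tier3_analyst"))  # -> "claude-sonnet" while gpt-4o is cooling down
```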
Health-weighted selection, adaptive cooldowns, and circuit breakers ensure resilient execution.
### Why Rubric-Based Scoring?
LLM outputs resist binary pass/fail evaluation. The scoring system uses weighted criteria, multidimensional classification (S/A/B/C/D/F tiers), and LLM-as-judge for subjective quality assessment.
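A compact sketch of weighted-criteria scoring under assumed numbers: the criterion weights, scores, and letter-grade cutoffs below are illustrative, not the framework's defaults.

```python
# Hypothetical rubric: criterion -> (weight, score on the 0.0-10.0 scale)
RUBRIC = {
    "coverage":  (0.4, 8.5),
    "quality":   (0.3, 7.0),
    "agreement": (0.2, 9.0),
    "recency":   (0.1, 6.0),
}

def weighted_score(rubric: dict[str, tuple[float, float]]) -> float:
    # Weighted sum across all criteria
    return sum(weight * score for weight, score in rubric.values())

def letter_tier(score: float) -> str:
    # Illustrative cutoffs for the S/A/B/C/D/F classification
    for cutoff, tier in [(9.0, "S"), (8.0, "A"), (7.0, "B"), (6.0, "C"), (5.0, "D")]:
        if score >= cutoff:
            return tier
    return "F"

total = weighted_score(RUBRIC)
print(f"{total:.2f} -> tier {letter_tier(total)}")  # 7.90 -> tier B
```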
## Workflow Definitions

The engine ships with 6 production workflow definitions:

| Workflow | Pattern | Description |
|---|---|---|
| `code_review` | Fan-out / fan-in | Parse code → parallel architecture + quality reviews → synthesis |
| `bug_resolution` | Sequential with verification | Reproduce → root cause → fix → test → verify |
| `fullstack_generation` | Parallel sub-steps | API design → frontend + backend in parallel → integration |
| `iterative_review` | Multi-loop with bounded iteration | Review → feedback → revise until quality gates pass |
| `conditional_branching` | Conditional DAG | Steps execute or skip based on runtime conditions |
| `test_deterministic` | Tier-0 only | Deterministic step for testing without LLM calls |
## Quick Start

```bash
# Clone and setup
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes
just setup

# Configure environment (or use AGENTIC_NO_LLM=1 for placeholder mode)
cp .env.example .env
# Add your API keys to .env

# Run a workflow
cd agentic-workflows-v2
agentic run test_deterministic --input test-input.json

# Start the dashboard
uvicorn agentic_v2.server.app:create_app --factory --reload --port 8010
# In another terminal:
cd ui && npm run dev
```

See Quick Start for the full guide.
## Documentation

### For New Developers
- Quick Start — 5-minute to 1-hour first-run path
- Architecture Overview — System map and load-bearing mechanisms
- Development Guide — Prerequisites, installation, dev servers, testing
### For Feature Work
- Backend: Architecture — Runtime + API Contracts
- UI: Architecture — UI + Component Inventory
- Workflows: Workflow Authoring + Pattern Catalog
- Evaluation: Architecture — Eval
### Reference
- Roadmap — Shipped epics and proposed work
- Known Limitations — Honest accounting of current caveats
- ADR Index — Architecture decision records
- Glossary — Term definitions
## Development

Code quality toolchain enforced via pre-commit hooks. A typical invocation (the hook set itself lives in the repo's `.pre-commit-config.yaml`):
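```bash
# Run every configured hook across the repo (assumes `pre-commit install` was run once)
pre-commit run --all-files
```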
Run tests (the command below assumes pytest as the runner):
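```bash
# Full suite in zero-credential mode; all 379 tests pass without API keys
AGENTIC_NO_LLM=1 pytest
```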
See CONTRIBUTING.md for the full contributor workflow.
## License

This project is licensed under the MIT License — see LICENSE for details.
## What developers say
The DAG semantics finally let me reason about an agent system the way I reason about a build graph. Conditional branches, cascade failures, and parallel fan-out are first-class — not glued on.
— Contributor
Rubric-based gating replaced our brittle string-match assertions overnight. We now ship LLM features behind the same `coverage_score >= 0.80` threshold that gates the rest of the platform.
— Contributor
`AGENTIC_NO_LLM=1` is the unsung hero. Local development without burning tokens, plus CI that runs the full suite without a single secret in the repo. Federal-friendly by default.
— Contributor
## Where to go next

### Getting started
- Overview — install paths and the 60-second tour
- Installation — Python, Node, and provider setup
- Quick Start — your first workflow run, narrated
- First Workflow — write a two-step DAG from scratch
- No-LLM Dev Mode — zero-credential development
### Architecture
- Architecture (Umbrella) — system map across the four parts
- Runtime Engine — DAG executor, model router, RAG pipeline
- Evaluation Framework — rubrics, evaluators, runners
- Tools & Providers — multi-provider LLM client and tool registry
- UI Dashboard — React 19 SPA and live streaming
### Reference
- Workflow Reference — every production workflow with description
- Pattern Catalog — reusable agentic patterns
- Glossary — terminology used across the docs
- ADR Index — every architecture decision, dated and rationalized
- Roadmap — what is shipped, what is in flight, what is proposed
- Known Limitations — honest accounting of caveats
### Read the architecture deep-dive →
The runtime engine combines two execution backends, an adapter registry, a tiered model router, and a full RAG pipeline. The architecture document is the canonical map of how those pieces fit together.