
Agentic Runtimes

Production-grade multi-agent AI orchestration with DAG execution, circuit-breaker model routing, and rubric-based evaluation.

Build workflows where agents occupy specific roles — planner, researcher, coder, reviewer — and coordinate through declarative YAML files.

Quick Example

import asyncio

from agentic_v2.workflows import run_workflow

async def main():
    # Run a code review workflow with parallel analysis
    result = await run_workflow(
        "code_review",
        inputs={"code_path": "src/api/handlers.py"},
    )

    print(result.outputs["final_report"])  # Consolidated review
    print(result.metadata["agents_used"])  # ["architect", "reviewer"]
    print(result.cost)                     # Token usage across all LLM calls

asyncio.run(main())

Works with 8+ LLM providers and automatic failover — no config changes required:

# Tiered routing: gpt-4o → claude-sonnet → gemini-2.5-flash
# If one fails or rate-limits, the next is tried automatically

Core Features

  • DAG Executor: Kahn's algorithm with asyncio parallel dispatch, conditional branching, and cascade failure propagation
  • Tiered Model Router: Health-weighted selection, adaptive cooldowns, circuit breakers across 8+ providers
  • Evaluation Framework: YAML-defined rubrics, multidimensional scoring, LLM-as-judge integration
  • Zero-credential dev mode: AGENTIC_NO_LLM=1 runs end-to-end with placeholder backends — all 379 tests pass



How it works

A workflow definition flows through a deterministic pipeline — YAML loader, graph compiler, DAG executor, model router — before reaching an LLM provider. Every stage emits OpenTelemetry traces and Pydantic-validated artifacts.

flowchart LR
    A[YAML Workflow] --> B[Loader<br/>Pydantic v2 validate]
    B --> C[Graph Compiler<br/>Kahn topo sort]
    C --> D[DAG Executor<br/>asyncio fan-out / fan-in]
    D --> E[Model Router<br/>tier · health · circuit breaker]
    E --> F[(Provider)]
    F -.observation.-> D
    D --> G[Artifacts<br/>Pydantic contracts]
    style A fill:#1e3a8a,color:#fff,stroke:#1e40af
    style F fill:#0e7490,color:#fff,stroke:#155e75
    style G fill:#16a34a,color:#fff,stroke:#15803d

Quick Start in 60 seconds

No API keys required. The runtime ships with a deterministic placeholder backend that exercises the same code paths as the real models.
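To give a sense of how a zero-credential mode like this can work, here is a minimal sketch of a placeholder backend gated on the AGENTIC_NO_LLM flag. The `get_backend` function and its wiring are hypothetical illustrations, not the runtime's actual adapter API:

```python
import hashlib
import os

def get_backend():
    """Return a deterministic placeholder backend when AGENTIC_NO_LLM=1.

    Hypothetical sketch; the real runtime resolves backends through its
    adapter registry rather than a single function like this.
    """
    if os.environ.get("AGENTIC_NO_LLM") == "1":
        def placeholder(prompt: str) -> str:
            # Same prompt -> same output, so tests are reproducible
            # without burning tokens or requiring credentials.
            digest = hashlib.sha256(prompt.encode()).hexdigest()[:8]
            return f"[placeholder:{digest}] response for: {prompt[:40]}"
        return placeholder
    raise RuntimeError("real provider clients require credentials")
```

Because the output is a pure function of the prompt, every downstream code path — routing, artifact validation, scoring — can run in CI without a single secret.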

# 1. Clone and install
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes/agentic-workflows-v2
pip install -e ".[dev,server]"

# 2. Enable zero-credential mode
export AGENTIC_NO_LLM=1   # Windows: $env:AGENTIC_NO_LLM=1

# 3. Run a workflow
agentic run test_deterministic --input '{"task":"hello"}'

In about a minute you should see the DAG executor emit a structured run record, including step timings, tool calls, and a final scored artifact. Detailed walkthrough →


Four capabilities that matter

DAG Executor — Kahn's algorithm scheduling with asyncio parallel dispatch, conditional branching, and cascade failure propagation. Parallel branches execute concurrently; the fan-in waits for all required predecessors before releasing the next step.
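The scheduling idea can be sketched in a few lines. This is an illustrative simplification (a level-synchronous wave variant of Kahn's algorithm); `run_dag` and the graph shape are assumptions, not the executor's actual API:

```python
import asyncio
from collections import deque

async def run_dag(graph, run_step):
    """Execute a DAG of steps, dispatching ready steps concurrently.

    graph maps step -> set of predecessor steps.
    run_step is an async callable invoked once per step.
    """
    indegree = {step: len(preds) for step, preds in graph.items()}
    successors = {step: set() for step in graph}
    for step, preds in graph.items():
        for p in preds:
            successors[p].add(step)

    ready = deque(s for s, d in indegree.items() if d == 0)
    done = []
    while ready:
        # Fan out: every currently-ready step runs concurrently.
        batch = list(ready)
        ready.clear()
        await asyncio.gather(*(run_step(s) for s in batch))
        done.extend(batch)
        # Fan in: a successor is released only when all of its
        # predecessors have completed.
        for s in batch:
            for succ in successors[s]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
    return done
```

The real executor dispatches more eagerly (see the `asyncio.wait` discussion below under "Why DAG over Pipeline?"), but the invariant is the same: no step starts before all of its required predecessors finish.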

Tiered Model Router — Maps workflow steps to capability tiers, not specific models. Health-weighted selection, adaptive cooldowns, and circuit breakers across 8+ providers (OpenAI, Anthropic, Gemini, Azure OpenAI, Azure Foundry, GitHub Models, Ollama, local ONNX).

Evaluation Framework — YAML-defined rubrics, multidimensional scoring (coverage / quality / agreement / recency), LLM-as-judge integration, and 0.0–10.0 scoring. Production gating driven by coverage_score >= 0.80 — not pass/fail unit tests.

Zero-credential dev mode — AGENTIC_NO_LLM=1 runs end-to-end on placeholder backends. All 379 tests pass in this mode; CI runs exclusively in it.

| Metric | Value |
| --- | --- |
| Python source lines | ~187,000 |
| Test files | 100+ |
| Tests passing | 379 |
| Workflow definitions | 6 YAML |
| LLM providers | 8+ |
| ADRs | 17 |

Project Structure

agentic-runtimes/
├── agentic-workflows-v2/    # Core runtime (Python 3.11+)
│   ├── agentic_v2/          # Execution engine, agents, models, RAG
│   ├── ui/                  # React 19 dashboard
│   └── tests/               # 100+ test files
├── agentic-v2-eval/         # Evaluation framework
├── tools/                   # Shared LLM client + utilities
└── docs/                    # Architecture, guides, ADRs

What's Different

Why DAG over Pipeline?

Multi-agent workflows rarely execute linearly. After planning, two specialist analysts run in parallel over the same evidence. Their outputs merge into verification, which conditionally triggers another research round. A pipeline would serialize unnecessarily; a DAG with asyncio.wait(FIRST_COMPLETED) maximizes throughput.
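The throughput claim comes down to how completed branches release downstream work. A minimal sketch of the `asyncio.wait(FIRST_COMPLETED)` pattern, with stand-in agents and made-up delays (the `analyst` coroutine is illustrative, not a runtime API):

```python
import asyncio

async def analyst(name, delay):
    # Stand-in for a specialist agent; delay models LLM latency.
    await asyncio.sleep(delay)
    return f"{name}: findings"

async def main():
    pending = {
        asyncio.create_task(analyst("security", 0.01)),
        asyncio.create_task(analyst("performance", 0.05)),
    }
    results = []
    while pending:
        # Release downstream work as soon as any branch completes,
        # instead of serializing the branches pipeline-style.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            results.append(task.result())
    return results
```

A pipeline would wait for both analysts before doing anything; here the faster branch's output is available the moment it lands.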

Why Tiered Model Routing?

Model names change, endpoints go down, pricing shifts. Each agent is assigned a capability tier (e.g., tier3_analyst). The router resolves this to the best available model at runtime with fallback chains:

Tier 3: gemini-2.5-flash → gh:gpt-4o → openai:gpt-4o → anthropic:claude-sonnet

Health-weighted selection, adaptive cooldowns, and circuit breakers ensure resilient execution.
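The fallback-chain idea can be reduced to a small sketch: walk the chain in tier order and skip any model whose circuit is open. The class, its fields, and the cooldown policy here are illustrative assumptions, not the router's real implementation:

```python
import time

class TieredRouter:
    """Minimal fallback-chain router with a naive circuit breaker:
    a model that fails is skipped until its cooldown expires."""

    def __init__(self, chain, cooldown=30.0):
        self.chain = chain          # e.g. ["gemini-2.5-flash", "openai:gpt-4o"]
        self.cooldown = cooldown
        self.tripped = {}           # model -> time its circuit opened

    def _available(self, model):
        opened = self.tripped.get(model)
        return opened is None or time.monotonic() - opened > self.cooldown

    def call(self, prompt, backends):
        for model in self.chain:
            if not self._available(model):
                continue
            try:
                return backends[model](prompt)
            except Exception:
                # Open the circuit and fall through to the next tier entry.
                self.tripped[model] = time.monotonic()
        raise RuntimeError("all providers in the chain are unavailable")
```

The production router layers health-weighted selection and adaptive cooldowns on top of this basic shape, but the contract is the same: callers name a tier, never a model.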

Why Rubric-Based Scoring?

LLM outputs resist binary pass/fail evaluation. The scoring system uses weighted criteria, multidimensional classification (S/A/B/C/D/F tiers), and LLM-as-judge for subjective quality assessment.
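A weighted multidimensional score maps naturally onto letter tiers. The sketch below uses hypothetical cutoffs and dimension weights — the real thresholds live in the YAML rubric definitions:

```python
def rubric_score(scores, weights):
    """Weighted multidimensional score on a 0.0-10.0 scale, mapped
    to an S/A/B/C/D/F tier. Cutoffs here are illustrative only."""
    total = sum(scores[dim] * w for dim, w in weights.items())
    total /= sum(weights.values())
    for tier, cutoff in [("S", 9.0), ("A", 8.0), ("B", 7.0),
                         ("C", 6.0), ("D", 5.0)]:
        if total >= cutoff:
            return total, tier
    return total, "F"
```

Unlike a pass/fail assertion, a weighted score lets one weak dimension (say, recency) be offset by strong coverage and quality — or gate the release when it can't be.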

Workflow Definitions

The engine ships with 6 production workflow definitions:

| Workflow | Pattern | Description |
| --- | --- | --- |
| code_review | Fan-out / fan-in | Parse code → parallel architecture + quality reviews → synthesis |
| bug_resolution | Sequential with verification | Reproduce → root cause → fix → test → verify |
| fullstack_generation | Parallel sub-steps | API design → frontend + backend in parallel → integration |
| iterative_review | Multi-loop with bounded iteration | Review → feedback → revise until quality gates pass |
| conditional_branching | Conditional DAG | Steps execute or skip based on runtime conditions |
| test_deterministic | Tier-0 only | Deterministic step for testing without LLM calls |
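As an illustration of the declarative style, a fan-out / fan-in workflow with a conditional step might be declared along these lines. Every field name below is hypothetical — consult the shipped YAML files for the actual schema:

```yaml
# Illustrative sketch only; not the runtime's real schema.
name: code_review
steps:
  - id: parse
    agent: planner
    tier: tier1_planner
  - id: architecture_review
    agent: architect
    depends_on: [parse]          # runs in parallel with quality_review
  - id: quality_review
    agent: reviewer
    depends_on: [parse]
  - id: synthesis
    agent: reviewer
    depends_on: [architecture_review, quality_review]   # fan-in
    when: "quality_review.score >= 0.8"                 # conditional gate
```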

Quick Start

# Clone and setup
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes
just setup

# Configure environment (or use AGENTIC_NO_LLM=1 for placeholder mode)
cp .env.example .env
# Add your API keys to .env

# Run a workflow
cd agentic-workflows-v2
agentic run test_deterministic --input test-input.json

# Start the dashboard
uvicorn agentic_v2.server.app:create_app --factory --reload --port 8010
# In another terminal:
cd ui && npm run dev

See Quick Start for the full guide.

Documentation

For New Developers

  1. Quick Start — 5-minute to 1-hour first-run path
  2. Architecture Overview — System map and load-bearing mechanisms
  3. Development Guide — Prerequisites, installation, dev servers, testing

For Feature Work

Reference

Development

Code quality toolchain enforced via pre-commit hooks:

pre-commit run --all-files  # black, isort, ruff, mypy, docformatter

Run tests:

cd agentic-workflows-v2
pytest tests/ -v --cov=agentic_v2

cd ../agentic-v2-eval
pytest tests/ -v

See CONTRIBUTING.md for the full contributor workflow.

License

This project is licensed under the MIT License — see LICENSE for details.


What developers say

The DAG semantics finally let me reason about an agent system the way I reason about a build graph. Conditional branches, cascade failures, and parallel fan-out are first-class — not glued on.

— Contributor

Rubric-based gating replaced our brittle string-match assertions overnight. We now ship LLM features behind the same coverage_score >= 0.80 threshold that gates the rest of the platform.

— Contributor

AGENTIC_NO_LLM=1 is the unsung hero. Local development without burning tokens, plus CI that runs the full suite without a single secret in the repo. Federal-friendly by default.

— Contributor


Where to go next



Read the architecture deep-dive →

The runtime engine combines two execution backends, an adapter registry, a tiered model router, and a full RAG pipeline. The architecture document is the canonical map of how those pieces fit together.

Open the architecture overview