
Agentic Runtimes

Production-grade multi-agent AI orchestration with DAG execution, circuit-breaker model routing, and rubric-based evaluation.

Build workflows where agents occupy specific roles — planner, researcher, coder, reviewer — and coordinate through declarative YAML files.

Quick Example

import asyncio

from agentic_v2.workflows import run_workflow

async def main():
    # Run a code review workflow with parallel analysis
    result = await run_workflow(
        "code_review",
        inputs={"code_path": "src/api/handlers.py"},
    )

    print(result.outputs["final_report"])  # Consolidated review
    print(result.metadata["agents_used"])  # ["architect", "reviewer"]
    print(result.cost)                     # Token usage across all LLM calls

asyncio.run(main())

Works with 8+ LLM providers and automatic failover — no config changes required:

# Tiered routing: gpt-4o → claude-sonnet → gemini-2.5-flash
# If one fails or rate-limits, the next is tried automatically

Core Features

  • DAG Executor: Kahn's algorithm with asyncio parallel dispatch, conditional branching, and cascade failure propagation
  • Tiered Model Router: Health-weighted selection, adaptive cooldowns, circuit breakers across 8+ providers
  • Evaluation Framework: YAML-defined rubrics, multidimensional scoring, LLM-as-judge integration
  • Zero-credential dev mode: AGENTIC_NO_LLM=1 runs end-to-end with placeholder backends — all 379 tests pass



How it works

A workflow definition flows through a deterministic pipeline — YAML loader, graph compiler, DAG executor, model router — before reaching an LLM provider. Every stage emits OpenTelemetry traces and Pydantic-validated artifacts.

flowchart LR
    A[YAML Workflow] --> B[Loader<br/>Pydantic v2 validate]
    B --> C[Graph Compiler<br/>Kahn topo sort]
    C --> D[DAG Executor<br/>asyncio fan-out / fan-in]
    D --> E[Model Router<br/>tier · health · circuit breaker]
    E --> F[(Provider)]
    F -.observation.-> D
    D --> G[Artifacts<br/>Pydantic contracts]
    style A fill:#1e3a8a,color:#fff,stroke:#1e40af
    style F fill:#0e7490,color:#fff,stroke:#155e75
    style G fill:#16a34a,color:#fff,stroke:#15803d

Quick Start in 60 seconds

No API keys required. The runtime ships with a deterministic placeholder backend that exercises the same code paths as the real models.
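To give a sense of how a zero-credential mode like this can work, here is a minimal sketch of a placeholder backend gated on the AGENTIC_NO_LLM flag. The `get_backend` function and its wiring are hypothetical illustrations, not the runtime's actual adapter API:

```python
import hashlib
import os

def get_backend():
    """Return a deterministic placeholder backend when AGENTIC_NO_LLM=1.

    Hypothetical sketch; the real runtime resolves backends through its
    adapter registry rather than a single function like this.
    """
    if os.environ.get("AGENTIC_NO_LLM") == "1":
        def placeholder(prompt: str) -> str:
            # Same prompt -> same output, so tests are reproducible
            # without burning tokens or requiring credentials.
            digest = hashlib.sha256(prompt.encode()).hexdigest()[:8]
            return f"[placeholder:{digest}] response for: {prompt[:40]}"
        return placeholder
    raise RuntimeError("real provider clients require credentials")
```

Because the output is a pure function of the prompt, every downstream code path — routing, artifact validation, scoring — can run in CI without a single secret.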

# 1. Clone and install
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes/agentic-workflows-v2
pip install -e ".[dev,server]"

# 2. Enable zero-credential mode
export AGENTIC_NO_LLM=1   # Windows: $env:AGENTIC_NO_LLM=1

# 3. Run a workflow
agentic run test_deterministic --input '{"task":"hello"}'

In about a minute you should see the DAG executor emit a structured run record, including step timings, tool calls, and a final scored artifact. Detailed walkthrough →


Four capabilities that matter

DAG Executor — Kahn's algorithm scheduling with asyncio parallel dispatch, conditional branching, and cascade failure propagation. Parallel branches execute concurrently; the fan-in waits for all required predecessors before releasing the next step.
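The scheduling idea can be sketched in a few lines. This is an illustrative simplification (a level-synchronous wave variant of Kahn's algorithm); `run_dag` and the graph shape are assumptions, not the executor's actual API:

```python
import asyncio
from collections import deque

async def run_dag(graph, run_step):
    """Execute a DAG of steps, dispatching ready steps concurrently.

    graph maps step -> set of predecessor steps.
    run_step is an async callable invoked once per step.
    """
    indegree = {step: len(preds) for step, preds in graph.items()}
    successors = {step: set() for step in graph}
    for step, preds in graph.items():
        for p in preds:
            successors[p].add(step)

    ready = deque(s for s, d in indegree.items() if d == 0)
    done = []
    while ready:
        # Fan out: every currently-ready step runs concurrently.
        batch = list(ready)
        ready.clear()
        await asyncio.gather(*(run_step(s) for s in batch))
        done.extend(batch)
        # Fan in: a successor is released only when all of its
        # predecessors have completed.
        for s in batch:
            for succ in successors[s]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
    return done
```

The real executor dispatches more eagerly (see the `asyncio.wait` discussion below under "Why DAG over Pipeline?"), but the invariant is the same: no step starts before all of its required predecessors finish.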

Tiered Model Router — Maps workflow steps to capability tiers, not specific models. Health-weighted selection, adaptive cooldowns, and circuit breakers across 8+ providers (OpenAI, Anthropic, Gemini, Azure OpenAI, Azure Foundry, GitHub Models, Ollama, local ONNX).

Evaluation Framework — YAML-defined rubrics, multidimensional scoring (coverage / quality / agreement / recency), LLM-as-judge integration, and 0.0–10.0 scoring. Production gating driven by coverage_score >= 0.80 — not pass/fail unit tests.

Zero-credential dev mode — AGENTIC_NO_LLM=1 runs end-to-end on placeholder backends. All 379 tests pass in this mode; CI runs exclusively in it.

| Metric | Value |
| --- | --- |
| Python source lines | ~187,000 |
| Test files | 100+ |
| Tests passing | 379 |
| Workflow definitions | 6 YAML |
| LLM providers | 8+ |
| ADRs | 17 |

Project Structure

agentic-runtimes/
├── agentic-workflows-v2/    # Core runtime (Python 3.11+)
│   ├── agentic_v2/          # Execution engine, agents, models, RAG
│   ├── ui/                  # React 19 dashboard
│   └── tests/               # 100+ test files
├── agentic-v2-eval/         # Evaluation framework
├── tools/                   # Shared LLM client + utilities
└── docs/                    # Architecture, guides, ADRs

What's Different

Why DAG over Pipeline?

Multi-agent workflows rarely execute linearly. After planning, two specialist analysts run in parallel over the same evidence. Their outputs merge into verification, which conditionally triggers another research round. A pipeline would serialize unnecessarily; a DAG with asyncio.wait(FIRST_COMPLETED) maximizes throughput.
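The throughput claim comes down to how completed branches release downstream work. A minimal sketch of the `asyncio.wait(FIRST_COMPLETED)` pattern, with stand-in agents and made-up delays (the `analyst` coroutine is illustrative, not a runtime API):

```python
import asyncio

async def analyst(name, delay):
    # Stand-in for a specialist agent; delay models LLM latency.
    await asyncio.sleep(delay)
    return f"{name}: findings"

async def main():
    pending = {
        asyncio.create_task(analyst("security", 0.01)),
        asyncio.create_task(analyst("performance", 0.05)),
    }
    results = []
    while pending:
        # Release downstream work as soon as any branch completes,
        # instead of serializing the branches pipeline-style.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            results.append(task.result())
    return results
```

A pipeline would wait for both analysts before doing anything; here the faster branch's output is available the moment it lands.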

Why Tiered Model Routing?

Model names change, endpoints go down, pricing shifts. Each agent is assigned a capability tier (e.g., tier3_analyst). The router resolves this to the best available model at runtime with fallback chains:

Tier 3: gemini-2.5-flash → gh:gpt-4o → openai:gpt-4o → anthropic:claude-sonnet

Health-weighted selection, adaptive cooldowns, and circuit breakers ensure resilient execution.
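The fallback-chain idea can be reduced to a small sketch: walk the chain in tier order and skip any model whose circuit is open. The class, its fields, and the cooldown policy here are illustrative assumptions, not the router's real implementation:

```python
import time

class TieredRouter:
    """Minimal fallback-chain router with a naive circuit breaker:
    a model that fails is skipped until its cooldown expires."""

    def __init__(self, chain, cooldown=30.0):
        self.chain = chain          # e.g. ["gemini-2.5-flash", "openai:gpt-4o"]
        self.cooldown = cooldown
        self.tripped = {}           # model -> time its circuit opened

    def _available(self, model):
        opened = self.tripped.get(model)
        return opened is None or time.monotonic() - opened > self.cooldown

    def call(self, prompt, backends):
        for model in self.chain:
            if not self._available(model):
                continue
            try:
                return backends[model](prompt)
            except Exception:
                # Open the circuit and fall through to the next tier entry.
                self.tripped[model] = time.monotonic()
        raise RuntimeError("all providers in the chain are unavailable")
```

The production router layers health-weighted selection and adaptive cooldowns on top of this basic shape, but the contract is the same: callers name a tier, never a model.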

Why Rubric-Based Scoring?

LLM outputs resist binary pass/fail evaluation. The scoring system uses weighted criteria, multidimensional classification (S/A/B/C/D/F tiers), and LLM-as-judge for subjective quality assessment.
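A weighted multidimensional score maps naturally onto letter tiers. The sketch below uses hypothetical cutoffs and dimension weights — the real thresholds live in the YAML rubric definitions:

```python
def rubric_score(scores, weights):
    """Weighted multidimensional score on a 0.0-10.0 scale, mapped
    to an S/A/B/C/D/F tier. Cutoffs here are illustrative only."""
    total = sum(scores[dim] * w for dim, w in weights.items())
    total /= sum(weights.values())
    for tier, cutoff in [("S", 9.0), ("A", 8.0), ("B", 7.0),
                         ("C", 6.0), ("D", 5.0)]:
        if total >= cutoff:
            return total, tier
    return total, "F"
```

Unlike a pass/fail assertion, a weighted score lets one weak dimension (say, recency) be offset by strong coverage and quality — or gate the release when it can't be.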

Workflow Definitions

The engine ships with 6 production workflow definitions:

| Workflow | Pattern | Description |
| --- | --- | --- |
| code_review | Fan-out / fan-in | Parse code → parallel architecture + quality reviews → synthesis |
| bug_resolution | Sequential with verification | Reproduce → root cause → fix → test → verify |
| fullstack_generation | Parallel sub-steps | API design → frontend + backend in parallel → integration |
| iterative_review | Multi-loop with bounded iteration | Review → feedback → revise until quality gates pass |
| conditional_branching | Conditional DAG | Steps execute or skip based on runtime conditions |
| test_deterministic | Tier-0 only | Deterministic step for testing without LLM calls |
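As an illustration of the declarative style, a fan-out / fan-in workflow with a conditional step might be declared along these lines. Every field name below is hypothetical — consult the shipped YAML files for the actual schema:

```yaml
# Illustrative sketch only; not the runtime's real schema.
name: code_review
steps:
  - id: parse
    agent: planner
    tier: tier1_planner
  - id: architecture_review
    agent: architect
    depends_on: [parse]          # runs in parallel with quality_review
  - id: quality_review
    agent: reviewer
    depends_on: [parse]
  - id: synthesis
    agent: reviewer
    depends_on: [architecture_review, quality_review]   # fan-in
    when: "quality_review.score >= 0.8"                 # conditional gate
```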

Quick Start

# Clone and setup
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes
just setup

# Configure environment (or use AGENTIC_NO_LLM=1 for placeholder mode)
cp .env.example .env
# Add your API keys to .env

# Run a workflow
cd agentic-workflows-v2
agentic run test_deterministic --input test-input.json

# Start the dashboard
uvicorn agentic_v2.server.app:create_app --factory --reload --port 8010
# In another terminal:
cd ui && npm run dev

See Quick Start for the full guide.

Documentation

For New Developers

  1. Quick Start — 5-minute to 1-hour first-run path
  2. Architecture Overview — System map and load-bearing mechanisms
  3. Development Guide — Prerequisites, installation, dev servers, testing

For Feature Work

Reference

Development

Code quality toolchain enforced via pre-commit hooks:

pre-commit run --all-files  # black, isort, ruff, mypy, docformatter

Run tests:

cd agentic-workflows-v2
pytest tests/ -v --cov=agentic_v2

cd ../agentic-v2-eval
pytest tests/ -v

See CONTRIBUTING.md for the full contributor workflow.

License

This project is licensed under the MIT License — see LICENSE for details.


What developers say

The DAG semantics finally let me reason about an agent system the way I reason about a build graph. Conditional branches, cascade failures, and parallel fan-out are first-class — not glued on.

— Contributor

Rubric-based gating replaced our brittle string-match assertions overnight. We now ship LLM features behind the same coverage_score >= 0.80 threshold that gates the rest of the platform.

— Contributor

AGENTIC_NO_LLM=1 is the unsung hero. Local development without burning tokens, plus CI that runs the full suite without a single secret in the repo. Federal-friendly by default.

— Contributor


Where to go next



Read the architecture deep-dive →

The runtime engine combines two execution backends, an adapter registry, a tiered model router, and a full RAG pipeline. The architecture document is the canonical map of how those pieces fit together.

Open the architecture overview