Workflow Authoring Guide
How to write, validate, and run YAML workflow definitions for the agentic-workflows-v2 engine.
Overview
Workflows are declarative YAML files that define a directed acyclic graph (DAG) of agent-backed steps. Each step runs an agent (deterministic or LLM-backed), receives inputs from upstream steps or workflow parameters, and produces outputs consumed by downstream steps. The DAG executor (Kahn's algorithm) schedules steps with maximum parallelism -- any step whose dependencies are satisfied runs immediately.
Workflow definitions live in:
At runtime the system chains together:
- WorkflowLoader -- parses YAML into a WorkflowDefinition containing a validated DAG, typed inputs/outputs, capability metadata, and optional evaluation config.
- DAGExecutor -- schedules and runs steps in parallel, respecting dependency edges and when conditions.
- ExpressionEvaluator -- resolves ${...} variable references and boolean conditions at runtime.
- WorkflowRunner -- the top-level orchestrator that validates inputs, seeds context, executes the DAG, and resolves declared outputs.
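
The scheduling idea from the overview can be sketched in a few lines. This is a minimal illustration of Kahn's algorithm, not the DAGExecutor itself: the function name schedule_waves and the dict-of-lists input are assumptions made for the example, and the real executor starts each step as soon as its dependencies finish rather than waiting for a whole wave.

```python
from collections import defaultdict


def schedule_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves using Kahn's algorithm.

    deps maps each step name to the steps it depends on; every dependency
    must itself be a key of deps (hypothetical input shape for this sketch).
    """
    indegree = {step: len(parents) for step, parents in deps.items()}
    children = defaultdict(list)
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)

    # Steps with no unmet dependencies are ready immediately.
    ready = sorted(step for step, count in indegree.items() if count == 0)
    waves = []
    while ready:
        waves.append(ready)
        unlocked = []
        for step in ready:
            for child in children[step]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    unlocked.append(child)
        ready = sorted(unlocked)

    # Any step never scheduled is part of (or blocked by) a cycle.
    if sum(len(wave) for wave in waves) != len(deps):
        raise ValueError("dependency cycle detected")
    return waves


deps = {
    "design": [],
    "generate_api": ["design"],
    "generate_frontend": ["design"],
    "generate_migrations": ["design"],
    "integrate": ["generate_api", "generate_frontend", "generate_migrations"],
}
waves = schedule_waves(deps)
print(waves)
```

Each inner list holds steps that have no unmet dependencies on one another, so they could run concurrently.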
Workflow Structure
A workflow YAML file has these top-level keys:
| Key | Required | Description |
|---|---|---|
| name | Yes | Unique workflow identifier (must match filename without .yaml) |
| version | Yes | Semantic version string, e.g. "1.0" |
| description | Yes | Human-readable purpose of the workflow |
| inputs | Yes | Typed input parameter declarations |
| outputs | Yes | Output mappings from step results to workflow-level outputs |
| steps | Yes | Ordered list of step definitions (the DAG nodes) |
| capabilities | No | Input/output name lists for dataset-workflow compatibility matching |
| evaluation | No | Inline rubric for scoring workflow quality |
| experimental | No | Boolean flag; when true, the workflow is hidden from list_workflows() by default |
| _templates | No | YAML anchor definitions for DRY step templates (ignored by loader) |
| tools | No | Workflow-level tool declarations (tool type, tier) |
Minimal skeleton
name: my_workflow
description: A short description of what this workflow does
version: "1.0"
inputs:
  topic:
    type: string
    description: The research topic
    required: true
steps:
  - name: step_one
    agent: tier2_coder
    description: Generate code from the topic
    inputs:
      topic: ${inputs.topic}
    outputs:
      code: generated_code
outputs:
  code:
    from: ${steps.step_one.outputs.code}
Input Declarations
Each key under inputs: declares a workflow parameter. The loader validates supplied values at runtime and applies defaults.
inputs:
  feature_spec:
    type: string # string | number | object | array
    description: Natural language description of the feature
    required: true # default: true
  review_depth:
    type: string
    enum: [quick, standard, deep] # constrain to allowed values
    default: standard # applied when caller omits this input
  config:
    type: object
    description: Configuration object
    default:
      frontend: react
      backend: fastapi
  seed_urls:
    type: array
    description: Optional URLs to seed retrieval
    required: false
    default: []
Supported type values: string, number, object, array. Types are advisory; runtime validation checks required and enum constraints.
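
The required/enum/default behaviour described above can be illustrated with a small sketch. The function resolve_inputs and its dict-based declarations are hypothetical stand-ins for whatever the loader and runner do internally; in particular, the order in which defaults and the required check are applied is an assumption.

```python
import copy


def resolve_inputs(declared: dict, supplied: dict) -> dict:
    """Apply defaults, then check required and enum constraints (sketch)."""
    resolved = {}
    for name, spec in declared.items():
        if name in supplied:
            value = supplied[name]
        elif "default" in spec:
            # Copy so callers cannot mutate the declaration's default.
            value = copy.deepcopy(spec["default"])
        elif spec.get("required", True):  # required defaults to true
            raise ValueError(f"missing required input: {name}")
        else:
            value = None
        allowed = spec.get("enum")
        if allowed is not None and value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        resolved[name] = value
    return resolved


declared = {
    "feature_spec": {"type": "string", "required": True},
    "review_depth": {
        "type": "string",
        "enum": ["quick", "standard", "deep"],
        "default": "standard",
    },
    "seed_urls": {"type": "array", "required": False, "default": []},
}
resolved = resolve_inputs(declared, {"feature_spec": "Add login page"})
print(resolved)
```

Note that type is not checked here, in line with the statement that types are advisory.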
Output Declarations
Each key under outputs: maps a workflow-level output name to a from: expression that resolves against step results.
outputs:
  # Simple: single expression
  review:
    from: ${steps.review_code.outputs.review}
  # Optional outputs resolve to null without error when the source step was skipped
  summary:
    from: ${steps.generate_summary.outputs.summary}
    optional: true
  # Composite: map of expressions assembled into a single dict
  all_code:
    from:
      backend: ${coalesce(steps.rework.outputs.backend, steps.generate.outputs.backend)}
      frontend: ${steps.generate.outputs.frontend}
Step Definition
Each entry in steps: defines a DAG node. The required and optional fields are:
| Field | Required | Type | Description |
|---|---|---|---|
| name | Yes | string | Unique step identifier within this workflow |
| agent | Yes | string | Agent name in tier{N}_{role} format (e.g. tier2_coder) |
| description | Yes | string | What this step does -- passed to the LLM as task context |
| inputs | Yes | mapping | Maps step-local input names to ${...} expressions |
| outputs | Yes | mapping | Maps step output keys to context variable names |
| depends_on | No | list | Step names that must complete before this step runs |
| when | No | string | Boolean ${...} expression; step runs only if true |
| tools | No | list | Explicit tool allowlist (omit for tier-default tools) |
| prompt_file | No | string | Override persona prompt file (relative to prompts/) |
| model_override | No | string | Pin a specific model, e.g. gemini:gemini-2.5-flash |
| loop_until | No | string | ${...} expression; step re-executes until true |
| loop_max | No | integer | Max loop iterations (default: 3) |
Complete annotated example
steps:
  - name: review_code # unique step ID
    agent: tier3_reviewer # LLM tier 3, reviewer role
    description: >- # folded multiline YAML string
      Review all generated code for correctness,
      security, and style compliance.
    prompt_file: reviewer.md # optional persona override
    tools: [file_read, grep, code_analysis] # explicit tool allowlist
    depends_on: [generate_api, generate_frontend] # wait for both
    when: ${inputs.review_depth} != 'quick' # conditional execution
    inputs:
      backend: ${steps.generate_api.outputs.api_code}
      frontend: ${steps.generate_frontend.outputs.ui_code}
    outputs:
      review_report: code_review # stored in context as "code_review"
      suggested_fixes: fixes
Agent naming convention
Agent names follow the pattern tier{N}_{role}:
| Tier | Behavior | Token limit | Example agents |
|---|---|---|---|
| tier0 | Deterministic Python (no LLM call) | 0 | tier0_parser |
| tier1 | Lightweight LLM | 4,096 | tier1_linter, tier1_assembler |
| tier2 | Balanced LLM | 8,192 | tier2_coder, tier2_researcher |
| tier3 | Strong LLM | 16,384 | tier3_architect, tier3_reviewer |
| tier4 | Heavy LLM | 16,384 | tier4_writer |
| tier5 | Maximum capability | 32,768 | tier5_synthesizer |
The role suffix (e.g. coder, reviewer) maps to a persona prompt file in agentic_v2/prompts/{role}.md. If no matching file exists, default.md is used.
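
As a rough illustration of the naming convention, the sketch below splits an agent name into tier and role and applies the default.md fallback. The helper names parse_agent_name and prompt_file_for, the regular expression, and the prompts_dir argument are assumptions for this example rather than the engine's actual API.

```python
import re
from pathlib import Path

# tier{N}_{role}: a numeric tier, an underscore, then a lowercase role name.
AGENT_NAME = re.compile(r"^tier(?P<tier>\d+)_(?P<role>[a-z][a-z0-9_]*)$")


def parse_agent_name(agent: str) -> tuple[int, str]:
    """Split an agent name such as 'tier2_coder' into (2, 'coder')."""
    match = AGENT_NAME.match(agent)
    if match is None:
        raise ValueError(f"agent name must look like tier{{N}}_{{role}}: {agent!r}")
    return int(match["tier"]), match["role"]


def prompt_file_for(role: str, prompts_dir: Path) -> Path:
    """Return {role}.md if it exists, otherwise fall back to default.md."""
    candidate = prompts_dir / f"{role}.md"
    return candidate if candidate.exists() else prompts_dir / "default.md"


tier, role = parse_agent_name("tier3_reviewer")
print(tier, role)
```

The pattern accepts any numeric tier; a stricter version could reject values outside the tier0 to tier5 range listed in the table.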
Expression Language
The engine supports ${...} expressions for variable references, function calls, and boolean conditions. Expressions are evaluated by ExpressionEvaluator using a restricted Python AST whitelist -- no arbitrary code execution is possible.
Variable references
Reference workflow inputs and step outputs using dotted paths:
# Workflow inputs
topic: ${inputs.topic}
# Step outputs (most common form)
ast: ${steps.parse_code.outputs.ast}
# Nested access
status: ${steps.review_code.outputs.review_report.overall_status}
The coalesce() function
Returns the first non-null argument -- essential for conditional/bounded workflows where some steps may have been skipped:
# Pick the latest available evidence, falling back through rounds
evidence: ${coalesce(
  steps.round3.outputs.evidence,
  steps.round2.outputs.evidence,
  steps.round1.outputs.evidence
  )}
Boolean logic in when: conditions
The when: field accepts boolean expressions. Steps with a when: that evaluates to false are skipped.
# Equality / inequality
when: ${inputs.review_depth} != 'quick'
# List membership
when: ${steps.review_code.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
# Boolean operators (and, or, not)
when: ${inputs.max_rounds} >= 2 and not ${steps.audit_round1.outputs.gate_passed}
# Compound conditions
when: >-
  ${steps.review_r1.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
  and ${steps.review_r2.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
Null-safe chaining
When a step is skipped (its when: evaluated to false), accessing its outputs does not raise an error. The engine uses a _NullSafe sentinel that:
- Returns _NullSafe() for any attribute access (allows deep chaining)
- Evaluates to False in boolean context
- Equals None in comparisons
- Is filtered out by coalesce() (treated as null)
This means expressions like ${steps.skipped_step.outputs.some_value} safely resolve to None instead of crashing.
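
The four properties listed above can be reproduced with a very small class. The following is a hypothetical re-implementation for illustration only (the class is named NullSafe here to make clear it is not the engine's _NullSafe), together with a coalesce() that treats both None and the sentinel as null.

```python
class NullSafe:
    """Illustrative stand-in for the outputs of a skipped step."""

    def __getattr__(self, name: str) -> "NullSafe":
        # Any attribute access yields another sentinel, so chains never raise.
        return NullSafe()

    def __bool__(self) -> bool:
        return False

    def __eq__(self, other: object) -> bool:
        return other is None or isinstance(other, NullSafe)


def coalesce(*values):
    """Return the first argument that is neither None nor a NullSafe sentinel."""
    for value in values:
        if value is not None and not isinstance(value, NullSafe):
            return value
    return None


skipped = NullSafe()
evidence = coalesce(skipped.outputs.evidence, None, "evidence from round 1")
print(evidence)
print(skipped.outputs.overall_status not in ["APPROVED", "APPROVED_WITH_NOTES"])
```

The last line shows why a when: condition of the form "status not in [...]" evaluates to true when the referenced step was skipped: the sentinel is not equal to any of the listed strings.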
Supported operators
| Category | Operators |
|---|---|
| Comparison | ==, !=, <, <=, >, >= |
| Membership | in, not in |
| Boolean | and, or, not |
| Arithmetic | +, -, *, /, % |
| Identity | is, is not |
Security model
Expression evaluation is secured through an AST whitelist. The engine parses expressions into a Python AST and rejects any node type not in the allowed set. This means:
- No imports or module access
- No function calls except coalesce()
- No attribute assignment
- No lambda, comprehension, or generator expressions
- All evaluation runs with __builtins__ set to {}
Internally, ast.parse() and compile() are used with the restricted whitelist; ast.literal_eval principles are followed but the evaluator supports a broader set of comparison and boolean operations.
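
To make the whitelist idea concrete, here is a minimal sketch of an AST-restricted evaluator. It is not the ExpressionEvaluator: the function safe_eval and the ALLOWED_NODES tuple are assumptions for this example, ${...} placeholders are assumed to have been replaced by plain variable names beforehand, and attribute access is left out entirely to keep the sketch short.

```python
import ast

# Node types permitted in an expression; anything else is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Is, ast.IsNot,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod,
    ast.Name, ast.Load, ast.Constant, ast.List, ast.Call,
)


def safe_eval(expression: str, variables: dict) -> object:
    """Evaluate a boolean/arithmetic expression after whitelisting its AST."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call):
            # The only callable allowed is a bare name called coalesce.
            if not (isinstance(node.func, ast.Name) and node.func.id == "coalesce"):
                raise ValueError("only coalesce() may be called")
    code = compile(tree, "<expression>", "eval")
    # Empty builtins: names resolve only against the supplied variables.
    return eval(code, {"__builtins__": {}}, dict(variables))


print(safe_eval("review_depth != 'quick'", {"review_depth": "deep"}))
print(safe_eval("max_rounds >= 2 and not gate_passed",
                {"max_rounds": 3, "gate_passed": False}))
```

Validating the tree before compiling means a rejected expression is never executed, and the empty __builtins__ mapping acts as a second line of defence if a node type is whitelisted by mistake.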
Execution Patterns
Sequential (depends_on chaining)
Steps run one after another. Each step waits for its dependency to complete.
steps:
  - name: parse
    agent: tier0_parser
    description: Parse the input code
    inputs:
      file_path: ${inputs.code_file}
    outputs:
      ast: parsed_ast
  - name: analyze
    agent: tier1_analyzer
    description: Analyze code complexity
    depends_on: [parse]
    inputs:
      ast: ${steps.parse.outputs.ast}
    outputs:
      report: complexity_report
  - name: summarize
    agent: tier2_summarizer
    description: Produce human-readable summary
    depends_on: [analyze]
    inputs:
      report: ${steps.analyze.outputs.report}
    outputs:
      summary: final_summary
Execution: parse -> analyze -> summarize
Fan-out / Fan-in (parallel steps merging)
Multiple steps with the same dependency run in parallel. A downstream step waits for all of them.
steps:
  - name: design
    agent: tier3_architect
    description: Design the system architecture
    inputs:
      spec: ${inputs.feature_spec}
    outputs:
      api_spec: api_design
      db_schema: database_schema
      components: frontend_components
  # These three run IN PARALLEL (all depend only on design)
  - name: generate_api
    agent: tier2_coder
    description: Generate backend API
    depends_on: [design]
    inputs:
      api_spec: ${steps.design.outputs.api_spec}
    outputs:
      api_code: backend_code
  - name: generate_frontend
    agent: tier2_coder
    description: Generate frontend components
    depends_on: [design]
    inputs:
      components: ${steps.design.outputs.components}
    outputs:
      ui_code: frontend_code
  - name: generate_migrations
    agent: tier1_generator
    description: Generate database migrations
    depends_on: [design]
    inputs:
      schema: ${steps.design.outputs.db_schema}
    outputs:
      migrations: db_migrations
  # Fan-in: waits for ALL parallel steps
  - name: integrate
    agent: tier2_tester
    description: Generate integration tests
    depends_on: [generate_api, generate_frontend, generate_migrations]
    inputs:
      backend: ${steps.generate_api.outputs.api_code}
      frontend: ${steps.generate_frontend.outputs.ui_code}
      migrations: ${steps.generate_migrations.outputs.migrations}
    outputs:
      tests: integration_tests
Execution:
design ──┬── generate_api ────────┐
├── generate_frontend ───┤── integrate
└── generate_migrations ─┘
Bounded iteration (loop_until + loop_max)
A step re-executes until a condition is met or the iteration cap is reached. Used for QA rework loops.
- name: qa_rework_loop
  agent: tier2_coder
  description: Run tests, review, and rework until passing
  depends_on: [build_verify]
  loop_until: >-
    ${steps.qa_rework_loop.outputs.review_report.overall_status} in ['APPROVED']
    and ${steps.qa_rework_loop.outputs.overall_test_status} in ['PASS']
  loop_max: 2
  inputs:
    backend: ${steps.implement_backend.outputs.backend_code}
    tests: ${steps.scaffold_tests.outputs.test_stubs}
  outputs:
    backend_code: qa_backend
    review_report: qa_review
    overall_test_status: qa_status
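
The control flow behind loop_until and loop_max can be sketched as a plain loop. The helper run_bounded and the fake step below are hypothetical, and the sketch assumes that loop_max counts total executions of the step (including the first); the engine may count iterations differently.

```python
from typing import Callable


def run_bounded(
    execute: Callable[[int], dict],
    is_done: Callable[[dict], bool],
    loop_max: int = 3,
) -> tuple[dict, int]:
    """Run a step until is_done(outputs) is true or loop_max runs are used."""
    if loop_max < 1:
        raise ValueError("loop_max must be at least 1")
    outputs: dict = {}
    iterations = 0
    while iterations < loop_max:
        iterations += 1
        outputs = execute(iterations)
        if is_done(outputs):  # plays the role of the loop_until expression
            break
    return outputs, iterations


def fake_qa_step(iteration: int) -> dict:
    """Stand-in for an agent call: tests pass from the second attempt on."""
    return {"overall_test_status": "PASS" if iteration >= 2 else "FAIL"}


outputs, iterations = run_bounded(
    fake_qa_step,
    lambda out: out["overall_test_status"] in ["PASS"],
    loop_max=2,
)
print(outputs, iterations)
```

When the cap is reached without the condition becoming true, the outputs of the last run are returned, so downstream steps should still check the status they depend on.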
Conditional execution (when: expressions)
Steps with a when: condition are skipped (status = SKIPPED) when the condition evaluates to false. Downstream steps that depend on a skipped step still run -- the skipped step's outputs resolve to None.
# Only run deep analysis when depth is not "quick"
- name: regression_check
  agent: tier1_analyzer
  description: Analyze the fix for regressions
  depends_on: [generate_fix]
  when: ${inputs.resolution_depth} != 'quick'
  inputs:
    fix: ${steps.generate_fix.outputs.fix}
  outputs:
    regression_risks: regression_risks
# Conditional rework: only if review did NOT approve
- name: developer_rework
  agent: tier2_coder
  description: Rework code from review feedback
  depends_on: [review_code]
  when: ${steps.review_code.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
  inputs:
    backend: ${steps.generate_api.outputs.api_code}
    review_report: ${steps.review_code.outputs.raw_response}
  outputs:
    backend_code: reworked_backend
DAG (complex dependency graphs)
Real workflows combine all patterns. The deep_research workflow, for example, uses:
- Sequential setup stages (intake_scope -> source_policy)
- Per-round fan-out (analyst_ai and analyst_swe run in parallel)
- Per-round fan-in (cove_verify depends on both analysts)
- Conditional rounds (when: gates on gate_passed from the prior round)
- coalesce() in the final synthesis to pick the latest round's outputs
Advanced Features
YAML anchors for DRY templates
Use YAML anchors (&name) and merge keys (<<: *name) to avoid repeating agent/tool configurations across similar steps. The loader ignores the _templates top-level key; PyYAML resolves merges before the dict reaches Python.
_templates:
  retrieval_step: &retrieval_step
    agent: tier2_researcher
    tools: [web_search, http_get, context_store]
  verify_step: &verify_step
    agent: tier3_reviewer
    tools: [web_search, http_get, context_store]
steps:
  - <<: *retrieval_step
    name: retrieval_round1
    description: Gather evidence (round 1)
    depends_on: [plan_round1]
    inputs:
      search_plan: ${steps.plan_round1.outputs.search_plan}
    outputs:
      evidence: evidence_round1
  - <<: *retrieval_step
    name: retrieval_round2
    description: Gather evidence (round 2)
    depends_on: [plan_round2]
    inputs:
      search_plan: ${steps.plan_round2.outputs.search_plan}
    outputs:
      evidence: evidence_round2
Each step inherits agent and tools from the anchor but defines its own name, description, depends_on, inputs, and outputs.
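
In dictionary terms, a merge key behaves like a shallow dict merge in which keys written on the step win over keys inherited from the anchor. The sketch below imitates that with plain Python dicts; apply_merge_key is a hypothetical helper for illustration, not something the loader exposes.

```python
def apply_merge_key(template: dict, step: dict) -> dict:
    """Shallow merge: keys set on the step override keys from the template."""
    return {**template, **step}


retrieval_step = {
    "agent": "tier2_researcher",
    "tools": ["web_search", "http_get", "context_store"],
}

round1 = apply_merge_key(
    retrieval_step,
    {"name": "retrieval_round1", "depends_on": ["plan_round1"]},
)
# Overriding a key replaces the whole value; lists are not concatenated.
narrowed = apply_merge_key(
    retrieval_step,
    {"name": "retrieval_round2", "tools": ["web_search"]},
)
print(round1["agent"], round1["tools"])
print(narrowed["tools"])
```

Because the merge is shallow, a step that needs one extra tool has to restate the full tools list rather than append to the inherited one.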
Inline evaluation rubrics
Workflows can embed scoring rubrics for automated quality assessment. Each criterion defines a 1-5 scale, weight, and critical floor.
evaluation:
  rubric_id: code_review_v1
  scoring_profile: B
  criteria:
    - name: correctness
      definition: Review output correctness and requirement alignment.
      evidence_required:
        - Requirement-to-review mapping
        - No contradiction with code facts
      scale:
        "1": Major requirement failures
        "2": Multiple significant errors
        "3": Minimum acceptable correctness
        "4": Accurate with minor issues
        "5": Fully correct and robust
      weight: 0.35
      critical_floor: 0.70
      formula_id: zero_one
    - name: code_quality
      definition: Quality and actionability of code feedback.
      evidence_required:
        - Specific issues identified
      scale:
        "1": No useful feedback
        "5": Comprehensive high-value feedback
      weight: 0.30
      critical_floor: 0.80
      formula_id: zero_one
    # Note: the two weights shown here sum to 0.65; a complete rubric needs
    # further criteria so that all weights together sum to 1.0 (see Rules).
Rules:
- Criterion weights must sum to 1.0 (+/- 0.01).
- critical_floor must be in [0.0, 1.0].
- formula_id must be a registered normalization formula (e.g. zero_one).
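
The three rules above translate directly into a small check. This is a sketch under stated assumptions: validate_criteria and the REGISTERED_FORMULAS set are hypothetical names, and only zero_one is listed because it is the only formula the guide mentions.

```python
# Hypothetical registry; the guide names only zero_one.
REGISTERED_FORMULAS = {"zero_one"}


def validate_criteria(criteria: list[dict]) -> None:
    """Raise ValueError if the rubric breaks any of the three rules."""
    total = sum(criterion["weight"] for criterion in criteria)
    if abs(total - 1.0) > 0.01:
        raise ValueError(f"criterion weights sum to {total:.2f}, expected 1.0")
    for criterion in criteria:
        name = criterion["name"]
        if not 0.0 <= criterion["critical_floor"] <= 1.0:
            raise ValueError(f"{name}: critical_floor must be in [0.0, 1.0]")
        if criterion["formula_id"] not in REGISTERED_FORMULAS:
            raise ValueError(f"{name}: unknown formula_id {criterion['formula_id']!r}")


criteria = [
    {"name": "correctness", "weight": 0.6, "critical_floor": 0.70, "formula_id": "zero_one"},
    {"name": "code_quality", "weight": 0.4, "critical_floor": 0.80, "formula_id": "zero_one"},
]
validate_criteria(criteria)  # passes silently
print("rubric is valid")
```

A rubric containing only criteria weighted 0.35 and 0.30 would be rejected by the first check, since the weights sum to 0.65.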
Agent tier and model override
The agent tier is inferred from the tier{N}_ prefix in the agent name. To pin a specific model for a step, use model_override:
- name: source_policy
  agent: tier2_researcher
  model_override: env:DEEP_RESEARCH_SMALL_MODEL|gemini:gemini-2.0-flash-lite
  description: Establish trusted-source policy
The model_override format supports environment variable resolution with a fallback, in the form env:VAR_NAME|fallback_model, as in the example above. If the environment variable is set, its value is used. Otherwise, the fallback after | is used.
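
The resolution rule can be sketched as follows. The function resolve_model_override is a hypothetical helper, and two details are assumptions rather than documented behaviour: an empty environment variable is treated as unset, and a value without the env: prefix is returned unchanged.

```python
import os
from typing import Mapping, Optional


def resolve_model_override(spec: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Resolve 'env:VAR|fallback' to the variable's value or the fallback."""
    env = os.environ if environ is None else environ
    primary, _, fallback = spec.partition("|")
    if not primary.startswith("env:"):
        return spec  # plain model name such as gemini:gemini-2.5-flash
    value = env.get(primary[len("env:"):], "")
    if value:
        return value
    if not fallback:
        raise ValueError(f"{primary} is not set and no fallback was given")
    return fallback


spec = "env:DEEP_RESEARCH_SMALL_MODEL|gemini:gemini-2.0-flash-lite"
print(resolve_model_override(spec, environ={}))
print(resolve_model_override(spec, environ={"DEEP_RESEARCH_SMALL_MODEL": "my-provider:my-model"}))
```

Passing environ explicitly keeps the function easy to test; omitting it reads from the process environment.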
Tool allowlisting per step
By default, a step can use all tools available at its tier level. To restrict to specific tools, use the tools: field:
- name: retrieval
  agent: tier2_researcher
  tools: [web_search, http_get, context_store] # only these three
  # ...
- name: analysis
  agent: tier3_analyst
  tools: [context_store] # read-only access to context
  # ...
Omitting tools: allows all tools at or below the step's tier. Setting tools: [] disables all tools.
Prompt file override
Override the default persona prompt (derived from the agent role) with a specific Markdown file:
- name: implement_shared
  agent: tier2_coder
  prompt_file: developer.md # uses prompts/developer.md instead of prompts/coder.md
Capabilities metadata
The capabilities block lists the workflow's input/output names for compatibility matching with datasets:
capabilities:
  inputs: [feature_spec, tech_stack]
  outputs: [feature_package, review_report, all_code]
This is used by the evaluation framework to match workflows to compatible datasets.
Validation
Validate a workflow YAML file before running it, for example with the agentic validate <workflow_name> command listed under Quick Reference.
The validation pipeline checks:
1. YAML syntax -- valid YAML parsing.
2. Required top-level keys -- name, steps must be present.
3. Step schema -- every step must have name and agent.
4. Dependency existence -- every depends_on target must be a defined step name.
5. Cycle detection -- DFS three-color algorithm rejects any circular dependencies.
6. Evaluation constraints -- criterion weights sum to 1.0, critical floors in [0,1], formula IDs registered.
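
Checks 4 and 5 (dependency existence and cycle detection) can be sketched together. The function validate_dag and its dict-of-lists input are assumptions for this example; the traversal is the standard three-color depth-first search, written recursively for brevity, so very deep graphs would need an iterative variant.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def validate_dag(deps: dict[str, list[str]]) -> None:
    """Raise ValueError on unknown depends_on targets or dependency cycles."""
    for step, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"{step!r} depends on unknown step {parent!r}")

    color = dict.fromkeys(deps, WHITE)

    def visit(step: str) -> None:
        color[step] = GRAY
        for parent in deps[step]:
            if color[parent] == GRAY:
                # Reaching a step that is still on the path closes a cycle.
                raise ValueError(f"dependency cycle through {parent!r}")
            if color[parent] == WHITE:
                visit(parent)
        color[step] = BLACK

    for step in deps:
        if color[step] == WHITE:
            visit(step)


validate_dag({"parse": [], "analyze": ["parse"], "summarize": ["analyze"]})
print("dependencies are valid")
```

A gray node marks a step still on the current traversal path, which is what distinguishes a genuine cycle from two branches that merely share an ancestor.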
Programmatic validation:
from agentic_v2.workflows.loader import WorkflowLoader
loader = WorkflowLoader()
workflow = loader.load("my_workflow")
workflow.dag.validate() # raises on structural errors
Examples
Example 1: Simple 2-step sequential
A minimal workflow that parses a code file and produces a complexity report.
name: simple_analysis
description: Parse a code file and report its complexity metrics
version: "1.0"
inputs:
  code_file:
    type: string
    description: Path to the source file to analyze
    required: true
steps:
  - name: parse
    agent: tier0_parser
    description: Parse and extract code structure
    inputs:
      file_path: ${inputs.code_file}
    outputs:
      ast: parsed_ast
      metrics: code_metrics
  - name: report
    agent: tier1_analyzer
    description: Produce a human-readable complexity report
    depends_on: [parse]
    inputs:
      ast: ${steps.parse.outputs.ast}
      metrics: ${steps.parse.outputs.metrics}
    outputs:
      report: complexity_report
outputs:
  report:
    from: ${steps.report.outputs.report}
Example 2: 3-step fan-out with merge
Three specialist agents analyze a codebase in parallel, then a synthesis step merges their findings.
name: parallel_review
description: Run security, performance, and style analysis in parallel then merge
version: "1.0"
inputs:
  code_file:
    type: string
    description: Path to the source file
    required: true
steps:
  - name: parse
    agent: tier0_parser
    description: Parse the source file
    inputs:
      file_path: ${inputs.code_file}
    outputs:
      ast: parsed_ast
  - name: security_scan
    agent: tier2_reviewer
    description: Analyze code for security vulnerabilities
    depends_on: [parse]
    tools: [file_read, grep]
    inputs:
      ast: ${steps.parse.outputs.ast}
    outputs:
      findings: security_findings
  - name: perf_analysis
    agent: tier2_reviewer
    description: Identify performance bottlenecks
    depends_on: [parse]
    tools: [file_read, code_analysis]
    inputs:
      ast: ${steps.parse.outputs.ast}
    outputs:
      findings: perf_findings
  - name: style_check
    agent: tier1_linter
    description: Check code style and formatting
    depends_on: [parse]
    inputs:
      ast: ${steps.parse.outputs.ast}
    outputs:
      issues: style_issues
  - name: synthesize
    agent: tier2_summarizer
    description: Merge all analysis results into a unified report
    depends_on: [security_scan, perf_analysis, style_check]
    inputs:
      security: ${steps.security_scan.outputs.findings}
      performance: ${steps.perf_analysis.outputs.findings}
      style: ${steps.style_check.outputs.issues}
    outputs:
      report: unified_report
outputs:
  report:
    from: ${steps.synthesize.outputs.report}
Example 3: Bounded review cycle with conditional rework
A code generation workflow with up to 2 review-rework passes. If the first review approves, no rework happens. Otherwise, rework is applied and a second review runs. The final assembly always picks the best available code via coalesce().
name: codegen_with_review
description: Generate code with bounded review cycle (max 2 passes)
version: "1.0"
inputs:
  feature_spec:
    type: string
    description: Feature description
    required: true
steps:
  # Phase 1: Generate
  - name: generate
    agent: tier2_coder
    description: Generate code from the feature spec
    inputs:
      spec: ${inputs.feature_spec}
    outputs:
      code: generated_code
  # Phase 2: Review pass 1
  - name: review_pass1
    agent: tier3_reviewer
    description: Review generated code (pass 1)
    depends_on: [generate]
    inputs:
      code: ${steps.generate.outputs.code}
    outputs:
      review_report: review_r1
      suggested_fixes: fixes_r1
  # Phase 3: Conditional rework (only if not approved)
  - name: rework
    agent: tier2_coder
    description: Apply fixes from review feedback
    depends_on: [review_pass1]
    when: ${steps.review_pass1.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
    inputs:
      code: ${steps.generate.outputs.code}
      review_report: ${steps.review_pass1.outputs.review_report}
      fixes: ${steps.review_pass1.outputs.suggested_fixes}
    outputs:
      code: reworked_code
  # Phase 4: Review pass 2 (only if rework happened)
  - name: review_pass2
    agent: tier3_reviewer
    description: Re-review after rework (pass 2)
    depends_on: [rework]
    when: ${steps.review_pass1.outputs.overall_status} not in ['APPROVED', 'APPROVED_WITH_NOTES']
    inputs:
      code: ${coalesce(steps.rework.outputs.code, steps.generate.outputs.code)}
      previous_review: ${steps.review_pass1.outputs.review_report}
    outputs:
      review_report: review_r2
  # Phase 5: Final assembly (always runs)
  - name: assemble
    agent: tier1_assembler
    description: Assemble the final deliverable from best available code
    depends_on: [review_pass1, review_pass2]
    inputs:
      code: ${coalesce(steps.rework.outputs.code, steps.generate.outputs.code)}
      review: ${coalesce(steps.review_pass2.outputs.review_report, steps.review_pass1.outputs.review_report)}
    outputs:
      package: final_package
outputs:
  package:
    from: ${steps.assemble.outputs.package}
  review_status:
    from: ${coalesce(steps.review_pass2.outputs.review_report, steps.review_pass1.outputs.review_report)}
    optional: true
Execution flow:
generate -> review_pass1 ─┬─ [APPROVED] -> assemble
└─ [NOT APPROVED] -> rework -> review_pass2 -> assemble
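Quick Reference
Running a workflow
# CLI
agentic run <workflow_name> --input params.json
agentic validate <workflow_name>
agentic list workflows
# Python (await needs an async context; asyncio.run provides one in a script,
# assuming run_workflow is a coroutine function, as the original await implies)
import asyncio
from agentic_v2.workflows.runner import run_workflow
result = asyncio.run(run_workflow("code_review", code_file="main.py"))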
Checklist for new workflows
- name matches the YAML filename (without .yaml)
- Every step has name, agent, description, inputs, outputs
- All depends_on targets are valid step names
- No dependency cycles
- when: conditions use valid ${...} expression syntax
- coalesce() is used wherever a step may have been skipped
- Evaluation criterion weights sum to 1.0 (if evaluation: is present)
- Validated with agentic validate <workflow_name> before committing