Onboarding Guide¶
Audience: New contributors on their first clone. Outcome: By the end you will have run a workflow, opened the dashboard, scored a result, and know where to find things next time. Last verified: 2026-04-22
Welcome to tafreeman/agentic-runtimes -- a monorepo for multi-agent workflow orchestration, evaluation, and shared LLM utilities.
This repository serves a dual mission:
- Working platform -- a production-grade agentic AI runtime with a dual execution engine, 7 agent personas, a full RAG pipeline, and an evaluation framework.
- Educational portfolio -- a living reference for team onboarding at Deloitte, demonstrating enterprise-grade practices for cleared federal environments.
This guide has six independent sections. The first (Quick Start) gets a workflow running in about 5 minutes. Working through all six sections takes roughly an hour. Stop wherever you have what you need.
If you prefer a prebuilt environment, open the repository in the provided devcontainer. Otherwise, use the root justfile commands described below for the same bootstrap and test flow.
Prerequisites¶
Before starting, make sure you have:
| Requirement | Version | Check |
|---|---|---|
| Python | 3.11+ | python --version |
| Node.js | 20+ | node --version |
| Git | any recent | git --version |
| pip | latest | pip --version |
You also need at least one LLM provider API key. The cheapest way to get started:
| Provider | Variable | Free tier? | Get a key |
|---|---|---|---|
| GitHub Models | GITHUB_TOKEN | Yes | github.com/settings/tokens |
| Google Gemini | GEMINI_API_KEY | Yes (rate-limited) | aistudio.google.com/app/apikey |
| OpenAI | OPENAI_API_KEY | No (paid) | platform.openai.com/api-keys |
| Anthropic | ANTHROPIC_API_KEY | No (paid) | console.anthropic.com/settings/keys |
Any single key is enough. The smart router will use whichever providers are configured.
Quick Start (5 minutes)¶
1. Clone the repo¶
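The repository slug above suggests a clone command like the following (the exact URL is an assumption inferred from the owner/repo name):

```shell
# Clone the repository and enter it (URL inferred from the repo slug)
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes
```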
2. Create your .env file¶
Open .env in your editor and paste in at least one API key. For example:
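A minimal .env might look like this, using the variable names from the provider table above (the token value is a placeholder, not a real key):

```shell
# Write a minimal .env -- replace the placeholder with your real key
cat > .env <<'EOF'
GITHUB_TOKEN=ghp_your_token_here
EOF
```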
3. Bootstrap the workspace¶
From the repo root:
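Assuming the root justfile exposes a `bootstrap` recipe (the recipe name is an assumption -- run `just --list` to see the actual recipes):

```shell
just bootstrap
```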
This installs the root helpers, the agentic-workflows-v2 package, the eval package, and the UI dependencies in one pass.
4. Verify the installation¶
Collect tests without running them -- this confirms imports work and pytest can find the test suite:
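A sketch of the collection step using standard pytest flags:

```shell
cd agentic-workflows-v2
python -m pytest --collect-only -q
```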
You should see output like N tests collected (on the order of 2000+ as of 2026-04-22). For the full local gate, run just test from the repo root.
5. List available workflows¶
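The exact subcommand is an assumption (check `agentic --help` for the real verb); a plausible invocation:

```shell
agentic list workflows
```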
Expected output -- a table showing the 6 built-in workflow definitions:
Available Workflows
+--------------------+-------------------+-------+
| Name               | Description       | Steps |
+--------------------+-------------------+-------+
| code_review        | Automated code... | 5     |
| bug_resolution     | Bug and defect... | 5     |
| test_deterministic | Simple determ...  | 2     |
| ...                |                   |       |
+--------------------+-------------------+-------+
You can also explore agents, tools, and adapters:
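The subcommand names below are assumptions -- see `agentic --help` for the real ones:

```shell
agentic list agents
agentic list tools
agentic list adapters
```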
Your First Workflow Run (10 minutes)¶
The simplest workflow is test_deterministic -- it requires no LLM calls and runs entirely with tier-0 (deterministic) agents.
1. Create an input file¶
Create a file called test_input.json:
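A minimal input file; the `input` field name is an assumption, since the actual schema lives in the workflow's YAML definition:

```shell
cat > test_input.json <<'EOF'
{
  "input": "hello from onboarding"
}
EOF
```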
2. Dry run (validate without executing)¶
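Assuming the run command accepts a `--dry-run` flag (the flag name is an assumption):

```shell
agentic run test_deterministic --input test_input.json --dry-run
```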
This shows the execution plan as a DAG tree without making any calls:
test_deterministic - Execution Plan
Level 0: step1 (tier0_process)
Level 1: step2 (tier0_counter) <- [step1]
Dry run - skipping execution
3. Execute the workflow¶
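Following the `agentic run` syntax used in the adapter examples later in this guide:

```shell
agentic run test_deterministic --input test_input.json
```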
You should see a spinner followed by a success status:
4. Save results to a file¶
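The `--output` flag here is an assumption; redirecting stdout is a reasonable fallback if it does not exist:

```shell
agentic run test_deterministic --input test_input.json --output result.json
```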
The output JSON contains the workflow result, step statuses, and elapsed time.
Running a workflow that uses LLM calls¶
To run an LLM-powered workflow like code_review, create an input file:
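The field names below are assumptions -- check the `code_review` YAML definition for the real input schema:

```shell
cat > review_input.json <<'EOF'
{
  "code": "def add(a, b):\n    return a + b\n",
  "language": "python"
}
EOF
```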
Then:
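```shell
agentic run code_review --input review_input.json
```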
This will call your configured LLM provider. If you hit rate limits, the smart router will retry with exponential backoff or fall back to another configured provider.
Comparing execution engines¶
The repo has two execution engines. You can compare them side-by-side:
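One way to compare, assuming the `--adapter` flag shown in the architecture section below:

```shell
# Same workflow, both engines -- compare wall-clock time and step ordering
time agentic run test_deterministic --input test_input.json --adapter native
time agentic run test_deterministic --input test_input.json --adapter langchain
```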
Understanding the Architecture (15 minutes)¶
Three independent packages¶
The monorepo contains three Python packages with zero cross-package imports:
prompts/
+-- agentic-workflows-v2/ # Main runtime (Python 3.11+, hatchling)
| +-- agentic_v2/ # Source code
| +-- tests/ # 100+ files
| +-- ui/ # React 19 dashboard
+-- agentic-v2-eval/ # Evaluation framework (Python 3.10+, setuptools)
+-- tools/ # Shared LLM client, benchmarks (Python 3.10+, setuptools)
Each package has its own pyproject.toml, installs independently, and can be developed in isolation.
The repo root also provides a canonical justfile, .devcontainer/, and docker-compose.yml for contributors who want a reproducible local environment.
Inside agentic_v2/ -- the main runtime¶
agentic_v2/
+-- agents/ # BaseAgent + specialized implementations (Coder, Architect, Reviewer, Orchestrator)
+-- adapters/ # Pluggable execution engine backends
+-- core/ # Protocols, memory, context, contracts, errors
+-- engine/ # Native DAG executor (Kahn's algorithm)
+-- langchain/ # LangGraph state-machine executor
+-- models/ # SmartModelRouter -- LLM tier routing across 8+ providers
+-- rag/ # Full RAG pipeline (13 modules: load, chunk, embed, index, retrieve, assemble)
+-- contracts/ # Pydantic v2 I/O models (additive-only -- never remove fields)
+-- prompts/ # 7 agent persona definitions (.md files)
+-- server/ # FastAPI + WebSocket/SSE streaming
+-- tools/builtin/ # 11 built-in tool modules (file_read, shell, grep, etc.)
+-- workflows/
|   +-- definitions/ # 6 YAML workflow definitions
+-- cli/             # Typer CLI (the `agentic` command)
Dual execution engine¶
Two engines can run the same YAML workflow:
| Engine | Implementation | How it works |
|---|---|---|
| Native | engine/dag_executor.py | Kahn's algorithm topological sort. Runs steps with asyncio.wait(FIRST_COMPLETED) for maximum parallelism. |
| LangChain | langchain/ | Wraps LangGraph state machines. Steps become nodes in a compiled graph. |
Both engines are registered in the AdapterRegistry singleton. Select an engine with --adapter:
agentic run code_review --input input.json --adapter native
agentic run code_review --input input.json --adapter langchain # default
The execution pipeline¶
Here is the flow from YAML to output:
Workflow YAML --> WorkflowConfig (Pydantic model)
|
AdapterRegistry
/ \
NativeEngine LangChainEngine
\ /
Agent + Persona + Tools
|
LLM Provider
(via SmartRouter)
|
Step Outputs
|
WorkflowResult
- Workflow YAML declares steps, inputs, outputs, and dependencies.
- WorkflowConfig parses and validates the YAML into a Pydantic model.
- AdapterRegistry dispatches to the chosen execution engine.
- Each step invokes an agent with a persona (markdown prompt) and optional tools.
- The SmartModelRouter selects the best LLM provider based on tier, availability, and rate limits.
- Step outputs flow into downstream steps via ${steps.X.outputs.Y} expressions.
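A hypothetical YAML fragment showing the expression syntax (the `id`, `inputs`, and `outputs.result` field names are assumptions; the agent names come from the dry-run output earlier in this guide):

```yaml
steps:
  - id: step1
    agent: tier0_process
  - id: step2
    agent: tier0_counter
    inputs:
      # step1's output feeds step2 via an expression
      text: ${steps.step1.outputs.result}
```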
Where to find things¶
| I want to... | Look in... |
|---|---|
| Add a new workflow | agentic_v2/workflows/definitions/ -- create a new YAML file |
| Add a new agent persona | agentic_v2/prompts/ -- create a new .md file |
| Understand how agents work | agentic_v2/agents/base.py (BaseAgent) |
| See how LLM routing works | agentic_v2/models/smart_router.py |
| See protocol definitions | agentic_v2/core/protocols.py |
| Add a new built-in tool | agentic_v2/tools/builtin/ |
| Understand the RAG pipeline | agentic_v2/rag/ (13 modules, start with protocols.py) |
| Run or modify evaluations | agentic-v2-eval/ |
| Use the shared LLM client | tools/llm/ (from tools.llm import LLMClient) |
Creating Your First Custom Persona (10 minutes)¶
Agent personas are markdown files in agentic_v2/prompts/. Each persona defines how an agent behaves when assigned to a workflow step.
Required sections¶
Every persona must include these sections:
- Opening line -- a one-sentence role definition
- `## Your Expertise` (or `## Expertise`) -- what the agent knows
- `## Reasoning Protocol` -- step-by-step reasoning process before responding
- `## Output Format` -- exact structure of the agent's output
- `## Boundaries` -- what the agent does NOT do
- `## Critical Rules` -- non-negotiable constraints
Example: creating a security_auditor persona¶
Create agentic_v2/prompts/security_auditor.md:
You are a Security Auditor specializing in application security for Python and TypeScript codebases.
## Your Expertise
- OWASP Top 10 vulnerability identification
- Static analysis of Python and TypeScript code
- Secret detection and credential hygiene
- Input validation and injection prevention
- Authentication and authorization patterns
## Reasoning Protocol
Before generating your response:
1. Identify all user-facing input surfaces in the code under review
2. Check each input for validation, sanitization, and parameterization
3. Scan for hardcoded secrets, API keys, or credentials
4. Evaluate authentication and authorization logic for bypass vectors
5. Assess error handling for information leakage
## Output Format
```json
{
"findings": [
{
"severity": "CRITICAL | HIGH | MEDIUM | LOW",
"category": "OWASP category",
"file": "path/to/file.py",
"line": 42,
"description": "what the issue is",
"recommendation": "how to fix it"
}
],
"summary": "overall security posture assessment",
"pass": true | false
}
```
## Boundaries
- Does not fix code -- only identifies and reports issues
- Does not review business logic or performance
- Does not make architectural recommendations
## Critical Rules
- Never suggest disabling security controls as a fix
- Always flag hardcoded secrets as CRITICAL regardless of context
- Mark any SQL string concatenation as HIGH severity
- If no issues are found, explicitly state "no findings" rather than omitting the section
Using the persona in a workflow¶
Reference the persona in a workflow step via the `agent` field. The agent name corresponds to a tier prefix plus a role name. Custom personas can be referenced by mapping them in the agent configuration.
Running the Dashboard (5 minutes)¶
The UI is a React 19 application with React Flow for workflow visualization.
1. Install frontend dependencies¶
cd agentic-workflows-v2/ui
npm install
2. Start the backend server¶
In a separate terminal:
cd agentic-workflows-v2
python -m uvicorn agentic_v2.server.app:app --host 127.0.0.1 --port 8010 --app-dir src
Or use the CLI shortcut:
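A `serve` subcommand is a common shape for this shortcut, but the name is an assumption -- check `agentic --help`:

```shell
agentic serve --port 8010
```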
3. Start the frontend dev server¶
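Assuming a standard Vite setup (port 5173 is Vite's default dev port):

```shell
cd agentic-workflows-v2/ui
npm run dev
```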
Open http://localhost:5173 in your browser. You should see:
- Workflow list -- 6 YAML-defined workflows
- Workflow graph -- React Flow DAG visualization of steps and dependencies
- Execution panel -- run workflows and see real-time streaming results
- Evaluations page -- view evaluation results and scoring
Port reference¶
| Service | Port | URL |
|---|---|---|
| Backend API | 8010 | http://127.0.0.1:8010 |
| Frontend dev server | 5173 | http://localhost:5173 |
Running Evaluations (10 minutes)¶
The agentic-v2-eval package provides rubric-based scoring for workflow outputs.
1. Install the eval framework¶
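An editable install from the repo root, using the package path from the layout above:

```shell
pip install -e agentic-v2-eval
```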
2. Score a workflow result¶
After running a workflow and saving the output (e.g., result.json):
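The eval CLI entry point and arguments here are assumptions -- the package may expose a different command, and the Python API later in this section is the documented path:

```shell
agentic-eval score result.json
```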
This applies the default scoring rubric and prints a score breakdown.
3. Override the rubric¶
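A plausible shape, assuming a `--rubric` flag (flag name and rubric path are assumptions):

```shell
agentic-eval score result.json --rubric rubrics/my_rubric.yaml
```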
4. Generate a report¶
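The command shape is an assumption; the format values come from the supported-formats list in this section:

```shell
agentic-eval report result.json --format html
```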
Supported formats: json, markdown, html.
5. Use the Python API¶
from agentic_v2_eval import Scorer
from agentic_v2_eval.runners import BatchRunner
from agentic_v2_eval.reporters import generate_html_report
# Score with a rubric
scorer = Scorer("rubrics/default.yaml")
result = scorer.score({"Accuracy": 0.85, "Completeness": 0.9})
print(f"Weighted Score: {result.weighted_score:.2f}")
# Run batch evaluation
runner = BatchRunner(evaluator=my_eval_function)
batch_result = runner.run(test_cases)
print(f"Success rate: {batch_result.success_rate:.1%}")
# Generate an HTML report
generate_html_report(results, "report.html")
How evaluations connect to workflows¶
Each workflow YAML can define an evaluation: block with rubric criteria, weights, and critical floors. For example, from code_review.yaml:
evaluation:
rubric_id: code_review_v1
scoring_profile: B
criteria:
- name: correctness_rubric
weight: 0.35
critical_floor: 0.70
- name: code_quality
weight: 0.30
critical_floor: 0.80
The eval framework uses these criteria to score workflow outputs on a 1-5 scale per dimension, then computes a weighted aggregate.
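To make the weighted-aggregate idea concrete, here is illustrative arithmetic (not the framework's actual code), using the two criteria from the YAML above with hypothetical normalized scores:

```python
# Illustrative weighted aggregation with critical floors (hypothetical scores)
criteria = [
    {"name": "correctness_rubric", "weight": 0.35, "critical_floor": 0.70, "score": 0.80},
    {"name": "code_quality", "weight": 0.30, "critical_floor": 0.80, "score": 0.85},
]

# Weighted average, normalized by the total weight of the criteria present
total_weight = sum(c["weight"] for c in criteria)
weighted = sum(c["score"] * c["weight"] for c in criteria) / total_weight

# A criterion scoring below its critical_floor would flag the run
# regardless of the aggregate
floor_violations = [c["name"] for c in criteria if c["score"] < c["critical_floor"]]

print(f"weighted={weighted:.3f} violations={floor_violations}")
```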
Key Concepts Quick Reference¶
| Concept | One-line summary | Deep dive |
|---|---|---|
| Workflow | Declarative YAML defining a multi-step DAG | agentic_v2/workflows/definitions/*.yaml |
| Step | A single unit of work with an agent, inputs, and outputs | See any workflow YAML |
| Agent | An LLM-backed executor with a persona and optional tools | agentic_v2/agents/base.py |
| Persona | Markdown file defining agent expertise, reasoning, and boundaries | agentic_v2/prompts/*.md |
| Tier | LLM capability level (0=deterministic, 1=fast, 2=standard, 3=powerful) | agentic_v2/models/smart_router.py |
| Adapter | Pluggable execution engine backend (native or langchain) | agentic_v2/adapters/ |
| Protocol | @runtime_checkable Python Protocol for structural typing | agentic_v2/core/protocols.py |
| Contract | Pydantic model for step I/O (additive-only) | agentic_v2/contracts/ |
| Expression | ${steps.X.outputs.Y} syntax for data flow between steps | agentic_v2/engine/expressions.py |
| Rubric | YAML scoring criteria for evaluating workflow outputs | agentic-v2-eval/rubrics/ |
Full definitions: docs/GLOSSARY.md
Architecture deep dive: docs/ARCHITECTURE.md
Project overview and commands: CLAUDE.md
Getting Help¶
Documentation¶
| Document | What it covers |
|---|---|
| CLAUDE.md | Project overview, commands, environment variables, gotchas |
| docs/ARCHITECTURE.md | System architecture and design decisions |
| docs/GLOSSARY.md | Domain-specific term definitions |
| docs/CODING_STANDARDS.md | Code style, testing, and review standards |
Running tests¶
Common gotchas¶
- Windows paths -- use forward slashes in Python code. pathlib.Path handles cross-platform paths automatically.
- pytest-asyncio -- tests use asyncio_mode = "auto". All async test functions run without @pytest.mark.asyncio.
- LangChain imports -- the LangChain adapter is optional. Guard imports with try/except ImportError.
- Port conflicts -- backend uses 8010, frontend uses 5173. Check for conflicts before starting dev servers.
- Contract changes -- contracts/ models are additive-only. Never remove or rename fields.
- Pydantic v2 -- use model_dump() not .dict(), model_validate() not .parse_obj().
Contributing¶
- Create a feature branch from main.
- Follow the coding standards and commit format (feat:, fix:, refactor:, etc.).
- Write tests first (TDD). Target 80%+ coverage on new backend code where practical; the UI package enforces a 60% floor.
- Run just test, just docs, and pre-commit run --all-files before committing.
- Open a PR with a clear description and test plan.