Onboarding Guide¶
Audience: New contributors on their first clone. Outcome: By the end you will have run a workflow, opened the dashboard, scored a result, and know where to find things next time. Last verified: 2026-04-22
Welcome to tafreeman/agentic-runtimes -- a monorepo for multi-agent workflow orchestration, evaluation, and shared LLM utilities.
This repository serves a dual mission:
- Working platform -- a production-grade agentic AI runtime with a dual execution engine, 7 agent personas, a full RAG pipeline, and an evaluation framework.
- Educational portfolio -- a living reference for team onboarding at Deloitte, demonstrating enterprise-grade practices for cleared federal environments.
This guide has six independent sections. The first (Quick Start) gets a workflow running in about 5 minutes. Working through all six sections takes roughly an hour. Stop wherever you have what you need.
If you prefer a prebuilt environment, open the repository in the provided devcontainer. Otherwise, use the root justfile commands described below for the same bootstrap and test flow.
Prerequisites¶
Before starting, make sure you have:
| Requirement | Version | Check |
|---|---|---|
| Python | 3.11+ | python --version |
| Node.js | 20+ | node --version |
| Git | any recent | git --version |
| pip | latest | pip --version |
You also need at least one LLM provider API key. The cheapest way to get started:
| Provider | Variable | Free tier? | Get a key |
|---|---|---|---|
| GitHub Models | GITHUB_TOKEN | Yes | github.com/settings/tokens |
| Google Gemini | GEMINI_API_KEY | Yes (rate-limited) | aistudio.google.com/app/apikey |
| OpenAI | OPENAI_API_KEY | No (paid) | platform.openai.com/api-keys |
| Anthropic | ANTHROPIC_API_KEY | No (paid) | console.anthropic.com/settings/keys |
Any single key is enough. The smart router will use whichever providers are configured.
Quick Start (5 minutes)¶
1. Clone the repo¶
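The repository slug above suggests a clone command like the following (the exact URL is an assumption inferred from the owner/repo name):

```shell
# Clone the repository and enter it (URL inferred from the repo slug)
git clone https://github.com/tafreeman/agentic-runtimes.git
cd agentic-runtimes
```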
2. Create your .env file¶
Open .env in your editor and paste in at least one API key. For example:
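A minimal .env might look like this, using the variable names from the provider table above (the token value is a placeholder, not a real key):

```shell
# Write a minimal .env -- replace the placeholder with your real key
cat > .env <<'EOF'
GITHUB_TOKEN=ghp_your_token_here
EOF
```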
3. Bootstrap the workspace¶
From the repo root:
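Assuming the root justfile exposes a `bootstrap` recipe (the recipe name is an assumption -- run `just --list` to see the actual recipes):

```shell
just bootstrap
```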
This installs the root helpers, the agentic-workflows-v2 package, the eval package, and the UI dependencies in one pass.
4. Verify the installation¶
Collect tests without running them -- this confirms imports work and pytest can find the test suite:
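A sketch of the collection step using standard pytest flags:

```shell
cd agentic-workflows-v2
python -m pytest --collect-only -q
```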
You should see output like N tests collected (on the order of 2000+ as of 2026-04-22). For the full local gate, run just test from the repo root.
5. List available workflows¶
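The exact subcommand is an assumption (check `agentic --help` for the real verb); a plausible invocation:

```shell
agentic list workflows
```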
Expected output -- a table showing the 6 built-in workflow definitions:
Available Workflows
+--------------------+-------------------+-------+
| Name               | Description       | Steps |
+--------------------+-------------------+-------+
| code_review        | Automated code... | 5     |
| bug_resolution     | Bug and defect... | 5     |
| test_deterministic | Simple determ...  | 2     |
| ...                |                   |       |
+--------------------+-------------------+-------+
You can also explore agents, tools, and adapters:
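The subcommand names below are assumptions -- see `agentic --help` for the real ones:

```shell
agentic list agents
agentic list tools
agentic list adapters
```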
Your First Workflow Run (10 minutes)¶
The simplest workflow is test_deterministic -- it requires no LLM calls and runs entirely with tier-0 (deterministic) agents.
1. Create an input file¶
Create a file called test_input.json:
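A minimal input file; the `input` field name is an assumption, since the actual schema lives in the workflow's YAML definition:

```shell
cat > test_input.json <<'EOF'
{
  "input": "hello from onboarding"
}
EOF
```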
2. Dry run (validate without executing)¶
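Assuming the run command accepts a `--dry-run` flag (the flag name is an assumption):

```shell
agentic run test_deterministic --input test_input.json --dry-run
```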
This shows the execution plan as a DAG tree without making any calls:
test_deterministic - Execution Plan
Level 0: step1 (tier0_process)
Level 1: step2 (tier0_counter) <- [step1]
Dry run - skipping execution
3. Execute the workflow¶
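Following the `agentic run` syntax used in the adapter examples later in this guide:

```shell
agentic run test_deterministic --input test_input.json
```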
You should see a spinner followed by a success status:
4. Save results to a file¶
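The `--output` flag here is an assumption; redirecting stdout is a reasonable fallback if it does not exist:

```shell
agentic run test_deterministic --input test_input.json --output result.json
```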
The output JSON contains the workflow result, step statuses, and elapsed time.
Running a workflow that uses LLM calls¶
To run an LLM-powered workflow like code_review, create an input file:
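The field names below are assumptions -- check the `code_review` YAML definition for the real input schema:

```shell
cat > review_input.json <<'EOF'
{
  "code": "def add(a, b):\n    return a + b\n",
  "language": "python"
}
EOF
```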
Then:
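```shell
agentic run code_review --input review_input.json
```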
This will call your configured LLM provider. If you hit rate limits, the smart router will retry with exponential backoff or fall back to another configured provider.
Comparing execution engines¶
The repo has two execution engines. You can compare them side-by-side:
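One way to compare, assuming the `--adapter` flag shown in the architecture section below:

```shell
# Same workflow, both engines -- compare wall-clock time and step ordering
time agentic run test_deterministic --input test_input.json --adapter native
time agentic run test_deterministic --input test_input.json --adapter langchain
```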
Understanding the Architecture (15 minutes)¶
Three independent packages¶
The monorepo contains three Python packages with zero cross-package imports:
prompts/
+-- agentic-workflows-v2/ # Main runtime (Python 3.11+, hatchling)
| +-- agentic_v2/ # Source code
| +-- tests/ # 100+ files
| +-- ui/ # React 19 dashboard
+-- agentic-v2-eval/ # Evaluation framework (Python 3.10+, setuptools)
+-- tools/ # Shared LLM client, benchmarks (Python 3.10+, setuptools)
Each package has its own pyproject.toml, installs independently, and can be developed in isolation.
The repo root also provides a canonical justfile, .devcontainer/, and docker-compose.yml for contributors who want a reproducible local environment.
Inside agentic_v2/ -- the main runtime¶
agentic_v2/
+-- agents/ # BaseAgent + specialized implementations (Coder, Architect, Reviewer, Orchestrator)
+-- adapters/ # Pluggable execution engine backends
+-- core/ # Protocols, memory, context, contracts, errors
+-- engine/ # Native DAG executor (Kahn's algorithm)
+-- langchain/ # LangGraph state-machine executor
+-- models/ # SmartModelRouter -- LLM tier routing across 8+ providers
+-- rag/ # Full RAG pipeline (13 modules: load, chunk, embed, index, retrieve, assemble)
+-- contracts/ # Pydantic v2 I/O models (additive-only -- never remove fields)
+-- prompts/ # 7 agent persona definitions (.md files)
+-- server/ # FastAPI + WebSocket/SSE streaming
+-- tools/builtin/ # 11 built-in tool modules (file_read, shell, grep, etc.)
+-- workflows/
|   +-- definitions/ # 6 YAML workflow definitions
+-- cli/             # Typer CLI (the `agentic` command)
Dual execution engine¶
Two engines can run the same YAML workflow:
| Engine | Implementation | How it works |
|---|---|---|
| Native | engine/dag_executor.py | Kahn's algorithm topological sort. Runs steps with asyncio.wait(FIRST_COMPLETED) for maximum parallelism. |
| LangChain | langchain/ | Wraps LangGraph state machines. Steps become nodes in a compiled graph. |
Both engines are registered in the AdapterRegistry singleton. Select an engine with --adapter:
agentic run code_review --input input.json --adapter native
agentic run code_review --input input.json --adapter langchain # default
The execution pipeline¶
Here is the flow from YAML to output:
Workflow YAML --> WorkflowConfig (Pydantic model)
|
AdapterRegistry
/ \
NativeEngine LangChainEngine
\ /
Agent + Persona + Tools
|
LLM Provider
(via SmartRouter)
|
Step Outputs
|
WorkflowResult
- Workflow YAML declares steps, inputs, outputs, and dependencies.
- WorkflowConfig parses and validates the YAML into a Pydantic model.
- AdapterRegistry dispatches to the chosen execution engine.
- Each step invokes an agent with a persona (markdown prompt) and optional tools.
- The SmartModelRouter selects the best LLM provider based on tier, availability, and rate limits.
- Step outputs flow into downstream steps via ${steps.X.outputs.Y} expressions.
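A hypothetical YAML fragment showing the expression syntax (the `id`, `inputs`, and `outputs.result` field names are assumptions; the agent names come from the dry-run output earlier in this guide):

```yaml
steps:
  - id: step1
    agent: tier0_process
  - id: step2
    agent: tier0_counter
    inputs:
      # step1's output feeds step2 via an expression
      text: ${steps.step1.outputs.result}
```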
Where to find things¶
| I want to... | Look in... |
|---|---|
| Add a new workflow | agentic_v2/workflows/definitions/ -- create a new YAML file |
| Add a new agent persona | agentic_v2/prompts/ -- create a new .md file |
| Understand how agents work | agentic_v2/agents/base.py (BaseAgent) |
| See how LLM routing works | agentic_v2/models/smart_router.py |
| See protocol definitions | agentic_v2/core/protocols.py |
| Add a new built-in tool | agentic_v2/tools/builtin/ |
| Understand the RAG pipeline | agentic_v2/rag/ (13 modules, start with protocols.py) |
| Run or modify evaluations | agentic-v2-eval/ |
| Use the shared LLM client | tools/llm/ (from tools.llm import LLMClient) |
Creating Your First Custom Persona (10 minutes)¶
Agent personas are markdown files in agentic_v2/prompts/. Each persona defines how an agent behaves when assigned to a workflow step.
Required sections¶
Every persona must include these sections:
- Opening line -- a one-sentence role definition
- `## Your Expertise` (or `## Expertise`) -- what the agent knows
- `## Reasoning Protocol` -- step-by-step reasoning process before responding
- `## Output Format` -- exact structure of the agent's output
- `## Boundaries` -- what the agent does NOT do
- `## Critical Rules` -- non-negotiable constraints
Example: creating a security_auditor persona¶
Create agentic_v2/prompts/security_auditor.md:
You are a Security Auditor specializing in application security for Python and TypeScript codebases.
## Your Expertise
- OWASP Top 10 vulnerability identification
- Static analysis of Python and TypeScript code
- Secret detection and credential hygiene
- Input validation and injection prevention
- Authentication and authorization patterns
## Reasoning Protocol
Before generating your response:
1. Identify all user-facing input surfaces in the code under review
2. Check each input for validation, sanitization, and parameterization
3. Scan for hardcoded secrets, API keys, or credentials
4. Evaluate authentication and authorization logic for bypass vectors
5. Assess error handling for information leakage
## Output Format
```json
{
"findings": [
{
"severity": "CRITICAL | HIGH | MEDIUM | LOW",
"category": "OWASP category",
"file": "path/to/file.py",
"line": 42,
"description": "what the issue is",
"recommendation": "how to fix it"
}
],
"summary": "overall security posture assessment",
"pass": true | false
}
```
## Boundaries
- Does not fix code -- only identifies and reports issues
- Does not review business logic or performance
- Does not make architectural recommendations
## Critical Rules
- Never suggest disabling security controls as a fix
- Always flag hardcoded secrets as CRITICAL regardless of context
- Mark any SQL string concatenation as HIGH severity
- If no issues are found, explicitly state "no findings" rather than omitting the section
Using the persona in a workflow¶
Reference the persona in a workflow step via the `agent` field. The agent name corresponds to a tier prefix plus a role name. Custom personas can be referenced by mapping them in the agent configuration.
Running the Dashboard (5 minutes)¶
The UI is a React 19 application with React Flow for workflow visualization.
1. Install frontend dependencies¶
cd agentic-workflows-v2/ui
npm install
2. Start the backend server¶
In a separate terminal:
cd agentic-workflows-v2
python -m uvicorn agentic_v2.server.app:app --host 127.0.0.1 --port 8010 --app-dir src
Or use the CLI shortcut:
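A `serve` subcommand is a common shape for this shortcut, but the name is an assumption -- check `agentic --help`:

```shell
agentic serve --port 8010
```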
3. Start the frontend dev server¶
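Assuming a standard Vite setup (port 5173 is Vite's default dev port):

```shell
cd agentic-workflows-v2/ui
npm run dev
```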
Open http://localhost:5173 in your browser. You should see:
- Workflow list -- 6 YAML-defined workflows
- Workflow graph -- React Flow DAG visualization of steps and dependencies
- Execution panel -- run workflows and see real-time streaming results
- Evaluations page -- view evaluation results and scoring
Port reference¶
| Service | Port | URL |
|---|---|---|
| Backend API | 8010 | http://127.0.0.1:8010 |
| Frontend dev server | 5173 | http://localhost:5173 |
Running Evaluations (10 minutes)¶
The agentic-v2-eval package provides rubric-based scoring for workflow outputs.
1. Install the eval framework¶
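An editable install from the repo root, using the package path from the layout above:

```shell
pip install -e agentic-v2-eval
```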
2. Score a workflow result¶
After running a workflow and saving the output (e.g., result.json):
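The eval CLI entry point and arguments here are assumptions -- the package may expose a different command, and the Python API later in this section is the documented path:

```shell
agentic-eval score result.json
```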
This applies the default scoring rubric and prints a score breakdown.
3. Override the rubric¶
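A plausible shape, assuming a `--rubric` flag (flag name and rubric path are assumptions):

```shell
agentic-eval score result.json --rubric rubrics/my_rubric.yaml
```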
4. Generate a report¶
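The command shape is an assumption; the format values come from the supported-formats list in this section:

```shell
agentic-eval report result.json --format html
```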
Supported formats: json, markdown, html.
5. Use the Python API¶
from agentic_v2_eval import Scorer
from agentic_v2_eval.runners import BatchRunner
from agentic_v2_eval.reporters import generate_html_report
# Score with a rubric
scorer = Scorer("rubrics/default.yaml")
result = scorer.score({"Accuracy": 0.85, "Completeness": 0.9})
print(f"Weighted Score: {result.weighted_score:.2f}")
# Run batch evaluation
runner = BatchRunner(evaluator=my_eval_function)
batch_result = runner.run(test_cases)
print(f"Success rate: {batch_result.success_rate:.1%}")
# Generate an HTML report
generate_html_report(results, "report.html")
How evaluations connect to workflows¶
Each workflow YAML can define an evaluation: block with rubric criteria, weights, and critical floors. For example, from code_review.yaml:
evaluation:
rubric_id: code_review_v1
scoring_profile: B
criteria:
- name: correctness_rubric
weight: 0.35
critical_floor: 0.70
- name: code_quality
weight: 0.30
critical_floor: 0.80
The eval framework uses these criteria to score workflow outputs on a 1-5 scale per dimension, then computes a weighted aggregate.
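To make the weighted-aggregate idea concrete, here is illustrative arithmetic (not the framework's actual code), using the two criteria from the YAML above with hypothetical normalized scores:

```python
# Illustrative weighted aggregation with critical floors (hypothetical scores)
criteria = [
    {"name": "correctness_rubric", "weight": 0.35, "critical_floor": 0.70, "score": 0.80},
    {"name": "code_quality", "weight": 0.30, "critical_floor": 0.80, "score": 0.85},
]

# Weighted average, normalized by the total weight of the criteria present
total_weight = sum(c["weight"] for c in criteria)
weighted = sum(c["score"] * c["weight"] for c in criteria) / total_weight

# A criterion scoring below its critical_floor would flag the run
# regardless of the aggregate
floor_violations = [c["name"] for c in criteria if c["score"] < c["critical_floor"]]

print(f"weighted={weighted:.3f} violations={floor_violations}")
```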
Key Concepts Quick Reference¶
| Concept | One-line summary | Deep dive |
|---|---|---|
| Workflow | Declarative YAML defining a multi-step DAG | agentic_v2/workflows/definitions/*.yaml |
| Step | A single unit of work with an agent, inputs, and outputs | See any workflow YAML |
| Agent | An LLM-backed executor with a persona and optional tools | agentic_v2/agents/base.py |
| Persona | Markdown file defining agent expertise, reasoning, and boundaries | agentic_v2/prompts/*.md |
| Tier | LLM capability level (0=deterministic, 1=fast, 2=standard, 3=powerful) | agentic_v2/models/smart_router.py |
| Adapter | Pluggable execution engine backend (native or langchain) | agentic_v2/adapters/ |
| Protocol | @runtime_checkable Python Protocol for structural typing | agentic_v2/core/protocols.py |
| Contract | Pydantic model for step I/O (additive-only) | agentic_v2/contracts/ |
| Expression | ${steps.X.outputs.Y} syntax for data flow between steps | agentic_v2/engine/expressions.py |
| Rubric | YAML scoring criteria for evaluating workflow outputs | agentic-v2-eval/rubrics/ |
Full definitions: docs/GLOSSARY.md
Architecture deep dive: docs/ARCHITECTURE.md
Project overview and commands: CLAUDE.md
Getting Help¶
Documentation¶
| Document | What it covers |
|---|---|
| CLAUDE.md | Project overview, commands, environment variables, gotchas |
| docs/ARCHITECTURE.md | System architecture and design decisions |
| docs/GLOSSARY.md | Domain-specific term definitions |
| docs/CODING_STANDARDS.md | Code style, testing, and review standards |
Running tests¶
Common gotchas¶
- Windows paths -- use forward slashes in Python code. pathlib.Path handles cross-platform paths automatically.
- pytest-asyncio -- tests use asyncio_mode = "auto". All async test functions run without @pytest.mark.asyncio.
- LangChain imports -- the LangChain adapter is optional. Guard imports with try/except ImportError.
- Port conflicts -- backend uses 8010, frontend uses 5173. Check for conflicts before starting dev servers.
- Contract changes -- contracts/ models are additive-only. Never remove or rename fields.
- Pydantic v2 -- use model_dump() not .dict(), model_validate() not .parse_obj().
Contributing¶
- Create a feature branch from main.
- Follow the coding standards and commit format (feat:, fix:, refactor:, etc.).
- Write tests first (TDD). Target 80%+ coverage on new backend code where practical; the UI package enforces a 60% floor.
- Run just test, just docs, and pre-commit run --all-files before committing.
- Open a PR with a clear description and test plan.