v0.1.0 — Production Ready

Composable LLM reasoning patterns

More power than one-off prompts, less weight than a framework. Budget-aware execution with zero runtime dependencies.

$ pip install executionkit

See it in action

Run five LLM completions in parallel and get the consensus answer, with cost tracking, in a few lines of code.

import os

from executionkit import consensus, Provider

provider = Provider(
    "https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
)

# Inside an async function (or via asyncio.run):
result = await consensus(provider, "Classify this support ticket: ...", num_samples=5)
print(result)                              # The classification
print(result.cost)                         # TokenUsage(input_tokens=250, output_tokens=45, llm_calls=5)
print(result.metadata["agreement_ratio"])  # 0.8 — 4 of 5 agreed
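Under the hood, this kind of aggregation amounts to a whitespace-normalized majority vote. A minimal stdlib sketch of how an answer and an agreement ratio could be derived (illustrative only, not executionkit's actual implementation):

```python
from collections import Counter

def majority_vote(samples: list[str]) -> tuple[str, float]:
    """Pick the most common answer after whitespace normalization.

    Returns the winning answer and the agreement ratio.
    Illustrative sketch, not executionkit's internal code.
    """
    # Collapse runs of whitespace so "billing " and "billing" count as one answer.
    normalized = [" ".join(s.split()) for s in samples]
    counts = Counter(normalized)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(samples)

answer, ratio = majority_vote(["billing", "billing ", "billing", "refund", "billing"])
print(answer, ratio)  # billing 0.8
```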

Built for production

Three composable patterns that handle the hard parts of LLM reasoning.

🗳️

Consensus Voting

Run N completions in parallel and aggregate via majority or unanimous voting. Whitespace-normalized comparison prevents formatting differences from splitting the vote.

🔄

Iterative Refinement

Score-guided improvement loop with convergence detection. Built-in prompt injection defense via XML sandboxing.
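Score-guided refinement is essentially a loop: generate a candidate, score it, and stop once the target score is reached or improvement stalls. A stdlib sketch of that control flow, where `improve` and `score` are placeholder callables standing in for LLM calls (not executionkit's API):

```python
def refine(draft: str, improve, score, target_score: float = 0.9,
           max_rounds: int = 5, epsilon: float = 0.01) -> str:
    """Iteratively improve `draft` until `score` hits the target or converges.

    Illustrative control-flow sketch only; `improve`/`score` are stand-ins
    for model calls, not executionkit functions.
    """
    best, best_score = draft, score(draft)
    for _ in range(max_rounds):
        if best_score >= target_score:
            break
        candidate = improve(best)
        candidate_score = score(candidate)
        # Convergence detection: stop when the gain is negligible.
        if candidate_score - best_score < epsilon:
            break
        best, best_score = candidate, candidate_score
    return best

# Toy demo: "improving" appends detail; scoring rewards length up to a cap.
result = refine("x", improve=lambda s: s + "!", score=lambda s: min(len(s) / 5, 1.0))
print(result)  # x!!!!
```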

🔧

Tool-Calling Loop

Think-act-observe ReAct loop with JSON Schema validation. Tool errors become observations, never crash the loop.
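The key resilience property here is that a failing tool call yields an observation string rather than an exception. A minimal sketch of that pattern (the tool registry and names are illustrative, not executionkit's API):

```python
def run_tool(tools: dict, name: str, args: dict) -> str:
    """Execute a tool and return its result as an observation string.

    Any error (unknown tool, bad arguments, tool crash) is captured and
    returned as text, so the surrounding reasoning loop never crashes.
    Illustrative sketch only.
    """
    try:
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        return str(tools[name](**args))
    except Exception as exc:  # the error becomes an observation, not a crash
        return f"Observation: tool '{name}' failed: {exc}"

tools = {"add": lambda a, b: a + b}
print(run_tool(tools, "add", {"a": 2, "b": 3}))  # 5
print(run_tool(tools, "add", {"a": 2}))          # Observation: tool 'add' failed: ...
```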

💰

Budget Tracking

Per-call and cumulative token usage. Set hard limits with max_cost to prevent runaway spend.
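A hard spend limit boils down to accumulating per-call usage and refusing further calls once the cap is hit. A stdlib sketch of that bookkeeping (the `TokenUsage` shape mirrors the demo output above; the `Budget` class and call-count cap are illustrative, not executionkit's exact API):

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    llm_calls: int = 0

class BudgetExceeded(RuntimeError):
    pass

class Budget:
    """Cumulative token accounting with a hard call limit (illustrative)."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.usage = TokenUsage()

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Refuse the call before spending, so the cap is never exceeded.
        if self.usage.llm_calls >= self.max_calls:
            raise BudgetExceeded(f"call limit {self.max_calls} reached")
        self.usage.input_tokens += input_tokens
        self.usage.output_tokens += output_tokens
        self.usage.llm_calls += 1

budget = Budget(max_calls=2)
budget.record(50, 10)
budget.record(45, 12)
print(budget.usage)  # TokenUsage(input_tokens=95, output_tokens=22, llm_calls=2)
```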

📦

Zero Dependencies

Pure stdlib. Optional httpx for connection pooling. Works anywhere Python 3.11+ runs.

🔌

Any Provider

OpenAI, Ollama, Groq, Together AI, GitHub Models — anything speaking the OpenAI-compatible format.

Compose patterns into pipelines

Chain patterns such as consensus → refine_loop with pipe(); costs accumulate automatically across steps.

graph LR
    A[User Prompt] --> B(consensus)
    B --> C(refine_loop)
    C --> D(react_loop)
    D --> E[PatternResult]
    style A fill:#1c2128,stroke:#39d353,color:#e6edf3
    style B fill:#1c2128,stroke:#f0883e,color:#e6edf3
    style C fill:#1c2128,stroke:#f0883e,color:#e6edf3
    style D fill:#1c2128,stroke:#f0883e,color:#e6edf3
    style E fill:#1c2128,stroke:#39d353,color:#e6edf3
Patterns chain via pipe() — each result's value becomes the next prompt
from functools import partial

from executionkit import pipe, consensus, refine_loop

result = await pipe(
    provider,
    "Explain gradient descent in simple terms.",
    consensus,
    partial(refine_loop, target_score=0.9),
)
print(result)       # Final refined value
print(result.cost)  # Cumulative cost across both steps

Works everywhere

Any OpenAI-compatible endpoint. Zero config change between providers.

OpenAI             api.openai.com
Ollama             localhost:11434
Groq               api.groq.com
Together AI        api.together.xyz
GitHub Models      models.inference.ai.azure.com
Any OpenAI-compat  your-endpoint.com