From Demo Code to a Reusable Package
Article 19 used a 900-line harness_full_demo.py to demonstrate eight defense layers. That file is good for explaining concepts, but not for reuse — all layers are coupled together, nothing can be tested in isolation, and nothing can be imported by another project.
A production-grade Agent project needs something you can actually import:
harness/ ├── __init__.py Public API exports ├── registry.py Layer 2: ActionRegistry + PermissionLevel ├── budget.py Layer 3: PermissionBudget (with refund()) ├── sandbox.py Layer 4: sanitise_input + sandboxed_eval ├── audit.py Layer 6: ImmutableAuditLog (hash-chained) ├── rollback.py Layer 7: RollbackCoordinator └── harness.py Unified entry point: AgentHarness
This article starts with package design, covers three key API decisions, and finishes with two integration styles: standalone Python and LangGraph graph embedding.
Module Design
registry.py — Layer 2
class PermissionLevel(Enum): READ = 1 WRITE = 2 ADMIN = 3 IRREVERSIBLE = 4 @dataclass class RegisteredAction: name: str level: PermissionLevel budget_cost: int description: "str" handler: Any # Callable or BaseTool class ActionRegistry: def register(self, action: RegisteredAction) -> None: ... def get(self, name: str) -> RegisteredAction: ... # not found → PermissionError def is_allowed(self, name: str) -> bool: ... def names(self) -> list[str]: ...
get() rather than __getitem__: raises a consistent PermissionError, without leaking the internal KeyError detail.
budget.py — Layer 3
class PermissionBudget: def spend(self, action_name: str, cost: int) -> None: if self.remaining < cost: raise BudgetExhaustedError(...) self.remaining -= cost def refund(self, action_name: str, cost: int) -> None: self.remaining = min(self.total, self.remaining + cost)
The new refund() method fixes a design flaw from Article 19: budget was deducted before approval, and never returned on rejection. The production package corrects this — when an IRREVERSIBLE action is intercepted, harness.py proactively calls refund() to keep budget accounting accurate.
sandbox.py — Layer 4
INJECTION_PATTERN = re.compile( r"(ignore.*(previous|above|prior)|forget.*instruction|" r"you are now|act as|jailbreak|bypass|" r"override.*system|system.*override|" # both word orders covered r"</s>|\n\n###|###\s*system|<\|im_start\|>|system prompt)", re.IGNORECASE, )
Two subtle points:
- Both
SYSTEM OVERRIDE(system first) andoverride.*system(override first) are covered -
\n\n###matches a real newline, not the literal string\\n\\n###
Both bugs were discovered and fixed during the adversarial tests in Article 21.
audit.py — Layer 6
class ImmutableAuditLog: def log(self, action, actor, target, result, metadata=None) -> str: entry = {..., "prev_hash": self._last_hash} entry["hash"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash) with self._path.open("a") as f: # append-only f.write(json.dumps(entry) + "\n") return entry["hash"] def verify_integrity(self) -> bool: # Replays the hash chain; any modified field returns False ...
The __len__() helper lets tests use len(audit) to check entry count directly.
rollback.py — Layer 7
class RollbackCoordinator: @contextmanager def transaction(self, state: dict, op_name: str): snapshot = copy.deepcopy(state) self._snapshots.append({"op": op_name, "snapshot": snapshot}) try: yield state except Exception: state.clear() state.update(snapshot) self._snapshots.pop() raise def rollback_last(self, state: dict) -> str | None: """Manual trigger: undo the most recent committed transaction.""" if not self._snapshots: return None entry = self._snapshots.pop() state.clear() state.update(entry["snapshot"]) return entry["op"]
rollback_last() enables manual rollback: after a transaction commits, the snapshot is retained until explicitly confirmed or cleared by the caller.
Unified Entry Point: AgentHarness
class AgentHarness: def __init__(self, budget: int = 100, log_path: str = ...): self.registry = ActionRegistry() self.budget = PermissionBudget(total=budget) self.audit = ImmutableAuditLog(log_path=log_path) self.rollback = RollbackCoordinator() self._state: dict = {} def execute(self, action_name: str, actor: str = "agent", **kwargs) -> Any: # Layer 4: sanitise string arguments # Layer 2: registry check (missing → PermissionError) # Layer 3: budget deduction (insufficient → BudgetExhaustedError) # Layer 5: IRREVERSIBLE → refund budget + raise HumanApprovalRequired # Layer 7: WRITE/ADMIN wrapped in rollback.transaction # Layer 6: audit record ... def approve_and_execute(self, action_name: str, actor: str = "human", **kwargs) -> Any: """Call this after catching HumanApprovalRequired to complete execution.""" ...
Why the two methods are separate:
-
execute()is the automated path: all checks pass, execute immediately -
approve_and_execute()is the human path: the caller explicitly signals “this has been approved”
Merging them (e.g., with an approved=False parameter) makes intent ambiguous and harder to test.
Standalone Usage
Basic Flow
harness = AgentHarness(budget=50) # Register actions harness.registry.register(RegisteredAction( "read_ticket", PermissionLevel.READ, 1, "Read Jira ticket", handler_fn)) harness.registry.register(RegisteredAction( "write_draft", PermissionLevel.WRITE, 3, "Write draft fix", handler_fn)) harness.registry.register(RegisteredAction( "create_pr", PermissionLevel.ADMIN, 8, "Open pull request", handler_fn)) harness.registry.register(RegisteredAction( "merge_to_main", PermissionLevel.IRREVERSIBLE, 20, "Merge to main", handler_fn))
READ → WRITE → ADMIN normal flow:
r1 = harness.execute("read_ticket", ticket_id="BUG-101") r2 = harness.execute("write_draft", ticket_id="BUG-101", patch="fix: add null check") r3 = harness.execute("create_pr", ticket_id="BUG-101", title="fix: BUG-101") # read=1 + write=3 + admin=8 = 12 spent, 38 remaining
Unregistered Action Blocked
try: harness.execute("delete_all_data") except PermissionError as e: # "Action 'delete_all_data' not in registry. Execution blocked." ...
IRREVERSIBLE Two-Phase Execution
try: harness.execute("merge_to_main", pr_id=1) except HumanApprovalRequired as e: print(e.action_name) # "merge_to_main" print(e.action_args) # {"pr_id": 1} # After human review: result = harness.approve_and_execute("merge_to_main", pr_id=1)
Key point: when execute() intercepts an IRREVERSIBLE action, it calls budget.refund() first. The net budget cost is zero. Only approve_and_execute() actually charges the budget.
Budget Exhaustion
# budget=5, write cost=3 h = AgentHarness(budget=5) h.execute("write_draft", ...) # OK, 2 remaining h.execute("write_draft", ...) # BudgetExhaustedError: need 3, remaining 2
LangGraph Integration
Embedding the harness inside LangGraph’s tools_node:
def tools_node(state: HState) -> dict: last = state["messages"][-1] results = [] for tc in last.tool_calls: name, args = tc["name"], tc["args"] try: reg = harness.registry.get(name) # Layer 2 harness.budget.spend(name, reg.budget_cost) # Layer 3 if reg.level == PermissionLevel.IRREVERSIBLE: decision = interrupt({...}) # Layer 5: LangGraph primitive if decision != "approved": harness.budget.refund(name, reg.budget_cost) harness.audit.log(name, "checkpoint", ..., "HUMAN_REJECTED") results.append(ToolMessage(content="rejected", ...)) continue if reg.level in (WRITE, ADMIN): with harness.rollback.transaction(harness._state, name): # Layer 7 output = TOOL_MAP[name].invoke(args) else: output = TOOL_MAP[name].invoke(args) harness.audit.log(name, "agent", ..., "EXECUTED") # Layer 6 results.append(ToolMessage(content=str(output), ...)) except PermissionError as e: harness.audit.log(name, "registry", ..., "BLOCKED") results.append(ToolMessage(content=str(e), ...)) except BudgetExhaustedError as e: results.append(ToolMessage(content=str(e), ...)) return {"messages": results}
tools_node is the harness’s natural insertion point: it intercepts before tool execution without touching any agent_node (reasoning layer) logic.
Article 21 Test Results (45/45)
This package’s behavior is fully verified by Article 21’s test suite:
Functional (Layer 1–7 basic behaviour) ████████████████████████████████ 19/19 PASS Adversarial (injection / escalation) ████████████████████████████████ 17/17 PASS Chaos (fault injection / partial) ████████████████████████████████ 9/ 9 PASS Total 45/ 45 tests passed
Two real bugs found by the tests:
-
INJECTION_PATTERNonly matchedoverride.*system, missing[SYSTEM OVERRIDE](reversed word order) -
\\n\\n###matched the literal string\n, not a real newline — jailbreak pattern### System:slipped through
Both fixed in sandbox.py with a one-line regex adjustment.
Design Checklist
Package Structure
- [ ] One file per layer; each file does exactly one thing
- [ ]
__init__.pyexports only the public API; internal classes stay private - [ ]
AgentHarnessacts as Facade; callers don’t reach into subsystems directly
API Design
- [ ]
execute()is the automated path covering the full Layer 2→7 chain - [ ]
approve_and_execute()is the human path; the caller signals “approved” - [ ] Budget is refunded (
refund()) when IRREVERSIBLE is intercepted, keeping accounting accurate - [ ] All exception types (
PermissionError/BudgetExhaustedError/HumanApprovalRequired) exported from__init__.py
Sandbox
- [ ] Injection pattern covers both forward and reverse word orders
- [ ]
\nis a real newline character, not the literal\\n
LangGraph Integration
- [ ] Harness is embedded only in
tools_node, not inagent_node - [ ] Each tool call runs through the harness check chain independently
- [ ] IRREVERSIBLE uses LangGraph
interrupt(), not a Python exception
Summary
Five core conclusions:
- Modularity is a prerequisite for testability: you can’t test a single layer in isolation when everything is one file; splitting into a package lets each module be independently mocked and verified
- Refund budget on IRREVERSIBLE interception: the Article 19 design flaw, fixed here — “intercept before charging” is cleaner than “charge then refund,” though both are valid; pick one and document it
- Separating
execute()andapprove_and_execute()makes intent explicit: automated and human paths are distinct; caller intent is unambiguous - Tests found real production bugs: two regex vulnerabilities were invisible during development; adversarial tests exposed them on the first run
- LangGraph’s
tools_nodeis the harness’s natural slot: no changes to agent logic needed; add the harness only at the tool execution layer, keeping concerns separated
References
- LangGraph Tools Node documentation
- Article 17: Harness Engineering Intro — Five Elements Overview
- Article 19: Harness Full System — 8-Layer Defense Framework
- Full demo code for this article: agent-19-harness-production
Check out PrimeSkills — a curated marketplace of AI agents and skills that have been validated in real-world, enterprise-grade workflows. No fluff, just what actually works.
Find more useful knowledge and interesting products on my Homepage