Product · February 5, 2026 · 10 min read

Designing AI Agents for Regulated Payment Workflows

Autonomy is easy. Auditable autonomy is hard. Notes from building agent loops that survive contact with compliance, treasury, and operations teams.

It is easy to build an AI agent that moves money. You give a language model a tool, a prompt, and a set of credentials, and within an afternoon you have a system that can initiate transfers, query balances, and respond to instructions in natural language. The demo is impressive. The demo is also the easiest part of the problem.

Regulated payment workflows do not reward autonomy. They reward auditable autonomy: the ability to explain, after the fact, why every decision was made, by whom or what, under which policy, against which inputs. An agent loop that cannot survive contact with a compliance officer, an internal auditor, and a regulator is not a product. It is a liability.

The four constraints autonomy has to live inside

Across the agent systems we have built and reviewed in payments and treasury, four constraints come up consistently. Skip any of them and the system either does not ship or does not survive its first audit cycle.

Determinism at the boundary

The model can reason in probability space. The action it triggers must execute in deterministic, auditable space. That means every tool call is a typed, versioned contract; every argument is validated against a schema; every side-effecting operation has an idempotency key. The language model is allowed to be creative about which tool to call. It is not allowed to be creative about how the tool behaves.
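To make the boundary concrete, here is a minimal sketch of a typed, versioned tool contract with schema validation and a deterministic idempotency key. The names (`TransferRequest`, `TOOL_VERSION`) and the specific checks are illustrative assumptions, not a real API:

```python
# Sketch of a deterministic tool boundary. TransferRequest and
# TOOL_VERSION are hypothetical names; validation rules are examples.
import hashlib
from dataclasses import dataclass

TOOL_VERSION = "transfer.v3"  # every contract change bumps the version


@dataclass(frozen=True)
class TransferRequest:
    """Typed, versioned contract for a proposed transfer."""
    source_account: str
    dest_account: str
    amount_minor: int  # integer minor units, never floats
    currency: str

    def validate(self) -> None:
        # Schema validation runs before any side effect does.
        if self.amount_minor <= 0:
            raise ValueError("amount must be positive")
        if len(self.currency) != 3 or not self.currency.isalpha():
            raise ValueError("currency must be an ISO 4217 code")
        if self.source_account == self.dest_account:
            raise ValueError("source and destination must differ")

    def idempotency_key(self) -> str:
        # Deterministic key: retrying the same request cannot double-send.
        payload = (f"{TOOL_VERSION}|{self.source_account}|"
                   f"{self.dest_account}|{self.amount_minor}|{self.currency}")
        return hashlib.sha256(payload.encode()).hexdigest()
```

The point is not the particular fields but the shape: the model can only hand the tool a value of this type, and two identical requests always produce the same idempotency key, so retries are safe.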

Policy as code, not as prompt

A surprising number of agent systems encode their guardrails in the system prompt. "Do not approve payments over $50,000." "Always require a second approver for cross-border transfers." This is a category error. Policy in a prompt is unenforceable, ungovernable, and silently overridable by any sufficiently insistent input.

Policy belongs outside the model — in a typed rules layer the agent calls into, the same way a human user would. The model proposes. The policy layer disposes. When the policy changes, you change one file, not every prompt in the system. When the regulator asks which rule applied to a given payment, you can point at a version-controlled artifact, not a screenshot of a chat.
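A sketch of what "policy as code" can look like, using the two example rules from above. The thresholds, rule names, and `POLICY_VERSION` string are all illustrative assumptions; the structural point is that the result records which rule fired and under which policy version:

```python
# Hypothetical versioned policy layer, kept outside the model.
# Thresholds and rule names are illustrative, not real policy.
from dataclasses import dataclass
from enum import Enum

POLICY_VERSION = "payments-policy-2026-02-01"  # version-controlled artifact


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class PolicyResult:
    decision: Decision
    rule: str             # which rule fired, for the audit record
    policy_version: str   # which artifact applied, for the regulator


def evaluate(amount_minor: int, cross_border: bool, approvers: int) -> PolicyResult:
    """The model proposes; this layer disposes."""
    if amount_minor > 50_000_00:  # $50,000 in minor units
        return PolicyResult(Decision.ESCALATE, "over-50k-requires-human", POLICY_VERSION)
    if cross_border and approvers < 2:
        return PolicyResult(Decision.ESCALATE, "cross-border-second-approver", POLICY_VERSION)
    return PolicyResult(Decision.APPROVE, "within-autonomy-limits", POLICY_VERSION)
```

When the regulator asks which rule applied to a given payment, the answer is a `(rule, policy_version)` pair in the audit record, traceable to a commit.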

Human-in-the-loop is a feature, not a fallback

The temptation is to treat human approval as an embarrassment — a sign the agent is not good enough yet. In regulated workflows, it is the opposite. The presence of a human approver at the right threshold is a deliberate design choice that buys you accountability, intervention capacity, and a clean audit story. The work is in choosing the thresholds carefully and making the approval surface fast enough that the human is a partner, not a bottleneck.

A well-designed agent loop should make the easy 95% of decisions instantly, escalate the 5% that genuinely need judgment, and capture the reasoning trace for both.
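The routing itself can be very small. This is a hypothetical sketch, assuming a scalar risk score and a fixed threshold, of the part that matters: both the instant path and the escalation path write to the same reasoning trace:

```python
# Illustrative escalation router. Trace, route, and the risk-score
# threshold are assumptions for the sketch, not a real interface.
from dataclasses import dataclass, field


@dataclass
class Trace:
    payment_id: str
    steps: list = field(default_factory=list)

    def note(self, msg: str) -> None:
        self.steps.append(msg)


def route(payment_id: str, risk_score: float, trace: Trace,
          threshold: float = 0.2) -> str:
    """Return 'auto' or 'human'; record the reasoning either way."""
    trace.note(f"risk_score={risk_score:.2f}, threshold={threshold}")
    if risk_score < threshold:
        trace.note("auto-approved: below risk threshold")
        return "auto"
    trace.note("escalated: genuinely needs human judgment")
    return "human"
```

The design choice worth noting: escalation is a first-class return value, not an exception path, so the approval surface can be built and measured like any other product feature.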

Every action is a record

Auditability is not a logging requirement. It is an architectural one. The agent's inputs, the model's reasoning, the policy decisions, the tool invocations, the human approvals, and the eventual outcomes all need to live in a single, queryable, immutable record keyed by a payment identifier. If you cannot reconstruct, six months later, exactly what the agent saw and exactly why it did what it did, the system is not production-ready for regulated use.
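One way to make that record concrete is an append-only log keyed by payment identifier, hash-chained so after-the-fact edits are detectable. This is a minimal in-memory sketch; a production system would persist it, and the class and field names are assumptions:

```python
# Hypothetical append-only audit record. Hash-chaining each entry to
# the previous one makes silent tampering detectable on verify().
import hashlib
import json


class AuditLog:
    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, payment_id: str, event: str, detail: dict) -> None:
        prev_hash = self._entries[-1]["hash"] if self._entries else "genesis"
        body = {"payment_id": payment_id, "event": event,
                "detail": detail, "prev": prev_hash}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self._entries.append({**body, "hash": digest})

    def for_payment(self, payment_id: str) -> list[dict]:
        # Reconstruct everything the agent saw and did for one payment.
        return [e for e in self._entries if e["payment_id"] == payment_id]

    def verify(self) -> bool:
        # Recompute the chain; any edited entry breaks it.
        prev = "genesis"
        for e in self._entries:
            body = {k: e[k] for k in ("payment_id", "event", "detail", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != digest:
                return False
            prev = e["hash"]
        return True
```

Six months later, `for_payment("pay_…")` is the reconstruction the auditor asks for, and `verify()` is the claim that nobody has rewritten it since.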

What the loop actually looks like

In practice, the agent loop in a regulated payment workflow is narrower than the open-ended chat-style loops that dominate consumer AI. It looks more like this:

  1. A structured intent arrives — from a user, an upstream system, or a scheduled trigger.
  2. The agent enriches the intent with context from typed data sources.
  3. The agent proposes an action plan, expressed as a sequence of typed tool calls.
  4. A policy layer evaluates the plan against versioned rules. It approves, modifies, or escalates.
  5. Approved actions execute through deterministic tools with idempotency guarantees.
  6. The full trace — intent, context, plan, policy decision, execution, outcome — is written to the audit record.
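The six steps above can be sketched as a single control-flow function. Every helper here (`enrich`, `propose_plan`, `policy_evaluate`, `execute`, `write_audit_record`) is a stand-in for a real component; only the shape of the loop is the point:

```python
# Hedged end-to-end sketch of the six-step loop. All helpers are
# hypothetical stand-ins; only the control flow is meaningful.

def handle_intent(intent: dict) -> dict:
    trace = {"intent": intent}                # step 1: structured intent
    context = enrich(intent)                  # step 2: typed context sources
    trace["context"] = context
    plan = propose_plan(intent, context)      # step 3: model proposes typed tool calls
    trace["plan"] = plan
    verdict = policy_evaluate(plan)           # step 4: versioned policy layer
    trace["policy"] = verdict
    if verdict == "approve":
        trace["outcome"] = execute(plan)      # step 5: deterministic, idempotent tools
    else:
        trace["outcome"] = "pending-human-approval"
    write_audit_record(trace)                 # step 6: full trace to the audit record
    return trace


# Stand-in implementations so the sketch runs end to end.
def enrich(intent):
    return {"balance_minor": 1_000_000_00}

def propose_plan(intent, context):
    return [("transfer", intent)]

def policy_evaluate(plan):
    amount = plan[0][1].get("amount_minor", 0)
    return "approve" if amount <= 50_000_00 else "escalate"

def execute(plan):
    return "executed"

AUDIT: list[dict] = []

def write_audit_record(trace):
    AUDIT.append(trace)
```

Note that the trace is assembled unconditionally: the escalated path and the auto-approved path both end in step 6, which is what makes the 95/5 split auditable rather than just fast.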

The model is doing real work. It is reading messy inputs, choosing among possible plans, and handling edge cases that would otherwise need a human. But it is doing that work inside a frame the rest of the system can trust.

Why this matters now

The pressure to put AI into financial workflows is going to keep rising. Treasury teams, banks, and fintechs are all under cost and speed pressure that agent-based systems can genuinely relieve. The teams that succeed in shipping these systems into regulated environments will not be the ones with the most capable models. They will be the ones who treat the model as one component of a larger, auditable system — and who design the rest of that system with the same rigor they would apply to any other piece of payments infrastructure.

That is the bet HATI is built around: AI is a force multiplier on a well-designed payments backbone. It is not a substitute for one.