AI agent billing: meter multi-step agents and tool calls

Metering AI agents is harder than metering single LLM calls. One "agent run" can fan out into 20 tool calls and 50 LLM calls. Vevee handles agent-level and step-level metering with composite events and atomic reservations.

Last updated: 2026-05-10

The problem

AI agents - research agents, code agents, browser agents - make many sub-calls per user request. One "summarize this PDF" request might trigger 12 LLM calls and 4 tool invocations.

Should you bill per-agent-run? Per-LLM-call? Per-token? The answer depends on your pricing, but the metering layer needs to express all three views without you re-instrumenting every call.
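As a rough illustration of "three views from one instrumentation", the sketch below derives run, LLM-call, and token totals from a single stream of sub-events. The StepEvent shape and the summarize helper are hypothetical names invented for this example, not part of the Vevee SDK.

```typescript
// Hypothetical event shape (an assumption for this sketch):
// one record per sub-call, tagged with its parent run ID.
type StepEvent = {
  runId: string;
  kind: "llm" | "tool";
  tokens: number; // 0 for tool calls
};

type BillingViews = { runs: number; llmCalls: number; tokens: number };

// Derive all three billing views from the same list of events.
function summarize(events: StepEvent[]): BillingViews {
  const runIds = new Set<string>();
  let llmCalls = 0;
  let tokens = 0;
  for (const e of events) {
    runIds.add(e.runId); // per-agent-run view: distinct parent runs
    if (e.kind === "llm") {
      llmCalls += 1; // per-LLM-call view
      tokens += e.tokens; // per-token view
    }
  }
  return { runs: runIds.size, llmCalls, tokens };
}

const sampleEvents: StepEvent[] = [
  { runId: "run_a", kind: "llm", tokens: 1200 },
  { runId: "run_a", kind: "tool", tokens: 0 },
  { runId: "run_a", kind: "llm", tokens: 800 },
  { runId: "run_b", kind: "llm", tokens: 500 },
];

const views = summarize(sampleEvents);
```

Because every sub-event carries the parent run ID, switching the billable unit later is a change in aggregation, not in instrumentation.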

You also need quota gating: a user with $5 of agent runs remaining has to be blocked before the next agent run kicks off 100 sub-calls and burns $20, not after.

The solution

Reserve the agent run upfront, before the first sub-call: vevee.reserve(userId, "agent.run", expectedCost, { agent: "research" }). Use an upper-bound estimate.

Track each sub-step as it happens with a child event tagged with the parent run ID. Vevee's match-rule engine routes each sub-event into the right limit groups (agent runs, LLM tokens, tool calls).

On success, commit the reservation and reconcile actual cost vs reserved upper bound (refund the difference via a track event). On error, release.

Example

The agent reserves a budget upfront. Sub-calls are tracked separately for analytics, but the reservation is what guards the wallet.

import { createClient } from "@vevee/sdk";
import { nanoid } from "nanoid";

const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

export async function runResearchAgent(userId: string, query: string) {
  const runId = nanoid();
  const budgetCents = 50;

  const r = await vevee.reserve(userId, "agent.run", budgetCents, {
    agent: "research",
    runId,
  });
  if (!r.allowed) return { error: "limit_reached", reasons: r.reasons };

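  let actualCents = 0;
  try {
    // done() and runOneStep() are your own agent-loop helpers (not part of the SDK);
    // runOneStep() is assumed to resolve to the step's cost in cents.
    while (!done()) {
      const stepCost = await runOneStep(runId);
      actualCents += stepCost;
      // Track each step for analytics (does NOT increment the agent.run group)
      await vevee.track(userId, "agent.step", 1, { runId, kind: "llm" });

      if (actualCents > budgetCents) break; // safety stop
    }
  } catch (err) {
    // The run failed before commit: hand the reserved budget back.
    await vevee.release(r.reservationId!);
    throw err;
  }

  // Commit outside the try block so a failed refund call can never
  // trigger release() on a reservation that is already committed.
  await vevee.commit(r.reservationId!);
  if (actualCents < budgetCents) {
    // Reconcile: refund the unused part of the upper-bound reservation.
    await vevee.track(userId, "agent.run.refund", budgetCents - actualCents, { runId });
  }
  return { runId, costCents: actualCents };
}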

Pick the billable unit deliberately

For most agent products the right billable unit is the agent run, not the underlying token count. Users buy "5 research runs / month", not "100k tokens of agent activity." Token cost still matters internally as a margin guardrail - track it as a sub-event, but don't expose it to the customer.

Upper-bound reservations protect against runaway agents

A pathological agent loop can chew through hundreds of sub-calls. By reserving a budget upfront and breaking the loop when actual cost exceeds it, you cap the blast radius. The reserved quota is held atomically, so two parallel agent runs can't both pass.
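To make the "two parallel runs can't both pass" property concrete, here is a minimal in-memory sketch of a check-and-hold ledger. It is an illustration of the idea only, not Vevee's implementation; QuotaLedger and its methods are hypothetical names.

```typescript
// In-memory stand-in for a quota ledger (an assumption for this sketch).
// The availability check and the hold happen in one synchronous step,
// so a second reservation always sees the first one's hold.
class QuotaLedger {
  private limitCents: number;
  private usedCents = 0;
  private heldCents = 0;
  private nextId = 1;
  private holds = new Map<string, number>();

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  // Quota that is neither spent nor held by an open reservation.
  available(): number {
    return this.limitCents - this.usedCents - this.heldCents;
  }

  // Returns a reservation ID, or null when the hold would exceed the limit.
  reserve(amountCents: number): string | null {
    if (amountCents > this.available()) return null;
    const id = `res_${this.nextId++}`;
    this.holds.set(id, amountCents);
    this.heldCents += amountCents;
    return id;
  }

  // Convert a hold into spent quota.
  commit(id: string): void {
    this.usedCents += this.take(id);
  }

  // Drop a hold without spending it.
  release(id: string): void {
    this.take(id);
  }

  private take(id: string): number {
    const amount = this.holds.get(id);
    if (amount === undefined) throw new Error(`unknown reservation ${id}`);
    this.holds.delete(id);
    this.heldCents -= amount;
    return amount;
  }
}

const ledger = new QuotaLedger(80); // user has 80 cents of quota left
const firstRun = ledger.reserve(50); // succeeds and holds 50 cents
const secondRun = ledger.reserve(50); // denied: only 30 cents are unheld
```

In a real service the same check-and-hold has to be a single atomic operation in the backing store, since separate read and write steps would let both runs pass.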

Per-sub-event analytics without quota impact

You probably want to see "this agent on average makes 14 tool calls and 38 LLM calls" - but you don't want each of those incrementing the user's agent.run quota. Use match rules so agent.step events route only into analytics groups, not into quota groups.
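The routing idea can be sketched as follows. The rule shape, group names, and helper functions here are assumptions made up for illustration; Vevee's actual match-rule syntax may look quite different.

```typescript
// Hypothetical rule shape: which event feeds which group, and whether
// that group is enforced as a quota or only collected for analytics.
type MatchRule = {
  eventName: string;
  group: string;
  enforce: boolean;
};

const rules: MatchRule[] = [
  { eventName: "agent.run", group: "runs.monthly", enforce: true },
  { eventName: "agent.step", group: "analytics.steps", enforce: false },
];

type Counters = Map<string, number>;

// Add the event's amount to every group whose rule matches the event name.
function route(counters: Counters, eventName: string, amount: number): string[] {
  const touched: string[] = [];
  for (const rule of rules) {
    if (rule.eventName !== eventName) continue;
    counters.set(rule.group, (counters.get(rule.group) ?? 0) + amount);
    touched.push(rule.group);
  }
  return touched;
}

// Only enforced groups count toward the user's quota.
function quotaUsage(counters: Counters): Record<string, number> {
  const out: Record<string, number> = {};
  for (const rule of rules) {
    if (rule.enforce) out[rule.group] = counters.get(rule.group) ?? 0;
  }
  return out;
}

const counters: Counters = new Map();
route(counters, "agent.run", 1); // one run
for (let i = 0; i < 14; i++) {
  route(counters, "agent.step", 1); // fourteen steps inside that run
}
```

After this, the analytics group holds all fourteen steps while the enforced quota view still shows a single run.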

Tool calls and external API costs

If your agent calls paid external APIs (search, scraping, code execution), track those as separate events and either include them in the agent.run cents budget or in their own limit group ("100 tool calls / month"). Your pricing is your call.
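The two accounting options can be sketched side by side. The tool names, per-call prices, and event names below ("agent.tool.cost", "tool.call") are purely illustrative assumptions, not real prices or Vevee event names.

```typescript
// Hypothetical tools and per-call prices in cents (illustrative only).
type ToolName = "search" | "scrape" | "code_exec";

const TOOL_PRICE_CENTS: Record<ToolName, number> = {
  search: 1,
  scrape: 2,
  code_exec: 3,
};

// "run_budget": fold the tool's cost into the run's cents budget.
// "own_group": count the call against a separate "N tool calls / month" limit.
type ToolAccounting = "run_budget" | "own_group";

type Charge = { event: string; amount: number };

// Decide which event to track, and with what amount, for one tool call.
function chargeForTool(tool: ToolName, mode: ToolAccounting): Charge {
  return mode === "run_budget"
    ? { event: "agent.tool.cost", amount: TOOL_PRICE_CENTS[tool] }
    : { event: "tool.call", amount: 1 };
}

const asBudget = chargeForTool("scrape", "run_budget");
const asCount = chargeForTool("scrape", "own_group");
```

The first mode measures tool calls in cents against the run budget; the second measures them as a plain count against their own limit.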

Frequently asked questions

How do I estimate the upfront budget for a reservation?

Use a conservative upper bound based on agent type and config. For research agents, 50 cents is usually safe. Refund the unused portion at commit time so the user pays only for what they actually used.
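One way to derive such a bound is to multiply the maximum step count by the worst-case cost of a single step. The sketch below does that; the AgentConfig fields and the per-token prices are hypothetical values chosen for illustration, so substitute your own model pricing and agent limits.

```typescript
// Hypothetical agent limits and prices (assumptions for this sketch).
type AgentConfig = {
  maxSteps: number;
  maxInputTokensPerStep: number;
  maxOutputTokensPerStep: number;
  inputCentsPer1kTokens: number;
  outputCentsPer1kTokens: number;
};

// Worst case: every step uses its full token allowance.
// Round up so the reservation never undershoots the true bound.
function upperBoundCents(cfg: AgentConfig): number {
  const perStepCents =
    (cfg.maxInputTokensPerStep / 1000) * cfg.inputCentsPer1kTokens +
    (cfg.maxOutputTokensPerStep / 1000) * cfg.outputCentsPer1kTokens;
  return Math.ceil(cfg.maxSteps * perStepCents);
}

const researchBudgetCents = upperBoundCents({
  maxSteps: 20,
  maxInputTokensPerStep: 8000,
  maxOutputTokensPerStep: 1000,
  inputCentsPer1kTokens: 0.25, // illustrative price, not a real rate
  outputCentsPer1kTokens: 0.5, // illustrative price, not a real rate
});
```

With these example numbers a step costs at most 2.5 cents, so 20 steps give a 50-cent reservation.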

What if the agent times out mid-run?

If your code crashes before it commits or releases, the reservation auto-releases after 60 seconds. If your agent legitimately runs longer than that, call vevee.commit() on partial success and track a refund for the unused portion.
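The effect of the 60-second auto-release can be modelled as a time-to-live on each hold. This is a conceptual sketch only; the Hold type and activeHeldCents helper are hypothetical, and how Vevee implements expiry internally is not described here.

```typescript
// 60-second auto-release window, as described above.
const HOLD_TTL_MS = 60 * 1000;

// Hypothetical record of an open reservation.
type Hold = { amountCents: number; createdAtMs: number };

// Sum only the holds that are still inside their TTL at time nowMs.
// Expired holds no longer count against the user's quota.
function activeHeldCents(holds: Hold[], nowMs: number): number {
  return holds
    .filter((h) => nowMs - h.createdAtMs < HOLD_TTL_MS)
    .reduce((sum, h) => sum + h.amountCents, 0);
}

const openHolds: Hold[] = [
  { amountCents: 50, createdAtMs: 0 }, // run that crashed at t = 0
  { amountCents: 30, createdAtMs: 45000 }, // run started at t = 45 s
];

const heldBeforeExpiry = activeHeldCents(openHolds, 59999); // both holds active
const heldAfterExpiry = activeHeldCents(openHolds, 60000); // first hold expired
```

Passing the current time in as a parameter keeps the expiry logic easy to test without waiting on a real clock.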

Can I bill on both runs and total tokens?

Yes. Define two limit groups: agent.run (count) and llm.tokens (tokens). One sub-event can hit both. Free plans usually only enforce runs; usage-based premium plans can also enforce tokens for safety.
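A minimal sketch of checking one request against both groups, with the token limit enforced only on some plans. The plan names, limit values, and the check helper are assumptions for illustration; only the agent.run and llm.tokens group names come from the text above.

```typescript
type Plan = "free" | "premium";
type Usage = { runs: number; tokens: number };

// tokens: null means the token group is tracked but not enforced.
type Limits = { runs: number; tokens: number | null };

// Illustrative limits, not real plan definitions.
const PLAN_LIMITS: Record<Plan, Limits> = {
  free: { runs: 5, tokens: null },
  premium: { runs: 500, tokens: 2000000 },
};

// Return the names of every limit group the next request would exceed.
// An empty array means the request is allowed.
function check(plan: Plan, used: Usage, next: Usage): string[] {
  const limits = PLAN_LIMITS[plan];
  const reasons: string[] = [];
  if (used.runs + next.runs > limits.runs) reasons.push("agent.run");
  if (limits.tokens !== null && used.tokens + next.tokens > limits.tokens) {
    reasons.push("llm.tokens");
  }
  return reasons;
}

const nextRequest: Usage = { runs: 1, tokens: 50000 };
const freeAtLimit = check("free", { runs: 5, tokens: 10000000 }, nextRequest);
const premiumOverTokens = check("premium", { runs: 10, tokens: 1990000 }, nextRequest);
```

Here the free user is blocked only by the run count, while the premium user has runs to spare but is stopped by the token safety limit.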

How do I show the user what their agent run cost?

Each event's response includes counters and remaining quota, which you can surface in your UI. Alternatively, call vevee.usage(userId) on the client with a pk_live_ key for a live "X runs left this month" display.
