Rate limit
A short-window cap on how often requests can be made, typically per second or per minute. Distinct from a quota, which is a long-window cap.
Last updated: 2026-05-10
Definition
Rate limits protect infrastructure from spikes (for example, 60 req/sec/user). Quotas enforce business rules (for example, 100 image renders / month / user). The two are complementary: rate limits usually live at your edge (Cloudflare, Vercel) and care about requests per IP per second; quotas live in your business logic and care about usage per user per month. AIPricingLab is built for quotas, not rate limits, so you typically still want a Redis or edge-based rate limiter for per-second protection.
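To make the short-window side concrete, here is a minimal sketch of a fixed-window rate limiter. It is an in-memory illustration only, not the Redis or edge-based limiter you would run in production and not part of AIPricingLab. The class name FixedWindowRateLimiter, its parameters, and the injectable clock are all assumptions made for the example.

```python
import time


class FixedWindowRateLimiter:
    """Hypothetical in-memory limiter: at most `limit` requests per window, per key."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the window can be tested
        self._counters = {}         # key -> [window_index, request_count]

    def allow(self, key):
        # Identify the current window by integer division of the clock.
        window = int(self.clock() // self.window_seconds)
        state = self._counters.get(key)
        if state is None or state[0] != window:
            # First request for this key, or a new window: reset the count.
            state = [window, 0]
            self._counters[key] = state
        if state[1] >= self.limit:
            return False            # over the cap for this window
        state[1] += 1
        return True


# Example: 60 requests per second, keyed by client IP.
limiter = FixedWindowRateLimiter(limit=60, window_seconds=1)
allowed = limiter.allow("203.0.113.7")
```

A fixed window is the simplest variant; it can admit up to twice the limit across a window boundary, which is why sliding-window or token-bucket limiters are often preferred at the edge.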
Example
A free-tier user has a rate limit of 30 req/min (anti-abuse) AND a quota of 20 renders/month (cost cap). Both apply: a request has to pass both checks.
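The free-tier example can be sketched as two independent checks. The function name check_render, its arguments, and the returned strings are hypothetical; in practice the per-minute count would come from your rate limiter and the monthly count from your quota store.

```python
RATE_LIMIT_PER_MINUTE = 30     # anti-abuse cap from the example
QUOTA_RENDERS_PER_MONTH = 20   # cost cap from the example


def check_render(requests_this_minute, renders_this_month):
    """Return why a render is refused, or "allowed" if both caps have room."""
    # Short-window check first: it is cheap and stops bursts early.
    if requests_this_minute >= RATE_LIMIT_PER_MINUTE:
        return "rate_limited"
    # Long-window check: the business rule for the billing period.
    if renders_this_month >= QUOTA_RENDERS_PER_MONTH:
        return "quota_exceeded"
    return "allowed"
```

A user who has made 5 requests this minute but already used 20 renders this month is refused by the quota, even though the rate limit has plenty of room.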
Related terms
Quota
A hard or soft cap on a usage unit (tokens, renders, seconds, cents) over a period (daily, monthly, lifetime). The "100 / month" in "100 image renders / month".
Limit group
The basic unit of quota in AIPricingLab: a label, a unit, a quota, a period, and a list of match rules that decide which events count toward it.
reserve / commit / release
The atomic three-step pattern that gates AI calls under concurrency: reserve atomically holds quota with a 60s TTL; commit confirms after the call succeeds; release rolls back on error.
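The reserve / commit / release pattern can be illustrated with a small in-memory sketch. This is not AIPricingLab's API: the QuotaLedger class, its method signatures, and the use of a process-local lock are assumptions for the example. Only the three step names and the 60s TTL come from the definition above.

```python
import threading
import time
import uuid


class QuotaLedger:
    """Hypothetical in-memory reserve / commit / release over one quota."""

    def __init__(self, quota, ttl_seconds=60.0, clock=time.monotonic):
        self.quota = quota
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.committed = 0
        self._holds = {}                 # reservation id -> (amount, expires_at)
        self._lock = threading.Lock()    # makes each step atomic in-process

    def _expire(self):
        # Drop holds whose TTL has passed so abandoned calls free their quota.
        now = self.clock()
        for rid in [r for r, (_, exp) in self._holds.items() if exp <= now]:
            del self._holds[rid]

    def reserve(self, amount=1):
        """Hold quota before the AI call; return an id, or None if over quota."""
        with self._lock:
            self._expire()
            held = sum(a for a, _ in self._holds.values())
            if self.committed + held + amount > self.quota:
                return None
            rid = uuid.uuid4().hex
            self._holds[rid] = (amount, self.clock() + self.ttl_seconds)
            return rid

    def commit(self, rid):
        """Confirm usage after the call succeeds; False if the hold expired."""
        with self._lock:
            self._expire()
            hold = self._holds.pop(rid, None)
            if hold is None:
                return False
            self.committed += hold[0]
            return True

    def release(self, rid):
        """Roll back the hold after an error."""
        with self._lock:
            self._holds.pop(rid, None)
```

A caller would reserve before the AI call, commit in the success path, and release in the error path; the TTL covers the case where the caller crashes and does neither.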