The AI agent boom has produced an uncomfortable truth that few teams want to confront: most agent systems will become economically unsustainable long before they become technically impressive. A recent analysis published by O'Reilly Radar argues that the industry's fixation on model selection, prompt engineering, and orchestration frameworks is masking a far more fundamental problem — one rooted in how costs compound nonlinearly across every reasoning loop, API call, and retry in an agent workflow.
The Compounding Trap
The core argument is straightforward but widely underappreciated: AI agent costs do not scale linearly. Double the work an agent does and you do not simply double the bill. Each additional step in a multi-step workflow introduces its own token consumption, latency overhead, and potential for retries, and in a typical agent loop each call also re-sends the context accumulated by the steps before it. These costs compound in a way that is predictable in principle but frequently unmodelled, creating a financial cliff that teams discover only when the production invoice arrives.
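To make the nonlinearity concrete, here is a minimal sketch of how billed input tokens grow when every step re-sends the full conversation so far. It assumes no prompt caching and no context truncation, and the token counts are hypothetical figures chosen for illustration, not measurements from the analysis.

```python
def cumulative_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens billed across a run in which each step re-sends the whole context.

    Assumptions (illustrative only): no prompt caching, no truncation,
    and a fixed number of tokens appended to the context after every step.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the entire context is sent again on this call
        context += added_per_step   # model output and tool results are appended
    return total


# Hypothetical workload: 2,000-token starting prompt, 1,500 tokens added per step.
for n in (5, 10, 20):
    print(n, cumulative_input_tokens(n, base_context=2_000, added_per_step=1_500))
```

Under these assumptions, going from 10 steps to 20 raises billed input tokens from 87,500 to 325,000, roughly 3.7 times rather than 2 times, and that is before any retries are counted.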
This is not a theoretical concern. Modern coding agents — tools like Anthropic's Claude Code, OpenAI's Codex, and Google's Jules — have dramatically lowered the barrier to building complex multi-step workflows. A developer can now chain together reasoning steps, tool calls, and contextual retrieval in minutes rather than weeks. The friction that once served as a natural brake on system complexity has effectively disappeared.
That ease of creation is, paradoxically, the problem. When prototyping is cheap, teams iterate rapidly without tracking the economic footprint of each additional step. By the time a workflow reaches production, its per-task cost may be orders of magnitude higher than the original estimate — if anyone estimated it at all.
Four Practices for Economic Sustainability
The analysis proposes a "cost-first architecture" approach built on four practices that challenge how most teams currently operate.
Model before you build. Teams should estimate the full per-task cost of a workflow before writing production code, explicitly accounting for reasoning steps, retries, tool calls, and the growth of context windows over multi-turn interactions. Too many teams treat cost modelling as an operations exercise to be handled after deployment.
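A pre-build estimate of this kind can be very small. The sketch below is one possible shape for it, not the method from the analysis: the `StepEstimate` fields, the step names, the retry rates, and the per-token prices are all hypothetical placeholders that a team would replace with its own figures. Retries are modelled as independent attempts repeated until one succeeds, so the expected number of attempts is 1 / (1 - retry_rate).

```python
from dataclasses import dataclass


@dataclass
class StepEstimate:
    name: str
    input_tokens: int            # expected prompt size, including accumulated context
    output_tokens: int
    retry_rate: float = 0.0      # assumed chance that any single attempt fails and is retried
    tool_cost_usd: float = 0.0   # assumed external tool/API charge, paid on every attempt


def expected_step_cost(step: StepEstimate,
                       usd_per_1k_input: float,
                       usd_per_1k_output: float) -> float:
    """Expected cost of one step, including the expected cost of its retries."""
    if not 0.0 <= step.retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    per_attempt = (step.input_tokens / 1000 * usd_per_1k_input
                   + step.output_tokens / 1000 * usd_per_1k_output
                   + step.tool_cost_usd)
    expected_attempts = 1.0 / (1.0 - step.retry_rate)  # mean of a geometric distribution
    return per_attempt * expected_attempts


def expected_task_cost(steps: list[StepEstimate],
                       usd_per_1k_input: float,
                       usd_per_1k_output: float) -> float:
    """Expected cost of one full task: the sum over all of its steps."""
    return sum(expected_step_cost(s, usd_per_1k_input, usd_per_1k_output) for s in steps)


# Hypothetical three-step workflow and hypothetical prices.
workflow = [
    StepEstimate("plan", input_tokens=3_000, output_tokens=500),
    StepEstimate("search", input_tokens=4_000, output_tokens=300,
                 retry_rate=0.2, tool_cost_usd=0.002),
    StepEstimate("write", input_tokens=8_000, output_tokens=1_500, retry_rate=0.1),
]
per_task = expected_task_cost(workflow, usd_per_1k_input=0.003, usd_per_1k_output=0.015)
print(f"expected cost per task: ${per_task:.4f}")
print(f"at 100,000 tasks/month: ${per_task * 100_000:,.2f}")
```

Even a rough model like this forces the conversation about retry rates and context growth to happen before production code exists, which is the point of the practice.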
Monitor at granularity. Aggregate API spend reports — the kind cloud providers routinely offer — are insufficient. Teams need task-level cost tracking to identify which specific steps are driving expenditure. Without this visibility, cost optimisation becomes guesswork.
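Task-level tracking does not require heavy tooling to start. The following sketch shows one way an in-process ledger might attribute spend to individual tasks and steps. The `TaskCostLedger` class, its method names, and the prices are hypothetical; in practice the token counts would come from the usage data a provider returns with each response.

```python
from collections import defaultdict


class TaskCostLedger:
    """Attributes model spend to (task, step) pairs instead of one aggregate total."""

    def __init__(self, usd_per_1k_input: float, usd_per_1k_output: float) -> None:
        self.usd_per_1k_input = usd_per_1k_input
        self.usd_per_1k_output = usd_per_1k_output
        # task_id -> step name -> accumulated USD
        self._costs: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def record(self, task_id: str, step: str, input_tokens: int, output_tokens: int) -> float:
        """Record one model call and return its cost."""
        cost = (input_tokens / 1000 * self.usd_per_1k_input
                + output_tokens / 1000 * self.usd_per_1k_output)
        self._costs[task_id][step] += cost
        return cost

    def task_total(self, task_id: str) -> float:
        """Total spend for a single task (0.0 if the task is unknown)."""
        return sum(self._costs.get(task_id, {}).values())

    def cost_by_step(self) -> dict[str, float]:
        """Spend per step across all tasks, most expensive step first."""
        totals: dict[str, float] = defaultdict(float)
        for steps in self._costs.values():
            for step, cost in steps.items():
                totals[step] += cost
        return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical prices and calls.
ledger = TaskCostLedger(usd_per_1k_input=1.0, usd_per_1k_output=2.0)
ledger.record("task-1", "plan", input_tokens=1_000, output_tokens=500)
ledger.record("task-1", "write", input_tokens=2_000, output_tokens=1_000)
ledger.record("task-2", "write", input_tokens=1_000, output_tokens=0)
print(ledger.task_total("task-1"))
print(ledger.cost_by_step())
```

The per-step breakdown is what an aggregate invoice cannot show: which step to shorten, cache, or reroute first.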
Architect with intent. The ease of adding another agent step creates what the analysis describes as a pattern of frictionless expansion. Every action in a workflow must justify its cost against a measurable positive outcome. "Because we can" is not a valid architectural argument when each step adds compounding expenses that are hard to forecast without a model.
Route with economics. Not every subtask requires the most capable — and most expensive — model available. Lightweight, cheaper models should handle simple classification or retrieval tasks, with advanced models reserved for steps that genuinely demand their reasoning power. This tiered approach mirrors established infrastructure principles: you do not run every workload on premium hardware.
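A tiered router can be as simple as a lookup from task type to the cheapest model trusted with that level of difficulty. The sketch below assumes a three-tier setup; the tier names, prices, difficulty levels, and task types are hypothetical and stand in for whatever a team's own evaluation data supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_input: float
    max_difficulty: int  # highest difficulty level this tier is trusted to handle


# Hypothetical tiers and prices, for illustration only.
TIERS = [
    ModelTier("small-model", usd_per_1k_input=0.0002, max_difficulty=1),
    ModelTier("mid-model", usd_per_1k_input=0.003, max_difficulty=2),
    ModelTier("large-model", usd_per_1k_input=0.015, max_difficulty=3),
]

# Hypothetical mapping from task type to difficulty level.
DIFFICULTY = {
    "classification": 1,
    "retrieval": 1,
    "summarisation": 2,
    "multi_step_reasoning": 3,
}


def route(task_type: str) -> ModelTier:
    """Return the cheapest tier whose max_difficulty covers the task.

    Unknown task types are treated as the hardest level, so the router
    fails towards capability rather than towards silent quality loss.
    """
    difficulty = DIFFICULTY.get(task_type, 3)
    for tier in sorted(TIERS, key=lambda t: t.usd_per_1k_input):
        if tier.max_difficulty >= difficulty:
            return tier
    raise ValueError(f"no tier can handle difficulty {difficulty}")


for task in ("classification", "summarisation", "multi_step_reasoning", "unlabelled"):
    print(task, "->", route(task).name)
```

Defaulting unknown tasks to the most capable tier is a deliberate choice here: it costs more, but a misrouted hard task tends to show up as retries, which the earlier sections identify as a cost multiplier in their own right.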
What This Means for Practitioners
For IT teams evaluating or building agent systems, the analysis raises a critical question the industry has largely avoided: what does "production-ready" actually mean when the economics are this poorly understood?
The gap between technical capability and economic viability is widening. Demonstrations look spectacular. Cost projections for sustained, production-scale use often do not survive contact with reality. Teams that treat cost architecture as a first-class design constraint — on par with reliability, security, and performance — will be the ones that ship sustainable systems.
Those that do not will find themselves managing impressive prototypes they cannot afford to run.
The broader lesson will be familiar to anyone who has lived through previous waves of infrastructure complexity. Cloud cost overruns, microservice sprawl, and serverless bill shock all followed the same pattern: frictionless adoption, unchecked growth, painful reckoning. AI agents are following the same trajectory — only faster, and with compounding costs that are harder to predict from the outside.
For IT professionals tasked with evaluating agent investments, the message is clear: demand granular cost projections before committing to any multi-step workflow architecture. The technology works. Whether it can be operated within a sustainable budget remains the defining challenge.
