Anthropic Ships One AI Model as Two Products, Split by Safety Controls Rather Than Capability

Anthropic has taken an unconventional step in AI deployment strategy. Rather than offering its latest model in capability tiers, the company has released a single underlying system as two distinct products, distinguished solely by their safety guardrails. The move, announced on June 9, may introduce a new pattern for how powerful AI systems reach different audiences.

Two Faces of the Same Engine

The publicly available version, Claude Fable 5, is Anthropic's most capable model to date. It ships with the company's full suite of cyber safeguards: classifiers designed to prevent the system from assisting with malicious hacking, exploit development, and other security-sensitive tasks.

Its counterpart, Claude Mythos 5, runs on the identical underlying architecture but with those safety layers removed. As first reported by The Hacker News, Mythos 5 is not available to the general public. Instead, access is restricted to a vetted group of cybersecurity professionals and researchers.

Addressing a Known Tension

The split reflects a practical problem that has long frustrated the security community. Safety guardrails on public AI models, while essential for preventing misuse, can also get in the way of legitimate defensive work. Red-team operators conducting penetration tests, malware analysts studying threat patterns, and vulnerability researchers probing for weaknesses all have valid reasons to interact with AI systems without content restrictions.

By offering an unrestricted version of the same model exclusively to these groups, Anthropic is attempting to serve both audiences without compromise. Defenders get the full analytical power of Fable 5 without safety classifiers blocking security-relevant queries, while the general public retains a version with appropriate protections in place.

A Contentious Precedent

The approach is not without controversy. Critics and AI safety advocates have raised concerns about the optics and implications of deliberately shipping a guardrail-free version of a powerful system, even under restricted access. The central worry is straightforward: if a model exists without safety controls, the risk of misuse grows regardless of how carefully access is managed.

There are also questions about the vetting process itself. What specific criteria determine who qualifies for Mythos 5 access? How does Anthropic verify that users remain legitimate over time? And what audit mechanisms are in place to detect potential abuse? The company has not publicly detailed the full scope of these controls, leaving room for skepticism.

Another open question is whether the existence of Mythos 5 will encourage other AI laboratories to adopt similar strategies. A related concern is whether it will motivate adversaries to reproduce the unrestricted model's behaviour by probing the public Fable 5 for weaknesses in its safety layers.

What It Means for the Industry

The split-deployment model is worth watching closely as a potential industry pattern. As AI models grow more powerful, the tension between broad accessibility and security risk will only intensify. Anthropic's approach is a pragmatic attempt to thread that needle, but its success will depend on execution. Specifically, it will depend on whether the insights generated by Mythos 5 users demonstrably improve safety across the board.

For the broader AI community, the move raises a deeper question: should safety controls be treated as a binary switch that can be toggled on and off, or as something more deeply integrated into how models reason? If other labs follow Anthropic's lead, the industry may need clearer norms around vetting, auditing, and transparency to ensure that unrestricted model access genuinely serves defensive purposes rather than becoming a liability.

Anthropic's bet is that empowering defenders with unfiltered tools will ultimately make AI systems safer for everyone. Whether that bet pays off will become clearer as the Mythos 5 access programme matures and its participants share their findings.

Original News Source

Hong Kong Linux User Group 香港Linux用家協會 (HKLUG)