ChatGPhish Attack Exploits ChatGPT's Markdown Rendering to Enable Phishing and Prompt Injection

Cybersecurity researchers at Permiso Security have publicly disclosed a technique that abuses how ChatGPT renders Markdown-formatted content, turning the AI assistant's own output interface into a vector for phishing and prompt injection attacks.

The attack, dubbed ChatGPhish, targets an implicit trust relationship within the ChatGPT web application. According to the disclosure reported by The Hacker News, the chatgpt.com response renderer processes Markdown links and images embedded in AI-generated responses without adequate sanitisation or user-visible warnings. This allows an attacker to craft prompts that cause ChatGPT to produce output containing malicious links or image references, all rendered seamlessly within the familiar ChatGPT interface.

How the Attack Works

The core mechanism hinges on a design choice rather than a traditional software bug. When ChatGPT generates responses that include Markdown syntax for hyperlinks or embedded images, the web client renders them as clickable, interactive elements. Because these elements appear within the trusted context of a ChatGPT conversation, users are more likely to click on them without suspicion.

Permiso Security's research demonstrates that an attacker can craft a prompt (delivered through a shared conversation, a compromised plugin, or a poisoned document summarised by the tool) that instructs ChatGPT to include specific Markdown-formatted links in its response. These links can point to phishing pages designed to harvest credentials or deliver malware, while the image syntax can be weaponised to track whether a user has viewed the response, since rendering an embedded image triggers a request to the attacker-controlled URL.
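
To make the mechanism concrete, the following is a minimal Python sketch of how a defender might list the link and image targets hidden in a piece of model output and flag a link whose visible label names one domain while its target points to another. It is an illustration under stated assumptions, not the payload or tooling from the research: the function names, the regular expression, and the attacker.example domain are all hypothetical, and the pattern only handles simple inline Markdown (no reference-style links, nested parentheses, or raw HTML).

```python
import re
from urllib.parse import urlparse

# Simple inline Markdown: ![alt](url) for images, [label](url) for links.
# An optional "title" after the URL is tolerated and ignored.
MD_REF = re.compile(r'(!?)\[([^\]]*)\]\(\s*<?([^\s)>]+)>?(?:\s+"[^"]*")?\s*\)')

# A domain-looking token inside a link label, e.g. "chatgpt.com".
DOMAIN_IN_LABEL = re.compile(r'\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b')


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if it cannot be parsed."""
    try:
        return (urlparse(url).hostname or "").lower()
    except ValueError:
        return ""


def extract_references(markdown_text):
    """Return a (kind, label, url, host) tuple for each inline link or image."""
    refs = []
    for bang, label, url in MD_REF.findall(markdown_text):
        kind = "image" if bang else "link"
        refs.append((kind, label, url, host_of(url)))
    return refs


def label_host_mismatch(label, host):
    """True when the label shows a domain that differs from the real target host."""
    match = DOMAIN_IN_LABEL.search(label.lower())
    if not match:
        return False
    shown = match.group(1)
    return not (host == shown or host.endswith("." + shown))


if __name__ == "__main__":
    # Hypothetical model output containing a deceptive link and a tracking image.
    sample = (
        "Please [sign in at chatgpt.com](https://login.attacker.example/reset) "
        "to continue. ![](https://attacker.example/pixel.png?id=42)"
    )
    for kind, label, url, host in extract_references(sample):
        suspicious = kind == "link" and label_host_mismatch(label, host)
        print(kind, host, "SUSPICIOUS" if suspicious else "")
```

The label check is only a heuristic: a link labelled "click here" carries no domain to compare against, so a real control would also need a host policy rather than relying on label inspection alone.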

Critically, the vulnerability does not involve the AI model itself being compromised or producing incorrect information. The model is functioning as designed. The attack surface lies in the application layer — specifically, how the web renderer interprets and displays the model's output.

Why This Matters

The discovery underscores a growing and underappreciated class of AI security risks: attacks that target the interface between the AI and the user, rather than the model's reasoning capabilities.

Large language models are increasingly integrated into enterprise workflows — summarising documents, drafting emails, and answering queries across sensitive domains. Users are developing a habit of trusting the output these tools produce. ChatGPhish exploits that trust directly, using the AI as an unwitting intermediary to deliver malicious content in a context where users' guard is already down.

This is a notable evolution from earlier prompt injection discussions, which tended to focus on manipulating model behaviour for misinformation or policy bypass. Here, the model's compliance with the attacker's instructions is the mechanism, but the payload is a conventional phishing attack delivered through a novel channel.

Broader Implications for AI Security Practices

The disclosure arrives at a time when organisations worldwide are grappling with how to securely adopt generative AI tools. Several key takeaways emerge:

  • Treat AI-generated content as untrusted input. Links, attachments, and interactive elements in AI responses deserve the same scrutiny as content from any external source. Security awareness training should be updated to reflect this.
  • Defence in depth applies to AI interfaces. Organisations deploying ChatGPT or similar tools should consider browser-level protections, such as URL filtering and link-scanning extensions, that intercept clicks regardless of their origin.
  • Vendor-side mitigations are essential. The responsibility ultimately lies with AI platform providers like OpenAI to implement stricter rendering policies — for example, stripping or sandboxing interactive Markdown elements, or adding clear visual indicators that distinguish AI-generated links from trusted sources.
  • Audit AI integrations regularly. Any workflow that pipes untrusted data into an AI assistant for summarisation or processing introduces potential injection vectors. Security teams should map these data flows and assess exposure.
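
As one illustration of what a stricter rendering policy could look like, the Python sketch below removes every Markdown image from model output and replaces links to non-allowlisted hosts with inert, defanged text before anything is rendered. This is a sketch under assumptions, not OpenAI's implementation or a mitigation described in the disclosure: the allowlist, the intranet.example and attacker.example domains, and all function names are hypothetical, and the regular expressions cover only simple inline Markdown.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; a real deployment defines its own policy.
TRUSTED_HOSTS = frozenset({"intranet.example"})

IMAGE = re.compile(r'!\[([^\]]*)\]\([^)]*\)')
LINK = re.compile(r'\[([^\]]*)\]\(\s*<?([^\s)>]+)>?[^)]*\)')


def is_trusted(url, trusted_hosts=TRUSTED_HOSTS):
    """Accept only https URLs whose host is an allowlisted domain or a subdomain of one."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return any(host == t or host.endswith("." + t) for t in trusted_hosts)


def sanitize_model_output(markdown_text, trusted_hosts=TRUSTED_HOSTS):
    """Strip images and defang untrusted links in Markdown produced by a model."""

    def drop_image(match):
        # Keep the alt text for context, but never let the client fetch the image.
        alt = match.group(1).strip()
        return f"(image removed: {alt})" if alt else "(image removed)"

    def defang_link(match):
        label, url = match.group(1), match.group(2)
        if is_trusted(url, trusted_hosts):
            return match.group(0)  # leave allowlisted links untouched
        # Show the target in a non-clickable form so the user can still inspect it.
        shown = url.replace("http", "hxxp", 1).replace(".", "[.]")
        return f"{label} (untrusted link removed: {shown})"

    # Images first, so that a linked image is then handled as an ordinary link.
    without_images = IMAGE.sub(drop_image, markdown_text)
    return LINK.sub(defang_link, without_images)


if __name__ == "__main__":
    raw = (
        "See [reset your password](https://login.attacker.example/reset) "
        "or [the wiki](https://wiki.intranet.example/ai). "
        "![status](https://attacker.example/p.png?u=1)"
    )
    print(sanitize_model_output(raw))
```

Removing images outright is the more conservative choice here, because an image is fetched without any click; links are kept as readable text so the user can still see where they would have led.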

Looking Ahead

Permiso Security's findings serve as a reminder that the security perimeter around AI tools extends well beyond the model itself. As LLM-powered interfaces become standard in professional environments, the rendering, interaction, and trust layers surrounding them must be scrutinised with the same rigour applied to any other user-facing application.

IT security teams would be wise to use this disclosure as an impetus to audit how their organisations route sensitive data through LLM interfaces — identifying every touchpoint where untrusted content could be injected into a trusted AI context and where AI-generated output is consumed without independent verification. The research highlights an urgent need for AI vendors to adopt a security-by-design approach to output rendering — treating the interface not as a passive display but as an active attack surface that demands defensive controls.

