A newly identified supply-chain attack has compromised 19 science-focused Python packages on the Python Package Index (PyPI), collectively downloaded hundreds of thousands of times, raising fresh concerns for developers and organisations worldwide that depend on open-source libraries for data science, research, and engineering work.
The campaign, dubbed "Shai-Hulud," delivered malware designed to exfiltrate developer secrets — including credentials, API keys, and tokens that could be leveraged to pivot into private codebases and cloud infrastructure. According to BleepingComputer, the compromised packages targeted the scientific computing community, a segment of the Python ecosystem that underpins research, financial modelling, and data analysis workflows across sectors from academia to fintech.
A Familiar Attack Vector
Supply-chain compromises of package repositories are not new, but they remain a persistent and effective threat. The incident echoes high-profile cases such as the 2018 hijacking of the popular event-stream npm package, where a malicious actor gained maintainer access and injected cryptocurrency-stealing code, and the 2021 Codecov breach, in which attackers altered the company's Bash Uploader script to siphon environment variables, including secrets, from thousands of CI/CD pipelines.
In each case, the attack succeeded not through a zero-day exploit but by abusing the trust relationship between developers and the open-source packages they install, often without auditing source code. The Shai-Hulud campaign appears to follow a similar playbook, targeting packages that may receive less scrutiny than mainstream frameworks yet remain widely depended upon in specialised domains.
Why Science Packages?
While the full motivation behind targeting science-focused libraries has not been publicly confirmed, such packages present an attractive target for several reasons. Scientific Python libraries are frequently deployed in environments where credentials for cloud services, databases, and internal APIs sit alongside development code. Packages in this category also tend to have smaller maintainer teams and receive less rigorous security review than high-visibility infrastructure tools, potentially making them easier to compromise.
For Hong Kong's growing community of data scientists, quantitative researchers, and AI engineers — many of whom rely on PyPI packages daily — the incident serves as a reminder that even niche dependencies can become entry points for credential theft and lateral movement within corporate networks.
Practical Steps for Developers
Security researchers consistently recommend several measures to reduce exposure to supply-chain risks:
- Pin dependency versions and verify checksums against known-good values before installing or updating packages.
- Use lock files (such as `poetry.lock` or `pip freeze` outputs) to ensure reproducible builds and detect unexpected changes.
- Audit installed packages regularly, particularly after updates, using tools like `pip-audit` or commercial software composition analysis platforms.
- Limit secrets exposure in development environments by using environment variable managers and short-lived tokens rather than long-lived credentials.
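The checksum step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a package manager's own verification: the function names `sha256_of_file` and `matches_pinned_hash` are hypothetical, and the sketch assumes the known-good SHA-256 value was recorded from a trusted source before the download.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large wheels or sdists are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, pinned_hex: str) -> bool:
    """Compare a file's digest with a pinned value (assumed to be trusted)."""
    actual = sha256_of_file(path)
    # Normalise whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(actual, pinned_hex.strip().lower())
```

In day-to-day use, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`, with `--hash=sha256:...` entries in the requirements file) performs an equivalent comparison at install time, so a hand-rolled check like this is mainly useful for artifacts fetched outside pip.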
Organisations should also consider running internal package mirrors with vulnerability scanning, an approach that industry security frameworks increasingly recommend.
The Broader Picture
The Shai-Hulud campaign underscores that the open-source software supply chain remains one of the most accessible attack surfaces for threat actors. As Python continues to dominate fields such as machine learning, scientific research, and data engineering, the security posture of its package ecosystem carries weight well beyond the developer community. Every compromised package is a potential foothold into the organisations that depend on it — whether a multinational bank running quantitative models or a university research lab processing experimental data.
The full scope of the attack, including the identities of all compromised packages and the number of affected users, continues to be assessed. Developers are advised to review their recent dependency installations and rotate any secrets that may have been exposed in their development environments.
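One way to start the review of installed dependencies is to compare the current environment against a list of package names taken from a published advisory. The sketch below assumes such a list is available; the names in `SUSPECT_PACKAGES` are hypothetical placeholders, not the packages involved in this campaign, and the helper names are illustrative.

```python
import re
from importlib import metadata

# Hypothetical placeholder names; replace with names from a published advisory.
SUSPECT_PACKAGES = {"example-compromised-pkg", "another_example.pkg"}


def normalize(name: str) -> str:
    """Normalise a project name using the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_suspect_installs(suspects, installed=None):
    """Return sorted (name, version) pairs for installs on the suspect list.

    `installed` is an iterable of (name, version) pairs; when omitted,
    the current Python environment is inspected.
    """
    wanted = {normalize(name) for name in suspects}
    if installed is None:
        installed = [
            (dist.metadata.get("Name"), dist.version)
            for dist in metadata.distributions()
            if dist.metadata.get("Name")  # skip entries with missing names
        ]
    return sorted(
        (normalize(name), version)
        for name, version in installed
        if normalize(name) in wanted
    )
```

Calling `find_suspect_installs(SUSPECT_PACKAGES)` inside each virtual environment or container image returns any matches, which can then guide which credentials to rotate first. A name match only shows that a package is present; whether the installed version is an affected one still has to be checked against the advisory.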
