CWE-1039


Inadequate Detection or Handling of Adversarial Input Perturbations in Automated Recognition Mechanism
Weakness (Draft)

The product uses an automated mechanism such as machine learning to recognize complex data inputs (e.g. image or audio) as a particular concept or category, but it does not properly detect or handle inputs that have been modified or constructed in a way that causes the mechanism to detect a different, incorrect concept.

When techniques such as machine learning are used to automatically classify input streams, and those classifications are used for security-critical decisions, then any mistake in classification can introduce a vulnerability that allows attackers to cause the product to make the wrong security decision or disrupt service of the automated mechanism. If the mechanism is not developed or "trained" with enough input data or has not adequately undergone test and evaluation, then attackers may be able to craft malicious inputs that intentionally trigger the incorrect classification.
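As a hypothetical illustration (not part of the CWE entry itself), the following sketch shows how a small, bounded perturbation can flip the decision of a toy linear classifier in a white-box setting where the attacker knows the model's weights. The classifier, weights, and epsilon are all invented for this example:

```python
import numpy as np

# Toy linear classifier: score > 0 -> "benign", otherwise "malicious".
# Weights are assumed known to the attacker (white-box setting).
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def classify(x):
    return "benign" if x @ w + b > 0 else "malicious"

x = np.array([0.9, 0.2, 0.4])      # legitimate input, classified "benign"

# FGSM-style perturbation: step against the decision boundary along the
# sign of the score's gradient, bounded by epsilon per feature (L-inf ball).
epsilon = 0.4
x_adv = x - epsilon * np.sign(w)   # pushes the score downward

# The perturbed input stays within epsilon of the original per feature,
# yet is now assigned the opposite class.
print(classify(x), "->", classify(x_adv))
```

Real attacks follow the same principle against deep models, using the gradient of the loss with respect to the input rather than fixed weights.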

Targeted technologies include, but are not necessarily limited to, automated image recognition, automated speech recognition, and natural-language systems such as chatbots.

For example, an attacker might modify road signs or road surface markings to trick autonomous vehicles into misreading the sign/marking and performing a dangerous action. Another example is an attacker who crafts highly specific and complex prompts to "jailbreak" a chatbot and bypass its safety or privacy mechanisms, better known as a prompt injection attack.

Scope: Integrity / Impact: Bypass Protection Mechanism
Scope: Availability / Impact: DoS: Resource Consumption (Other); DoS: Instability
Scope: Confidentiality / Impact: Read Application Data
Scope: Other / Impact: Varies by Context
Algorithmic modifications such as model pruning or compression can help mitigate this weakness. Model pruning ensures that only the weights most relevant to the task are used at inference time, and pruned models have shown resilience to adversarially perturbed data.
Consider implementing adversarial training, a method that introduces adversarial examples into the training data to promote robustness of the algorithm at inference time.
Consider implementing model hardening to fortify the internal structure of the algorithm, including techniques such as regularization and optimization to desensitize algorithms to minor input perturbations and/or changes.
Consider deploying multiple models or using model ensembling techniques to offset individual models' weaknesses against adversarial input perturbations.
Incorporate uncertainty estimations into the algorithm that trigger human intervention or a secondary/fallback system when a threshold is reached, for example when inference predictions or confidence scores are abnormally high or low relative to expected model performance.
Reactive defenses such as input sanitization, defensive distillation, and input transformations can all be implemented before input data reaches the algorithm for inference.
Consider reducing the output granularity of the inference/prediction (e.g. returning only a label rather than raw confidence scores) so that attackers cannot exploit leaked information to craft adversarially perturbed data.
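A minimal sketch combining the last two mitigations, uncertainty-gated fallback and reduced output granularity. The `predict_proba` stand-in and the threshold value are assumptions for illustration, not part of any real model API:

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.90  # assumed operating point; tune per model

def predict_proba(x):
    # Hypothetical stand-in for a real model's softmax output.
    logits = np.array([x.sum(), -x.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def guarded_predict(x):
    """Return only a coarse label; escalate low-confidence inputs."""
    probs = predict_proba(x)
    top = int(np.argmax(probs))
    if probs[top] < CONFIDENCE_THRESHOLD:
        # Uncertainty gate: route to human review / fallback software
        # instead of acting on a possibly adversarial input.
        return "REVIEW"
    # Reduced output granularity: expose the label, never the scores,
    # so attackers cannot use confidence leakage to refine perturbations.
    return ["benign", "malicious"][top]
```

A clearly separable input yields a label, while an ambiguous one near the decision boundary is escalated rather than answered.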
Official MITRE page: CWE-1039