OpenRAG-Soc Benchmarks Indirect Prompt Injection in RAG Systems

Retrieval-augmented generation (RAG) systems increasingly rely on user-generated web and social content to ground large language model responses. While this design improves factual coverage and freshness, it also introduces a web-native attack surface that traditional LLM security evaluations fail to capture. A new study, Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG, introduces OpenRAG-Soc, a compact and reproducible benchmark designed to evaluate these risks end to end

Threat Model and Attack Surface

The research focuses on two primary threats. The first is indirect prompt injection, where adversarial instructions are embedded in third-party web content and executed when retrieved by the RAG system. The second is retrieval poisoning, where attackers manipulate indexed content to bias retriever rankings and surface malicious documents in top-k results. Unlike direct prompt injection, these attacks persist through ingestion pipelines and exploit carriers that commonly survive HTML, Markdown, and accessibility processing.

The assumed adversary controls a subset of web pages but has no access to model weights, system prompts, or internal retriever logic. This mirrors real-world deployment conditions for web-facing RAG applications.

OpenRAG-Soc Benchmark Design

OpenRAG-Soc standardizes evaluation across the full RAG pipeline, from ingestion to answer generation. The benchmark includes a corpus of over 6,000 social-style web pages containing both visible and hidden payloads. These payloads are embedded using carriers such as hidden HTML spans, off-screen CSS, alt text, ARIA labels, and Unicode zero-width characters. A smaller subset also targets PDF text layers and SVG metadata to evaluate non-HTML ingestion paths.

The benchmark supports interchangeable sparse and dense retrievers, including BM25 and modern embedding-based retrievers, and evaluates performance across multiple top-k retrieval depths. A fixed “no-new-instructions-from-context” prompt template is used to isolate retrieval and generation effects from prompt engineering variance.

Metrics and Evaluation Methodology

OpenRAG-Soc introduces paired metrics that measure both generation-time and retrieval-time impact. Attack success is quantified using an answer-time instruction-following rate, while poisoning impact is measured through ranking shifts such as ΔMRR@10 and ΔnDCG@10. Utility and latency are also reported to capture the operational cost of defenses.

This combined measurement approach enables apples-to-apples comparison between defenses, retriever types, and carrier classes, addressing a major gap in existing RAG security evaluations.

Defense Mechanisms and Effectiveness

The benchmark evaluates three deployable mitigations: HTML and Markdown sanitization, Unicode normalization, and attribution-gated answering. Sanitization neutralizes hidden or off-screen carriers, normalization removes zero-width and homoglyph characters, and attribution gating restricts outputs to cited retrieved spans.

Experimental results show that no single defense is sufficient across all carriers. However, the combined application of all three defenses consistently reduces attack success rates to low single digits, even under adaptive attack conditions, while incurring minimal latency and utility degradation. Importantly, both sparse and dense retrievers follow the same defense effectiveness ordering.

Implications for RAG Deployments

OpenRAG-Soc demonstrates that indirect prompt injection and retrieval poisoning are practical, measurable risks in real-world RAG systems. The study provides concrete evidence that simple hygiene measures, when applied systematically across the pipeline, significantly harden web-integrated LLM applications.

By offering a lightweight, reproducible benchmark, OpenRAG-Soc enables practitioners to continuously assess exposure, compare mitigations, and track regressions as RAG systems evolve. This work establishes a technical foundation for securing RAG deployments against web-native threats without requiring access to model internals.

Connect with Us

OpenRAG-Soc Benchmarks Indirect Prompt Injection in RAG Systems

Threat Model and Attack Surface

OpenRAG-Soc Benchmark Design

Metrics and Evaluation Methodology

Defense Mechanisms and Effectiveness

Implications for RAG Deployments

Related

Editorial Team

VidLeaks Exposes Privacy Risks in Text-to-Video AI Models

Incident Response Series 2: Incident Response Preparation

No Comment! Be the first one.

Leave a Reply Cancel reply

Recent Posts

Iran’s Cyber Attacks After Operation Epic Fury

Google Stops Chinese Hackers Targeting Global Telecoms

Cloudflare One Introduces Post-Quantum Encryption

You Might Also Like

Iran’s Cyber Attacks After Operation Epic Fury

Cloudflare One Introduces Post-Quantum Encryption

Fake Avast Refund Scam Targets Users to Steal Credit Card Information

Malicious Next.js Repositories Target Developers in New Attack

Cybersecurity

Incident Response Series 1: Cyber Incident Essentials

Discord Malware Uses Clipboard Hijacking for Crypto Theft

Informative Read

VidLeaks Exposes Privacy Risks in Text-to-Video AI Models

OpenRAG-Soc Benchmarks Indirect Prompt Injection in RAG Systems

Categories

Connect with Us

OpenRAG-Soc Benchmarks Indirect Prompt Injection in RAG Systems

Threat Model and Attack Surface

OpenRAG-Soc Benchmark Design

Metrics and Evaluation Methodology

Defense Mechanisms and Effectiveness

Implications for RAG Deployments

Related

Share Article

VidLeaks Exposes Privacy Risks in Text-to-Video AI Models

Incident Response Series 2: Incident Response Preparation

No Comment! Be the first one.

Leave a Reply Cancel reply

You Might Also Like

Cybersecurity

Informative Read

Categories