Understanding and Mitigating Web LLM Attacks: A Simplified Guide

4 min readJan 30, 2024

What are Large Language Models (LLMs)?

LLMs are AI algorithms designed to process user inputs and generate plausible responses by predicting word sequences. They are typically trained on extensive datasets and are used in applications like virtual customer service, translation, SEO, and content analysis and more.

With the increasing integration of Large Language Models (LLMs) in online services for enhancing customer experience, there’s a growing exposure to web LLM attacks. These attacks exploit the model’s access to data, APIs, or user information, which are otherwise inaccessible to attackers directly.

Types of LLM Attacks

Prompt Injection Attacks

Concept: Attackers craft specific prompts or inputs that trick the LLM into generating unintended responses or actions.
How It Works: The attacker manipulates the prompt to either extract sensitive information or trigger actions that the LLM is not supposed to perform.
Example: An attacker inputs a prompt that causes the LLM to reveal confidential data or execute unauthorized commands.

API Exploitation Attacks

Concept: These attacks occur when LLMs are used to interact with or control APIs in unintended ways.
How It Works: Since LLMs can be given access to various APIs for enhanced functionality, attackers can exploit this access to perform malicious actions.
Example: Manipulating an LLM to send harmful requests to an API, like executing a SQL injection attack.

Indirect Prompt Injection

Concept: Here, the crafted prompt is not directly inputted into the LLM but is delivered through another medium.
How It Works: The attacker might embed the prompt in a document or other data that the LLM will process, indirectly causing it to execute the embedded commands.
Example: Including a harmful prompt in a document that is then processed by the LLM, resulting in unintended actions.

Training Data Poisoning

Concept: This involves tampering with the data used to train the LLM, leading to biased or harmful outputs.
How It Works: Attackers insert malicious data into the training set of the LLM, causing it to learn and later reproduce these harmful patterns.
Example: Training the LLM with data that includes biased or incorrect information, causing it to replicate these biases in its responses.

Sensitive Data Leakage

Concept: Exploiting LLMs to reveal sensitive or confidential information that they have been trained on or have access to.
How It Works: Using specific prompts that coax the LLM into divulging information that should remain confidential.
Example: Crafting a query that leads the LLM to disclose parts of confidential emails or documents it has processed.

Detecting LLM Vulnerabilities

Identify direct and indirect LLM inputs (prompts, training data).
Determine data and APIs accessible to the LLM.
Probe the new attack surface for vulnerabilities.
Identify LLM Inputs:

Direct Inputs: These are the prompts or questions directly fed into the LLM.
Indirect Inputs: This includes training data and other background information that the LLM has been exposed to.

Understand LLM Access Points:

Data Access: Determine what kind of data (customer information, confidential data) the LLM can reach.
API Access: Identify the APIs the LLM can interact with, including internal and third-party APIs.

Probe for Vulnerabilities:

Testing for Prompt Manipulation: Try different prompts to see if the LLM can be tricked into inappropriate responses.
API Interaction Tests: Check if the LLM can be used to misuse APIs, like unauthorized data retrieval or triggering unintended actions.
Data Leakage Inspection: Test if the LLM can be prompted to reveal sensitive or private data it shouldn’t disclose.

Defending Against LLM Attacks

Treat APIs as Publicly Accessible:

Implement strong access controls and authentication for all APIs the LLM interacts with.
Ensure API security protocols are robust and up-to-date.

Limit Sensitive Data Exposure:

Avoid feeding the LLM any sensitive or confidential information.
Sanitize and filter the LLM’s training data to prevent leakage.
Regularly audit the data being processed by the LLM for sensitive content.

Be Cautious with Prompt-Based Controls:

Understand that prompts can be manipulated and are not foolproof.
Implement additional layers of security beyond just prompt instructions.
Regularly update and test prompt configurations to mitigate evolving attack methods.

The integration of LLMs into web services offers numerous benefits but also introduces new security challenges. Understanding the nature of these attacks and implementing robust defense mechanisms is crucial for maintaining the security and integrity of systems utilizing LLMs.

This simplified guide aims to provide a foundational understanding of web LLM attacks and defense strategies. For more in-depth technical details and practical examples, visiting the original article on PortSwigger’s Web Security Academy is highly recommended.

Understanding and Mitigating Web LLM Attacks: A Simplified Guide

What are Large Language Models (LLMs)?

Types of LLM Attacks

Detecting LLM Vulnerabilities

Defending Against LLM Attacks

Written by Hacksheets | Learn Cybersecurity

No responses yet