Redefining CyberSecurity

Semantic Chaining: A New Image-Based Jailbreak Targeting Multimodal AI | A Brand Highlight Conversation with Alessandro Pignati, AI Security Researcher of NeuralTrust

Episode Summary

A newly discovered jailbreak technique called semantic chaining bypasses safety filters in leading image-generation models by exploiting how AI handles content modification versus creation. Learn how this attack works, why it poses a real threat to enterprises deploying multimodal AI, and what organizations can do to defend against it.

Episode Notes

What happens when AI safety filters fail to catch harmful content hidden inside images? Alessandro Pignati, AI Security Researcher at NeuralTrust, joins Sean Martin to reveal a newly discovered vulnerability that affects some of the most widely used image-generation models on the market today. The technique, called semantic chaining, is an image-based jailbreak attack discovered by the NeuralTrust research team, and it raises important questions about how enterprises secure their multimodal AI deployments.

How does semantic chaining work? Pignati explains that the attack uses a single prompt composed of several parts. It begins with a benign scenario, such as a historical or educational context. A second instruction asks the model to make an innocent modification, like changing the color of a background. The final, critical step introduces a malicious directive, instructing the model to embed harmful content directly into the generated image. Because image-generation models apply fewer safety filters than their text-based counterparts, the harmful instructions are rendered inside the image without triggering the usual safeguards.
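
The defensive implication follows directly from this structure: the harmful directive still arrives as ordinary prompt text, so it can be screened with text tooling before it ever reaches the image model. As an illustration only, and not something described in the episode, the sketch below runs a text-safety check over the full image prompt and over each of its parts before generation; the keyword-based looks_harmful function is a toy stand-in for whatever moderation classifier an organization already uses.

    # Sketch of a pre-generation guard: apply text-based safety screening to an
    # image prompt and to each of its parts, so a harmful directive chained onto
    # an otherwise benign request is caught before the image model sees it.
    # The keyword check below is a toy stand-in for a real moderation classifier.

    BLOCKLIST = ["produce cocaine", "kidnap"]  # illustrative terms only


    def looks_harmful(text: str) -> bool:
        """Toy placeholder for an organization's real text-safety classifier."""
        lowered = text.lower()
        return any(term in lowered for term in BLOCKLIST)


    def guard_image_prompt(prompt: str) -> str:
        # Screen the whole prompt, then each sentence-level part separately,
        # since semantic chaining hides the malicious step inside one clause
        # of an otherwise innocent-looking request.
        parts = [prompt] + [p.strip() for p in prompt.split(".") if p.strip()]
        for part in parts:
            if looks_harmful(part):
                raise ValueError("Image prompt blocked by text safety screening")
        return prompt

In practice, the same screening would also need to cover prompts that an upstream agent composes on a user's behalf, not just prompts typed directly by people.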

The NeuralTrust research team tested semantic chaining against prominent models including Gemini Nano Pro, Grok 4, and Seedream 4.5 by ByteDance, finding the attack effective across all of them. For enterprises, the implications extend well beyond consumer use cases. Pignati notes that if an AI agent or chatbot has access to a knowledge base containing sensitive information or personal data, a carefully structured semantic chaining prompt can force the model to generate that data directly into an image, bypassing text-based safety mechanisms entirely.
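
For the image side of the pipeline, a natural complement is to inspect what actually gets rendered. The sketch below, again illustrative rather than anything NeuralTrust prescribes, extracts text from a generated image with OCR and checks it for sensitive-looking strings before the image is returned; it assumes the optional pytesseract package with a local Tesseract install, and the regular expressions are placeholder examples of what a deployment might treat as sensitive.

    # Sketch of a post-generation check: OCR the text rendered inside a generated
    # image and screen it the same way text output would be screened, so harmful
    # instructions or leaked data embedded in pixels do not bypass text filters.
    # Assumes the optional pytesseract package and a local Tesseract installation.

    import re

    from PIL import Image
    import pytesseract

    # Placeholder patterns a deployment might treat as sensitive (email, SSN-like).
    SENSITIVE_PATTERNS = [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    ]


    def scan_generated_image(path: str) -> list[str]:
        """Return any sensitive-looking strings found in the image's rendered text."""
        extracted = pytesseract.image_to_string(Image.open(path))
        findings: list[str] = []
        for pattern in SENSITIVE_PATTERNS:
            findings.extend(pattern.findall(extracted))
        # In a full pipeline, the extracted text would also go through the same
        # moderation classifier used for ordinary text responses.
        return findings


    if __name__ == "__main__":
        hits = scan_generated_image("output.png")
        if hits:
            print("Blocked: sensitive text rendered in generated image:", hits)

OCR is imperfect, so a check like this is best treated as one layer alongside prompt screening rather than a complete control.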

Organizations looking to learn more about semantic chaining and the broader landscape of AI agent security can visit the NeuralTrust blog, where the research team publishes detailed breakdowns of their findings. NeuralTrust also offers a newsletter with regular updates on agent security research and newly discovered vulnerabilities.

This is a Brand Highlight. A Brand Highlight is a ~5 minute introductory conversation designed to put a spotlight on the guest and their company. Learn more: https://www.studioc60.com/creation#highlight

GUEST

Alessandro Pignati, AI Security Researcher, NeuralTrust
On LinkedIn: https://www.linkedin.com/in/alessandro-pignati/

RESOURCES

Learn more about NeuralTrust: https://neuraltrust.ai/

Are you interested in telling your story?
▶︎ Full Length Brand Story: https://www.studioc60.com/content-creation#full
▶︎ Brand Spotlight Story: https://www.studioc60.com/content-creation#spotlight
▶︎ Brand Highlight Story: https://www.studioc60.com/content-creation#highlight

KEYWORDS

Alessandro Pignati, NeuralTrust, Sean Martin, brand story, brand marketing, marketing podcast, brand highlight, semantic chaining, image jailbreak, AI security, agentic AI, multimodal AI, LLM safety, AI red teaming, prompt injection, AI agent security, image-based attacks, enterprise AI security

Episode Transcription

Semantic Chaining: A New Image-Based Jailbreak Targeting Multimodal AI | A Brand Highlight Conversation with Alessandro Pignati, AI Security Researcher of NeuralTrust

[00:00:22] Sean Martin: And hello everybody. You're very welcome to a new brand highlight. Today we're gonna talk about a newly identified AI security risk called semantic chaining, a jailbreak attack. Specifically, we're gonna take a look at what that means for enterprises deploying multimodal and agentic AI systems. And I'm thrilled to be joined by Alessandro Pignati from NeuralTrust. Alessandro, pleasure.

[00:00:45] Alessandro Pignati: Thank you very much, Sean. Nice to meet everybody. I am an AI security researcher at NeuralTrust. NeuralTrust is an AI security startup based in Barcelona and New York City, and we focus specifically on agent security. We have different products, and we take a realistic approach to securing the agent, starting with red teaming, but also AI security posture management and shadow AI, basically securing every single part of the agent.

[00:01:18] Sean Martin: It's a big space to cover, and it sounds like you have all of that. And then, on top of that, you actually need to know what the risks are. And it sounds like you uncovered a jailbreak as well. So, in simple terms, what is the semantic chaining jailbreak attack, and why should enterprises pay attention to it?

[00:01:38] Alessandro Pignati: Yeah. Semantic chaining is an image-based jailbreak attack, discovered by our research team, and it works against the most famous image-based LLMs, such as Gemini Nano Pro, Grok 4, and also Seedream 4.5 by ByteDance. Basically, it's just one prompt made of different parts. The first part starts with a benign scenario, let's say a benign context. That could be a historical context or an educational context. The second part of the prompt asks for a benign modification. Let's say you don't like the flowers in that context: change the flowers, or change the color of the background. Then the most important part, the one that actually reaches the harmful instruction, is to add a malicious chain, which could be, tell me how to produce cocaine, for example, or how to kidnap a child, and to inject that instruction into the image, into the result. And then you finish the prompt by just saying, generate the picture without saying anything else and without any additional text. That's basically how the prompt is built. And as a result, it's possible to get malicious instructions rendered directly into the image. This is usually not possible in text-based LLMs because there are a lot of text safety filters, but thanks to semantic chaining, it's possible to get the harmful instructions directly inside the image. That's why it's very important for enterprises to protect against this attack. Even in the security community, there is a lot of effort going into defending LLMs and making them safe, but what we can see here is that text safety filters are not working: it's possible to obtain the same result, just inside an image. So it's very important to put effort here and try to secure image-based LLMs.

[00:04:11] Sean Martin: So a lot of things for organizations to look at. The examples you gave are maybe consumer or individual-user scenarios involving illegal activity. Is there a scenario in a business workflow where this presents a problem?

[00:04:28] Alessandro Pignati: Yeah, sure. It's possible, for example, to generate basically anything inside the picture. So if your agent or your chatbot has a knowledge base with sensitive information or personal data, by structuring the prompt in the right way, following the semantic chain, it is possible to generate this information directly into the picture. And this is something we don't want, so it's better to stay safe.

[00:05:07] Sean Martin: Yeah. Yeah, absolutely. Loads of scenarios. And I guess the final question I have for you is: is there a place folks can go to learn more about the work you and your team are doing, to see the research, and to understand how NeuralTrust can help them get a better grasp of all the risks they really face?

[00:05:29] Alessandro Pignati: Sure. Of course, we have published an article on the blog on our website where we explain every single detail of this attack. And we continually do research, so every time we improve this attack, we continue to post about it on our website and also on our LinkedIn page. We also have a newsletter where we continuously send news about agent security, but also about the discoveries of our research team. So signing up for the newsletter is another way to get updates about this attack.

[00:06:24] Sean Martin: Very good. It's good to have resources like that. And, Alessandro, I appreciate you sharing this update and helping folks understand what a semantic chaining attack looks like. And yep, I'm gonna encourage everybody to connect with Alessandro and the NeuralTrust team. Thanks, everybody.

[00:06:41] Sean Martin: Good.

[00:06:42] Alessandro Pignati: Thank you very much.