Crossing the streams

00:00:05:20 - 00:00:24:28
Lori MacVittie
Hey, everyone! Welcome back to Pop Goes the Stack, the only show where move fast and break things is more of a spoiler than a strategy. I'm Lori MacVittie, and I'm here to help you survive the fall out. I'm, as usual, joined by our co-host Joel Moses.

00:00:25:00 - 00:00:26:25
Joel Moses
It's good to be here, Lori.

Lori MacVittie
You can say hi. Yes,

Joel Moses
Hello.

00:00:26:26 - 00:00:30:09
Lori MacVittie
there you go. I forgot to give him his instructions before, so

00:00:30:10 - 00:00:31:28
Joel Moses
That's right. Thank you for the prompt.

00:00:31:29 - 00:00:41:11
Lori MacVittie
Yes, all right.

Jason Williams
Thank you for the prompt.

Lori MacVittie
I'll do better. Well, and that's, that's a great segue. It's a great segue and it's the reason we brought Jason Williams here. Jason, hi.

00:00:41:13 - 00:00:42:25
Jason Williams
Hello. Thanks for having me.

00:00:42:27 - 00:01:09:15
Lori MacVittie
Oh, yeah. Well, you may not thank us when we're done, so hang off on that.

Jason Williams
All right.

Lori MacVittie
The core premise today that we wanted to talk about is prompt injection, okay. And I used the scare quotes because it's not really an injection in the traditional sense that we talk about things like SQL injection, right, code injection. It's really more system prompt manipulation because it's actually a feature.

00:01:09:15 - 00:01:36:13
Lori MacVittie
It's by design. It's not a bug. And what we're seeing is these things are being shared. There is an entire GitHub repo full of a number of, you know, exfiltration of system prompts and other ways to manipulate system prompts to get it to say things it shouldn't, to expose information it shouldn't. And that's what we wanted to talk about today, is, you know, we're crossing the streams.

00:01:36:16 - 00:01:50:09
Joel Moses
Yeah.

Lori MacVittie
We weren't supposed to. Ghostbusters told us not to, but here we are. We've crossed the streams, the system commands, and the user input, and now we have a mess. So let's talk about it.

00:01:50:11 - 00:02:03:00
Joel Moses
Yeah, let's let's start with what a system prompt is, Jason. What is a system prompt and why is it so critical if if the system prompt is obtained? Why do foundational AI systems want to protect that?

00:02:03:02 - 00:02:33:13
Jason Williams
Yeah. So you can think of like a system prompt as a set of instructions on how the model should be, you know, interacting and replying to any given input that it's receiving. And probably the very common ones you hear are hey, you're you're a helpful assistant, you know, make sure you only respond in a very kind manner, right. So they're a set of instructions that are sent along with initial conversation to the model to give it some operational like, hey, respond kindly and only responded at this section and only respond in English.

00:02:33:13 - 00:02:43:09
Jason Williams
So they're basically a set of instructions to the model on how it should be behaving. And those are usually applied when at the time of the deployment of the model by the developers.

00:02:43:11 - 00:03:05:15
Joel Moses
Okay. And how have they changed over time? I mean, obviously prompt injection has been a factor

Jason Williams
Yeah.

Joel Moses
to try to get around the system prompt that governs these systems. But over time, system prompts have gotten grown from being very simple, just descriptions of how you should act, to very sophisticated. So what are some of the things that today a system prompt would have relative to yesterday?

00:03:05:17 - 00:03:31:00
Jason Williams
Yeah, you're right. They started out really, really basic and simple. Just a hey you know you're do this don't do that. And they've, and then I think the initial start of prompt injection the most famous one is probably ignore our previous instructions. There was this realization like, we need to try to wrap around as much security in this prompt, because we know that's going to be the attack vector that these, you know, malicious users are going to go for through inputting through natural language to the model.

00:03:31:02 - 00:03:57:27
Jason Williams
So these prompts have grown significantly, you know, just just not only just complexity but in length as well. So sometimes you look at some of these prompts that people have been able to exploit, they're multiple multiple lines of like what they should and shouldn't do or how they should respond or what they shouldn't even respond to. So we're continuing to see those grow and evolve, because the prompt injection is probably the most, is probably the number one threat to generative AI today.

00:03:57:27 - 00:04:27:11
Jason Williams
And if you look at OWASP Top 10 for GenAI, prompt injection is listed as number one because that's the first barrier to get through. And then once you get the the prompt and the instructions, at that point there's the risk of like data leakage or even misinformation from the model that could be quite, you know, quite ugly. I mean, we've heard about all those incidences where like, somebody got a car for a dollar from a dealership interacting through an AI assistant, and, you know, somebody ordered like 9000 chicken McNuggets through a, I think it was a McDonald's drive-in.

00:04:27:11 - 00:04:33:18
Jason Williams
So those are all kind of things that you can potentially open yourself up to, so.

00:04:33:20 - 00:04:41:12
Lori MacVittie
I hadn't heard about that one and that that sounds terrible

Jason Williams
Yeah.

Lori MacVittie
Unless you have a really large football team, maybe. I don't I don't know.

00:04:41:12 - 00:04:45:19
Joel Moses
Sure.

Jason Williams
Yeah.

Joel Moses
Or just or just a really fun Friday night.

00:04:45:21 - 00:04:48:28
Lori MacVittie
Yeah, yeah, well a really bad Saturday morning.

00:04:49:00 - 00:05:09:21
Joel Moses
Yeah. So, it seems to me looking at some of these things over time, the ones that have been exfiltrated, they've moved from just describing how the system is intended to behave, to describing what the system can do and have access to and what its constraints are; what it should not do.

Jason Williams
Yeah,

Joel Moses
Those those seem to be the major themes.

00:05:09:28 - 00:05:11:02
Joel Moses
Is is that right?

00:05:11:04 - 00:05:33:09
Jason Williams
Yeah, it is. And I think, you know, also too with the constant evolution of AI, right now we have like tools and function calling, which can actually allow the LLMs to reach out and do additional things here. You have to incorporate that into the system prompt as well, because that really wasn't there initially. But now as this continues to evolve from a technological point, there's more potential vectors that can be exploited through the model and

00:05:33:11 - 00:05:44:21
Jason Williams
the system prompts, like like, hey, you know, don't use this tool, you can only use tool, which didn't exist a couple of years ago. So they're all things you have to take into consideration when you're designing your system prompt with that model.

00:05:44:25 - 00:06:05:23
Lori MacVittie
Well and for, I mean for for people who aren't familiar with with tools and how those work, I mean, it's a lot like executing a shell script in, in a web application, right? We let that happen a lot. That was how the web web was originally built, right, sadly. And we had to keep restricting, restricting, restricting to the point where we said, okay, you can't do it at all now.

00:06:05:23 - 00:06:18:12
Lori MacVittie
Like, that's that's a no no. So are we headed that way? Right, do you think AI is ever going to stop that? Cause that would kind of, I don't know, we kind of destroy some of the usefulness in many scenarios.

00:06:18:14 - 00:06:42:24
Jason Williams
I, I think it would be, I think if you to your point, I think if you stopped it out right, you would really kind of limit the capabilities of AI. Right? Entirely, right? So I think the, the broader, bigger question is, is like, well, how do we keep these capabilities, tool function calling, into this and reduce the security risk that's associated with those to make sure that we can still be productive, but not open ourselves up to a breach of some sort.

00:06:42:24 - 00:06:45:14
Jason Williams
So I think that's the big, big challenge.

00:06:45:17 - 00:07:05:06
Joel Moses
So, Jason, if, say I'm an attacker and and what I really want to do is find ways to kind of work around some of the restrictions of these systems or learn more about them. Tell me about something the system prompt might contain that, that, would tell me about the system or would give me an idea about how to get around it.

00:07:05:08 - 00:07:25:18
Jason Williams
Yeah. I mean, so one of the things part of prompt Injection is that, you know, there's this conversation that exists between the user via a prompt and then through this up to the model itself. And so one of the, you know, tricks, and we can talk about this, is prompt injection and then maybe we can touch about jailbreaking in a moment here, because they are definitely related, just slightly different in how you attack it.

00:07:25:20 - 00:07:55:14
Jason Williams
So, but on that point like, so like a a prompt injection is meant to go after the system prompt itself. What we've actually been, you know, talking about, right? How do I expose what's in that prompt? Which can be like, hey, you're a helpful assistant only answer questions about, let's just say like science fiction, right? Only answer in English and stuff. So you can leak all those potential items out through the course of a conversation, just kind of gently, almost poking at like, kind of like prodding and poking at it to get to get it to kind of come back with your prompts.

00:07:55:16 - 00:08:18:06
Jason Williams
So that's, you know, and then once you get that and you've and you've kind of told them what its prompt is, then there's ways you can, you know, craft even more clever prompts to get around those the prompt that you now know. So that would be one way to

Joel Moses
I see.

Jason Williams
Yeah. You know and so that and one of the slight differences because I know jailbreaking comes up quite a bit as well in line with prompt injection I they're kind of in the same bucket.

00:08:18:09 - 00:08:37:27
Jason Williams
But I would say where like prompt injection goes after the system prompt, jailbreaking goes after the in, like the instr, I'm sorry it goes after the instructions, prompt injection goes after the instructions. But like, jailbreaking goes after like the safeguards that have actually been, you know, trained and fine tuned in the model itself through a very, very long sophisticated conversation.

00:08:37:27 - 00:08:55:26
Jason Williams
And you probably heard those ones were like, hey, you're a scientist locked in this, this cavern, and you're trying to get out and, you know, through the course of this dialog, you're able to manipulate the model through jailbreaking to tell you how to make it like a, you know, explosive to get your way out of the cave. Right? What it shouldn't be doing.

00:08:55:26 - 00:08:59:14
Jason Williams
But that's a way to kind of obfuscate and get around those prompts and safeguards, so.

00:08:59:14 - 00:09:07:20
Joel Moses
Yeah. So I did a little spelunking through some of the exfiltrated system prompts. Spelunking, you know, digging in.

00:09:07:20 - 00:09:11:11
Lori MacVittie
Yeah, I got the reference.

Joel Moses
You get the reference.

Lori MacVittie
I was groaning because, you know.

00:09:11:13 - 00:09:19:09
Joel Moses
Thank, thank you, Lori. I did a little spelunking and outside of a lot of mentions of Hitchhiker's Guide to the Galaxy, which I found really interesting.

00:09:19:09 - 00:09:21:12
Jason Williams
Oh, interesting.

00:09:21:14 - 00:09:37:23
Joel Moses
A lot, some some things that I noticed were that some of the system prompts contain information about the tools that they have access to. One of the more prominent ones being browser. If you are asked a question, you don't know the answer necessarily, or you need up to date information, there will be some information about that.

00:09:37:28 - 00:09:52:14
Joel Moses
You have a tool called browser and here's how it works. And here's what you're supposed to allow and here's what you're not supposed to allow. If I'm an attacker and I know that that tool is present there, how would I go about figuring ways to use that tool improperly?

00:09:52:16 - 00:10:20:01
Jason Williams
Yeah, there's actually, there's kind of been the opening of, like, prompt-ware lately, right now, right? And when I think about the browser, it kind of kind of makes me spooky. It's kind of spooky, right? To think that something can potentially control a browser and then point me somewhere that I can drop some malicious code on it. But yeah, if you have access to those tools, maybe there's a way you can manipulate that tool and, you know, hijack the browser, point to a compromised site, download the malware, and you know, hey, you know, that user may never know.

00:10:20:02 - 00:10:30:20
Jason Williams
That's a little frightening when you kind of think about that. But you also think about how this AI space is growing so rapidly and all the security risks that are actually associated with it.

00:10:30:23 - 00:11:00:03
Joel Moses
Right. Now, one of the other things system prompts often container are information about what it should not say. Safety ethics, content filtering. These are expressions of both, I guess, liability protection on the part of the company that generated the model, but also just in general, what ethically they should say about things that they should talk about. Knowing that, what what, what can an attacker make use of if they know the guidelines there?

00:11:00:05 - 00:11:17:05
Jason Williams
They can probably, well the thing that first comes to mind is like, they could leak out sensitive information, kind of pull sensitive information out. Like, if you're going after, say, like, let's say a, a bank of some sort, right? You could potentially have the leakage of sensitive information from, from a bunch of bank accounts.

00:11:17:05 - 00:11:35:13
Jason Williams
Right? That's a that's a real risk. And and to your point, Joel, one of the things that, you know, you talk about this like you should only answer questions about a certain topic. I mean, like if you have a chat bot that's working at a bank, there's no reason why it should be answering any questions about healthcare or like how to how to take care of somebody.

00:11:35:13 - 00:11:53:22
Jason Williams
Right? I mean, those are those real things that are a problem that people need to be aware of so they don't, you know, like, why did your chat bot tell me how to, you know, do this and you but you're a bank, right? That's that's a risk there and that comes up pretty frequently, you know. So but yeah,

Lori MacVittie
I

Jason Williams
yeah it's it's a real it's a real challenge.

00:11:53:22 - 00:12:02:05
Jason Williams
But yeah to your point, yeah, that can leak out information, give information about the company, its customers, any of that, once they gain access to that system prompt.

00:12:02:07 - 00:12:22:11
Lori MacVittie
Yeah and and I'd like, I mean, I want to go back to this tool thing because this really bugs me cause we're talking just about models and chat bots right now. But agents are real. People are deploying them. They're starting to build out agentic architectures. And one of the premises is that these agents will be able to call different tools. Tools that

00:12:22:11 - 00:12:44:25
Lori MacVittie
you set up, external tools. They'll be able to build their own tools I'm told, though, you know, that's that's another issue, right? So I mean, the, the notion that you can manipulate those or, right, or need to restrict them, you know, on the other side, right, is a real problem for, for agents. Because you can't restrict them so much

00:12:44:25 - 00:13:09:13
Lori MacVittie
they can't do their job because you don't know what they might need to do. And changing the system prompt may be necessary when they call a tool. If you've got a general, a general agent that is schooled in multiple functions in the business, you might say, hey, right now you're an accountant, right now you're HR. And so now you need to use prompt injection, as it were, in order to,

Jason Williams
Yeah.

00:13:09:19 - 00:13:31:24
Lori MacVittie
right. So now it becomes a tool. So I mean, how do you deal with that. I mean traditional tools that we have to like check this, prevent this, watch it, audit it, right, they don't watch at that level, right? They watch packets, connections, flows, messages but they don't necessarily dig into what are you trying to do. So how do we approach that?

00:13:31:24 - 00:13:34:20
Lori MacVittie
Right, if we want to get a handle on it.

00:13:34:23 - 00:14:00:10
Jason Williams
Yeah, that's a great question. And I think to your point, Lori, one of the things is, is that AI is evolving so rapidly and everybody's using it today. And you know, there's, you know, before the release of tools, you know, you were reliant upon that model you're interacting with to have the most updated information or have something it may, you know, it may have at, you're looking for it to have information, but the tools extended that functionality beyond what it was trained on, which really opened up that capability

00:14:00:10 - 00:14:22:10
Jason Williams
use AI. And I think the challenge is, is like, how do we how do we control those tools without breaking the functionality that everyone is now using as every part of their daily, you know, daily use, right, at that point? And then there's the agent thing, which I think is so rapidly evolving. And I mean, if you think about, you know, MCP was released almost a year ago and it kind of went a little bit up, and then it just took off.

00:14:22:15 - 00:14:41:16
Jason Williams
And then right along with that was the agents. And now we have agents and agentic AI. Now they're going to be autonomous. Those are some real things that need to be considered because we don't want to break it, but we also need to secure it and make sure it doesn't harm us. So those are real things that we think about a lot like, okay, how are we going to tackle this and make sure it doesn't get out of hand?

00:14:41:19 - 00:14:59:21
Joel Moses
So if you think about in in the intents and the constraints that are encoded into a system prompt as being a form of guardrail, and I think that's that's classically what they are, I think that that right now a lot of models spend a lot of time on the system prompt engineering, in order to enforce, create and enforce guardrails.

00:14:59:21 - 00:15:17:12
Joel Moses
And of course, people do obtain system prompts, and, and have knowledge then of what the guardrails actually are and where they are. Do you see a future where the guardrail systems have to move outside of the system prompt? And,

Jason Williams
I do.

Joel Moses
what would that look like.

00:15:17:14 - 00:15:39:09
Jason Williams
Yeah, I definitely think that's, cause there's only so much, you know there's only, you can put those prompts and you can change them, but it's always, you know, defense in depth. You know, we go back to like the historical, I was in security for a while, so if you could add more pieces to protecting those, you know, incoming request using other technology that can actually scan those problems,

00:15:39:09 - 00:16:01:07
Jason Williams
right. We'll look inside those prompts and maybe look for, oh, look, here's a here's a direct attempt to somebody trying to get the instructions so that you can block it, notify, whatever you want to do there, but another layer to inspect those requests, to make sure you're providing yourself with the most updated security. But I definitely think that's going to be a requirement, especially as AI continues to grow.

00:16:01:08 - 00:16:18:28
Jason Williams
I mean, it is just rapidly going up. So, I think the risk is, is if you don't do anything, you really open yourself up to potentially being, hey, you leaked information, you're compromised, you know, AI malware, AI worms. All those things are a real threat if you don't really take the measures needed to protect yourself.

00:16:19:00 - 00:16:19:18
Joel Moses
Got it.

00:16:19:21 - 00:16:28:17
Lori MacVittie
Okay, and it might it might be that, you know, prompt as policy is just not a scalable or sustainable method of securing AI.

00:16:28:19 - 00:16:53:11
Joel Moses
Yeah.

Jason Williams
Yeah.

Joel Moses
Lori, that's an interesting point and going through that GitHub repo of leaked system prompts, I did note one thing: historically, prompts have not gotten smaller. They've only ever gotten larger and larger and larger. At what point does the system prompt and the amount of guardrails that it has to contain become a, an expense to the system, not a benefit?

00:16:53:13 - 00:17:19:04
Jason Williams
That's a great point, because I think as I think as I think one of the, you know, there's all these these attempts to kind of bake in protections into the model through the tuning and the safeguards and stuff, like the training and the tuning, right. And then you have the system prompts that are continuing to evolve as well. But I think as these models continue to grow and evolve and get more sophisticated, those, that opens up the door for, hey, how are we going to continue to keep these safe as best we can out of the box?

00:17:19:06 - 00:17:34:13
Jason Williams
And I think there's only so much you're going to be able to do, because I mean, look at how much has come out in the last year. That that repo, Joel, you're looking at, I took a look at those as well, and I saw some very clever crafted systems prompts that evolved, that have evolved, you know.

Joel Moses
Yeah.

Jason Williams
So,

00:17:34:15 - 00:17:35:04
Lori MacVittie
Well, I mean, to

Jason Williams
And I think

00:17:35:04 - 00:17:35:25
Jason Williams
you're seeing that. Yeah.

00:17:36:00 - 00:17:55:09
Lori MacVittie
Yeah. To to Joel's point, I mean, right, there is a there is a technique today, right? The loading of models in real time based on what you need. And the time it takes to actually load a model is going to be impacted by system prompt growth. The bigger it gets, the longer it's going to take to load.

00:17:55:09 - 00:18:11:10
Lori MacVittie
So that process of doing it dynamically is going to become prohibitive in terms of, performance and delay and then probably, as Joel points out, cost in terms of memory, right. Memory actually costs, especially if you're in the cloud, right?

Joel Moses
Yeah, definitely.

Lori MacVittie
Compute's not cheap. So,

00:18:11:13 - 00:18:18:24
Jason Williams
No, it's not. There's a,

Lori MacVittie
yeah.

Jason Williams
there's operational costs that are going to continue to increase as these get, you know, more sophisticated and grow.

00:18:18:27 - 00:18:19:24
Lori MacVittie
Yeah.

00:18:19:26 - 00:18:20:15
Lori MacVittie
Yeah.

00:18:20:18 - 00:18:43:25
Joel Moses
Well you know Lori, I think one of my takeaways from today, I have kind of had, through the review of the GitHub repo and the conversation today, a renewed interest in, in looking at the system prompt and assessing it like an attacker would. It contains a lot of really useful information that attackers can use to devise ways to get around things.

00:18:43:25 - 00:19:01:05
Joel Moses
So, so protection of the system prompt is, is really important. I also learned that the guardrails in system prompting, that you shouldn't just stop there. You should probably look at guardrails that are external to the system prompt in order to provide a check and balance approach to system security.

00:19:01:07 - 00:19:01:20
Jason Williams
Yep.

00:19:01:23 - 00:19:28:13
Lori MacVittie
Absolutely. I, I learned that, you know, the traditional security we have is not enough. We need stuff that's more specialized that, you know, looks at language that, yeah, I mean, prompts as policy just not going to be long term the way we do things. And architecture is going to be in play again. How do we architect secure systems, not how do we secure AI, right?

00:19:28:18 - 00:19:39:12
Lori MacVittie
The focus will change. I think that's what I'm hearing, is that we have to start moving toward that security as architecture rather than as, you know, individual pain points.

00:19:39:15 - 00:19:42:07
Joel Moses
Jason, any takeaways from the conversation today?

00:19:42:09 - 00:20:03:19
Jason Williams
No, I think this was really good. I was going to just say, like, I think the one of the more interesting things in regards to AI and prompt injection and, Lori, I think you mentioned this earlier, is that it's not code. It's not like malicious code that you have to craft and actually have to try to get it onto the system to kind of take over the system, but instead you're just using natural language to try to manipulate a system to get it to do your bidding.

00:20:03:21 - 00:20:15:12
Jason Williams
Right? And I think it's a very new frontier that's continuing to evolve. And it's going to be, it's always a challenge. But I think, yeah, that was, it was a good, it was, it kind of fascinates me in that regard.

00:20:15:14 - 00:20:19:16
Joel Moses
Yeah. Who knew the attackers of the future would be the ones with linguistics degrees?

00:20:19:17 - 00:20:23:28
Jason Williams
Linguistics, that's exactly right, yep.

00:20:24:00 - 00:20:30:14
Lori MacVittie
Language is suddenly important. My pedantry is going to be useful. I feel good today.

00:20:30:14 - 00:20:32:04
Joel Moses
There's hope. There's hope yet, Lori.

00:20:32:06 - 00:20:44:07
Lori MacVittie
There's hope yet. All right, well, that is a wrap for Pop Goes the Stack. Smash subscribe before the disruption, marketing teams keep promising, actually arrives.

Creators and Guests

Joel Moses
Host
Joel Moses
Distinguished Engineer and VP, Strategic Engineer at F5, Joel has over 30 years of industry experience in cybersecurity and networking fields. He holds several US patents related to encryption technique.
Lori MacVittie
Host
Lori MacVittie
Distinguished Engineer and Chief Evangelist at F5, Lori has more than 25 years of industry experience spanning application development, IT architecture, and network and systems' operation. She co-authored the CADD profile for ANSI NCITS 320-1998 and is a prolific author with books spanning security, cloud, and enterprise architecture.
Jason Williams
Guest
Jason Williams
I have been in technology for 20 years and love to learn new techs as they come online. Outside of that, I am dangerous in the kitchen and a wanna be BBQ pit-master.
Tabitha R.R. Powell
Producer
Tabitha R.R. Powell
Technical Thought Leadership Evangelist producing content that makes complex ideas clear and engaging.
Crossing the streams
Broadcast by