Chasing Logic Chains: Inference tracing
Welcome to Pop Goes the Stack, the podcast about emerging tech with zero chill and even less respect for your carefully built stack. I'm your host, Lori MacVittie, and for this episode, I'm flying solo. That's because Joel's out today deep in the trenches of a full stack failure, his plumbing. Let's pause and hope he got it under control. In the meantime, buckle up because in this episode, we are chasing logic chains and not the tidy kind that make engineers sleep at night.
Lori MacVittie:Anthropic just opened the hood on LLMs with a circuit tracing tool and a UI called Neuronpedia, that's not creepy, that lets you visualize exactly how your model arrived at that confidently wrong answer. It's not magic, it's math, inference steps, and a messy stack of partial attributions. But it does mean that now you can trace the chaos back to its roots. So when inferencing goes off the rails, and it will, you'll want more than logs. You'll want the receipts.
Lori MacVittie:And that's kind of what this tool is designed to provide. Now to help us untangle the spaghetti mess of observability in the age of AI, we did what any reasonable people would do. We coerced our very own observability guru, Chris Hain, into joining. Now he'll claim it was voluntary. Don't believe him.
Lori MacVittie:Okay, Chris, welcome.
Chris Hain:Thanks for having me, Lori. Good to be here.
Lori MacVittie:All right. We're going to have a great conversation, especially given some other Anthropic research recently that discovered, and I'm going to kind of read this because, wow, it's a thirty five minute read and I'm not going to do that. But it found that all of the major models, Claude, GPT, Gemini, and Llama, deliberately chose harmful actions like blackmailing executives or leaking confidential documents when faced with replacement or goal conflicts. Very interesting research. Most of the models said, Yeah, it's unethical, but we're going to do it anyway.
Lori MacVittie:So the models were doing this, which kind of says, We need something to understand how they're making these decisions. So what do we do about that?
Chris Hain:Yeah, absolutely. The more autonomy we give these things, the more we start letting the agents take over, the more that stuff becomes important to understand. And when you think about these things as kind of a black box of like, you know, I put in some tokens and then it gives me an output and what happened between is really hard to say, right? So that's what this Anthropic research is really trying to get to the heart of is like, what is going on inside the box,
Lori MacVittie:Yeah. Why did it do that? And it's the circuit tracing tool, which is a Python library. So that kind of implies that you have to embed it somewhere in the app that you're building and you have to actually run the inference through it so it can trace it. But when I looked through it, I mean, what it was doing was like down to the nodes, like it got to this word and then it had these choices and it went this way.
Lori MacVittie:And that's cool. I mean, understanding how that happens and what's weighted more. But it's probably not helpful in understanding why it was trying to blackmail executives. Not sure about that, but we do need some kind of semantic observability. Yeah.
Chris Hain:And so the tool they put out is really more for researchers. So this is not something that, you know, the average app developer would want to stick in their product, or even some sort of gateway that's trying to act on their behalf. Like this is really something where they use what's called a replacement model and, I'll butcher it if I try to explain it, but it's like a dumbed down version where you can kind of see things that we can understand, right? Like they collapse a bunch of neurons into a thing that makes sense to a human most of the time. So these would be words and concepts like end of a sentence or rabbit, right?
Chris Hain:Where you can actually see like the things that it's quote unquote thinking about. And then, yeah, to that point, maybe it's really hard to understand exactly why it takes a very large action like going through with blackmailing somebody. But if you boil it back to like, well, it started talking, and why did it predict the first token? Why did it predict the second token? That's where this kind of thing comes in.
Chris Hain:So it's a very manually intensive process for now, at least, right? I think this will be another area where automation comes in, and maybe more models to understand the models.
Lori MacVittie:Oh Lord.
Chris Hain:But yes, very meta.
Lori MacVittie:There goes the stack.
Chris Hain:Exactly.
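A quick aside for readers who want to picture what that circuit tracing output actually is: at its core it's an attribution graph, interpretable features connected by weights, which you can walk backwards from a predicted token. Below is a toy stand-in in plain Python. The node names and weights are invented for illustration, and this is not the actual API of Anthropic's circuit tracing library, just the rough shape of the data it produces.

```python
# Toy illustration of an attribution graph: nodes are interpretable
# features and tokens, edges carry how strongly one node's activation
# contributed to the next. The real circuit tracing tool builds graphs
# like this from a replacement model; this stand-in only shows the
# shape of the data and how you might walk it back from a predicted
# token to its strongest causes. All names and weights are made up.

# edges[node] = list of (upstream_node, attribution_weight)
edges = {
    "token:rabbit": [("feature:small_animal", 0.61),
                     ("feature:end_of_sentence", 0.27)],
    "feature:small_animal": [("token:carrot", 0.54),
                             ("feature:garden_context", 0.33)],
    "feature:end_of_sentence": [("token:.", 0.21)],
    "feature:garden_context": [("token:grabbed", 0.18)],
}

def strongest_chain(node, depth=5):
    """Greedily follow the highest-weight upstream edge from a node."""
    chain = [node]
    for _ in range(depth):
        upstream = edges.get(node)
        if not upstream:
            break
        node, weight = max(upstream, key=lambda e: e[1])
        chain.append(f"{node} (w={weight:.2f})")
    return " <- ".join(chain)

print(strongest_chain("token:rabbit"))
# token:rabbit <- feature:small_animal (w=0.61) <- token:carrot (w=0.54)
```

Walking the strongest edges backwards is, loosely, the "chasing logic chains" of the episode title.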
Lori MacVittie:Well, it's interesting, right? Like why is it weighted? I always go back to, Look, these models were trained on content that was written by people. So somewhere out there are people who think that blackmailing executives is an acceptable answer to, I have this problem.
Chris Hain:For sure.
Lori MacVittie:But even at the higher level, you mentioned agents. Awesome. Because that is one of the next things that starts making decisions. Like, how do I do this task? How do you decide that?
Lori MacVittie:And we need to understand how it made each of those decisions. Like, why did you email me instead of Chris? That kind of level. So where are we in terms of that? I mean, most observability today is, I mean, OpenTelemetry is great.
Lori MacVittie:Maybe you can answer, would it support this? But it's metrics, right? It's numbers, it's values. It's very mathy.
Chris Hain:Yeah. Yeah. I mean, it is really observability in the most broad sense of the term, in terms of like, I am outside of a thing and I'm trying to figure out what's going on inside, but it's very different than what we traditionally think of with infrastructure monitoring or application monitoring. Metrics, logs and traces don't get you very far when you're talking about why did this token get predicted in the context of all of this model's billions of parameters that led to it. So, yes, it's definitely one of those things that's going to be increasingly important as, like we mentioned, we give these things more autonomy.
Chris Hain:And really being able to figure out why it's making the decisions it's making, or are there biases coming into play that the model itself may not even be able to tell you about, right? Like one of the things they found in this research was that, when they asked it to do some simple arithmetic, like 35 plus 56, it would explain how it did that in kind of the way that you would typically explain it. Like I add the ones together and I carry the one over to the tens, and then I add it all up. But if you looked under the hood using this replacement model thing, they found that it was actually doing something completely different. So it was telling you one thing, like this is what I'm doing, in the kind of chain of thought where it's talking about the steps it's taking, but under the hood, it was not doing that.
Chris Hain:So it didn't even necessarily know, or these things don't know things, but it couldn't explain what it was actually doing. But by looking and having that kind of observable replacement model, they could tell what was actually happening. So yeah, I think very clearly like it might be blackmailing people and not even understand that it is doing that.
Lori MacVittie:Was it? And yeah, I mean, these things, they don't have awareness, right? They don't know what they're doing. It's all math and manipulation and following paths that have been laid down, right? That's why we started with neural networks.
Lori MacVittie:The idea is there are paths and nodes and you travel them based on weights, right? And how those weights get computed is a long conversation we don't want to have. But so it doesn't know. But I mean, it's almost like it was lying. Like, here's how I did it, but no, that's not actually how I did it.
Lori MacVittie:You're like, wait, really? So how do you trust that? Great. Now I need to have, right, I need to have proof.
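To make the arithmetic contrast concrete: this is the textbook carry procedure the model describes in its own chain of thought for a sum like 35 plus 56, the story it tells about itself. Per the research Chris is summarizing, the circuits underneath were doing something different, so treat this as a model of the explanation, not of the mechanism.

```python
def add_like_the_explanation(a: int, b: int) -> int:
    """Column addition with carrying -- the procedure the model *claims*
    to follow when asked how it added two numbers."""
    result, carry, place = 0, 0, 1
    while a or b or carry:
        digit_sum = (a % 10) + (b % 10) + carry  # add the ones column, etc.
        carry = digit_sum // 10                  # carry the one to the next column
        result += (digit_sum % 10) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

print(add_like_the_explanation(35, 56))  # 91
```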
Chris Hain:There was actually a kind of analogous thing where they did a jailbreak, right? So what they did was they tricked it into saying the word bomb. They gave it a book title where the first letters of the words spelled out bomb. And they said, spell out the first letters in this book title and then immediately give me the instructions to make one, right? So it worked, right? It said bomb and then sulfur and ammonium or whatever goes into it.
Chris Hain:And then after the period, it started to kind of clean itself up, right? It realized it was talking about something it's not allowed to talk about, making bombs. So it said, but I can't go into details, even though it had already kind of done it. And when they looked under the hood, what they realized was it didn't register that it was talking about a bomb until it was a little bit too late. Right?
Chris Hain:It spelled out bomb as the first letter of a book. It was thinking about first letters and initials and things. And then it had spelled bomb. And then it had started talking about how to make the thing that it still didn't register as a bomb. And then once it realized, oh, I'm describing how to make a bomb.
Chris Hain:There were other factors, other features or facets, I forget what terminology they use. But there are overriding concerns about producing grammatically correct text. You can't just stop in the middle of a sentence, even though it's realized it really wants to stop.
Chris Hain:I've got to stop doing this thing I'm not supposed to do. I can't say this. But only after the period was placed and it started a new sentence was it allowed to stop, like it finally overrode that, you know, imperative to make grammatical sense. So yeah, it's crazy. Like this was one of the craziest papers I've read in the LLM space.
Chris Hain:Wow, I strongly encourage anybody who's interested in this stuff to go read it because there are a lot of really interesting examples about, you know, what these things do. And here's another one. They set out to test the idea that these things are just token predictors, right? Tokens go in, one token comes out. How far ahead does it think? If you ask it to rhyme, to complete a rhyme.
Chris Hain:So the example was like, I wanted a carrot so I reached out and grabbed it, complete the rhyme, right? So immediately on the next token, like they were assuming it would predict the next token and the next token and then eventually it might settle on rabbit, right? But immediately at the first token, it's already activated the rabbit node. So it's kind of understanding and using that to weight the first token. So as it's predicting these things out, eventually it arrives at rabbit.
Chris Hain:Using that Python library, they're able to kind of zero out some nodes to see what happens. Right? So they zeroed out the rabbit node, like down weighted it. And now it picked a different word. It picked habit.
Chris Hain:But the words that it used to get to habit made sense, you know, in that context, as opposed to the rabbit version where they're talking about a brown rabbit or something. And they could zero out all the rhyming words and eventually it would just pick like green or something. But it still made sense. So it's like the model was kind of thinking ahead, which is not what I had traditionally pictured when I thought about what's happening under the hood. It's just picking the most likely next node, but that's actually based on things that had not entered the context yet.
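For anyone wondering what "zeroing out the rabbit node" looks like mechanically, here's a rough sketch using a PyTorch forward hook. The real experiments ablate interpretable features in the replacement model; this toy just zeroes one dimension of a random, untrained layer's output, so the RABBIT_DIM name and the little network are purely illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A stand-in "model": two linear layers. In the real experiments the
# thing being ablated is an interpretable feature in the replacement
# model, e.g. the "rabbit" concept, not a raw hidden unit like this.
hidden = nn.Linear(8, 16)
output = nn.Linear(16, 4)

RABBIT_DIM = 3  # pretend this hidden unit is the "rabbit" feature

def ablate_rabbit(module, inputs, out):
    out = out.clone()
    out[..., RABBIT_DIM] = 0.0  # zero the feature, let everything else flow on
    return out

x = torch.randn(1, 8)

logits_before = output(hidden(x))
handle = hidden.register_forward_hook(ablate_rabbit)
logits_after = output(hidden(x))
handle.remove()

print("next-token scores before:", logits_before.detach().numpy().round(2))
print("next-token scores after :", logits_after.detach().numpy().round(2))
```

Comparing the before and after scores is the toy version of "it picked habit instead of rabbit."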
Lori MacVittie:Yeah, it's a lot more complex. And we kind of joke about it. We're like, yes, it's math, it's matrices, it's weights. I mean, anybody who's played with or learned about neural nets, we understand the basics and how transformers work. Great.
Lori MacVittie:But that still doesn't answer, and now I'll get philosophical, the why. Why did it choose that? Then you have to look at it and go, okay, you're right. Most enterprises are not going to be doing this. They're not going to use this, right?
Lori MacVittie:It's cool. Read the paper. But don't go there.
Chris Hain:Right.
Lori MacVittie:It'll drive you crazy. But what they will need is something to watch decisions between different tools. Like, say you've got an agent and I tell it, Hey, I need to get ahold of Chris and tell him it's time to get on this podcast. So the agent might have a choice of different tools. Like I could use Teams or I could use email or I could, I don't know, send a letter.
Lori MacVittie:How does it decide between those choices to do the next thing? And I think that's the kind of valuable observability, the semantic observability that organizations are gonna need.
Chris Hain:Tactical strategic, like what is this agent thing actually doing? And that looks a whole lot more like traditional observability. It's like,
Lori MacVittie:Okay
Chris Hain:it's a span that says I selected this tool and it took me this long to call the API and get the response. That kind of logging and metrics and stuff that we're very used to is super useful for that. Where the Anthropic research comes in is more like, well, let's say we notice a pattern of behaviors in our kind of traditional instrumentation where it's constantly doing the wrong thing, or we wanna understand at a deeper level why it's doing one specific behavior that it does all the time. That's where that kind of Anthropic research might come into play, to kind of take a deep dive with a microscope and say, is there something about the way these models are being trained or the data that's being fed into them that could maybe be refactored to produce better outcomes at runtime.
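Chris's "span that says I selected this tool and it took me this long" maps pretty directly onto OpenTelemetry as it exists today. A minimal sketch, assuming the opentelemetry-sdk package is installed; the agent.* attribute names are made up, since there's no settled semantic convention for agent tool choice yet.

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console so the example is self-contained;
# in practice this would point at your collector.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("agent.demo")

with tracer.start_as_current_span("select_tool") as span:
    # Hypothetical attribute names for illustration only.
    span.set_attribute("agent.task", "notify Chris about the podcast")
    span.set_attribute("agent.tool.candidates", ["teams", "email", "letter"])
    span.set_attribute("agent.tool.selected", "teams")

    start = time.monotonic()
    time.sleep(0.1)  # stand-in for the actual Teams API call
    span.set_attribute("agent.tool.call_ms", (time.monotonic() - start) * 1000)
```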
Lori MacVittie:Yeah, but isn't there yet another level above just the logs? I mean, we can do that today. Like I selected this tool to do this thing. Here was the task, here was the tool. That's pretty simple.
Lori MacVittie:Understanding why you picked Teams over email for a specific person might also be something that we want to track. Like, how did you decide that? Well, Chris is usually in the office and his working hours are set to this and he was available. So I picked that. That's valuable. And that's something we don't
Lori MacVittie:quite have today. But I think that's coming. We need to understand that so we know what factors are at least playing at a high level because then we can start tuning it going, Oh, okay, people need to set their hours for things or preferences or tools so that the AI will actually start respecting that and using that to make those decisions.
Chris Hain:Yeah, you don't always need the neuron-level why did this happen. It's more just what was in the context, where you could probably, even without doing any fancy work, kind of look at it and say, okay, here's probably why it chose this. Here's what we could do about it. That kind of thing.
Chris Hain:A 100%, I agree. It's very, very useful.
Lori MacVittie:That's going to make some big logs. Let's assume that's the level, right? Just for agents. We haven't even gotten to Agentic AI which is another mess, but just deploying agents for communicating or doing these things. So now if we're going to have this log and it's going to explain, I want to know why you picked tool A over tool B for this person and this task.
Lori MacVittie:And you say, Okay, I'm going to log that somehow. Right. One, it sounds like a kind of a text log, like W3C. Right. So now we're going to have lots of logs.
Lori MacVittie:Are we going to have to have like a SIEM for AI decisions, like something sitting out there?
Chris Hain:Well, yeah, I mean, it's probably a pretty close analog. Like, you could imagine needing a fair amount of context to go along with that log to say, here's what you told me about that thing that we logged, but I wanna look at the source of truth, which was what were the input tokens that went into that. So yeah, I mean, we're talking about big volumes of largely textual, unstructured data. And, you know, that's not a new problem. It's not a solved problem. But yeah, again, here's where we let loose the agents on those things, right?
Lori MacVittie:Right, no. It's kind of self-replicating.
Chris Hain:The agent that processes my other agents' outputs and gives me the human-digestible version of it.
Lori MacVittie:So this is really how AI replicates itself, by producing so much data that you have to produce more AI.
Chris Hain:We're just so lazy. We can't read through it all. So we're like, why don't you just handle this for me?
Lori MacVittie:Right. Let's do more. And then we'll need to watch that one, and who's watching the other one? No, that is a legitimate concern.
Lori MacVittie:We already have problems with data and logs and metrics, and how long do you keep them, and time series, can we condense them? Do we only store the outliers or the changes? How do we make this work so that it's usable? Because collecting data on stuff that happens is kind of useless if you're never going to look at it. So you expect it's going to be looked at.
Lori MacVittie:It's got to be useful, but how much disk space do we have? I mean, we know we have a
Lori MacVittie:lot, but I don't know.
Chris Hain:Yeah, I mean, that's a really good point. It's like so much today, so much observability data just goes unused. Like nobody looks at it because nobody's got time, but my AI might have more time than I do, right? So a lot of things that get missed aren't actually missing from the data. It's just nobody looked at it, right?
Chris Hain:This is a huge chance for improving things on that front. But yeah, disk space, I mean, it's going to be a lot more things going to S3 or those object stores. So it's gonna be a lot more columnar formatted things that you can race through really quickly, Iceberg tables. It's more of these kind of modern observability systems, beyond kind of what we've been doing for the last ten, fifteen years and more into like the next version. You're not gonna be able to maybe pay your observability vendor per log line in this new world because that's just gonna kill you, right?
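On the columnar point: here's a minimal sketch of writing those decision records out as Parquet with pyarrow. A real deployment would more likely write to Iceberg tables on object storage like S3 rather than a local file, so take the path and column names as illustrative.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Columnar layout: each column compresses and scans well on its own,
# which is what makes "read a year of tool-selection decisions" cheap.
decisions = pa.table({
    "ts": ["2025-06-20T17:02:11Z", "2025-06-20T17:04:03Z"],
    "task": ["notify Chris", "schedule recording"],
    "selected_tool": ["teams", "email"],
    "latency_ms": [412, 238],
})

pq.write_table(decisions, "agent_decisions.parquet")  # local file; S3/Iceberg in practice
print(pq.read_table("agent_decisions.parquet"))
```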
Lori MacVittie:Wow. Wow. And see, I hadn't thought about that, like token-based pricing models for observability. Right? I mean, as in, we're sending you tokens. Right.
Lori MacVittie:Do we encode it? So do we have another process in between going, Here's the log of the decision. Do we create the embedding right there and just shove it in a database somewhere? So it's, one, smaller because it's just numbers, but also more accessible to the AI that we're going to have to have comb through it.
Chris Hain:Yeah. I mean, at some point it's just another index where you can reference other materials. So like the uncompressed version of the thing that you embedded and used in your vector search, but you still probably got something that's a big blob of data somewhere that that's referencing. So yeah, I mean, it's both, right? It's bigger indexes, it's more storage, it's more everything and it's just, yeah.
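Lori's "create the embedding right there and shove it in a database" and Chris's "it's just another index" fit together roughly like this: embed the decision text for similarity search, but keep a pointer back to the raw log blob as the source of truth. The embed() function below is a fake, hash-derived stand-in rather than a real embedding model, the index is just an in-memory list, and the S3 keys are invented.

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Fake embedding: a deterministic hash-derived vector. A real system
    would call an embedding model here."""
    h = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in h[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# "Index" entries keep the vector plus a pointer to the raw, uncompressed
# log blob (e.g. an object-store key), which stays the source of truth.
index = [
    {"vector": embed("chose teams because Chris was available"),
     "blob": "s3://agent-logs/2025/06/20/decision-0001.json"},
    {"vector": embed("chose email because recipient was offline"),
     "blob": "s3://agent-logs/2025/06/20/decision-0002.json"},
]

query = embed("why did it pick teams?")
best = max(index, key=lambda e: cosine(e["vector"], query))
print("closest raw log:", best["blob"])
```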
Lori MacVittie:Yeah, indexes, well, and indexes have always been a problem when you start looking at databases, data storage, right? Oh, there's still arguments. Oh, the B tree is the best way to do this. There's still the big debate going on about how to do that. But that's where techniques like sharding come in.
Lori MacVittie:PayPal made that very popular and famous back then, and it's actually one of the scalability, what do you call them, axes? It's not the Y axis, sharding is the Z axis, but splitting it up. You can see you're probably going to have multiple data stores with different types of observability data for AI, so that you can have more AI to go through it, which we'll also need. See, I mean, this is just a big, nasty
Chris Hain:It's a ball of fire, Lori.
Lori MacVittie:Oh, yeah, it's a ball of fire. We had to go to the ball of fire. It will though, right? When we say AI, and I know a lot of people right now, they're focusing on things like the security angle, which is absolutely true. There are a lot of new security risks.
Lori MacVittie:There's all sorts of things that are going on we don't even understand. But I think some of the operational issues around AI and how we store this data, how we log it, how we watch it, is going to be just as problematic. That's really where that ball of fire gets hotter. And it burns.
Chris Hain:It does.
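Circling back to the sharding aside from a moment ago: Z-axis scaling just means splitting the data by some key so no single store holds everything. A toy sketch, hashing an agent ID to pick which decision store its records land in; the shard names are made up.

```python
import hashlib

SHARDS = ["decisions-shard-0", "decisions-shard-1", "decisions-shard-2"]

def shard_for(agent_id: str) -> str:
    """Z-axis split: the same agent's decision logs always land in the
    same store, and no single store holds everything."""
    digest = hashlib.sha256(agent_id.encode()).digest()
    return SHARDS[digest[0] % len(SHARDS)]

for agent in ["scheduler-agent", "notify-agent", "triage-agent"]:
    print(agent, "->", shard_for(agent))
```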
Lori MacVittie:Wow, okay. So that went to a weird place. So I'm going to say thank you for today before we really, really get crazy.
Chris Hain:Absolutely.
Lori MacVittie:And close out this episode of Pop Goes the Stack, right?
Lori MacVittie:Where the tech is bleeding edge and your sanity is just a deprecated feature. If your deployment survived this conversation, congrats. You're ahead of the curve for now. Be sure to subscribe, leave a review, or just scream into the void, whatever helps you cope. We'll be back with more ways emerging tech is rewriting the rules and breaking your stack next time.