Latest Episodes
Local-first AI: Keep context out of the cloud
“Just throw it in the cloud” gets complicated when the data is your meetings, your IP, and your operating context. In this episode of Pop Goes the Stack, Lori MacVitti...
DevOps meets AI agents: Risk, audit, and the Deming playbook
AI is no longer a lab tool; it’s showing up in pipelines, production systems, and the places where “seemed like a good idea” becomes a 2 a.m. incident. In this episode...
Model routing isn’t load balancing (And that’s why you’re not ready)
Multi-model AI isn’t a buzzword anymore; it’s how organizations are actually operating. In this episode of Pop Goes the Stack, Lori MacVittie and Joel Moses dig into f...
KV cache is the real inference bottleneck (Not GPUs)
GPUs get all the attention, but in inference, the real bottleneck is often memory, specifically the KV cache. In this episode of Pop Goes the Stack, Lori MacVittie sit...
Measuring what matters: Observability for agents
Agents break the old rules of observability. Latency, throughput, and error rates still matter, but once software starts making decisions and taking actions on someone...
