A minimal LLM Ops stack with tracing and model costs

Many “LLM app” demos stop the moment the model produces a decent-looking answer. Once the app faces real traffic, harder questions follow: What context did the model actually see? Did retrieval find anything useful, or nothing at all? What did this request cost, and how does that compare to another request? Did a “small prompt tweak” quietly break refund handling? To make those questions easier to answer, I built a tiny FastAPI “customer support reply drafter” app and integrated it with Langfuse. The goal was a workflow where every request leaves a trail you can inspect, and where changes are measurable. ...

January 14, 2026 · 11 min