A minimal LLM Ops stack with tracing and model costs

Wed, 14 Jan 2026 12:30:35 +0000

Many “LLM app” demos stop the moment the model produces a decent-looking answer. Once the app starts handling real traffic, though, new questions come up:

What context did the model actually see?
Did retrieval find anything useful, or nothing at all?
What did this request cost? How do you compare it to another request?
Did a “small prompt tweak” quietly break refund handling?

To make those questions easier to answer, I built a tiny FastAPI “customer support reply drafter” app and integrated it with Langfuse. The goal was a workflow where every request leaves a trail you can inspect, and where changes are measurable.
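To make "a trail you can inspect" concrete, here is a rough sketch of the kind of per-request record that answers the questions above. It is plain Python and deliberately does not use the Langfuse SDK; the `RequestTrace` class, its field names, and the per-token prices are all assumptions made up for illustration, not the app's actual schema or any vendor's real pricing.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical prices in USD per 1M tokens (placeholders, not real pricing).
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 2.00


@dataclass
class RequestTrace:
    """Illustrative per-request record; not Langfuse's data model."""
    ticket_text: str                      # what the customer wrote
    retrieved_docs: List[str] = field(default_factory=list)  # context the model saw
    prompt_version: str = "v1"            # lets you compare prompt tweaks
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def retrieval_hit(self) -> bool:
        # Did retrieval find anything at all?
        return len(self.retrieved_docs) > 0

    @property
    def cost_usd(self) -> float:
        # Cost of this request under the assumed prices above.
        return (
            self.input_tokens * PRICE_PER_M_INPUT
            + self.output_tokens * PRICE_PER_M_OUTPUT
        ) / 1_000_000


with_context = RequestTrace(
    ticket_text="I was charged twice, please refund.",
    retrieved_docs=["Refund policy: duplicate charges are refunded."],
    input_tokens=1000,
    output_tokens=500,
)
no_context = RequestTrace(
    ticket_text="Where is my order?",
    input_tokens=200,
    output_tokens=100,
)

# Compare two requests side by side.
for t in (with_context, no_context):
    print(t.prompt_version, t.retrieval_hit, f"${t.cost_usd:.6f}")
```

In the real app, a tracing tool such as Langfuse takes the place of this hand-rolled record. The point of the sketch is only that context, retrieval outcome, prompt version, and cost are captured together for each request, so two requests can be compared directly.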

Ai on {IT}