LLM on {IT}

STATUS.md: a shared file for multi-agent work

Mon, 25 May 2026 19:19:04 +0000

When I work on a bigger task – a new feature, a Terraform change, a small PoC – I usually run it across multiple agents at once. Claude Code in one window for the code, a Cowork session in another for planning and content, sometimes Desktop Claude in a third.

The split works well until I switch between them and have to type some flavour of “where are we?” and let the agent guess. Each one has its own TODO list, and none of them can see the others’. So I end up as the human message bus, and the context windows fill up with status updates instead of actual work.

A minimal LLM Ops stack with tracing and model costs

Wed, 14 Jan 2026 12:30:35 +0000

Many “LLM app” demos stop the moment the model produces a decent-looking answer. Once the app gets closer to real use, new questions appear:

What context did the model actually see?
Did retrieval find anything useful, or nothing at all?
What did this request cost, and how does that compare to another request?
Did a “small prompt tweak” quietly break refund handling?

To make those questions easier to answer, I built a tiny FastAPI “customer support reply drafter” app and integrated it with Langfuse. The goal was a workflow where every request leaves a trail you can inspect and every change is measurable.
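To make “a trail you can inspect” and “what did this request cost” concrete, here is a minimal, library-free sketch of the idea. It is not Langfuse’s API. `Trace`, `cost_usd` and the `PRICES_PER_MILLION` table are hypothetical names, and the prices are placeholders rather than any real model’s pricing. Each request records its steps (retrieval, generation), and cost is computed from token counts.

```python
import time
import uuid
from dataclasses import dataclass, field

# Hypothetical placeholder prices in USD per 1M tokens; real prices depend on model and provider.
PRICES_PER_MILLION = {
    "example-model": {"input": 1.00, "output": 4.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a call's cost from token counts and the per-model price table."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


@dataclass
class Trace:
    """Hypothetical per-request trail: every step the request went through, with its data."""
    name: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def log(self, step: str, **data) -> None:
        # Store each step with a timestamp so the request can be inspected afterwards.
        self.events.append({"step": step, "ts": time.time(), **data})

    def total_cost(self) -> float:
        # Steps without a cost (e.g. retrieval) count as zero.
        return sum(event.get("cost_usd", 0.0) for event in self.events)


trace = Trace(name="draft_reply")
# An empty retrieval result is exactly the kind of thing you want to see later.
trace.log("retrieval", query="Where is my refund?", hits=0)
trace.log(
    "generation",
    model="example-model",
    input_tokens=1200,
    output_tokens=300,
    cost_usd=cost_usd("example-model", 1200, 300),
)
print(f"{trace.total_cost():.6f}")  # prints 0.002400
```

A tracing tool such as Langfuse stores this kind of data for you and adds a UI on top. The sketch only shows what is worth capturing per request: inputs, retrieved context, token counts and the resulting cost.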

RAG: A (mostly) no-buzzword explanation

Wed, 19 Nov 2025 18:41:45 +0000

LLMs, like the ones behind ChatGPT or Gemini, have two big weaknesses:

Their knowledge is frozen at training time (“knowledge cutoff”)
They can “hallucinate” or confidently make things up

Retrieval-Augmented Generation (RAG) is a pattern that addresses both problems by giving an LLM access to the right data at answer time. Instead of asking the model to “remember everything”, RAG lets it look things up first, then answer.

Core idea

RAG = search for relevant documents → feed them into the LLM → have the LLM respond using those documents alongside what it already knows.
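The first two steps of that pipeline can be sketched in a few lines. This is a toy under stated assumptions. Retrieval here is plain word overlap, a deliberately simple stand-in for the embedding or keyword search (e.g. vector similarity, BM25) a real RAG system would use. `DOCUMENTS`, `retrieve` and `build_prompt` are hypothetical names.

```python
import re

# Hypothetical mini knowledge base; in practice these would be chunks of your own documents.
DOCUMENTS = [
    "Refunds are processed within 5 business days after we receive the returned item.",
    "Our support team is available Monday to Friday, 9:00 to 17:00 CET.",
    "Premium plans include priority support and a dedicated account manager.",
]


def tokenize(text: str) -> set[str]:
    # Lowercase word set: a crude stand-in for embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by shared words with the query, keep the top k with any overlap."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: put the retrieved documents into the prompt the LLM will see."""
    context_block = "\n".join(f"- {doc}" for doc in context) or "- (no relevant documents found)"
    return (
        "Answer the question using the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )


query = "How long do refunds take?"
context = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, context)
print(prompt)
```

The third step is sending `prompt` to whichever model you use. That call is left out because it depends on the provider’s SDK. The instruction to admit when the context lacks the answer is what lets retrieval reduce, though not eliminate, made-up answers.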