
The 1M context window, and what it actually costs you 💸

A 1M-token LLM context window is a tool, not a target. This post covers how context actually works, why long threads cost more on every turn, and when to start a fresh chat versus keep going, with practical Claude Code tips and /context.

June 10, 2026 · 8 min

STATUS.md: a shared file for multi-agent work

A file-based pattern for coordinating multiple LLM agents on one task: a single shared, per-feature STATUS.md the agents read and write under explicit rules, how to make them follow a protocol and survive context loss, and when not to use it.

May 25, 2026 · 8 min

A minimal LLM Ops stack with tracing and model costs

Building a minimal LLM Ops stack: a FastAPI “customer support reply drafter” instrumented with Langfuse for request tracing, grounded retrieval, and per-request model cost tracking, so every LLM call is inspectable.

January 14, 2026 · 11 min

RAG: A (mostly) no-buzzword explanation

Retrieval-Augmented Generation (RAG) explained without buzzwords: how it gives an LLM the right data at answer time to fix stale knowledge and hallucinations, the step-by-step flow, its benefits over fine-tuning, and when RAG is not the answer.

November 19, 2025 · 4 min