A lone sailboat on a calm, wide-open sea

The 1M context window, and what it actually costs you 💸

A 1M-token LLM context window is a tool, not a target. This post covers how context actually works, why long threads cost more on every turn, and when to start a fresh chat instead of continuing, with practical Claude Code tips and the /context command.

June 10, 2026 · 8 min

A minimal LLM Ops stack with tracing and model costs

Building a minimal LLM Ops stack: a FastAPI “customer support reply drafter” with grounded retrieval, instrumented with Langfuse for request tracing and per-request model cost tracking, so every LLM call is inspectable.

January 14, 2026 · 11 min