LLM

A minimal LLM Ops stack with tracing and model costs

January 14, 2026 by Igor

I built a minimal FastAPI “customer support reply drafter” with TF-IDF retrieval and Langfuse tracing. You’ll see exactly what context the model used, where latency came from, and what each request cost, plus the trade-offs behind the design.
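To make "what each request cost" concrete, here is a rough sketch of the arithmetic behind a per-request cost figure: token counts multiplied by a per-token price. The prices, the `Usage` class, and `request_cost` are all hypothetical names and numbers chosen for illustration. They are not real rates for any model, and this is not necessarily how the post or Langfuse computes cost.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1 million tokens (assumed, not real rates).
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 4.00


@dataclass
class Usage:
    """Token counts for one request, as an LLM API typically reports them."""
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage) -> float:
    """Return the estimated cost of one request in USD."""
    total = (
        usage.input_tokens * PRICE_PER_M_INPUT
        + usage.output_tokens * PRICE_PER_M_OUTPUT
    )
    return total / 1_000_000


# Example: a prompt with retrieved context (1200 tokens) and a short reply (300 tokens).
cost = request_cost(Usage(input_tokens=1200, output_tokens=300))
print(f"${cost:.4f}")
```

Output tokens are usually priced higher than input tokens, so a long retrieved context and a long reply affect the bill differently. That is one reason to record both counts per request.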

RAG: A (mostly) no-buzzword explanation

November 19, 2025 by Igor

Computers Productivity

Retrieval-Augmented Generation (RAG) is a pattern that works around an LLM's knowledge cutoff and reduces hallucinations by giving the model access to relevant data at answer time. Instead of asking the model to “remember everything”, RAG lets it look things up first, then answer.
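The "look things up first, then answer" flow can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the `DOCS` list, the word-overlap scoring, and the `retrieve` and `build_prompt` helpers are all made up for the example, and the final LLM call is left out. A real system would use a stronger retriever (TF-IDF, embeddings) and send the prompt to a model.

```python
import re
from collections import Counter

# Hypothetical knowledge base; a real system would hold document chunks.
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Password resets are done from the account settings page.",
    "Our support team is available Monday to Friday, 9am to 5pm.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, docs, k=1):
    """Step 1: look things up. Rank docs by words shared with the question."""
    q = Counter(tokenize(question))
    scored = []
    for doc in docs:
        d = Counter(tokenize(doc))
        overlap = sum(min(q[w], d[w]) for w in q)
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop docs with no shared words so irrelevant text never reaches the prompt.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, context):
    """Step 2: put the retrieved text in front of the model with the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


question = "How long do refunds take?"
context = retrieve(question, DOCS)
prompt = build_prompt(question, context)
print(prompt)  # Step 3 (not shown): send this prompt to the LLM.
```

The model never has to "remember" the refund policy. It only has to read the context it was handed, which is why the quality of the retrieval step matters so much for the final answer.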