<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Llm on {IT}</title><link>https://igortkanov.com/computers/llm/</link><description>Recent content in Llm on {IT}</description><generator>Hugo</generator><language>en-us</language><copyright>Copyright © 2026 {IT}. All rights reserved. Unless otherwise stated, all text, images, diagrams, and other original content on this blog may not be reproduced, distributed, or used without prior written permission.</copyright><lastBuildDate>Mon, 25 May 2026 19:19:04 +0000</lastBuildDate><atom:link href="https://igortkanov.com/computers/llm/index.xml" rel="self" type="application/rss+xml"/><item><title>STATUS.md: a shared file for multi-agent work</title><link>https://igortkanov.com/status-md-for-multi-agent-work/</link><pubDate>Mon, 25 May 2026 19:19:04 +0000</pubDate><guid>https://igortkanov.com/status-md-for-multi-agent-work/</guid><description>&lt;p&gt;When I work on a bigger task – a new feature, a Terraform change, a small PoC – I usually run it across multiple agents at once. Claude Code in one window for the code, a Cowork session in another for planning and content, sometimes Desktop Claude in a third.&lt;/p&gt;
&lt;p&gt;The split works well until I switch between them and have to type some flavour of &amp;ldquo;where are we?&amp;rdquo; so the agent can guess. Each one has its own TODO list. None of them can see the others&amp;rsquo;. And so I end up as the human message bus, with the context windows filling up with status updates instead of actual work.&lt;/p&gt;</description></item><item><title>A minimal LLM Ops stack with tracing and model costs</title><link>https://igortkanov.com/minimal-llm-ops-stack-with-tracing-and-model-costs-langfuse/</link><pubDate>Wed, 14 Jan 2026 12:30:35 +0000</pubDate><guid>https://igortkanov.com/minimal-llm-ops-stack-with-tracing-and-model-costs-langfuse/</guid><description>&lt;p&gt;Many &amp;ldquo;LLM app&amp;rdquo; demos stop the moment the model produces a decent-looking answer. However, when the app becomes more real, you get extra questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What &lt;strong&gt;context&lt;/strong&gt; did the model actually see?&lt;/li&gt;
&lt;li&gt;Did &lt;strong&gt;retrieval&lt;/strong&gt; find anything useful, or nothing at all?&lt;/li&gt;
&lt;li&gt;What did this request &lt;strong&gt;cost&lt;/strong&gt;? How do you compare it to another request?&lt;/li&gt;
&lt;li&gt;Did a &amp;ldquo;small prompt tweak&amp;rdquo; quietly &lt;strong&gt;break&lt;/strong&gt; refund handling?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In an attempt to make those questions easier to answer, I built a tiny &lt;strong&gt;&lt;a href="https://fastapi.tiangolo.com" target="_blank" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt;&lt;/strong&gt; &amp;ldquo;customer support reply drafter&amp;rdquo; app and integrated it with Langfuse. The goal was to have a workflow where &lt;strong&gt;every request leaves a trail&lt;/strong&gt; you can inspect, and where changes are measurable.&lt;/p&gt;</description></item><item><title>RAG: A (mostly) no-buzzword explanation</title><link>https://igortkanov.com/rag-a-no-buzzword-explanation/</link><pubDate>Wed, 19 Nov 2025 18:41:45 +0000</pubDate><guid>https://igortkanov.com/rag-a-no-buzzword-explanation/</guid><description>&lt;p&gt;LLMs, like the ones behind ChatGPT or Gemini, have two big weaknesses:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Their knowledge is frozen at training time (“knowledge cutoff”)&lt;/li&gt;
&lt;li&gt;They can “hallucinate” or confidently make things up&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; is a pattern that addresses both problems by giving an LLM access to the right data at &lt;em&gt;answer time&lt;/em&gt;. Instead of asking the model to “remember everything”, RAG lets it look things up first, &lt;strong&gt;then&lt;/strong&gt; answer.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="core-idea"&gt;Core idea&lt;/h2&gt;
&lt;p&gt;RAG = search for relevant documents → feed them into the LLM → have the LLM respond using those documents in addition to what it already knows.&lt;/p&gt;</description></item></channel></rss>