April 2026
I have been building infrastructure for AI agents since before it was called agentic AI. Not demos. The actual thing: a knowledge layer that multiple agents can read from and write to simultaneously, without tearing itself apart under concurrent updates, without poisoning itself with stale facts, and without requiring every consumer to speak the same query language.
It took a long time. Longer than it should have, partly because the problems are genuinely hard, and partly because I kept reaching for tools that were almost right. Vector databases. Graph databases. Document stores. Each one solved a piece of the problem and introduced a new one.
I want to be specific about what broke and why, because the industry is about to repeat those mistakes at scale.
The Problem Nobody Admits
Multi-agent systems have a shared memory problem that most architectures paper over. Agents read stale state and act on it. Agents write conflicting updates with no arbitration. Agents store facts that were true three minutes ago but are not true anymore, and nobody has a mechanism to tell them. This is true for simple two-agent pipelines and it is true for the multi-agent LLM systems that enterprises are now deploying as part of data platform modernization programs.
The standard responses to this are: use a vector database with metadata filtering, use a graph database with versioned nodes, or inject fresh context at the start of every agent call and pretend the problem does not exist.
The third approach is by far the most popular. It is also the one that collapses at scale. Injecting a full context payload on every invocation is a token budget problem disguised as an architecture decision. At 10 agents it is fine. At 500 agents running concurrent tasks against the same knowledge base, you are paying for the same embeddings to be recomputed and injected hundreds of times per second.
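The arithmetic is easy to make concrete. A back-of-envelope sketch, with assumed numbers (an 8,000-token context payload, two invocations per second per agent, $3 per million input tokens) rather than anything measured:

```python
# Back-of-envelope cost of re-injecting a full context payload on every
# agent invocation. All numbers here are illustrative assumptions.

CONTEXT_TOKENS = 8_000        # tokens injected per invocation (assumed)
PRICE_PER_MTOK = 3.00         # USD per million input tokens (assumed)

def hourly_cost(agents: int, calls_per_sec: float) -> float:
    """USD per hour spent re-sending the same context payload."""
    tokens_per_hour = agents * calls_per_sec * 3_600 * CONTEXT_TOKENS
    return tokens_per_hour / 1_000_000 * PRICE_PER_MTOK

print(f"10 agents:  ${hourly_cost(10, 2):,.2f}/hour")    # $1,728.00/hour
print(f"500 agents: ${hourly_cost(500, 2):,.2f}/hour")   # $86,400.00/hour
```

The exact figures do not matter; the linear scaling does. Every agent added multiplies the cost of re-sending state the system already holds.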
The graph database approach is closer to right. But most graph databases are designed for human analysts running Cypher queries, not for agents that need to traverse a subgraph in under 50 milliseconds and report back to an orchestrator. The query interface is wrong. The latency profile is wrong. And the consistency model is wrong: most graph databases give you either full ACID semantics (expensive) or eventual consistency with no clear expiry semantics for stale facts.
Confluent arrived at the same diagnosis in early 2025, writing that the challenge facing agentic systems "isn't an AI problem, it's an infrastructure and data interoperability problem". Their answer was event-driven architecture on Kafka. That is correct as far as it goes. The part the industry has not addressed is what happens to the knowledge state those agents accumulate: how it expires, how it stays coherent across concurrent readers, and how an agent can traverse it without issuing a database query.
We have written about the broader shape of this problem in our work on Agentic RAG and decentralized data control and on why enterprise AI needs a proper runtime. The shared pattern across all of it: the retrieval layer is the bottleneck, and nobody is treating it as a first-class infrastructure problem.
Agents Already Know How to Navigate Filesystems
There is something agents are genuinely good at that we underuse. Agents trained on code navigate a filesystem without instruction. They issue ls, cat, grep, find. They follow directory hierarchies. They treat a directory tree as a navigable data structure, not as something that requires a query language.
Mintlify demonstrated this clearly with their ChromaFs work, published in March 2026. They replaced RAG over documentation with a virtual filesystem backed by Chroma. Session startup dropped from 46 seconds to 100 milliseconds. The reason is mechanical: instead of computing cosine distances over 768-dimensional vectors, agents just called grep. The behavior the model was already trained on turned out to be fast enough for most retrieval tasks.
It is worth reading their traffic data alongside this. AI coding agents now account for 45.3% of all documentation requests across Mintlify-powered sites, with Claude Code alone generating more requests than Chrome on Windows. The audience for documentation has already changed. The tooling to serve that audience has not caught up.
But ChromaFs starts with finished documents. Someone authored them. Someone organized them into a hierarchy. The AI arrives at the end to read what humans wrote.
That assumption breaks the moment your knowledge is generated by other agents. There is no human author. The "documents" are inference outputs, sensor readings, API responses, conversation transcripts. The hierarchy is not a folder tree someone designed. It is a graph. And in our work on real-time streaming architectures and event-driven agent systems, the graph is always live: edges appear and disappear in seconds, not days.
What agents actually need is not a filesystem over static documents. It is a filesystem over a live graph, one where the primitive operations are not "read this markdown file" but "traverse this edge," "enumerate the neighbors of this node," and "binary search this sorted node list."
Why Grep Beats RAG in Agentic Pipelines
RAG was designed for fuzzy human queries against stable corpora. Agents do not issue fuzzy queries: they execute plans, and a plan step either resolves exactly or it fails. Grep on a sorted binary partition gives you a definitive negative in O(log n), something cosine similarity cannot do: a score of 0.87 against the wrong chunk looks exactly like a score of 0.87 against the right one, so from the agent's perspective there is no "not found."
Mintlify measured this directly: replacing vector retrieval with filesystem-based grep dropped session latency from 46 seconds to 100 milliseconds and eliminated per-query compute cost entirely, because grep does not require embedding inference at read time.
The deeper issue is that RAG has no freshness model. Embeddings computed at index time stay in the index until someone reruns the pipeline, which means an agent in a live system is always operating against a snapshot of unknown age, with no mechanism to know which facts have expired and which have not.
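The "definitive negative" property is worth spelling out in code. A minimal sketch, with toy node IDs standing in for real 64-bit identifiers: an exact membership test over a sorted partition returns a hard yes or a hard no, which is something a similarity score cannot express.

```python
import bisect

# A sorted node-ID partition supports a definitive negative in O(log n):
# the ID is either present or it is not. Toy values for illustration.
node_ids = [0x11, 0x42, 0x7e, 0xa3, 0xff]   # sorted IDs (assumed data)

def lookup(ids: list[int], target: int) -> bool:
    """Exact membership test: a hard yes or a hard no, never a score."""
    i = bisect.bisect_left(ids, target)
    return i < len(ids) and ids[i] == target

print(lookup(node_ids, 0x7e))   # True  -- the fact exists
print(lookup(node_ids, 0x99))   # False -- a definitive negative

# A similarity search cannot say "no": it always returns the nearest
# chunk with a score, and the agent cannot tell a correct match from a
# near-miss without additional verification.
```

A plan step can branch on that boolean. It cannot safely branch on a cosine distance.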
KafScale as a Ramdisk for Graph Storage
A ramdisk is the simplest possible storage abstraction. You allocate a block of RAM, expose it as a block device, and mount it. Processes read and write bytes. The OS does not care that the backing store is memory instead of spinning disk. The interface is identical: open a file descriptor, seek to an offset, read N bytes. What changes is the latency.
This is the analogy that drove the architecture of KafScale when applied to graph storage.
The idea: represent the knowledge graph not as a labeled property graph with a Cypher API, but as a set of binary-packed record files. Nodes are fixed-width records. Edges are offset tables. Node IDs are 64-bit integers, sorted. The files are backed by Apache Kafka topics under the hood, which gives you append-only semantics, replayability, and exactly-once writes. But to any consumer, the interface looks like reading bytes from a mounted block device.
The consequence is that graph traversal becomes a binary search problem. To find node 7e3a9f, you do not issue a query to a graph engine. You binary search a sorted node list, the same operation your process would perform against a memory-mapped file. This runs in O(log n) with no network roundtrip, no query planner, no connection pool overhead.
Agents do not need to know this is a graph. They see a directory structure: /graph/nodes/, /graph/edges/by-source/, /graph/edges/by-target/. They use standard binary grep (searching fixed-width records by field offset) to locate nodes. The filesystem metaphor holds, but the underlying representation is dense binary, not human-readable text. For agents that communicate via MCP, KafScale exposes the same partitions as MCP-native tools: the Kafka-backed binary store and the MCP tool protocol share the same partition model, which means a Kafka MCP architecture does not require a translation layer between the agent's tool calls and the underlying storage.
This is architecturally distinct from GraphAr, the Apache project out of Alibaba that is standardizing a graph data file format using Parquet under the hood. GraphAr requires a Parquet reader to access any node. That is not what we want. We want agents to use grep. The goal is zero-dependency traversal: a binary-trained model, a mounted partition, and standard shell tools.
The design connects to our broader work on secure and scalable MCP/Kafka architectures, where the same Kafka-backed, partition-per-domain model underpins multi-tenant data isolation. The graph storage layer is the same pattern pushed down to the node level.
The Tombstone Problem
The hardest part of shared agent memory is not storage. It is expiry.
In a database, you delete a row. In a knowledge graph used by concurrent agents, you need to express that a fact was true until a specific time, is now superseded, and consumers who cached it should stop acting on it. That is not a delete. It is a tombstone.
LSM-tree databases like RocksDB and Cassandra handle this with tombstone records in the write-ahead log. When you delete a key, a deletion marker is appended to the log. Readers who encounter the tombstone know the value is gone. The original record is not physically removed until compaction runs. The log is always append-only.
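The read-path consequence of that design fits in a few lines. A deliberately minimal sketch; real engines like RocksDB and Cassandra layer memtables, SSTables, and compaction on top of this idea:

```python
# Minimal LSM-style read path over an append-only log with tombstones.
# Illustrative only: no memtable, no SSTables, no compaction.

TOMBSTONE = object()                  # sentinel deletion marker

log: list[tuple[str, object]] = []    # append-only (key, value) log

def put(key, value): log.append((key, value))
def delete(key):     log.append((key, TOMBSTONE))

def get(key):
    """Scan newest-to-oldest; a tombstone hides every older value."""
    for k, v in reversed(log):
        if k == key:
            return None if v is TOMBSTONE else v
    return None

put("queue-depth", 412)
put("queue-depth", 7)
delete("queue-depth")
print(get("queue-depth"))   # None -- the tombstone wins over older writes
```

Nothing is ever rewritten in place. The newest record for a key is the truth, and a tombstone is just the newest record saying "there is no truth here anymore."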
KafGraph, our distributed knowledge graph engine built on Kafka, applies this model directly to knowledge nodes.
When a fact expires or is superseded, we do not remove the node record. We append a tombstone event to the Kafka topic that backs that partition. The tombstone carries the node ID, the timestamp, and an optional supersession pointer (the ID of the replacement node, if one exists). Consumers that have cached the node receive the tombstone through their event subscription and invalidate their local copies.
The lease variant covers more cases. Nodes can be written with a TTL field. When the TTL expires, the system automatically generates a tombstone. This is exactly the lease semantics from distributed coordination systems like etcd and Google's Chubby: a resource is valid for a finite duration, and the holder must either renew or let it expire. The original formalization of lease semantics for distributed file cache consistency comes from Gray and Cheriton's 1989 paper, which established the principle that a lease is a time-bounded grant of validity. We borrowed the model because it maps cleanly onto knowledge that has a natural freshness window: API response data, queue depths, deployment states. You write them with a TTL and let the system handle the rest.
The result is what I call stable-with-tombstones semantics. At any point in time, an agent reading the graph knows exactly which facts are current (no tombstone) and which have expired (tombstone present). This is verifiably different from a vector database, where stale embeddings stay in the index indefinitely and silently pollute retrieval results.
This connects to a broader principle we have been developing in our work on federated learning and data integrity: the adversarial case for shared model state is poisoning, and the defense is always some form of provenance-with-expiry. Tombstones in a knowledge graph are the same class of solution applied to agent memory.
Part II: The First Chat with the Thing We Built
I want to share something that blew my mind: a Eureka moment, the kind Archimedes had, the moment that proved our ideas right. It happened just a few days ago, last weekend in March 2026. I do not usually write about sessions that feel unfinished. But this one belongs in the record.
KafClaw (docs: https://kafclaw.scalytics.io) is our agent runtime. It runs on KafScale (GitHub: https://github.com/KafScale/platform) as the communication layer and KafGraph as the memory layer. Its memory is the binary-packed node partition described above, backed by KafScale's long-term S3 storage, and it has been accumulating content since early 2023: over 1,100 documents from real Kafka consultations, unsorted, ingested in the order they arrived. Consumer group failures at 2am. Partition skew diagnoses. Dead letter queue investigations. Schema registry migrations. Two years of the messy, repetitive, specific problems that Kafka engineers actually face.
KafClaw does not retrieve from this corpus. It lives inside it. When it answers a question, it is traversing its own node graph. When it reasons about a pattern, it is reasoning over the shape of everything that has been written into it since it was initialized.
I had been testing the tombstone format for about an hour. At some point I stopped running benchmarks and started asking it questions.
What follows is a lightly edited transcript of that session. KafClaw responds as node://, because that is what it is: an address in its own graph.
Where This Goes
We are still building this. The binary format is mostly stable with some remaining work. The lease propagation model is in pre-alpha internal testing at small scale. The real challenge, teaching orchestrators to reason about TTL boundaries and context freshness, is ongoing.
This architecture is part of why we built KafScale and KafGraph the way we did, and why the Apache Wayang foundation underneath Scalytics matters: you need a query abstraction layer that can route work to the right execution backend depending on whether you are doing batch graph analytics, Flink streaming over live Kafka topics, or sub-10ms single-node lookup. Those are not the same workload. They should not hit the same code path.
The direction is clear. An agent deployed in a normal mid-size enterprise does not read one database. It has to navigate thousands of fragmented environments: SharePoint, shared folders, Splunk partitions, IT security logs, SCADA networks, BACnet controllers, IoT streams from industrial machines, medical equipment, and Kafka topics across a dozen teams. That list is not exhaustive. It is just what comes to mind immediately, and the reality in any real enterprise is harder.
This is the actual reason AI adoption is slower than the marketing suggests. Anthropic, OpenAI, xAI, Google: every major AI lab has built on the assumption that data can be centralized, or at least made accessible through a unified API. That assumption is not new. It is the same assumption that drove Edgar Codd's relational model at IBM in 1970: gather the data in one place, impose a schema, query it. That paper won Codd the Turing Award and defined how the industry would think about data for the next five decades. Apache Hadoop scaled the paradigm across commodity hardware. Databricks put a lakehouse on top of it. Snowflake moved it to the cloud. Every generation repackaged centralization as innovation, and none of them changed the fundamental premise. Now the AI wave has arrived and found the same paradigm waiting for it, dressed up in vector indexes instead of SQL tables.
Centralizing data so that an LLM can work with it is a fundamentally wrong starting point. It is not executable in practice, not in a utility company, not in a hospital network, not in a manufacturing plant, not in any organization that has been accumulating operational data across incompatible systems for sixty years. The data is not going to move. The agent has to go where the data is.
You cannot write a custom query integration for every silo. You cannot afford the compute to vectorize all of it. It's that simple, and that hard.
The tools to close that gap exist. KafScale gives you the memory-mapped binary store. KafClaw gives you the agentic runtime: sub-second boot time, an explicit intent framework, a clear security model. KafGraph gives you context collaboration, a replayable tombstone log, and lease semantics. That is the architecture that actually works and breaks the decades-old data centralization cycle.
What remains is making the agent ecosystem expect this interface instead of treating every knowledge access as a vector similarity search. That part is a standards problem. Standards problems are solved by building the thing and showing it works.
That is what that first session proved to us. The agent does not need a standard. It already knew the format, because the format is what it's made of.
--------
KafGraph and KafScale are part of the Scalytics open source stack. If you are building multi-agent systems with shared memory requirements, the Scalytics AI Readiness Assessment is a starting point for mapping where your current architecture stands.
Related reading from the Scalytics blog: Agentic RAG and decentralized data control | Streaming intelligence and real-time business decisions | Secure MCP/Kafka architecture for real-time analytics | The missing runtime for open models in the enterprise | Benchmarking contextual AI
About Scalytics
Our founding team created Apache Wayang (now an Apache Top-Level Project), the federated execution framework that orchestrates Spark, Flink, and TensorFlow where data lives and reduces ETL movement overhead.
We also invented and actively maintain KafScale (S3-Kafka-streaming platform), a Kafka-compatible, stateless data and large object streaming system designed for Kubernetes and object storage backends. Elastic compute. No broker babysitting. No lock-in.
Our mission: data stays in place. Compute comes to you. From data lakehouses to private AI deployment and distributed ML - all designed for security, compliance, and production resilience.
Questions? Join our open Slack community or schedule a consult.
