Bottom Line
Centralized ontologies deliver semantic consistency but create unavoidable staleness, synchronization complexity and query bottlenecks for agentic systems operating against live business events. The superior alternative is the Decision Fabric. This Kafka-native substrate makes every relevant event observable in real time. Agents and policies consume events where they are produced. Every decision is emitted back as a new event on the stream. There is no central model to keep synchronized. The immutable log is the source of truth.
Implemented through the Scalytics open-source stack of KafScale for transport, KafGraph for shared memory, KafClaw for agent runtime and KafSIEM for link analysis, this pattern has been used in multiple enterprise deployments. It enables responsive, observable and fully auditable agentic AI that scales naturally with event volume.
Why Event Streams Beat Centralized Ontologies
Organizations evaluating agentic AI quickly discover that prototype success does not translate easily to production. Early pilots often rely on a central ontology or knowledge graph to provide grounding, reduce hallucinations and enforce governance. Industry analysts have positioned such ontologies as a foundational enterprise architecture asset for trustworthy agentic decisioning.
Yet this approach collides with the realities of modern digital operations. Business events arrive continuously across distributed systems. Customer behavior changes by the second. Supply chains update in real time. Market conditions shift. A central ontology updated through periodic ETL or change data capture is always at least one step behind. The lag may be acceptable for weekly reporting. It is rarely acceptable for agents expected to act autonomously on current conditions.
The second constraint is scale. As the number of deployed agents grows from dozens to hundreds, every agent querying the same central service creates contention. Even with caching and read replicas the coordination cost rises. Platform teams find themselves managing query quotas, shard keys and consistency windows instead of focusing on agent logic.
Governance itself becomes burdensome. Maintaining a single shared model that serves multiple lines of business requires complex conflict resolution, approval workflows and versioning. When two agents assert conflicting facts the central store must arbitrate. The resulting complexity often slows innovation.
These challenges are not theoretical. Client engagements at Scalytics have repeatedly shown platform teams spending disproportionate effort on ontology maintenance and synchronization pipelines. One financial services organization had invested more than 18 months building a comprehensive risk ontology. Despite sophisticated CDC processes, agents frequently made decisions based on credit data that was 45 minutes old. When market volatility increased the ontology lag became a material risk.
Event streams address these problems at the architectural level. When the append-only log is the system of record, agents read events directly from the topics where producers write them, so freshness is bounded by consumer lag rather than by a separate synchronization job. Kafka's consumer groups and partitioning provide built-in horizontal scaling and workload isolation.
The replay capability is transformative. When an agent model is updated or a new policy is introduced, operators can replay historical event streams to validate behavior or to bring new agents up to current state without custom backfill logic. This is difficult to replicate with mutable central stores.
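To make the replay idea concrete, here is a minimal sketch that runs two versions of a policy over the same recorded events and reports where their decisions differ. It uses an in-memory list in place of a Kafka topic; the `Event` fields, the thresholds, and both policy functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    offset: int      # position in the log (stand-in for a Kafka offset)
    key: str         # hypothetical customer key
    amount: float    # hypothetical transaction amount


Policy = Callable[[Event], str]


def replay(log: list[Event], policy: Policy) -> list[tuple[int, str]]:
    """Run a policy over the log in offset order and collect (offset, decision)."""
    return [(e.offset, policy(e)) for e in sorted(log, key=lambda e: e.offset)]


def policy_v1(e: Event) -> str:
    return "review" if e.amount > 1000 else "approve"


def policy_v2(e: Event) -> str:
    # Hypothetical tightened threshold that we want to validate before rollout.
    return "review" if e.amount > 500 else "approve"


def diff_decisions(log: list[Event], old: Policy, new: Policy) -> dict[int, tuple[str, str]]:
    """Map each offset where the two policies disagree to (old decision, new decision)."""
    old_d = dict(replay(log, old))
    new_d = dict(replay(log, new))
    return {off: (old_d[off], new_d[off]) for off in old_d if old_d[off] != new_d[off]}


LOG = [
    Event(0, "cust_a", 200.0),
    Event(1, "cust_b", 750.0),
    Event(2, "cust_a", 1500.0),
]

if __name__ == "__main__":
    print(diff_decisions(LOG, policy_v1, policy_v2))
```

Because the log is immutable, the comparison is repeatable: the same input always yields the same set of differences, which is what makes replay usable as a validation step.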
Research from multiple sources supports the centrality of event-driven architectures for production agentic systems. An article on using Apache Kafka as the event broker with the Agent2Agent protocol and MCP highlights the decoupling, durability and observability benefits that point-to-point approaches cannot match. Guidance from Red Hat explains how Kafka enables the real-time context delivery and replayability that agentic systems require at scale. Confluent has similarly argued that the Google Agent2Agent protocol needs Kafka to achieve true production characteristics.
The pattern also aligns with established enterprise architecture principles. Event sourcing has long demonstrated that immutable logs provide stronger consistency and audit guarantees than mutable databases for many domains. Extending this principle to the agent layer is a logical progression.
This does not mean semantic models are obsolete. Structured, queryable memory is still essential for agents to reason effectively. The critical distinction is whether that memory is maintained centrally through external synchronization or updated transactionally by the agents themselves as part of their decision process. The latter approach distributes responsibility, removes the central bottleneck and makes provenance inherent to the event stream.
For engineering leaders and CTOs responsible for platform strategy the implication is clear. Architectures that depend on periodic snapshots will struggle as agentic workloads grow. Event-first designs that treat the stream as the source of truth offer a more sustainable path.
The Decision Fabric
The Decision Fabric is an architectural category distinct from both traditional ontology-centric designs and simple RAG pipelines. It is a Kafka-native substrate where every business-relevant event is observable in real time, agents and policies consume the stream where it is produced, and every decision is emitted back as a new event. No central ontology exists to be synchronized. No single snapshot is maintained. The ordered immutable log of events and decisions serves as the authoritative record. Its guiding principle is to turn streams into decisions.
This category is delivered through four complementary open-source components, all licensed under Apache 2.0.
KafScale is the transport and durability backbone. It is an S3-native Kafka-compatible streaming platform. Stateless brokers deployed on Kubernetes write immutable segments directly to object storage. This design removes any practical limit on retention period while remaining fully compatible with existing Kafka clients, Confluent tools and administrative utilities. Its value proposition is captured by the tagline "One Endpoint. Infinite Scale."
KafGraph supplies the shared-memory layer required for coherent multi-agent reasoning. Vector databases excel at similarity search but lack strong transactional semantics, shared write consistency and rich traversal capabilities. KafGraph is a distributed knowledge graph implemented in Go and backed by embedded or clustered BadgerDB. It exposes a clean tool-calling interface consisting of seven well-defined JSON schema operations. Agents call these tools to search, recall context, capture new facts, update relationships and more. The graph supports OpenCypher for expressive queries and the Bolt v4 protocol for compatibility with existing tooling. Its tagline is "Infinite Memory for AI Agents." The implementation is fully transactional so that updates made by one agent become immediately visible to others consuming the same correlation context.
KafClaw provides the agent runtime. It is a lightweight Go binary that orchestrates heterogeneous agents written in any language capable of producing and consuming Kafka messages. LLM agents, deterministic policy engines, Python microservices, Rust binaries and shell scripts can all participate using a standardized typed JSON envelope format. Each envelope carries correlation identifiers, trace context, and dedicated channels for memory operations, inter-agent requests, responses and audit records. Agents subscribe to input topics, receive events, invoke tools against KafGraph when needed, perform reasoning steps, and publish their decisions as new events. Its tagline is "AI Agents under your command."
KafSIEM completes the fabric as the security and link-analysis component. It ingests detector output, agent decisions and operational events from the stream and constructs auditable graphs that record full provenance for every relationship. Designed for defense and critical infrastructure incident response, it emphasizes citable evidence chains rather than generic dashboards. It can operate with embedded SQLite or scale to larger deployments and produces RFC 7946 GeoJSON when spatial analysis is required. Its tagline is "Every edge has a citation."
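For readers unfamiliar with RFC 7946, the sketch below builds a GeoJSON Feature for a single point. The structure (`Feature`, `geometry`, `coordinates` in longitude-then-latitude order, `FeatureCollection`) comes from the RFC. The property names and the coordinates are hypothetical and do not describe KafSIEM's actual output.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build an RFC 7946 Feature with a Point geometry.

    RFC 7946 orders coordinates as [longitude, latitude].
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: list[dict]) -> dict:
    """Wrap features in an RFC 7946 FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


if __name__ == "__main__":
    # Hypothetical properties: the cited source events for an observed location.
    feature = point_feature(13.405, 52.52, {"source_events": ["txn_449281"]})
    print(json.dumps(feature_collection([feature]), indent=2))
```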
When these components operate together they create a closed-loop system. Agents update shared memory transactionally as a direct consequence of their reasoning. Decisions are published as immutable events. Subsequent agents and analytical processes consume the enriched stream. The entire history is available for replay, audit and continuous improvement. All components are available for immediate use from the Scalytics open source repository.
This architecture inverts the traditional dependency graph of agentic systems. Instead of agents being clients of a central model, they become co-authors of the shared reality. The resulting system is both more responsive and more maintainable at enterprise scale.
How Shared Memory and Event Streams Enable Agentic Systems
The operational sequence in a Decision Fabric deployment follows a disciplined flow that leverages the strengths of immutable events and transactional graph memory.
Business systems and sensors publish events to KafScale topics using established domain-driven naming conventions such as customer.profile.updated.v1 or order.fulfilled.v2. These events carry the authoritative facts at the moment they occurred.
A KafClaw agent configured for a particular responsibility joins the relevant consumer group and begins receiving events. Upon receiving an event the runtime assembles context from three sources: the event payload itself, any prior messages sharing the same correlation identifier, and the current state of the relevant subgraph in KafGraph.
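The three context sources can be sketched as a small pure function. This is an illustration of the idea, not KafClaw's implementation: the `correlation_id` and `entity` field names, the `AgentContext` type, and the edge tuples standing in for a KafGraph subgraph are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any

Edge = tuple[str, str, str]  # (source, relationship type, target)


@dataclass
class AgentContext:
    event: dict[str, Any]                                        # 1. the event payload itself
    history: list[dict[str, Any]] = field(default_factory=list)  # 2. prior messages, same correlation
    subgraph: list[Edge] = field(default_factory=list)           # 3. relevant graph state


def assemble_context(
    event: dict[str, Any],
    prior_messages: list[dict[str, Any]],
    graph_edges: list[Edge],
) -> AgentContext:
    """Combine the event, its correlated history, and the edges touching its entity."""
    corr = event["correlation_id"]
    history = [m for m in prior_messages if m.get("correlation_id") == corr]
    entity = event["entity"]
    subgraph = [e for e in graph_edges if entity in (e[0], e[2])]
    return AgentContext(event=event, history=history, subgraph=subgraph)
```

Keeping assembly free of side effects means the same function can be reused unchanged when historical events are replayed.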
The agent then decides whether additional memory is required. It issues tool calls using ordinary JSON messages. A representative brain_search call might look for entities matching certain properties or traverse specific relationship types. Because KafGraph is transactional, the result reflects all captures committed by any agent up to that point in the stream.
For more complex reasoning the agent can issue brain_recall to retrieve a full connected subgraph or brain_capture to assert new entities and relationships that emerged during its analysis. These captures are written with the same correlation identifier so that downstream agents or auditors can trace the provenance exactly.
With enriched context the agent performs its core reasoning. LLM-based agents receive a carefully structured prompt containing the retrieved graph data, previous decisions in the chain, and explicit instructions to emit a decision event. Code-based agents may execute optimization routines or policy evaluations against the same context. The heterogeneity is intentional. Different problems are best solved by different agent types, all collaborating through the shared stream and memory layer.
Once a decision is reached the agent publishes a new event to a designated output topic. The event includes the decision, confidence indicators if applicable, the list of tool calls made, and full trace context. Simultaneously any new facts discovered are captured back into KafGraph.
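A decision event of this shape might be modeled as follows. The field names are assumptions based on the description above rather than the actual KafClaw envelope schema. Using the correlation identifier as the message key is also an assumption: it is a common way to keep one decision chain on a single partition and therefore in order.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DecisionEvent:
    correlation_id: str
    agent_id: str
    decision: str
    trace: dict                                     # trace context, e.g. a span id
    confidence: Optional[float] = None              # only "if applicable"
    tool_calls: list = field(default_factory=list)  # ids of tool calls made

    def to_message(self) -> tuple[bytes, bytes]:
        """Return (key, value) bytes suitable for a Kafka producer."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
        # Drop unset optional fields so consumers never see explicit nulls.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return self.correlation_id.encode("utf-8"), json.dumps(body, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    event = DecisionEvent(
        correlation_id="corr_8x4p9v2q7w",
        agent_id="credit-risk-evaluator-v3",
        decision="refer_to_manual_review",  # hypothetical decision value
        trace={"span_id": "span_3d7f9a1c"},
        confidence=0.87,
        tool_calls=["tc_9f3k2m8p"],
    )
    key, value = event.to_message()
    print(key, value)
```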
This pattern integrates naturally with emerging standards such as the Agent2Agent protocol. Protocol messages can be serialized into Kafka events rather than exchanged through synchronous HTTP. The result gains Kafka's delivery guarantees, redelivery of uncommitted messages after a consumer failure or rebalance, and the ability for any authorized party to observe the complete conversation history by consuming the topic. Multiple analyses, including work from Confluent, have explained why this event-driven approach is necessary for scalable A2A implementations.
KafSIEM operates as a passive consumer of both business events and decision events. It builds a parallel graph optimized for relationship traversal and provenance tracking. Every edge in a KafSIEM graph links back to the exact source events and the specific agent version that asserted it. This capability supports rigorous incident review and compliance processes. Related patterns are explored in our work on decentralized data analytics and AI for national defense.
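The "every edge links back to its sources" rule can be expressed as a constraint on the edge type itself. The sketch below is a hypothetical data model, not KafSIEM's schema: it simply refuses to construct an edge that lacks cited source events or an agent version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedEdge:
    source: str
    relation: str
    target: str
    source_events: tuple[str, ...]  # ids of the events that justify this edge
    agent_version: str              # the agent version that asserted it

    def __post_init__(self) -> None:
        if not self.source_events:
            raise ValueError("edge rejected: no source events cited")
        if not self.agent_version:
            raise ValueError("edge rejected: asserting agent version missing")


def edges_citing(edges: list[CitedEdge], event_id: str) -> list[CitedEdge]:
    """Return every edge whose provenance includes the given event."""
    return [e for e in edges if event_id in e.source_events]


if __name__ == "__main__":
    edge = CitedEdge(
        source="Customer:cust_118472",
        relation="TRIGGERED",
        target="RiskSignal:rs_7749201",
        source_events=("txn_449281", "txn_449295"),
        agent_version="credit-risk-evaluator-v3",
    )
    print(edges_citing([edge], "txn_449295"))
```

The reverse lookup in `edges_citing` is the kind of question an incident reviewer asks: given one suspicious event, which asserted relationships depend on it?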
The architecture is deliberately open. Existing Kafka producers require no modification. Legacy systems can publish events through simple adapters or connectors. New agents can be introduced without touching other components. The only shared contracts are event schemas and the standardized tool call interface.
Observability is first class. Standard Kafka metrics are supplemented by agent-specific counters for tool call volume, decision latency, graph mutation rate and correlation completion times. Operators can trace any decision back through the exact sequence of events and memory states that produced it.
This design rests on solid foundations. The immutable log eliminates many classes of reconciliation error. Transactional memory updates ensure consistency without central locking. Explicit correlation identifiers provide end-to-end traceability without a central orchestrator. The combination creates a system that grows more capable as more events and agents are added.
Concrete Implementation Details
Adopting the Decision Fabric requires attention to several implementation practices that have proven effective across client deployments.
Event schema governance is the starting point. We use a schema registry with Avro or JSON Schema definitions that enforce backward compatibility. Topic naming follows a consistent pattern that encodes business domain, entity type, action and version. This convention enables both human comprehension and automated routing rules inside KafClaw agents.
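A naming convention is only useful if it is enforced, so a small parser is a natural first guardrail. Because the topic names shown in this article vary between three and four segments, the sketch below assumes only that the first segment is the business domain and the last is a version of the form `vN`. The `TopicName` type and the character rules are hypothetical.

```python
import re
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    middle: tuple[str, ...]  # entity type and action segments
    version: int


_SEGMENT = re.compile(r"^[a-z][a-z0-9_]*$")
_VERSION = re.compile(r"^v(\d+)$")


def parse_topic(name: str) -> TopicName:
    """Split a dotted topic name into domain, middle segments and version."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 dot-separated segments: {name!r}")
    match = _VERSION.match(parts[-1])
    if match is None:
        raise ValueError(f"last segment must look like v<number>: {name!r}")
    for segment in parts[:-1]:
        if _SEGMENT.match(segment) is None:
            raise ValueError(f"invalid segment {segment!r} in {name!r}")
    return TopicName(parts[0], tuple(parts[1:-1]), int(match.group(1)))


if __name__ == "__main__":
    print(parse_topic("customer.profile.updated.v1"))
    print(parse_topic("order.fulfilled.v2"))
```

A check like this can run in CI against the schema registry's subject list or be used inside an agent for routing on `domain` and `version`.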
A representative agent configuration demonstrates the simplicity of the runtime.
agent:
  id: credit-risk-evaluator-v3
  consumer_group: risk-decision-group-2026
  input_topics:
    - transactions.credit.applications.v2
    - customer.credit.score.updated.v1
  output_topic: decisions.credit.approval.v2
  audit_topic: audit.risk.decisions.v2
  memory_tools:
    enabled: true
    allowed:
      - brain_search
      - brain_recall
      - brain_capture
      - brain_update
  llm:
    provider: anthropic
    model: claude-3-5-sonnet
    max_tokens: 4096
    temperature: 0.1
  tracing:
    enabled: true
    propagator: opentelemetry
The KafClaw binary is deployed as a container. It manages the consumer loop, context assembly, tool routing and response publishing. Dead letter handling, retry backoff and circuit breakers are included in the runtime.
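For teams writing their own agents, the general shape of retry with exponential backoff and a dead letter destination looks roughly like the sketch below. It illustrates the technique only and says nothing about how KafClaw implements it. The function name, the defaults, and the list standing in for a dead letter topic are assumptions.

```python
import time
from typing import Any, Callable


def process_with_retry(
    event: dict[str, Any],
    handler: Callable[[dict[str, Any]], None],
    dead_letters: list[dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the handler up to max_attempts times, doubling the delay between tries.

    Returns True on success. After the final failure the event is appended to
    dead_letters (a stand-in for a dead letter topic) and False is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as retryable here
            if attempt == max_attempts:
                dead_letters.append({"event": event, "error": repr(exc), "attempts": attempt})
                return False
            sleep(base_delay * 2 ** (attempt - 1))  # 0.1s, 0.2s, 0.4s, ...
    return False  # unreachable, kept for type checkers
```

Injecting `sleep` keeps the function testable without real delays. A production version would typically distinguish retryable from non-retryable errors and add jitter.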
Tool calls follow a standardized JSON contract. The example below shows a brain_capture operation that records a new risk signal and its supporting relationships.
{
  "tool_call_id": "tc_9f3k2m8p",
  "tool": "brain_capture",
  "correlation_id": "corr_8x4p9v2q7w",
  "span_id": "span_3d7f9a1c",
  "arguments": {
    "entities": [{
      "type": "RiskSignal",
      "id": "rs_7749201",
      "attributes": {
        "signal_type": "velocity_anomaly",
        "severity": "high",
        "detected_at": "2026-05-16T10:23:45Z"
      }
    }],
    "relationships": [{
      "source": "Customer:cust_118472",
      "type": "TRIGGERED",
      "target": "RiskSignal:rs_7749201",
      "attributes": {
        "confidence": 0.87,
        "evidence_events": ["txn_449281", "txn_449295"]
      }
    }]
  }
}
KafGraph processes the capture inside a transaction and makes the new nodes and edges immediately queryable by other agents sharing the correlation.
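Before a tool call leaves an agent, it is worth checking it against the contract and the agent's allowlist. The sketch below takes its required top-level fields from the example message above and its default allowlist from the `memory_tools.allowed` list in the agent configuration. Treating exactly these five fields as mandatory is an assumption, and the validator is not part of any KafClaw API.

```python
from typing import Any

REQUIRED_FIELDS = ("tool_call_id", "tool", "correlation_id", "span_id", "arguments")
ALLOWED_TOOLS = frozenset({"brain_search", "brain_recall", "brain_capture", "brain_update"})


def validate_tool_call(call: dict[str, Any], allowed: frozenset = ALLOWED_TOOLS) -> list[str]:
    """Return a list of problems; an empty list means the call passes these checks."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in call]
    tool = call.get("tool")
    if tool is not None and tool not in allowed:
        errors.append(f"tool not allowed for this agent: {tool}")
    if "arguments" in call and not isinstance(call["arguments"], dict):
        errors.append("arguments must be a JSON object")
    return errors


if __name__ == "__main__":
    call = {
        "tool_call_id": "tc_9f3k2m8p",
        "tool": "brain_capture",
        "correlation_id": "corr_8x4p9v2q7w",
        "span_id": "span_3d7f9a1c",
        "arguments": {"entities": [], "relationships": []},
    }
    print(validate_tool_call(call))
```

Returning all problems at once, rather than raising on the first, makes the result easy to attach to an audit record.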
Deployment uses standard Kubernetes manifests and Helm charts provided in the open source repositories. A production cluster for a medium-to-large enterprise might include a KafScale tier with 12-24 stateless brokers, multiple KafClaw deployments segmented by domain, a partitioned KafGraph cluster, and dedicated KafSIEM instances for security teams. Because brokers are stateless, storage capacity is independent of compute and can be scaled by adjusting retention policies on the S3 bucket.
Local development is supported through a single-command Kind cluster that spins up all components with realistic sample event streams. This allows developers to iterate on new agent logic before promoting to shared environments.
Integration with existing Confluent or self-managed Kafka is seamless. KafScale can mirror topics or act as a long-term storage tier. Agents can consume from both legacy and Decision Fabric topics within the same process using standard Kafka consumer configuration.
These patterns have been refined through repeated production deliveries. The open source repositories contain the exact manifests, example agents and monitoring dashboards used in client environments.
Honest Trade-offs
The Decision Fabric is a powerful pattern but it is not appropriate for every situation. Understanding its limitations is essential for responsible adoption.
Graph queries involving complex traversals, while fast compared to earlier generations of knowledge graphs, still have higher latency than simple key-value lookups or pure vector similarity search. For use cases where approximate semantic retrieval on unstructured documents is the dominant need and data freshness is secondary, a dedicated vector store used in conjunction with the Decision Fabric is often the pragmatic choice. We frequently deploy both.
The operational skill profile required is broader than for simpler RAG architectures. Teams must understand Kafka operations, event schema evolution, graph data modeling, tool-calling patterns and distributed tracing. Organizations without prior streaming experience should budget for knowledge transfer or partnership support.
Initial event taxonomy and graph schema design require investment. Poor partitioning choices or overly generic entity types can lead to hot spots or difficult-to-query graphs. Iterative refinement is necessary, just as with any data platform.
Not every workload benefits. Agents performing purely creative tasks against static knowledge bases or batch analytical jobs against historical aggregates may be better served by traditional ontology or lakehouse approaches. The Decision Fabric shines when decisions must reflect the absolute latest business state and when multiple agents must collaborate with shared context.
The shared graph itself, while transactional, can experience contention under extreme write velocity. Careful design of correlation keys and occasional repartitioning are required, similar to any distributed database.
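A quick way to evaluate a candidate key before committing to it is to measure how evenly a sample of real keys spreads across partitions. The sketch below uses CRC32 as a stand-in hash purely for illustration; Kafka's default partitioner for keyed records uses murmur2, and how KafGraph partitions its data is not described in this article. The function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition (CRC32 stand-in, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def skew(keys: list[str], num_partitions: int) -> float:
    """Ratio of the busiest partition's load to the ideal even load.

    1.0 means perfectly even; num_partitions means everything hit one partition.
    """
    if not keys:
        raise ValueError("need at least one key to measure skew")
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    ideal = len(keys) / num_partitions
    return max(counts.values()) / ideal


if __name__ == "__main__":
    hot = ["tenant_1"] * 1000                     # one key carries all the traffic
    spread = [f"corr_{i}" for i in range(1000)]   # one key per correlation
    print(skew(hot, 4), skew(spread, 4))
```

Running a check like this on a sample of production keys gives early warning of hot spots before they show up as write contention.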
Exactly-once semantics, while supported in Kafka through idempotent producers and transactions, still require careful application-level handling for certain agent state transitions. Teams must design for at-least-once delivery with appropriate deduplication logic.
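Deduplication under at-least-once delivery usually comes down to remembering which event identifiers have already been applied. The sketch below keeps that memory in a Python set for illustration; the `event_id` field and the `IdempotentHandler` class are hypothetical, and a real deployment would need a durable store updated atomically with the side effect.

```python
from typing import Any, Callable


class IdempotentHandler:
    """Apply each event at most once, keyed by its event_id."""

    def __init__(self, apply: Callable[[dict[str, Any]], None]) -> None:
        self._apply = apply
        self._seen: set[str] = set()  # in-memory only; use durable storage in production

    def handle(self, event: dict[str, Any]) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        self._apply(event)
        # Record the id only after a successful apply, so a failed attempt is retried.
        self._seen.add(event_id)
        return True


if __name__ == "__main__":
    applied: list[str] = []
    handler = IdempotentHandler(lambda e: applied.append(e["event_id"]))
    for delivery in ({"event_id": "txn_449281"}, {"event_id": "txn_449281"}, {"event_id": "txn_449295"}):
        handler.handle(delivery)
    print(applied)
```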
Finally, while the components are fully open source, running them at scale still requires platform engineering investment. Enterprises must either build that capability internally or engage experienced partners.
These constraints are not hidden. In our consulting work we have advised clients with immature streaming foundations to begin with simpler patterns and evolve toward the Decision Fabric once foundational capabilities are in place. The pattern is most valuable when event velocity is high, decisions carry material business or regulatory consequence, and multi-agent coordination is required.
Measurable Outcomes
Clients that have implemented the Decision Fabric report several consistent operational improvements.
Decision cycles compress because agents operate on events at the moment they are produced rather than waiting for downstream synchronization. What previously required batch jobs or complex ETL now happens in near real time.
Audit and compliance processes are simplified. Every decision carries its complete provenance chain through the event log and the transactional graph captures. KafSIEM turns this raw data into queryable evidence graphs that incident responders and auditors can traverse with confidence.
Responsibility is better aligned. Domain teams own the agents and event contracts for their part of the business. The central platform team provides the Decision Fabric substrate rather than owning a monolithic ontology. This reduces cross-team coordination tax and accelerates delivery.
Debugging velocity increases. When unexpected behavior occurs, operators replay the precise sequence of input events against the updated agent logic. Root cause analysis that once took days of log correlation is reduced to hours of replay.
The system scales more predictably. Adding new agents or increasing event volume does not automatically increase load on a central query tier. Capacity grows with the business event rate.
Teams also adopt an event-first mindset. New capabilities are designed as reactions to streams rather than as CRUD services. This cultural shift compounds the technical advantages over time.
These results align with industry movement toward event-driven, observable agentic architectures documented by Confluent, Red Hat, IBM and academic research on neurosymbolic systems.
Next Step
Schedule a one-hour architecture review with a Scalytics solutions architect. In that session we will map your existing event streams, identify the highest leverage domains for an initial Decision Fabric implementation, and outline the concrete steps required to deploy KafClaw agents against your current Kafka infrastructure using the open source components.
The complete set of Apache 2.0 projects including KafScale, KafGraph, KafClaw and KafSIEM is available for immediate exploration at scalytics.io/open-source. Working examples, Helm charts and local development environments are included so your team can begin experimentation on day one.
Contact us to arrange your review. We act as a delivery partner focused on production outcomes rather than technology evangelism. Our goal is to help engineering leaders and platform architects build agentic systems that are robust, observable and aligned with the realities of enterprise data infrastructure.
About Scalytics
Our founding team created Apache Wayang (now an Apache Top-Level Project), the federated execution framework that orchestrates Spark, Flink, and TensorFlow where data lives and reduces ETL movement overhead.
We also invented and actively maintain KafScale (S3-Kafka-streaming platform), a Kafka-compatible, stateless data and large object streaming system designed for Kubernetes and object storage backends. Elastic compute. No broker babysitting. No lock-in.
Our mission: data stays in place. Compute comes to you. From data lakehouses to private AI deployment and distributed ML - all designed for security, compliance, and production resilience.
Questions? Join our open Slack community or schedule a consult.