What is a Decision Fabric?

A Decision Fabric is a Kafka-native substrate where every relevant business event is observable in real time. Agents and policies consume events where they are produced, and every decision is emitted back as a new event. The immutable log is the source of truth, so there is no central model to keep synchronized.

Why do centralized ontologies struggle with agentic AI?

Centralized ontologies deliver semantic consistency but introduce three structural problems for agentic systems: staleness between source events and the modeled state, synchronization complexity as data shape changes, and query bottlenecks when many agents read the same model concurrently. These limits compound as event volume and agent count grow.

How does the Scalytics stack implement the Decision Fabric?

KafScale provides the S3-native, stateless streaming spine for transport. KafGraph holds the shared memory accessible to all agents. KafClaw runs the agents themselves, driven by messages and topics. KafSIEM applies typed-entity link analysis over the resulting event graph. All four are Apache 2.0 licensed and run on top of existing Kafka deployments.

Is event-driven agentic AI auditable?

Yes. Because every input event and every decision event is written to the immutable log, the entire reasoning path of any agent is replayable from the stream. This is structurally stronger than auditing centralized ontologies, where the modeled state at decision time may differ from the modeled state at audit time.

Scalytics | Event Streams vs Centralized Ontologies for Agentic AI

Alexander Alten

CTO & co-founder

May 20, 2026

Sovereign Compute

Bottom Line

Centralized ontologies deliver semantic consistency but create unavoidable staleness, synchronization complexity and query bottlenecks for agentic systems operating against live business events. The superior alternative is the Decision Fabric. This Kafka-native substrate makes every relevant event observable in real time. Agents and policies consume events where they are produced. Every decision is emitted back as a new event on the stream. There is no central model to keep synchronized. The immutable log is the source of truth.

Implemented through the Scalytics open-source stack of KafScale for transport, KafGraph for shared memory, KafClaw for agent runtime and KafSIEM for link analysis, this pattern has been used in multiple enterprise deployments. It enables responsive, observable and fully auditable agentic AI that scales naturally with event volume.

Architectural Shift

Centralized Ontology vs. Decision Fabric

Two architectures for agentic AI. One reconciles a modeled state to source events on a schedule. The other treats the event log itself as the source of truth.

Legacy

Centralized Ontology

Source Events
↓ CDC / reconcile
Central Model
↓ query API
Agents

× State drift. Modeled state lags source events. The model at decision time may not equal the log at audit time.
× Query contention. Concurrent agent reads against the central model create lock and quota pressure as agent count grows.
× Sync as platform debt. Reconciliation between modeled state and event-of-record is the platform's responsibility, not the application's, and gets harder as data shape evolves.

Decision Fabric

Event Log as Source of Truth

KafScale · Event Spine
↓ consume   ↑ emit decision
KafClaw Agents · KafGraph Memory
↑ every decision becomes a new event on the log

✓ Real-time consumption. Agents and policies read events where they are produced. No intermediate sync layer.
✓ Horizontal scale. Kafka-native partitioning scales agent throughput beyond the point at which a central model caps out.
✓ Replayable audit. Every decision is emitted as a new event. The full reasoning path of any agent is reconstructable from the log.

Why Event Streams Beat Centralized Ontologies

Organizations evaluating agentic AI quickly discover that prototype success does not translate easily to production. Early pilots often rely on a central ontology or knowledge graph to provide grounding, reduce hallucinations and enforce governance. Industry analysts have positioned such ontologies as a foundational enterprise architecture asset for trustworthy agentic decisioning.

Yet this approach collides with the realities of modern digital operations. Business events arrive continuously across distributed systems. Customer behavior changes by the second. Supply chains update in real time. Market conditions shift. A central ontology updated through periodic ETL or change data capture is always at least one step behind. The lag may be acceptable for weekly reporting. It is rarely acceptable for agents expected to act autonomously on current conditions.

The second constraint is scale. As the number of deployed agents grows from dozens to hundreds, every agent querying the same central service creates contention. Even with caching and read replicas the coordination cost rises. Platform teams find themselves managing query quotas, shard keys and consistency windows instead of focusing on agent logic.

Governance itself becomes burdensome. Maintaining a single shared model that serves multiple lines of business requires complex conflict resolution, approval workflows and versioning. When two agents assert conflicting facts the central store must arbitrate. The resulting complexity often slows innovation.

These challenges are not theoretical. Client engagements at Scalytics have repeatedly shown platform teams spending disproportionate effort on ontology maintenance and synchronization pipelines. One financial services organization had invested more than 18 months building a comprehensive risk ontology. Despite sophisticated CDC processes, agents frequently made decisions based on credit data that was 45 minutes old. When market volatility increased the ontology lag became a material risk.

Event streams solve these problems at the architectural level. Because the append-only log is the system of record, freshness is limited only by consumer lag rather than by a synchronization schedule. Agents read events directly from the topics where producers write them. There is no separate synchronization job. Kafka's consumer groups and partitioning provide built-in horizontal scaling and workload isolation.

The replay capability is transformative. When an agent model is updated or a new policy is introduced, operators can replay historical event streams to validate behavior or to bring new agents up to current state without custom backfill logic. This is difficult to replicate with mutable central stores.
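As an illustration of the replay idea, the following minimal Python sketch runs two versions of a policy over the same ordered log and reports where their decisions differ. The in-memory list stands in for a Kafka topic. The names Event, policy_v1, policy_v2 and the thresholds are hypothetical, invented for this example, and are not part of any Scalytics component.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    offset: int   # position in the log; replay follows this order
    key: str
    amount: float


def policy_v1(event: Event) -> str:
    # Hypothetical original rule: approve anything under 1000.
    return "approve" if event.amount < 1000 else "review"


def policy_v2(event: Event) -> str:
    # Hypothetical updated rule with a tighter threshold.
    return "approve" if event.amount < 500 else "review"


def replay(log: Iterable[Event], policy: Callable[[Event], str]) -> dict[int, str]:
    """Run a policy over the log in offset order and collect its decisions."""
    return {e.offset: policy(e) for e in sorted(log, key=lambda e: e.offset)}


def changed_decisions(log: list[Event], old, new) -> dict[int, tuple[str, str]]:
    """Return offsets where the new policy decides differently from the old one."""
    before, after = replay(log, old), replay(log, new)
    return {off: (before[off], after[off]) for off in before if before[off] != after[off]}


LOG = [Event(0, "cust-1", 120.0), Event(1, "cust-2", 750.0), Event(2, "cust-1", 4000.0)]
diff = changed_decisions(LOG, policy_v1, policy_v2)
print(diff)  # {1: ('approve', 'review')}
```

Because the log is never mutated, the same comparison can be repeated for any later policy version against exactly the same inputs.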

Research from multiple sources supports the centrality of event-driven architectures for production agentic systems. An article on using Apache Kafka as the event broker with the Agent2Agent protocol and MCP highlights the decoupling, durability and observability benefits that point-to-point approaches cannot match. Guidance from Red Hat explains how Kafka enables the real-time context delivery and replayability that agentic systems require at scale. Confluent has similarly argued that the Google Agent2Agent protocol needs Kafka to achieve true production characteristics.

The pattern also aligns with established enterprise architecture principles. Event sourcing has long demonstrated that, for many domains, immutable logs provide stronger audit and reproducibility guarantees than mutable databases. Extending this principle to the agent layer is a logical progression.

This does not mean semantic models are obsolete. Structured, queryable memory is still essential for agents to reason effectively. The critical distinction is whether that memory is maintained centrally through external synchronization or updated transactionally by the agents themselves as part of their decision process. The latter approach distributes responsibility, removes the central bottleneck and makes provenance inherent to the event stream.

For engineering leaders and CTOs responsible for platform strategy the implication is clear. Architectures that depend on periodic snapshots will struggle as agentic workloads grow. Event-first designs that treat the stream as the source of truth offer a more sustainable path.

The Decision Fabric

The Decision Fabric is an architectural category distinct from both traditional ontology-centric designs and simple RAG pipelines. It is a Kafka-native substrate where every business-relevant event is observable in real time, agents and policies consume the stream where it is produced, and every decision is emitted back as a new event. No central ontology exists to be synchronized. No single snapshot is maintained. The ordered immutable log of events and decisions serves as the authoritative record. Its guiding principle is to turn streams into decisions.

This category is delivered through four complementary open-source components, all licensed under Apache 2.0.

KafScale is the transport and durability backbone. It is an S3-native Kafka-compatible streaming platform. Stateless brokers deployed on Kubernetes write immutable segments directly to object storage. This design removes any practical limit on retention period while remaining fully compatible with existing Kafka clients, Confluent tools and administrative utilities. Its value proposition is captured by the tagline "One Endpoint. Infinite Scale."

KafGraph supplies the shared-memory layer required for coherent multi-agent reasoning. Vector databases excel at similarity search but lack strong transactional semantics, shared write consistency and rich traversal capabilities. KafGraph is a distributed knowledge graph implemented in Go and backed by embedded or clustered BadgerDB. It exposes a clean tool-calling interface consisting of seven well-defined JSON schema operations. Agents call these tools to search, recall context, capture new facts, update relationships and more. The graph supports OpenCypher for expressive queries and the Bolt v4 protocol for compatibility with existing tooling. Its tagline is "Infinite Memory for AI Agents." The implementation is fully transactional so that updates made by one agent become immediately visible to others consuming the same correlation context.

KafClaw provides the agent runtime. It is a lightweight Go binary that orchestrates heterogeneous agents written in any language capable of producing and consuming Kafka messages. LLM agents, deterministic policy engines, Python microservices, Rust binaries and shell scripts can all participate using a standardized typed JSON envelope format. Each envelope carries correlation identifiers, trace context, and dedicated channels for memory operations, inter-agent requests, responses and audit records. Agents subscribe to input topics, receive events, invoke tools against KafGraph when needed, perform reasoning steps, and publish their decisions as new events. Its tagline is "AI Agents under your command."
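To make the envelope idea concrete, here is a minimal Python sketch of a typed JSON envelope with a correlation identifier, a trace identifier and a channel. The field names, the channel labels and the Envelope class are assumptions made for illustration. The actual KafClaw envelope schema is not reproduced here and may differ.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any

# Hypothetical channel labels mirroring the four purposes named in the text.
CHANNELS = {"memory", "request", "response", "audit"}


@dataclass
class Envelope:
    channel: str
    correlation_id: str
    payload: dict[str, Any]
    # A fresh trace id is generated when the caller does not supply one.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")

    def to_json(self) -> str:
        """Serialize to the JSON string that would be written to a topic."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Envelope":
        """Rebuild an envelope from its JSON form, re-validating the channel."""
        return cls(**json.loads(raw))


env = Envelope("audit", "corr_8x4p9v2q7w", {"decision": "approve"})
restored = Envelope.from_json(env.to_json())
```

Any language that can read and write JSON could produce the same shape, which is the point of using a plain envelope rather than a language-specific object.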

KafSIEM completes the fabric as the security and link-analysis component. It ingests detector output, agent decisions and operational events from the stream and constructs auditable graphs that record full provenance for every relationship. Designed for defense and critical infrastructure incident response, it emphasizes citable evidence chains rather than generic dashboards. It can operate with embedded SQLite or scale to larger deployments and produces RFC 7946 GeoJSON when spatial analysis is required. Its tagline is "Every edge has a citation."

When these components operate together they create a closed-loop system. Agents update shared memory transactionally as a direct consequence of their reasoning. Decisions are published as immutable events. Subsequent agents and analytical processes consume the enriched stream. The entire history is available for replay, audit and continuous improvement. All components are available for immediate use from the Scalytics open source repository.

This architecture inverts the traditional dependency graph of agentic systems. Instead of agents being clients of a central model, they become co-authors of the shared reality. The resulting system is both more responsive and more maintainable at enterprise scale.

How Shared Memory and Event Streams Enable Agentic Systems

The operational sequence in a Decision Fabric deployment follows a disciplined flow that leverages the strengths of immutable events and transactional graph memory.

Business systems and sensors publish events to KafScale topics using established domain-driven naming conventions such as customer.profile.updated.v1 or order.fulfilled.v2. These events carry the authoritative facts at the moment they occurred.

A KafClaw agent configured for a particular responsibility joins the relevant consumer group and begins receiving events. Upon receiving an event the runtime assembles context from three sources: the event payload itself, any prior messages sharing the same correlation identifier, and the current state of the relevant subgraph in KafGraph.
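The three context sources can be sketched in a few lines of Python. This is a simplified stand-in: the history list replaces earlier messages on the stream, the graph dictionary replaces a KafGraph lookup, and the function name assemble_context and all field names are hypothetical.

```python
from typing import Any

Message = dict[str, Any]


def assemble_context(
    event: Message, history: list[Message], graph: dict[str, list[str]]
) -> dict[str, Any]:
    """Combine the three context sources for one incoming event."""
    corr = event["correlation_id"]
    # Source 2: earlier messages that share the event's correlation identifier.
    prior = [m for m in history if m["correlation_id"] == corr]
    # Source 3: facts currently attached to the entity (stand-in for a graph query).
    subgraph = graph.get(event["entity"], [])
    # Source 1 is the event payload itself.
    return {"event": event, "prior_messages": prior, "subgraph": subgraph}


history = [
    {"correlation_id": "corr_a", "entity": "Customer:1", "note": "application received"},
    {"correlation_id": "corr_b", "entity": "Customer:2", "note": "unrelated"},
]
graph = {"Customer:1": ["HAS_ACCOUNT Account:9"]}
event = {"correlation_id": "corr_a", "entity": "Customer:1", "amount": 250}

context = assemble_context(event, history, graph)
```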

The agent then decides whether additional memory is required. It issues tool calls using ordinary JSON messages. A representative brain_search call might look for entities matching certain properties or traverse specific relationship types. Because KafGraph is transactional, the result reflects all captures committed by any agent up to that point in the stream.

For more complex reasoning the agent can issue brain_recall to retrieve a full connected subgraph or brain_capture to assert new entities and relationships that emerged during its analysis. These captures are written with the same correlation identifier so that downstream agents or auditors can trace the provenance exactly.

With enriched context the agent performs its core reasoning. LLM-based agents receive a carefully structured prompt containing the retrieved graph data, previous decisions in the chain, and explicit instructions to emit a decision event. Code-based agents may execute optimization routines or policy evaluations against the same context. The heterogeneity is intentional. Different problems are best solved by different agent types, all collaborating through the shared stream and memory layer.

Once a decision is reached the agent publishes a new event to a designated output topic. The event includes the decision, confidence indicators if applicable, the list of tool calls made, and full trace context. Simultaneously any new facts discovered are captured back into KafGraph.
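A decision event with the contents listed above might be assembled as in the following Python sketch. The field names and the build_decision_event helper are assumptions chosen for readability and are not taken from the KafClaw schema. Publishing to the output topic is left out.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def build_decision_event(
    input_event: dict[str, Any],
    decision: str,
    confidence: Optional[float],
    tool_calls: list[str],
) -> dict[str, Any]:
    """Build the decision record, carrying correlation and trace context forward."""
    return {
        "correlation_id": input_event["correlation_id"],
        "trace_id": input_event.get("trace_id"),
        "decision": decision,
        "confidence": confidence,          # None when not applicable
        "tool_calls": list(tool_calls),    # ids of the tool calls that informed the decision
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


incoming = {"correlation_id": "corr_8x4p9v2q7w", "trace_id": "trace_01"}
decision_event = build_decision_event(incoming, "approve", 0.87, ["tc_9f3k2m8p"])
serialized = json.dumps(decision_event)  # the value that would be produced to the topic
```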

This pattern integrates naturally with emerging standards such as the Agent2Agent protocol. Protocol messages can be serialized into Kafka events rather than exchanged through synchronous HTTP. The result gains Kafka's delivery guarantees, redelivery of uncommitted messages when a consumer fails and its group rebalances, and the ability for any authorized party to observe the complete conversation history by consuming the topic. Multiple analyses including work from Confluent have explained why this event-driven approach is necessary for scalable A2A implementations.

KafSIEM operates as a passive consumer of both business events and decision events. It builds a parallel graph optimized for relationship traversal and provenance tracking. Every edge in a KafSIEM graph links back to the exact source events and the specific agent version that asserted it. This capability supports rigorous incident review and compliance processes. Related patterns are explored in our work on decentralized data analytics and AI for national defense.
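The idea that every edge carries its own citation can be sketched as a small data structure. This Python example is illustrative only: CitedEdge, its fields and citations_for are hypothetical names and do not describe the KafSIEM storage model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedEdge:
    source: str
    relation: str
    target: str
    evidence_events: tuple[str, ...]  # ids of the source events backing this edge
    agent_version: str                # which agent version asserted it

    def __post_init__(self) -> None:
        # Reject edges that cannot be traced back to at least one event.
        if not self.evidence_events:
            raise ValueError("every edge needs at least one cited source event")


def citations_for(edges: list[CitedEdge], node: str) -> list[str]:
    """Return the sorted, de-duplicated event ids behind all edges touching a node."""
    cited = set()
    for edge in edges:
        if node in (edge.source, edge.target):
            cited.update(edge.evidence_events)
    return sorted(cited)


edges = [
    CitedEdge("Customer:cust_118472", "TRIGGERED", "RiskSignal:rs_7749201",
              ("txn_449281", "txn_449295"), "credit-risk-evaluator-v3"),
]
print(citations_for(edges, "RiskSignal:rs_7749201"))  # ['txn_449281', 'txn_449295']
```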

The architecture is deliberately open. Existing Kafka producers require no modification. Legacy systems can publish events through simple adapters or connectors. New agents can be introduced without touching other components. The only shared contracts are event schemas and the standardized tool call interface.

Observability is first class. Standard Kafka metrics are supplemented by agent-specific counters for tool call volume, decision latency, graph mutation rate and correlation completion times. Operators can trace any decision back through the exact sequence of events and memory states that produced it.

This design rests on solid foundations. The immutable log eliminates many classes of reconciliation error. Transactional memory updates ensure consistency without central locking. Explicit correlation identifiers provide end-to-end traceability without a central orchestrator. The combination creates a system that grows more capable as more events and agents are added.

Concrete Implementation Details

Adopting the Decision Fabric requires attention to several implementation practices that have proven effective across client deployments.

Event schema governance is the starting point. We use a schema registry with Avro or JSON Schema definitions that enforce backward compatibility. Topic naming follows a consistent pattern that encodes business domain, entity type, action and version. This convention enables both human comprehension and automated routing rules inside KafClaw agents.
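A naming convention is easiest to enforce when it is checked in code. The Python sketch below parses topic names of the form shown in this article, dot-separated lowercase segments followed by a version suffix such as v1. The exact grammar is an assumption inferred from the examples, and parse_topic and TopicName are hypothetical helpers.

```python
import re
from typing import NamedTuple


class TopicName(NamedTuple):
    segments: tuple[str, ...]  # e.g. domain, entity, action
    version: int


# At least two lowercase segments, then ".v<N>" with N >= 1.
_TOPIC_RE = re.compile(
    r"^(?P<body>[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+)\.v(?P<version>[1-9][0-9]*)$"
)


def parse_topic(name: str) -> TopicName:
    """Split a topic name into its segments and numeric version, or raise ValueError."""
    match = _TOPIC_RE.match(name)
    if match is None:
        raise ValueError(f"topic does not follow the naming convention: {name!r}")
    return TopicName(tuple(match["body"].split(".")), int(match["version"]))


print(parse_topic("customer.profile.updated.v1"))
print(parse_topic("order.fulfilled.v2").version)  # 2
```

A check like this could run in CI against the list of declared topics so that malformed names are caught before they reach a cluster.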

A representative agent configuration demonstrates the simplicity of the runtime.

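agent:
  id: credit-risk-evaluator-v3
  consumer_group: risk-decision-group-2026    # Kafka consumer group the agent joins
  input_topics:                               # events the agent reacts to
    - transactions.credit.applications.v2
    - customer.credit.score.updated.v1
  output_topic: decisions.credit.approval.v2  # decisions are published here as new events
  audit_topic: audit.risk.decisions.v2        # audit records for each decision
  memory_tools:
    enabled: true
    allowed:                                  # KafGraph tool calls this agent may issue
      - brain_search
      - brain_recall
      - brain_capture
      - brain_update
  llm:
    provider: anthropic
    model: claude-3-5-sonnet
    max_tokens: 4096
    temperature: 0.1                          # low temperature for more repeatable output
  tracing:
    enabled: true
    propagator: opentelemetry                 # trace context propagation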

The KafClaw binary is deployed as a container. It manages the consumer loop, context assembly, tool routing and response publishing. Dead letter handling, retry backoff and circuit breakers are included in the runtime.
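For readers unfamiliar with the retry and dead letter pattern, the following Python sketch shows one common shape of it: exponential backoff between attempts, then parking the event on a dead letter list once attempts are exhausted. It is not the KafClaw implementation. The function name, parameters and the in-memory dead letter list are assumptions, and circuit breaking is not covered.

```python
import time
from typing import Any, Callable


def process_with_retry(
    event: dict[str, Any],
    handler: Callable[[dict[str, Any]], Any],
    dead_letters: list[dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call handler(event); back off exponentially on failure, then dead-letter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(event)
        except Exception as exc:
            if attempt == max_attempts:
                # Give up: record the event and the error for later inspection.
                dead_letters.append(
                    {"event": event, "error": repr(exc), "attempts": attempt}
                )
                return None
            # Delay doubles after each failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))


def always_fails(event: dict[str, Any]) -> None:
    raise RuntimeError("downstream unavailable")


delays: list[float] = []
dlq: list[dict[str, Any]] = []
# The sleep function is injected so the example records delays instead of waiting.
process_with_retry({"id": 1}, always_fails, dlq, base_delay=1.0, sleep=delays.append)
print(delays)    # [1.0, 2.0]
print(len(dlq))  # 1
```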

Tool calls follow a standardized JSON contract. The example below shows a brain_capture operation that records a new risk signal and its supporting relationships.

{
  "tool_call_id": "tc_9f3k2m8p",
  "tool": "brain_capture",
  "correlation_id": "corr_8x4p9v2q7w",
  "span_id": "span_3d7f9a1c",
  "arguments": {
    "entities": [{
      "type": "RiskSignal",
      "id": "rs_7749201",
      "attributes": {
        "signal_type": "velocity_anomaly",
        "severity": "high",
        "detected_at": "2026-05-16T10:23:45Z"
      }
    }],
    "relationships": [{
      "source": "Customer:cust_118472",
      "type": "TRIGGERED",
      "target": "RiskSignal:rs_7749201",
      "attributes": {
        "confidence": 0.87,
        "evidence_events": ["txn_449281", "txn_449295"]
      }
    }]
  }
}

KafGraph processes the capture inside a transaction and makes the new nodes and edges immediately queryable by other agents sharing the correlation.
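The all-or-nothing behavior of a transactional capture can be illustrated with a small in-memory stand-in. This Python sketch is not KafGraph: the InMemoryGraph class and its validation rule (relationship endpoints must already exist or be created in the same capture) are assumptions made to show the principle.

```python
import copy
from typing import Any


class InMemoryGraph:
    """Toy graph that applies a capture atomically: fully, or not at all."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}  # "Type:id" -> attributes
        self.edges: list[tuple[str, str, str, dict[str, Any]]] = []

    def capture(self, arguments: dict[str, Any]) -> None:
        # Work on copies so a failure midway leaves the committed state untouched.
        nodes = copy.deepcopy(self.nodes)
        edges = list(self.edges)
        for ent in arguments.get("entities", []):
            nodes[f"{ent['type']}:{ent['id']}"] = ent.get("attributes", {})
        for rel in arguments.get("relationships", []):
            for endpoint in (rel["source"], rel["target"]):
                if endpoint not in nodes:
                    raise KeyError(f"unknown node: {endpoint}")
            edges.append(
                (rel["source"], rel["type"], rel["target"], rel.get("attributes", {}))
            )
        # Commit: swap in the new state only after every step succeeded.
        self.nodes, self.edges = nodes, edges


graph = InMemoryGraph()
graph.nodes["Customer:cust_118472"] = {}
graph.capture({
    "entities": [{"type": "RiskSignal", "id": "rs_7749201",
                  "attributes": {"severity": "high"}}],
    "relationships": [{"source": "Customer:cust_118472", "type": "TRIGGERED",
                       "target": "RiskSignal:rs_7749201",
                       "attributes": {"confidence": 0.87}}],
})
```

A capture that references an unknown node raises an error and leaves both the node and edge collections exactly as they were before the call.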

Deployment uses standard Kubernetes manifests and Helm charts provided in the open source repositories. A production cluster for a medium-to-large enterprise might include a KafScale tier with 12-24 stateless brokers, multiple KafClaw deployments segmented by domain, a partitioned KafGraph cluster, and dedicated KafSIEM instances for security teams. Because brokers are stateless, storage capacity is independent of compute and can be scaled by adjusting retention policies on the S3 bucket.

Local development is supported through a single-command Kind cluster that spins up all components with realistic sample event streams. This allows developers to iterate on new agent logic before promoting to shared environments.

Integration with existing Confluent or self-managed Kafka is seamless. KafScale can mirror topics or act as a long-term storage tier. Agents can consume from both legacy and Decision Fabric topics within the same process using standard Kafka consumer configuration.

These patterns have been refined through repeated production deliveries. The open source repositories contain the exact manifests, example agents and monitoring dashboards used in client environments.

Honest Trade-offs

The Decision Fabric is a powerful pattern but it is not appropriate for every situation. Understanding its limitations is essential for responsible adoption.

Complex graph traversals, while fast compared to earlier generations of knowledge graphs, still have higher latency than simple key-value lookups or pure vector similarity search. For use cases where approximate semantic retrieval on unstructured documents is the dominant need and data freshness is secondary, a dedicated vector store used in conjunction with the Decision Fabric is often the pragmatic choice. We frequently deploy both.

The operational skill profile required is broader than for simpler RAG architectures. Teams must understand Kafka operations, event schema evolution, graph data modeling, tool-calling patterns and distributed tracing. Organizations without prior streaming experience should budget for knowledge transfer or partnership support.

Initial event taxonomy and graph schema design require investment. Poor partitioning choices or overly generic entity types can lead to hot spots or difficult-to-query graphs. Iterative refinement is necessary, just as with any data platform.
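The hot spot risk is easy to demonstrate. The Python sketch below maps record keys to partitions with a simple hash and counts the load per partition for a skewed and for a well-spread key set. Note the assumption: zlib.crc32 is used here only as a stable stand-in, whereas the Kafka Java client's default partitioner hashes keys with murmur2, so actual partition numbers will differ. The key names are invented.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many records each partition would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# One dominant key: at least 90 of 100 records land on a single partition.
skewed = ["tenant-1"] * 90 + [f"tenant-{i}" for i in range(2, 12)]
# Fine-grained keys (for example one per correlation id) spread the load.
spread = [f"corr-{i}" for i in range(100)]

print("skewed max load:", max(partition_load(skewed, 6).values()))
print("spread max load:", max(partition_load(spread, 6).values()))
```

Running the same count against a sample of real keys before choosing a partition key is a cheap way to spot skew early.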

Not every workload benefits. Agents performing purely creative tasks against static knowledge bases or batch analytical jobs against historical aggregates may be better served by traditional ontology or lakehouse approaches. The Decision Fabric shines when decisions must reflect the absolute latest business state and when multiple agents must collaborate with shared context.

The shared graph itself, while transactional, can experience contention under extreme write velocity. Careful design of correlation keys and occasional repartitioning are required, similar to any distributed database.

Exactly-once semantics, while supported in Kafka through idempotent producers and transactions, still require careful application-level handling for certain agent state transitions, particularly when a decision triggers side effects outside Kafka. Teams should design for at-least-once delivery with appropriate deduplication logic.
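Deduplication under at-least-once delivery can be sketched as a thin wrapper around the agent's handler. In this Python example the event_id field and the IdempotentHandler class are hypothetical, and the set of seen ids lives in memory. A production system would need a durable, bounded store for it.

```python
from typing import Any, Callable


class IdempotentHandler:
    """Apply each event at most once, even if it is delivered several times."""

    def __init__(self, handler: Callable[[dict[str, Any]], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()

    def handle(self, event: dict[str, Any]) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after success so a failed attempt can be retried.
        self._seen.add(event_id)
        return True


applied: list[str] = []
dedup = IdempotentHandler(lambda e: applied.append(e["event_id"]))

# "evt-1" is delivered twice, as can happen after a consumer restart.
deliveries = [{"event_id": "evt-1"}, {"event_id": "evt-2"}, {"event_id": "evt-1"}]
results = [dedup.handle(e) for e in deliveries]
print(results)  # [True, True, False]
print(applied)  # ['evt-1', 'evt-2']
```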

Finally, while the components are fully open source, running them at scale still requires platform engineering investment. Enterprises must either build that capability internally or engage experienced partners.

These constraints are not hidden. In our consulting work we have advised clients with immature streaming foundations to begin with simpler patterns and evolve toward the Decision Fabric once foundational capabilities are in place. The pattern is most valuable when event velocity is high, decisions carry material business or regulatory consequence, and multi-agent coordination is required.

Measurable Outcomes

Clients that have implemented the Decision Fabric report several consistent operational improvements.

Decision cycles compress because agents operate on events at the moment they are produced rather than waiting for downstream synchronization. What previously required batch jobs or complex ETL now happens in near real time.

Audit and compliance processes are simplified. Every decision carries its complete provenance chain through the event log and the transactional graph captures. KafSIEM turns this raw data into queryable evidence graphs that incident responders and auditors can traverse with confidence.

Responsibility is better aligned. Domain teams own the agents and event contracts for their part of the business. The central platform team provides the Decision Fabric substrate rather than owning a monolithic ontology. This reduces cross-team coordination tax and accelerates delivery.

Debugging velocity increases. When unexpected behavior occurs, operators replay the precise sequence of input events against the updated agent logic. Root cause analysis that once took days of log correlation is reduced to hours of replay.

The system scales more predictably. Adding new agents or increasing event volume does not automatically increase load on a central query tier. Capacity grows with the business event rate.

Teams also adopt an event-first mindset. New capabilities are designed as reactions to streams rather than as CRUD services. This cultural shift compounds the technical advantages over time.

These results align with industry movement toward event-driven, observable agentic architectures documented by Confluent, Red Hat, IBM and academic research on neurosymbolic systems.

Next Step

Schedule a one-hour architecture review with a Scalytics solutions architect. In that session we will map your existing event streams, identify the highest leverage domains for an initial Decision Fabric implementation, and outline the concrete steps required to deploy KafClaw agents against your current Kafka infrastructure using the open source components.

The complete set of Apache 2.0 projects including KafScale, KafGraph, KafClaw and KafSIEM is available for immediate exploration at scalytics.io/open-source. Working examples, Helm charts and local development environments are included so your team can begin experimentation on day one.

Contact us to arrange your review. We act as a delivery partner focused on production outcomes rather than technology evangelism. Our goal is to help engineering leaders and platform architects build agentic systems that are robust, observable and aligned with the realities of enterprise data infrastructure.

About Scalytics

Scalytics architects mission-critical streaming, federated execution, and sovereign AI systems. We help defense, infrastructure, and regulated organizations turn real-time data streams into trusted decisions reliably and under production load.
Our founding team created Apache Wayang, the federated execution framework that lets computation run where the data lives and dramatically reduces unnecessary data movement.
We also built and maintain KafScale, a high-performance, Kafka-compatible streaming platform designed for Kubernetes and object storage. It delivers elastic scale without broker complexity or lock-in.

Our mission: Keep data in place. Bring compute to the data. Enable secure, sovereign, and production-ready AI operations.