Architecting Production Agentic AI with the Model Context Protocol

Alexander Alten, CTO & co-founder
June 1, 2026

Bottom Line

The Model Context Protocol standardizes the way agents discover and interact with tools, resources and prompts. Its value in enterprise settings, however, depends entirely on the data and execution substrate beneath it. Many initial implementations replicate patterns from batch analytics or monolithic services and run into the same limitations at scale.

A Decision Fabric provides the robust alternative. This Kafka-native substrate makes every business-relevant event observable in real time. Agents and policies consume the stream where it is produced. Every decision is emitted back as a new event. No central ontology requires synchronization. The immutable ordered log serves as the single source of truth. Turn streams into decisions.

The architecture is delivered through the Apache 2.0 licensed Scalytics open-source stack: KafScale for transport, KafGraph for shared memory, KafClaw for the agent runtime and KafSIEM for link analysis. It directly addresses the recurring production issues of data locality, observability and cost. In Scalytics client deployments it has enabled agentic systems that remain responsive, fully auditable and economically sustainable as scale increases. Teams evaluating production agentic AI should begin with data topology mapping and event-first design rather than centralized models.

Why this matters now

Adoption of the Model Context Protocol has accelerated since its introduction by Anthropic in November 2024. The protocol offers a universal client-server interface that replaces bespoke connectors for databases, file systems, APIs and workflows. MCP clients such as Claude or custom agent frameworks can dynamically discover available tools and resources, invoke them securely and receive structured responses. This capability has lowered the barrier to building capable agents.

Yet production reality diverges from pilot success. Many engineering teams initially apply familiar patterns from batch processing or request-response architectures. These assumptions collide with the distinct constraints of agentic workloads. Three issues surface repeatedly.

First, data locality assumptions break under federation. Agents frequently need access to data residing in multiple systems, some on-premises, some in cloud object storage, some in operational databases. Pulling everything to a central location for processing incurs high egress charges and introduces staleness. Non-deterministic agent behavior can amplify these costs because a single task may trigger dozens of tool calls.

Second, observability gaps appear when agents span multiple runtimes and languages. A typical production workflow might involve a Python LLM agent, a Java policy engine, a Rust microservice and shell-based utilities all collaborating. Tracing decisions, tool calls, memory updates and outcomes across these boundaries becomes complex without a unifying backbone.

Third, cost models fail to account for network egress and token consumption driven by exploratory tool use. As noted in analyses of enterprise AI spend, unpredictable agent loops can erode margins rapidly when each tool invocation crosses availability zones or cloud boundaries.

These challenges are well documented. The official MCP documentation and Anthropic's announcement emphasize the protocol's role in connecting AI to live systems. Complementary research on agentic AI platforms highlights data locality, federation needs, observability and cost attribution as the dominant production barriers. Federated data processing frameworks such as Apache Wayang were designed precisely to execute computations where data already resides, avoiding unnecessary movement.

In Scalytics engagements with platform engineering teams we have seen these patterns across finance, critical infrastructure and energy sectors. Initial MCP servers connected directly to operational stores worked for small-scale experiments but required significant reconciliation logic and generated unexpected cloud bills once agent volume and exploration depth increased. The shift to event-first designs has consistently reduced these operational burdens.

The window for architectural decisions is narrow. MCP makes agent construction easier. The foundational data and coordination layer chosen today determines whether those agents deliver reliable business value or remain confined to controlled sandboxes. Event-driven, federated architectures that treat streams as the source of truth scale more naturally with both data volume and agent count.

The Decision Fabric

The Decision Fabric is a distinct architectural category for agentic systems. It is a Kafka-native substrate where every business-relevant event is observable in real time, agents and policies consume the stream where it is produced, and every decision is emitted back as a new event. No central ontology exists to be kept in sync. No periodic snapshot reconciliation occurs. The immutable, ordered log of events and decisions constitutes the authoritative record. Its guiding principle is to turn streams into decisions.

This category is realized through four complementary open-source components, all licensed under Apache 2.0. KafScale serves as the transport and durability backbone. It is an S3-native, Kafka-compatible streaming platform whose stateless brokers on Kubernetes flush immutable segments directly to object storage. Existing Kafka clients, Confluent tooling and administrative utilities continue to work unchanged. The design removes practical limits on retention while maintaining compatibility. Its positioning is captured by the tagline "One Endpoint. Infinite Scale."

KafGraph supplies the shared-memory layer essential for coherent multi-agent collaboration. Traditional vector databases excel at similarity search yet lack strong transactional guarantees, shared write semantics and rich traversal. KafGraph is a distributed, queryable knowledge graph implemented in Go and backed by BadgerDB. It exposes seven well-defined JSON-schema tool calls such as brain_search, brain_recall and brain_capture. These tools are accessible over HTTP or routed through the agent runtime. The graph supports OpenCypher queries and the Bolt v4 protocol. Updates are transactional so that facts captured by one agent become immediately visible to others sharing correlation context. Its tagline is "Infinite Memory for AI Agents."

KafClaw is the agent runtime. It is a lightweight Go binary that coordinates heterogeneous agents written in any language that can produce and consume Kafka messages. Typed JSON envelopes carry correlation identifiers, trace spans, request/response channels, memory operations and audit records. LLM-based agents, deterministic policies, Python services, Rust components and even shell scripts participate uniformly. Agents subscribe to input topics, assemble context from events and graph memory, reason or compute, and publish decisions as new events. Its tagline is "AI Agents under your command."
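As a rough illustration of what such an envelope could look like, the following Python sketch models one with a dataclass. The field names (kind, payload, correlation_id, trace_id, span_id, reply_to) are assumptions chosen for illustration and are not taken from the KafClaw wire format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Envelope:
    """Hypothetical typed envelope; field names are illustrative only."""
    kind: str                       # e.g. "request", "response", "memory", "audit"
    payload: Dict[str, Any]
    correlation_id: str             # ties all messages of one decision chain together
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    reply_to: Optional[str] = None  # topic on which a response is expected

    def to_bytes(self) -> bytes:
        # JSON bytes that a Kafka producer could send as the message value.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Envelope":
        return cls(**json.loads(raw.decode("utf-8")))


def reply(request: Envelope, payload: Dict[str, Any]) -> Envelope:
    # A response keeps the correlation and trace identifiers but gets a new span.
    return Envelope(kind="response", payload=payload,
                    correlation_id=request.correlation_id,
                    trace_id=request.trace_id)
```

Because the envelope is plain JSON, any language with a Kafka client and a JSON library can produce or consume it, which is the property the runtime relies on.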

KafSIEM completes the fabric with security and link-analysis capabilities. It consumes detector alerts, agent decisions and operational events from the stream and constructs auditable entity graphs with full provenance for every relationship. Built for defense and critical-infrastructure incident response, it prioritizes citable evidence chains over generic dashboards. It can run with embedded SQLite or scale further and outputs RFC 7946 GeoJSON for geospatial cases. Its tagline is "Every edge has a citation."
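To make the geospatial output concrete, here is a minimal Python sketch of an RFC 7946 Feature that carries citations in its properties. The property names (relation, citations) and the citation string format are assumptions for illustration; they are not KafSIEM's actual output schema.

```python
import json
from typing import Any, Dict, List


def edge_feature(lon: float, lat: float, relation: str,
                 citations: List[str]) -> Dict[str, Any]:
    """Build an RFC 7946 Feature; property names are illustrative assumptions."""
    if not citations:
        # Mirrors the "every edge has a citation" idea: refuse uncited edges.
        raise ValueError("an edge without citations is not allowed")
    return {
        "type": "Feature",
        # RFC 7946 positions are ordered longitude, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"relation": relation, "citations": citations},
    }


def feature_collection(features: List[Dict[str, Any]]) -> str:
    # Wrap features in a FeatureCollection and serialize to JSON text.
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)
```

A caller might pass a citation such as a topic, partition and offset string so that each plotted point can be traced back to the event that produced it.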

When composed, these components create a closed loop. Agents update shared memory transactionally as part of reasoning. Decisions become immutable events. Subsequent consumers see the enriched stream. The entire history supports replay, audit and continuous improvement. This inverts the traditional model in which agents query a central store. Instead agents become co-authors of the shared reality recorded on the log. The architecture aligns closely with the Model Context Protocol. MCP servers can expose the tools, resources and prompts of the Decision Fabric in a standardized, discoverable way. Clients connect once and gain access to real-time events, graph memory and agent coordination without custom per-system integrations.

This definition and implementation have been refined through multiple enterprise deployments. It directly contrasts with centralized ontology approaches discussed in our earlier analysis on event streams versus centralized ontologies for agentic AI. The Decision Fabric preserves freshness, scalability and auditability that centralized models struggle to maintain.

1. Ingestion (KafScale Substrate)
Business systems publish real-time events to topics. S3-native architecture provides an immutable, ordered log.
topic: customer.profile.v1

2. Protocol (MCP Server)
A secure gateway replacing bespoke connectors. Clients dynamically discover tools via standard JSON schemas.
"method": "brain_search"

3. Execution (KafClaw + KafGraph)
Runtime coordinates agents while shared memory updates state. Wayang federates queries to prevent data egress.
query -> No Data Movement

4. Auditability (KafSIEM)
Every decision loops back as a new event. Entity graphs are built with complete provenance back to the source.
output: RFC 7946 GeoJSON

How it works

The operational flow begins with business systems and sensors publishing events to KafScale topics using domain-driven naming conventions such as customer.profile.updated.v1 or order.fulfilled.v2. These events carry authoritative facts at the time they occurred. Retention is effectively unlimited because segments reside in S3.

An MCP server runs alongside the streaming platform. In the KafScale implementation this server is deployed as a separate service that exposes read-only tools by default for security. It registers capabilities such as cluster_status, list_topics, describe_configs and cluster_metrics and, crucially, the Decision Fabric tools brain_search, brain_recall, brain_capture and others. MCP clients discover these tools dynamically through the standardized protocol.

When an agent activates, typically orchestrated by KafClaw, it receives events from its input topics. Context assembly draws from three sources: the triggering event, prior messages sharing the same correlation identifier, and the current state of the relevant subgraph in KafGraph. The agent issues tool calls using the MCP interface or the native JSON tool format. These calls are routed securely. Graph operations are transactional. A brain_capture executed by one agent updates the shared memory in a way that is immediately visible to peer agents processing related events.
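The three-source context assembly described above can be sketched as follows. This is an in-memory illustration under stated assumptions: events are plain dicts with hypothetical correlationId and ts fields, and the recall argument is a stand-in for a brain_recall tool call rather than the real KafGraph client.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Context:
    trigger: Event         # the event that activated the agent
    history: List[Event]   # earlier messages with the same correlation ID
    memory: List[Event]    # facts recalled from the shared graph


def assemble_context(trigger: Event,
                     seen: List[Event],
                     recall: Callable[[str], List[Event]],
                     max_history: int = 50) -> Context:
    corr = trigger["correlationId"]
    # Keep only earlier messages from the same decision chain, oldest first.
    history = sorted(
        (e for e in seen if e["correlationId"] == corr and e["ts"] < trigger["ts"]),
        key=lambda e: e["ts"],
    )
    # `recall` stands in for a brain_recall tool call against the graph.
    return Context(trigger=trigger,
                   history=history[-max_history:],
                   memory=recall(corr))
```

Capping the history at the most recent max_history messages is one simple way to keep the prompt bounded; a real runtime may use a different policy.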

For computation that spans multiple data sources or engines, Apache Wayang enters the picture. Rather than moving data to a single processing engine, Wayang constructs an optimized execution plan that pushes operators to the platforms where data already resides. A query that needs to join Kafka events with data in PostgreSQL and an Iceberg catalog on S3 can execute portions in the Kafka processor, the database and a Spark or Flink runner without unnecessary data copying. This federated approach directly mitigates egress costs and latency. The Scalytics extensions to Wayang add enterprise governance, security and integration with the Decision Fabric event model.

KafClaw manages the heterogeneous agent landscape. A single correlation ID ties together an entire decision chain even if it involves multiple specialized agents. Each publishes intermediate results or final decisions to output topics. KafSIEM consumes a subset of these events to build its provenance graph. Every edge in that graph links back to the exact source events, agent version, tool calls and memory state that produced it. Auditors or incident responders can traverse the graph with full citation.

Observability is native. The immutable log records every step. Standard Kafka metrics are augmented with agent-specific telemetry for tool call frequency, decision latency, graph mutation rates and correlation completion. MCP instrumentation captures session details, tool invocations and responses. Operators can reconstruct any agent decision path by replaying the relevant topic partitions from any historical point.
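As one example of the telemetry mentioned above, correlation completion and decision latency can be derived from replayed records alone. The sketch below assumes each record is a dict with hypothetical correlationId, ts (seconds) and kind fields, where kind is "decision" for the final event of a chain; the real envelope fields may differ.

```python
from typing import Dict, Iterable, List, Tuple


def completion_latencies(records: Iterable[dict]) -> Tuple[Dict[str, float], List[str]]:
    """Return (latency per completed correlation ID, sorted incomplete IDs)."""
    first_seen: Dict[str, float] = {}
    decided_at: Dict[str, float] = {}
    for r in records:
        corr, ts = r["correlationId"], r["ts"]
        # Track the earliest timestamp seen for each decision chain.
        first_seen[corr] = min(ts, first_seen.get(corr, ts))
        if r["kind"] == "decision":
            # Track the latest decision timestamp for the chain.
            decided_at[corr] = max(ts, decided_at.get(corr, ts))
    latencies = {c: decided_at[c] - first_seen[c] for c in decided_at}
    incomplete = sorted(c for c in first_seen if c not in decided_at)
    return latencies, incomplete
```

Chains that appear in the incomplete list are candidates for the correlation completion alerts an operator would configure.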

This design solves the three core production problems identified earlier. Data locality is respected because Wayang executes near the data and agents consume events at their source. Observability spans runtimes because all coordination and memory operations flow through the standardized event and tool interfaces recorded on the log. Cost control improves because unnecessary data movement is minimized, tool calls are auditable and can be rate-limited or prioritized, and expensive LLM invocations can be guided by precise context from KafGraph rather than repeated exploratory queries.

The architecture remains open. Existing Kafka producers require no changes. Legacy systems can feed events through connectors. New agents or tools register via the MCP server or Kafka topics. The only shared contracts are event schemas, tool definitions and correlation conventions. This loose coupling allows independent evolution of components while maintaining system-wide coherence.

Implementation

Adopting this pattern requires disciplined practices around schemas, configuration, tool registration and deployment. Event schema governance comes first. A schema registry enforces backward-compatible Avro or JSON Schema definitions. Topic names encode domain, entity, action and version. This convention powers automated routing inside KafClaw agents and simplifies KafSIEM graph construction.
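A naming convention like this can be checked mechanically. The following sketch assumes a strict four-segment form, domain.entity.action.vN, with lowercase segments; the exact rules (allowed characters, number of segments) are assumptions and would be set by each team's governance.

```python
import re
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    entity: str
    action: str
    version: int


# Three lowercase segments followed by a version segment such as "v1".
_SEGMENT = r"([a-z][a-z0-9_-]*)"
_TOPIC_RE = re.compile(rf"^{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}\.v(\d+)$")


def parse_topic(name: str) -> TopicName:
    match = _TOPIC_RE.match(name)
    if match is None:
        raise ValueError(f"topic {name!r} does not follow domain.entity.action.vN")
    domain, entity, action, version = match.groups()
    return TopicName(domain, entity, action, int(version))
```

With this strict variant, customer.profile.updated.v1 parses cleanly, while a three-segment name such as order.fulfilled.v2 is rejected; a deployment that wants to allow shorter names would need a looser pattern.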

The MCP server for KafScale is enabled through Helm values. A minimal configuration looks as follows.

mcp:
  enabled: true
  auth:
    token: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."  # short-lived, rotated
  etcdEndpoints:
    - "http://etcd.kafscale.svc:2379"
  metrics:
    brokerMetricsURL: "http://kafscale-broker:9093/metrics"
  tools:
    allowed:
      - cluster_status
      - list_topics
      - describe_topics
      - cluster_metrics
      - brain_search
      - brain_recall
      - brain_capture
  security:
    readOnly: true
    auditEnabled: true

Once deployed, an MCP client connects to the /mcp endpoint over one of the protocol's HTTP transports (Streamable HTTP in current revisions of the specification, HTTP with SSE in older ones). Discovery returns the available tools with their JSON schemas. An agent can then invoke brain_search with a request such as the following simplified example.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "brain_search",
    "arguments": {
      "query": "financial-risk high-value customer last-30d",
      "correlationId": "corr-uuid-1234",
      "limit": 20
    }
  }
}

The server routes the call to KafGraph, executes the query against the transactional graph, and returns results enriched with provenance. Similar patterns apply for capture operations that assert new entities or relationships with full attribution to the calling agent and correlation.
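In the MCP specification, a tool invocation travels as a JSON-RPC request with the method tools/call, with the tool name and its arguments nested under params. The sketch below builds such a request and unpacks the text parts of a result. The helper names are hypothetical, and the argument names simply reuse those from the example above; no HTTP client is included.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as JSON-RPC 2.0 text."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


def extract_text(response: Dict[str, Any]) -> str:
    """Join the text content items of a tools/call response."""
    if "error" in response:
        # Protocol-level failure, e.g. unknown tool or invalid params.
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):
        # The tool ran but reported a failure in its result.
        raise RuntimeError("tool reported an error")
    return "\n".join(item["text"] for item in result.get("content", [])
                     if item.get("type") == "text")
```

Keeping request construction in one helper makes it easier to attach the correlation identifier consistently to every call an agent issues.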

KafClaw agents are configured declaratively. A representative manifest for a credit-risk evaluation agent follows.

agent:
  id: credit-risk-evaluator-v3
  consumerGroup: risk-decision-group-2026
  inputTopics:
    - transactions.credit.applications.v2
    - customer.credit.score.updated.v1
  outputTopic: decisions.credit.approval.v2
  auditTopic: audit.risk.decisions.v2
  memoryTools:
    enabled: true
    allowed:
      - brain_search
      - brain_recall
      - brain_capture
  llm:
    provider: anthropic
    model: claude-3-5-sonnet
    temperature: 0.0
  tracing:
    enabled: true
    propagator: opentelemetry

The runtime handles consumer loops, context assembly from events and graph, tool invocation over MCP or native channels, reasoning, decision emission and memory updates. Dead-letter handling, retry semantics and exactly-once guarantees where required are inherited from the Kafka substrate.

For federated computation, a Wayang plan might be registered as an MCP tool or invoked from within an agent. The optimizer evaluates cost models and data location to decide whether to push a filter into a Kafka processor, a join into PostgreSQL or an aggregation into a Flink job running against S3-resident segments. Scalytics extensions add authentication, audit hooks and output event emission so that computed results also become first-class events on the Decision Fabric.
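The intuition behind cost-based placement can be shown with a deliberately small toy. This is not Wayang's optimizer or API; it is a hypothetical illustration in which each candidate platform has a fixed startup cost and an amount of data that would have to be shipped to it, and the cheapest candidate wins.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    platform: str         # e.g. "kafka", "postgres", "flink" (illustrative labels)
    bytes_to_move: int    # data that must be shipped to this platform
    startup_cost: float   # fixed overhead, in the same abstract cost unit


def choose_platform(candidates: List[Candidate], cost_per_byte: float) -> Candidate:
    """Pick the candidate with the lowest startup plus data-movement cost."""
    if not candidates:
        raise ValueError("at least one candidate platform is required")
    return min(candidates,
               key=lambda c: c.startup_cost + c.bytes_to_move * cost_per_byte)
```

When moving data is expensive, the platform that already holds the data wins even with a higher fixed cost; when movement is free, the lowest startup cost wins. A real optimizer weighs many more factors, such as operator selectivity and cardinality estimates.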

Deployment follows Kubernetes-native patterns. KafScale brokers, the MCP service, KafClaw fleets, KafGraph instances and KafSIEM components are all managed via operators or Helm. Monitoring aggregates Kafka metrics, agent telemetry, graph query latency and Wayang execution plans. Alerting triggers on anomalous tool-call patterns or correlation completion delays.

These implementation artifacts have been refined across client projects. The combination of MCP standardization, event-driven coordination and federated execution removes the custom glue code that typically accumulates in agentic platforms.

Trade-offs

No architecture is without compromise. The Decision Fabric with MCP excels at scale, freshness and auditability yet presents specific limits that teams must evaluate against their requirements.

The S3-native design of KafScale delivers essentially unlimited retention and high durability. Fetch latency, however, averages around 500 milliseconds on cache misses. This is acceptable for the majority of agentic use cases where decisions incorporate minutes or hours of history. It is unsuitable for sub-100-millisecond control loops or high-frequency trading. Teams with such requirements should consider complementary low-latency stores for hot data while still emitting events to the fabric for audit and long-term memory.

KafGraph provides powerful transactional shared memory and OpenCypher expressiveness. Query complexity must be managed. Poorly written traversals or excessive concurrent mutations can create hot spots. Schema design for entity relationships requires upfront thought similar to any graph model. While the seven tool calls abstract much of this complexity, teams new to graph thinking may invest time in modeling before seeing full value. In practice we have found that starting with a narrow set of high-value entities and relationships and expanding iteratively works best.

Agent heterogeneity through KafClaw is a strength for matching the right tool to the problem. It also increases the surface for versioning and compatibility. Standardized envelopes and schemas mitigate this but discipline around contract evolution remains necessary. Non-deterministic LLM agents in particular require the audit events and memory captures to be comprehensive. Without them, reproducing a decision path for debugging or compliance becomes difficult.

Federated execution with Apache Wayang avoids data movement and associated egress but adds plan optimization overhead. For very simple single-engine tasks the optimizer may introduce minor latency compared to a direct call. The benefit materializes when queries span disparate systems or when data volume makes centralization expensive. The trade-off tilts strongly toward federation as the number of data sources or agent count grows.

MCP itself, while standardizing discovery and invocation, does not eliminate the need for strong authentication, rate limiting and tool allow-lists. The KafScale MCP server ships read-only by default and requires explicit configuration for any mutation capabilities. Production deployments layer OIDC, short-lived tokens, RBAC and dry-run modes. Teams that treat MCP as a simple REST replacement without these controls expose themselves to prompt-injection or runaway tool-call risks.
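Two of these controls, a tool allow-list and a per-agent rate limit, fit in a few lines. The class below is a hypothetical sketch using a sliding time window; it is not part of the KafScale MCP server, and the names and limits are illustrative.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Set


class ToolGate:
    """Allow-list plus sliding-window rate limit per agent (illustrative)."""

    def __init__(self, allowed: Set[str], max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.allowed = allowed
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so the gate can be tested without sleeping
        self._calls: Dict[str, Deque[float]] = {}

    def check(self, agent_id: str, tool: str) -> bool:
        if tool not in self.allowed:
            return False  # tool is not on the allow-list
        now = self.clock()
        calls = self._calls.setdefault(agent_id, deque())
        # Drop timestamps that have left the sliding window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # budget for this window is exhausted
        calls.append(now)
        return True
```

Placing a gate like this in front of tool dispatch bounds how far a runaway agent loop can go before it is cut off, independent of what the model decides to do.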

Finally, while the Decision Fabric reduces reliance on centralized ontologies, some semantic consistency layer is still valuable for enterprise-wide concepts such as customer identifiers or risk taxonomies. The fabric complements rather than completely replaces carefully governed reference data. The optimal systems combine the real-time event stream with selective, slowly changing reference views consumed through the same MCP interface.

These trade-offs are not theoretical. They have surfaced in every Scalytics engagement that moved agentic workloads into production. Honest evaluation against concrete workload profiles, latency budgets, compliance requirements and cost targets is essential. The architecture shines when freshness, auditability and independent scaling matter more than single-digit millisecond responses.

Outcomes

Organizations that implement the Decision Fabric with MCP report several consistent operational and economic improvements.

Decision quality increases because agents operate on fresher context drawn directly from the event stream and transactional graph memory. Staleness-related errors decline. Replayability allows new agent versions or policies to be validated against historical event sequences before promotion, shortening the safe deployment cycle.
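Replay-based validation can be pictured as running the current and the candidate logic over the same historical events and collecting the disagreements. The sketch below assumes deterministic policies expressed as plain functions; the event fields and policy names are hypothetical. Non-deterministic LLM agents would need recorded tool outputs and repeated runs to be compared in this way.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]
Policy = Callable[[Event], str]


def replay_diff(events: Iterable[Event],
                current: Policy,
                candidate: Policy) -> List[Tuple[Event, str, str]]:
    """Return (event, current decision, candidate decision) where they differ."""
    diffs: List[Tuple[Event, str, str]] = []
    for event in events:
        old, new = current(event), candidate(event)
        if old != new:
            diffs.append((event, old, new))
    return diffs
```

An empty result means the candidate reproduces every historical decision; a non-empty one gives reviewers a concrete list of cases to inspect before promotion.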

Operational overhead decreases. Platform teams spend less time maintaining synchronization jobs, conflict-resolution workflows or central model versioning. Observability is unified through the log and correlation identifiers. Incidents can be reconstructed by consuming the audit topics and traversing the KafSIEM graph. Root cause analysis that once required weeks of log correlation now completes in hours.

Cost attribution and control improve markedly. Federated execution with Wayang keeps processing near data, reducing egress. MCP tool calls and agent decisions are explicitly logged, enabling precise per-agent, per-tool spend tracking. Teams can implement budgets, prioritization and circuit breakers at the fabric level rather than inside individual agents.
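Per-agent, per-tool spend tracking reduces to an aggregation over audit events. The sketch below assumes each audit event is a dict with hypothetical agentId, tool and costUsd fields; how cost is actually measured and recorded depends on the deployment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spend_by_agent_and_tool(audit_events: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Sum cost per (agent, tool) pair from audit events."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for event in audit_events:
        totals[(event["agentId"], event["tool"])] += event["costUsd"]
    return dict(totals)


def over_budget(totals: Dict[Tuple[str, str], float],
                budgets: Dict[str, float]) -> List[str]:
    """Return agents whose total spend exceeds their budget, sorted by name."""
    per_agent: Dict[str, float] = defaultdict(float)
    for (agent, _tool), cost in totals.items():
        per_agent[agent] += cost
    # Agents without a configured budget are treated as unlimited.
    return sorted(agent for agent, cost in per_agent.items()
                  if cost > budgets.get(agent, float("inf")))
```

The list returned by over_budget is the kind of signal a fabric-level circuit breaker could act on, for example by pausing the affected consumer group.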

Risk reduction is significant for regulated industries. Every decision carries provenance linking back to source events, tool outputs, memory state and agent logic version. KafSIEM graphs provide citable evidence for auditors or incident responders. The immutable nature of the stream supports point-in-time reconstruction required for compliance.

Scalability characteristics change. Agent count can grow horizontally because consumption is partitioned naturally by Kafka consumer groups. Memory operations scale with the graph cluster. Compute for complex analysis scales with the federated engines chosen by the Wayang optimizer. The system becomes more capable as more events and agents join rather than slower.

These outcomes align with the positioning of Scalytics as a delivery partner for Kafka-native production agentic AI infrastructure. The open-source nature of the components allows teams to begin experimentation immediately while retaining the option to engage experienced architects for production hardening, governance and integration with existing enterprise systems. Further patterns are explored in our category content on streaming MCP implementations.

Next step

For engineering leaders and platform architects evaluating these patterns, the concrete next action is to schedule a focused architecture review. Our teams bring experience from multiple production Decision Fabric deployments and can map your specific data topology, agent workloads, compliance constraints and cost drivers to the appropriate implementation choices.

We also maintain a growing collection of Apache 2.0 licensed components and reference architectures. Exploring these at scalytics.io/open-source provides a low-friction starting point for hands-on evaluation of KafScale, the MCP server, KafGraph tool calls and the full fabric.

Whether your organization is deepening an existing Kafka investment or building its first production agentic platform, the foundational decisions made now will determine long-term success. Contact us to begin the conversation.

About Scalytics

Scalytics architects and troubleshoots mission-critical streaming, federated execution, and AI systems for scaling SMEs. We help organizations turn streams into decisions - reliably, in real time, and under production load. When Kafka pipelines fall behind, SAP IDocs block processing, lakehouse sinks break, or AI pilots collapse under real load, we step in and make them run.

Our founding team created Apache Wayang (now an Apache Top-Level Project), the federated execution framework that orchestrates Spark, Flink, and TensorFlow where data lives and reduces ETL movement overhead.

We also invented and actively maintain KafScale (S3-Kafka-streaming platform), a Kafka-compatible, stateless data and large object streaming system designed for Kubernetes and object storage backends. Elastic compute. No broker babysitting. No lock-in.

Our mission: data stays in place. Compute comes to you. From data lakehouses to private AI deployment and distributed ML - all designed for security, compliance, and production resilience.

Questions? Join our open Slack community or schedule a consult.