DLQ AI Agent: Intelligent Dead Letter Queue Routing

Dr. Mirko Kämpf

Rethinking Dead Letter Queues in Regulated Data Environments

Dead Letter Queues (DLQs) serve as a critical safeguard in modern enterprise architectures. When upstream systems fail to process messages due to errors such as schema violations or transient service outages, a Dead Letter Queue provides a structured fallback mechanism to capture the failure. In legal and compliance-sensitive domains, however, the backlog of unprocessed messages introduces operational risk. More importantly, handling these errors in a timely and compliant manner creates excessive overhead, often including manual reviews.

The Scalytics Assist DLQ Agent addresses this challenge by combining structured stream processing with verifiable AI-based routing. Initially built to operate within Kafka-based or Kafka-compatible environments, it can be adapted to any event-driven environment in any enterprise context, on-premise or in the cloud.

It gives the digital business an intelligent, auditable way to manage and triage DLQ messages with minimal manual intervention, independently of operations teams and without additional IT workload.

System Architecture: Modular, Auditable, and Auto-Deployable

The DLQ Agent architecture comprises two core components:

  • DLQ Agent(s): Stateless containerized agent services that consume from an Apache Kafka DLQ, analyze the messages using Large Language Models (LLMs), and route them to downstream topics for reprocessing or into a human-review queue.

  • Management System: A REST-API-based core service for monitoring the backend, plus a web frontend that visualizes routing outcomes and system state.

The system’s deployment follows established enterprise patterns. The agent runs in Docker containers or Kubernetes Pods and can be managed via Docker Compose or Kubernetes, with configuration provided via ConfigMaps or environment variables. Kafka compatibility adheres to Confluent Platform and Confluent Cloud standards, but the generic stream-processing components impose no lock-in to Apache Kafka.
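For illustration, the agent's settings could be resolved entirely from environment variables, whether injected directly by Docker Compose or mapped from a Kubernetes ConfigMap. A minimal Python sketch follows; the variable names, defaults, and topic names are assumptions for illustration, not the agent's actual configuration keys:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    """Runtime settings resolved from environment variables
    (e.g., injected by Docker Compose or a Kubernetes ConfigMap)."""
    bootstrap_servers: str
    dlq_topic: str
    reprocess_topic: str
    human_review_topic: str
    decision_log_topic: str
    llm_endpoint: str  # locally hosted LLM, no third-party exposure

    @classmethod
    def from_env(cls) -> "AgentConfig":
        # Hypothetical variable names; adapt to your deployment.
        return cls(
            bootstrap_servers=os.environ["KAFKA_BOOTSTRAP_SERVERS"],
            dlq_topic=os.environ.get("DLQ_TOPIC", "dlq"),
            reprocess_topic=os.environ.get("REPROCESS_TOPIC", "reprocess-topic"),
            human_review_topic=os.environ.get("HUMAN_REVIEW_TOPIC", "human-review-topic"),
            decision_log_topic=os.environ.get("DECISION_LOG_TOPIC", "decision-log"),
            llm_endpoint=os.environ.get("LLM_ENDPOINT", "http://localhost:8000/v1/completions"),
        )
```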

Message Processing: From Ingestion to Decision Log

Each message follows a verifiable, transparent, AI-augmented decision-making pipeline:

  1. Ingestion: Messages are consumed from a designated DLQ Kafka topic.

  2. LLM Routing: A structured prompt is generated and passed to a locally available LLM to determine the appropriate routing path (without third-party data exposure):
    • REPROCESS for retryable issues (e.g., timeouts, downstream unavailability).
    • HUMAN_REVIEW for complex or uncertain cases (e.g., business rule violations).

  3. Forwarding: Based on the decision, the message is published to the corresponding Kafka topic.

  4. Auditability: A wrapper event containing the raw message, the prompt, and the LLM's response is written to a Kafka decision-log topic.

This process ensures traceability and creates a foundation for future supervised learning, without compromising data governance.
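To make the pipeline concrete, here is a condensed sketch of steps 1 through 4 using the confluent-kafka Python client. The prompt wording, the OpenAI-compatible local LLM endpoint, and the wrapper-event fields are illustrative assumptions, not the shipped implementation:

```python
import json
import requests  # used to call a locally hosted LLM; no third-party API
from confluent_kafka import Consumer, Producer

from agent_config import AgentConfig  # the configuration sketch above (hypothetical module)

cfg = AgentConfig.from_env()
consumer = Consumer({
    "bootstrap.servers": cfg.bootstrap_servers,
    "group.id": "dlq-agent",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": cfg.bootstrap_servers})
consumer.subscribe([cfg.dlq_topic])

ROUTES = {"REPROCESS": cfg.reprocess_topic, "HUMAN_REVIEW": cfg.human_review_topic}

def build_prompt(raw: str) -> str:
    # Structured prompt asking the model for exactly one routing label.
    return (
        "You triage failed Kafka messages. Answer with exactly one word:\n"
        "REPROCESS (retryable, e.g. timeouts, downstream unavailability) or\n"
        "HUMAN_REVIEW (complex or uncertain, e.g. business rule violations).\n"
        f"Failed message:\n{raw}"
    )

def ask_local_llm(prompt: str) -> str:
    # Assumes an OpenAI-compatible completions endpoint hosted in-perimeter.
    resp = requests.post(cfg.llm_endpoint, json={"prompt": prompt, "max_tokens": 5})
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"].strip()

while True:
    msg = consumer.poll(1.0)                                # 1. Ingestion
    if msg is None or msg.error():
        continue
    raw = msg.value().decode("utf-8")
    prompt = build_prompt(raw)
    decision = ask_local_llm(prompt)                        # 2. LLM Routing
    target = ROUTES.get(decision, cfg.human_review_topic)   # unknown answers go to review
    producer.produce(target, msg.value())                   # 3. Forwarding
    wrapper = {"raw_message": raw, "prompt": prompt, "llm_response": decision}
    producer.produce(cfg.decision_log_topic, json.dumps(wrapper).encode("utf-8"))
    producer.flush()                                        # 4. Auditability
```

Note the defensive default: any response that is not a recognized label falls back to the human-review topic, which keeps the sketch aligned with the compliance-first posture described above.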

Kafka Dead Letter Queue Process Flow

  1. A message is received from the INPUT Kafka topic and enters the business process.

  2. The business process fails to handle the message, sending it to the dlq (Dead Letter Queue) topic.

  3. The DLQ Agent receives the failed message for triage.

  4. The message is routed either to the reprocess-topic for automated reprocessing or to the human-review-topic for manual inspection.

  5. After reprocessing or review, messages from both topics rejoin the business flow and are emitted to the OUTPUT Kafka topic.

Governance and Monitoring: Visibility Without Compromise

The DLQ Agent’s management interface is designed for observability and operational assurance. It includes:

  • Core API Service (FastAPI): Exposes agent connectivity, message processing statistics, and topic configurations via a REST API.

  • Web Frontend: Provides a dashboard view of input/output topic flows, message volume breakdowns, and system configuration. Initial versions are read-only and assume deployment within trusted internal networks.
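To make the shape of the Core API Service concrete, here is a minimal FastAPI sketch. The route paths and response fields are illustrative assumptions rather than the actual API contract:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="DLQ Agent Management API")

class Stats(BaseModel):
    consumed: int
    reprocessed: int
    human_review: int

# In a real deployment these counters would come from the agents
# (e.g., aggregated from the decision-log topic); hardcoded here for brevity.
_stats = Stats(consumed=0, reprocessed=0, human_review=0)

@app.get("/health")
def health() -> dict:
    """Agent connectivity check (hypothetical endpoint)."""
    return {"status": "ok"}

@app.get("/stats", response_model=Stats)
def stats() -> Stats:
    """Message-processing statistics for the read-only dashboard."""
    return _stats

@app.get("/topics")
def topics() -> dict:
    """Topic configuration as seen by the agent (illustrative values)."""
    return {"input": "dlq", "outputs": ["reprocess-topic", "human-review-topic"]}
```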

All data communication adheres to internal routing policies defined in Scalytics Connect, maintaining compliance boundaries while supporting transparent operations.

Infrastructure Anchored in Scalytics Connect

The DLQ Agent runs on top of Scalytics Connect, the secure AI infrastructure layer that keeps inference and metadata routing within the enterprise trust boundary. Unlike externalized inference pipelines that rely on outbound API calls, Scalytics Connect offers:

  • Close-to-data execution: No need to expose payloads outside your controlled environment.

  • Enterprise-first controls: Consistent with security, audit, and compliance policies.

  • Adaptability: Scalytics Assist components can evolve from prompt-based LLM agents to fully trainable, supervised systems.

Future Roadmap: From Prompt Engineering to Adaptive Agents

The current DLQ Agent provides an operational baseline, offering rule-aligned AI support for message triage. Future iterations will enhance both intelligence and control:

  • Live reconfiguration of routing logic

  • Trust scoring and decision confidence reporting

  • Training feedback loops using decision logs
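As a hint of where trust scoring could lead, routing might eventually be gated on a reported confidence value. One possible shape, with the threshold and the source of the score purely hypothetical:

```python
CONFIDENCE_THRESHOLD = 0.8  # hypothetical; would be live-reconfigurable

def gate_decision(llm_route: str, confidence: float) -> str:
    """Escalate low-confidence LLM decisions to a human reviewer.

    `confidence` could come from the model itself or from a separate
    trust-scoring component fed by the decision log.
    """
    if llm_route == "REPROCESS" and confidence >= CONFIDENCE_THRESHOLD:
        return "REPROCESS"
    return "HUMAN_REVIEW"
```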

As part of the broader Scalytics Assist framework, this component exemplifies how modular agents can extend proactive intelligence across an event-driven enterprise—securely, observably, and with operational resilience.

Strategic Outlook: AI Infrastructure for Regulated Domains

The Scalytics Assist DLQ Agent underscores a broader shift: AI systems in regulated enterprises must be verifiable, auditable, and secure by design. By embedding AI-powered decisions directly into Kafka pipelines—while preserving governance and traceability—Scalytics Connect enables enterprise teams to modernize streaming operations without sacrificing oversight.

As legal, compliance, and IT stakeholders increase scrutiny of AI adoption, systems like the DLQ Agent set a new bar for AI readiness: not just intelligent, but accountable.

Scalytics continues to invest in infrastructure that meets the operational and legal realities of enterprise AI.

About Scalytics

Scalytics builds on Apache Wayang, the cross-platform data processing framework created by our founding team and now an Apache Top-Level Project. Where traditional platforms require moving data to centralized infrastructure, Scalytics brings compute to your data—enabling AI and analytics across distributed sources without violating compliance boundaries.

Scalytics Connect provides federated data processing across Spark, Flink, PostgreSQL, and cloud-native engines through a single abstraction layer. Our cost-based optimizer selects the right engine for each operation, reducing processing time while eliminating vendor lock-in.

Scalytics Copilot extends this foundation with private AI deployment, running LLMs, RAG pipelines, and ML workloads entirely within your security perimeter. Data stays where it lives. Models train where data resides. No extraction, no exposure, no third-party API dependencies. For organizations in healthcare, finance, and government, this architecture isn't optional: it's how you deploy AI while remaining compliant with HIPAA, GDPR, and DORA. Explore our open-source foundation: Scalytics Community Edition.

Questions? Reach us on Slack or schedule a conversation.