Breaking the SAP IDoc monolith: from file payloads to Enterprise Object Streaming

Dr. Mirko Kämpf

We spent two years dumping SAP IDocs as monolithic files into Kafka. Seemed like the obvious path. SAP sends XML. Kafka stores messages. Ship the whole thing. Let consumers figure it out.

That approach broke during peak season. Order volume spiked 3x. Our Kafka consumers fell 2 hours behind. Billing stalled. Warehouse couldn't see inventory updates. Support tickets piled up. The root cause? We were treating structured business documents like opaque file blobs.

Here's what we learned tearing down that monolith and rebuilding it the right way.

The Files-as-Payloads Trap

SAP IDocs carry real business meaning. An ORDERS05 IDoc isn't just XML. It's an order header, line items, pricing rules, partner data, delivery schedules. Hierarchical structure with dozens of segment types.
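
For context, a heavily simplified sketch of the shape we're dealing with; a real ORDERS05 carries many more segment types, nested item structures, and hundreds of fields:

<ORDERS05>
  <IDOC BEGIN="1">
    <EDI_DC40>                          <!-- control record -->
      <DOCNUM>0000000012345678</DOCNUM>
      <IDOCTYP>ORDERS05</IDOCTYP>
    </EDI_DC40>
    <E1EDK01>...</E1EDK01>              <!-- order header -->
    <E1EDP01>...</E1EDP01>              <!-- line item -->
    <E1EDP01>...</E1EDP01>              <!-- line item -->
  </IDOC>
</ORDERS05>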

Our original Kafka setup:

  • One topic: sap.idocs
  • One message = one complete IDoc XML file
  • Consumers parse full document, extract what they need
  • Average payload: 3.2MB during our incident
  • Parse time: 18 seconds per message

The billing service needed invoice totals. Nothing else. But it pulled down complete 5MB INVOIC02 IDocs, parsed every segment, grabbed one field. Pure waste.
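
To make the waste concrete, the old consumption pattern looked roughly like this (a sketch assuming kafka-python and the single sap.idocs topic; the SUMME field is illustrative):

from kafka import KafkaConsumer
import xml.etree.ElementTree as ET

# Old pattern: every consumer pulls the full IDoc and parses all of it.
consumer = KafkaConsumer('sap.idocs', bootstrap_servers='kafka:9092')

for msg in consumer:
    root = ET.fromstring(msg.value)     # parse 3-5MB of XML...
    total = root.findtext('.//SUMME')   # ...to read a single summary field
    print(msg.key, total)               # everything else in the document is discarded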

Before: The Monolithic Blob

SAP IDoc (3.2MB XML) → single topic sap.idocs → every consumer parses everything.

Result: High CPU, massive lag, "all or nothing" processing.

After: The Exploded Pattern

SAP IDoc → SMT parser → separate topics for Header, Items, and Pricing segments.

Result: Selective consumption, sub-second lag, tiny payloads.

The Performance Shift

Metric             | Monolithic Pattern | Exploded Pattern
Peak Consumer Lag  | 2 Hours            | < 30 Seconds
Avg. Message Size  | 3.2 MB             | 1.8 KB
Schema Changes     | 2 Weeks            | Same Day
Analytics Replay   | 2 Hours            | 5 Minutes

The warehouse service wanted inventory movements. Got flooded with pricing updates, order headers, everything. Spent CPU filtering noise.

During our peak failure:

  • Consumer lag hit 8,400 messages
  • Processing fell 2 hours behind real-time
  • Recovery took 4 hours
  • Business impact: delayed billing, inventory sync failures

We were shipping files, not events. Kafka became a glorified file system instead of an event bus.

"Our Kafka consumers fell 8,400 messages behind. Recovery took 4 hours. We weren't running a real-time event bus; we were running a very expensive, very slow distributed file system."

What Breaking It Down Actually Looks Like

The fix wasn't subtle refactoring. We tore out the file-as-message pattern completely.

New flow:

SAP ECC
  ↓ IDoc XML
Kafka Connect (custom transform)
  ↓ Parse + Explode segments
Multiple domain-specific topics:
  - sap.orders.header (E1EDK01 segments)
  - sap.orders.items (E1EDP01 segments)  
  - sap.orders.pricing (E1EDP26 segments)
  - sap.invoices.header (E1EDK02 segments)
  ↓ Partition by document number
Consumers subscribe selectively

Each segment type gets its own topic. Billing subscribes to sap.invoices.* only. Warehouse reads sap.inventory.*. Analytics pulls line items without headers.
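
Selective subscription is just a regex on topic names. A minimal sketch of the billing side with kafka-python (broker address and group id are placeholders):

from kafka import KafkaConsumer

# Billing only cares about invoice segments, so it subscribes by topic pattern.
consumer = KafkaConsumer(
    bootstrap_servers='kafka:9092',
    group_id='billing-service',
    value_deserializer=lambda v: v,  # real setup: Avro deserializer wired to Schema Registry
)
consumer.subscribe(pattern=r'^sap\.invoices\..*')

for msg in consumer:
    # Only invoice traffic arrives here; hand off to billing logic.
    print(msg.topic, msg.key, len(msg.value))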

We built a Kafka Connect Single Message Transform that parses IDocs at ingestion. One IDoc in, many segment events out. Each segment carries context (document number, IDoc type) but stands alone.

The parsing logic in Python (we prototyped before writing the Java SMT):

import xml.etree.ElementTree as ET

def explode_idoc(idoc_xml, producer):
    """Split one IDoc XML document into one Kafka message per segment."""
    root = ET.fromstring(idoc_xml)
    docnum = root.findtext('.//DOCNUM')  # shared document number, reused as the message key

    # ElementTree has no wildcard tag matching ('E1*'), so filter on the segment prefix instead.
    for segment in root.iter():
        if not segment.tag.startswith('E1'):
            continue
        fields = {child.tag: child.text for child in segment}
        fields['DOCNUM'] = docnum  # every segment event keeps its document context

        topic = f"sap.{segment.tag.lower()}"
        producer.send(topic, key=docnum, value=fields)  # assumes key/value serializers on the producer

We use the document number as the message key. Kafka partitions by key. All segments from one order land in the same partition, same consumer. Ordering guarantees preserved within a document.

Schema Registry Saves Us From Breaking Changes

Every segment type has an Avro schema. Order headers look different from line items look different from pricing. When SAP adds a field to headers, only header consumers care.

Order header schema:

{
  "type": "record",
  "name": "OrderHeader",
  "namespace": "sap.orders",
  "fields": [
    {"name": "DOCNUM", "type": "string"},
    {"name": "BELNR", "type": "string"},
    {"name": "DATUM", "type": "string"},
    {"name": "UZEIT", "type": "string"}
  ]
}

Add a new field? Register a new schema version. Compatible consumers auto-upgrade. Old consumers ignore the new field. No coordination needed.

We've pushed 3 schema updates in 6 months. Zero consumer breakage. Used to dread SAP IDoc changes because every consumer needed manual updates.
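
For example, a backward-compatible v2 of the header schema just adds the new field with a default (the currency field below is illustrative):

{
  "type": "record",
  "name": "OrderHeader",
  "namespace": "sap.orders",
  "fields": [
    {"name": "DOCNUM", "type": "string"},
    {"name": "BELNR", "type": "string"},
    {"name": "DATUM", "type": "string"},
    {"name": "UZEIT", "type": "string"},
    {"name": "CURCY", "type": ["null", "string"], "default": null}
  ]
}

Consumers still reading with v1 never see CURCY; consumers on v2 get null when replaying records written before the change.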

What Actually Changed After

Six months post-migration, our metrics shifted:

Before (monolithic IDocs):

  • Peak consumer lag: 2 hours
  • Average message size: 3.2MB
  • Billing service processes: 100% of IDoc messages
  • Schema change deployment: 2 weeks per change
  • Replay time for analytics: 2 hours

After (exploded segments):

  • Peak consumer lag: <30 seconds
  • Average message size: 1.8KB
  • Billing service processes: only invoice topics (60% reduction)
  • Schema change deployment: same-day, zero coordination
  • Replay time for analytics: 5 minutes

Biggest win? Teams stopped coordinating deploys. Billing team updates invoice handling without telling warehouse team. Schemas handle compatibility automatically.

The analytics team celebrates most. They replay months of order line items in minutes. Before? Replaying required processing every complete IDoc. Gigabytes of XML for data they didn't need.

What This Pattern Costs

Being honest: the explode pattern isn't free.

Kafka Connect CPU usage doubled. We went from pass-through (SAP → Kafka → consumer) to active parsing at ingestion. Budgeted 2x capacity on Connect workers.

Topic count exploded (pun intended). 10 major IDoc types, 15-20 segments each, 150+ topics. Broker metadata overhead grew. Monitoring complexity increased.

Error handling got harder. A bad IDoc used to fail one big message, all or nothing. Now a single IDoc can partially succeed: most segments publish fine while one fails. We added a dead letter queue and alerting on parse failures.

prometheus_alerts.yml
groups:
  - name: sap_idoc_explosion_alerts
    rules:
    - alert: HighIDocSegmentFailureRate
      expr: |
        sum by (idoc_type) (rate(kafka_smt_dlq_total[5m]))
        /
        sum by (idoc_type) (rate(kafka_smt_segments_total[5m])) > 0.01
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High failure rate for {{ $labels.idoc_type }}"

Development investment was real. Writing the custom SMT took 3 weeks. Testing across all our IDoc types took another 2 weeks. Not a weekend project.

But for high-volume SAP integrations where selective consumption and independent schema evolution matter? Worth every hour.

⚠️ Fault-Tolerant Explosion Logic
1. Malformed XML

Source IDoc is unreadable. Entire file routed to sap.errors.raw.

2. Schema Mismatch

Segment E1EDP26 changed in SAP. Only that segment goes to sap.errors.schema.

3. Auto-Retry

A DLQ consumer alerts SREs to update the Schema Registry or fix the SMT logic so the quarantined segments can be replayed.
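
A minimal sketch of this routing, extending the Python prototype above (the serialize callable stands in for Avro encoding against the Schema Registry; error-payload serialization is omitted):

import xml.etree.ElementTree as ET

def explode_idoc_safely(idoc_xml, producer, serialize):
    # 1. Malformed XML: the whole document goes to the raw error topic.
    try:
        root = ET.fromstring(idoc_xml)
    except ET.ParseError:
        producer.send('sap.errors.raw', value=idoc_xml)
        return

    docnum = root.findtext('.//DOCNUM')
    for segment in root.iter():
        if not segment.tag.startswith('E1'):
            continue
        fields = {child.tag: child.text for child in segment}
        fields['DOCNUM'] = docnum
        # 2. Schema mismatch: only the offending segment goes to the schema DLQ.
        try:
            payload = serialize(segment.tag, fields)
        except Exception:
            producer.send('sap.errors.schema', key=docnum, value=fields)
            continue
        producer.send(f"sap.{segment.tag.lower()}", key=docnum, value=payload)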

If You're Considering This

Start with your highest-volume IDoc types. We began with ORDERS and INVOICES. Proved value. Then migrated inventory, then master data.

Keep the monolithic topic running during transition. Dual-write for 2 months. Legacy consumers kept working. New consumers adopted gradually. Zero-downtime migration.
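
Dual-writing is mechanically simple; a sketch of the idea, reusing the explode_idoc prototype from earlier:

def handle_idoc(idoc_xml, producer):
    # Legacy path: keep publishing the full document so existing consumers keep working.
    producer.send('sap.idocs', value=idoc_xml)
    # New path: exploded segment events for consumers that have migrated.
    explode_idoc(idoc_xml, producer)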

Watch for partner-specific IDoc extensions. Some of our suppliers add custom segments. Those need special schema handling. Document the quirks.

Test partition distribution early. We initially partitioned by IDoc type, not document number. Killed ordering guarantees within multi-segment documents. Painful rollback.

The Bigger Pattern

This isn't really about SAP. It's about not shipping files when you mean to ship events.

We see it everywhere. Teams dump JSON files to Kafka topics. Avro-wrapped PDFs. Complete database snapshots. Anything blob-like.

If consumers parse your payload to extract pieces, you're hiding structure. Kafka's partition-level parallelism, consumer group semantics, schema evolution all work better with granular events.

Break the monolith. Ship the atoms.

If you're wrestling with IDocs in Kafka, happy to compare notes. Reach out at kafscale.io.

About Scalytics

Scalytics architects and troubleshoots mission-critical streaming, federated execution, and AI systems for scaling SMEs. When Kafka pipelines fall behind, SAP IDocs block processing, lakehouse sinks break, or AI pilots collapse under real load, we step in and make them run.

Our founding team created Apache Wayang (now an Apache Top-Level Project), the federated execution framework that orchestrates Spark, Flink, and TensorFlow where data lives and reduces ETL movement overhead.

We also invented and actively maintain KafScale, a Kafka-compatible, stateless streaming platform for data and large objects, designed for Kubernetes and S3-compatible object storage backends. Elastic compute. No broker babysitting. No lock-in.

Our mission: Data stays in place. Compute comes to you. From data lakehouses to private AI deployment and distributed ML - all designed for security, compliance, and production resilience.

Questions? Join our open Slack community or schedule a consult.