We spent two years dumping SAP IDocs as monolithic files into Kafka. Seemed like the obvious path. SAP sends XML. Kafka stores messages. Ship the whole thing. Let consumers figure it out.
That approach broke during peak season. Order volume spiked 3x. Our Kafka consumers fell 2 hours behind. Billing stalled. Warehouse couldn't see inventory updates. Support tickets piled up. The root cause? We were treating structured business documents like opaque file blobs.
Here's what we learned tearing down that monolith and rebuilding it the right way.
The Files-as-Payloads Trap
SAP IDocs carry real business meaning. An ORDERS05 IDoc isn't just XML. It's an order header, line items, pricing rules, partner data, delivery schedules. Hierarchical structure with dozens of segment types.
Our original Kafka setup:
- One topic: sap.idocs
- One message = one complete IDoc XML file
- Consumers parse full document, extract what they need
- Average payload: 3.2MB during our incident
- Parse time: 18 seconds per message
The billing service needed invoice totals. Nothing else. But it pulled down complete 5MB INVOIC02 IDocs, parsed every segment, grabbed one field. Pure waste.
The warehouse service wanted inventory movements. Got flooded with pricing updates, order headers, everything. Spent CPU filtering noise.
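To make the waste concrete, here is a rough sketch of what those old consumers effectively did, assuming the kafka-python client; the broker address, group name, and summary field path are illustrative, and the downstream call is hypothetical:

import xml.etree.ElementTree as ET
from kafka import KafkaConsumer  # assumption: kafka-python client

# Old pattern: pull every multi-megabyte IDoc, parse all of it, keep one field.
consumer = KafkaConsumer('sap.idocs', bootstrap_servers='kafka:9092', group_id='billing')
for message in consumer:
    root = ET.fromstring(message.value)          # ~18 seconds on a 3-5MB document
    total = root.findtext('.//E1EDS01/SUMME')    # illustrative summary segment/field
    if total is not None:
        record_invoice_total(total)              # hypothetical downstream call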
During our peak failure:
- Consumer lag hit 8,400 messages
- Processing fell 2 hours behind real-time
- Recovery took 4 hours
- Business impact: delayed billing, inventory sync failures
We were shipping files, not events. Kafka became a glorified file system instead of an event bus.
"Our Kafka consumers fell 8,400 messages behind. Recovery took 4 hours. We weren't running a real-time event bus; we were running a very expensive, very slow distributed file system."
What Breaking It Down Actually Looks Like
The fix wasn't subtle refactoring. We tore out the file-as-message pattern completely.
New flow:
SAP ECC
↓ IDoc XML
Kafka Connect (custom transform)
↓ Parse + Explode segments
Multiple domain-specific topics:
- sap.orders.header (E1EDK01 segments)
- sap.orders.items (E1EDP01 segments)
- sap.orders.pricing (E1EDP26 segments)
- sap.invoices.header (E1EDK02 segments)
↓ Partition by document number
Consumers subscribe selectively
Each segment type gets its own topic. Billing subscribes to sap.invoices.* only. Warehouse reads sap.inventory.*. Analytics pulls line items without headers.
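On the consuming side, selective consumption is just a topic pattern. A minimal sketch assuming the kafka-python client, with broker and group names illustrative and the handler hypothetical:

from kafka import KafkaConsumer  # assumption: kafka-python client

# Billing only cares about invoice segments, so it subscribes by topic pattern.
consumer = KafkaConsumer(bootstrap_servers='kafka:9092', group_id='billing')
consumer.subscribe(pattern=r'^sap\.invoices\..*')

for message in consumer:
    handle_invoice_segment(message.key, message.value)  # hypothetical billing handler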
We built a Kafka Connect Single Message Transform that parses IDocs at ingestion. One IDoc in, many segment events out. Each segment carries context (document number, IDoc type) but stands alone.
The parsing logic in Python (we prototyped before writing the Java SMT):
import xml.etree.ElementTree as ET

def explode_idoc(idoc_xml, producer):
    """Parse one IDoc XML document and emit one event per E1* segment."""
    root = ET.fromstring(idoc_xml)
    docnum = root.find('.//DOCNUM').text

    # ElementTree's XPath has no tag wildcards, so walk the tree and
    # match segments by prefix (E1EDK01, E1EDP01, ...).
    for segment in root.iter():
        if not segment.tag.startswith('E1'):
            continue
        segment_type = segment.tag
        fields = {child.tag: child.text for child in segment}
        fields['DOCNUM'] = docnum  # every segment carries its document context
        topic = f"sap.{segment_type.lower()}"
        producer.send(topic, key=docnum, value=fields)

We use the document number as the message key. Kafka partitions by key. All segments from one order land in the same partition, same consumer. Ordering guarantees are preserved within a document.
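Wiring the prototype up end to end looked roughly like this. A minimal sketch assuming the kafka-python client; the broker address, JSON serialization, and sample file name are all illustrative, not the production SMT:

import json
from kafka import KafkaProducer  # assumption: kafka-python client for the prototype

producer = KafkaProducer(
    bootstrap_servers='kafka:9092',                            # illustrative broker
    key_serializer=lambda k: k.encode('utf-8'),                # DOCNUM string -> bytes
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),  # segment dict -> JSON bytes
)

# One IDoc in, many keyed segment events out; identical DOCNUM keys land on one partition.
with open('orders05_sample.xml') as f:                         # hypothetical sample IDoc
    explode_idoc(f.read(), producer)
producer.flush()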
Schema Registry Saves Us From Breaking Changes
Every segment type has an Avro schema. Order headers look different from line items, which look different from pricing. When SAP adds a field to headers, only header consumers care.
Order header schema:
{
  "type": "record",
  "name": "OrderHeader",
  "namespace": "sap.orders",
  "fields": [
    {"name": "DOCNUM", "type": "string"},
    {"name": "BELNR", "type": "string"},
    {"name": "DATUM", "type": "string"},
    {"name": "UZEIT", "type": "string"}
  ]
}

Add a new field? Register a new schema version. Compatible consumers auto-upgrade. Old consumers ignore the new field. No coordination needed.
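For illustration, a hypothetical version 2 of that header schema might add an optional currency field with a null default, which is what keeps the change backward compatible (the CURCY field here is an example, not a change we actually shipped):

{
  "type": "record",
  "name": "OrderHeader",
  "namespace": "sap.orders",
  "fields": [
    {"name": "DOCNUM", "type": "string"},
    {"name": "BELNR", "type": "string"},
    {"name": "DATUM", "type": "string"},
    {"name": "UZEIT", "type": "string"},
    {"name": "CURCY", "type": ["null", "string"], "default": null}
  ]
}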
We've pushed 3 schema updates in 6 months. Zero consumer breakage. Used to dread SAP IDoc changes because every consumer needed manual updates.
What Actually Changed After
Six months post-migration, our metrics shifted:
Before (monolithic IDocs):
- Peak consumer lag: 2 hours
- Average message size: 3.2MB
- Billing service processes: 100% of IDoc messages
- Schema change deployment: 2 weeks per change
- Replay time for analytics: 2 hours
After (exploded segments):
- Peak consumer lag: <30 seconds
- Average message size: 1.8KB
- Billing service processes: only invoice topics (a 60% reduction in messages processed)
- Schema change deployment: same-day, zero coordination
- Replay time for analytics: 5 minutes
Biggest win? Teams stopped coordinating deploys. Billing team updates invoice handling without telling warehouse team. Schemas handle compatibility automatically.
The analytics team celebrates most. They replay months of order line items in minutes. Before? Replaying required processing every complete IDoc. Gigabytes of XML for data they didn't need.
What This Pattern Costs
Being honest: the explode pattern isn't free.
Kafka Connect CPU usage doubled. We went from pass-through (SAP → Kafka → consumer) to active parsing at ingestion. Budgeted 2x capacity on Connect workers.
Topic count exploded (pun intended). 10 major IDoc types, 15-20 segments each, 150+ topics. Broker metadata overhead grew. Monitoring complexity increased.
Error handling got harder. One bad IDoc segment used to fail one message. Now it can partially succeed. We added a dead letter queue and alert on parse failures.
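A minimal sketch of the dead-letter idea, reusing the earlier prototype; the sap.idocs.dlq topic name is our own, and the real handling lives inside the Connect SMT:

def explode_idoc_safe(idoc_xml, producer):
    """Explode an IDoc, routing anything unparseable to a dead letter topic."""
    try:
        explode_idoc(idoc_xml, producer)
    except Exception as exc:
        # Some segments may already have been produced (partial success),
        # so we alert on every DLQ entry and keep the raw payload for replay.
        producer.send('sap.idocs.dlq',
                      key='unparseable',  # fixed key; ordering is irrelevant here
                      value={'error': str(exc), 'raw': idoc_xml})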
Development investment was real. Writing the custom SMT took 3 weeks. Testing across all our IDoc types took another 2 weeks. Not a weekend project.
But for high-volume SAP integrations where selective consumption and independent schema evolution matter? Worth every hour.
If You're Considering This
Start with your highest-volume IDoc types. We began with ORDERS and INVOICES. Proved value. Then migrated inventory, then master data.
Keep the monolithic topic running during transition. Dual-write for 2 months. Legacy consumers kept working. New consumers adopted gradually. Zero-downtime migration.
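A sketch of the dual-write step, reusing the prototype; the legacy topic keeps receiving the original payload while the exploded topics fill in parallel (serialization of the raw XML value is glossed over here):

import xml.etree.ElementTree as ET

def publish_idoc_dual(idoc_xml, producer):
    """Transition period: keep legacy consumers fed while new consumers migrate."""
    docnum = ET.fromstring(idoc_xml).find('.//DOCNUM').text
    producer.send('sap.idocs', key=docnum, value=idoc_xml)  # legacy monolithic topic
    explode_idoc(idoc_xml, producer)                        # new segment-level topics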
Watch for partner-specific IDoc extensions. Some of our suppliers add custom segments. Those need special schema handling. Document the quirks.
Test partition distribution early. We initially partitioned by IDoc type, not document number. Killed ordering guarantees within multi-segment documents. Painful rollback.
The Bigger Pattern
This isn't really about SAP. It's about not shipping files when you mean to ship events.
We see it everywhere. Teams dump JSON files to Kafka topics. Avro-wrapped PDFs. Complete database snapshots. Anything blob-like.
If consumers parse your payload to extract pieces, you're hiding structure. Kafka's partition-level parallelism, consumer group semantics, schema evolution all work better with granular events.
Break the monolith. Ship the atoms.
If you're wrestling with IDocs in Kafka, happy to compare notes. Reach out at kafscale.io.
Further reading:
- Kafka security for sensitive SAP data: Confluent security practices
- Real-time ERP integration at scale: Bosch case study, Siemens story
- Event sourcing fundamentals: Martin Fowler
About Scalytics
Our founding team created Apache Wayang (now an Apache Top-Level Project), the federated execution framework that orchestrates Spark, Flink, and TensorFlow where data lives and reduces ETL movement overhead.
We also invented and actively maintain KafScale (S3-Kafka-streaming platform), a Kafka-compatible, stateless data and large object streaming system designed for Kubernetes and object storage backends. Elastic compute. No broker babysitting. No lock-in.
Our mission: Data stays in place. Compute comes to you. From data lakehouses to private AI deployment and distributed ML - all designed for security, compliance, and production resilience.
Questions? Join our open Slack community or schedule a consult.
