Streaming Data Becomes Storage-Native
Apache Kafka established the distributed commit log as the foundation for real-time data processing. The protocol is now commodity infrastructure; the remaining constraint is the storage architecture. This article examines how storage-native streaming separates analytical workloads from broker infrastructure, following the same disaggregation pattern that moved batch processing from HDFS to S3. We introduce KafScale, an open-source implementation with a documented storage format that lets batch and AI workloads read data directly from S3 without broker involvement. The architecture draws on research from Apache Wayang and Apache Flink (FLIP-531), and on production deployments at organizations adopting diskless Kafka alternatives.

