These questions are answered here so the briefing call can focus on your actual constraints. Repository links resolve to the upstream Apache 2.0 source; architecture references map to public documentation.
Does this replace our Confluent or Apache Kafka cluster?
No. The pattern is explicitly additive. kaf-mirror sits next to your production cluster as one more read-only consumer. KafScale is a parallel spine for the BDR copy. Production traffic, ksqlDB, Connect, Stream Governance, and every other feature you rely on continue to work exactly as before. The first deployment can be one mirrored topic in a single namespace; the production team is not required to change a single line of producer code.
How does point-in-time recovery actually work?
KafScale segment writes carry their commit timestamp in S3 object metadata. The kafscale-cli restore command takes a topic name and a target timestamp, lists segments under the topic prefix in S3, filters to those that closed before the target time, and exposes them as a recovered topic that standard Kafka consumers can read. The procedure works without a running source cluster, which is why it qualifies as a backup rather than a replica.
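In sketch form, the filtering step inside that restore path looks roughly like the Go snippet below. The bucket name, the `topics/<name>/` prefix layout, and the `commit-timestamp` metadata key are illustrative assumptions, not the published KafScale object schema; the real kafscale-cli also has to handle partition manifests and segments still open at the target time.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// restorableSegments lists segment objects under a topic prefix and keeps
// those whose commit timestamp, read from S3 object metadata, falls at or
// before the restore target.
func restorableSegments(ctx context.Context, client *s3.Client, bucket, topic string, target time.Time) ([]string, error) {
	var keys []string
	pager := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String("topics/" + topic + "/"), // assumed layout
	})
	for pager.HasMorePages() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		for _, obj := range page.Contents {
			head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
			})
			if err != nil {
				return nil, err
			}
			// "commit-timestamp" is an assumed metadata key.
			ts, err := time.Parse(time.RFC3339, head.Metadata["commit-timestamp"])
			if err != nil {
				continue // skip objects without a parseable timestamp
			}
			if !ts.After(target) { // segment closed at or before the target
				keys = append(keys, *obj.Key)
			}
		}
	}
	return keys, nil
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	target, _ := time.Parse(time.RFC3339, "2025-06-01T00:00:00Z")
	keys, err := restorableSegments(context.Background(),
		s3.NewFromConfig(cfg), "kafscale-bdr", "orders", target)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(keys)
}
```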
What happens during a long target-cluster outage?
kaf-mirror does not require a separate disk buffer. The internal franz-go producer has an in-memory buffer with idempotent retries for transient interruptions. If KafScale is unavailable for an extended period, the consumer applies backpressure and stops advancing the offset on the source cluster. The source cluster itself acts as the durable buffer through its existing retention, so long as that retention covers the outage window. When KafScale returns, replication resumes from the last committed offset with no data loss. This is documented in the kaf-mirror Resilience section.
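A minimal sketch of that commit-after-produce loop, using the franz-go client the pipeline is built on. Broker addresses, the topic name, and the retry interval are placeholders, and the actual kaf-mirror loop adds metrics and rebalance handling. The structural point is that source offsets are committed only after the target acknowledges, so an outage stalls the loop instead of losing data.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	ctx := context.Background()

	// Source consumer: auto-commit is disabled so offsets advance only
	// after records are durably written to the BDR side.
	src, err := kgo.NewClient(
		kgo.SeedBrokers("prod-kafka:9092"), // placeholder
		kgo.ConsumerGroup("kaf-mirror"),
		kgo.ConsumeTopics("orders"),
		kgo.DisableAutoCommit(),
		kgo.FetchMaxBytes(50<<20), // 50 MB fetch batches
	)
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	// Target producer: franz-go's producer is idempotent by default.
	dst, err := kgo.NewClient(
		kgo.SeedBrokers("kafscale:9092"), // placeholder
		kgo.ProducerBatchCompression(kgo.ZstdCompression()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	for {
		fetches := src.PollFetches(ctx)
		if fetches.IsClientClosed() {
			return
		}
		var batch []*kgo.Record
		fetches.EachRecord(func(r *kgo.Record) {
			// Keys and values are copied as-is; order within a
			// partition is preserved by the fetch order.
			batch = append(batch, &kgo.Record{Topic: r.Topic, Key: r.Key, Value: r.Value})
		})
		if len(batch) == 0 {
			continue
		}
		// While the target is down, this retry loop stalls, offsets never
		// advance, and source retention acts as the durable buffer.
		for dst.ProduceSync(ctx, batch...).FirstErr() != nil {
			log.Println("target unavailable, retrying")
			time.Sleep(5 * time.Second)
		}
		// Commit on the source only after the target acknowledged.
		if err := src.CommitUncommittedOffsets(ctx); err != nil {
			log.Printf("offset commit failed: %v", err)
		}
	}
}
```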
Can this run air-gapped?
Yes. kaf-mirror and KafScale both run without internet egress. Container images can be mirrored into your own registry. The only requirements are a reachable S3-compatible endpoint (MinIO, Ceph, or another on-prem object store) and a reachable etcd ensemble. Both can sit inside your network boundary or inside a classified enclave. The data path makes no external calls.
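As a small illustration of the object-store side, aiming an S3 client at an in-boundary endpoint is a configuration detail, not a code change. The sketch below uses aws-sdk-go-v2 with a placeholder MinIO address; KafScale's own configuration surface may differ.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	// Point the client at an in-boundary MinIO or Ceph endpoint.
	// Path-style addressing is what most on-prem object stores expect;
	// no request leaves the network boundary.
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("https://minio.internal.example:9000") // placeholder
		o.UsePathStyle = true
	})
	_ = client // hand off to the data path from here
}
```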
What is the realistic ingestion ceiling?
A single kaf-mirror replication job sustains roughly 250 MB/s with zstd compression and a 50 MB fetch batch. Larger estates run multiple jobs in parallel, partitioned by topic prefix; a rough sizing sketch follows below. KafScale itself scales horizontally with stateless broker pods, and S3 is effectively unbounded at the throughput tier most BDR workloads need. The bottleneck is almost always source-cluster egress, not the BDR side.
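A back-of-envelope sizing sketch using the per-job ceiling above. The 70% headroom factor and the prefix-hash assignment are planning assumptions for illustration, not kaf-mirror defaults.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// jobsNeeded sizes the number of parallel kaf-mirror jobs from aggregate
// source throughput, using the ~250 MB/s per-job ceiling with headroom.
func jobsNeeded(totalMBps float64) int {
	const perJobMBps = 250.0
	const headroom = 0.7 // run each job at ~70% of its ceiling (assumption)
	return int(math.Ceil(totalMBps / (perJobMBps * headroom)))
}

// jobFor assigns a topic prefix to one of n jobs by hashing it, so all
// topics sharing a prefix (e.g. "payments.") land on the same job.
func jobFor(topicPrefix string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(topicPrefix))
	return int(h.Sum32()) % n
}

func main() {
	n := jobsNeeded(900) // e.g. a 900 MB/s estate -> 6 jobs
	fmt.Println("jobs:", n)
	for _, p := range []string{"payments.", "orders.", "telemetry."} {
		fmt.Printf("%s -> job %d\n", p, jobFor(p, n))
	}
}
```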
What does Scalytics provide on top of the open source?
Architecture review for BDR and DR strategy, hardening for regulated environments, DORA and NIS2 evidence packs mapped to your specific topics and retention, integration with Apache Wayang for federated analytics over the BDR data, and 24/7 enterprise support contracts. The repositories remain Apache 2.0. The product roadmap is independent of any specific support engagement.
Will offsets match between source and BDR?
For PITR and disaster recovery, exact offset preservation across vendors is not the design goal. kaf-mirror preserves order within a partition, preserves message keys, and uses idempotent production to keep duplicates rare. The recovered topic on KafScale has its own offset space, which is the expected behavior for a backup target. If your workload requires offset-preserving replication for active-active or hot failover, that is a different pattern with different tradeoffs and we will say so on the briefing call.
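One practical consequence: consumers of a recovered topic should seek by timestamp rather than replay offsets saved from the source cluster. A minimal franz-go sketch, with placeholder broker and topic names:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	// The recovered topic has its own offset space, so resume from a
	// point in time instead of a stored source-cluster offset.
	cutover := time.Date(2025, 6, 1, 0, 0, 0, 0, time.UTC)
	client, err := kgo.NewClient(
		kgo.SeedBrokers("kafscale:9092"),      // placeholder
		kgo.ConsumeTopics("orders.recovered"), // placeholder
		kgo.ConsumeResetOffset(kgo.NewOffset().AfterMilli(cutover.UnixMilli())),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	fetches := client.PollFetches(context.Background())
	fetches.EachRecord(func(r *kgo.Record) {
		fmt.Printf("key=%s offset=%d ts=%s\n", r.Key, r.Offset, r.Timestamp)
	})
}
```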
How long does a full pilot take?
Two weeks for a working pilot in a non-production environment, including kaf-mirror deployment, one KafScale cluster on your Kubernetes, three to five topics mirrored, and one restore-test report generated. Four to six weeks for a production rollout with compliance evidence aligned to your specific DORA or NIS2 control set. The architecture sprint covers both phases.