Kafka backup, disaster recovery, point-in-time restore

One spine.
Every Kafka.
Mirrored to S3.

kaf-mirror live-syncs Confluent, Apache Kafka, Redpanda, or MSK to a KafScale spine on S3. Immutable. Point-in-time recoverable. Apache 2.0 on both ends. Production stays untouched. The same infrastructure becomes your AI agent platform.

[live mirror · prod → bdr]

  Production (Confluent or any Kafka · source-agnostic) · untouched · real-time SLA preserved
    → kaf-mirror · live sync · franz-go · idempotent · regex map · 112 ms replication lag
    → KafScale · BDR spine · stateless brokers · .kfs to S3 · 412,803 segments
    → S3 · object lock · immutable · 11 nines durability

  Events mirrored: 84,217,000 · PITR window: unbounded · License: Apache 2.0

Replication is not backup

If you can delete prod with one command, your DR cluster deletes with it.

Cluster Linking, MirrorMaker 2, and MSK Replicator are excellent at one thing: keeping a hot standby in sync with production. None of them protect against a misfired `kafka-topics --delete`, a ransomware attack that walks through the producer credentials, or a retention misconfiguration that drops three weeks of audit data overnight. The deletion replicates. The corruption replicates. The compromised credentials replicate. A Kafka backup and disaster recovery strategy needs something replication cannot provide: an immutable, point-in-time, off-vendor copy of the log.

The three failure modes

Where hot replication quietly fails you.

Every production Kafka incident postmortem we have read in the last two years sits in one of three buckets. None of them are solved by the standby cluster you already pay for. They are solved by an immutable, time-indexed copy of the log that lives outside the production blast radius.

01 · Operator error

The accidental delete propagates.

An engineer runs `kafka-topics --delete` against production thinking it is staging. A `terraform apply` removes 14 topics. MirrorMaker 2 and Cluster Linking do exactly what they were configured to do: mirror the deletion to the DR cluster. Now you have two copies of nothing. KIP-382 itself notes MirrorMaker is "insufficient for many use cases, including backup, disaster recovery, and fail-over scenarios."

02 · Ransomware in the stream

Credentials get compromised.

A producer key leaks. An attacker writes malformed events into a payments topic, or worse, deletes consumer offsets and resets retention to one hour. The standby mirrors the writes. Replication is not isolation. The only defense is a copy that the production credentials cannot reach, stored under different keys, under retention rules production cannot modify.

03 · No rewind

Replication shows current state. Backup shows history.

A bug ships at 09:14 and writes corrupt events for 47 minutes before someone notices. You need the log as of 09:13. A hot standby gives you the log as of right now, with the same corruption. Point-in-time recovery requires immutable segments indexed by timestamp, which is exactly what KafScale .kfs segments on S3 with object lock provide.

The architecture

Production untouched. BDR on S3 economics.

The pattern uses two open-source components Scalytics already maintains. kaf-mirror is a high-performance franz-go replicator with an enterprise dashboard, AI anomaly detection, and built-in compliance reporting. KafScale is a stateless, Kafka-protocol-compatible streaming spine that writes immutable .kfs segments to S3. Together they form a backup-DR target that does not require a second production-sized cluster, does not lock you to a vendor, and does not double your storage bill.

  Source cluster (production · untouched): Confluent / Apache / MSK / Redpanda · real-time SLA preserved
    → Replication engine (kaf-mirror · live sync): franz-go · idempotent producer · same-name or regex mapping · dashboard · AI anomalies · audit
    → BDR target (KafScale + S3 object lock): stateless brokers · etcd metadata · .kfs segments · immutable · 11 nines durability

Point-in-time recovery works because every .kfs segment carries its commit timestamp in object metadata. To rewind a topic, KafScale lists segments under the topic prefix, filters to those that closed before the target timestamp, and exposes them as a recovered topic that consumers can replay. The procedure for the operator is one command: kafscale-cli restore --topic orders --to "2026-05-13T14:23:00Z". The data path is the same Kafka wire protocol your consumers already speak.
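
A minimal Go sketch of that segment-selection step, assuming an AWS S3 bucket, a topics/<name>/ prefix layout, and a commit-ts metadata key. The real key name and prefix layout come from the .kfs specification, not from this sketch.

// segment_select.go · hypothetical sketch of PITR segment selection,
// not the actual KafScale implementation.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	// Restore target: the moment just before the bad deploy.
	target, err := time.Parse(time.RFC3339, "2026-05-13T14:23:00Z")
	if err != nil {
		log.Fatal(err)
	}

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	bucket := "kafscale-bdr-segments" // hypothetical bucket name
	pager := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String("topics/orders/"), // assumed prefix layout
	})
	for pager.HasMorePages() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		for _, obj := range page.Contents {
			// Each segment carries its commit timestamp in object
			// metadata; HeadObject reads it without fetching the body.
			head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
			})
			if err != nil {
				log.Fatal(err)
			}
			ts, err := time.Parse(time.RFC3339, head.Metadata["commit-ts"]) // assumed key
			if err != nil {
				continue // skip segments without a parseable timestamp
			}
			// Keep only segments that closed before the restore target.
			if ts.Before(target) {
				fmt.Println("recover:", *obj.Key)
			}
		}
	}
}
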
Three sources · one BDR target

Connect any Kafka. The mirror config is six lines.

kaf-mirror reads its base configuration from configs/default.yml and stores runtime job state in SQLite. The schema is identical for every Kafka-protocol source. What changes between Confluent, Apache, and Redpanda is the bootstrap address and the SASL mechanism. The target block points at the KafScale proxy DNS endpoint. Once both clusters are configured, replication jobs are created interactively with mirror-cli jobs add, or via the REST API. The actual schema below comes from the upstream scalytics/kaf-mirror repository.
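
For automation, the same job can be created over the REST API. The call below is only a hypothetical shape inferred from the CLI flags; the actual endpoint path and payload fields are defined in the upstream scalytics/kaf-mirror repository.

# Hypothetical REST equivalent of `mirror-cli jobs add`; endpoint path
# and JSON field names are assumptions, not the documented API.
curl -X POST "https://kaf-mirror.internal/api/v1/jobs" \
  -H "Authorization: Bearer ${MIRROR_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"source":"confluent-prod","target":"kafscale-bdr","topics":["orders.*","payments.*"],"mapping":"same-name"}'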

Confluent Cloud → KafScale BDR

SASL/PLAIN · API key auth · TLS required
configs/default.yml
# kaf-mirror · Confluent Cloud → KafScale BDR target
# Confluent issues SASL/PLAIN API keys. Confluent egress is
# charged per GB; increase fetch_max_bytes and linger_ms to
# amortize. Production traffic on Confluent is unaffected.
 
source_cluster:
  name: "confluent-prod"
  brokers: "pkc-xxxxx.eu-west-1.aws.confluent.cloud:9092"
  security:
    enabled: true
    sasl_mechanism: "PLAIN"
    sasl_username: "${CC_API_KEY}"
    sasl_password: "${CC_API_SECRET}"
    tls:
      enabled: true
 
target_cluster:
  name: "kafscale-bdr"
  brokers: "kafscale-proxy.bdr.svc.cluster.local:9092"
  security:
    enabled: true
    sasl_mechanism: "SCRAM-SHA-256"
    sasl_username: "${KAFSCALE_USER}"
    sasl_password: "${KAFSCALE_PASS}"
 
# Tune for Confluent egress economics
consumer:
  fetch_max_bytes: 52428800     # 50 MB batches
  auto_offset_reset: "earliest"
 
producer:
  idempotent: true
  acks: "all"
  compression: "zstd"
  linger_ms: 50
create the replication job
# Job creation via mirror-cli (interactive)
./mirror-cli login admin
./mirror-cli jobs add \
  --source confluent-prod \
  --target kafscale-bdr \
  --topics "orders.*,payments.*,audit.*" \
  --exclude "_confluent.*,__.*" \
  --mapping same-name

Apache Kafka (self-managed) → KafScale BDR

SASL/SCRAM-SHA-512 · mTLS · on-prem or self-hosted
configs/default.yml
# kaf-mirror · Apache Kafka → KafScale BDR target
# Self-managed Apache Kafka with SCRAM-SHA-512 and TLS.
# Works identically against Strimzi, AKHQ-managed clusters,
# and Confluent Platform (the self-hosted product).
 
source_cluster:
  name: "prod-apache-kafka"
  brokers: "kafka-prod-1.internal:9093,kafka-prod-2.internal:9093,kafka-prod-3.internal:9093"
  security:
    enabled: true
    sasl_mechanism: "SCRAM-SHA-512"
    sasl_username: "${KAFKA_USER}"
    sasl_password: "${KAFKA_PASS}"
    tls:
      enabled: true
      ca_cert: "/etc/kaf-mirror/ca.pem"
      verify_hostname: true
 
target_cluster:
  name: "kafscale-bdr"
  brokers: "kafscale-proxy.bdr.svc.cluster.local:9092"
  security:
    enabled: true
    sasl_mechanism: "SCRAM-SHA-256"
    sasl_username: "${KAFSCALE_USER}"
    sasl_password: "${KAFSCALE_PASS}"
 
producer:
  idempotent: true
  acks: "all"
  compression: "zstd"
create the replication job
./mirror-cli login admin
./mirror-cli jobs add \
  --source prod-apache-kafka \
  --target kafscale-bdr \
  --topics "orders.*,payments.*,audit.*" \
  --exclude "*.tmp,*.debug" \
  --mapping same-name \
  --batch-size 50000 \
  --compression zstd
 
# Inspect job state at any time
./mirror-cli jobs status full

Redpanda → KafScale BDR (regex namespace)

SCRAM-SHA-256 · regex topic mapping
configs/default.yml
# kaf-mirror · Redpanda → KafScale BDR target
# Redpanda speaks the Kafka wire protocol natively. The only
# config delta from Apache is the bootstrap address pattern.
# Mapped topics are namespaced under bdr.* on the target.
 
source_cluster:
  name: "redpanda-prod"
  brokers: "seed-0.redpanda-prod.internal:9092,seed-1.redpanda-prod.internal:9092,seed-2.redpanda-prod.internal:9092"
  security:
    enabled: true
    sasl_mechanism: "SCRAM-SHA-256"
    sasl_username: "${RP_USER}"
    sasl_password: "${RP_PASS}"
    tls:
      enabled: true
 
target_cluster:
  name: "kafscale-bdr"
  brokers: "kafscale-proxy.bdr.svc.cluster.local:9092"
  security:
    enabled: true
    sasl_mechanism: "SCRAM-SHA-256"
    sasl_username: "${KAFSCALE_USER}"
    sasl_password: "${KAFSCALE_PASS}"
 
producer:
  idempotent: true
  acks: "all"
  compression: "zstd"
create the replication job (regex mapping)
./mirror-cli login admin
./mirror-cli jobs add \
  --source redpanda-prod \
  --target kafscale-bdr \
  --topics "events.*" \
  --mapping regex \
  --pattern "events\.(.*)" \
  --replacement "bdr.events.\$1"
 
# events.orders   on Redpanda
# bdr.events.orders   on KafScale
Cost reality · planning estimate

BDR insurance for the cost of an S3 bucket.

Reference workload: 3 TB per day ingest, 30-day retention (90 TB), one mid-size mirrored estate. Storage and replication line items only. Networking egress and support contracts excluded. Source: vendor public pricing and AWS public pricing as of April 2026. The point is the ratio, not the exact dollar figure. Your numbers will move with throughput and region, but the column ordering does not.

| BDR approach | Storage cost / month | PITR | Source lock-in | License |
|---|---|---|---|---|
| Confluent Cluster Linking (Enterprise → Enterprise) | ~$38,000 (doubles primary) | No | Confluent both ends | Proprietary |
| MSK Replicator (cross-region) | ~$22,500 + cross-region transfer | No | MSK both ends | Proprietary |
| Redpanda Shadowing + WCR | ~$18,000 (Enterprise license required) | Whole-cluster only | Redpanda source | Source available, Enterprise paywall |
| MirrorMaker 2 → second open Kafka cluster | ~$12,000 (second cluster + ops) | No | None, but offset drift | Apache 2.0 |
| kaf-mirror → KafScale on S3 (this page) | ~$2,100 (90 TB at $0.023/GB) | Yes, per-topic timestamp | None, any Kafka source | Apache 2.0 on both ends |
▸ Confluent figures derived from published eCKU pricing and the cluster-linking line item.
▸ MSK figures from the AWS MSK pricing page, kafka.m5.large × 3 brokers, 30-day retention.
▸ S3 storage at $0.023 per GB-month (us-east-1). API charges trivial at this scale.
▸ Compute for kaf-mirror and KafScale brokers is a small Kubernetes deployment, typically under $300/month.
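
The storage line for this pattern is plain arithmetic from the footnoted rate:

  90 TB retained × 1,024 GB/TB        = 92,160 GB
  92,160 GB × $0.023 per GB-month     ≈ $2,120/month storage
  + kaf-mirror / KafScale compute     ≈ $300/month
  total                               ≈ $2,400/month, against ~$12,000 for the cheapest second-cluster alternative
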
Two outcomes · one deployment

BDR today. Agent spine tomorrow.

The same kaf-mirror plus KafScale infrastructure you deploy for backup and disaster recovery is the substrate AI agent workloads actually need. Production Kafka serves real-time SLAs in single-digit milliseconds. Agents need months of history, replayable decision logs, and long-retention prompt traces. Those two access patterns cannot share a cluster without one starving the other. So the BDR copy already solves the second problem: it is durable, queryable, isolated from production, and built for retention measured in years instead of hours. One Kubernetes deployment. One S3 bucket. Two strategic capabilities.

use case · BDR

Disaster recovery and compliance.

The reason this pattern gets budget approval. Immutable S3 segments, per-topic point-in-time restore, DORA Article 12 and NIS2 Annex II evidence packs from the same dashboard. Production stays untouched, the BDR cluster has no shared credentials, and the recovery procedure is one documented command.

  • Per-topic PITR via segment timestamps
  • S3 object lock, separate keys
  • Compliance reports daily, weekly, monthly
  • Restore tested by scheduler, not on the day
shared infrastructure

Same operator. Same bucket. Same credentials boundary.

Both outcomes deploy on the same Kubernetes operator, write to the same S3 bucket under different prefixes, and run under the same separation-of-duties model. No second platform team. No second procurement cycle. The BDR investment buys the agent platform at zero marginal infrastructure cost.

  • One KafScale operator on Kubernetes
  • One S3 bucket, prefix-namespaced workloads (layout sketched after this list)
  • One etcd ensemble for metadata
  • One audit trail across both use cases
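
A hypothetical prefix layout for that single bucket; the actual prefixes are whatever your operator configuration names them:

s3://kafscale-bdr-segments/
  bdr/orders/...        # immutable .kfs segments · object lock · BDR retention
  bdr/payments/...
  agents/traces/...     # agent decision logs · years of retention
  agents/replays/...    # replayable agent runs · no object lock required
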
use case · AI agents

Replay, retention, reasoning over history.

Agent workloads cannot run on the same brokers that serve real-time. Replay over months of history burns broker disk, fights production retention policies, and exposes the cluster to slow-consumer failure modes. KafScale segments live in S3 and are read directly by agents, processors, and replay tools, bypassing brokers entirely. The same data that protects you also powers the experimentation.

  • Decision logs, tool calls, prompt history
  • Replay any agent run from any point
  • Years of retention on S3 economics
  • Zero broker contention with production

Why this matters now

Most platform teams are being asked to support AI agent workloads on infrastructure that was designed for real-time streaming. Putting agent replay traffic on a Confluent or Apache cluster that also serves sub-10ms production paths is how you get on-call pages at 2 AM. The KafScale BDR target is already isolated, already durable, already on S3. Pointing agent workloads at it requires no new procurement and no new architecture review. See KafClaw for the agent runtime that runs on top of this spine.

Honest comparison · May 2026

Where this pattern fits, and where others fit better.

The Kafka BDR market is small but real, and every option has a niche where it is the right answer. Kannika Armory is the most direct comparison: a mature, purpose-built commercial product with strong DORA and NIS2 positioning. OSO kafka-backup is the closest open-source equivalent, although it is a CLI tool rather than a platform. The hyperscaler-native and Kafka-vendor-native options each work well when you are already committed to the matching ecosystem. The table below is what an honest procurement evaluation looks like.

| Solution | License | Point-in-time recovery | Source flexibility | Compliance UI | Doubles as agent platform | Best for |
|---|---|---|---|---|---|---|
| Kannika Armory | Commercial (EU) | Yes, native | Confluent, Apache Kafka | Yes, mature | No | Regulated EU teams who want a single-purpose, sales-supported BDR product and accept closed source. |
| OSO kafka-backup | MIT (Rust CLI) | Yes, ms precision | Any Kafka | No, CLI only | No | Single-team backups, manual runbooks, no compliance reporting requirement. |
| Confluent Cluster Linking | Proprietary | No, replication only | Confluent both ends | Confluent Console | No | Confluent-to-Confluent active-passive failover for the existing customer base. |
| MSK Replicator | Proprietary | No, replication only | MSK both ends | AWS Console | No | Single-AWS-account, same-region or cross-region MSK replication. |
| MirrorMaker 2 | Apache 2.0 | No, KIP-382 acknowledges this | Any Kafka | No | No | Migration, geo-replication, and hot standby where offset drift is acceptable. |
| Redpanda Shadowing + WCR | BSL, Enterprise paywall | Whole-cluster only | Redpanda source | Redpanda Console | No | Redpanda-native teams already on the Enterprise tier. |
| Lenses Stream Reactor S3 | Apache 2.0 (connector) | No, batch sink | Any Kafka via Connect | No | No | Teams already operating a Kafka Connect cluster who want a raw archive to S3. |
| Scalytics: kaf-mirror + KafScale | Apache 2.0 both ends | Yes, per-topic timestamp | Any Kafka wire protocol | Yes, with AI anomalies | Yes, same infrastructure | Teams who want open source on both ends, source-agnostic BDR, and one deployment that also serves AI agent workloads. |
▸ Kannika Armory ships a polished commercial product with strong DORA and NIS2 framing. Where Scalytics differs: Apache 2.0 on both ends, source-available before procurement, and the BDR target is also the agent platform.
▸ OSO kafka-backup is the closest open-source equivalent on the backup side and is excellent at its scope. Where Scalytics differs: live mirror instead of batch, web dashboard, compliance reports, and an integrated streaming target.
▸ MirrorMaker 2 is the right tool for replication. KIP-382 itself states it is insufficient for backup and disaster recovery.
▸ Cluster Linking, MSK Replicator, Redpanda Shadowing are vendor-native and require the matching vendor on both ends.
Information current as of May 2026. Vendor product pages reviewed, public pricing referenced where available.
DORA · NIS2 · ISO 27001

Evidence regulators ask for. Built into the stack, not bolted on.

DORA Article 12 requires financial entities to maintain immutable backups with rigorously tested recovery procedures. NIS2 Annex II requires comparable controls for essential and important entities. Both frameworks require evidence: the audit trail, the retention proof, the recovery test. kaf-mirror generates compliance reports daily, weekly, or monthly with full audit trails. KafScale stores the actual log segments in S3 under object lock. The combination produces regulator-grade evidence without a separate audit pipeline.
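
Object lock is enforced at bucket creation, outside any Kafka credential. A minimal sketch with the AWS CLI, assuming a hypothetical bucket name and a one-year default retention:

# Object lock must be enabled when the bucket is created
aws s3api create-bucket \
  --bucket kafscale-bdr-segments \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket

# Compliance mode: nobody, including the bucket owner, can shorten retention
aws s3api put-object-lock-configuration \
  --bucket kafscale-bdr-segments \
  --object-lock-configuration \
    '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'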

DORA · Article 12

Backup and restoration of ICT systems

  • Article 12(1) · Immutable backups: S3 Object Lock in compliance mode prevents deletion and modification regardless of credentials. Retention rules enforced at the bucket policy layer.
  • Article 12(2) · Restoration procedure: per-topic point-in-time restore via a documented kafscale-cli command. Reproducible. Tested via the kaf-mirror compliance scheduler.
  • Article 12(3) · Segregation: BDR target runs on separate credentials, separate keys, separate cluster, separate bucket. Production credentials cannot reach BDR storage.
  • Article 12(4) · Testing: the kaf-mirror Compliance tab generates monthly restore-test reports with timestamps and recovered offsets.
NIS2 · Annex II

Cybersecurity risk-management measures

  • Annex II(1)(d) · Business continuity: air-gappable BDR target in a separate availability zone, region, or on-premises cluster. KafScale runs without internet egress.
  • Annex II(1)(e) · Supply chain security: both kaf-mirror and KafScale are Apache 2.0, source available, with no proprietary control plane that can revoke access or change pricing.
  • 24-hour incident notification: kaf-mirror anomaly detection surfaces replication lag spikes, schema drift, and credential failures as alertable signals before they become incidents.
  • Audit trail: role-based access (admin/operator/monitor/compliance). Every state-modifying action is logged with actor, timestamp, and outcome.
Architecture commitments

Red lines, in plain text.

A buyer running BDR for a regulated process reads architectural commitments more carefully than they read marketing claims. Each line below is enforced in the build pipeline, documented in the source, and visible at the protocol layer.

  • Production traffic is never affected. kaf-mirror consumes via a read-only credential. The producer to KafScale is a separate process. Production brokers see one more consumer, nothing else.
  • The BDR target is reachable only from kaf-mirror. Network policies, distinct credentials, distinct encryption keys, distinct S3 bucket. A compromised producer key on prod cannot reach the BDR copy.
  • S3 segments are immutable under object lock. Compliance mode prevents deletion until the retention clock expires. Not even the bucket owner can override.
  • Point-in-time recovery uses segment commit timestamps, not offset mapping. Unlike MirrorMaker, there is no offset translation table that drifts under retention pressure.
  • The .kfs segment format is part of the public specification. If kaf-mirror or KafScale ever disappear, the segments in your bucket remain readable by any conformant reader.
  • No vendor control plane. Both tools run on your Kubernetes, your S3, your network. There is no external service to revoke access or rate-limit operations.
  • The license is Apache 2.0 on both sides. No BSL conversion clauses, no per-GiB usage fees, no commercial-use restrictions. Now and after a hypothetical acquisition.
Honest limits

What this pattern is not.

The same architectural choices that make this BDR pattern cheap also make it the wrong choice for a small set of workloads. We name them here because the cost of finding out after deployment is high.

not this

Sub-10ms latency failover

For synchronous trading or fraud-detection paths that require RPO=0 and millisecond failover, run a stretch cluster with synchronous replication across availability zones. KafScale produce latency is 200 to 500ms because writes commit to S3 before acknowledgement.

not this

Confluent feature replacement

ksqlDB, Schema Registry mirroring, Stream Designer, Connect-managed connectors. KafScale is transport and storage only. If those features are why you run Confluent, this is a BDR pattern alongside Confluent, not a replacement for it. Scalytics is a Confluent partner.

not this

Active-active multi-region

Active-active with offset-preserving bi-directional replication is a different problem with different tradeoffs. WarpStream Orbit and Confluent Cluster Linking with mirror topics both target that pattern. This pattern is active-passive BDR with an immutable rewindable copy.

Frequently asked before a briefing

The questions your CISO and your VP of Engineering will both ask.

Answered here so the briefing can go to your actual constraints. Repository links resolve to the upstream Apache 2.0 source. Architecture references map to public documentation.

Does this replace our Confluent or Apache Kafka cluster?
No. The pattern is explicitly additive. kaf-mirror sits next to your production cluster as one more read-only consumer. KafScale is a parallel spine for the BDR copy. Production traffic, ksqlDB, Connect, Stream Governance, and every other feature you rely on continues to work exactly as before. The first deployment can be one mirrored topic in a single namespace; the production team is not required to change a single line of producer code.
How does point-in-time recovery actually work?
KafScale segment writes carry their commit timestamp in S3 object metadata. The kafscale-cli restore command takes a topic name and a target timestamp, lists segments under the topic prefix in S3, filters to those that closed before the target time, and exposes them as a recovered topic that standard Kafka consumers can read. The procedure works without a running source cluster, which is why it qualifies as a backup rather than a replica.
What happens during a long target-cluster outage?
kaf-mirror does not require a separate disk buffer. The internal franz-go producer has an in-memory buffer with idempotent retries for transient interruptions. If KafScale is unavailable for an extended period, the consumer applies backpressure and stops advancing the offset on the source cluster. The source cluster itself acts as the durable buffer through its existing retention. When KafScale returns, replication resumes from the last committed offset, no data lost. This is documented in the kaf-mirror Resilience section.
Can this run air-gapped?
Yes. kaf-mirror and KafScale both run without internet egress. Container images mirror to your registry. The only requirements are a reachable S3-compatible endpoint (MinIO, Ceph, or on-prem object store) and a reachable etcd ensemble. Both can sit inside your network boundary or inside a classified enclave. The data path makes no external calls.
What is the realistic ingestion ceiling?
A single kaf-mirror replication job handles up to roughly 250 MB/s sustained with zstd compression and a 50 MB fetch batch. Larger estates run multiple jobs in parallel, partitioned by topic prefix. KafScale itself scales horizontally with stateless broker pods, and S3 is effectively unbounded for the throughput tier most BDR workloads need. The bottleneck is almost always source-cluster egress, not the BDR side.
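Partitioning by prefix reuses the flags already shown above; a sketch, assuming the same cluster names as the Apache example:

# Two parallel jobs, split by topic prefix (hypothetical partitioning)
./mirror-cli jobs add --source prod-apache-kafka --target kafscale-bdr \
  --topics "orders.*,payments.*" --mapping same-name
./mirror-cli jobs add --source prod-apache-kafka --target kafscale-bdr \
  --topics "audit.*,events.*" --mapping same-name
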
What does Scalytics provide on top of the open source?
Architecture review for BDR and DR strategy, hardening for regulated environments, DORA and NIS2 evidence packs mapped to your specific topics and retention, integration with Apache Wayang for federated analytics over the BDR data, and 24/7 enterprise support contracts. The repositories remain Apache 2.0. The product roadmap is independent of any specific support engagement.
Will offsets match between source and BDR?
For PITR and disaster recovery, exact offset preservation across vendors is not the design goal. kaf-mirror preserves order within a partition, preserves message keys, and uses idempotent production to keep duplicates rare. The recovered topic on KafScale has its own offset space, which is the expected behavior for a backup target. If your workload requires offset-preserving replication for active-active or hot failover, that is a different pattern with different tradeoffs and we will say so on the briefing call.
How long does a full pilot take?
Two weeks for a working pilot in a non-production environment, including kaf-mirror deployment, one KafScale cluster on your Kubernetes, three to five topics mirrored, and one restore-test report generated. Four to six weeks for a production rollout with compliance evidence aligned to your specific DORA or NIS2 control set. The architecture sprint covers both phases.
Next step

Replace the standby cluster with an immutable copy.

Forty-five minutes. Architecture review, BDR pattern mapped to your topology, kaf-mirror configuration for your specific source, compliance evidence pack for DORA or NIS2. Bring the failure scenario that keeps your on-call team awake.