Scalytics | Shift Left Architecture and Data Firewalls in Federated Computing

October 23, 2024

In Part 1 of this series we outlined why centralized architectures and data lakes fail to support modern AI workloads. The core problem is simple. Many organizations will not allow sensitive data to leave their secure network or jurisdiction. This constraint exists in finance, energy, healthcare, and public sector environments and defines the boundaries where any real data strategy must operate.

Cloud providers attempt to solve this by placing services inside private networks, yet these solutions still encourage data movement. Data is transferred between silos, duplicated into analysis systems, and pushed into reporting environments. The security context is lost and governance becomes harder rather than easier.

A different architecture is required. This is where the shift left paradigm and data firewalls enter the picture.

Many organizations absolutely do not want their sensitive data to leave their secure network environment.

Shift Left for Data: Bring Algorithms to the Data

Why shift left matters

The shift left paradigm reverses the traditional ETL model. Instead of moving data to computation, computation is placed where the data already lives. This can occur inside:

Operational databases
Storage clusters
Streaming systems
On-premises transactional systems
Regulated environments and sovereign data zones

Only intermediate or aggregated results leave the source. Sensitive raw data remains inside its secure zone.
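To make the idea concrete, here is a minimal sketch of shift left aggregation. It is an illustration only, not Scalytics Federated code: the names `LocalAggregate`, `aggregate_in_zone`, and `combine`, as well as the sample values, are assumptions made for this example. Each zone computes a count and a sum over its own records, and only those two numbers cross the zone boundary.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalAggregate:
    """Hypothetical container: the only object that leaves a zone in this sketch."""
    zone: str
    count: int
    total: float


def aggregate_in_zone(zone: str, amounts: Iterable[float]) -> LocalAggregate:
    # Runs inside the secure zone. Raw values are read here and never returned.
    values: List[float] = list(amounts)
    return LocalAggregate(zone=zone, count=len(values), total=sum(values))


def combine(aggregates: Iterable[LocalAggregate]) -> float:
    # Runs outside the zones. It sees counts and sums, never raw records.
    items = list(aggregates)
    n = sum(a.count for a in items)
    if n == 0:
        raise ValueError("no records in any zone")
    return sum(a.total for a in items) / n


# Made-up raw data that stays inside each zone.
eu_zone = aggregate_in_zone("eu", [100.0, 200.0, 300.0])
us_zone = aggregate_in_zone("us", [50.0])

# The global mean is computed from the aggregates alone.
global_mean = combine([eu_zone, us_zone])
print(global_mean)
```

Note that aggregates over very small groups can still reveal individual records, so a real deployment would likely add rules such as minimum group sizes before a result is released.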

This aligns with the emerging model of data products, where each product has its own governance context, security requirements, and jurisdictional boundaries.

Data + AI Fabric with Federated Learning

Federated execution across data products

In this federated architecture, each data product is treated as a zone. Scalytics Federated creates a bridge between these zones without violating the governance boundaries of any of them. Computation happens locally and results are combined in a controlled way.
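One common way to combine local results in a controlled way is federated averaging. The sketch below assumes a tiny linear model and made-up data per zone; `local_step` and `federated_average` are hypothetical names for this example and do not describe the Scalytics Federated API. Each zone takes one gradient step on its own samples, and the coordinator receives only model weights and sample counts.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_step(weights: Sequence[float], samples: Sequence[Sample],
               lr: float = 0.01) -> Tuple[List[float], int]:
    """One gradient step for y = w0 + w1 * x, computed inside a zone."""
    w0, w1 = weights
    n = len(samples)
    g0 = 2.0 / n * sum((w0 + w1 * x - y) for x, y in samples)
    g1 = 2.0 / n * sum((w0 + w1 * x - y) * x for x, y in samples)
    # Only the updated weights and the sample count leave the zone.
    return [w0 - lr * g0, w1 - lr * g1], n


def federated_average(updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Average zone weights, weighted by how many samples each zone used."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Made-up zone data following y = 2x. It never leaves its zone.
zones: Dict[str, List[Sample]] = {
    "zone_a": [(1.0, 2.0), (2.0, 4.0)],
    "zone_b": [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
}

weights = [0.0, 0.0]
for _ in range(200):
    updates = [local_step(weights, samples) for samples in zones.values()]
    weights = federated_average(updates)

print(weights)  # the slope weights[1] moves toward 2
```

Weighting by sample count keeps a zone with few records from dominating the shared model. Production federated learning usually adds protections such as secure aggregation, which this sketch leaves out.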

The Role of the Data Firewall

A technical boundary, not a metaphor

The Scalytics Federated runtime deploys edge nodes inside the organization’s secure network. These nodes access local operational systems such as SAP, Oracle, Salesforce, or industry-specific transactional systems. A data plane is established across zones using open protocols such as HTTPS, MQTT, or the Kafka protocol.

The critical rule is simple. Only information that is allowed to be shared across zones enters the data plane. Sensitive data never leaves the secure zone where it originates.

How the data firewall behaves

The data firewall acts as a controlled membrane. It permits approved processing requests to enter the zone and blocks everything else. It permits algorithm fragments or model components to execute locally but prevents raw data from leaving the zone.

This ensures:

No uncontrolled copies
No cross-region transfers
No bypass of governance rules
No loss of sovereignty
Full auditability of every operation
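The membrane behavior described above can be sketched in a few lines. This is an assumption-laden illustration, not the product's implementation: `DataFirewall`, `FirewallViolation`, the operation names, and the rule that only plain numbers may leave the zone are all invented for this example. A real firewall would apply far richer policies than a type check.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


class FirewallViolation(Exception):
    """Raised when a request or its result is not permitted by the zone's rules."""


@dataclass
class DataFirewall:
    zone: str
    approved_operations: Set[str]
    # Assumption for this sketch: only plain numbers may leave the zone.
    exportable_types: Tuple[type, ...] = (int, float)
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def _record(self, operation: str, decision: str) -> None:
        # Every decision is logged, whether the request passed or not.
        self.audit_log.append(
            {"zone": self.zone, "operation": operation, "decision": decision}
        )

    def execute(self, operation: str,
                fn: Callable[[List[dict]], Any], rows: List[dict]) -> Any:
        # Inbound check: only approved processing requests enter the zone.
        if operation not in self.approved_operations:
            self._record(operation, "blocked: operation not approved")
            raise FirewallViolation(f"{operation} is not approved in {self.zone}")
        result = fn(rows)  # the algorithm fragment runs locally
        # Outbound check: raw rows (lists, dicts) must not leave the zone.
        if not isinstance(result, self.exportable_types):
            self._record(operation, "blocked: result not exportable")
            raise FirewallViolation(f"{operation} tried to export raw data")
        self._record(operation, "allowed")
        return result


# Made-up records that live inside the zone.
rows = [{"patient": "a", "age": 40}, {"patient": "b", "age": 60}]
firewall = DataFirewall(zone="hospital-eu",
                        approved_operations={"mean_age", "list_rows"})

mean_age = firewall.execute(
    "mean_age", lambda r: sum(x["age"] for x in r) / len(r), rows
)
print(mean_age)
```

In this sketch, calling `firewall.execute("export_all", ...)` fails the inbound check, and an approved operation that returns the rows themselves fails the outbound check. Both attempts still appear in `audit_log`.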

Monitoring through the processing context

The firewall enforces a processing context for every operation. This allows organizations to answer key questions at any moment:

What data was used
For what purpose
Under which conditions
By which algorithm
With which authorization

This capability is implemented through an open-source API built on Apache Wayang, which provides a uniform execution abstraction across all zones.
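As an illustration of what a processing context record might hold, the sketch below maps the five questions above to fields of a log entry. It is not the Apache Wayang API or the Scalytics API: `ProcessingContext`, `ContextLog`, and every field value are hypothetical and chosen only for this example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ProcessingContext:
    """Hypothetical record answering the five questions for one operation."""
    dataset: str                 # what data was used
    purpose: str                 # for what purpose
    conditions: Tuple[str, ...]  # under which conditions
    algorithm: str               # by which algorithm
    authorization: str           # with which authorization
    timestamp: str


class ContextLog:
    """Append-only, in-memory log. A real system would persist and protect it."""

    def __init__(self) -> None:
        self._entries: List[ProcessingContext] = []

    def record(self, dataset: str, purpose: str, conditions: Tuple[str, ...],
               algorithm: str, authorization: str) -> ProcessingContext:
        entry = ProcessingContext(
            dataset=dataset, purpose=purpose, conditions=conditions,
            algorithm=algorithm, authorization=authorization,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def for_dataset(self, dataset: str) -> List[ProcessingContext]:
        # Answers "who used this data, and how?" for an auditor.
        return [e for e in self._entries if e.dataset == dataset]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self._entries], indent=2)


log = ContextLog()
entry = log.record(
    dataset="claims_2024",
    purpose="fraud model training",
    conditions=("aggregates only", "EU zone"),
    algorithm="federated_average",
    authorization="ticket-1234",  # made-up approval reference
)
print(log.to_json())
```

Freezing the dataclass means an entry cannot be changed after it is written, which matches the audit use case.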

A Turnkey Solution for Data Management

Compliance ready by design

Scalytics Federated is delivered as a turnkey solution. The data network is deployed inside the customer’s infrastructure. Compliance status is assessed during setup, and all rules are transparent and auditable from the first day.

Data ownership and sovereignty remain with the organization. Rules defined inside the data firewall are:

Enforced
Auditable
Visible at all times

This reduces operational risk and strengthens accountability.

Decentralized, data-centered collaboration

This architecture enables a model we call decentralized, data-centered collaboration. It removes unnecessary data movement and eliminates fragile integration pipelines. Each zone retains autonomy while still participating in collaborative analytics and AI.

Shared data planes, secured by local firewalls, allow the organization to:

Train AI models on distributed data
Execute analytics across secured zones
Support agent systems without exposing sensitive data
Support compliance with GDPR, the EU Data Act, and the EU AI Act

Transparency and Governance in Federated Environments

Compliance requires visibility

Regulated industries must know:

Who used which data
For what purpose
Under which conditions
With what results

Scalytics Federated provides continuous visibility into the compliance status of each data operation across all federated zones. This is essential for GDPR and for the new obligations introduced by the EU Data Act and the EU AI Act.

Summary

Scalytics Federated provides a standardized approach to secure and collaborative data use. It replaces fragile data movement with local computation, reduces risk, and allows organizations to operate within strict governance and sovereignty boundaries.

AI systems, agents, and analytics can now run inside secure zones without compromising sensitive data. Data sovereignty is maintained, unnecessary copies disappear, and collaborative insight becomes possible across the entire enterprise.

Recommended Next Steps for Our Customers

Evaluate your current data architecture: identify zones where shift left execution can replace ETL pipelines.
Establish federated data collaboration zones: use Scalytics Federated to create secure computation layers across existing infrastructure.
Deploy a data firewall: define usage rules, enforce governance, and create transparent auditability.
Train your teams on decentralized, data-centered collaboration: build internal capability to use federated data processing effectively.
Adopt a compliance-first culture: prepare for GDPR, the EU Data Act, and the EU AI Act with an architecture that supports continuous oversight.

About Scalytics

Scalytics architects mission-critical streaming, federated execution, and sovereign AI systems. We help defense, infrastructure, and regulated organizations turn real-time data streams into trusted decisions reliably and under production load.
Our founding team created Apache Wayang, the federated execution framework that lets computation run where the data lives and dramatically reduces unnecessary data movement.
We also built and maintain kafSCALE, a high-performance, Kafka-compatible streaming platform designed for Kubernetes and object storage. It delivers elastic scale without broker complexity or lock-in.

Our mission: Keep data in place. Bring compute to the data. Enable secure, sovereign, and production-ready AI operations.