Scalytics is modernizing how organizations process, govern, and operationalize data in regulated and distributed environments. Our platform, Scalytics Federated, enables analytics, machine learning, and AI to run directly where data resides. Instead of moving, copying, or centralizing information, computation is executed across existing systems and infrastructures. This helps organizations reduce complexity, strengthen compliance, and unlock value from data that traditionally remained siloed.
We are the team behind Scalytics: Zoi, Alexander, Kaustubh, and Mirko.

A few words about us, our story, and why we are here:
Zoi Kaoudi
Zoi leads our research vision in distributed systems, machine learning, and federated data processing. She is an Associate Professor at the IT University of Copenhagen and the principal architect behind Apache Wayang, the first cross-platform data processing system. Her work on cross-platform optimization introduced concepts that shaped modern data fabrics and continue to influence federated AI research.
Alexander Alten
Alexander contributes more than 15 years of experience in big data, distributed systems, AI, and IoT. His background spans regulated sectors including energy, finance, and healthcare, as well as roles with Cloudera and other data platform providers. Having worked across organizations where regulatory pressure, data fragmentation, and operational constraints collide, he brings the practical perspective that guides our product direction.
Kaustubh Beedkar
Kaustubh is a founding engineer and CTO. He earned his Ph.D. at the Max Planck Institute and the University of Mannheim, publishing research in top-tier venues. He built the federated SQL layer of Apache Wayang and leads the architectural development of Scalytics Federated. Kaustubh also teaches data management at the Indian Institute of Technology, Delhi.
Mirko Kämpf
Mirko contributes deep experience in distributed data, large-scale systems, and enterprise engineering through senior roles at Cloudera, Confluent, and ecolytiq. His expertise bridges open-source practice, data platform architecture, and real-world operational scaling.
In memory of Jorge
Jorge began exploring federated data processing and distributed AI in 2015. He co-developed early prototypes of what later became Apache Wayang and presented this work internationally. Jorge passed away unexpectedly in 2023. His contributions remain fundamental to how Scalytics Federated operates today.
Why We Started Scalytics
Scalytics emerged from a shared frustration with the limitations of traditional data architectures. Across industries we saw the same pattern: excessive data movement, costly ETL pipelines, duplicated systems, and growing regulatory pressure. Teams spent more time integrating platforms than analyzing data. Modern organizations were accumulating more technology but gaining less agility.
We built Apache Wayang to address these challenges programmatically. It introduced a clean abstraction between analytical logic and execution engines, enabling applications to run on Spark, Flink, Postgres, Java, or Python without rewriting code. This work became the technical foundation for Scalytics Federated.
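To make the idea of separating analytical logic from execution engines concrete, here is a minimal Python sketch. It is not the Apache Wayang API: the names `Pipeline`, `EagerListEngine`, and `LazyIteratorEngine` are hypothetical, and the two "engines" are simple in-process stand-ins for real backends such as Spark or Flink. The point is only that the same pipeline definition runs unchanged on either backend.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple

# All names below are hypothetical and for illustration only;
# this is not the Apache Wayang API.


@dataclass
class Pipeline:
    """Analytical logic, described without reference to any engine."""
    steps: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, fn: Callable[[Any], bool]) -> "Pipeline":
        self.steps.append(("filter", fn))
        return self


class EagerListEngine:
    """Stand-in backend that materializes a list after every step."""
    name = "eager-list"

    def execute(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        rows = list(data)
        for kind, fn in pipeline.steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


class LazyIteratorEngine:
    """Stand-in backend that streams rows through lazy iterators."""
    name = "lazy-iterator"

    def execute(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        rows = iter(data)
        for kind, fn in pipeline.steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


# The pipeline is written once and handed to either backend unchanged.
pipeline = Pipeline().filter(lambda x: x % 2 == 0).map(lambda x: x * x)
results = {
    engine.name: engine.execute(pipeline, range(6))
    for engine in (EagerListEngine(), LazyIteratorEngine())
}
```

Both entries in `results` hold the same rows, because the pipeline only describes what to compute and each backend decides how to compute it.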
Our goal was straightforward: make distributed data usable without forcing organizations into centralization strategies or vendor-locked platforms.
The same fragmentation is visible in Matt Turck's Data & AI Landscape, which has grown every year as tools and architectures multiply.

What We Built
Scalytics Federated brings together distributed data, heterogeneous processing systems, and modern AI into one execution and governance layer. It empowers teams to:
- run analytics and AI directly on operational data across different platforms
- minimize redundant ETL and avoid building new data silos
- apply consistent governance and compliance across distributed systems
- modernize existing architectures without replacing them
- accelerate development by abstracting applications from underlying processing engines
The system operates across data lakes, warehouses, operational stores, and edge platforms. By unifying them at the execution layer, organizations can modernize their data strategy without disrupting their infrastructure.
Our Vision
We're taking on the data market players who have purposefully built segregated products to lock clients into single-vendor solutions, hindering data cooperation and complicating compliance with data regulations. We've experienced the frustration, exhaustion, and anger when initiatives fail due to incompatibility, rising costs, and reliance on limited technology. We know what it's like to feel pressure to address real-world data issues while no one is willing to step up. That's what drives us: the determination to revolutionize how we all work with data.

Data architectures have grown complex because the market rewarded siloed products and isolated ecosystems. This has made interoperability difficult and compliance harder. Our vision is to give organizations control over their data processing strategy by providing a neutral, federated, and extensible foundation.
We believe distributed, regulation-aligned processing is the future of enterprise AI. Scalytics Federated is designed to help organizations work with their data where it already is, unlock value without unnecessary movement, and support the next generation of decentralized AI systems.
Alexander, Zoi, Kaustubh, Mirko
About Scalytics
Scalytics Federated provides federated data processing across Spark, Flink, PostgreSQL, and cloud-native engines through a single abstraction layer. Our cost-based optimizer selects the right engine for each operation, reducing processing time while eliminating vendor lock-in.
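As a rough illustration of what per-operation, cost-based engine selection can look like, the Python sketch below picks the cheapest engine for each operation from a toy cost model. Everything in it is an assumption made for illustration: the `EngineCost` class, the `choose_engine` and `plan` helpers, the engine names, and the cost figures are hypothetical and do not describe the actual Scalytics Federated optimizer. The sketch also ignores the cost of moving intermediate results between engines, which a real cross-platform optimizer would presumably need to model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical cost model for illustration only; the numbers are made up
# and do not reflect any real engine or the Scalytics optimizer.


@dataclass(frozen=True)
class EngineCost:
    name: str
    startup: float  # fixed overhead per operation (arbitrary units)
    per_row: float  # marginal cost per input row (arbitrary units)

    def estimate(self, rows: int) -> float:
        """Estimated cost of running one operation over `rows` rows."""
        return self.startup + self.per_row * rows


ENGINES = (
    EngineCost("single-node", startup=1.0, per_row=0.010),
    EngineCost("cluster", startup=500.0, per_row=0.001),
)


def choose_engine(rows: int, engines: Sequence[EngineCost] = ENGINES) -> EngineCost:
    """Return the engine with the lowest estimated cost for this input size."""
    return min(engines, key=lambda e: e.estimate(rows))


def plan(operations: Iterable[Tuple[str, int]]) -> Dict[str, str]:
    """Assign an engine to each (operation name, estimated row count) pair."""
    return {op: choose_engine(rows).name for op, rows in operations}


assignment = plan([("lookup", 1_000), ("join", 1_000_000)])
```

With these made-up figures, the small lookup is assigned to the low-overhead single-node engine, while the large join is assigned to the cluster, where the higher startup cost is amortized over many rows.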
Scalytics Copilot extends this foundation with private AI deployment: running LLMs, RAG pipelines, and ML workloads entirely within your security perimeter. Data stays where it lives. Models train where data resides. No extraction, no exposure, no third-party API dependencies.
For organizations in healthcare, finance, and government, this architecture isn't optional. It's how you deploy AI while remaining compliant with HIPAA, GDPR, and DORA.
Explore our open-source foundation: Scalytics Community Edition
Questions? Reach us on Slack or schedule a conversation.
