The Problem: Utility Data Is Everywhere — and Can't Move
Utilities operate across dozens of disconnected data systems. SCADA historians sit air-gapped in control centers. Smart meter data flows into cloud platforms. GIS, asset management, and market systems each have their own storage. Remote substations and DER assets generate telemetry that never leaves the edge. Field crews collect inspection data on tablets that sync intermittently.
Traditional data lake approaches demand consolidation: extract everything, transform it, load it into a central repository. For utilities, this model breaks down:
- Regulatory constraints — NERC CIP, GDPR, and sector-specific rules often prohibit moving operational data
- Network isolation — OT systems are air-gapped by design; IT/OT convergence doesn't mean data convergence
- Latency requirements — grid operations need millisecond decisions, not overnight batch loads
- Field data gaps — mobile workforce data lives in disconnected systems, unavailable for analytics until manually reconciled
- Cost and complexity — migrating decades of historian data is a multi-year project that rarely delivers ROI
The result: analytics teams build brittle point-to-point integrations, shadow IT proliferates, ML models train on stale snapshots, and AI initiatives stall waiting for "clean" data that never arrives.
The Solution: A Virtual Data Lake That Queries Data in Place
Scalytics Copilot creates a virtual data lake — a unified query layer across distributed utility systems without moving or copying data.
Your data stays where it is: in the historian, the SCADA network, the cloud platform, the edge device, the field crew's tablet. Scalytics Copilot federates queries across these sources, handles transformations in-flight, and returns unified results. Compliance boundaries remain intact. Network isolation is preserved. Analytics and ML training run where the data lives.
This isn't abstraction for its own sake. The architecture reflects lessons from real utility transformation programs — including E.ON's migration from legacy Hadoop to cloud-native platforms, where centralized approaches repeatedly failed before federated methods proved viable.
→ How federated data processing works — the technology behind Scalytics
Architecture Overview: The Scalytics Federated Layer
Utilities no longer need to accept compliance risk or added latency as the price of centralizing data. Scalytics Copilot establishes a virtual data lake: a unified, secure layer that integrates data in place across your entire energy infrastructure.
The architecture is built to mirror utility reality:
- Data Sources Remain Secure: Whether data sits air-gapped on-premises (SCADA, PI Historian, GIS, asset databases), in cloud environments (Databricks, Snowflake, IoT platforms), at the edge (substations, DER, smart meters), or on mobile devices (field crews, work orders), it is accessed directly.
- Central Federation: The Scalytics Copilot layer sits in the middle, handling all query translation, execution, and security governance.
- Computation Moves, Data Stays: Queries are federated and ML models are trained across these distributed sources. Only unified results and aggregated model gradients move, ensuring that sensitive operational data never leaves its secure domain.
This design delivers high-accuracy grid intelligence without the complexity, cost, or regulatory risk of traditional data migration projects.
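Conceptually, "Computation Moves, Data Stays" reduces to a simple pattern: each source computes a partial result inside its own security domain, and only small aggregates cross the boundary. The sketch below is a plain-Python illustration of that pattern, not Scalytics Copilot's API, and the per-source readings are invented:

```python
# Illustrative only: computing a fleet-wide mean temperature without any
# raw reading leaving its source. Only (sum, count) pairs move.

def local_partial_mean(readings: list[float]) -> tuple[float, int]:
    """Runs inside a source's security domain; raw readings stay put."""
    return (sum(readings), len(readings))

def federated_mean(partials: list[tuple[float, int]]) -> float:
    """Runs at the federation layer; sees only partial aggregates."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Hypothetical per-source data: air-gapped historian, edge substation, cloud.
historian = [61.2, 63.8, 59.4]
substation = [71.0, 69.5]
cloud = [64.1, 62.7, 65.3, 63.0]

partials = [local_partial_mean(src) for src in (historian, substation, cloud)]
print(f"Fleet-wide mean temperature: {federated_mean(partials):.1f} C")
```

The same decomposition generalizes to counts, variances, and histogram bins, which is what lets richer analytics run without raw-data movement.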
How It Works
1. Connect, not migrate. Scalytics Copilot connects to your existing systems: OSIsoft PI historians, cloud data warehouses, edge databases, mobile workforce platforms, and streaming systems. No data extraction. No schema redesign.
2. Query as one. Write a single query or ML pipeline. Scalytics Copilot determines optimal execution across sources, pushing computation to the data rather than pulling data to computation (see the sketch after this list).
3. Train models on federated data. ML models access distributed datasets without centralization. Training runs across sources while raw data never leaves its secure environment; only model updates are aggregated.
4. Govern consistently. Access controls, encryption, and audit trails apply uniformly across the virtual data lake. Compliance is enforced at the federation layer.
5. Scale incrementally. Start with one use case (outage prediction, demand forecasting, asset health, crew optimization) and add data sources and workloads without rearchitecting.
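To make step 2 concrete, here is what "query as one" could look like. Everything below is a hypothetical sketch: the source names (pi_historian, asset_db, weather) and schema are invented, and the pushdown comments describe the general federation strategy rather than a guaranteed plan:

```python
# Hypothetical federated query: one logical statement, three physical systems.
federated_query = """
SELECT a.asset_id,
       AVG(h.temp_c) AS avg_temp_7d
FROM pi_historian.readings h      -- lives on the air-gapped OT network
JOIN asset_db.transformers a      -- lives in the on-prem asset system
  ON h.tag = a.sensor_tag
JOIN weather.daily w              -- lives in a cloud weather feed
  ON a.region = w.region
WHERE h.ts > NOW() - INTERVAL '7 days'
  AND w.storm_risk = 'high'
GROUP BY a.asset_id
"""
# A federation layer would push the time filter and aggregation toward the
# historian, the storm-risk filter toward the weather feed, and join only
# the small intermediate results centrally.
print(federated_query)
```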
Utility Use Cases
Grid Analytics Across IT and OT
Combine SCADA telemetry, weather data, and market signals for real-time grid optimization — without bridging air-gapped networks.
→ See Smart Grid Intelligence for streaming grid use cases
Asset Performance Management
Federate maintenance records, sensor streams, and inspection reports to predict equipment failures before they cascade.
Regulatory Reporting
Query distributed systems for compliance reporting without staging data in intermediate warehouses.
Demand-Side Management
Integrate smart meter data, CRM systems, and weather feeds to model flexible load segments — data stays distributed, insights are unified.
Utility AI/ML Operations
Traditional MLOps assumes centralized data. Utility reality is different: training data lives in SCADA historians, asset databases, weather feeds, and field systems — often subject to strict residency requirements.
Scalytics Copilot enables federated ML operations:
Train on distributed data
Models access data across sources without extraction. A transformer health model can train on historian sensor data, maintenance records from the asset system, and inspection notes from field crews, simultaneously, without moving a byte.
Privacy-preserving training
Federated learning techniques allow model training across organizational or regulatory boundaries. Only model gradients move; raw operational data stays in place.
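For intuition, the toy sketch below runs federated gradient descent (a FedSGD-style loop) on a synthetic linear model: each site computes a gradient on data it never shares, and the coordinator averages gradients weighted by sample count. It illustrates the mechanics, not Scalytics Copilot's implementation:

```python
import numpy as np

def local_gradient(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares gradient, computed inside the data owner's boundary."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Three synthetic "sites" (e.g., historian, asset system, field inspections)
# holding different amounts of data drawn from the same underlying model.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])
sites = []
for n in (200, 120, 80):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

w, lr = np.zeros(3), 0.1
share = np.array([len(y) for _, y in sites], dtype=float)
share /= share.sum()  # weight each site's gradient by its sample count
for _ in range(100):
    grads = [local_gradient(w, X, y) for X, y in sites]  # raw data stays local
    w -= lr * sum(s * g for s, g in zip(share, grads))   # only gradients move
print(w)  # approaches [0.5, -1.0, 2.0] without pooling any rows
```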
Consistent feature pipelines
Feature engineering runs identically across development and production environments, regardless of where source data resides. No "works on my laptop" surprises.
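One way to deliver that consistency is to define feature logic once, as a pure function over a source-agnostic frame, and run the identical function in development and production. A pandas sketch with invented column names:

```python
import pandas as pd

def transformer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Feature logic defined once; the same function runs everywhere."""
    out = pd.DataFrame(index=df.index)
    out["temp_mean_7d"] = df["temp_c"].rolling(7, min_periods=1).mean()
    out["load_peak_ratio"] = df["load_mva"] / df["rated_mva"]
    out["days_since_inspection"] = (df["ts"] - df["last_inspected"]).dt.days
    return out

# Works on a dev extract or on federated query results alike.
sample = pd.DataFrame({
    "ts": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    "temp_c": [62.0, 64.5],
    "load_mva": [40.0, 45.0],
    "rated_mva": [50.0, 50.0],
    "last_inspected": pd.to_datetime(["2024-01-15", "2024-01-15"]),
})
print(transformer_features(sample))
```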
Reproducible experiments
Track which data sources, versions, and transformations contributed to each model. Audit trails span the entire virtual data lake.
→ Deep dive: Federated Learning for enterprise AI
Fleet & Mobile Workforce Integration
Field crews generate critical data — inspection reports, equipment photos, GPS traces, work order updates — but this information typically lives in disconnected mobile platforms, syncing hours or days later.
Scalytics Copilot brings field data into the virtual data lake:
Real-time crew visibility
Query work order status, crew locations, and job progress alongside grid state, enabling dynamic dispatch based on actual conditions.
Inspection data for predictive models
Field observations become training data for asset health models without manual export/import cycles. A technician's photo of corroded equipment feeds directly into failure prediction.
Optimize routing and scheduling
Combine outage predictions, asset priority scores, weather forecasts, and crew availability into unified optimization: data from five systems, one query.
Close the loop
When a model predicts equipment failure, automatically generate work orders routed to the nearest qualified crew. Federated data makes closed-loop operations possible.
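A minimal sketch of that loop, with hypothetical names throughout; a production version would call the prediction service and the workforce management API rather than in-memory objects:

```python
from dataclasses import dataclass

@dataclass
class Crew:
    crew_id: str
    skills: set[str]
    distance_km: float  # distance to the asset, from federated GPS traces

def dispatch(asset_id: str, failure_prob: float, crews: list[Crew],
             required_skill: str, threshold: float = 0.8) -> dict | None:
    """Turn a high-confidence failure prediction into a routed work order."""
    if failure_prob < threshold:
        return None  # below the action threshold: keep monitoring
    qualified = [c for c in crews if required_skill in c.skills]
    if not qualified:
        return None  # no eligible crew: escalate instead of dispatching
    nearest = min(qualified, key=lambda c: c.distance_km)
    return {"asset": asset_id, "crew": nearest.crew_id, "priority": "high"}

crews = [Crew("C1", {"transformer"}, 12.4), Crew("C2", {"lines"}, 3.1)]
print(dispatch("TX-1042", 0.91, crews, required_skill="transformer"))
# -> {'asset': 'TX-1042', 'crew': 'C1', 'priority': 'high'}
```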
Why Utilities Choose Scalytics
Built by utility veterans
Our founding team led digital transformation at E.ON, architecting IoT platforms and cloud-native data infrastructure for connected energy assets. That hands-on experience shaped every design decision in Scalytics Copilot.
No rip-and-replace
Connect to existing historians, SCADA systems, workforce management platforms, and cloud lakes. Your infrastructure stays; analytics capabilities expand.
Compliance by architecture
Data residency, access controls, and audit requirements are enforced at the federation layer, not bolted on after the fact.
Transparent execution
See exactly which systems are queried, what transformations run where, and how results are assembled. No black-box magic.
→ Connect your Databricks lakehouse to on-prem utility systems
Get Started
Most utilities start with a single high-value use case — grid analytics, asset health, or workforce optimization — and expand from there.
