Scalytics Core
Apache Wayang is the core of our products. Apache Wayang (Incubating) is the only cross-platform open-source data processing engine. Application developers specify their applications using the Apache Wayang API.
Scalytics unique features:
- The AI-based optimizer automatically selects an optimal configuration of classic data processing frameworks, such as Java Streams or Apache Spark, on which applications are executed.
- Blossom Core carries out the program execution. It abstracts the various platform-specific APIs and coordinates cross-platform communication.
- Applications can run on multiple data processing platforms without changing the native code of the underlying platforms.
- Federated data processing: In-situ processing at different sites without moving raw data away from its origin.
- Build and execute cross-platform machine learning pipelines in a unified way.
- NEW: Federated Machine Learning
- Federated analytics by integrating multiple platforms across silos
- Developers: Train ML models using federated learning in a platform-agnostic way
- NEW: Support for unsupervised learning (e.g., K-means) and the Stochastic Gradient Descent optimization technique for federated learning across supported data platforms
- NEW: Compliance auditing (who accessed what, and when) and basic training audits
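The federated learning items above can be illustrated with a minimal, library-free sketch. It assumes federated averaging as the aggregation scheme, which is a common choice but is not confirmed by this page as what Scalytics uses. Each site runs Stochastic Gradient Descent on its own data, and only model parameters leave the site. All names here (`local_sgd`, `federated_average`, `train_federated`) and the synthetic data are hypothetical and are not part of the Scalytics or Apache Wayang API.

```python
import random


def local_sgd(weights, data, lr=0.05, epochs=5):
    """Run plain SGD for the model y ~ w*x + b on one site's private data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error for one sample
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b)


def federated_average(updates, sizes):
    """Average site models, weighted by the number of local samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(sites, rounds=50):
    """Coordinator loop: raw data stays in `sites`; only weights are exchanged."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_sgd(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


# Synthetic, noise-free data for y = 2x + 1, split across three "sites".
rng = random.Random(0)


def make_site(n):
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]


sites = [make_site(20), make_site(30), make_site(50)]
w, b = train_federated(sites)
print(f"w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

In a real deployment each call to `local_sgd` would run on the platform that holds the data, and the coordinator would only see the returned parameters.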
Data sources:
- PostgreSQL
- Data files and table formats (e.g., CSV, Iceberg, Parquet, ORC)
- SQLite (e.g., mobile and embedded devices)
- Local file systems
- Distributed file systems (e.g., HDFS, S3)
- Apache Kafka
- NEW: Remote files over HTTP(S)
- NEW: JDBC-based data sources
Data Processing Platforms:
- Java 8 Streams
- Apache Spark / Databricks
- PostgreSQL
- SQLite
- Apache Flink / Confluent, Decodable
- NEW: Apache Kafka
- NEW: TensorFlow
- NEW: JDBC-based platforms
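To make the idea of automatically choosing among the platforms listed above more concrete, here is a toy cost-based selection sketch. It is not the Scalytics or Apache Wayang optimizer: the linear cost model, the `PlatformCost` class, the `choose_platform` function, and every number in it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformCost:
    name: str
    startup_ms: float     # fixed overhead to launch a job (made-up value)
    per_record_ms: float  # marginal cost per input record (made-up value)

    def estimate(self, records: int) -> float:
        """Estimated runtime in milliseconds for `records` input records."""
        return self.startup_ms + self.per_record_ms * records


# Hypothetical profiles: a single-node engine starts fast but scales worse,
# a cluster engine pays a high startup cost but is cheaper per record.
PLATFORMS = [
    PlatformCost("java-streams", startup_ms=5.0, per_record_ms=0.010),
    PlatformCost("spark", startup_ms=4000.0, per_record_ms=0.001),
]


def choose_platform(records: int, platforms=PLATFORMS) -> str:
    """Pick the platform with the lowest estimated cost for this input size."""
    return min(platforms, key=lambda p: p.estimate(records)).name


print(choose_platform(1_000))       # java-streams
print(choose_platform(10_000_000))  # spark
```

With these made-up profiles the break-even point is at roughly 444,000 records. A production optimizer would work with learned or calibrated cost estimates and could assign different operators of the same job to different platforms.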
Programming APIs
- Java
- Scala
- Basic SQL
- NEW: Python (limited support)
Runtime
- NEW: Actor-based runtime for building federated applications
Scalytics Studio: Simplifying Machine Learning Workflow Design
Scalytics Studio is a cloud-native, low-code extension to Scalytics Core, designed to streamline Machine Learning workflow design and enhance data management.
With an intuitive graphical user interface, Scalytics Studio enables you to:
- Connect and Query Seamlessly: Effortlessly connect to various data sources and perform local queries.
- Unify and Join Data: Combine data from multiple sources with ease for comprehensive analysis.
- Transform Data Intuitively: Perform complex data transformations and execute them on the platform of your choice.
Supported Data Sources:
- PostgreSQL
- Files on local or distributed filesystems (e.g., HDFS)
- NEW: Apache Kafka
- NEW: Files over HTTP(S)
Supported Platforms
- Java 8 Streams
- Apache Spark / Databricks
- NEW: TensorFlow
- NEW: Apache Kafka
Supported Data Transformations
- Map
- Filter
- Reduce
- GroupBy
- Join
- Union
- Cross
- Train (for ML pipelines)
- Predict (for ML pipelines)
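For readers unfamiliar with the relational-style transformations listed above, the following sketch shows what each one computes, using only the Python standard library. It does not use the Scalytics Studio or Apache Wayang API; the `orders` and `regions` datasets and all variable names are made up for illustration. Train and Predict are left out because they depend on the chosen ML platform.

```python
from collections import defaultdict
from functools import reduce
from itertools import product

# Hypothetical inputs: (customer, amount) and (customer, region).
orders = [("alice", 30), ("bob", 15), ("alice", 20), ("carol", 5)]
regions = [("alice", "EU"), ("bob", "US")]

# Map: apply a function to every record (here: double each amount).
doubled = [(name, amount * 2) for name, amount in orders]

# Filter: keep only the records that satisfy a predicate.
large = [(name, amount) for name, amount in orders if amount >= 15]

# Reduce: fold all records into a single value (here: the total amount).
total = reduce(lambda acc, order: acc + order[1], orders, 0)

# GroupBy: aggregate records that share a key (here: total per customer).
per_customer = defaultdict(int)
for name, amount in orders:
    per_customer[name] += amount

# Join: combine records from two datasets that agree on a key.
joined = [(n, a, r) for n, a in orders for rn, r in regions if n == rn]

# Union: concatenate two datasets with the same schema.
unioned = orders + [("dave", 40)]

# Cross: pair every record of one dataset with every record of another.
crossed = list(product(["alice", "bob"], ["EU", "US"]))

print(total)               # 70
print(dict(per_customer))  # {'alice': 50, 'bob': 15, 'carol': 5}
print(joined)              # [('alice', 30, 'EU'), ('bob', 15, 'US'), ('alice', 20, 'EU')]
```

Note that the join behaves as an inner join: `carol` has no matching region and therefore does not appear in `joined`.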