Scalytics Connect Release Notes

Updates on new features and performance improvements

Scalytics Connect v. 1.2.0 / November 2024

Scalytics Core

Apache Wayang is at the heart of our products. Apache Wayang (Incubating) is the only cross-platform open-source data processing engine. Application developers specify their applications using the Apache Wayang API.
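
To illustrate what "cross-platform" means in practice, the following is a minimal sketch of a pipeline that is described once and can then run on more than one execution backend. It is not the Apache Wayang API: `Pipeline`, `run_sequential`, `run_chunked`, and `choose_backend` are hypothetical names, and the size threshold merely stands in for a real cost-based optimizer.

```python
"""Toy illustration of 'specify once, run on any backend'.

NOT the Apache Wayang API: every name below is hypothetical.
"""
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class Pipeline:
    """Backend-independent description of a map -> filter -> reduce job."""

    def __init__(self, map_fn, filter_fn, reduce_fn):
        self.map_fn = map_fn
        self.filter_fn = filter_fn
        # reduce_fn must be associative so that partial results can be merged.
        self.reduce_fn = reduce_fn


def _reduce_chunk(pipeline, chunk):
    """Run the pipeline on one chunk; return [] if nothing passes the filter."""
    kept = [y for y in map(pipeline.map_fn, chunk) if pipeline.filter_fn(y)]
    return [reduce(pipeline.reduce_fn, kept)] if kept else []


def run_sequential(pipeline, data):
    """Backend 1: a single in-process pass over a list of records.

    Raises TypeError if no record survives the filter.
    """
    return reduce(pipeline.reduce_fn, _reduce_chunk(pipeline, data))


def run_chunked(pipeline, data, chunk_size=1000):
    """Backend 2: process chunks in a thread pool, then merge the partials.

    Raises TypeError if no record survives the filter.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda c: _reduce_chunk(pipeline, c), chunks)
        merged = [value for part in partials for value in part]
    return reduce(pipeline.reduce_fn, merged)


def choose_backend(n_records, threshold=10_000):
    """Stand-in for an optimizer: choose a backend from the input size only."""
    return run_chunked if n_records >= threshold else run_sequential
```

The application code (the `Pipeline`) never mentions a backend, so swapping `run_sequential` for `run_chunked` changes where the work happens but not the result.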

Scalytics unique features:

The AI-based optimizer automatically selects an optimal configuration of classic data processing frameworks, such as Java Streams or Apache Spark, on which applications are executed.
Blossom Core carries out program execution. It abstracts the various platform-specific APIs and coordinates cross-platform communication.
Applications can run on multiple data processing platforms without changing the native code of the underlying platforms.
Federated data processing: in-situ processing at different sites without moving raw data away from its origin.
Build and execute cross-platform machine learning pipelines in a unified way.
NEW: Federated Machine Learning
- Federated analytics by integrating multiple platforms across silos
- Developers: Train ML models using federated learning in a platform-agnostic way
NEW: Support for unsupervised learning (e.g., K-means) and for stochastic gradient descent (SGD) optimization in Federated Learning across supported data platforms
NEW: Auditing compliance (who accessed what, and when) and training audits (basic)
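
The federated learning items above can be made concrete with a small sketch. It assumes federated averaging (FedAvg) as the aggregation scheme, which these notes do not confirm; the function names, the one-dimensional linear model, and the hyperparameters are illustrative only. Each site runs SGD on its own records and shares only model weights, never raw data.

```python
import random


def local_sgd(weights, data, lr=0.1, epochs=5, seed=0):
    """Run plain SGD for the model y ~ w*x + b on one site's private data.

    Returns the updated weights and the number of local samples.
    """
    w, b = weights
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = (w * x + b) - y   # prediction error for this sample
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b), len(samples)


def federated_average(updates):
    """Combine site updates, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(site_weights[0] * n for site_weights, n in updates) / total
    b = sum(site_weights[1] * n for site_weights, n in updates) / total
    return (w, b)


def train_federated(sites, rounds=200):
    """Coordinator loop: broadcast weights, train locally, average.

    `sites` is a list of per-site datasets; only weights and sample
    counts leave a site, the (x, y) records stay where they are.
    """
    weights = (0.0, 0.0)
    for round_no in range(rounds):
        updates = [local_sgd(weights, data, seed=round_no) for data in sites]
        weights = federated_average(updates)
    return weights
```

Calling `train_federated([site_a, site_b])` with two lists of `(x, y)` pairs returns one shared `(w, b)` pair. In a real deployment `local_sgd` would run at each site rather than in the coordinator's process.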

Data sources:

PostgreSQL
Tabular and columnar data files (e.g., CSV, Iceberg, Parquet, ORC)
SQLite (e.g., mobile and embedded devices)
Local file systems
Distributed file systems (e.g., HDFS, S3)
Apache Kafka
NEW: Remote files over HTTP(S)
NEW: JDBC-based data sources

Data Processing Platforms:

Java 8 Streams
Apache Spark / Databricks
PostgreSQL
SQLite
Apache Flink / Confluent, Decodable
NEW: Apache Kafka
NEW: TensorFlow
NEW: JDBC-based platforms

Programming APIs

Java
Scala
Basic SQL
NEW: Python (limited support)

Runtime

NEW: Actor-based runtime for building federated applications
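
As a rough picture of the actor model behind this item, here is a minimal sketch built on Python threads and queues. It does not reflect the Scalytics runtime's actual API: `Actor`, `SiteActor`, `federated_mean`, and the `"local_stats"` message are hypothetical. Each actor owns its data privately and communicates only through messages, which matches the federated setting where raw records stay on site.

```python
import queue
import threading


class Actor:
    """Owns private state and handles mailbox messages one at a time."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronously deliver a message to this actor."""
        self.mailbox.put(message)

    def stop(self):
        """Ask the actor to finish and wait for its thread to exit."""
        self.mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # stop signal
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class SiteActor(Actor):
    """Holds one site's records; replies with aggregates, never raw data."""

    def __init__(self, records):
        self._records = list(records)  # set before the thread starts
        super().__init__()

    def receive(self, message):
        kind, reply_to = message
        if kind == "local_stats":
            reply_to.put((sum(self._records), len(self._records)))


def federated_mean(site_actors):
    """Coordinator: request local statistics and combine them."""
    replies = queue.Queue()
    for actor in site_actors:
        actor.send(("local_stats", replies))
    total, count = 0.0, 0
    for _ in site_actors:
        site_sum, site_count = replies.get(timeout=5)
        total += site_sum
        count += site_count
    return total / count
```

A coordinator would create one `SiteActor` per site, call `federated_mean(sites)`, and then `stop()` each actor when done.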

Scalytics Studio: Simplifying Machine Learning Workflow Design

Scalytics Studio is a cloud-native, low-code extension to Scalytics Core, designed to streamline Machine Learning workflow design and enhance data management.

With an intuitive graphical user interface, Scalytics Studio enables you to:

Connect and Query Seamlessly: Effortlessly connect to various data sources and perform local queries.
Unify and Join Data: Combine data from multiple sources with ease for comprehensive analysis.
Transform Data Intuitively: Perform complex data transformations and execute them on the platform of your choice.

Supported Data Sources:

PostgreSQL
Files on local or distributed filesystems (e.g., HDFS)
NEW: Apache Kafka
NEW: Files over HTTP(S)

Supported Platforms

Java 8 Streams
Apache Spark / Databricks
NEW: TensorFlow
NEW: Apache Kafka

Supported Data Transformations

Map
Filter
Reduce
GroupBy
Join
Union
Cross
Train (for ML pipelines)
Predict (for ML pipelines)
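
For readers unfamiliar with the relational transformations in this list, the sketch below shows their usual semantics on small in-memory lists in plain Python. It is not Scalytics Studio code. The sample data is made up, and the exact behavior in Studio may differ; for example, Union is assumed here to keep duplicates. Train and Predict are omitted.

```python
from collections import defaultdict
from functools import reduce
from itertools import product

# Made-up sample data: (customer, amount) and (customer, city).
orders = [("alice", 30), ("bob", 5), ("alice", 12)]
cities = [("alice", "Berlin"), ("bob", "Zurich")]

# Map: apply a function to every record.
in_cents = list(map(lambda o: (o[0], o[1] * 100), orders))

# Filter: keep only the records that satisfy a predicate.
large = list(filter(lambda o: o[1] >= 10, orders))

# Reduce: fold all records into a single value.
total = reduce(lambda acc, o: acc + o[1], orders, 0)

# GroupBy: collect records per key, then aggregate each group.
groups = defaultdict(list)
for name, amount in orders:
    groups[name].append(amount)
per_customer = {name: sum(amounts) for name, amounts in groups.items()}

# Join: pair records from two inputs that share a key.
joined = [
    (name, amount, city)
    for name, amount in orders
    for other, city in cities
    if name == other
]

# Union: concatenate two inputs (assumed to keep duplicates).
union = orders + [("carol", 7)]

# Cross: every combination of records from two inputs.
cross = list(product(["alice", "bob"], ["Berlin", "Zurich"]))
```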