Open Source Community

Your AI. On Your Terms.

Contribute to open source projects with established, vendor-neutral communities. Our engineers participate alongside maintainers and contributors from multiple organizations and independent developers globally. These projects accept contributions in code, documentation, issue reporting, and feature proposals from all community members.

Read more about our open source commitment, projects, research and knowledge in our open source blogs.

Open Source Projects & Communities

Federated Data & AI - Apache Wayang®

Apache Wayang lets you design data-analytics jobs once as JSON and have it executed everywhere. It decomposes the workflow into a single DAG and, at runtime, delegates each stage to the connected engine—Spark, Flink, JDBC/SQL, Kafka streams, or more—that can run it fastest. The result is consistently higher performance and lower cost without rewrites, glue scripts, or cloud lock-in, giving teams a straight path to production AI on any platform.

Scalytics Connect - Community Edition

Scalytics Community Edition puts a fully open-source, private-AI stack on your own metal in three simple commands. A polished admin UI, GPU monitor, and chat console sit on top of vLLM inference, vector search, and an OpenAI-compatible API, so every open-source model—from 7B to 70B—can be served, audited, and rate-limited with ease. No vendor lock-in, no DevOps yak-shaving—just enterprise-grade security and control under the Apache 2.0 license.

Open Source Projects we support

Apache Spark®

Apache Spark is an open-source distributed processing system used for big data workloads. It utilizes in-memory caching and optimized query execution for fast analytic queries against data of any size.

Apache Flink®

Apache Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale. It can process unbounded and bounded data streams.

Apache Hadoop®

Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models, uniting multiple data tools.

Apache Impala®

Apache Impala is an open-source, native analytic database for Apache Hadoop. It provides low latency and high concurrency for BI/analytic queries on Hadoop, which is not delivered by batch frameworks such as Apache Hive.

TensorFlow

TensorFlow is a platform for machine learning, TTF is made for federated TensorFlow. It supports distributed training, immediate model iteration and easy debugging with Keras, and much more.

PostgreSQL

PostgreSQL is a powerful, open-source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

Open Source: Apache Wayang®

"Apache Wayang is an API for big data cross-platform processing. It provides an abstraction over other platforms like Apache Spark and Apache Flink as well as a default built-in stream-based “platform”. The goal is to provide a consistent developer experience when writing code regardless of whether a light-weight or highly-scalable platform may eventually be required. Execution of the application is specified in a logical plan which is again platform agnostic. Wayang will transform the logical plan into a set of physical operators to be executed by specific underlying processing platforms."

Groovy and Data Science - JVM Advent (javaadvent.com)
“Wayang is a Java library typically used in Big Data applications. Incubator-wayang has no bugs, it has no vulnerabilities, it has build file available, it has a Permissive License, and it has low support. You can download it from GitHub.
In contrast to traditional data processing systems that provide one dedicated execution engine, Apache Wayang (incubating) is a cross-platform data processing system: Users can specify any data processing application using one of Wayang's APIs and then Wayang will choose the data processing platform(s), e.g., Postgres or Apache Spark, that best fits the application.”

Wayang is the first cross platform system (openweaver.com)
“Execution of the application is specified in a logical plan which is again platform-agnostic. Wayang will transform the logical plan into a set of physical operators to be executed by specific underlying processing platforms.Wayang selects which platform(s) will run our application. It has numerous capabilities whereby cost functions and load estimators can be used to influence and optimize how the application is run. For our simple example, it is enough to know that even though we specified Java or Spark as options, Wayang knows that for our small data set, the Java streams option is the way to go.

Using Groovy with Wayang and Spark
The future belongs to those who own data + AI. Own yours!
start your free trial