Read more about our open source commitment, projects, research and knowledge in our open source blogs.
Apache Wayang lets you design data-analytics jobs once, as JSON, and execute them anywhere. It decomposes each workflow into a single DAG and, at runtime, delegates every stage to whichever connected engine (Spark, Flink, JDBC/SQL, Kafka streams, and more) can run it fastest. The result is consistently higher performance and lower cost without rewrites, glue scripts, or cloud lock-in, giving teams a straight path to production AI on any platform.
Scalytics Community Edition puts a fully open-source, private-AI stack on your own metal in three simple commands. A polished admin UI, GPU monitor, and chat console sit on top of vLLM inference, vector search, and an OpenAI-compatible API, so every open-source model—from 7B to 70B—can be served, audited, and rate-limited with ease. No vendor lock-in, no DevOps yak-shaving—just enterprise-grade security and control under the Apache 2.0 license.
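Because the API is OpenAI-compatible, existing client code can point at a local deployment without changes. The sketch below builds a standard chat-completion request with the Python standard library; the base URL, model name, and API key are illustrative assumptions for a local setup, not fixed values of the product.

```python
import json

# Assumed local endpoint of an OpenAI-compatible server (e.g. vLLM);
# substitute the base URL of your own deployment.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, api_key: str = "EMPTY"):
    """Build URL, headers, and JSON body for a /chat/completions call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # many local servers accept any token
    }
    body = json.dumps({
        "model": model,  # hypothetical model name served by the stack
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Sending the request is left to any HTTP client (urllib.request, httpx, ...);
# the same payload also works through the official openai Python package.
url, headers, body = build_chat_request("my-7b-model", "Hello!")
```

Because the wire format matches OpenAI's, the same request shape serves any open-source model the stack hosts, which is what makes drop-in auditing and rate limiting possible.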
Apache Spark is an open-source distributed processing system used for big data workloads. It utilizes in-memory caching and optimized query execution for fast analytic queries against data of any size.
Apache Flink is designed to run in all common cluster environments and to perform computations at in-memory speed at any scale. It processes both unbounded and bounded data streams.
Apache Hadoop is an open-source software framework for the distributed processing of large data sets across clusters of computers using simple programming models, uniting multiple data tools under a single platform.
Apache Impala is an open-source, native analytic database for Apache Hadoop. It provides the low latency and high concurrency for BI/analytic queries on Hadoop that batch frameworks such as Apache Hive cannot deliver.
TensorFlow is a platform for machine learning; TensorFlow Federated (TFF) extends it to federated learning on decentralized data. TensorFlow supports distributed training, rapid model iteration, easy debugging with Keras, and much more.