Are you still migrating or are you already training your AI?

Dr. Mirko Kaempf

Gartner's 2024 AI Hype Cycle: A Glimpse into the Future

According to the 2024 AI Hype Cycle published by Gartner, it is clear that AI engineering and ModelOps/DataOps, especially the provision of synthetic data alongside real data, will become increasingly relevant in the coming years. The report places AI-Ready Data ahead of the hype cycle wave, while cloud AI services are passing through the trough of disillusionment. These observations highlight the importance of data management and data architecture, especially in future investment cycles.

Should we continue to replace existing database systems, or should we adopt a different data platform technology that promises the end of our data silos?

Scalytics allows you to integrate all your data across three abstraction levels. You integrate the data you need without moving it around: no data movement and no data migration, just easy access to the valuable bits and bytes. This drives your digital evolution towards data sovereignty and responsive AI.

 

The Evolution of Data Platforms: From Hadoop to Cloud Data Lakes

When Apache Hadoop started to revolutionize the way we deal with data, we could observe a mindset shift within many people and organizations. Since then, a lot of improvements have given us an endless set of tools for handling data in every flavor and in every domain, but still there is no such thing as an ideal data platform. Again, this leads to data silos, although from a technical point of view the data could already be integrated much more smoothly than it is today.

Mental Boundaries in Data Management

It seems that the boundaries are no longer technical limitations or the laws of physics (although these still apply in some edge cases). Today, the majority of data-related applications have to cope with mental boundaries, regulations, and the impact of organizational structures.

The Importance of Data in Modern Business

It is more important than ever to bring the right data to the right place at the right time. The mandate to process the data must also be in the right hands; otherwise, the technological advantages do not pay off. It is equally important to bring the capabilities of recently developed technologies together with existing data and emerging data streams. In many organizations, data is derived from business workflows in such a way that data processing becomes an efficient support process for the business. Collected data is of real value only if it can be used in business decisions; otherwise, it just costs money and gives nothing back.

Building a scalable data infrastructure for a scalable business around technical limitations makes sense only as long as the selected technology is fresh and seen as an innovation trigger. But machine learning, data science, and AI are marching towards the plateau of productivity. This will change the way we handle model training and automated data products.

With Apache Hadoop, we saw an innovation enter and change the data industry. The idea of scaling out across individual servers, using multiple computers within a cluster in a flexible, elastic approach, and especially combining storage and processing inside a worker node, gave us data locality and very high throughput for specific workloads. Data warehouses were migrated into Hadoop clusters, which were migrated into the cloud a bit later. Cloud data lakes became cloud data warehouses and offered a range of new data analytics capabilities. Questions such as “Which cloud provider? Which region? Which technology stack?” were important and influenced how companies invested their data and analytics budgets.

Cloud independence was another request from many customers starting new data initiatives. The idea of owning valuable data assets emerged, but the value of the data is ignored as long as the data is not actively used within the organization or in a broader context.

Data products and the data mesh, together with decentralized data infrastructures, have been proposed, and we have finally reached a point where we all agree: data is valuable, and we all want to use these assets, which are at our fingertips within our businesses. But let's be honest: is it really so easy to quickly use existing data in a new, production-ready business scenario without a huge project setup and a large budget?

From a technology point of view, we can clearly say: yes! Data storage and processing capabilities are cheap and dynamically scalable, and the automation offered by SaaS providers makes it easy to use data analytics solutions without deep technical skills. A business person with process understanding can get huge benefits from self-service analytics platforms.

But wait! Self-service analytics (SSA) is not yet an AI application, and thus not part of the AI hype. SSA can be supported by AI, but this kind of platform feature is outside the scope of this article.

I want to address the usage of business data for more than ad-hoc analytics and reporting. 

We are looking forward to the emergence of agents supporting various lines of business. This starts with expert systems provided by third parties and extends to internally managed multi-agent systems trained on a company's own data and expert knowledge. Besides the continuously growing impact of GenAI on UI/UX, we can clearly see decision automation entering the stage. Even if the final decision, together with the responsibility for it, stays with the human who signs off, our technology can support this person or a group of decision makers by providing context in a tangible way, so that human decisions become the result of deep insights extracted from systems that use deep learning.

We should not get lost in the question of who finally decides in a particular situation, but rather ask: how can we efficiently support the decision maker, using data together with learning algorithms whose capabilities go far beyond our own?

Federated Learning and EdgeAI: The Future of Data Integration

Many of today's AI gadgets use centrally managed large language models. Building in-house language models is possible, but there is a trend towards small language models (SLMs) for specialized use cases. SLMs run on highly efficient hardware with minimal resource and energy consumption. EdgeAI is another trend mentioned in the Gartner report.

Processing data at the edge delivers results faster, without the overhead of moving raw data around. Data movement is critical for two reasons. First, there is the technical and resource overhead of moving data, which costs time, money, and energy. Second, data usage policies often do not allow bringing data into places where it can be aggregated and merged into other contexts outside the control of the data owner.
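To make the first point more tangible, here is a minimal sketch in plain Java (all class and method names are illustrative, not Scalytics APIs): each edge site computes a compact local summary of its own raw events, and only those summaries travel to the place where a global result is assembled.

```java
import java.util.List;

// Minimal sketch: raw events stay at the edge site; only small,
// locally computed summaries cross the network. Names are illustrative.
public class EdgeAggregationSketch {

    // A raw event as it exists at the edge site (never leaves the site).
    record SensorEvent(String deviceId, double value) {}

    // The compact summary that is actually transferred.
    record LocalSummary(String siteId, long count, double sum) {}

    // Runs inside the edge site, close to the data.
    static LocalSummary summarize(String siteId, List<SensorEvent> localEvents) {
        double sum = localEvents.stream().mapToDouble(SensorEvent::value).sum();
        return new LocalSummary(siteId, localEvents.size(), sum);
    }

    // Runs centrally: merges summaries, never sees raw events.
    static double globalMean(List<LocalSummary> summaries) {
        long totalCount = summaries.stream().mapToLong(LocalSummary::count).sum();
        double totalSum = summaries.stream().mapToDouble(LocalSummary::sum).sum();
        return totalCount == 0 ? 0.0 : totalSum / totalCount;
    }

    public static void main(String[] args) {
        List<SensorEvent> siteA = List.of(new SensorEvent("a1", 10.0), new SensorEvent("a2", 14.0));
        List<SensorEvent> siteB = List.of(new SensorEvent("b1", 8.0));

        // Only two small summary objects are exchanged, not the raw events.
        List<LocalSummary> summaries = List.of(summarize("site-A", siteA), summarize("site-B", siteB));
        System.out.println("Global mean: " + globalMean(summaries));
    }
}
```

The same pattern also addresses the second point: because the raw records never leave the data owner's environment, usage policies can be enforced where the data lives.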

Widely applied fine-tuning of your own LLMs, or the training of small language models, can be seen as the next step in our digital evolution. Fine-tuned or collaboratively trained models allow us to reach the next level of enterprise data usage. Ad-hoc analysis still plays a crucial role when investigating new ideas, analyzing problems, and designing new algorithms for new business workflows.
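To make "collaboratively trained" more concrete, the following sketch shows the core of federated averaging (FedAvg) in plain Java. It is a generic illustration rather than Scalytics code: each participant runs a local training step on its private data, and only model weights plus a sample count are shared and combined, weighted by how much data each site contributed.

```java
import java.util.List;

// Simplified federated averaging (FedAvg) sketch: participants train locally
// and only weights (plus sample counts) are exchanged. The tiny linear model
// is just a placeholder for a real local training step.
public class FedAvgSketch {

    record LocalUpdate(double[] weights, long sampleCount) {}

    // Weighted average of the local models, proportional to local data size.
    static double[] federatedAverage(List<LocalUpdate> updates) {
        int dim = updates.get(0).weights().length;
        long total = updates.stream().mapToLong(LocalUpdate::sampleCount).sum();
        double[] global = new double[dim];
        for (LocalUpdate u : updates) {
            double share = (double) u.sampleCount() / total;
            for (int i = 0; i < dim; i++) {
                global[i] += share * u.weights()[i];
            }
        }
        return global;
    }

    // Placeholder local training: one gradient step of y ≈ w0 + w1 * x
    // on the site's private (x, y) pairs.
    static LocalUpdate trainLocally(double[] globalWeights, double[][] xy, double lr) {
        double w0 = globalWeights[0], w1 = globalWeights[1];
        double g0 = 0, g1 = 0;
        for (double[] p : xy) {
            double err = (w0 + w1 * p[0]) - p[1];
            g0 += err;
            g1 += err * p[0];
        }
        int n = xy.length;
        return new LocalUpdate(new double[] { w0 - lr * g0 / n, w1 - lr * g1 / n }, n);
    }

    public static void main(String[] args) {
        double[] global = { 0.0, 0.0 };
        double[][] siteA = { { 1, 2 }, { 2, 4 } };   // stays private to site A
        double[][] siteB = { { 3, 6 } };             // stays private to site B

        for (int round = 0; round < 50; round++) {
            List<LocalUpdate> updates = List.of(
                    trainLocally(global, siteA, 0.05),
                    trainLocally(global, siteB, 0.05));
            global = federatedAverage(updates);      // only weights are exchanged
        }
        System.out.printf("w0=%.3f, w1=%.3f%n", global[0], global[1]);
    }
}
```

In a production setting, the placeholder training step would be replaced by fine-tuning an SLM or another model on each site's data, but the communication pattern stays the same.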

One of the most important challenges is this: in order to support automated AI solutions, we must become able to adopt new models within our existing data environments quickly and without all the migration pain.

With Scalytics, you can do this by simply using the existing data in platforms such as your favorite DWH, data lakes, traditional RDBMS, key-value stores, document stores, or more specialized systems (graph databases, time series databases). With clever integration protocols, one can build and operate flexible, distributed feature stores. Such a virtual feature store can become the unified but decentralized data source for ML/DS algorithms and for Intelligent Applications as mentioned in the Gartner report. Transfer learning for computer vision models on your own images is one example use case; using SLMs and fine-tuning is the consequent next step towards capturing business, market, and process knowledge.
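Under the hood, Scalytics builds on Apache Wayang (see the section below), which lets you describe such a pipeline once and have Wayang's optimizer decide which registered engine executes it. The following sketch is adapted from the public Apache Wayang word-count example; the input URL is a placeholder, and the choice of plugins is just one possible configuration.

```java
import java.util.Arrays;
import java.util.Collection;

import org.apache.wayang.api.JavaPlanBuilder;
import org.apache.wayang.basic.data.Tuple2;
import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.java.Java;
import org.apache.wayang.spark.Spark;

public class CrossPlatformWordCount {
    public static void main(String[] args) {
        // Register more than one execution platform; Wayang's optimizer picks
        // where each step runs instead of forcing a migration into one system.
        WayangContext context = new WayangContext(new Configuration())
                .withPlugin(Java.basicPlugin())
                .withPlugin(Spark.basicPlugin());

        JavaPlanBuilder planBuilder = new JavaPlanBuilder(context)
                .withJobName("cross-platform word count")
                .withUdfJarOf(CrossPlatformWordCount.class);

        // Placeholder input; in practice this points at data where it already lives.
        String inputUrl = "file:/tmp/example.txt";

        Collection<Tuple2<String, Integer>> counts = planBuilder
                .readTextFile(inputUrl).withName("load text")
                .flatMap(line -> Arrays.asList(line.split("\\W+"))).withName("split words")
                .filter(token -> !token.isEmpty()).withName("drop empty tokens")
                .map(word -> new Tuple2<>(word.toLowerCase(), 1)).withName("attach counter")
                .reduceByKey(
                        Tuple2::getField0,
                        (a, b) -> new Tuple2<>(a.getField0(), a.getField1() + b.getField1()))
                .withName("sum counters")
                .collect();

        System.out.println(counts);
    }
}
```

Wayang ships plugins for further platforms as well, so the same declarative style can be extended step by step towards the virtual feature store described above.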

From now on, every company needs a robust data provisioning solution in order to manage all its automations and intelligent systems. Even if you do not plan to operate complex AI infrastructure, existing AI services can only be an advantage for you if the training data can be provided in a reliable, secure, and efficient way.

Starting each AI initiative with a huge data migration project has not proven to be an easily applicable approach so far.

Hence, by putting your data into the right scope, such as a data product with a well-defined owner and purpose, you can increase the value you get out of that portion of the data. This step overcomes the limitations of splitting responsibility between business and technical people and opens the door for cross-department and cross-organizational usage of the data.

Leveraging today's cloud technology and the widely used abstractions on multiple levels, such as containers, storage nodes, and SQL engines, comes at a particular cost: the integration path can be really bumpy. Moving files along such a bumpy road is not a good idea, hence we argue again for data locality, but now on a different level. We emphasize data sovereignty on top of technical data locality, and decentralized data with a clear separation between storage and processing capabilities and capacities.

Summary

While the Apache Hadoop ecosystem, including Apache Spark and many other processing engines, solved the technical scalability issues of many analytics use cases, we are now entering the organizational level. There is no need to have all data in one central system, and no need to use the same technology in every department across organizations. What is needed instead is an integration layer that gives you data federation for advanced analytics and model training. This will become an important driver for interconnected digital businesses using data, automation, and AI. Federated learning and federated analytics will be the technologies that help you implement smooth data integration layers for services and data products, with drastically reduced data movement, for future-proof data products and services within your organization and for your customers.

TL;DR

The 2024 AI Hype Cycle by Gartner highlights the increasing relevance of AI engineering, ModelOps/DataOps, and AI-Ready Data. The report emphasizes the importance of data management and architecture in future investment cycles. Apache Hadoop revolutionized data handling, but there is still no ideal data platform, which leads to data silos again, on a different level. It remains crucial to bring the right data to the right place at the right time and to process it in the right hands. Machine learning, data science, and AI are marching towards the plateau of productivity, changing how we handle model training and automated data products.

About Scalytics

Legacy data infrastructure cannot keep pace with the speed and complexity of modern artificial intelligence initiatives. Data silos stifle innovation, slow down insights, and create scalability bottlenecks that hinder your organization’s growth. Scalytics Connect, the next-generation Federated Learning Framework, addresses these challenges head-on.
Experience seamless integration across diverse data sources, enabling true AI scalability and removing the roadblocks around machine learning data compliance and data privacy for AI. Break free from the limitations of the past and accelerate innovation with Scalytics Connect, paving the way for a distributed computing framework that empowers your data-driven strategies.

Apache Wayang: The Leading Java-Based Federated Learning Framework
Scalytics is powered by Apache Wayang, and we're proud to support the project. You can check out their public GitHub repo right here. If you're enjoying our software, show your love and support - a star ⭐ would mean a lot!

If you need professional support from our team of industry leading experts, you can always reach out to us via Slack or Email.