Reducing the Carbon Impact of AI and ML

October 10, 2023
Vatsal Shah

With climate change accelerating, artificial intelligence must shift its focus toward energy efficiency and sustainability. Traditional centralized training consumes massive amounts of power. For example, training a single large natural language processing model can emit as much carbon as five average cars do over their lifetimes [1]. An alternative approach such as federated learning, which distributes model training across devices, can significantly reduce this carbon footprint.

The Carbon Crisis in AI

In 2019, data centers, which host AI training alongside other workloads, consumed around 200 terawatt-hours of electricity, accounting for roughly 0.3% of global CO2 emissions. Since 2012, the amount of compute used in the largest AI training runs has grown exponentially, doubling every 3.4 months (by comparison, Moore’s Law had a 2-year doubling period). Over that span the metric grew by more than 300,000x, whereas a 2-year doubling period would have yielded only about a 7x increase [2]. Much of this energy goes to power-hungry GPUs in data centers. For example, an NVIDIA V100 GPU for AI workloads can draw up to 300 watts [3], and cooling these data centers can consume 40% extra power.
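The doubling-time comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the rounded figures quoted above (3.4-month doubling, 300,000x growth, 2-year doubling for Moore's Law) as inputs, and the function names are made up for this example.

```python
import math

def doublings_for_growth(growth_factor: float) -> float:
    """Number of doublings needed to reach the given growth factor."""
    return math.log2(growth_factor)

def growth_over(months: float, doubling_months: float) -> float:
    """Growth factor after `months` at one doubling per `doubling_months`."""
    return 2 ** (months / doubling_months)

# Rounded figures quoted in the text [2].
ai_doubling_months = 3.4
moore_doubling_months = 24.0
observed_growth = 300_000

# How long does 300,000x take at a 3.4-month doubling time?
doublings = doublings_for_growth(observed_growth)
elapsed_months = doublings * ai_doubling_months

# What would a 2-year doubling time have produced over the same period?
moore_growth = growth_over(elapsed_months, moore_doubling_months)

print(f"doublings needed: {doublings:.1f}")
print(f"elapsed time: {elapsed_months:.0f} months")
print(f"growth at a 2-year doubling over the same time: {moore_growth:.1f}x")
```

With these rounded inputs, the 2-year case comes out as single-digit growth. That is the same order as the quoted figure of about 7x, and five orders of magnitude below 300,000x.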

Privacy and Efficiency Benefits of On-Device Training

Federated learning offers a more efficient, distributed approach: models are trained directly on end-user devices such as phones, without transmitting sensitive raw data to centralized servers. Apple, for example, conducted extensive research and published its findings in 2022 in the paper “Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications”, and Samsung discussed the use of federated learning in its mobile OS in the blog post “Advancing Privacy Preserving Techniques for Machine Learning”.

Researchers point out that this keeps personal user data localized and private while still leveraging the collective computational power of millions of devices for collaborative training. Centralized data centers require energy-intensive hardware like GPUs and extensive cooling systems. In contrast, on-device training distributes the workload across consumer devices with lower individual power demands.
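To make the idea concrete, here is a minimal sketch of federated averaging, one common way to combine on-device training results. It is a toy illustration, not how Apple, Samsung, or Blossom Sky implement it: the model is a one-variable linear fit, the three "devices" and their data are hypothetical, and all function names are invented for this example.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one device's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(updates, sizes):
    """Average device models, weighting each by its number of local samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_device_data(rng, n):
    """Hypothetical private dataset: noisy samples of y = 3x + 1."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data

rng = random.Random(0)
devices = [make_device_data(rng, n) for n in (20, 35, 50)]

global_model = (0.0, 0.0)
for round_idx in range(50):
    # Each device trains locally; only model parameters leave the device.
    updates = [local_update(global_model, data) for data in devices]
    global_model = federated_average(updates, [len(d) for d in devices])

print(f"learned w={global_model[0]:.2f}, b={global_model[1]:.2f}")
```

The raw (x, y) samples never leave the `devices` list entries; the server-side step only sees the two parameters returned by each device. After 50 rounds the learned parameters should land close to the true values of 3 and 1.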

Although smartphones and tablets have less processing power than data center machines, they can be more energy-efficient overall because they do not need substantial cooling infrastructure. Distributing model training to the network edge, where the data originates, provides both privacy protections and energy efficiency improvements compared to centralized approaches.

Measurable Carbon Reductions

Researchers at the University of Cambridge and their collaborators published a paper titled ‘Can Federated Learning Save the Planet?’, in which they conducted the first systematic study of the carbon footprint of federated learning [4]. They measured emissions from federated learning by training models for image classification and speech recognition using a server and chipsets typical of the smaller devices used for on-device training. They recorded the full energy consumption and calculated how emissions vary by location. Their analysis found that federated learning was reliably ‘cleaner’ than centralized training under many common scenarios. For image classification, federated learning in France emitted less CO2 than any centralized setup in China or the US. It was also more efficient for speech recognition in all countries studied.

These results were further validated by a follow-up study exploring more diverse datasets and models. The researchers also developed the first ‘Federated Learning Carbon Calculator’ to estimate emissions based on devices, datasets, speeds, locations, and other factors [5]. This provides a methodology for quantifying and reducing the carbon impact of federated systems. Overall, their research demonstrates the substantial carbon savings possible with distributed on-device training compared to traditional centralized data center approaches.
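The basic arithmetic behind such an estimate is energy used multiplied by the carbon intensity of the local grid. The sketch below is a simplified illustration and not the calculator from [5]. It uses the 300-watt GPU draw and 40% cooling overhead mentioned earlier, while the device counts, training hours, on-device power draw, and grid intensities are placeholder assumptions chosen only to show the calculation. It also ignores network communication and other factors a real estimate would need.

```python
from dataclasses import dataclass

@dataclass
class TrainingSetup:
    name: str
    device_power_watts: float   # average draw of one device while training
    num_devices: int
    hours: float                # training time per device
    cooling_overhead: float     # extra power for cooling, as a fraction
    grid_g_co2_per_kwh: float   # carbon intensity of the local grid

def energy_kwh(setup: TrainingSetup) -> float:
    """Total electricity used, including cooling overhead."""
    watts = setup.device_power_watts * setup.num_devices
    return watts * setup.hours * (1 + setup.cooling_overhead) / 1000

def emissions_kg(setup: TrainingSetup) -> float:
    """Estimated CO2 in kilograms: energy times grid carbon intensity."""
    return energy_kwh(setup) * setup.grid_g_co2_per_kwh / 1000

# Placeholder scenarios: every number except 300 W and 0.4 is an assumption.
centralized = TrainingSetup(
    name="centralized (8 GPUs)",
    device_power_watts=300, num_devices=8, hours=24,
    cooling_overhead=0.4, grid_g_co2_per_kwh=400,
)
federated = TrainingSetup(
    name="federated (1000 phones)",
    device_power_watts=5, num_devices=1000, hours=2,
    cooling_overhead=0.0, grid_g_co2_per_kwh=60,
)

for setup in (centralized, federated):
    print(f"{setup.name}: {energy_kwh(setup):.2f} kWh, "
          f"{emissions_kg(setup):.2f} kg CO2")
```

Changing `grid_g_co2_per_kwh` shows why location matters: the same training run produces very different emissions on a low-carbon grid than on a carbon-intensive one. The placeholder scenarios say nothing by themselves about which approach is cleaner in practice.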

Building Sustainable AI with Blossom Sky

The power of Blossom Sky as a platform for starting and growing federated learning is unrivaled, especially when it comes to cutting down AI's carbon footprint. We need sustainable AI training now more than ever. That's why we've designed our platform to support decentralized training, using edge devices to reduce the energy consumption typically seen in centralized data centers, which results in a substantial drop in CO2 emissions. Federated learning’s built-in ability to perform calculations on local devices, combined with the robustness of Blossom Sky, means that data doesn’t have to travel constantly across the network. This not only saves energy but also simplifies the learning process while preserving both the effectiveness and the speed of AI training.

When federated learning, as enabled by Blossom Sky, is combined with renewable energy, AI development becomes considerably more eco-friendly. In addition, in today’s digital world where privacy is a top priority, Blossom Sky’s decentralized approach is in line with data privacy norms: data remains at its source rather than being centralized, which minimizes the risk of data breaches and unauthorized access.

Beyond efficiency and data protection, Blossom Sky combines ethics, ecology, and technology, making it the go-to platform for those truly investing in responsible AI innovation. The platform doesn’t just deliver on federated learning’s promise; it enriches it, providing an efficient, eco-friendly, and future-proof solution for tomorrow’s challenges.

References

[1] Karen Hao, “Training a single AI model can emit as much carbon as five cars in their lifetimes”, MIT Technology Review, 2019.
[2] Dario Amodei and Danny Hernandez, “AI and Compute”, 2018.
[3] V100 Specs
[4] Xinchi Qiu, Titouan Parcollet, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane, “Can Federated Learning Save The Planet?”, 2020; arXiv:2010.06537.
[5] Federated Learning Carbon Calculator
