
Abstract

This study evaluates the performance and energy trade-offs of three popular data processing libraries (Pandas, PySpark, and Polars) applied to GreenNav, a CO2 emission prediction pipeline for urban traffic. GreenNav is an eco-friendly navigation app that predicts CO2 emissions and determines low-carbon routes using a hybrid CNN-LSTM model, integrated into a complete pipeline for ingesting and processing large, heterogeneous geospatial and road data. For each library applied to the GreenNav pipeline, the study quantifies the end-to-end execution time, cumulative CPU load, and maximum RAM consumption, then converts these metrics into energy consumption and CO2 equivalents. Experiments on datasets ranging from 100 MB to 8 GB show that Polars in lazy mode offers substantial gains, reducing processing time by a factor of more than twenty, memory consumption by about two-thirds, and energy consumption by about 60%, while maintaining the predictive accuracy of the model (R² ≈ 0.91). These results show that careful selection of data processing libraries can reconcile high computing performance with environmental sustainability in large-scale machine learning applications.
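The conversion from measured metrics to energy and CO2 equivalents mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formula, so everything here is an assumption: the linear power model, the three constants, and the names `RunMetrics`, `energy_kwh`, and `co2e_grams` are hypothetical and chosen only to show the general shape of such a calculation.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Measurements for one pipeline run (field names are illustrative)."""
    wall_time_s: float   # end-to-end execution time, in seconds
    cpu_time_s: float    # cumulative CPU time across cores, in seconds
    peak_ram_gb: float   # maximum RAM consumption, in gigabytes


# Hypothetical constants, NOT taken from the paper; replace with measured values.
CPU_WATTS_PER_CORE = 15.0     # assumed average power draw of one busy core
RAM_WATTS_PER_GB = 0.375      # assumed DRAM power draw per gigabyte
GRID_G_CO2E_PER_KWH = 500.0   # assumed carbon intensity of the electricity grid


def energy_kwh(m: RunMetrics) -> float:
    """Estimate energy as CPU energy plus RAM energy, in kilowatt-hours."""
    cpu_joules = m.cpu_time_s * CPU_WATTS_PER_CORE
    # Peak RAM is assumed to be held for the whole run (an upper bound).
    ram_joules = m.wall_time_s * m.peak_ram_gb * RAM_WATTS_PER_GB
    return (cpu_joules + ram_joules) / 3.6e6  # 1 kWh = 3.6e6 J


def co2e_grams(m: RunMetrics) -> float:
    """Convert the energy estimate into grams of CO2 equivalent."""
    return energy_kwh(m) * GRID_G_CO2E_PER_KWH


if __name__ == "__main__":
    # Made-up example run, for illustration only.
    run = RunMetrics(wall_time_s=120.0, cpu_time_s=400.0, peak_ram_gb=6.0)
    print(f"{energy_kwh(run):.6f} kWh, {co2e_grams(run):.3f} g CO2e")
```

Under a model of this kind, a library that lowers execution time, CPU time, and peak memory at once lowers the estimated energy through both terms, which is the sort of trade-off the study reports.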

Details

Title
Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction
Author
Youssef, Mekouar 1; Lahmer Mohammed 2; Karim, Mohammed 3

1 Paragraphe Laboratory, Paris 8 University of Paris, Vincennes–Saint-Denis, 93200 Saint-Denis, France; Laboratory of Engineering, Modeling, and Systems Analysis (LIMAS), Faculty of Sciences, Sidi Mohamed Ben Abdellah University (USMBA), Fez 30000, Morocco; ESISA ANALYTICA Laboratory (LEA), Department of Artificial Intelligence, School of Engineering in Applied Sciences (ESISA), Fez 30050, Morocco
2 ESISA ANALYTICA Laboratory (LEA), Department of Artificial Intelligence, School of Engineering in Applied Sciences (ESISA), Fez 30050, Morocco; Department of Computer Engineering High School of Technology, Moulay Ismail University, Meknes 50050, Morocco
3 Laboratory of Engineering, Modeling, and Systems Analysis (LIMAS), Faculty of Sciences, Sidi Mohamed Ben Abdellah University (USMBA), Fez 30000, Morocco
Publication title
Computers; Basel
Volume
14
Issue
8
First page
319
Number of pages
25
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
2073431X
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-08-07
Milestone dates
2025-06-12 (Received); 2025-08-01 (Accepted)
First posting date
2025-08-07
ProQuest document ID
3244001599
Document URL
https://www.proquest.com/scholarly-journals/optimizing-data-pipelines-green-ai-comparative/docview/3244001599/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-29
Database
ProQuest One Academic