Reinforcement learning-enhanced multi-objective optimization for sustainable coal blending in thermal power plants

Abstract

Coal blending in thermal power plants is a complex multi-objective challenge involving economic, operational and environmental considerations. This study presents a Q-learning-enhanced NSGA-II (QLNSGA-II) algorithm that integrates the adaptive policy optimization of Q-learning with the elitist selection of NSGA-II to dynamically adjust crossover and mutation rates based on real-time performance metrics. A physics-based objective function takes into account the thermodynamics of ash fusion and the kinetics of pollutant emission, ensuring compliance with combustion efficiency and NO_x limits. Benchmark tests on the Walking Fish Group (WFG) and Unconstrained Function (UF) suites show that QLNSGA-II achieves a 12.7% improvement in Inverted Generational Distance (IGD) and a 9.3% improvement in Hypervolume (HV) compared to prevailing algorithms. Industrial validation at the Huaneng Yingkou power plant confirms a 14.7% reduction in fuel cost and a 41% reduction in slagging incidence over conventional blending methods, backed by 12 months of operational data. Other benefits include a 24.8% reduction in sulphur content, a 6.9% increase in the plant’s net heat rate and annual savings of RMB 12.3 million, 2,150 tonnes of limestone and 38,500 tonnes of CO₂-equivalent emissions. These results highlight QLNSGA-II as a scalable, robust solution for multi-objective coal blending, offering a promising way to improve the efficiency and sustainability of coal-fired power generation.

Full text

Translate

Turn on search term navigation

1 Introduction

Thermal power generation remains a cornerstone of global energy systems, accounting for approximately 36% of worldwide electricity production[1,2], with coal-fired plants contributing over 70% of this share in coal-dependent economies such as China and India [3–5]. The practice of coal blending emerged in the late 1970s as a strategic response to declining coal quality and supply volatility, initially focusing on empirical mixtures to stabilize boiler operations [6–8]. Over the decades, this practice has evolved into a sophisticated optimization challenge, driven by the need to balance conflicting requirements: minimizing fuel costs, adhering to stringent emission regulations (e.g., China’s Ultra-Low Emission standards), and maintaining combustion stability across diverse coal properties[9,10]. Modern blending frameworks now incorporate computational models to address these multidimensional constraints, marking a shift from heuristic-based approaches to data-driven decision making [11–13].

Contemporary research in coal blending optimization predominantly focuses on algorithmic advancements to manage its inherent complexity—a high-dimensional problem space with 5–7 interdependent variables (e.g., calorific value, ash fusion characteristics) and nonlinear objective relationships [14]. As summarized in Table 1, existing methodologies span from traditional mathematical programming to hybrid artificial intelligence techniques. Linear programming models, while computationally efficient, oversimplify combustion dynamics and ash interaction effects [15]. Evolutionary algorithms like NSGA-II excel in multi-objective optimization but are limited in adaptability to real-time operational shifts [16]. Particle swarm optimization (PSO) variants demonstrate rapid convergence yet often stagnate in local optima when handling more than four objectives [17]. Recent hybrid approaches combining neural networks with metaheuristics [18] show promise in predictive modeling, but require extensive training data and lack interpretability for plant operators. A critical challenge remains in dynamically balancing exploration and exploitation during optimization, particularly in fluctuating coal markets and regulatory environments [2,19].

[Figure omitted. See PDF.]

Over the past decade, the field of multi-objective optimization algorithms (MOOAs) has progressed rapidly [27–29], driven by the practical necessity to solve increasingly high-dimensional and complex engineering problems [30,31]. A diverse array of recent advances—including adaptive predator–prey frameworks, weighted average mechanisms, and archive-boosted strategies—has markedly improved the convergence and diversity of Pareto solutions across benchmark and real-world scenarios [32–34]. Furthermore, novel algorithmic designs such as the Multi-objective Runge–Kutta Optimizer (MORKO) [35], Many-Objective Multi-Verse Optimizer (MaOMVO) [36], and nature-inspired variants like MaODA and MaOGOA have significantly enhanced scalability and robustness, particularly for problems characterized by numerous conflicting objectives and practical engineering constraints [37,38].

Despite these substantial methodological gains, current literature continues to grapple with several unresolved issues that directly affect application to domains such as coal blending optimization. Notably, the challenge of balancing solution diversity with convergence efficiency in high-dimensional search spaces remains at the forefront of research. Additionally, the integration of real-time, dynamic operational data into optimization models is still an open problem, as most algorithms assume static environments that may not reflect industrial realities . Another persistent limitation lies in the generalizability of these frameworks across heterogeneous system configurations and varying material properties. Although reliability-driven approaches and swarm intelligence-based methods continue to evolve, their adoption for domain-specific objectives—such as the simultaneous minimization of fuel cost and pollutant emissions in power generation—has yet to be fully realized.

Coal blending optimization holds both academic and industrial significance due to its broad impact on economic viability, environmental compliance, and operational reliability [9,39]. Despite steady methodological progress, the literature continues to highlight three core challenges: achieving a balance between solution diversity and convergence speed in high-dimensional objective spaces [40]; integrating real-time market and operational data into predominantly static optimization models; and generalizing algorithmic frameworks to accommodate heterogeneous boiler configurations and varying coal properties. Against this backdrop, reinforcement learning (RL) has emerged as a promising approach for introducing adaptive, data-driven decision-making into multi-objective optimization, particularly by overcoming the rigidity of static models. However, its practical application in coal blending, especially in synergy with population-based evolutionary algorithms to enable real-time policy updates and dynamic constraint handling, remains largely unexplored [41,42]. Bridging the gap between these algorithmic advances and domain-specific requirements is thus essential for addressing the complex, evolving optimization demands of modern power generation systems.

This study addresses these gaps through three principal contributions:

1. Adaptive Multi-Objective Framework: The proposed QLNSGA-II algorithm synergizes Q-learning’s policy optimization with NSGA-II’s elitist selection, enabling dynamic adjustment of crossover/mutation probabilities based on real-time solution quality metrics.

2. High-Dimensional Constraint Handling: A physics-informed objective function incorporates ash fusion thermodynamics and pollutant emission kinetics, resolving conflicts between combustion efficiency () and NO_x emissions ().

3. Industrial Validation: Implementation at Huaneng Yingkou Power Plant demonstrates a 14.7% cost reduction and a 41% lower slagging incidence compared to conventional blends, validated through 12-month operational data and emission monitoring.

The subsequent sections detail the mathematical formulation of coal blending optimization (Sect 2), QLNSGA-II’s algorithmic architecture (Sect 3), benchmark validation against WFG/UF test suites (Sect 4), and empirical results from full-scale plant trials (Sect 5). Concluding remarks outline future directions for RL-enhanced optimization in energy systems.

2 Coal blending optimization model

The multi-objective optimization model for coal blending addresses three critical aspects: economic viability (F_e), operational safety (F_s), and environmental compliance (F_p). The formulation integrates physicochemical coal properties with industrial constraints through additive blending principles [6,22]:

(1)

2.1 Objective functions

The economic objective F_e minimizes blending costs normalized to market prices:

(2)

The safety objective F_s combines deviations from target boiler parameters (Q_d, , M_d) and ash fusion risks:

(3)

The environmental objective F_p comprehensively evaluates the impacts from sulfur emissions (S_i), ash disposal (A_i), as well as key gaseous pollutants including NO_x and CO₂-equivalent emissions:

(4)

where, N_i and C_i denote the NO_x and CO₂-equivalent emission factors for each coal type, with and as their respective weights. This approach enables direct evaluation and minimization of atmospheric pollutants alongside traditional ash and SO_x indices, reflecting the increasing importance of multi-pollutant emission control in modern coal utilization.

2.2 Parameter calibration

Weight coefficients (θ, α, β, γ) were determined through iterative sensitivity analysis using historical plant data (2018–2023), prioritizing economic factors () while ensuring safety (, ) and environmental () constraints. The newly introduced NO_x and CO₂-eq weights (, ) were calibrated in line with site emission benchmarks and the latest regulatory targets. Boundary conditions (, etc.) align with China’s GB/T 15224.1-2021 coal standards and plant-specific boiler specifications.

2.3 Decision variables

The vector represents coal proportions, constrained by:

* Physicochemical limits: Q_i (calorific value), (volatiles), M_i (moisture), S_i (sulfur), ST_i (ash melting point), A_i (ash content) , N_i (NO_x factor), C_i (CO₂ factor)

* Operational thresholds: Q_d = 4525 kJ/kg, ,

* Environmental caps: , , , , as required

The model generates Pareto-optimal blends balancing cost (RMB/ton-MJ), slagging risk (ST deviation), and emission penalties (S/A indices, NO_x, CO₂-eq), validated through plant trials in Sect 5.

3 The proposed QLNSGA-II algorithm

3.1 Standard NSGA-II

NSGA-II is a well-established algorithm for multi-objective optimization [43,44], and it has proven effective for coal blending in thermal power generation due to its capability to handle complex, multi-objective problems. The pseudo-code of the NSGA-II algorithm is presented in Algorithm 1, detailing the process from initialization to final population refinement.

Algorithm 1. Pseudo-code of NSGA-II algorithm.

3.2 Q-Learning enhanced NSGA-II

The proposed algorithm embeds Q-learning into the operator selection process of NSGA-II, enabling dynamic adaptation of crossover and mutation strategies according to evolving population characteristics [41,45,46]. Unlike traditional NSGA-II, which relies on fixed probabilities, the QNSGA-II approach leverages historical search experience to select operators more intelligently, thus improving both convergence and diversity [47,48].

3.2.1 State-action-reward framework.

The state space is constructed by partitioning the population with respect to the median values of makespan and total energy consumption (TEC), resulting in four states as shown in Table 2. Each state represents a distinct region in the Pareto space and informs the operator selection process via Q-learning.

[Figure omitted. See PDF.]

Six neighborhood operators are defined as actions, summarized in Table 3. Each operator modifies the solution in a specific manner, including both crossover and mutation strategies, ensuring adequate search capability in different regions of the solution space.

[Figure omitted. See PDF.]

The reward for each action is calculated by combining the dominance relationship and normalized objective improvement:

(5)

where controls the balance between convergence and diversity incentives.

Table 4 illustrates the evolution of Q-values for each state-action pair across generations, while Table 5 provides statistics for operator selection frequencies. These results confirm that the Q-learning mechanism gradually biases operator selection toward actions that yield higher long-term rewards.

[Figure omitted. See PDF.]

Parameter sensitivity is further analyzed using a Taguchi L9 orthogonal array (Table 6). The Q-learning parameters, particularly the learning rate and discount factor, are shown to substantially influence convergence and diversity, with best performance at P_c = 0.8, P_m = 0.1, , .

[Figure omitted. See PDF.]

3.2.2 Dynamic operator selection.

At each generation, operator selection is adaptively guided by the Q-learning mechanism. Q-values are updated according to

(6)

where δ and η represent the learning rate and discount factor. The dynamic ε-greedy strategy

(7)

ensures adequate exploration in early generations and more exploitation as the search progresses.

Algorithm 2 outlines the iterative process, in which each individual is assigned a state, selects an operator via ε-greedy, and updates the Q-table based on observed rewards. This adaptive mechanism gradually guides the search towards more effective operator combinations.

Algorithm 2. Q-Learning enhanced NSGA-II with adaptive operator selection.

3.2.3 Adaptive constraint handling.

Constraint handling is seamlessly integrated into the Q-learning reward system. When an operator generates a feasible solution, it receives an additional reward,

(8)

where g_c(X) and denote the violation and maximum violation of constraint c. This approach biases operator selection toward feasible regions, obviating the need for penalty parameter tuning and naturally balancing objectives with constraint satisfaction.

The initial population is constructed using a hybrid strategy: 60% of individuals are generated via Tent chaotic mapping to promote diversity, while the remaining 40% are constructed with domain heuristics to improve solution quality. This approach accelerates convergence and ensures a broad search space from the outset.

4 Evaluation of the proposed QLNSGA-II

The performance of the proposed QLNSGA-II was systematically evaluated by benchmarking it against five established multi-objective optimization algorithms: MOPSO [49], rNSGA-II [50], IMMOEAD [51], MOEAD-FRR-MAB [52], and dMOPSO [53]. The selection of these algorithms was made to ensure a fair and comprehensive comparison with recent and widely recognized baselines in the field.

MOPSO and dMOPSO are two widely-cited variants of multi-objective particle swarm optimization, with dMOPSO representing an advanced co-evolutionary framework and MOPSO serving as a classical PSO-based baseline. rNSGA-II is a reference-point-based variant of NSGA-II and remains influential due to its distinctive selection and sorting mechanisms. IMMOEAD and MOEAD-FRR-MAB are both state-of-the-art decomposition-based evolutionary algorithms that incorporate adaptive strategies for operator selection and constraint handling, representing the latest advances in MOEA/D research. The inclusion of these algorithms provides coverage of both population-based and decomposition-based paradigms, as well as representative works from the most recent literature.

To evaluate the effectiveness of QLNSGA-II in terms of convergence and diversity, experiments were carried out using the WFG and UF benchmark test suites [54], which are standard platforms for assessing multi-objective optimization algorithms. Both suites are known for presenting complex challenges—such as multi-modality, non-separability, and scalability in the number of objectives—thereby enabling a thorough assessment of the algorithm’s generalization capabilities.

Uniform parameter settings were maintained across all comparative methods, as summarized in Table 7, to ensure a fair and unbiased evaluation environment. For all algorithms, the population size was set to 100, the maximum number of generations was 10,000, and the crossover and mutation probabilities were fixed at 0.9 and 1/D, respectively, where D denotes the number of decision variables. Reference and ideal points were selected in accordance with the settings recommended in the most recent comparative studies, thereby allowing for direct comparison of performance based on Hypervolume (HV) and Inverted Generational Distance (IGD) metrics.

[Figure omitted. See PDF.]

All experiments were performed on a standard personal desktop computer equipped with an Intel Core i7 processor, 32 GB RAM, and a 512 GB SSD, running Windows 10. The algorithms were implemented and executed in MATLAB R2022b.

4.1 Results and discussion on IGD and HV performance

The proposed QLNSGA-II algorithm was rigorously evaluated on the PlatEMO platform against five state-of-the-art multi-objective optimization algorithms using the WFG and UF benchmark suites [55,56]. Statistical results from 30 independent runs, presented in Tables 8 and 9, demonstrate QLNSGA-II’s superior convergence and diversity characteristics through the IGD and HV metrics.

[Figure omitted. See PDF.]

QLNSGA-II achieved statistically significant improvements in IGD values on 14 out of 19 test functions, with especially clear dominance on WFG1 ( vs. for MOPSO) and UF1 ( vs. for MOPSO). This improvement—an average reduction of 12.7% in IGD—arises from the Q-learning mechanism’s adaptive operator selection, which dynamically adjusts crossover and mutation rates (typically within [0.65,0.85]) to balance global and local search. As depicted in Fig 1, QLNSGA-II maintains closer proximity to true Pareto fronts in both separable (WFG1) and non-separable (UF7) problems, addressing the stagnation and drift observed in algorithms such as MOEADFRRMAB and dMOPSO.

[Figure omitted. See PDF.]

(A) WFG functions; (B) UF functions.

The HV results further highlight QLNSGA-II’s ability to deliver diverse solutions, outperforming all baselines on 15 out of 19 test cases, including challenging instances like WFG7 ( vs. for IMMOEAD) and UF9 ( vs. for IMMOEAD). As illustrated in Fig 2, the dynamic ε-greedy policy (Eq 6) enables a higher degree of exploration in early generations (about 60% of actions), which gradually transitions to exploitation as search progresses. This policy yields up to 9.3% higher HV in complex three-objective problems like UF8, where maintaining broad coverage of the objective space is critical.

[Figure omitted. See PDF.]

(A) WFG functions; (B) UF functions.

To further evaluate the comparative effectiveness of QLNSGAII across a broad suite of benchmark problems, statistical analysis was performed using the IGD and HV indicators, as presented in Tables 10 and 11. For the majority of UF and WFG problems, QLNSGAII consistently achieves the lowest IGD values, indicating superior convergence and diversity in most scenarios. Specifically, in UF1 and WFG6, the p-values are 0.0191 and 0.0419, respectively, suggesting statistically significant improvements over the other state-of-the-art algorithms under a standard 0.05 significance threshold. Similarly, the HV results reinforce these findings, where QLNSGAII demonstrates remarkable performance advantages in challenging cases such as UF1 (p = 0.0080) and WFG2 (p = 0.0022). In contrast, for certain problems such as UF2, UF4, and WFG9, the observed differences are not statistically significant, as reflected by p-values exceeding 0.2, indicating that the tested methods perform comparably in those settings. These outcomes substantiate that the adaptive learning mechanism embedded in QLNSGAII delivers tangible improvements on many complex MOO benchmarks, while also highlighting cases where problem structure leads to similar algorithmic behavior.

[Figure omitted. See PDF.]

QLNSGA-II also demonstrates strong robustness in high-dimensional spaces (D = 30 for the UF suite), consistently providing HV improvements of 18.4% over rNSGAII. The hybrid initialization, integrating Tent chaotic mapping and domain knowledge of coal blending ratios, enhances early-stage population diversity and convergence. Notably, this results in a 41% reduction in generational distance variance relative to standard NSGA-II, especially on problems with disconnected or complex Pareto sets (e.g., WFG4, UF5).

The algorithm’s reliability is evidenced by its low standard deviations (e.g., for WFG1 HV compared to for rNSGAII), indicating consistent performance across runs. This stability can be attributed to the Q-table’s ongoing refinement via real-time reward feedback (Eq 5), adaptively prioritizing operators that maximize both dominance rank and constraint satisfaction.

Table 12 presents the average runtimes (in seconds) for all algorithms across benchmark problems. QLNSGA-II maintains a comparable or lower computational cost relative to most state-of-the-art baselines. For example, on UF1 and WFG1, the runtime for QLNSGA-II is 0.125 s and 0.111 s, respectively, outperforming MOPSO (0.144 s, 0.144 s) and substantially reducing the time needed compared to population-based algorithms such as IMMOEAD and MOEADFRRMAB. These results confirm that the integration of Q-learning does not introduce prohibitive computational overhead and, in many instances, actually accelerates convergence by guiding search more effectively.

[Figure omitted. See PDF.]

4.2 Results and discussion on WFG and UF test functions

The apparent similarity in algorithm performance across the WFG4–WFG9 functions (excluding rNSGA-II) stems from two fundamental factors inherent to these test problems. First, the WFG4–9 series shares common characteristics, including multi-modal Pareto fronts and non-separable objective functions, posing similar challenges to optimization algorithms. Second, the decomposition-based nature of MOEADFRRMAB and dMOPSO leads to comparable solution distributions when handling these specific problem structures.

QLNSGA-II demonstrates distinct advantages through its adaptive operator selection mechanism. As shown in Fig 3, the algorithm maintains superior solution density near the true Pareto fronts for WFG7 (convergence metric = 0.293 vs. 0.376 for MOPSO) and WFG9 (spread metric = 0.214 vs. 0.298 for IMMOEAD). The Q-learning module effectively identifies appropriate crossover and mutation strategies for different problem phases—favoring simulated binary crossover (SBX) during early exploration (generations 1–300) and polynomial mutation for local refinement (generations 300–1000).

[Figure omitted. See PDF.]

The observed consistency in HV metrics across UF4–UF9 (Fig 4) reflects QLNSGA-II’s robust constraint handling rather than a performance contradiction. While IGD emphasizes proximity to the ideal front, HV rewards both convergence and diversity. For UF7’s disconnected Pareto front, QLNSGA-II achieves 19.7% better HV than MOEADFRRMAB by adaptively allocating 38% of population slots to boundary solutions through its crowding distance mechanism. This dual optimization capability explains the apparent metric divergence—superior HV stems from diversity preservation, while competitive IGD results from focused convergence.

[Figure omitted. See PDF.]

The rNSGA-II’s outlier behavior originates from its rigid reference point updates, which struggle with WFG’s scalable objectives. In contrast, QLNSGA-II’s dynamic reward system (Eq 5) automatically adjusts search intensity based on real-time population distribution. For WFG8’s degenerate front, this enables 47% faster convergence than dMOPSO while maintaining 92% solution coverage—critical advantages for practical applications requiring balanced multi-objective optimization.

4.3 Performance on real-world multi-objective problems

A key challenge in multi-objective optimization research is bridging the gap between synthetic benchmarks and real engineering scenarios. Many algorithms that excel on standard test suites may fail to deliver reliable or high-quality results when faced with the intricate objectives, noisy data, or strict constraints common in industrial practice. To rigorously assess both the effectiveness and the generalizability of QLNSGA-II, it is therefore essential to test the algorithm on practical engineering problems and compare its performance against established baselines.

For this purpose, we selected a suite of real-world multi-objective optimization problems (RWMOP10–RWMOP18) [57], each reflecting complex industrial challenges such as chemical process design, structural engineering, and environmental systems. These problems were chosen for their diversity in terms of objective number, variable dimensionality, and feasible space structure, and because they have been widely recognized in the literature as representative benchmarks for evaluating real-world applicability of evolutionary algorithms.

In these experiments, QLNSGA-II is compared with four recent and representative multi-objective optimization algorithms—IMCMOEAD [58], MOEADCMT [59], MCCMO [60], and CMOEMT [61]—which collectively cover a range of design philosophies, from decomposition and clustering to advanced constraint handling. This comparative setup allows for a comprehensive and unbiased assessment of the strengths and limitations of QLNSGA-II in a practical setting.

Table 13 presents the HV results and corresponding standard deviations for all competing methods across nine RWMOP instances. HV is employed as the primary performance indicator because it effectively measures both convergence and diversity in multi-objective optimization, which is critical for real engineering applications.

[Figure omitted. See PDF.]

QLNSGA-II achieves highly competitive HV results across all nine real-world problems. In particular, the algorithm obtains the highest or nearly highest HV values in cases such as RWMOP14, RWMOP15, and RWMOP16, and exhibits consistently low standard deviations, underscoring its robust convergence and strong stability. For example, on RWMOP16 (M = 2, D = 2), QLNSGA-II yields an HV of , marginally outperforming all other methods.

5 Application of the QLNSGA-II on a Huaneng Yingkou power plant case study

In order to evaluate the proposed QLNSGA-II algorithm, a case study was conducted at the Huaneng Yingkou Power Plant, where coal blending optimization was performed using multiple coal types with varying quality parameters [6]. Table 14 presents the baseline quality characteristics of the designed coal, while Table 15 lists the attributes of alternative coal types, including calorific value (Q_d), sulfur content (S_d), volatile matter (), ash content (A_d), moisture (M_d), softening temperature (ST_d), and price (P). These parameters are integral to balancing the economic, environmental, and safety requirements in coal blending optimization.

[Figure omitted. See PDF.]

The physicochemical properties of alternative coal types exhibit significant variations, as demonstrated in Table 15. Yitai coal (Y) possesses a high calorific value (Q_d = 4765 kJ/kg) and an ideal volatile matter content (), with its low sulfur content () offering distinct environmental advantages. However, its elevated ash content () may compromise combustion efficiency and increase ash disposal costs. Mengdong coal (M), as a low-cost option (P = 740 RMB/ton), is characterized by an extremely high moisture content (), which severely impacts combustion stability. Nevertheless, its ultra-low sulfur content () and high softening temperature () present unique benefits for emission reduction and slagging prevention. Pingmei coal (P) exhibits a high calorific value (Q_d = 4943 kJ/kg), but its excessive sulfur content () and relatively low softening temperature () pose environmental risks and furnace slagging hazards. Shenhua coal (S) demonstrates superior overall performance, with its high calorific value (Q_d = 4939 kJ/kg) and low ash content () ensuring combustion efficiency, while its softening temperature of significantly exceeds the design benchmark. Huaneng coal (H) boasts the highest calorific value (Q_d = 5127 kJ/kg), but its ash content of 24.54% and exceptionally high softening temperature () may lead to fuel system abrasion and combustion organization challenges. The nonlinear coupling relationships among these coal parameters underscore the necessity for multi-objective optimization, particularly in the synergistic control of ash fusion characteristics and volatile matter yield, which are critical for boiler thermal efficiency and operational safety.

The technical and economic indicators of the designed coal blend, as presented in Table 14, reveal that its calorific value of 4525 kJ/kg is at the lower threshold of industry standards. Combined with a moisture content of 20.65%, this suggests potential combustion instability risks. While the sulfur content () meets current environmental standards, it still presents an 18.3% optimization potential compared to premium resources like Mengdong coal. The volatile matter content () ensures ignition performance, but caution is warranted regarding the potential for unburned carbon loss due to its combination with high ash content (). Notably, the designed coal’s softening temperature () approaches the safety threshold for ash fusion, increasing the risk of slagging when co-firing with high-calcium coals. Economically, the price of 1200 RMB/ton, when compared with the price spectrum of alternative coals, indicates significant cost-saving potential through optimized blending ratios. Particularly, the strategic combination of Mengdong coal’s low-cost characteristics with Shenhua coal’s efficient combustion properties could break through existing technical and economic bottlenecks. This study highlights the systemic deficiencies in traditional empirical coal blending approaches regarding thermodynamic parameter matching and multi-objective optimization, emphasizing the urgent need for intelligent algorithms to achieve Pareto optimal solutions in calorific value-sulfur content-ash fusion characteristics-economic cost trade-offs.

5.1 Experimental results and analysis

The QLNSGA-II algorithm, alongside MOEADFRRMAB, MSFMOPSO and NSGA-II, was applied to the coal blending problem with varying parameter configurations, as depicted in Fig 5. Each subplot of Fig 5 presents the Pareto-optimal solutions across the objectives of economic cost (F_e), safety (F_s), and environmental impact (F_p). The parameter settings for the subplots are as follows: , , in subplot (a); , , in subplot (b); and , , in subplot (c).

[Figure omitted. See PDF.]

Algorithms compared include QLNSGA-II, MSFMOPSO, MOEAD-FRR-MAB, and NSGA-II.

In subplot (c) of Fig 5, where the parameters are set to , , and , the distribution of solutions across the objectives is notably balanced. This parameter configuration results in the most uniform spread of solutions along the Pareto front, suggesting a well-distributed trade-off between the objectives. This configuration showcases QLNSGA-II’s ability to optimize multi-objective functions effectively while preserving diversity in the solutions. The balanced trade-offs observed here make this parameter set ideal for industrial applications, where achieving a practical balance across economic, safety, and environmental criteria is essential. By leveraging QLNSGA-II’s adaptive optimization capability with these parameters, industries can achieve an optimal solution space that accommodates diverse operational constraints.

The distribution of solutions generated by QLNSGA-II closely aligns with the Pareto front across all three configurations, demonstrating effective convergence and balanced trade-offs. In contrast, MSFMOPSO exhibits significant dispersion, especially along the environmental impact axis, while MOEADFRRMAB and NSGA-II display varying degrees of convergence. These results emphasize QLNSGA-II’s stability and adaptability in balancing multiple objectives, which is crucial for practical applications where diverse operational constraints must be met.

The application of the QLNSGA-II algorithm for coal blending optimization at the Huaneng Yingkou Power Plant has revealed significant benefits in terms of both economic and environmental performance. As shown in Table 16, the optimization process balances competing objectives, with the best solutions identified through careful trade-off analysis. Notably, Solution #1 achieves the lowest F_e (4.6268) through a high proportion of low-cost Mengdong coal (38%) and moderate Huaneng coal (23%), but exhibits suboptimal F_p (8.9879) due to elevated sulfur content () and intermediate softening temperature (). In contrast, Solution #3 prioritizes combustion safety with minimal F_s (0.0957), achieved by maximizing Huaneng coal (47%) to leverage its exceptional softening temperature (), albeit at the expense of increased F_e (5.1278) and sulfur-related environmental penalties (F_p = 12.0698).

[Figure omitted. See PDF.]

Notably, Solutions #2 and #4 demonstrate balanced multi-objective performance, combining 19–29% Mengdong coal with strategic allocations of Shenhua (10–19%) and Huaneng (28–40%) coals. These blends maintain F_e below 4.85 while achieving F_s<1.15 and F_p<9.78, attributable to synergistic effects between Shenhua’s low ash content () and Huaneng’s high calorific value (Q_d = 5127 kJ/kg). The moisture content (M_d) across all solutions spans 19.00–30.69%, inversely correlated with Q_d (Pearson’s r = −0.92, p<0.05), underscoring the thermodynamic penalty of high-moisture components like Mengdong coal ().

A critical observation lies in the nonlinear relationship between softening temperature (ST_d) and safety metric F_s: Solutions exceeding (e.g., #3 and #4) reduce F_s by 63–97% compared to baseline designs, validating the algorithm’s capacity to mitigate slagging risks through ash fusion optimization. However, the environmental trade-off manifests in Solution #5, where a 0.4946% sulfur content drives F_p to 11.1851 despite competitive F_e (4.9795). This Pareto front analysis quantitatively confirms that QLNSGA-II successfully navigates the conflicting constraints of coal blending, providing operators with a solution space where F_e, F_s, and F_p can be optimized within ±12.5%, ±8.7%, and ±15.2% of ideal values, respectively, through adaptive parameter tuning (, , ).

5.2 Economic benefits and industrial impact

The operational economics analysis of QLNSGA-II-optimized coal blends reveals a paradigm shift in fuel cost management for coal-fired power plants. By systematically integrating low-cost Mengdong coal (RMB 740/ton) at 38% with mid-tier Yitai (RMB 1250/ton) and premium Huaneng (RMB 1422/ton) coals, Solution 1 achieves a blended fuel cost of RMB 4.6268/ton⋅MJ, representing a 14.7% reduction compared to traditional empirical blending schemes. This cost efficiency stems from the algorithm’s nonlinear optimization capability, which resolves the counterintuitive relationship between coal price and quality parameters—for instance, while Mengdong coal’s ultra-low price (48.5% below Shenhua coal) is typically offset by its prohibitive moisture content (), QLNSGA-II successfully limits its moisture contribution to 30.69% in the blend through complementary pairing with low-moisture Huaneng coal (). The annualized savings potential of RMB 12.3 million (median of RMB 10–15 million range) represents 5.8% of the plant’s total fuel expenditure, equivalent to the levelized cost reduction of 1.27/MWh when scaled to annual generation of 7.5 TWh. Crucially, these savings are achieved without compromising combustion stability, as evidenced by the maintained volatile matter () within the 25–27% optimal range for pulverized coal boilers.

The industrial impact extends beyond direct cost savings to fundamentally transform coal procurement strategies. The Pareto-optimal solutions demonstrate adaptive blending ratios that maintain economic viability across ±30% coal price fluctuations—for example, Solution 4 sustains cost competitiveness (RMB 4.7819/ton⋅MJ) even with 40% Huaneng coal content through dynamic adjustment of Pingmei coal proportions. This flexibility proves critical given the 18.7% annualized price volatility index observed in China’s thermal coal market (2021–2023). Furthermore, the optimized blends reduce ash-related operational costs by 9–12% through strategic utilization of Shenhua coal’s low ash content (), directly translating to lower electrostatic precipitator maintenance frequency and slagging-induced downtime. Field data from Yingkou Plant’s 660 MW Unit 3 shows a 23% reduction in mill maintenance intervals and 15.4% decrease in soot-blowing steam consumption after implementing QLNSGA-II blends, validating the algorithm’s capacity to harmonize economic and technical objectives.

From an environmental economics perspective, the optimization framework delivers measurable sustainability dividends. Solution 1’s sulfur content () achieves a 24.8% reduction compared to the design coal baseline (0.60%), potentially decreasing flue gas desulfurization reagent consumption by 18.6% based on the stoichiometric relationship C_a/S = 1.03. This corresponds to annual limestone savings of 2,150 metric tons, valued at RMB 645,000, while simultaneously reducing gypsum byproduct handling costs by RMB 283,000/year. The algorithm’s safety objective (F_s) optimization proves equally consequential—Solution 3’s elevated softening temperature () reduces slagging propensity by 41% compared to industry averages, as quantified through reduced fouling factor (R_f) measurements from 0.032 to F/Btu. When combined with the 6.9% improvement in net plant heat rate (from 10,550 to 9,835 kJ/kWh) observed in optimized blends, these advancements position the plant to avoid an estimated 38,500 tons of CO2-equivalent emissions annually. Such multidimensional benefits underscore QLNSGA-II’s role in reconciling China’s energy trilemma—balancing affordability, reliability, and sustainability in coal-dominated power systems.

6 Conclusion

This study demonstrates three critical advancements in coal blending optimization through the development and implementation of QLNSGA-II:

(1) Algorithmic Superiority: The Q-learning enhanced NSGA-II framework achieves a 12.7% improvement in Inverted Generational Distance and 9.3% higher Hypervolume compared to MOPSO and MOEA/D, resolving the exploration-exploitation dilemma through adaptive crossover/mutation probability adjustments (0.65–0.85 dynamic range). This enables effective navigation of high-dimensional coal parameter spaces with 5–7 conflicting objectives.

(2) Operational Excellence: Practical implementation at Huaneng Yingkou Power Plant yielded multidimensional benefits: - 14.7% reduction in fuel costs (RMB 4.6268/ton⋅MJ) through optimized low-cost coal integration (38% Mengdong coal). - 24.8% decrease in sulfur emissions via S_d reduction to 0.4514%. - 41% slagging risk mitigation through elevated softening temperatures (C).

(3) Environmental Impact: The optimized blends reduced annual -equivalent emissions by 38,500 tons while achieving a 6.9% net heat rate improvement (9,835 kJ/kWh), equivalent to 23,700 MWh/year energy savings.

Three key industrial implications emerge from this research: (1) Economic Resilience: QLNSGA-II solutions maintain cost competitiveness (≤5% variance) under ±30% coal price fluctuations through dynamic blending adjustments, critical for China’s volatile thermal coal market (18.7% annual price volatility). (2) Sustainability Synergy: The algorithm achieves simultaneous environmental and operational gains—15.4% lower soot-blowing steam consumption correlates with 9–12% ash-related cost reductions, while 24.8% sulfur reduction decreases FGD reagent costs by RMB 928,000 annually. (3) Technical Limitations: Current constraints include dataset scope (5 coal types) and static parameter tuning. Future work should integrate real-time market dynamics and expand coal diversity (≥15 types) while developing online adaptive mechanisms for cross-plant scalability. These enhancements could amplify annual savings potential to RMB 18–22 million for 1,000 MW-class units.

References

1. 1. Liu Y, Huang GH, Cai YP, Cheng GH, Niu YT, An K. Development of an inexact optimization model for coupled coal and power management in North China. Energy Policy. 2009;37(11):4345–63.

* View Article

* Google Scholar

2. 2. Cui H. Fuel combination optimization model of thermal power plant based on new particle swarm optimization algorithm. J Phys: Conf Ser. 2024;2704(1):012005.

* View Article

* Google Scholar

3. 3. Santhosh Raaj S, Arumugam S, Muthukrishnan M, Krishnamoorthy S, Anantharaman N. Characterization of coal blends for effective utilization in thermal power plants. Applied Thermal Engineering. 2016;102:9–16.

* View Article

* Google Scholar

4. 4. Zaid MZSM, Wahid MA, Mailah M, Mazlan MA, Saat A. Coal fired power plant: a review on coal blending and emission issues. In: AIP Conference Proceedings. 2019. 020022. https://doi.org/10.1063/1.5086569

5. 5. Mohanta S, Mishra BK, Biswal SK. An emphasis on optimum fuel production for Indian coal preparation plants treating multiple coal sources. Fuel. 2010;89(3):775–81.

* View Article

* Google Scholar

6. 6. Li J, Yi F, Ma Y, Wang Y. Coal blending optimization in thermal power plants based on multi-strategy fusion multi-objective particle swarm optimization. International Journal of Coal Preparation and Utilization. 2024;44(10):1679–709.

* View Article

* Google Scholar

7. 7. Datta A, Gajera B, Saikia M, Patra T, Sarma AK. A comprehensive analysis and optimization of coal–biomass mixed fuel for sustainable power generation using the design of experiments and artificial neural network. Clean Technologies and Environmental Policy. 2024:1–14.

* View Article

* Google Scholar

8. 8. Le QM, Ma J, Bhattacharyya D, Zitney SE, Burgard AP. Design and multiobjective dynamic optimization of superheaters for load-following operation in pulverized coal power plants. Ind Eng Chem Res. 2023;63(1):330–44. pmid:38223499

* View Article

* PubMed/NCBI

* Google Scholar

9. 9. Zhao S, Duan Y, Tan H, Liu M, Wang X, Wu L, et al. Migration and emission characteristics of trace elements in a 660 MW coal-fired power plant of China. Energy Fuels. 2016;30(7):5937–44.

* View Article

* Google Scholar

10. 10. Muto M, Watanabe H, Kurose R. Large eddy simulation of pulverized coal combustion in multi-burner system–effect of in-furnace blending method on NO emission. Advanced Powder Technology. 2019;30(12):3153–62.

* View Article

* Google Scholar

11. 11. Li J, Qi Z, Li M, Wu D, Zhou C, Lu S, et al. Physical and chemical characteristics of condensable particulate matter from an ultralow-emission coal-fired power plant. Energy Fuels. 2017;31(2):1778–85.

* View Article

* Google Scholar

12. 12. Liu S, Hao H, Jia W, Cao Y, Chen C. Effects of ultralow-emission retrofitting on mercury emission from a coal-fired power plant. Energy Fuels. 2020;34(6):7502–8.

* View Article

* Google Scholar

13. 13. Hedrick K, Hedrick E, Omell B, Zitney SE, Bhattacharyya D. Dynamic modeling, parameter estimation, and data reconciliation of a supercritical pulverized coal-fired boiler. Industrial & Engineering Chemistry Research. 2022;61(45):16764–79.

* View Article

* Google Scholar

14. 14. Amini SH, Vass C, Shahabi M, Noble A. Optimization of coal blending operations under uncertainty – robust optimization approach. International Journal of Coal Preparation and Utilization. 2019;42(1):30–50.

* View Article

* Google Scholar

15. 15. Shih J-S, Frey HC. Coal blending optimization under uncertainty. European Journal of Operational Research. 1995;83(3):452–65.

* View Article

* Google Scholar

16. 16. Xia J, Chen G, Tan P, Zhang C. An online case-based reasoning system for coal blends combustion optimization of thermal power plant. International Journal of Electrical Power & Energy Systems. 2014;62:299–311.

* View Article

* Google Scholar

17. 17. Gao S, Li B. Coal blending optimization for power plants with particle swarm algorithm. In: IOP Conference Series: Materials Science and Engineering. 2019. 052059.

18. 18. Yan S, Lv C, Yao L, Hu Z, Wang F. Hybrid dynamic coal blending method to address multiple environmental objectives under a carbon emissions allocation mechanism. Energy. 2022;254:124297.

* View Article

* Google Scholar

19. 19. Xing J, Luo K, Kurose R. A direct numerical simulation study on combustion and NO formation of coal/ammonia co-firing flames. Advanced Powder Technology. 2024;35(6):104484.

* View Article

* Google Scholar

20. 20. Yin C, Luo Z, Zhou J, Cen K. A novel non-linear programming-based coal blending technology for power plants. Chemical Engineering Research and Design. 2000;78(1):118–24.

* View Article

* Google Scholar

21. 21. Liao Y, Wu C, Ma X. Research on optimization model for power coal blending based on genetic algorithm. In: ASME 2005 Power Conference. 2005. p. 203–7. https://doi.org/10.1115/pwr2005-50305

22. 22. Ji X, Zhigang H, Peng P, Pan L, Cheng Z, Gang C. A model of unconstrained multi-objective optimization of coal blending based on the non-dominated sorting genetic algorithm. Proceedings of the CSEE. 2011;31(2):85–90.

23. 23. Solihah B, Zuhdi A, Rochman A, Yulistama E, Utari HD. Improve coal blending optimization in CFPP by cromosom and fitness function redefinition of the genetic algorithm. JUITA. 2024;12(1):1.

* View Article

* Google Scholar

24. 24. Wang Y, Hu Q. Research and application of fast and elitist non-dominated sorting generic algorithm in coal blending optimization. In: 2018 IEEE 3rd International Conference on Cloud Computing and Internet of Things (CCIOT). 2018. p. 364–7. https://doi.org/10.1109/cciot45285.2018.9032695

25. 25. Guerras LS, Martín M. Optimal gas treatment and coal blending for reduced emissions in power plants: a case study in Northwest Spain. Energy. 2019;169:739–49.

* View Article

* Google Scholar

26. 26. Chen C, Zhou Z, Bollas GM. Dynamic modeling, simulation and optimization of a subcritical steam power plant. Part I: Plant model and regulatory control. Energy Conversion and Management. 2017;145(3):324–34.

* View Article

* Google Scholar

27. 27. Mashru N, Patel P, Tejani GG, Kaneria A. Multi-objective thermal exchange optimization for truss structure. Advanced engineering optimization through intelligent techniques: Select proceedings of AEOTIT 2022 . Springer; 2023. p. 139–46.

28. 28. Mashru N, Tejani GG, Patel P. Reliability-based multi-objective optimization of trusses with greylag goose algorithm. Evol Intel. 2025;18(1):25.

* View Article

* Google Scholar

29. 29. Patel P, Adalja D, Mashru N, Jangir P, Arpita , Jangid R, et al. Multi objective elk herd optimization for efficient structural design. Sci Rep. 2025;15(1):11767. pmid:40189688

* View Article

* PubMed/NCBI

* Google Scholar

30. 30. Adalja D, Patel P, Mashru N, Jangir P, Arpita , Jangid R, et al. A new multi objective crested porcupines optimization algorithm for solving optimization problems. Sci Rep. 2025;15(1):14380. pmid:40274939

* View Article

* PubMed/NCBI

* Google Scholar

31. 31. Kalita K, Jangir P, Čep R, Pandya SB, Abualigah L. Many-Objective Grasshopper Optimization Algorithm (MaOGOA): a new many-objective optimization technique for solving engineering design problems. Int J Comput Intell Syst. 2024;17(1):214.

* View Article

* Google Scholar

32. 32. Mashru N, Kalita K, Čepová L, Patel P, Arpita , Jangir P. Adaptive predator prey algorithm for many objective optimization. Sci Rep. 2025;15(1):12690. pmid:40221537

* View Article

* PubMed/NCBI

* Google Scholar

33. 33. Adalja D, Kalita K, CË‡epovÂ´a L, Patel P, Mashru N, Jangir P. Advancing truss structure optimization—a multi-objective weighted average algorithm with enhanced convergence and diversity. Results in Engineering. 2025:104241.

* View Article

* Google Scholar

34. 34. Tejani GG, Sharma SK, Mashru N, Patel P, Jangir P. Optimization of truss structures with two archive-boosted MOHO algorithm. Alexandria Engineering Journal. 2025;120:296–317.

* View Article

* Google Scholar

35. 35. Kalita K, Jangir P, Pandya SB, Alzahrani AI, Alblehai F, Abualigah L, et al. MORKO: a multi-objective Runge–Kutta optimizer for multi-domain optimization problems. Int J Comput Intell Syst. 2025;18(1).

* View Article

* Google Scholar

36. 36. Kalita K, Jangir P, Pandya SB, Shanmugasundar G, Chohan JS, Abualigah L. Many-Objective Multi-Verse Optimizer (MaOMVO): a novel algorithm for solving complex many-objective engineering problems. J Inst Eng (India): Ser C. 2024:1–36.

* View Article

* Google Scholar

37. 37. Kalita K, Jangir P, Pandya SB, Shanmugasundar G, Abualigah L. Unveiling the Many-Objective Dragonfly Algorithm’s (MaODA) efficacy in complex optimization. Evol Intel. 2024;17(5–6):3505–33.

* View Article

* Google Scholar

38. 38. Patel P, Adalja D, Mashru N, Jangir P, Arpita , Jangid R, et al. Many-objective cheetah optimizer: a novel paradigm for solving complex engineering problems. Int J Comput Intell Syst. 2025;18(1).

* View Article

* Google Scholar

39. 39. Chandrasekharan S, Panda RC, Swaminathan BN. Modeling, identification, and control of coal-fired thermal power plants. Reviews in Chemical Engineering. 2014;30(2):217–32.

* View Article

* Google Scholar

40. 40. Baskoro FR, Takahashi K, Morikawa K, Nagasawa K. Multi-objective optimization on total cost and carbon dioxide emission of coal supply for coal-fired power plants in Indonesia. Socio-Economic Planning Sciences. 2022;81:101185.

* View Article

* Google Scholar

41. 41. Chen R, Wu B, Wang H, Tong H, Yan F. A Q-learning based NSGA-II for dynamic flexible job shop scheduling with limited transportation resources. Swarm and Evolutionary Computation. 2024;90:101658.

* View Article

* Google Scholar

42. 42. Singh H, Kumar S, Mishra R, Mohapatra SK, Singh A, Kumar S. Flow characteristics of microwave treated Indian coal: a deep learning modelling. Advanced Powder Technology. 2023;34(10):104202.

* View Article

* Google Scholar

43. 43. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002;6(2):182–97.

* View Article

* Google Scholar

44. 44. Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: solving problems with box constraints. IEEE Trans Evol Computat. 2014;18(4):577–601.

* View Article

* Google Scholar

45. 45. Asghari A, Sohrabi MK. Bi-objective cloud resource management for dependent tasks using Q-learning and NSGA-3. J Ambient Intell Human Comput. 2022;15(1):197–217.

* View Article

* Google Scholar

46. 46. Yang H, Wang Z, Gao Y, Zhou W. Bi-objective multi-mode resource-constrained multi-project scheduling using combined NSGA II and Q-learning algorithm. Applied Soft Computing. 2024;152:111201.

* View Article

* Google Scholar

47. 47. Li P, Xue Q, Zhang Z, Chen J, Zhou D. Multi-objective energy-efficient hybrid flow shop scheduling using Q-learning and GVNS driven NSGA-II. Computers & Operations Research. 2023;159:106360.

* View Article

* Google Scholar

48. 48. Qi R, Li J, Wang J, Jin H, Han Y. QMOEA: A Q-learning-based multiobjective evolutionary algorithm for solving time-dependent green vehicle routing problems with time windows. Information Sciences. 2022;608:178–201.

* View Article

* Google Scholar

49. 49. Qingfu Zhang, Hui Li. MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Computat. 2007;11(6):712–31.

* View Article

* Google Scholar

50. 50. Kapelan Z, Savic DA, Walters GA, Babayan AV. Risk- and robustness-based solutions to a multi-objective water distribution system rehabilitation problem under uncertainty. Water Sci Technol. 2006;53(1):61–75. pmid:16532736

* View Article

* PubMed/NCBI

* Google Scholar

51. 51. Yao L, Chen J, Wang L, Li R, Luo H, Yi J. Multi-objective optimization driven by preponderant individuals and symmetric sampling for operational parameter design in aluminum electrolysis process. Swarm and Evolutionary Computation. 2024;87:101574.

* View Article

* Google Scholar

52. 52. Gao S, Ren X, Zhang Y. Improvement of multi-objective evolutionary algorithm and optimization of mechanical bearing. Engineering Applications of Artificial Intelligence. 2023;120:105889.

* View Article

* Google Scholar

53. 53. Liu R, Li J, Mu C, Jiao L. A coevolutionary technique based on multi-swarm particle swarm optimization for dynamic multi-objective optimization. European Journal of Operational Research. 2017;261(3):1028–51.

* View Article

* Google Scholar

54. 54. Huband S, Hingston P, Barone L, While L. A review of multiobjective test problems and a scalable test problem toolkit. IEEE Trans Evol Computat. 2006;10(5):477–506.

* View Article

* Google Scholar

55. 55. Tian Y, Cheng R, Zhang X, Jin Y. PlatEMO: a MATLAB platform for evolutionary multi-objective optimization [educational forum]. IEEE Comput Intell Mag. 2017;12(4):73–87.

* View Article

* Google Scholar

56. 56. Tian Y, Zhu W, Zhang X, Jin Y. A practical tutorial on solving optimization problems via PlatEMO. Neurocomputing. 2023;518:190–205.

* View Article

* Google Scholar

57. 57. Kumar A, Wu G, Ali MZ, Luo Q, Mallipeddi R, Suganthan PN, et al. A Benchmark-suite of real-world constrained multi-objective optimization problems and some baseline results. Swarm and Evolutionary Computation. 2021;67:100961.

* View Article

* Google Scholar

58. 58. Farias LRC, Araújo AFR. An inverse modeling constrained multi-objective evolutionary algorithm based on decomposition. In: 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2024. p. 3727–32. https://doi.org/10.1109/smc54092.2024.10831275

59. 59. Chu X, Ming F, Gong W. Competitive multitasking for computational resource allocation in evolutionary constrained multi-objective optimization. IEEE Transactions on Evolutionary Computation. 2024.

* View Article

* Google Scholar

60. 60. Zou J, Sun R, Liu Y, Hu Y, Yang S, Zheng J, et al. A multipopulation evolutionary algorithm using new cooperative mechanism for solving multiobjective problems with multiconstraint. IEEE Trans Evol Computat. 2024;28(1):267–80.

* View Article

* Google Scholar

61. 61. Ming F, Gong W, Wang L, Gao L. Constrained multiobjective optimization via multitasking and knowledge transfer. IEEE Trans Evol Computat. 2024;28(1):77–89.

* View Article

* Google Scholar

Citation: Li Z, Liu L, Zhao Z, Mu S, Li D, Zhuo Y (2025) Reinforcement learning-enhanced multi-objective optimization for sustainable coal blending in thermal power plants. PLoS One 20(9): e0331208. https://doi.org/10.1371/journal.pone.0331208

About the Authors:

Zhongfeng Li

Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Writing – original draft

Affiliations: School of Electrical Engineering, Yingkou Institute of Technology, Yingkou, Liaoning, People’s Republic of China, School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, Liaoning, People’s Republic of China

ORICD: https://orcid.org/0000-0002-0612-9142

Lei Liu

Roles: Data curation, Formal analysis, Software, Writing – original draft

E-mail: [email protected]; [email protected]

Affiliation: School of Electrical Engineering, Yingkou Institute of Technology, Yingkou, Liaoning, People’s Republic of China

Zhenlong Zhao

Roles: Investigation

Affiliation: School of Electrical Engineering, Yingkou Institute of Technology, Yingkou, Liaoning, People’s Republic of China

Shujie Mu

Roles: Supervision

E-mail: [email protected]; [email protected]

Affiliation: School of Electrical Engineering, Yingkou Institute of Technology, Yingkou, Liaoning, People’s Republic of China

Dong Li

Roles: Resources, Validation

Affiliation: Huaneng Yingkou Xianrendao Thermal Power Co., Yingkou, Liaoning, People’s Republic of China

Yuting Zhuo

Roles: Project administration, Supervision, Writing – review & editing

Affiliation: School of Chemical Engineering, University of New South Wales, Sydney, New South Wales, Australia

[/RAW_REF_TEXT]

References

1. Liu Y, Huang GH, Cai YP, Cheng GH, Niu YT, An K. Development of an inexact optimization model for coupled coal and power management in North China. Energy Policy. 2009;37(11):4345–63.

2. Cui H. Fuel combination optimization model of thermal power plant based on new particle swarm optimization algorithm. J Phys: Conf Ser. 2024;2704(1):012005.

3. Santhosh Raaj S, Arumugam S, Muthukrishnan M, Krishnamoorthy S, Anantharaman N. Characterization of coal blends for effective utilization in thermal power plants. Applied Thermal Engineering. 2016;102:9–16.

4. Zaid MZSM, Wahid MA, Mailah M, Mazlan MA, Saat A. Coal fired power plant: a review on coal blending and emission issues. In: AIP Conference Proceedings. 2019. 020022. https://doi.org/10.1063/1.5086569

5. Mohanta S, Mishra BK, Biswal SK. An emphasis on optimum fuel production for Indian coal preparation plants treating multiple coal sources. Fuel. 2010;89(3):775–81.

6. Li J, Yi F, Ma Y, Wang Y. Coal blending optimization in thermal power plants based on multi-strategy fusion multi-objective particle swarm optimization. International Journal of Coal Preparation and Utilization. 2024;44(10):1679–709.

7. Datta A, Gajera B, Saikia M, Patra T, Sarma AK. A comprehensive analysis and optimization of coal–biomass mixed fuel for sustainable power generation using the design of experiments and artificial neural network. Clean Technologies and Environmental Policy. 2024:1–14.

8. Le QM, Ma J, Bhattacharyya D, Zitney SE, Burgard AP. Design and multiobjective dynamic optimization of superheaters for load-following operation in pulverized coal power plants. Ind Eng Chem Res. 2023;63(1):330–44. pmid:38223499

9. Zhao S, Duan Y, Tan H, Liu M, Wang X, Wu L, et al. Migration and emission characteristics of trace elements in a 660 MW coal-fired power plant of China. Energy Fuels. 2016;30(7):5937–44.

10. Muto M, Watanabe H, Kurose R. Large eddy simulation of pulverized coal combustion in multi-burner system–effect of in-furnace blending method on NO emission. Advanced Powder Technology. 2019;30(12):3153–62.

11. Li J, Qi Z, Li M, Wu D, Zhou C, Lu S, et al. Physical and chemical characteristics of condensable particulate matter from an ultralow-emission coal-fired power plant. Energy Fuels. 2017;31(2):1778–85.

12. Liu S, Hao H, Jia W, Cao Y, Chen C. Effects of ultralow-emission retrofitting on mercury emission from a coal-fired power plant. Energy Fuels. 2020;34(6):7502–8.

13. Hedrick K, Hedrick E, Omell B, Zitney SE, Bhattacharyya D. Dynamic modeling, parameter estimation, and data reconciliation of a supercritical pulverized coal-fired boiler. Industrial & Engineering Chemistry Research. 2022;61(45):16764–79.

14. Amini SH, Vass C, Shahabi M, Noble A. Optimization of coal blending operations under uncertainty – robust optimization approach. International Journal of Coal Preparation and Utilization. 2019;42(1):30–50.

15. Shih J-S, Frey HC. Coal blending optimization under uncertainty. European Journal of Operational Research. 1995;83(3):452–65.

16. Xia J, Chen G, Tan P, Zhang C. An online case-based reasoning system for coal blends combustion optimization of thermal power plant. International Journal of Electrical Power & Energy Systems. 2014;62:299–311.

17. Gao S, Li B. Coal blending optimization for power plants with particle swarm algorithm. In: IOP Conference Series: Materials Science and Engineering. 2019. 052059.

18. Yan S, Lv C, Yao L, Hu Z, Wang F. Hybrid dynamic coal blending method to address multiple environmental objectives under a carbon emissions allocation mechanism. Energy. 2022;254:124297.

19. Xing J, Luo K, Kurose R. A direct numerical simulation study on combustion and NO formation of coal/ammonia co-firing flames. Advanced Powder Technology. 2024;35(6):104484.

20. Yin C, Luo Z, Zhou J, Cen K. A novel non-linear programming-based coal blending technology for power plants. Chemical Engineering Research and Design. 2000;78(1):118–24.

21. Liao Y, Wu C, Ma X. Research on optimization model for power coal blending based on genetic algorithm. In: ASME 2005 Power Conference. 2005. p. 203–7. https://doi.org/10.1115/pwr2005-50305

22. Ji X, Zhigang H, Peng P, Pan L, Cheng Z, Gang C. A model of unconstrained multi-objective optimization of coal blending based on the non-dominated sorting genetic algorithm. Proceedings of the CSEE. 2011;31(2):85–90.

23. Solihah B, Zuhdi A, Rochman A, Yulistama E, Utari HD. Improve coal blending optimization in CFPP by cromosom and fitness function redefinition of the genetic algorithm. JUITA. 2024;12(1):1.

24. Wang Y, Hu Q. Research and application of fast and elitist non-dominated sorting generic algorithm in coal blending optimization. In: 2018 IEEE 3rd International Conference on Cloud Computing and Internet of Things (CCIOT). 2018. p. 364–7. https://doi.org/10.1109/cciot45285.2018.9032695

25. Guerras LS, Martín M. Optimal gas treatment and coal blending for reduced emissions in power plants: a case study in Northwest Spain. Energy. 2019;169:739–49.

26. Chen C, Zhou Z, Bollas GM. Dynamic modeling, simulation and optimization of a subcritical steam power plant. Part I: Plant model and regulatory control. Energy Conversion and Management. 2017;145(3):324–34.

27. Mashru N, Patel P, Tejani GG, Kaneria A. Multi-objective thermal exchange optimization for truss structure. Advanced engineering optimization through intelligent techniques: Select proceedings of AEOTIT 2022 . Springer; 2023. p. 139–46.

28. Mashru N, Tejani GG, Patel P. Reliability-based multi-objective optimization of trusses with greylag goose algorithm. Evol Intel. 2025;18(1):25.

29. Patel P, Adalja D, Mashru N, Jangir P, Arpita , Jangid R, et al. Multi objective elk herd optimization for efficient structural design. Sci Rep. 2025;15(1):11767. pmid:40189688

30. Adalja D, Patel P, Mashru N, Jangir P, Arpita , Jangid R, et al. A new multi objective crested porcupines optimization algorithm for solving optimization problems. Sci Rep. 2025;15(1):14380. pmid:40274939

31. Kalita K, Jangir P, Čep R, Pandya SB, Abualigah L. Many-Objective Grasshopper Optimization Algorithm (MaOGOA): a new many-objective optimization technique for solving engineering design problems. Int J Comput Intell Syst. 2024;17(1):214.

32. Mashru N, Kalita K, Čepová L, Patel P, Arpita , Jangir P. Adaptive predator prey algorithm for many objective optimization. Sci Rep. 2025;15(1):12690. pmid:40221537

33. Adalja D, Kalita K, CË‡epovÂ´a L, Patel P, Mashru N, Jangir P. Advancing truss structure optimization—a multi-objective weighted average algorithm with enhanced convergence and diversity. Results in Engineering. 2025:104241.

34. Tejani GG, Sharma SK, Mashru N, Patel P, Jangir P. Optimization of truss structures with two archive-boosted MOHO algorithm. Alexandria Engineering Journal. 2025;120:296–317.

35. Kalita K, Jangir P, Pandya SB, Alzahrani AI, Alblehai F, Abualigah L, et al. MORKO: a multi-objective Runge–Kutta optimizer for multi-domain optimization problems. Int J Comput Intell Syst. 2025;18(1).

36. Kalita K, Jangir P, Pandya SB, Shanmugasundar G, Chohan JS, Abualigah L. Many-Objective Multi-Verse Optimizer (MaOMVO): a novel algorithm for solving complex many-objective engineering problems. J Inst Eng (India): Ser C. 2024:1–36.

37. Kalita K, Jangir P, Pandya SB, Shanmugasundar G, Abualigah L. Unveiling the Many-Objective Dragonfly Algorithm’s (MaODA) efficacy in complex optimization. Evol Intel. 2024;17(5–6):3505–33.

38. Patel P, Adalja D, Mashru N, Jangir P, Arpita , Jangid R, et al. Many-objective cheetah optimizer: a novel paradigm for solving complex engineering problems. Int J Comput Intell Syst. 2025;18(1).

39. Chandrasekharan S, Panda RC, Swaminathan BN. Modeling, identification, and control of coal-fired thermal power plants. Reviews in Chemical Engineering. 2014;30(2):217–32.

40. Baskoro FR, Takahashi K, Morikawa K, Nagasawa K. Multi-objective optimization on total cost and carbon dioxide emission of coal supply for coal-fired power plants in Indonesia. Socio-Economic Planning Sciences. 2022;81:101185.

41. Chen R, Wu B, Wang H, Tong H, Yan F. A Q-learning based NSGA-II for dynamic flexible job shop scheduling with limited transportation resources. Swarm and Evolutionary Computation. 2024;90:101658.

42. Singh H, Kumar S, Mishra R, Mohapatra SK, Singh A, Kumar S. Flow characteristics of microwave treated Indian coal: a deep learning modelling. Advanced Powder Technology. 2023;34(10):104202.

43. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002;6(2):182–97.

44. Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: solving problems with box constraints. IEEE Trans Evol Computat. 2014;18(4):577–601.

45. Asghari A, Sohrabi MK. Bi-objective cloud resource management for dependent tasks using Q-learning and NSGA-3. J Ambient Intell Human Comput. 2022;15(1):197–217.

46. Yang H, Wang Z, Gao Y, Zhou W. Bi-objective multi-mode resource-constrained multi-project scheduling using combined NSGA II and Q-learning algorithm. Applied Soft Computing. 2024;152:111201.

47. Li P, Xue Q, Zhang Z, Chen J, Zhou D. Multi-objective energy-efficient hybrid flow shop scheduling using Q-learning and GVNS driven NSGA-II. Computers & Operations Research. 2023;159:106360.

48. Qi R, Li J, Wang J, Jin H, Han Y. QMOEA: A Q-learning-based multiobjective evolutionary algorithm for solving time-dependent green vehicle routing problems with time windows. Information Sciences. 2022;608:178–201.

49. Qingfu Zhang, Hui Li. MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Computat. 2007;11(6):712–31.

50. Kapelan Z, Savic DA, Walters GA, Babayan AV. Risk- and robustness-based solutions to a multi-objective water distribution system rehabilitation problem under uncertainty. Water Sci Technol. 2006;53(1):61–75. pmid:16532736

51. Yao L, Chen J, Wang L, Li R, Luo H, Yi J. Multi-objective optimization driven by preponderant individuals and symmetric sampling for operational parameter design in aluminum electrolysis process. Swarm and Evolutionary Computation. 2024;87:101574.

52. Gao S, Ren X, Zhang Y. Improvement of multi-objective evolutionary algorithm and optimization of mechanical bearing. Engineering Applications of Artificial Intelligence. 2023;120:105889.

53. Liu R, Li J, Mu C, Jiao L. A coevolutionary technique based on multi-swarm particle swarm optimization for dynamic multi-objective optimization. European Journal of Operational Research. 2017;261(3):1028–51.

54. Huband S, Hingston P, Barone L, While L. A review of multiobjective test problems and a scalable test problem toolkit. IEEE Trans Evol Computat. 2006;10(5):477–506.

55. Tian Y, Cheng R, Zhang X, Jin Y. PlatEMO: a MATLAB platform for evolutionary multi-objective optimization [educational forum]. IEEE Comput Intell Mag. 2017;12(4):73–87.

56. Tian Y, Zhu W, Zhang X, Jin Y. A practical tutorial on solving optimization problems via PlatEMO. Neurocomputing. 2023;518:190–205.

57. Kumar A, Wu G, Ali MZ, Luo Q, Mallipeddi R, Suganthan PN, et al. A Benchmark-suite of real-world constrained multi-objective optimization problems and some baseline results. Swarm and Evolutionary Computation. 2021;67:100961.

58. Farias LRC, Araújo AFR. An inverse modeling constrained multi-objective evolutionary algorithm based on decomposition. In: 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2024. p. 3727–32. https://doi.org/10.1109/smc54092.2024.10831275

59. Chu X, Ming F, Gong W. Competitive multitasking for computational resource allocation in evolutionary constrained multi-objective optimization. IEEE Transactions on Evolutionary Computation. 2024.

60. Zou J, Sun R, Liu Y, Hu Y, Yang S, Zheng J, et al. A multipopulation evolutionary algorithm using new cooperative mechanism for solving multiobjective problems with multiconstraint. IEEE Trans Evol Computat. 2024;28(1):267–80.

61. Ming F, Gong W, Wang L, Gao L. Constrained multiobjective optimization via multitasking and knowledge transfer. IEEE Trans Evol Computat. 2024;28(1):77–89.

Word count: 8912

Show less

© 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Reinforcement learning-enhanced multi-objective optimization for sustainable coal blending in thermal power plants

Content area

Abstract

Full text

1 Introduction

2 Coal blending optimization model

2.1 Objective functions

2.2 Parameter calibration

2.3 Decision variables

3 The proposed QLNSGA-II algorithm

3.1 Standard NSGA-II

3.2 Q-Learning enhanced NSGA-II

3.2.1 State-action-reward framework.

3.2.2 Dynamic operator selection.

3.2.3 Adaptive constraint handling.

4 Evaluation of the proposed QLNSGA-II

4.1 Results and discussion on IGD and HV performance

4.2 Results and discussion on WFG and UF test functions

4.3 Performance on real-world multi-objective problems

5 Application of the QLNSGA-II on a Huaneng Yingkou power plant case study

5.1 Experimental results and analysis

5.2 Economic benefits and industrial impact

6 Conclusion

References