Abstract
Knowledge distillation has become a standard technique for compressing large language models into efficient student models, but existing methods often struggle to balance prediction accuracy with explanation quality. Recent approaches such as Distilling Step-by-Step (DSbS) introduce explanation supervision, yet they apply it uniformly throughout training, which may not fully exploit the different learning dynamics of prediction and explanation. In this work, we propose a task-structured curriculum learning (TSCL) framework that structures training into three sequential phases: (i) prediction-only, to establish stable feature representations; (ii) joint prediction–explanation, to align task outputs with rationale generation; and (iii) explanation-only, to refine the quality of rationales. This design is a simple but effective modification to DSbS, requiring no architectural changes and adding negligible training cost. We justify the phase scheduling with ablation studies and convergence analysis, showing that an initial prediction-heavy stage followed by a balanced joint phase improves both stability and explanation alignment. Extensive experiments on five datasets (e-SNLI, ANLI, CommonsenseQA, SVAMP, and MedNLI) demonstrate that TSCL consistently outperforms strong baselines, with gains of +1.7–2.6 points in accuracy and +0.8–1.2 points in ROUGE-L, corresponding to relative error reductions of up to 21%. Beyond lexical metrics, human evaluation and ERASER-style faithfulness diagnostics confirm that TSCL produces more faithful and informative explanations. Comparative training curves further reveal faster convergence and lower variance across seeds. Efficiency analysis shows less than 3% overhead in wall-clock training time and no additional inference cost, making the approach practical for real-world deployment.

This study demonstrates that a simple task-structured curriculum can significantly improve the effectiveness of knowledge distillation. By separating and sequencing objectives, TSCL achieves a better balance among accuracy, stability, and explanation quality. The framework generalizes across domains, including medical NLI, and offers a principled recipe for future applications in multimodal reasoning and reinforcement learning.
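To make the phase scheduling concrete, the following is a minimal Python sketch of how the three-phase curriculum could gate the two losses of a DSbS-style objective. The phase boundaries (30% and 80% of training) and the equal joint-phase weighting are illustrative assumptions, not values reported in the abstract; loss_pred and loss_expl stand in for the student's label-prediction and rationale-generation losses.

# Minimal sketch of the three-phase TSCL schedule described above.
# Phase boundaries (p1, p2) and the joint-phase weights are assumptions
# for illustration; the paper's exact schedule is not reproduced here.
def tscl_loss_weights(step: int, total_steps: int,
                      p1: float = 0.3, p2: float = 0.8):
    """Return (prediction_weight, explanation_weight) for this step."""
    progress = step / max(total_steps, 1)
    if progress < p1:      # (i) prediction-only: stabilize representations
        return 1.0, 0.0
    elif progress < p2:    # (ii) joint: align predictions with rationales
        return 0.5, 0.5
    else:                  # (iii) explanation-only: refine rationale quality
        return 0.0, 1.0

def tscl_step_loss(loss_pred, loss_expl, step, total_steps):
    """Combine the two task losses under the current phase weights."""
    w_pred, w_expl = tscl_loss_weights(step, total_steps)
    return w_pred * loss_pred + w_expl * loss_expl

Because the schedule only reweights losses that a DSbS-style objective already computes, it adds essentially no training cost, consistent with the sub-3% overhead reported above.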
Affiliations
1 Department of Computer Engineering, Faculty of Engineering and Architecture, İzmir Katip Çelebi University, İzmir, 35620, Turkey
2 Department of Computer Engineering, Faculty of Engineering, İzmir Institute of Technology, İzmir, 35430, Turkey