High school dropout compromises the right to education and generates significant social and economic losses, highlighting the need for effective strategies to address the problem. Against this background, this research aimed to propose a Machine Learning model for predicting high school dropout among students in the state public school system of Santa Catarina. To this end, the study sought to answer the following question: how can Data Science, through Machine Learning, support the prediction of high school dropout among students in the state public school system of Santa Catarina? Characterized as applied or primary in nature, with exploratory and descriptive objectives, a combined quantitative and qualitative approach, and documentary technical procedures, the research developed and evaluated three predictive models (M0, M1, and M2) based on educational data available at different points in the school year. The Random Forest Classifier algorithm and content analysis were used for variable selection. The data were obtained from the administrative database of the State Department of Education of Santa Catarina and from national public sources. The results indicated a progressive improvement in predictive capacity across the models, most notably in model M2, which achieved an accuracy of 91.85%. Factors related to the student and the school remained relevant to predicting dropout throughout the academic year. The structured export of the generated outputs and the researcher’s position within the State Department of Education of Santa Catarina contribute to the applicability and reproducibility of the proposed approach.
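
As an illustration of the staged design described above, the following minimal sketch trains one Random Forest per point in the school year, each on the features assumed to be available at that point. It is not the study's implementation: the data are synthetic, the feature groups (`enrollment`, `midyear`, `late`), the helper `evaluate`, and all hyperparameters are hypothetical, and scikit-learn's `RandomForestClassifier` is assumed as the implementation of the algorithm named in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic, hypothetical feature groups (NOT the study's variables).
enrollment = rng.normal(size=(n, 3))  # assumed known at enrollment
midyear = rng.normal(size=(n, 2))     # assumed known partway through the year
late = rng.normal(size=(n, 2))        # assumed known late in the year

# Synthetic dropout label that depends on all three groups plus noise.
score = (enrollment[:, 0] + 1.5 * midyear[:, 0]
         + 2.0 * late[:, 0] + rng.normal(size=n))
y = (score > 1.0).astype(int)

# Each model sees only the features available at its point in the year.
feature_sets = {
    "M0": enrollment,
    "M1": np.hstack([enrollment, midyear]),
    "M2": np.hstack([enrollment, midyear, late]),
}


def evaluate(X, y, seed=0):
    """Fit a Random Forest on a stratified split; return accuracy and importances."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    accuracy = accuracy_score(y_te, clf.predict(X_te))
    return accuracy, clf.feature_importances_


results = {name: evaluate(X, y) for name, X in feature_sets.items()}
for name, (accuracy, importances) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, n_features={len(importances)}")
```

The accuracies printed here reflect only the synthetic data and say nothing about the reported 91.85%. The importances returned by `feature_importances_` are one possible input to variable selection; how the study combined them with content analysis is not specified in the abstract.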