A Student Dropout Risk Prediction Model Based on Supervised Learning Techniques and Large Language Models (LLMs)
DOI:
https://doi.org/10.55677/ijhrsss/04-2025-Vol02I4
Keywords:
Large Language Model (LLM), prediction, dropout risk, machine learning, supervised learning
Abstract
Early prediction of student dropout risk is an essential but challenging task in Vietnamese higher education. This study proposes a novel model that combines supervised machine learning with large language models (LLMs) to predict student dropout risk. The model uses both structured and unstructured data to analyze influencing factors comprehensively. By converting student data into natural language and applying pre-trained LLMs, the model can capture the context and the complex relationships among factors, thereby improving prediction accuracy compared with traditional methods. The study's main contributions are an architecture that integrates LLMs into the dropout risk classification problem, the identification of critical factors influencing the decision to drop out, and a discussion of the model's potential practical application in supporting early intervention.
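The step of converting student data into natural language can be illustrated with a minimal sketch. It assumes a record stored as a simple mapping; the field names (`gpa`, `credits_failed`, `works_part_time`), the sentence templates, and the closing question are hypothetical and are not taken from the study. The resulting text is what would then be passed to a pre-trained LLM or its tokenizer.

```python
from typing import Mapping

# Hypothetical fields and phrasing, chosen only for illustration;
# the abstract does not specify the actual features or wording.
TEMPLATES = {
    "gpa": "has a grade point average of {value}",
    "credits_failed": "has failed {value} credits",
    "works_part_time": "{value} a part-time job",
}


def record_to_text(record: Mapping[str, object]) -> str:
    """Serialize one structured student record into a short English sentence."""
    clauses = []
    for field, template in TEMPLATES.items():
        value = record.get(field)
        if value is None:
            continue  # skip missing values rather than inventing them
        if isinstance(value, bool):
            value = "holds" if value else "does not hold"
        clauses.append(template.format(value=value))
    if not clauses:
        return "No information is available for this student."
    return "The student " + ", ".join(clauses) + "."


def build_prompt(record: Mapping[str, object], note: str = "") -> str:
    """Combine the serialized record with optional unstructured text."""
    parts = [record_to_text(record)]
    if note.strip():
        parts.append("Advisor note: " + note.strip())
    # Assumed classification question; a fine-tuned classifier would not need it.
    parts.append("Question: is this student at risk of dropping out? Answer yes or no.")
    return "\n".join(parts)


if __name__ == "__main__":
    student = {"gpa": 2.1, "credits_failed": 6, "works_part_time": True}
    print(build_prompt(student, note="Missed several classes this term."))
```

Keeping the templates in one table makes it easy to add or reword fields, and skipping missing values avoids feeding the language model statements that the data does not support.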
License
Copyright (c) 2025 International Journal of Human Research and Social Science Studies

This work is licensed under a Creative Commons Attribution 4.0 International License.