A Student Dropout Risk Prediction Model Based on Supervised Learning Techniques and Large Language Models (LLMs)

Author's Information:

Dat Tao Huu

Saigon University

Vol 02 No 04 (2025): Volume 02 Issue 04 April 2025

Page No.: 129-135

Abstract:

Early prediction of student dropout risk is an essential but challenging task in Vietnamese higher education. This study proposes a novel model that combines supervised machine learning with large language models (LLMs) to predict student dropout risk. The model uses both structured and unstructured data to comprehensively analyze the factors that influence dropout. By converting student data into natural language and processing it with pre-trained LLMs, the model can capture the context of these factors and the complex relationships between them, thereby improving prediction accuracy compared with traditional methods. The study's main contributions are to propose an architecture that integrates LLMs into the dropout risk classification problem, to identify the critical factors influencing the decision to drop out, and to discuss the potential practical application of the model in supporting early intervention.
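The abstract's central preprocessing step, converting student data into natural language before it reaches a pre-trained LLM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `StudentRecord` fields, the sentence template, and the JSON Lines output format are all hypothetical, because the abstract does not specify the feature set, the serialization template, or the training setup. The sketch pairs each serialized profile with a binary dropout label, which is the form a supervised text classifier would be trained on.

```python
from dataclasses import dataclass
import json


@dataclass
class StudentRecord:
    # Hypothetical schema for illustration only; the abstract does not list the paper's features.
    student_id: str
    gpa: float              # cumulative GPA on an assumed 4.0 scale
    attendance_rate: float  # fraction of classes attended, 0.0 to 1.0
    failed_credits: int
    weekly_work_hours: int
    advisor_note: str = ""  # optional unstructured free text


def record_to_text(r: StudentRecord) -> str:
    """Serialize structured fields, plus any free-text note, into one English paragraph."""
    text = (
        f"Student {r.student_id} has a cumulative GPA of {r.gpa:.2f}, "
        f"attends {r.attendance_rate:.0%} of classes, "
        f"has failed {r.failed_credits} credits, "
        f"and works {r.weekly_work_hours} hours per week."
    )
    note = r.advisor_note.strip()
    if note:  # append unstructured data only when it exists
        text += f" Advisor note: {note}"
    return text


def to_training_example(r: StudentRecord, dropped_out: bool) -> dict:
    """Pair the serialized text with a binary label (1 = dropped out) for supervised training."""
    return {"text": record_to_text(r), "label": int(dropped_out)}


def to_jsonl(examples: list) -> str:
    """Emit one JSON object per line, a common input format for text-classification training."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


if __name__ == "__main__":
    s = StudentRecord("S001", 2.1, 0.65, 9, 20, "Reports financial difficulties.")
    print(record_to_text(s))
    print(to_jsonl([to_training_example(s, True)]))
```

The training step itself is omitted here; the resulting text/label pairs could, for example, be used to fine-tune a pre-trained language-model classifier such as those available through the Hugging Face Transformers library. Placing the free-text note in the same paragraph as the structured fields lets a single model read both kinds of signal together, which is the motivation the abstract gives for the natural-language conversion.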

Keywords:

Large Language Model (LLM), prediction, dropout risk, machine learning, supervised learning
