Author:
Marcell Nagy, Budapest University of Technology and Economics, Institute of Mathematics, Department of Stochastics, Budapest, Hungary;
eKRÉTA Informatikai Zrt. (eKRÉTA Informatics), Budapest, Hungary

https://orcid.org/0000-0001-5666-7777
Open access

Abstract.

Student dropout is one of the most pressing problems in higher education. In this work, using dropout prediction as a case study, we show how explainable artificial intelligence (XAI) tools such as permutation importance, partial dependence plots, and SHAP can support stakeholders in higher education. Finally, we discuss how the research can be put to practical use, for example how explaining individual predictions enables personalized interventions. In our analyses, we found that the high school grade point average has the greatest predictive power for graduation. Furthermore, even though we analyzed data from a technical university, we found that humanities subjects also have substantial incremental predictive power for graduation over science subjects.

Summary.

Delayed completion and student dropout are among the most critical problems in higher education, especially in STEM programs. A high dropout rate causes both individual and economic loss, so a detailed investigation of the main reasons for dropping out is warranted. Recently, there has been considerable interest in using machine learning methods for the early detection of students at risk of dropping out. However, the use of interpretable machine learning (IML) and explainable artificial intelligence (XAI) techniques for dropout prediction has received little attention. In this paper, we show how IML and XAI techniques can assist educational stakeholders in dropout prediction, using data from the Budapest University of Technology and Economics. We demonstrate that complex black-box machine learning algorithms, such as CatBoost, can effectively detect at-risk students using only pre-enrollment achievement measures, but that they lack interpretability. We then show how their predictions can be explained both globally and locally using IML methods, including permutation importance (PI), partial dependence plots (PDP), LIME, and SHAP values.

Using global interpretations, we found that the factor with the greatest impact on academic performance is the high school grade point average, which measures general knowledge by taking into account grades in history, mathematics, Hungarian language and literature, a foreign language, and a science subject. However, we also found that both mathematics and the subject of choice are among the most important variables, which suggests that program-specific knowledge is not negligible and complements general knowledge. We found that students are more likely to drop out if they do not start their university studies immediately after leaving secondary school. Using a partial dependence plot, we showed that humanities subjects also have incremental predictive power, even though this analysis is based on data from a technical university. Finally, we discuss potential practical applications of our work, such as how explaining individual predictions allows for personalized interventions, for example by offering appropriate remedial courses and tutoring sessions. Our approach is unique in that we not only estimate the probability of dropping out, but also interpret the model and provide an explanation for each prediction. As a result, the framework can be used in several fields. For instance, by predicting which majors high school students could be most successful in based on their high school performance indicators, it could help them select appropriate university programs and thus support career guidance. Through the explanations of local predictions, the framework can also help students identify the skills they need to develop to succeed in their university studies.
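The two global interpretation tools named above, permutation importance and partial dependence, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names (`hs_gpa`, `math_grade`, `humanities_grade`, `gap_years`) and the coefficients that generate the outcome are hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for CatBoost. It does not reproduce the study's data, model, or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical pre-enrollment features (not the variables of the study).
FEATURES = ["hs_gpa", "math_grade", "humanities_grade", "gap_years"]


def make_synthetic_students(n=2000, seed=0):
    """Generate toy data; the coefficients below are invented for illustration."""
    rng = np.random.default_rng(seed)
    hs_gpa = rng.uniform(2.0, 5.0, n)
    math_grade = rng.integers(2, 6, n).astype(float)
    humanities_grade = rng.integers(2, 6, n).astype(float)
    gap_years = rng.integers(0, 4, n).astype(float)
    logit = (2.0 * (hs_gpa - 3.5) + 0.6 * (math_grade - 3.5)
             + 0.3 * (humanities_grade - 3.5) - 0.5 * gap_years)
    prob_graduate = 1.0 / (1.0 + np.exp(-logit))
    y = (rng.uniform(size=n) < prob_graduate).astype(int)  # 1 = graduated
    X = np.column_stack([hs_gpa, math_grade, humanities_grade, gap_years])
    return X, y


def explain_globally(seed=0):
    """Fit a boosted-tree model, then compute PI and a PDP on held-out data."""
    X, y = make_synthetic_students(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = GradientBoostingClassifier(random_state=seed).fit(X_train, y_train)

    # Permutation importance: mean drop in accuracy when one column is shuffled.
    pi = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=seed)
    ranking = sorted(zip(FEATURES, pi.importances_mean), key=lambda t: -t[1])

    # Partial dependence of the predicted graduation probability on hs_gpa.
    pdp = partial_dependence(model, X_test, features=[0],
                             kind="average", method="brute")
    return model.score(X_test, y_test), ranking, pdp["average"][0]


if __name__ == "__main__":
    accuracy, ranking, gpa_curve = explain_globally()
    print(f"held-out accuracy: {accuracy:.3f}")
    for name, importance in ranking:
        print(f"{name:>18}: {importance:.4f}")
    print("PDP over hs_gpa grid (first, last):", gpa_curve[0], gpa_curve[-1])
```

Both quantities are computed on held-out data, so they describe what the fitted model relies on when predicting unseen students rather than what it memorized during training. Local explanations such as LIME or SHAP values would be computed per student and are not covered by this sketch.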

Scientia et Securitas