University dropout: Prevention patterns through the application of educational data mining
DOI: https://doi.org/10.7203/relieve.26.1.16061
Keywords: Student environment, Computer learning, Decision trees, Counseling, Feature selection
Abstract
Recently, educational data mining techniques have gained considerable relevance in applications such as performance prediction, predictive retention models, behaviour profiles and school failure, among others. In this paper we applied an attribute selection algorithm to identify the factors that most influence the decision to drop out, and used decision trees to define patterns that can warn of imminent dropout. An instrument was adapted and administered online to students currently enrolled in higher education programmes: 300 from public and 200 from private higher education institutions (HEIs). The attribute selection algorithm identified 27 relevant factors. The three main factors were the lack of counselling, an adequate student environment and academic follow-up, while the decision tree yielded 7 patterns. These patterns involved factors such as the student environment, insufficient financial support, having experienced an uncomfortable situation, and place of career choice, among others. Finally, the results indicate that dropout does not depend on a single factor but is multifactorial. The sample must be expanded to include other cities; this will allow various algorithms to be applied, providing more information and leading to accurate mechanisms for reducing university dropout rates according to the characteristics of the student population in each region.
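The abstract does not name the specific attribute selection algorithm or decision-tree variant used. As a minimal sketch under that caveat, the Python below ranks categorical attributes by information gain (one common attribute selection criterion, assumed here rather than taken from the study), grows an ID3-style tree, and reads each root-to-leaf path that predicts dropout as an alert pattern. The records and the attribute names `counselling`, `environment` and `funding` are hypothetical toy data, not the study's survey results.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (NOT the study's data): categorical survey answers
# plus the class label "dropout".
RECORDS = [
    {"counselling": "no",  "environment": "poor", "funding": "low",  "dropout": "yes"},
    {"counselling": "no",  "environment": "good", "funding": "low",  "dropout": "yes"},
    {"counselling": "yes", "environment": "poor", "funding": "low",  "dropout": "yes"},
    {"counselling": "yes", "environment": "good", "funding": "high", "dropout": "no"},
    {"counselling": "yes", "environment": "good", "funding": "low",  "dropout": "no"},
    {"counselling": "no",  "environment": "poor", "funding": "high", "dropout": "yes"},
    {"counselling": "yes", "environment": "poor", "funding": "high", "dropout": "no"},
    {"counselling": "yes", "environment": "good", "funding": "high", "dropout": "no"},
]
ATTRS = ["counselling", "environment", "funding"]


def entropy(rows, target="dropout"):
    """Shannon entropy (bits) of the class label over the given rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def info_gain(rows, attr, target="dropout"):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def rank_attributes(rows, attrs, target="dropout"):
    """Attribute selection step: attributes sorted by descending information gain."""
    return sorted(attrs, key=lambda a: info_gain(rows, a, target), reverse=True)


def build_tree(rows, attrs, target="dropout"):
    """ID3-style tree: split on the highest-gain attribute until leaves are pure."""
    labels = Counter(r[target] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # leaf: majority class
    best = rank_attributes(rows, attrs, target)[0]
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


def extract_rules(tree, path=()):
    """Turn every root-to-leaf path into a (conditions, predicted label) pair."""
    if not isinstance(tree, dict):
        return [(path, tree)]
    (attr, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, path + ((attr, value),)))
    return rules


if __name__ == "__main__":
    print("Attribute ranking:", rank_attributes(RECORDS, ATTRS))
    tree = build_tree(RECORDS, ATTRS)
    for conditions, label in extract_rules(tree):
        if label == "yes":  # keep only the paths that predict dropout
            print(" AND ".join(f"{a}={v}" for a, v in conditions), "-> dropout alert")
```

Each printed line is a conjunction of attribute values, which is the sense in which a decision tree "defines patterns". A real analysis would also validate the tree on held-out students, which this sketch omits.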
License
The authors grant RELIEVE non-exclusive rights to exploit the published works and consent to their distribution under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which allows third parties to use the published material provided that the authorship of the work and the source of publication are acknowledged and that the material is used for non-commercial purposes.
The authors may enter into additional, independent contractual agreements for the non-exclusive distribution of the version of the work published in this journal (for example, by including it in an institutional repository or publishing it in a book), provided that it is clearly stated that the original source of publication is this journal.
Authors are encouraged to disseminate their work after publication through the internet (for example, in online institutional archives or on their website), which can lead to productive exchanges and increase citations of the work.
Submitting a paper to RELIEVE implies acceptance of these conditions.