Abstract
About one-third of college students drop out before finishing their degree. Most of those who remain take longer than 4 years to complete their degree at “4-year” institutions. These outcomes underscore the need to identify students who may benefit from support that encourages timely graduation. Here we develop machine learning models, specifically Random Forests, to predict whether and when first-time-in-college undergraduates will graduate, using admissions, academic, and financial aid records available two to six semesters after matriculation. Credit hours earned, college and high school grade point averages, estimated family (financial) contribution, and enrollment and grades in required gateway courses within a student’s major were all important predictors of graduation outcome. We predicted students’ graduation outcomes with an overall accuracy of 79%. Applying the models to currently enrolled students identified those who could benefit from added support. The students identified included many who may be missed by established university protocols, such as students with high financial need who are making adequate but not strong degree progress.
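For readers unfamiliar with the technique, the following is a minimal, standard-library-only sketch of a Random Forest classifier and of how an overall accuracy figure is computed. It is not the authors' implementation. The toy student generator, its three features, the labeling rule, the two outcome labels, and all function names and parameter values are hypothetical and chosen only for illustration; the data are synthetic and say nothing about the study's results.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, depth, n_feats, rng):
    """Grow one tree; a leaf is a label, an internal node is a tuple."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Each split considers only a random subset of the features.
    for f in rng.sample(range(len(rows[0])), n_feats):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split among the sampled features
        return majority(labels)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth - 1, n_feats, rng),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth - 1, n_feats, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=20, depth=3, n_feats=2, seed=0):
    """Train each tree on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


def make_toy_students(n, seed):
    """Hypothetical synthetic records: [credit hours, college GPA, EFC in $1000s]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        credits = 3 * rng.randint(0, 30)
        gpa = rng.randint(10, 40) / 10
        efc = rng.randint(0, 30)
        on_track = credits >= 45 and gpa >= 2.5  # invented labeling rule
        if rng.random() < 0.1:                   # 10% label noise
            on_track = not on_track
        rows.append([credits, gpa, efc])
        labels.append("graduated" if on_track else "not_graduated")
    return rows, labels


train_rows, train_labels = make_toy_students(200, seed=1)
test_rows, test_labels = make_toy_students(100, seed=2)
forest = fit_forest(train_rows, train_labels)
predictions = [predict_forest(forest, row) for row in test_rows]
# Overall accuracy: share of held-out students whose predicted label is correct.
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"overall accuracy on toy data: {accuracy:.2f}")
```

Predicting when a student graduates, as well as whether, would amount to using more than two outcome labels; the voting and accuracy calculations above are unchanged in that case.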