Abstract
Virtual learning environments have become an ideal tool for professional self-development and for delivering courses to diverse learner audiences across the world. Research interest in learner dropout and low completion in distance learning is growing, with high dropout rates being one of the main concerns. The early prediction of learner withdrawal has therefore become a major challenge, as has identifying the factors that contribute to this increasingly common phenomenon. In that regard, this manuscript presents a framework for a withdrawal prediction model built on data from The Open University, one of the largest distance learning institutions. First, we pre-process the dataset, addressing the challenges of discretization and unbalanced data. Second, we address the semantic issues of the raw data by introducing new behavioural indicators. Finally, we rely on machine learning algorithms for the withdrawal prediction model in order to understand learners' lack of commitment at an early stage.