Author: Arturo Buitrago
Time-series prediction of the performance of retail businesses is of interest to every agent in the card payment ecosystem and its value chain. Accurate forecasts of merchants’ transactions can be used to evaluate a business’s health and provide decision makers with valuable insights. Machine learning for time-series prediction has attracted significant interest; recent research in the field has been promising, but its replicability and the validity of its results have been called into question. This work provides a basis for generating value from existing datasets, uses reproducible techniques, and presents robust and valid results. It sets out to test whether machine learning methods can predict the transactions of small to medium-sized merchants more accurately than statistical methods, the de facto standard. It also seeks to identify the best possible predictor for the selected task. The author first conducted a wide-ranging literature review, compiled a list of 21 predictors, and implemented them. Statistical benchmarks and performance measures were then defined, against which the predictors could be compared. The predictors were subjected to extensive cross-validation and their predictive performance was evaluated. The five highest-scoring algorithms after this stage were then subjected to an exhaustive hyperparameter grid search to achieve more precise point predictions. Eleven machine learning predictors outperformed all statistical benchmarks; no neural network-based approach cleared this bar. The top predictor was clearly identified: K-nearest neighbor regression predicted merchants’ transactions most accurately. Gradient tree boosting was nevertheless deemed the best basis for a future prediction system due to its robustness, scalability, and the availability of libraries dedicated to gradient boosting approaches.
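The evaluation procedure summarized above (a statistical benchmark, out-of-sample validation, then a hyperparameter grid search) can be illustrated with a minimal sketch. It is not the implementation used in this work: the synthetic series, the seasonal-naive benchmark, the rolling-origin validation scheme, the mean absolute error metric, and all function names and parameter grids are assumptions chosen purely for illustration.

```python
import math


def make_series(n=120, period=7):
    """Synthetic daily transaction counts with a weekly cycle and a mild trend.

    Illustrative data only; not taken from the study.
    """
    return [100 + 20 * math.sin(2 * math.pi * t / period) + 0.1 * t for t in range(n)]


def seasonal_naive(history, period=7):
    """Assumed statistical benchmark: repeat the value observed one season ago."""
    return history[-period]


def knn_forecast(history, k=3, window=7):
    """k-nearest neighbor regression on lag windows.

    Finds the k past windows closest (Euclidean distance) to the most recent
    window and averages the values that immediately followed them.
    """
    query = history[-window:]
    candidates = []
    for start in range(len(history) - window):
        past = history[start:start + window]
        target = history[start + window]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(past, query)))
        candidates.append((dist, target))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rolling_origin_mae(series, forecaster, min_train=60):
    """One-step-ahead rolling-origin validation; returns the mean absolute error.

    Each forecast only sees observations that precede the point being predicted.
    """
    errors = []
    for origin in range(min_train, len(series)):
        prediction = forecaster(series[:origin])
        errors.append(abs(series[origin] - prediction))
    return sum(errors) / len(errors)


def grid_search_knn(series, ks=(1, 3, 5), windows=(7, 14)):
    """Exhaustive search over a small, hypothetical hyperparameter grid."""
    results = {}
    for k in ks:
        for window in windows:
            results[(k, window)] = rolling_origin_mae(
                series,
                lambda history, k=k, window=window: knn_forecast(history, k, window),
            )
    best = min(results, key=results.get)
    return best, results[best]


if __name__ == "__main__":
    series = make_series()
    print("benchmark MAE:", rolling_origin_mae(series, seasonal_naive))
    print("best (k, window) and MAE:", grid_search_knn(series))
```

Validation here respects temporal order, since shuffled folds would let a predictor see observations from the future. Scores on this synthetic series say nothing about the results reported in this work.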