Knowledge Distillation (KD) is a technique for compressing neural networks: in most cases, a fully trained large network (the teacher) is used to train a smaller model (the student). Several factors influence how well KD works, among them the training data used and the architectures of the two models.
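A common way to realize this teacher-to-student transfer is to train the student to match the teacher's temperature-softened output distribution. The following is a minimal illustrative sketch of such a distillation loss, not code from this thesis; the function name and the temperature value are assumptions for the example:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Scaled by temperature**2 so gradient magnitudes stay comparable
    across temperature choices (as in Hinton et al.'s formulation).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from teacher
    q = softmax(student_logits, temperature)  # student's soft predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * temperature**2)
```

When student and teacher produce identical logits the loss is zero; any mismatch in the softened distributions yields a positive penalty that the student's training minimizes.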
The problem studied in this thesis is deploying a neural network on a drone with limited memory. Because the large network cannot be run on the drone directly, KD is employed. Specifically, this work examines the effects of KD on a model with many parameters that was trained for a specific flight maneuver using Reinforcement Learning (RL). The knowledge of this model is transferred to a smaller one, which is then deployed on the drone and performs the flight maneuver in real time and in the real world.
An experiment comparing different data generation methods shows that the training data used has a large impact on the distillation result. Other experiments introduce one or more additional models whose size lies between that of the large and the small model. In these experiments, the medium network is first trained using the large network and is then used in turn to train the small model. The results reveal that such additional distillation steps can further improve performance. Overall, the experiments show that, with the right training data and one or more distillation steps, small models can be trained that even outperform the large network originally trained with RL.
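The sequential scheme described above, where each distilled model becomes the teacher of the next, can be sketched with a deliberately tiny stand-in: a fixed function plays the large teacher, and least-squares polynomial fits of decreasing degree play the medium and small students. All names, the teacher function, and the capacity choices are illustrative assumptions, not the models used in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))  # stand-in training inputs

def teacher(x):
    """Stand-in 'large' teacher: any fixed input-to-output mapping."""
    return np.sin(3.0 * x)

def fit_student(targets, degree):
    """Least-squares polynomial 'student'; degree plays the role of model size."""
    coeffs = np.polyfit(X[:, 0], targets[:, 0], degree)
    return lambda x: np.polyval(coeffs, x)

# Step 1: distill the large teacher into a medium-capacity student.
medium = fit_student(teacher(X), degree=7)
# Step 2: the medium model now acts as teacher for the final small student.
small = fit_student(medium(X), degree=3)
```

The chain large → medium → small mirrors the multi-step experiments: each stage only ever sees the previous stage's outputs as its training targets.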