Author: Michael Sorg
Supervisor: Prof. Gudrun Klinker
Advisors: Adnane Jadid, Christian Eichhorn
Submission Date: 15.04.2020

Abstract

This thesis develops a deep learning approach to 6-DoF pose estimation. A neural network is trained on several sensor inputs (GPS, IMU and odometry) to predict a 6D pose (latitude, longitude, altitude, pitch, roll and yaw).

An architecture search on simulated data showed that a recurrent neural network (RNN) with a single gated recurrent unit (GRU) layer of 256 units performs best.

This model achieves a mean position error of 3.5 m, while the mean pitch, roll and yaw errors are below 1.5°. Increasing the dataset size from 2 hours to 20 hours improves these results to 0.95 m and 0.3°.

Trained on a real-world dataset containing 1.5 hours of driving time, the model achieves a mean position error of 3.6 m and mean orientation errors of 0.05°, 0.16° and 0.08° for pitch, roll and yaw.


Conclusion


In the first part of this thesis, simulated data was used to find a suitable network architecture. The dataset includes 9 input features from GPS, accelerometer and gyroscope, all simulated at 100 Hz to simplify the problem. The architecture search revealed that recurrent neural networks (RNN) work better than a combination of convolutional and recurrent neural networks (CRNN). The best-performing model contains one gated recurrent unit (GRU) layer with 256 units. Trained on 2 hours of data, it achieved a mean position error of 3.5 m and a mean orientation error of 1.5°; increasing the dataset to 20 hours reduced these errors to 0.95 m and 0.3°. Experiments also showed that the network can process sensor signals with different update rates: when the GPS signal was simulated at 1 Hz, performance did not decline significantly.

However, some problems were identified. First, the gradients and weight changes during training are very small, which indicates that learning is not optimal; all attempts to fix this vanishing-gradient problem were unsuccessful. Second, the network relies almost exclusively on the GPS measurements for position prediction and does not use the IMU data to improve the result, so the model appears unable to fuse IMU and GPS measurements, at least for position. For orientation estimation, however, the network does perform sensor fusion: performance degrades when any one of the three sensors is removed.
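The architecture that won the search (a single GRU layer of 256 units mapping a sequence of sensor features to a 6D pose) can be illustrated with a minimal, framework-free forward pass. This is a sketch under assumptions the thesis does not state: the classic GRU gate formulation, a zero initial hidden state, random weights in [-0.1, 0.1], and a linear output head applied only to the last hidden state. The names `GRUPoseRegressor`, `step`, `forward`, `sigmoid`, `matvec` and `add3` are hypothetical; it is not the thesis implementation, which would use a deep learning framework and trained weights.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vec):
    # Dense matrix-vector product on nested lists
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def add3(a, b, c):
    # Element-wise sum of three equal-length vectors
    return [x + y + z for x, y, z in zip(a, b, c)]


class GRUPoseRegressor:
    """Hypothetical sketch: one GRU layer plus a linear head that maps the
    last hidden state to (lat, lon, alt, pitch, roll, yaw)."""

    def __init__(self, n_inputs=9, n_hidden=256, n_outputs=6, seed=0, scale=0.1):
        rng = random.Random(seed)

        def init(rows, cols):
            # Assumed uniform initialisation; untrained weights
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # Per-gate weights: z = update gate, r = reset gate, n = candidate state
        self.W = {g: init(n_hidden, n_inputs) for g in "zrn"}
        self.U = {g: init(n_hidden, n_hidden) for g in "zrn"}
        self.b = {g: [0.0] * n_hidden for g in "zrn"}
        self.W_out = init(n_outputs, n_hidden)
        self.b_out = [0.0] * n_outputs

    def step(self, x, h):
        # Classic GRU update (Cho et al., 2014)
        z = [sigmoid(v) for v in add3(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add3(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in add3(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        # Interpolate between the candidate state and the previous state
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def forward(self, sequence):
        # sequence: list of time steps, each a list of n_inputs sensor features
        h = [0.0] * self.n_hidden
        for x in sequence:
            h = self.step(x, h)
        # Linear regression head on the final hidden state
        return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.W_out, self.b_out)]
```

With the configuration reported above this would be `GRUPoseRegressor(n_inputs=9, n_hidden=256)` for the simulated data. Pure Python is far too slow for training at that size, so the sketch only illustrates the data flow; a small `n_hidden` keeps experiments fast. Predicting from the last hidden state is itself an assumption: a sequence-to-sequence variant would apply the output head at every time step.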


In the second part of this thesis, a real-world dataset with 1.5 hours of driving time and 19 input features was used for training. A second architecture search confirmed the simulation results: pure recurrent neural networks (RNN) with one GRU layer work best. Here the model achieved a mean position error of 3.6 m and mean pitch, roll and yaw errors of 0.12°, 0.22° and 0.15°. In addition to GPS and IMU, this dataset provides odometry measurements (wheel velocity and wheel angle). Including the odometry inputs improves the orientation errors to 0.05°, 0.16° and 0.08°, which again shows that the network performs sensor fusion for orientation estimation. Why the model does not use the IMU and odometry measurements for position prediction remains an open question.
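The accuracy figures above are mean errors over predicted poses. The thesis does not state how position errors in metres were derived from latitude, longitude and altitude, so the following is only one plausible way to compute such metrics. It assumes a spherical-Earth haversine distance for the horizontal component, combined with the altitude difference via `math.hypot`, and angle errors wrapped into [0°, 180°] so that, for example, a yaw of 359° against a true yaw of 1° counts as 2°. The names `haversine_m`, `angle_error_deg` and `mean_pose_errors` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing a slightly above 1
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def angle_error_deg(pred, true):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    d = (pred - true) % 360.0
    return min(d, 360.0 - d)


def mean_pose_errors(preds, truths):
    """Mean errors over paired poses (lat, lon, alt, pitch, roll, yaw).

    Latitude, longitude and angles are in degrees; altitude is in metres.
    """
    if len(preds) != len(truths) or not preds:
        raise ValueError("need two non-empty pose lists of equal length")
    sums = {"position_m": 0.0, "pitch_deg": 0.0, "roll_deg": 0.0, "yaw_deg": 0.0}
    for p, t in zip(preds, truths):
        horizontal = haversine_m(p[0], p[1], t[0], t[1])
        vertical = p[2] - t[2]
        # Approximate 3D error from horizontal and vertical components
        sums["position_m"] += math.hypot(horizontal, vertical)
        sums["pitch_deg"] += angle_error_deg(p[3], t[3])
        sums["roll_deg"] += angle_error_deg(p[4], t[4])
        sums["yaw_deg"] += angle_error_deg(p[5], t[5])
    return {k: v / len(preds) for k, v in sums.items()}
```

The haversine formula treats the Earth as a sphere; an ellipsoidal (WGS84) distance or a local east-north-up projection would be more exact, although for position errors of a few metres the difference is negligible.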