Author: Jan-Philipp Fahlbusch
Supervisor: Prof. Gudrun Klinker
Advisor: Sven Liedtke
Submission Date: 15.05.2020

Abstract

Creating understandable and easy-to-use interactive content is the goal of any software that wants its user base to work intuitively within its environment. It is very important to test many different interaction concepts early on during development. This is especially true for interactive objects in new technologies, such as augmented and virtual reality, as they often require new approaches for a streamlined user experience. To support this rapid prototyping process, we want to create a solution that can be used in any current game engine supporting the creation of augmented or virtual reality content. Furthermore, this thesis provides a proof of concept, showing that the automatic placement of content and environment objects is possible in real time.

The future of any head-mounted display in both augmented and virtual reality lies in wireless setups, making the ability to analyse the user's surroundings in real time an essential requirement for any application deployed on such a device. By processing only the point cloud of a room, our approach is able to procedurally generate and digitally recreate environments based on the user's actual surroundings. Through the detection of distinctive room features, this implementation is able to provide areas, such as walls or tables, on which interactive content can be placed automatically and independently of any user interaction, except for the required prior scanning of the surroundings.
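
To make this concrete, the following sketch shows one way such room features could be detected: each triangle of the scanned mesh is classified by the vertical component of its face normal into floor, ceiling, wall, or table candidates. This is a minimal illustration with assumed names and thresholds, not the thesis implementation; Y is taken as the up axis, matching a floor and ceiling parallel to the X-Z plane.

    // Minimal sketch: classify a scanned triangle by its face normal.
    // All names and thresholds are illustrative assumptions, not the
    // thesis implementation.
    #include <array>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }
    static Vec3 sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

    enum class SurfaceKind { Floor, Ceiling, Wall, Table, Other };

    SurfaceKind classify(const std::array<Vec3, 3>& tri, float floorHeight) {
        Vec3 n = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]));
        float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len == 0.0f) return SurfaceKind::Other;          // degenerate triangle
        float ny = n.y / len;                                // vertical normal component
        if (ny > 0.95f)                                      // horizontal, facing up
            return (tri[0].y - floorHeight < 0.1f) ? SurfaceKind::Floor : SurfaceKind::Table;
        if (ny < -0.95f) return SurfaceKind::Ceiling;        // horizontal, facing down
        if (std::fabs(ny) < 0.05f) return SurfaceKind::Wall; // vertical surface
        return SurfaceKind::Other;
    }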

In this thesis, we combine the augmented and virtual reality settings into a single solution, despite their slightly different requirements, generating a simplified and render-optimised environment over the existing room mesh and creating bounds with at least fifteen times fewer vertices and triangles than the original. The bounds are calculated in less than 50 milliseconds on recommended virtual reality hardware, allowing our solution to function as a real-time replacement of the original mesh.

Moreover, this work groups the created spatial mappings into categories for different types of interactive content, allowing for automatic placement in suitable locations. With a processing time of less than 200 milliseconds to analyse a standard surrounding, applications can distribute content optimally around the user in a very short time frame. Creating such an environment, with different areas in which to test interactive instances, helps simplify the testing and comparison of new interaction concepts in augmented and virtual reality, speeding up the development cycle of new and user-friendly technologies. Additionally, this can be used in finished applications to procedurally place new environments around the headset wearer based on their surroundings, with special areas for predefined interaction objects placed in easily reachable locations.
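
A minimal sketch of what such a categorisation could look like on the application side, with illustrative types and thresholds that are our assumptions rather than the thesis API:

    // Sketch: group detected areas by interaction category and pick a
    // placement. Types, fields, and thresholds are illustrative only.
    #include <optional>
    #include <vector>

    enum class Interaction { CloseRange, Distant };  // e.g. tables vs. walls

    struct Area {
        float centerX, centerY, centerZ;  // metres, world space
        float width, height;              // extent of the usable rectangle
        Interaction kind;
    };

    // Return the first detected area large enough for the requested content.
    std::optional<Area> findPlacement(const std::vector<Area>& areas,
                                      Interaction wanted, float minW, float minH) {
        for (const Area& a : areas)
            if (a.kind == wanted && a.width >= minW && a.height >= minH)
                return a;
        return std::nullopt;
    }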

Results

Room Understanding

The final outcome of our algorithm detecting and marking valid surfaces for interactive content in room one. Our methods remove all overlapping areas on walls and leave only minor overlaps on horizontal surfaces; these can be neglected, as seen in this figure. All cyan-coloured squares and rectangles are valid surfaces. For this room, our algorithm detects 115 suitable areas.

Graphical representation of the average execution time of the room-understanding algorithm for each room and tested version. The range of measured results is shown as an interval, with the average marked as the dot in between. Execution times differed considerably between rooms but were mostly consistent across all tested versions.
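
For reproducibility, averages like the ones plotted here could be gathered with a small timing harness along these lines; the timed callable and the iteration count are illustrative assumptions, not the thesis benchmark code.

    // Sketch: average the wall-clock time of a callable over N runs.
    #include <chrono>

    template <typename F>
    double averageMillis(F&& run, int iterations) {
        using clock = std::chrono::steady_clock;
        double totalMs = 0.0;
        for (int i = 0; i < iterations; ++i) {
            auto start = clock::now();
            run();  // e.g. one full room-understanding pass
            totalMs += std::chrono::duration<double, std::milli>(clock::now() - start).count();
        }
        return totalMs / iterations;
    }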

The number of detected areas for each room. While room one was actually the smallest room, our algorithm could detect 116 suitable areas on walls, floor, ceiling, and furniture. Room two, with its simpler geometry, had only 75 areas, while room three, due to its large size, had 227 areas.

Room Recreation

The boundaries of room one as a mesh after the greedy meshing algorithm. The structure of the voxels was kept from the voxel representation on both the inside and the outside. The inaccurate shading of the room mesh is due to the vertices not being normal- and UV-mapped. The triangulation of the mesh is visible in these screenshots, as we render both the shaded geometry and the wireframe. This mesh representation may not be optimal in terms of vertex and triangle count, but it reduces both considerably compared to the original room mesh.
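
The core of such a reduction is the greedy meshing step itself: adjacent filled voxels are merged into maximal rectangles, each emitted as a single quad (two triangles) instead of many per-voxel faces. Below is a minimal two-dimensional sketch over one voxel slice; the grid layout and types are our assumptions, not the thesis code.

    // Sketch: merge filled cells of one voxel slice into maximal rectangles.
    #include <cstddef>
    #include <vector>

    struct Rect { std::size_t x, y, w, h; };

    std::vector<Rect> greedyMesh(std::vector<std::vector<bool>> filled) {
        std::vector<Rect> quads;
        const std::size_t rows = filled.size();
        const std::size_t cols = rows ? filled[0].size() : 0;
        for (std::size_t y = 0; y < rows; ++y)
            for (std::size_t x = 0; x < cols; ++x) {
                if (!filled[y][x]) continue;
                std::size_t w = 1;                      // grow right while filled
                while (x + w < cols && filled[y][x + w]) ++w;
                std::size_t h = 1;                      // grow down while the whole row fits
                while (y + h < rows) {
                    bool rowOk = true;
                    for (std::size_t i = 0; i < w; ++i)
                        if (!filled[y + h][x + i]) { rowOk = false; break; }
                    if (!rowOk) break;
                    ++h;
                }
                for (std::size_t dy = 0; dy < h; ++dy)  // mark the rectangle consumed
                    for (std::size_t dx = 0; dx < w; ++dx)
                        filled[y + dy][x + dx] = false;
                quads.push_back({ x, y, w, h });
            }
        return quads;
    }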

Graphical representation of the average execution time of the room-reconstruction algorithm measured in each room and tested version. The range of measured results is shown as an interval, with the average marked as the dot in between. Execution times differed considerably between rooms but were mostly consistent across all tested versions.

Our room-reconstruction algorithm is measured not only by its execution time, but also by the vertex and triangle count of the returned mesh. This count mainly depends on the complexity and size of the room. For the two rooms scanned by the HoloLens, we could reduce the triangle and vertex count to around one twentieth of the original, while for room two, the vertex count was reduced to less than one 130th of the original.

Comparison between Unity and Unreal

The result of our algorithm implemented in Unity. The left column shows the room-understanding results, where a red area indicates close interaction and a green area an interaction over a distance. The middle column depicts both algorithms' results, while the right column renders only the calculated simplified bounding volume. Only small areas of the detected surfaces overlap with the bounds, and all important details are maintained.

The result of our algorithm implemented in the Unreal Engine. The left column shows the room-understanding results, where a red area indicates close interaction and a green area an interaction over a distance. The middle column depicts both algorithms' results, while the right column renders only the calculated simplified bounding volume. Only small areas of the detected surfaces overlap with the bounds, and all important details are maintained.

Conclusion

The main contributions of this thesis are twofold. First, we provided an algorithm to detect and categorise the features of a room and translate them into suitable areas for interactive content. Compared to existing solutions, our approach is hardware and software independent and can be adopted across many different platforms, such as AR and VR headsets, thanks to its ensured compatibility. Moreover, it can be used in other software and hardware solutions that allow the usage of DLLs written in C++. This thesis proves that our suggested implementation can handle data of the expected complexity and average room sizes in real time, taking less than 200 ms to compute the valid areas on recommended VR PC hardware. With the targeted frame rate of 90 FPS for VR headsets, this translates to fewer than 20 passed frames until the results are available to be displayed to the user. Because the main thread of the game engine is not blocked, the application can continue to run smoothly while the results are being calculated. Comparing the maximum recorded times to the time it takes on average to blink an eye, which is between 100 and 400 ms depending on the individual, our algorithm quite literally finishes in the blink of an eye. This is achieved by implementing a four-part algorithm, proceeding in order through feature detection, suitable-area definition, outcome reduction, and categorisation of the results.
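
A minimal sketch of this non-blocking pattern, using std::async as a stand-in for the engine's own task system; the analysis function, its duration, and the result type are illustrative assumptions:

    // Sketch: run the room analysis off the main thread and keep rendering.
    #include <chrono>
    #include <future>
    #include <thread>
    #include <vector>

    struct Area { float x, y, z, width, height; };  // illustrative result type

    std::vector<Area> analyseRoom() {
        // Stand-in for the room-understanding pass (< 200 ms in the thesis).
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        return {};
    }

    int main() {
        auto pending = std::async(std::launch::async, analyseRoom);
        while (pending.wait_for(std::chrono::milliseconds(0)) != std::future_status::ready) {
            // renderFrame();  // the engine keeps ticking; ~18 frames pass at 90 FPS
        }
        std::vector<Area> areas = pending.get();  // results, a few frames later
        return static_cast<int>(areas.size());
    }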

Secondly, this thesis provides a solution for recreating room bounds custom-tailored to the needs of AR and VR environments. This room-recreation implementation does not need to recreate every exact detail present in the room, but rather the important boundaries that keep the user from running into real-world objects. Additionally, it gives developers an idea of where objects should be placed to mirror the real room when recreating a new digital environment based on the real surroundings. With execution times of less than 40 ms for an average room used in AR and VR environments, we also proved that this algorithm can provide the recreated room to the application in real time, at least every five frames. Not even the HoloLens can update the room geometry faster. This allows our algorithm to be used for continuous room-recreation updates, as well as for informing players when they are about to collide with real-world objects. This is achieved by combining the room feature detection with a greedy meshing algorithm.
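
As an illustration of the collision-warning use case, a proximity check against the recreated bounds could look like the following sketch; reducing the bounds to axis-aligned boxes and the 0.5 m threshold are our assumptions, not the thesis implementation.

    // Sketch: warn before the user reaches a recreated real-world obstacle.
    #include <algorithm>
    #include <cmath>

    struct Box { float minX, minY, minZ, maxX, maxY, maxZ; };

    // Distance from a point to the surface of a box (0 when inside).
    float distanceToBox(float px, float py, float pz, const Box& b) {
        float dx = std::max({ b.minX - px, 0.0f, px - b.maxX });
        float dy = std::max({ b.minY - py, 0.0f, py - b.maxY });
        float dz = std::max({ b.minZ - pz, 0.0f, pz - b.maxZ });
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    }

    bool shouldWarn(float px, float py, float pz, const Box& obstacle) {
        return distanceToBox(px, py, pz, obstacle) < 0.5f;  // assumed 0.5 m threshold
    }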

By testing the proposed approaches in multiple rooms, we could determine their strengths and weaknesses. For one, we proved the deterministic nature of our algorithm and the reliable detection of suitable areas and room bounds, provided the passed mesh geometry has a floor and ceiling parallel to the X-Z plane, walls perpendicular to the floor, a uniform distribution of triangles and vertices across all surfaces, and a floor and ceiling at least 2.3 m apart. The two tested rooms in which our method was unable to produce any results showed that our algorithm needs to detect the floor and the ceiling, and will fail to provide any results if either is not detected correctly.
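
These preconditions lend themselves to an explicit check before the pipeline runs. A minimal sketch follows, where the 2.3 m minimum height and the floor/ceiling requirement come from the text and everything else is an illustrative assumption:

    // Sketch: validate the detected planes before running the pipeline.
    #include <optional>

    struct DetectedPlanes {
        std::optional<float> floorY;    // height of the detected floor
        std::optional<float> ceilingY;  // height of the detected ceiling
    };

    bool preconditionsMet(const DetectedPlanes& p) {
        if (!p.floorY || !p.ceilingY) return false;  // both must be detected
        return (*p.ceilingY - *p.floorY) >= 2.3f;    // minimum room height
    }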
