Abstract: With the digitization and automation of vehicles, an increasing amount of data is continuously generated, processed, and analyzed. Storing this data is particularly important, since historical vehicle data enables the analysis of driving behavior, the optimization of vehicle functions, and the development of new business models that give customers the best possible experience. However, the different communication protocols used for inter- and intra-vehicle communication yield highly complex sensor networks whose sensor recordings are rarely available synchronously at a central node. This heterogeneous nature of vehicle data requires efficient processing and storage. As a consequence, we benchmark different data structures and metadata concepts in combination with various established databases and file systems in order to identify an optimal system for storing vehicle sensor data. Our research shows that the data structure embedded in the Lakehouse has to be optimized to achieve maximum performance for backbones with heterogeneous sensor data. We therefore developed a timestamp-partitioned data structure, named Schema-2, which, in combination with TimescaleDB and Druid, shows optimal performance compared to the state-of-the-art time series data structure.