Multimodal Data Encoding and Compression in Apache IoTDB

Published: 01 Jan 2024, Last Modified: 12 Jan 2025, Int. J. Softw. Informatics 2024, CC BY-SA 4.0
Abstract: Time-series data are widely used in industrial manufacturing, meteorology, ships, electric power, vehicles, finance, and other fields, which has driven the rapid development of time-series database management systems. As data scales grow and data modalities diversify, storing and managing the data efficiently becomes critical, and data encoding and compression become increasingly important research topics. Existing data encoding methods and systems do not thoroughly consider the characteristics of data in different modalities, and some time-series analysis methods have not yet been applied to the problem of data encoding. We present a comprehensive introduction to the multimodal data encoding methods and their system implementation in the Apache IoTDB time-series database system, with a focus on Internet of Things application scenarios. Our encoding method covers data in multiple modalities, including timestamp data, numerical data, Boolean data, frequency-domain data, and text data, and exploits the characteristics of each modality, in particular the approximate regularity of timestamp intervals in the timestamp modality, to design targeted encodings. The encoding algorithms also take into account data quality issues that may arise in practical applications. Experimental evaluation and analysis at both the encoding-algorithm level and the system level over multiple datasets validate the effectiveness of our encoding method and its system implementation.
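To illustrate why near-regular timestamp intervals matter for encoding, the following is a minimal sketch of generic delta-of-delta encoding. It is not the encoder described in the paper or the one implemented in Apache IoTDB; the function names `delta_of_delta_encode` and `delta_of_delta_decode` are hypothetical and chosen only for this example. When sampling intervals are almost constant, the second-order differences cluster around zero, so they can be stored in far fewer bits than the raw timestamps.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Return [first timestamp, then second-order differences].

    Illustrative only: a generic scheme, not IoTDB's actual encoder.
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0  # the delta before the first interval is taken to be 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # small when intervals are regular
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod  # recover the first-order interval
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Nominal 1000 ms sampling with one sample arriving 1 ms late.
ts = [1000, 2000, 3000, 4001, 5001]
enc = delta_of_delta_encode(ts)
assert delta_of_delta_decode(enc) == ts
```

After the first interval, every encoded value in this example has magnitude at most 1, which is what a subsequent bit-packing or variable-length stage would exploit.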