Abstract: Representing embeddings as binary codes rather than real-valued features is a promising way to save memory and speed up operations in many machine learning models. In this paper, we explore barcode (binary) representations of text embeddings derived from BERT, optimized with the Coordinate Search algorithm. These binary embeddings provide a compact representation of text that reduces memory and computational demands, which is especially advantageous for resource-intensive, large-scale text processing tasks. We introduce a novel optimal-threshold technique that, coupled with Coordinate Search, transforms continuous BERT embeddings into binary barcodes, enabling effective Natural Language Processing (NLP) while preserving computational efficiency. We apply the resulting optimal barcode representations to NLP applications, demonstrating their potential as an alternative text representation. Through an extensive series of experiments on various NLP tasks across diverse datasets, we evaluate our approach against a range of thresholding techniques. Binary embeddings obtained with optimal thresholds achieve higher accuracy than those produced by traditional binarization methods. The proposed method for generating binary representations is independent of the model, data, and task, which makes it applicable across a wide range of machine learning applications.
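To make the idea of per-dimension thresholding tuned by coordinate search concrete, here is a minimal sketch. The abstract does not specify the objective, the initialization, or the candidate thresholds, so all three are assumptions here. The objective is leave-one-out 1-nearest-neighbour accuracy under Hamming distance on labeled data. Each threshold starts at its dimension's mean. Candidate thresholds are midpoints between consecutive distinct values. The function names (`binarize`, `hamming`, `loo_1nn_accuracy`, `coordinate_search_thresholds`) are hypothetical and do not represent the authors' implementation.

```python
from typing import List, Sequence, Tuple

Barcode = List[int]


def binarize(x: Sequence[float], thresholds: Sequence[float]) -> Barcode:
    """Map a real-valued embedding to a 0/1 barcode using one threshold per dimension."""
    return [1 if v > t else 0 for v, t in zip(x, thresholds)]


def hamming(a: Barcode, b: Barcode) -> int:
    """Number of positions at which two barcodes differ."""
    return sum(u != v for u, v in zip(a, b))


def loo_1nn_accuracy(X, y, thresholds) -> float:
    """Hypothetical objective: leave-one-out 1-NN accuracy with Hamming distance."""
    codes = [binarize(x, thresholds) for x in X]
    correct = 0
    for i, ci in enumerate(codes):
        # Nearest other sample in Hamming space; ties go to the lowest index.
        nearest = min((j for j in range(len(codes)) if j != i),
                      key=lambda j: hamming(ci, codes[j]))
        correct += y[nearest] == y[i]
    return correct / len(codes)


def coordinate_search_thresholds(X, y, max_sweeps: int = 10) -> Tuple[List[float], float]:
    """Tune one threshold per dimension, holding all other thresholds fixed."""
    d = len(X[0])
    # Assumption: initialize each threshold at the mean of its dimension.
    thresholds = [sum(col) / len(col) for col in zip(*X)]
    best_score = loo_1nn_accuracy(X, y, thresholds)
    for _ in range(max_sweeps):
        improved = False
        for k in range(d):
            values = sorted(set(x[k] for x in X))
            # Midpoints between consecutive distinct values cover every distinct cut.
            candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
            best_t = thresholds[k]
            for t in candidates:
                trial = thresholds[:k] + [t] + thresholds[k + 1:]
                score = loo_1nn_accuracy(X, y, trial)
                if score > best_score:  # accept strict improvements only
                    best_score, best_t, improved = score, t, True
            thresholds[k] = best_t
        if not improved:  # a full sweep changed nothing: local optimum reached
            break
    return thresholds, best_score
```

Because a threshold is only replaced when the objective strictly improves, the score never decreases across sweeps. Like any coordinate search, however, it can stop at a local optimum. Evaluating the objective on real BERT embeddings (768 dimensions) would need a far cheaper objective than this exhaustive leave-one-out loop.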