Vector embeddings by sequence similarity and context for improved compression, similarity search, clustering, organization, and manipulation of cDNA libraries

Published: 01 Jan 2025. Last Modified: 20 May 2025. Comput. Biol. Chem. 2025. License: CC BY-SA 4.0
Abstract:
Highlights
• Each short sequence was embedded as a distinct vector.
• Effective clustering and enhanced compression were achieved for cDNA libraries.
• Clustering was performed based on amino acid characteristics.
• Vector embedding was combined with an algorithm to determine vector proximity.
• The sequence-embedding technique can be used to encode barcodes and cDNA sequences.
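The highlights describe embedding each short sequence as a vector and then using vector proximity to cluster a library. The abstract does not specify the embedding or the proximity algorithm. The sketch below is therefore only a generic illustration, not the authors' method. It swaps in a common stand-in: k-mer count vectors compared by cosine similarity, followed by a simple greedy threshold clustering. The function names (`embed`, `cosine`, `greedy_cluster`), the choice of k = 3, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from functools import lru_cache
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


@lru_cache(maxsize=None)
def kmer_index(k: int) -> dict[str, int]:
    """Map every possible k-mer over ACGT to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def embed(seq: str, k: int = 3) -> list[float]:
    """Hypothetical embedding: a k-mer count vector of length 4**k.

    k-mers containing symbols outside ACGT (e.g. N) are skipped.
    This is a stand-in, not the embedding used in the paper.
    """
    index = kmer_index(k)
    vec = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1.0
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def greedy_cluster(seqs: list[str], threshold: float = 0.8, k: int = 3) -> list[list[int]]:
    """Assign each sequence to the first cluster whose representative
    (the cluster's first member) is at least `threshold` similar;
    otherwise start a new cluster. Returns lists of sequence indices."""
    clusters: list[tuple[list[float], list[int]]] = []
    for idx, s in enumerate(seqs):
        v = embed(s, k)
        for rep_vec, members in clusters:
            if cosine(v, rep_vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((v, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
    print(greedy_cluster(reads))
```

With these three toy reads, the first two share most of their 3-mers and fall into one cluster, while the poly-T read forms its own cluster. Greedy assignment is order-dependent. It is used here only because it is short. A real pipeline would more likely use an approximate nearest-neighbour index or a proper clustering algorithm over the vectors.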