Keywords: Query retrieval, document embedding, Linq-Embed-Mistral, cosine similarity, information retrieval system, semantic search, NLP models
Abstract: This paper presents a method for question-paper retrieval, designed for competitive settings that demand high precision and recall. Our approach combines textual components of both queries and documents to build the inputs for retrieval. We concatenate each question’s ‘question’ and ‘body’ fields to form the query, and merge each article’s ‘title’ and ‘abstract’ to represent the document, so that each text input carries the full available description of its entity.
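The input construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the field names ‘question’, ‘body’, ‘title’ and ‘abstract’ come from the abstract, while the function names, the dictionary-based record format, and the single-space separator are assumptions made for the example.

```python
def _join_fields(record: dict, fields: tuple) -> str:
    """Join the non-empty fields of a record with a single space (assumed separator)."""
    parts = (str(record.get(name, "") or "").strip() for name in fields)
    return " ".join(p for p in parts if p)


def build_query(question: dict) -> str:
    """Form the query text from the question's 'question' and 'body' fields."""
    return _join_fields(question, ("question", "body"))


def build_document(article: dict) -> str:
    """Form the document text from the article's 'title' and 'abstract' fields."""
    return _join_fields(article, ("title", "abstract"))


if __name__ == "__main__":
    q = {"question": "What causes X?", "body": "Looking for studies on X."}
    d = {"title": "A study of X", "abstract": "We examine the causes of X."}
    print(build_query(q))
    print(build_document(d))
```

Missing or empty fields are skipped here so that a record with only a title still yields a usable document string; whether the original system handles such records this way is not stated.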
At the core of our retrieval system is the Linq-Embed-Mistral model, available from Hugging Face. The model maps the concatenated query and document texts to dense vector representations. These embeddings capture semantic nuances and contextual similarities, enabling more accurate matching.
Using cosine similarity as the ranking measure, we compare each query vector against the document vectors and retrieve the 20 documents with the highest similarity scores. This strategy favors both relevance and speed, surfacing the most pertinent research papers from extensive databases.
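The ranking step can be illustrated with a short, dependency-free Python sketch. It assumes the query and document embeddings are already available as plain lists of floats; the function names and the toy two-dimensional vectors are hypothetical, and a practical system would more likely use a vectorized library for large collections. Only the use of cosine similarity and the cutoff of 20 come from the abstract.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 20) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k highest-scoring documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # Sort by score, highest first; Python's sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings standing in for model output.
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [1.0, 0.0]
    for index, score in top_k(query, docs, k=2):
        print(index, round(score, 3))
```

If the embeddings are L2-normalized beforehand, cosine similarity reduces to a dot product, which is a common way to speed up this step.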
Empirical evaluation supports the effectiveness of our method and indicates its potential to improve the performance of question-paper retrieval systems. Our findings contribute to information retrieval methodology, particularly for academic and research communities.
Submission Number: 13