Abstract: In this paper we propose an architecture devoted to the analysis of very large natural-language biomedical text collections. Its purpose is to search for semantic similarity and thereby obtain useful hints for effective simulation that could help physicians in diagnostic tasks. We leverage word embedding models trained with the word2vec algorithm, together with a Big Data architecture for their processing and management. We performed preliminary analyses on a dataset extracted from the whole PubMed library, and we developed a web front-end to show the usability of this methodology in a real context.
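
As an illustration of what searching for semantic similarity over word embeddings can look like, the following minimal sketch ranks terms by cosine similarity. It is not the architecture proposed here: the three-dimensional vectors, the `EMBEDDINGS` table, and the helper names `cosine_similarity` and `most_similar` are hypothetical and chosen only for the example, whereas a word2vec model trained on PubMed text would supply vectors with hundreds of dimensions.

```python
import math

# Toy vectors with made-up values (assumption): they stand in for
# embeddings that a trained word2vec model would provide.
EMBEDDINGS = {
    "fever": [0.9, 0.1, 0.2],
    "pyrexia": [0.85, 0.15, 0.25],
    "fracture": [0.1, 0.9, 0.3],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(term, embeddings, top_n=2):
    """Rank every other term by cosine similarity to `term`, best first."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for word, score in most_similar("fever", EMBEDDINGS):
        print(f"{word}: {score:.3f}")
```

With these toy values, "pyrexia" ranks above "fracture" as a neighbour of "fever". At the scale of a full PubMed vocabulary, a linear scan like this would presumably be replaced by a distributed or approximate nearest-neighbour search.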