Abstract: Patents are legal rights issued to inventors to protect their inventions for a certain period, and they play an important role in today's artificial innovation. With the number of patents growing every year, an effective and efficient patent management and search system is indispensable for determining, from a vast amount of patent data, how an invention differs from prior work. However, the approach technologists use today is still based on traditional keyword-based Boolean search, which requires complex Boolean expressions. This strategy yields poor performance and demands too much manual labor to filter the results in post-processing. To address these issues, we propose CoPatE, a novel Contrastive Learning Framework for Patent Embeddings that captures the high-level semantics of large-scale patent data. A patent semantic compression module learns the informative claims to reduce computational complexity, and a tags auxiliary learning module enhances the semantics of a patent from its structure to learn high-quality patent embeddings. CoPatE is trained on USPTO patents from 2013 to 2020 and tested on patents from 2021 under the CPC scheme. Experimental results show that our model achieves a 17.7% increase in Recall@100 over the second-best method on the patent retrieval task and reaches 64.5% Micro-F1 on the patent classification task.