MAVE: A Product Dataset for Multi-source Attribute Value Extraction
Abstract: Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval and recommendations.
In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction.
MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories.
MAVE has four main advantages:
First, MAVE is the largest product attribute value extraction dataset by the number of attribute-value examples.
Second, MAVE includes multi-source representations from the product, which capture the full product information with high attribute coverage.
Third, MAVE represents a more diverse set of attributes and values relative to what previous datasets cover.
Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments.
We further propose a novel approach that effectively extracts the attribute value from the multi-source product information.
We conduct extensive experiments with several baselines and show that MAVE is an effective dataset for the attribute value extraction task.
It also poses a very challenging task for zero-shot attribute extraction.