Abstract: Choosing a suitable clothing collocation requires a sense of fashion. Yet modeling how people select items is challenging: the items in a collocation should be compatible, but each kind of fashion item has too many attributes to consider (e.g., color, texture, style). In this paper, we propose to learn a globally compatible outfit generation model from existing outfit images and text descriptions. Our approach relies on a bidirectional LSTM to model the relationship between different categories of fashion items and then predict each item from all the other items in the outfit. Meanwhile, embedded visual-semantic descriptions are exploited to guide the generation with attribute information. Combining these structures ensures that the items in the resulting outfit share a similar style and that no essential category has redundant or missing items. We demonstrate our method on an outfit dataset containing about 160,000 fashion items. Experimental results indicate that the proposed method acquires a good sense of fashion.
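The requirement that no essential category has redundant or missing items can be illustrated with a small check. This is only a sketch, not the paper's model: the category names in `ESSENTIAL_CATEGORIES`, the `(item_id, category)` representation, and the helper names `check_outfit` and `is_complete` are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical category names; the abstract does not list the essential categories.
ESSENTIAL_CATEGORIES = ("top", "bottom", "shoes")


def check_outfit(outfit, essential=ESSENTIAL_CATEGORIES):
    """Return (missing, redundant) essential categories for an outfit.

    `outfit` is a list of (item_id, category) pairs (an assumed representation).
    """
    counts = Counter(category for _, category in outfit)
    # An essential category with no item is missing.
    missing = [c for c in essential if counts[c] == 0]
    # An essential category with more than one item is redundant.
    redundant = [c for c in essential if counts[c] > 1]
    return missing, redundant


def is_complete(outfit, essential=ESSENTIAL_CATEGORIES):
    """True when every essential category appears exactly once."""
    missing, redundant = check_outfit(outfit, essential)
    return not missing and not redundant
```

Non-essential categories (for example an accessory) are ignored by the check, so an outfit may contain any number of them.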