FlashProfile: a framework for synthesizing data profiles

2018 (modified: 29 May 2021) | Proc. ACM Program. Lang. 2018 | Readers: Everyone
Abstract: We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.
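To make the notion of a syntactic profile concrete, here is a minimal sketch in Python. It is not FlashProfile's synthesis algorithm: it is a naive baseline that maps each string to a coarse regex-like pattern (runs of digits, uppercase letters, and lowercase letters, with every other character kept as a literal) and groups strings that share a pattern. The names `classify`, `pattern_of`, and `naive_profile` are hypothetical and chosen only for illustration.

```python
import re
import string
from collections import defaultdict
from itertools import groupby

# Assumed character classes for this sketch; a real profiler would use a
# richer, possibly user-extensible, set of atomic patterns.
CLASSES = [
    (string.digits, "[0-9]+"),
    (string.ascii_uppercase, "[A-Z]+"),
    (string.ascii_lowercase, "[a-z]+"),
]


def classify(ch):
    """Return (regex fragment, is_class) for a single character."""
    for chars, regex in CLASSES:
        if ch in chars:
            return regex, True
    # Anything else (punctuation, spaces, non-ASCII) is kept as a literal.
    return re.escape(ch), False


def pattern_of(s):
    """Collapse runs of same-class characters into one regex fragment."""
    parts = []
    for (regex, is_class), run in groupby(s, key=classify):
        # A class run becomes one "+" fragment; a literal run is repeated
        # so that the pattern still matches the original string exactly.
        parts.append(regex if is_class else regex * len(list(run)))
    return "".join(parts)


def naive_profile(strings):
    """Group strings by their coarse pattern: pattern -> list of strings."""
    groups = defaultdict(list)
    for s in strings:
        groups[pattern_of(s)].append(s)
    return dict(groups)


if __name__ == "__main__":
    data = ["2018-05-29", "2021-01-02", "May 29", "Jun 3"]
    for pattern, members in naive_profile(data).items():
        print(pattern, members)
```

On the sample list in the `__main__` block, the two ISO-style dates fall into one group and the two month-name dates into another. This fixed-class grouping has no way to trade off the number of patterns against their specificity, which is the kind of decision a profile-learning approach has to make.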