A Database-based Rather Than a Language Model-based Natural Language Processing Method

TL;DR: We propose a new paradigm for NLP in which we redefine natural language and set a new object of research. The proposed method is closer to the way humans process information.
Abstract: Pre-trained language models for NLP tasks take natural language as the direct modeling object. However, we believe that natural language is essentially a way of encoding information (knowledge). Therefore, the object of study should be the information encoded in language, together with the organizational and compositional structure of that information. Based on this understanding, we propose a database-based NLP method that changes the modeling object from natural language to the information encoded in natural language. On this basis, 1) the sentence generation task is transformed into read operations on the database, subject to a set of sentence encoding rules; 2) the sentence understanding task is transformed into sentence decoding rules and a series of Boolean operations on the database; 3) the learning task is achieved by write operations. Our method is closer to how the human brain processes information and has excellent interpretability and scalability.
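
To make the three-way mapping above concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the class `ToyKnowledgeBase`, the (subject, relation, object) triple schema, and the fixed three-word "encoding rule" are all hypothetical simplifications chosen only to show how learning could be a write, generation a read followed by encoding, and understanding a decode followed by a Boolean check.

```python
from typing import Optional, Set, Tuple

# Hypothetical schema: each stored piece of information is a triple.
Fact = Tuple[str, str, str]  # (subject, relation, object)


class ToyKnowledgeBase:
    """Toy in-memory store; an assumption for illustration only."""

    def __init__(self) -> None:
        self.facts: Set[Fact] = set()

    def learn(self, subject: str, relation: str, obj: str) -> None:
        # Learning as a write operation on the database.
        self.facts.add((subject, relation, obj))

    def generate(self, subject: str, relation: str) -> Optional[str]:
        # Generation as a read operation, followed by a trivial
        # encoding rule: "<subject> <relation> <object>."
        for s, r, o in sorted(self.facts):
            if s == subject and r == relation:
                return f"{s} {r} {o}."
        return None

    def understand(self, sentence: str) -> bool:
        # Understanding as a decoding rule (split into three words)
        # followed by a Boolean membership test on the database.
        words = sentence.rstrip(".").split()
        if len(words) != 3:
            return False
        return (words[0], words[1], words[2]) in self.facts


kb = ToyKnowledgeBase()
kb.learn("cat", "eats", "fish")
generated = kb.generate("cat", "eats")
known = kb.understand("cat eats fish.")
unknown = kb.understand("cat eats grass.")
```

In this sketch every answer can be traced back to a stored triple, which is one simple way to read the interpretability claim; real encoding and decoding rules would of course need to handle far richer sentence structure than a three-word pattern.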
