Abstract: Since the Turing Test was proposed in the 1950s, humanity has been exploring artificial intelligence with the aim of bridging the interaction gap between machines and human language. This exploration seeks to enable machines to comprehend how humans acquire, produce, and understand language, as well as the relationship between linguistic expression and the world. This paper examines the basic principles of natural language representation, the formalization of natural language, and the modeling methods of language models. It analyzes, summarizes, and compares the mainstream technologies and methods, including vector space-based, topic model-based, graph-based, and neural network-based approaches. Finally, it discusses and predicts the development trends and directions for improving the understanding ability of language models.