Sentence embeddings and their relation with sentence structures

Published: 01 Jan 2022, Last Modified: 13 Jun 2023
Abstract: Historically, models of human language have assumed that sentences have a symbolic structure and that this structure allows their meaning to be computed by composition. In recent years, deep learning models have successfully handled language tasks without relying on an explicit linguistic structure, thus challenging this fundamental assumption. This thesis seeks to clearly identify the role of structure in language modeling with deep learning methods. The dissertation specifically investigates how deep neural networks construct sentence embeddings, that is, vector-based semantic representations of sentences. First, we study the integration of linguistic biases into neural network architectures to constrain their composition order according to a traditional tree structure. Second, we relax these constraints to analyze the latent structures induced by the networks themselves. In both cases, we analyze the compositional properties of the models as well as the semantic properties of the resulting sentence embeddings. The thesis begins with an overview of the main methods used to represent the meaning of sentences, whether symbolically or with deep learning. The second part presents several experiments that introduce linguistic biases into neural network architectures for building sentence embeddings. The first chapter explicitly combines multiple sentence structures to build semantic representations. The second chapter jointly learns symbolic structures and vector representations. The third chapter introduces a formal framework for graph transformers. Finally, the fourth chapter studies the impact of structure on the generalization capacity of the models and compares their compositional abilities. The last part compares these models with larger-scale approaches and discusses current trends that favor larger models, which are more easily parallelized and trained on more data, at the expense of finer-grained modeling. The two chapters in this part report on the training of large natural language processing models and compare these approaches, both qualitatively and quantitatively, with those developed in the second part.