
Transformer-Encoders for Mathematical Answer Retrieval

Every day, an overwhelming volume of new information is produced. Information Retrieval systems play a crucial role in managing this deluge and help users find relevant information. At the same time, the volume of scientific literature is growing rapidly, which calls for powerful retrieval tools in this domain.
Current methods of Information Retrieval employ language models based on the Transformer architecture, referred to in this work as Transformer-Encoder models. These models are generally trained in two phases: first, the model is pre-trained on general natural language data; then it is fine-tuned, which adapts it to a specific task such as classification or retrieval. Since Transformer-Encoder models are pre-trained on general natural language corpora, they perform well on such documents. Scientific documents, however, exhibit different features: their language is characterized by mathematical notation, such as formulae. Applying Transformer-Encoder models to these documents results in low retrieval performance (effectiveness). A possible solution is to adapt the model to the new domain by further pre-training on a data set from that domain. This process is called Domain-Adaptive Pre-Training and has been applied successfully in other domains.
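To make the Domain-Adaptive Pre-Training step concrete, the following is a minimal sketch of continued masked-language-model pre-training on a mathematical corpus using the Hugging Face Transformers library. The base checkpoint, corpus file, and hyperparameters are illustrative assumptions and do not reflect the exact configuration used in the thesis.

```python
# Minimal sketch of Domain-Adaptive Pre-Training via masked language modeling.
# Checkpoint name, corpus file, and hyperparameters are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus of mathematical documents (formulae kept as LaTeX text).
dataset = load_dataset("text", data_files={"train": "math_corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="dapt-math-bert",
                         per_device_train_batch_size=16,
                         num_train_epochs=1, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```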
Mathematical Answer Retrieval involves finding relevant answers in a large corpus for mathematical questions. Both the question and the answers can contain mathematical notation and natural language. To retrieve relevant answers, the model must 'understand' the problem specified in the question and the solution presented in the answers. This property makes Mathematical Answer Retrieval well suited for evaluating whether Transformer-Encoder models can model mathematical and natural language in conjunction. Transformer-Encoder models have shown low performance on this task compared to traditional retrieval approaches, which is surprising given their success in other domains.
This thesis, therefore, deals with the domain adaptation of Transformer-Encoder models for the domain of mathematical documents and with the development of a retrieval approach that uses these models for Mathematical Answer Retrieval. We start by presenting a retrieval pipeline using the Cross-Encoder setup, a specific way of applying Transformer-Encoder models to retrieval in which the question and a candidate answer are encoded jointly. Then, we enhance the retrieval pipeline by adapting the pre-training schema of the Transformer-Encoder models to capture mathematical language better. Our evaluation demonstrates the strengths of the Cross-Encoder setup using our domain-adapted Transformer-Encoder models.
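The sketch below illustrates the general Cross-Encoder idea: each question-answer pair is passed jointly through the encoder, and a classification head produces a relevance score used for ranking. The checkpoint name, example question, and candidate answers are assumptions for illustration; this is not the exact pipeline described in the thesis.

```python
# Minimal sketch of a Cross-Encoder re-ranking step.
# Checkpoint name and example texts are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dapt-math-bert")  # hypothetical domain-adapted checkpoint
model = AutoModelForSequenceClassification.from_pretrained("dapt-math-bert", num_labels=1)
model.eval()

question = "How do I solve the quadratic equation $x^2 - 5x + 6 = 0$?"
candidates = ["Factor it as $(x-2)(x-3)=0$, so $x=2$ or $x=3$.",
              "A derivative measures the rate of change of a function."]

# Each question-answer pair is encoded jointly, so attention spans both texts.
inputs = tokenizer([question] * len(candidates), candidates,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair

ranked = sorted(zip(candidates, scores.tolist()), key=lambda p: p[1], reverse=True)
for answer, score in ranked:
    print(f"{score:.3f}  {answer}")
```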
In addition to these contributions, we also present an analysis framework to evaluate what knowledge of mathematics the models have learned. This framework is used to study Transformer-Encoder models before and after fine-tuning for mathematical retrieval. We show that Transformer-Encoder models learn structural features of mathematical formulae during pre-training but rely more on superficial information for Mathematical Answer Retrieval. These analyses also enable us to improve our fine-tuning setup further. In conclusion, our findings suggest that Transformer-Encoder models are a suitable and powerful approach for Mathematical Answer Retrieval.
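The abstract does not specify how the analysis framework is realized. One common way to test what an encoder has learned is a probing classifier trained on frozen representations; the sketch below illustrates that general idea for a hypothetical structural property of formulae, and is not the framework described in the thesis.

```python
# Hedged sketch of a probing-style analysis: a linear classifier on frozen encoder
# representations tests whether a structural property of formulae is linearly
# decodable. Checkpoint, examples, and the probing task are illustrative assumptions.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # the encoder stays frozen; only the probe is trained

# Hypothetical probing task: does the formula contain a fraction? (label 1 = yes)
texts = ["$\\frac{a}{b} + c$", "$a + b + c$", "$\\frac{1}{x^2}$", "$x^2 + y^2$"]
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

with torch.no_grad():
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    cls = encoder(**enc).last_hidden_state[:, 0]  # [CLS] representation per formula

probe = nn.Linear(cls.size(-1), 1)
optim = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(100):
    optim.zero_grad()
    loss = loss_fn(probe(cls).squeeze(-1), labels)
    loss.backward()
    optim.step()

accuracy = ((probe(cls).squeeze(-1) > 0).float() == labels).float().mean().item()
print("probe accuracy:", accuracy)
```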

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:91575
Date: 27 May 2024
Creators: Reusch, Anja
Contributors: Lehner, Wolfgang; Weikum, Gerhard; Technische Universität Dresden
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
