Global ETD Search

Développement du système MathNat pour la formalisation automatique des textes mathématiques / Developing System MathNat for Automatic Formalization of Mathematical texts

There is a wide gap between the language of mathematics and its formalized versions. The term "language of mathematics" or "mathematical language" refers to the prose that mathematicians use every day in textbooks and publications. It consists mainly of natural language, symbolic expressions and specific notations. It is flexible, structured and semantically well understood by mathematicians.

However, it is very difficult to formalize automatically. The main reasons are: the complex and rich linguistic features of natural language and its inherent ambiguity; the unusual intermixing of natural language with symbolic notation, which is itself ambiguous; and reasoning gaps, which are hard to fill using current state-of-the-art theorem provers (both automated and interactive).

One way to work around this problem is to abandon the use of the language of mathematics. In current state-of-the-art theorem proving, mathematics is therefore formalized manually in very precise, specific and well-defined logical systems. The languages supported by these systems impose strong restrictions compared with natural language. In general they resemble programming languages, with unambiguous syntax and a limited number of possible syntactic constructions.

This enterprise divides the world of mathematics into two groups. The first consists of the vast majority of mathematicians, who rely on the language of mathematics only. The second consists of a minority of mathematicians who use formal systems such as theorem provers (mostly interactive ones) in addition to the language of mathematics. This second community is itself subdivided into as many groups as there are proof assistants, so proofs are no longer intelligible to all mathematicians.

To bridge the gap between the language of mathematics and its formalized versions, we may ask the following gigantic question: can we build a program that understands the language of mathematics used by mathematicians and translates it into a formal language so that its correctness can be verified mechanically?

This problem divides naturally into two sub-problems, both very hard:
1. Parsing mathematical texts (mainly proofs) and translating the resulting parse trees into a formal language after resolving linguistic issues.
2. Verifying this formal version of mathematics.

The project MathNat (Mathematics in controlled Natural language) aims to be a first step towards solving this problem, focusing mainly on the first sub-problem.

First, we develop a Controlled Language for Mathematics (CLM), a precisely defined subset of English with a restricted grammar and dictionary. To make CLM natural and expressive, we support some rich linguistic features such as anaphoric pronouns and references, rephrasing of a sentence in multiple ways, and the proper handling of distributive and collective readings.

Second, we automatically translate CLM into MathAbs (Mathematical Abstract language), a formal description language that is independent of any particular logical system, in the hope of making MathNat accessible to any proof-checking system. MathAbs represents the semantics of a text while preserving its structure and the flow of its reasoning, and is designed as an intermediate language between CLM and a formal logical system in which proofs can be verified. The translation from CLM to MathAbs gives CLM a precise semantics. We consider this work to be notable progress already, even though we are still far from being able to formally verify every proof in the MathAbs generated this way.

For the second sub-problem, we carried out a small experiment: we currently translate MathAbs into a list of first-order formulas whose validity guarantees the correctness of the proof. We then tried to check these formulas with automated theorem provers, validating a few examples in this way.

http://www.theses.fr/2012GRENM001/document

Formal systems

Proof verification

Computational linguistics

Language technology

Controlled languages

The language of mathematics

Identifier	oai:union.ndltd.org:theses.fr/2012GRENM001
Date	18 January 2012
Creators	Muhammad, Humayoun
Contributors	Grenoble, Raffalli, Christophe, Ranta, Aarne
Source Sets	Dépôt national des thèses électroniques françaises
Language	French
Detected Language	French
Type	Electronic Thesis or Dissertation, Text
