Global ETD Search

1071	Addressing the brittleness of knowledge-based question-answering
Chaw, Shaw Yi. 02 April 2012.
Knowledge base systems are brittle when their users are unfamiliar with the content and structure of the knowledge base. Querying a knowledge base requires users to state their questions in precise and complete formal representations that relate the facts in the question to relevant terms and relations in the underlying knowledge base. This requirement places a heavy burden on users, who must become deeply familiar with the contents of the knowledge base, and it prevents novice users from using the knowledge base effectively for problem solving. As a result, the utility of knowledge base systems is often restricted to their developers. The goal of this work is to help users who may possess little domain expertise to use unfamiliar knowledge bases for problem solving. Our thesis is that the difficulty of using unfamiliar knowledge bases can be addressed by an approach that funnels natural questions, expressed in English, into formal representations appropriate for automated reasoning. The approach uses a simplified English controlled language, a domain-neutral ontology, a set of mechanisms to handle a handful of well-known question types, and a software component, called the Question Mediator, that identifies relevant information in the knowledge base for problem solving. With this approach, a user can work with a variety of unfamiliar knowledge bases by posing questions in simplified English and retrieving the relevant information in the knowledge base for problem solving. We studied the thesis in the context of a system called ASKME, which we evaluated on the task of answering college-level exam questions in biology, chemistry, and physics. The evaluation consists of successive experiments that test whether ASKME can help novice users employ unfamiliar knowledge bases for problem solving.
The initial experiment measures ASKME's performance under ideal conditions, where the knowledge base is built and used by the same knowledge engineers. Subsequent experiments measure its performance under increasingly realistic conditions. In the final experiment, the knowledge base is built independently by subject matter experts, and its users are a group of novices who are unfamiliar with it. Results from the evaluation show that ASKME works well on different knowledge bases and answers a broad range of questions posed by novice users in a variety of domains.
Keywords: Knowledge bases; Question answering; Problem solving; Natural language processing; Project Halo; Question Mediator; Domain neutral ontologies; Component Library; Controlled languages; AP exams; Machine reading
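The abstract describes funnelling simplified-English questions into formal queries through a small set of question-type mechanisms. The sketch below illustrates only that general idea; it is not ASKME and not the Question Mediator. The triple store `KB`, the two regular-expression templates, and the helpers `to_formal_query`, `is_a` and `answer` are all hypothetical.

```python
import re

# Hypothetical toy knowledge base of (subject, relation, object) triples.
KB = {
    ("mitochondrion", "function", "cellular respiration"),
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "is_a", "cell part"),
}

# Illustrative controlled-English templates, each mapped to a formal query.
TEMPLATES = [
    # "What is the <relation> of <entity>?" -> ("lookup", entity, relation)
    (re.compile(r"^what is the (\w+) of (?:a |an |the )?([\w ]+)\?$"),
     lambda m: ("lookup", m.group(2), m.group(1))),
    # "Is <entity> a kind of <category>?" -> ("is_a", entity, category)
    (re.compile(r"^is (?:a |an |the )?([\w ]+?) a kind of ([\w ]+)\?$"),
     lambda m: ("is_a", m.group(1), m.group(2))),
]


def to_formal_query(question):
    """Map a simplified-English question to a query tuple, or None if unsupported."""
    q = question.strip().lower()
    for pattern, build in TEMPLATES:
        match = pattern.match(q)
        if match:
            return build(match)
    return None


def is_a(entity, category):
    """Follow is_a links transitively through the toy KB."""
    frontier, seen = [entity], set()
    while frontier:
        node = frontier.pop()
        if node == category:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(o for s, r, o in KB if s == node and r == "is_a")
    return False


def answer(question):
    query = to_formal_query(question)
    if query is None:
        return None  # question falls outside the controlled language
    kind, entity, arg = query
    if kind == "lookup":
        return [o for s, r, o in KB if s == entity and r == arg]
    return is_a(entity, arg)
```

Questions that match no template return `None`. This is a crude picture of why a controlled language limits what users may ask in exchange for reliable formalization.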
1072	Real-time road traffic information detection through social media
Khatri, Chandra P. 21 September 2015.
This study proposes a mechanism to extract traffic-related information, such as congestion and incidents, from textual data on the internet. The current data source is Twitter; however, the same mechanism can be extended to any kind of text available on the internet. Because the data are extremely large, automated models are developed to stream, download, and mine them in real time. If a tweet contains traffic-related information, the models should be able to infer and extract it. To this end, artificial intelligence, machine learning, and natural language processing techniques are used. The models are designed to detect traffic congestion and traffic incidents from the Twitter stream at any location. Currently, data are collected only for the United States. Data were collected on 85 days (50 complete and 35 partial), randomly sampled over five months (September 2014 to February 2015), yielding 120,000 geo-tagged traffic-related tweets and six million geo-tagged non-traffic-related tweets. The classification models for detecting traffic congestion and incidents are trained on this dataset, which is also used for various spatial and temporal analyses. A mechanism is proposed to calculate the level of traffic congestion, safety, and traffic perception for U.S. cities. Congestion and safety rankings are obtained for various urban areas and statistically validated against existing, widely adopted rankings. Traffic perception captures people's attitudes and perceptions towards traffic. When visualized spatially and temporally, traffic-related tweets also show the same patterns as actual traffic flows in various urban areas.
At the city level, the flow of tweets closely resembles the flow of vehicles, indicating that traffic-related tweets are representative of traffic within cities. Taken together, these findings show that a significant amount of traffic-related information can be extracted from Twitter and other internet sources. Moreover, Twitter and these other sources are freely available and are not bound by spatial or temporal limitations: wherever there is a user, there is potential data.
Keywords: Real-time; Traffic; Traffic perception; Traffic incident; Traffic congestion; Social media; Twitter; Data mining; Traffic flow; Machine learning; Artificial intelligence; Natural language processing
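The abstract does not name the classifier used to separate traffic-related tweets from the rest. A minimal sketch, assuming a multinomial Naive Bayes model with add-one smoothing, shows what such a text classifier can look like. The training tweets below are invented for illustration and are not from the thesis dataset. The class prior matters in practice: with roughly 120,000 traffic tweets against six million others, the "other" class dominates unless the prior is handled deliberately.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_posterior(self, text, label):
        """Unnormalised log P(label | text)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # class prior
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # skip words never seen in training
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical labelled tweets (not from the thesis dataset).
TRAIN = [
    ("Huge traffic jam on I-35 northbound, avoid the area", "traffic"),
    ("Accident blocking two lanes on the highway", "traffic"),
    ("Stuck in traffic again, congestion everywhere downtown", "traffic"),
    ("Great coffee this morning with friends", "other"),
    ("Watching the game tonight, go team", "other"),
    ("New album dropped today and it is amazing", "other"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])
```

For example, `model.predict("Crash on the highway causing a jam")` returns `"traffic"`.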
1073	Σχεδιασμός και ανάπτυξη πρότυπου συστήματος μορφολογικής ανάλυσης ονομάτων της Αρχαίας Ελληνικής γλώσσας / Design and development of a model system of morphological parsing of the nouns of the Ancient Greek language
Σώρρα, Μαρία. 13 January 2015.
Natural Language Processing (NLP) is the scientific field that combines linguistic knowledge with computer science. It makes it possible to process natural languages with computational models and helps users perform numerous tasks. The rapid growth of the Web and of its user base creates a need for further development of Language Technology. A natural language of particular and global interest is Ancient Greek, whose study aims primarily at acquiring the linguistic and cultural knowledge that demonstrably laid the foundations of modern civilization. Interest in Ancient Greek is not only linguistic but also literary, philosophical and educational, concerning both teaching and learning. Existing approaches are the result of classical research methods, theoretical and empirical, carried out by specialists, and they lack automation. Any attempt at computational processing of Ancient Greek must overcome issues arising from the complex nature of the language itself: its script, structure, vocabulary and etymology.
This M.Sc. thesis is a first attempt to develop a system for the morphological parsing of Ancient Greek nouns. Nouns were chosen because they form a small part of the language, their declension rules have few exceptions, they show no allomorphy, and they occur very frequently in Ancient Greek texts. The morphological parser can serve as infrastructure for further research towards a complete system covering all parts of speech and all levels of analysis. Morphological parsing is the problem of recognizing that a word decomposes into morphemes and producing a structured representation of that decomposition. The process presupposes the recognition of words and phrases (data pre-processing), followed by supplying information about the words, that is, the construction of the morphological parser itself. One approach to building it is to use a dictionary together with the appropriate grammatical rules. Accordingly, software was designed and built that incorporates the necessary grammatical rules, takes as input the first form of a noun (its nominative), and outputs its declension class together with its remaining inflected forms. This basic application can later be extended to other parts of speech, with the aim of enabling the fullest possible digital processing of the language.
The first stage of the work was a study of the relevant literature on Language Technology and of the grammatical rules for Ancient Greek nouns. Next came the development of the software, which includes not only the declension rules but also the corresponding accentuation rules of the already complex polytonic system of Ancient Greek. The following stage was the collection of a large volume of data from Ancient Greek texts, including the automatic extraction of a large number of texts from the website of the Perseus digital library. The final stage was the development of an interface that makes the morphological parser more user-friendly.
Keywords: Nouns; Ancient Greek; Language technology; Morphological parsing; Grammar; Natural language processing; Python. Classification: 006.35
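The abstract describes a parser that takes a noun's nominative and returns its declension class and remaining forms, built from a dictionary plus grammatical rules. The sketch below is a deliberately small, hypothetical rule table covering only two second-declension patterns. It does not model the accent shifts the thesis handles, so it is correct only for nouns whose accent does not move (such as λόγος and ἔργον). It also keys on the ending alone, which is why a real parser needs a lexicon: third-declension neuters in -ος, for instance, would be misclassified.

```python
# Case/number slots of the generated paradigm, in traditional order.
SLOTS = ["nom_sg", "gen_sg", "dat_sg", "acc_sg", "voc_sg",
         "nom_pl", "gen_pl", "dat_pl", "acc_pl", "voc_pl"]

# Hypothetical rule table: nominative ending -> (declension class, case endings).
# Only two second-declension patterns are covered; accent shifts are NOT modelled,
# so output is correct only for nouns whose accent stays put (e.g. λόγος, ἔργον).
RULES = {
    "ος": ("second declension, -ος",
           ["ος", "ου", "ῳ", "ον", "ε", "οι", "ων", "οις", "ους", "οι"]),
    "ον": ("second declension, neuter -ον",
           ["ον", "ου", "ῳ", "ον", "ον", "α", "ων", "οις", "α", "α"]),
}


def parse_noun(nominative):
    """Return (declension class, {slot: form}) for a nominative singular noun."""
    for ending, (label, endings) in RULES.items():
        if nominative.endswith(ending):
            stem = nominative[: -len(ending)]
            return label, {slot: stem + e for slot, e in zip(SLOTS, endings)}
    raise ValueError(f"no declension rule matches {nominative!r}")
```

Nouns outside the rule table raise `ValueError` instead of producing a guessed paradigm.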
1074	Name Networks: A Content-Based Method for Automated Discovery of Social Networks to Study Collaborative Learning
Gruzd, Anatoliy. January 2009.
To gain greater insight into the operation of Library and Information Science (LIS) e-learning communities, this work applies automated text mining techniques to text-based communication in order to identify, describe and evaluate the social networks underlying such communities. The main thrust of the study is to use computers to discover automatically the social ties that form between students, solely from their threaded discussions. Currently, one of the most common but time-consuming methods for discovering social ties between people is to survey them about their perceived ties. However, such survey data are difficult to collect because of the high cost of data collection and the sensitive nature of the questions that must be asked. To overcome these limitations, this work presents a new, content-based method, called name networks, for automatically discovering social networks from threaded discussions. When fully developed, name networks can serve as a real-time diagnostic tool that helps educators evaluate and improve teaching models and identify students who might need additional help, as well as students who may provide such help to others.
Keywords: Library Science; Information Extraction; Virtual Communities; Sociology; Communications; Computational Linguistics; Learning Science; Internet; Natural Language Processing; Quantitative Research
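The core idea of name networks is inferring a tie when one participant's post names another participant. The sketch below is a deliberately simplified illustration of that idea and not the thesis method. It assumes a known roster of participants who are referred to by exactly those names, so nicknames, misspellings and named-entity recognition are ignored. The function `name_network` and the sample posts are hypothetical.

```python
import re
from collections import Counter


def name_network(posts, roster):
    """Count directed ties author -> participant named in the author's post.

    posts:  list of (author, text) pairs from a threaded discussion
    roster: names of known participants (assumed to be used as-is in posts)
    """
    patterns = {
        name: re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE) for name in roster
    }
    edges = Counter()
    for author, text in posts:
        for name, pattern in patterns.items():
            if name != author and pattern.search(text):  # ignore self-mentions
                edges[(author, name)] += 1
    return edges


# Hypothetical discussion thread.
POSTS = [
    ("Ann", "Thanks Bob, your summary of the reading helped."),
    ("Bob", "Good point, Ann. Carla raised the same issue last week."),
    ("Carla", "I agree with bob here."),
    ("Ann", "Bob, could you share your notes?"),
]
edges = name_network(POSTS, ["Ann", "Bob", "Carla"])
```

Edge weights count how often one participant names another, which a network-analysis tool can then treat as tie strength.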
1075	Αυτόματη εξαγωγή λεξικής - σημασιολογικής γνώσης από ηλεκτρονικά σώματα κειμένων με χρήση ελαχίστων πόρων / Automatic extraction of lexico-semantic knowledge from electronic text corpora using minimal resources
Θανόπουλος, Αριστομένης. 25 June 2007.
This dissertation studies methods for automatically extracting collocations and lexico-semantic similarities between words from large text corpora. We follow an approach based on minimal linguistic resources in order to achieve unrestricted portability across languages and thematic domains. To evaluate the proposed methods, we propose, assess and apply evaluation methodologies based on English gold-standard lexical resources such as WordNet. For collocation extraction, we propose novel measures for identifying statistically significant bigrams and, more generally, n-grams, which perform well in evaluation. For extracting lexico-semantic similarities, we first follow a distributional, window-based approach. We study the contextual scope, the filtering of lexical co-occurrences and the performance of similarity measures; we propose incorporating the number of common parameters into these measures, exploiting function words, and a method for eliminating systematic errors. Moreover, we propose a novel way of exploiting similarities between word sequences, which are common in thematically focused and technical texts, based on the cross-correlation of word sequences. We refine a method for extracting word similarities from coordinations, and we propose a method for amalgamating lexico-semantic similarity databases extracted via different principles and methods. Finally, the extracted similarity knowledge is transformed into soft hierarchical semantic clusters using a symbolic hierarchical clustering method, and it is successfully incorporated into a machine-learning-based dialogue system, where it improves recognition of the user's plan by estimating the semantic role of unknown words.
Keywords: Lexical semantics; Automatic methods; Statistical methods; Natural language processing; Semantic similarity; Collocations. Classification: 410.285
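The two extraction tasks in this abstract can be illustrated with standard baselines. The sketch below ranks bigrams by pointwise mutual information (PMI) and compares words with a window-based context-vector cosine. These are generic, well-known techniques and not the novel measures proposed in the dissertation. The toy corpus `TOKENS` is invented, and `min_count` and `window` are assumed parameter choices.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # PMI overrates very rare pairs
            continue
        p_xy = count / (n - 1)
        p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def context_vectors(tokens, window=2):
    """Count co-occurring words within +/- `window` positions of each word."""
    vectors = {}
    for i, word in enumerate(tokens):
        vec = vectors.setdefault(word, Counter())
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vec[tokens[j]] += 1
    return vectors


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


TOKENS = ("new york is big and new york is old and "
          "the wine is red and the beer is cold").split()
```

On this toy corpus "new york" ranks first, but the function-word pair "and the" ranks second. This illustrates why function words need deliberate treatment, an issue the abstract also raises.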
1076	Αυτόματη επιλογή σημασιολογικά συγγενών όρων για την επαναδιατύπωση των ερωτημάτων σε μηχανές αναζήτησης πληροφορίας / Automatic selection of semantic related terms for reformulating a query into a search engine
Κοζανίδης, Ελευθέριος. 14 September 2007.
Query refinement is the process of offering Web information seekers alternative wordings for expressing their information needs. Although alternative query formulations can improve retrieval results, Web users make only limited use of them, because refined query terms convey no explicit information about either the degree or the type of their correlation with the user-issued query, nor about how they relate to the user's information interests. Traditionally, alternative query formulations are determined solely by the semantic relation between the candidate terms and the original query terms, without considering the search intent behind the user's query. In this work, we introduce a novel query refinement technique that uses a lexical ontology to identify alternative query formulations that both describe the object of the user's search and relate to the queries the user has submitted. The most innovative feature of the technique is the visualization of the alternative query wordings as a hierarchically structured graph. This representation conveys explicit information about the semantic relation between the refined query terms and the terms the user originally issued, and it lets users select which candidate terms take part in the refinement process, building the new query interactively. The experimental results are very encouraging and indicate that the method can significantly help users select queries when retrieving information from the Web.
Keywords: Query expansion; Query refinement; Semantic networks; Information retrieval; Natural language processing; Sense disambiguation. Classification: 025.524
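A minimal sketch of ontology-driven refinement, assuming a tiny hand-made lexical ontology in place of the (unnamed) one used in the thesis. The code expands a query term into a hierarchy of candidate terms labelled by relation and renders it as an indented tree, a text stand-in for the graph visualization. It then builds the refined query from the candidates the user selects. `ONTOLOGY`, `candidate_tree`, `render` and `refine` are hypothetical names.

```python
# Hypothetical miniature lexical ontology: term -> {relation: [related terms]}.
ONTOLOGY = {
    "jaguar": {"synonym": ["panther"], "hypernym": ["big cat"],
               "related": ["car", "rainforest"]},
    "big cat": {"hypernym": ["feline"]},
}


def candidate_tree(term, depth=2):
    """Nested dict relation -> {candidate: subtree}, explored up to `depth` levels."""
    if depth == 0 or term not in ONTOLOGY:
        return {}
    return {rel: {c: candidate_tree(c, depth - 1) for c in cands}
            for rel, cands in ONTOLOGY[term].items()}


def render(tree, indent=0):
    """Flatten the tree into indented lines showing how each candidate was reached."""
    lines = []
    for rel, cands in tree.items():
        for cand, sub in cands.items():
            lines.append("  " * indent + f"{rel}: {cand}")
            lines.extend(render(sub, indent + 1))
    return lines


def refine(query_terms, selected):
    """Append the user's selected candidates to the original query terms."""
    return " ".join(query_terms + [t for t in selected if t not in query_terms])
```

Showing the relation label next to each candidate reflects the abstract's point: users see *how* a suggested term relates to their query before they accept it.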
1077	Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus
Lyall-Wilson, Jennifer Rae. January 2013.
This dissertation explores an approach to automatic concept-based query expansion that aims to improve search engine performance. It uses a network-based approach to identify the concept represented by the user's query. The approach is founded on the idea that a collection-specific association thesaurus can be used to build a reasonable representation of all the concepts within the document collection and of the relationships between them. Because this representation is generated from the association thesaurus, a mapping exists between the representation of each concept and the terms used to describe it. The research applies to search engines designed for an individual website whose content focuses on a specific conceptual domain. Both the document collection and the subject content must therefore be well-bounded, which makes it possible to use techniques that are not currently feasible for general-purpose search engines covering the entire web.
Keywords: automatic query expansion; conceptual network; information retrieval; Lucene; search engine; Natural Language Processing (NLP); association thesaurus
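One way to picture "term relational pathways" over a collection-specific association thesaurus is a small graph sketch. It links terms whose document-level co-occurrence is strong, then expands a query with the terms on a shortest path between the query terms. Everything here is an assumption for illustration: the Dice coefficient, the 0.5 threshold, the breadth-first search and the toy `DOCS` are not taken from the dissertation, which also builds on Lucene rather than plain Python.

```python
from collections import deque
from itertools import combinations


def association_thesaurus(docs, threshold=0.5):
    """Link terms whose document-level Dice coefficient meets `threshold`."""
    doc_sets = [set(d.lower().split()) for d in docs]
    df, co = {}, {}
    for terms in doc_sets:
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    graph = {t: set() for t in df}
    for (a, b), n_ab in co.items():
        if 2 * n_ab / (df[a] + df[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pathway(graph, start, goal):
    """Breadth-first search for a shortest term pathway between two query terms."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):  # sorted for deterministic ties
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def expand_query(graph, terms):
    """Add every term lying on a pathway between each pair of query terms."""
    expanded = list(terms)
    for a, b in combinations(terms, 2):
        for t in pathway(graph, a, b) or []:
            if t not in expanded:
                expanded.append(t)
    return expanded


# Hypothetical collection about a single, well-bounded domain.
DOCS = ["insulin pancreas hormone", "pancreas hormone gland",
        "gland thyroid hormone", "thyroid iodine"]
graph = association_thesaurus(DOCS)
```

For the query `["insulin", "iodine"]`, the two terms never co-occur, but the pathway insulin, hormone, gland, thyroid, iodine supplies the connecting concepts. Relying on a bounded collection keeps building this graph tractable, which matches the abstract's focus on single-domain websites.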
1078	JSreal : un réalisateur de texte pour la programmation web / JSreal: a text realiser for web programming
Daoust, Nicolas. 09 1900.
Natural language generation, a branch of artificial intelligence, studies the development of systems that produce text for different applications, for example the textual description of massive datasets or the automation of routine writing tasks. A text generation project consists of several major steps: determining the content to be expressed, organizing it into structures such as paragraphs and sentences, and producing human-readable character strings. This last step is called realisation, and it is the step this thesis addresses. The web is a constantly growing platform whose increasingly dynamic content often lends itself well to automation by a realiser. However, existing realisers are not designed with the web in mind, and operating them requires considerable knowledge, which complicates their use. This master's thesis presents JSreal, a realiser designed specifically for the web that is easy to learn and use. JSreal lets its user build a variety of French expressions and sentences that respect the rules of grammar and syntax, add HTML tags to them, and integrate them easily into web pages.
Website accompanying the thesis: http://daou.st/JSreal
Keywords: Automatic text generation; Text realisation; Natural language processing; Natural language generation
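JSreal itself is a JavaScript library, and the sketch below does not reproduce its API. It is a hypothetical Python illustration of what realisation involves for French. Determiners agree with gender and number and elide before a vowel, regular -er verbs agree with the subject in the present tense, and the sentence is capitalized, punctuated, and optionally wrapped in an HTML tag. Irregular verbs, irregular plurals and *h aspiré* are deliberately out of scope.

```python
from dataclasses import dataclass

VOWELS = "aeiouyéèêàâîôû"


@dataclass
class NP:
    """Hypothetical noun phrase with a definite determiner."""
    noun: str
    gender: str  # "m" or "f"
    plural: bool = False

    def realise(self):
        word = self.noun
        if self.plural and not word.endswith(("s", "x", "z")):
            word += "s"  # regular plural only
        if self.plural:
            return f"les {word}"
        if word[0] in VOWELS:
            return f"l'{word}"  # elision before a vowel
        det = "le" if self.gender == "m" else "la"
        return f"{det} {word}"


def conjugate_er(infinitive, plural):
    """Present tense, third person, regular -er verbs only."""
    stem = infinitive[:-2]
    return stem + ("ent" if plural else "e")


def realise_sentence(subject, verb, obj, tag=None):
    """Subject-verb-object sentence with agreement, capital, full stop, optional HTML tag."""
    text = f"{subject.realise()} {conjugate_er(verb, subject.plural)} {obj.realise()}."
    text = text[0].upper() + text[1:]
    return f"<{tag}>{text}</{tag}>" if tag else text
```

For example, `realise_sentence(NP("chat", "m", plural=True), "manger", NP("souris", "f"))` gives `"Les chats mangent la souris."`. Wrapping the output in a tag mirrors the kind of HTML integration the abstract describes.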
1079	Improvements to the complex question answering models
Imam, Md. Kaisar. January 2011.
In recent years the amount of information on the web has increased dramatically. As a result, researchers face the challenge of finding effective ways to query and extract meaning from these large repositories. Standard document search engines address the problem by presenting users with a ranked list of relevant documents. In most cases this is not enough, because end users must read through entire documents to find the answers they are looking for. Question answering, the retrieval of answers to natural language questions from a document collection, aims to remove this burden from the end user by providing direct access to relevant information. This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. We therefore treat complex question answering as query-focused multi-document summarization. To improve complex question answering, we experimented with both empirical and machine learning approaches. For each sentence in the document collection we extracted several types of features (lexical, lexical semantic, syntactic and semantic) in order to measure its relevance to the user query. We formulated complex question answering within a reinforcement learning framework, which, to the best of our knowledge, had not previously been applied to this task and which can improve itself by fine-tuning the feature weights from user feedback. We also used unsupervised machine learning techniques (random walk, manifold ranking) and augmented them with semantic and syntactic information to improve their performance.
Finally, we experimented with question decomposition: instead of answering the complex question directly, we decomposed it into a set of simple questions and synthesized their answers to obtain the final result.
Physical description: x, 128 leaves : ill. ; 29 cm
Subjects: Question-answering systems -- Research; Database searching; Querying (Computer science) -- Research; Information retrieval -- Research; Dissertations, Academic
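Query-focused summarization by random walk can be sketched in the spirit of query-biased LexRank. Sentences are nodes, edges are weighted by bag-of-words cosine similarity, and the walker periodically jumps back to sentences that resemble the query. The exact formulation in the thesis is not reproduced here. The `query_weight` of 0.7, the fixed iteration count, the absence of IDF weighting and the sample sentences are all assumptions.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words term counts (lower-cased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_biased_rank(sentences, query, query_weight=0.7, iters=50):
    """Rank sentence indices by the stationary score of a query-biased random walk.

    At each step the walker jumps, with probability `query_weight`, to a sentence
    chosen in proportion to its similarity to the query; otherwise it moves to a
    neighbouring sentence in proportion to sentence-sentence similarity.
    """
    vecs = [bow(s) for s in sentences]
    q = bow(query)
    n = len(vecs)
    rel = [cosine(v, q) for v in vecs]
    total = sum(rel)
    jump = [r / total for r in rel] if total else [1.0 / n] * n
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):  # power iteration
        scores = [
            query_weight * jump[j]
            + (1 - query_weight) * sum(scores[i] * sim[i][j] / row_sums[i]
                                       for i in range(n) if row_sums[i])
            for j in range(n)
        ]
    return sorted(range(n), key=lambda j: scores[j], reverse=True)


# Hypothetical document sentences.
SENTENCES = [
    "Insulin is produced by the pancreas.",
    "The pancreas also produces digestive enzymes.",
    "The stock market fell sharply today.",
]
```

With the query "Where is insulin produced?", the second sentence shares no query words. It still outranks the third because it is strongly connected to the top sentence, which is the effect the random walk is meant to capture.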
1080	Grapheme-to-phoneme conversion and its application to transliteration
Jiampojamarn, Sittichai. Unknown Date.
No description available.
Keywords: Grapheme to phoneme; Transliteration; String transduction; Natural language processing; NLP; Computational linguistics; Alignments; Online large margin training; Text to speech; Speech synthesis
