Global ETD Search

21	Pattern Acquisition Methods for Information Extraction Systems Marcińczuk, Michał January 2007 This master's thesis addresses Event Recognition in the reports of Polish stockholders. Event Recognition is one of the Information Extraction tasks. The thesis compares two approaches to Event Recognition: manual and automatic. The manual approach uses regular expressions, which also serve as the baseline for the automatic approach. In the automatic approach, three Machine Learning methods are applied. The initial experiment compares Decision Trees, naive Bayes, and Memory Based Learning. A modification of the standard Memory Based Learning method is presented whose goal is to create a classifier that uses only positive examples in the classification task. The performance of the modified method is compared to the baseline and to the other Machine Learning methods. The initial experiment uses one type of annotation, the meeting date. The final experiment uses three types of annotation: the meeting time, the meeting date, and the meeting place. The experiments show that classification can be performed using only one class of instances with the same level of performance. Natural Language Processing Information Extraction Patterns Acquisition Linguistic Patterns Memory Based Learning Event Recognition Computer Sciences Datavetenskap (datalogi) Software Engineering Programvaruteknik
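The abstract above does not say how the modified Memory Based Learning method decides with positive examples only. One plausible reading, given here purely as an illustrative sketch, is a nearest-neighbour memory that accepts an instance when it lies close enough to some stored positive example. The class name `PositiveOnlyMemory`, the overlap distance, the threshold, and the toy features are all assumptions, not details from the thesis.

```python
def overlap_distance(a, b):
    """Count the feature positions where two equal-length tuples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class PositiveOnlyMemory:
    """Hypothetical one-class memory: stores positives, accepts near matches."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed acceptance radius, not from the thesis
        self.memory = []

    def fit(self, positives):
        # "Training" only stores the positive examples verbatim.
        self.memory = list(positives)
        return self

    def predict(self, instance):
        # Accept when the nearest stored positive is within the threshold.
        if not self.memory:
            return False
        nearest = min(overlap_distance(instance, m) for m in self.memory)
        return nearest <= self.threshold


# Toy features (assumed): (previous token, token shape, next token)
positives = [
    ("on", "dd.dd.dddd", "at"),
    ("dated", "dd.dd.dddd", "the"),
    ("on", "dd-dd-dddd", "at"),
]
clf = PositiveOnlyMemory(threshold=1).fit(positives)
is_date = clf.predict(("on", "dd.dd.dddd", "in"))
is_not_date = clf.predict(("shares", "ddd", "each"))
```

The threshold takes over the role that negative examples play in a two-class classifier, so in practice it would have to be tuned on held-out data.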
22	Automatsko određivanje vrsta riječi u morfološki složenom jeziku / Automatic parts of speech determination in a morphologically complex language Dimitrijević Strahinja 24 July 2015 The study tested the extent to which our cognitive system can rely on phonotactic information, i.e., possible/permissible combinations of phonemes/graphemes, in the automatic processing and production of words in languages with rich inflectional morphology. To answer this question, three studies were conducted. In the first study, support vector machines (SVM) were used to discriminate parts of speech (PoS) with more than one possible meaning (i.e., ambiguous PoS). In the second study, inflected word forms were produced with memory based learning (MBL). Based on the results of the second study, a behavioral experiment was conducted as the third study to test the cognitive plausibility of the MBL performance. The discrimination of ambiguous PoS used permissible sequences of two and three characters/sounds (i.e., bigrams and trigrams), whose frequency of occurrence within individual grammatical types was calculated depending on their position in a word: at the beginning, at the end, inside the word, and irrespective of position. The maximum accuracy was approximately 95%, obtained with bigrams irrespective of position in a word and an SVM with an RBF kernel function. Such high accuracy suggests that the probability distribution of bigrams is informative about the types of flective words. Interestingly, the least informative were bigrams at the end and at the beginning of words. The MBL model was used for the automatic production of inflected forms, utilizing phonotactic information from the last four syllables. On a sample of 89024 flective words taken from the Frequency dictionary of Serbian language (daily press), the achieved accuracy was about 92%, using the leave-one-out method and a nearest-neighbor set size of 7 (k = 7). Several factors were identified that affected the accuracy, in particular part of speech, grammatical type, formation method and number of examples within one grammatical type, number of exceptions, number of phonological alternations, etc. The visual lexical decision experiment revealed that words that the MBL model produced incorrectly also induced longer reaction times. Thus, we concluded that the MBL model might be cognitively plausible. In addition, we reconfirmed the informativeness of phonotactic information, this time in a human comprehension task. Overall, the findings from the three studies support a significant role of phonotactic information in both the processing and production of morphologically complex words. The results also suggest that this information should be taken into account when discussing the emergence of larger units and language patterns.
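As a rough illustration of the position-dependent bigram frequencies described in the abstract above, the following sketch counts character bigrams at the beginning, at the end, inside the word, and over the whole word. The function names and the toy word list are assumptions. The thesis feeds such frequencies to an SVM with an RBF kernel, which is not reproduced here.

```python
from collections import Counter


def positional_bigrams(word):
    """Split a word's character bigrams by position in the word."""
    bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
    return {
        "initial": bigrams[:1],   # first bigram (empty for one-letter words)
        "final": bigrams[-1:],    # last bigram
        "inner": bigrams[1:-1],   # everything between the first and last
        "all": bigrams,           # irrespective of position
    }


def bigram_profile(words, position="all"):
    """Count bigram occurrences at the given position over a word list."""
    counts = Counter()
    for w in words:
        counts.update(positional_bigrams(w)[position])
    return counts


# Toy sample standing in for one grammatical type (assumed data).
nouns = ["kuća", "ruka", "muka"]
final_profile = bigram_profile(nouns, "final")
all_profile = bigram_profile(nouns, "all")
```

One such profile per grammatical type and position, normalised to relative frequencies, would give feature vectors of the kind the abstract describes.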
23	The memory-based paradigm for vision-based robot localization Jüngel, Matthias 04 October 2012 For autonomous mobile robots, a solid world model is an important prerequisite for decision making. Current state estimation techniques are based on Hidden Markov Models and Bayesian filtering. These methods estimate the state of the world (belief) iteratively by alternately integrating sensor data and knowledge about the robot's executed actions. Data obtained from perceptions and actions is accumulated in the belief, which can be represented parametrically (as in Kalman filters) or non-parametrically (as in particle filters). When the sensor's information gain is low, as in the case of bearing-only measurements, representing the belief can be challenging. For instance, a Kalman filter's Gaussian models might not be sufficient, or a particle filter might need so many particles that computation time becomes too long. In this thesis, I introduce a new state estimation method that does not accumulate information in a belief. Instead, perceptions and actions are stored in a memory, and the state is calculated from it when needed. The system has a particular advantage when processing sparse information. The thesis shows how the memory-based technique can be applied to examples from RoboCup (autonomous robots play soccer). Experiments show how four-legged and humanoid robots can localize themselves very precisely on a soccer field. The localization is based on bearings to objects obtained from digital images; field lines are an important source of information for the robot's orientation. The thesis also presents a new technique for recognizing field lines that needs no pre-run calibration and works even when the field lines are partly concealed or affected by shadows. Künstliche Intelligenz Robotik KI Fußball Lokalisierung Markov Monte-Carlo RoboCup Robotics Artificial Intelligence AI Robots Soccer Localization Self-Localization State Estimation Bayes Filter Particle Filter Memory-Based 004 Informatik 28 Informatik, Datenverarbeitung ST 308 ddc:004
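The idea of storing perceptions and actions and computing the state only on demand can be sketched as follows. This is not the thesis's algorithm: it assumes a known heading, odometry offsets already expressed in the world frame, noise-free bearings, and a brute-force grid search. All names (`estimate_position`, the landmark labels, the field size) are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def estimate_position(memory, landmarks, grid):
    """Return the grid cell whose predicted bearings best match the memory.

    Each memory entry is (dx, dy, landmark, bearing): the robot's offset from
    its current position when the bearing was taken (from odometry), the
    landmark that was seen, and the measured world-frame bearing.
    """
    best, best_err = None, float("inf")
    for x, y in grid:
        err = 0.0
        for dx, dy, name, bearing in memory:
            lx, ly = landmarks[name]
            predicted = math.atan2(ly - (y + dy), lx - (x + dx))
            err += wrap(predicted - bearing) ** 2
        if err < best_err:
            best, best_err = (x, y), err
    return best


# Synthetic example: three stored bearings taken along the robot's path.
landmarks = {"goal_a": (0.0, 5.0), "goal_b": (6.0, 0.0), "flag": (6.0, 5.0)}
true_x, true_y = 1.0, 2.0
offsets = [(-1.0, 0.0, "goal_a"), (-0.5, -0.5, "goal_b"), (0.0, 0.0, "flag")]
memory = [
    (dx, dy, name, math.atan2(landmarks[name][1] - (true_y + dy),
                              landmarks[name][0] - (true_x + dx)))
    for dx, dy, name in offsets
]
grid = [(i * 0.1, j * 0.1) for i in range(61) for j in range(51)]
estimate = estimate_position(memory, landmarks, grid)
```

Because nothing is folded into a belief, an old observation can be dropped or re-weighted simply by editing the memory before the next query.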
24	Data-driven syntactic analysis Megyesi, Beata January 2002 No description available. natural language processing machine learning data-driven methods part-of-speech tagging chunking shallow parsing evaluation hidden Markov modeling maximum entropy learning memory-based learning transformation-based learning morphology syn
