41

Μεθοδολογία αυτόματου σημασιολογικού σχολιασμού στο περιεχόμενο ιστοσελίδων / A methodology for automatic semantic annotation of web page content

Σπύρος, Γεώργιος 14 December 2009 (has links)
Nowadays the use of the World Wide Web has evolved into a social phenomenon. Its spread is constant and increasing exponentially. In the years since its first appearance, users have gained a certain level of experience and, based on that experience, have come to a set of shared assumptions. They have understood that the web pages they interact with in their everyday web activities are the creations of other users. It has also become clear that every user can create his own web page and include in it references to other pages of his liking. These references do not exist simply as hyperlinks: most of the time they are accompanied by text that provides useful information about the referenced page's content. In this diploma thesis we describe a methodology for the automatic semantic annotation of a web page's contents. The tools and techniques described are based on two main hypotheses. First, people who create web pages describe other web pages inside them. Second, people connect their web pages with any web page they describe via an anchor link that is clearly marked with a tag in each page's HTML code. The automatic semantic annotation we attempt here for a web page is the process of finding a tag able to describe the page's contents. Finding this tag follows a methodology consisting of a number of steps; each step is implemented using various tools and techniques, and its output is the next step's input. The basic idea behind our methodology is to collect as many anchor texts as possible, along with a window of words around them, for each web page. This collection is the result of processing many web pages that contain hyperlinks to the page we want to annotate. The semantic tag for a web page is then derived by applying natural language processing techniques to the collection of documents that refer to it. Thus the final conclusion for the annotation of the web page's contents is extracted.
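As a rough illustration of the methodology summarised above, the sketch below harvests anchor texts (plus a small word window) from pages linking to a target URL and proposes the most frequent content word as a candidate tag. It is a minimal sketch, assuming BeautifulSoup for HTML parsing; the window size, stopword list, and frequency heuristic are illustrative assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch of anchor-text harvesting; `pages` is a hypothetical list of
# HTML strings known to link to `target_url`.
import re
from collections import Counter
from bs4 import BeautifulSoup

def harvest_anchor_texts(pages, target_url, window=3):
    """Collect anchor texts pointing at target_url plus nearby words."""
    snippets = []
    for html in pages:
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):
            if a["href"].rstrip("/") != target_url.rstrip("/"):
                continue
            anchor = a.get_text(" ", strip=True)
            # Take a small window of words from the anchor's parent element.
            context = a.parent.get_text(" ", strip=True).split()
            snippets.append(" ".join([anchor] + context[:window]))
    return snippets

def propose_tag(snippets, stopwords=frozenset({"the", "a", "of", "and"})):
    """Pick the most frequent content word as the candidate semantic tag."""
    words = [w for s in snippets for w in re.findall(r"\w+", s.lower())
             if w not in stopwords]
    return Counter(words).most_common(1)[0][0] if words else None
```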
42

Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné / Building an Arabic semantic resource from an aligned multilingual corpus

Abdulhay, Authoul 23 November 2012 (has links) (PDF)
This thesis aims at the implementation and evaluation of techniques for extracting semantic relations from an aligned multilingual corpus. These relations are extracted by transitivity of translational equivalence: two lexemes that share the same equivalents in a target language are likely to share the same meaning. First, our observations focus on the semantic comparison of translational equivalents in aligned multilingual corpora. From these equivalences, we attempt to extract "cliques", i.e. maximal complete connected subgraphs in which all units are interrelated because of a probable semantic intersection. These cliques are valuable in that they provide information on both the synonymy and the polysemy of units, and offer a form of semantic disambiguation. They are built from the automatic extraction of lexical correspondences, based on the observation of occurrences and co-occurrences in the corpus. The use of lemmatization techniques is also considered. We then attempt to link these cliques to a semantic lexicon (of the WordNet type) in order to assess the possibility of retrieving, for Arabic units, semantic relations defined for English or French units. These relations would allow the automatic construction of a network useful for several Arabic language processing applications, such as question-answering engines, machine translation, alignment systems, information retrieval, etc.
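The clique idea lends itself to a small illustration. The sketch below builds a graph of lexical units linked by shared translational equivalents and extracts maximal cliques with networkx; the toy dictionary (including the transliterated Arabic entries) is entirely hypothetical and stands in for correspondences that would really be extracted from the aligned corpus.

```python
import networkx as nx
from itertools import combinations

# Hypothetical mapping: a word and the source-language units that translate
# to it (in reality extracted automatically from the aligned corpus).
translations = {
    "bank (en)": ["masrif", "daffa"],
    "banque (fr)": ["masrif"],
    "rive (fr)": ["daffa", "shati"],
    "shore (en)": ["daffa", "shati"],
}

G = nx.Graph()
for target, sources in translations.items():
    # Link every unit that shares this translational equivalent.
    for u, v in combinations(sources + [target], 2):
        G.add_edge(u, v)

# Maximal complete connected subgraphs ("cliques"): every unit is related
# to every other, suggesting a probable semantic intersection.
for clique in nx.find_cliques(G):
    if len(clique) > 2:
        print(sorted(clique))
```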
43

'Consider' and its Swedish equivalents in relation to machine translation

Andersson, Karin January 2007 (has links)
This study describes the English verb 'consider' and the characteristics of some of its senses. An investigation of this kind may be useful, since the machine translation program SYSTRAN has invariably translated 'consider' with the Swedish verbs 'betrakta' (Eng: 'view', 'regard') and 'anse' (Eng: 'regard'). This handling of 'consider' is not satisfactory in all contexts. Since 'consider' is a cogitative verb, it is fascinating to observe that both the theory of semantic primes and universals and conceptual semantics are concerned with cogitation in various ways. Anna Wierzbicka, one of the advocates of semantic primes and universals, argues that THINK should be considered a semantic prime. Moreover, one of the prime concerns of conceptual semantics is to describe how thoughts are constructed by virtue of, e.g., linguistic components, perception and experience. In order to define and clarify the distinctions between the different senses, we have taken advantage of the theory of mental spaces. This thesis is structured in accordance with the meanings listed for 'consider' in WordNet. Accordingly, the senses of 'consider' have been organized into the following groups: 'Observation'; 'Opinion', together with its sub-group 'Likelihood'; and 'Cogitation', followed by its sub-group 'Attention/Consideration'. A concordance tool, http://www.nla.se/culler, provided us with 90 literary quotations that were collected in a corpus. These citations were then distributed among the groups mentioned above and translated into Swedish by SYSTRAN. Furthermore, the meanings of 'consider' have also been related to the senses recorded by the FrameNet scholars, where 'consider' is regarded as a verb of 'Cogitation' and 'Categorization'. Once the study was completed, it could be inferred that certain senses are connected to specific syntactic constructions, whereas in other cases the distinctions between meanings can only be explained by semantics. To conclude, an implementation is likely easier if a specific syntactic construction can be tied to a particular sense, as may be the case for some meanings of 'consider'. Machine translation is presumably a much more laborious task if one is governed solely by semantic conditions.
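For readers who want to reproduce the starting point of this grouping, the verb senses WordNet records for 'consider' can be listed with NLTK (assuming the WordNet data is installed); the assignment of senses to the groups above remains manual work.

```python
# List WordNet's verb senses for 'consider' with their glosses.
from nltk.corpus import wordnet as wn

for syn in wn.synsets("consider", pos=wn.VERB):
    print(syn.name(), "-", syn.definition())
```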
44

Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri

Dagerman, Björn January 2013 (has links)
Services that rely on semantic computations over users' natural-language inputs are becoming more frequent. Computing semantic relatedness between texts is problematic due to the inherent ambiguity of natural language. The purpose of this thesis was to show how a sentence can be compared to a predefined semantic Definite Clause Grammar (DCG), and how a DCG-based system could benefit from such capabilities. Our approach combines openly available specialized NLP frameworks for statistical parsing, part-of-speech tagging and word-sense disambiguation, and computes semantic relatedness using a large lexical and conceptual-semantic thesaurus. We also extend an existing programming language for multimodal interfaces, COactive Language Definition (COLD), which uses static predefined DCGs: every word that COLD should accept needs to be explicitly defined. By applying our solution, we show how our approach can remove dependencies on word definitions and improve grammar definitions in DCG-based systems.
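A minimal sketch of thesaurus-based relatedness, assuming WordNet as the lexical and conceptual-semantic thesaurus; the statistical parsing, POS tagging, and word-sense disambiguation stages of the thesis are not reproduced here.

```python
# Best WordNet path similarity over all noun sense pairs of two words.
from nltk.corpus import wordnet as wn

def max_path_similarity(word1, word2):
    """Return the highest path similarity, or None if no senses connect."""
    scores = [s1.path_similarity(s2)
              for s1 in wn.synsets(word1, pos=wn.NOUN)
              for s2 in wn.synsets(word2, pos=wn.NOUN)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=None)

print(max_path_similarity("car", "vehicle"))   # high: short hypernym chain
print(max_path_similarity("car", "banana"))    # low: distant in the hierarchy
```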
45

Taxonomy Based Image Retrieval using Data from Multiple Sources / Taxonomibaserad Bildsök

Larsson, Jimmy January 2016 (has links)
With the multitude of images available on the Internet, how do we find what we are looking for? This project tries to determine how much the precision and recall of search queries are improved by applying a word taxonomy to traditional Text-Based Image Search and Content-Based Image Search. By applying a word taxonomy to different data sources, a strong keyword filter and a keyword extender were implemented and tested. The results show that, depending on the implementation, either the precision or the recall can be increased. By using a similar approach in real-life implementations, it is possible to push images with higher precision to the front while keeping a high recall value, thus increasing the perceived relevance of image search.
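The keyword extender can be illustrated with WordNet standing in for the word taxonomy; the project's actual taxonomy and data sources are not specified in the abstract, so the sense choice and the one-level hypernym/hyponym expansion below are illustrative assumptions.

```python
# Expand a search keyword with taxonomy neighbours (WordNet as stand-in).
from nltk.corpus import wordnet as wn

def extend_keywords(keyword):
    """Add direct hypernyms and hyponyms of the first noun sense."""
    synsets = wn.synsets(keyword, pos=wn.NOUN)
    if not synsets:
        return {keyword}
    syn = synsets[0]  # naive sense choice, purely for illustration
    extended = {keyword}
    for related in syn.hypernyms() + syn.hyponyms():
        extended.update(l.replace("_", " ") for l in related.lemma_names())
    return extended

print(extend_keywords("dog"))  # e.g. adds 'canine', 'puppy', ...
```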
46

Informační technologie v psychologii / Information Technologies in Psychology

Ličko, Jozef January 2009 (has links)
We focus on recognizing characteristic traits of an author from his written text. In particular, this thesis deals with the implementation of Kreitler's psychosemantic method. The result of our work includes our own vocabulary, which is used to assign one of the method's parameters to a word. The implemented solution is successful when used on the set of words that served as the source for the vocabulary construction.
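A heavily hedged sketch of the vocabulary lookup described above: the entries and parameter names below are invented for illustration, since the thesis's actual Kreitler meaning-dimension vocabulary is not reproduced in the abstract.

```python
# Count which (hypothetical) Kreitler parameters a text's words activate.
from collections import Counter

VOCABULARY = {  # invented word -> parameter assignments
    "always": "quantity", "red": "sensory_quality", "because": "causality",
}

def profile_text(text):
    """Return a frequency profile of activated parameters."""
    tokens = text.lower().split()
    return Counter(VOCABULARY[t] for t in tokens if t in VOCABULARY)

print(profile_text("The sky turned red because a storm always follows heat"))
```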
47

A generic architecture for semantic enhanced tagging systems

Magableh, Murad January 2011 (has links)
The Social Web, or Web 2.0, has recently gained popularity because of its low cost and ease of use. Social tagging sites (e.g. Flickr and YouTube) offer new principles for end-users to publish and classify their content (data). Tagging systems contain free keywords (tags) generated by end-users to annotate and categorise data. Lack of semantics is the main drawback of social tagging, due to the use of unstructured vocabulary. Therefore, tagging systems suffer from shortcomings such as low precision, lack of collocation, synonymy, multilinguality, and use of shorthands. Consequently, relevant contents are not visible, and thus not retrievable, while searching in tag-based systems. On the other hand, the Semantic Web, so-called Web 3.0, provides a rich semantic infrastructure. Ontologies are the key enabling technology for the Semantic Web, and they can be integrated with the Social Web to overcome the lack of semantics in tagging systems. In the work presented in this thesis, we build an architecture to address a number of tagging-system drawbacks. In particular, we make use of the controlled vocabularies presented by ontologies to improve information retrieval in tag-based systems. Based on the tags provided by the end-users, we introduce the idea of adding "system tags" from semantic, as well as social, resources. The "system tags" are comprehensive and wide-ranging in comparison with the limited "user tags", and they are used to fill the gap between the user tags and the search terms used in tag-based systems. We restricted the scope of our work to tackle the following tagging-system shortcomings: the lack of semantic relations between user tags and search terms (e.g. synonymy, hypernymy); the lack of translation mediums between user tags and search terms (multilinguality); and the lack of context to define emergent shorthand user tags. To address the first shortcoming, we use the WordNet ontology as a semantic lingual resource from which system tags are extracted. For the second shortcoming, we use the MultiWordNet ontology to recognise cross-language linkages between different languages. Finally, to address the third shortcoming, we use tag clusters obtained from the Social Web to create a context for defining the meaning of shorthand tags. A prototype of our architecture was implemented. In the prototype system, we built our own database to host videos imported from a real tag-based system (YouTube). The user tags associated with these videos were also imported and stored in the database. For each user tag, our algorithm adds a number of system tags that come either from semantic ontologies (WordNet or MultiWordNet) or from tag clusters imported from the Flickr website. Therefore, each system tag added to annotate the imported videos has a relationship with one of the user tags on that video: synonymy, hypernymy, similar term, related term, translation, or clustering relation. To evaluate the suitability of our proposed system tags, we developed an online environment where participants submit search terms and retrieve two groups of videos to be evaluated. Each group is produced from one distinct type of tags, user tags or system tags. The videos in the two groups are produced from the same database and are evaluated by the same participants in order to have a consistent and reliable evaluation.
Since user tags are what is used nowadays for searching real tag-based systems, we take their efficiency as the reference against which we compare the efficiency of the new system tags. In order to compare the relevance between the search terms and each group of retrieved videos, we applied a statistical test. According to the Wilcoxon Signed-Rank test, there was no significant difference between using system tags and using user tags. The findings revealed that using the system tags in search is as efficient as using the user tags: both types of tags produce different results, but at the same level of relevance to the submitted search terms.
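A minimal sketch of the system-tag expansion idea, assuming WordNet via NLTK; the MultiWordNet translation step and the Flickr tag clustering are omitted, and the relation labels simply follow the list above.

```python
# Derive (tag, relation) pairs expanding a user tag via WordNet.
from nltk.corpus import wordnet as wn

def system_tags(user_tag):
    """Return a set of (tag, relation) pairs for a given user tag."""
    tags = set()
    for syn in wn.synsets(user_tag):
        for lemma in syn.lemma_names():          # synonymy
            tags.add((lemma.replace("_", " "), "synonym"))
        for hyper in syn.hypernyms():            # hypernymy
            for lemma in hyper.lemma_names():
                tags.add((lemma.replace("_", " "), "hypernym"))
    tags.discard((user_tag, "synonym"))          # drop the tag itself
    return tags

print(system_tags("car"))  # e.g. ('auto', 'synonym'), ('motor vehicle', 'hypernym')
```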
48

Unsupervised Knowledge-based Word Sense Disambiguation: Exploration & Evaluation of Semantic Subgraphs

Manion, Steve Lawrence January 2014 (has links)
Hypothetically, if you were told "Apple uses the apple as its logo", you would immediately detect two different senses of the word apple: the company and the fruit, respectively. Making this distinction is the formidable challenge of Word Sense Disambiguation (WSD), which is a subtask of many Natural Language Processing (NLP) applications. This thesis is a multi-branched investigation into WSD that explores and evaluates unsupervised knowledge-based methods exploiting semantic subgraphs. The research covered by this thesis can be broken down into: 1. mining data from the encyclopedic resource Wikipedia, to visually demonstrate the existence of context embedded in semantic subgraphs; 2. achieving disambiguation in order to merge concepts that originate from heterogeneous semantic graphs; 3. participation in international evaluations of WSD across a range of languages; 4. treating WSD as a classification task that can be optimised through the iterative construction of semantic subgraphs. The contributions of each chapter range widely, but can be summarised by what has been produced, learnt, and raised throughout the thesis. Furthermore, an API and several resources have been developed as a by-product of this research, all of which can be accessed by visiting the author's home page at http://www.stevemanion.com. This should enable researchers to replicate the results achieved in this thesis and build on them if they wish.
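The thesis's semantic-subgraph methods are not reproduced here, but as a simpler knowledge-based point of comparison, NLTK's implementation of the Lesk algorithm can disambiguate the 'apple' example from the abstract via gloss overlap.

```python
# Knowledge-based WSD baseline: Lesk gloss overlap on the abstract's example.
from nltk.wsd import lesk

sentence = "Apple uses the apple as its logo".split()
sense = lesk(sentence, "apple", pos="n")
print(sense, "-", sense.definition() if sense else "no sense found")
```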
49

VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil / VerbNet.BR: the semi-automatic construction of an on-line and domain-independent Verb Lexicon for Brazilian Portuguese

Scarton, Carolina Evaristo 28 January 2013 (has links)
Building computational-linguistic base resources, such as computational lexical resources (CLR), is one of the goals of Natural Language Processing (NLP). However, most computational lexicons are specific to English. One of the resources already developed for English is VerbNet, a lexicon with domain-independent semantic and syntactic information for English verbs. It is based on Levin's verb classification, with mappings to Princeton's WordNet.
Since only a few computational studies of Levin's classification have been made for languages other than English, and given the lack of a Portuguese CLR similar to VerbNet, the goal of this research was to create a CLR for Brazilian Portuguese, called VerbNet.Br. The manual building of such resources is usually unfeasible because it is time-consuming and prone to human error. Therefore, great efforts have been made to build such resources with the aid of computational techniques. One of these techniques is machine learning, a widely known and used method for extracting linguistic information from corpora. Another is the use of pre-existing resources for other languages, most commonly English, to support the building of new aligned resources, taking advantage of multilingual/cross-linguistic features (like the ones in Levin's verb classification). The method proposed here for the creation of VerbNet.Br is generic, so it may be used to build similar resources for languages other than Brazilian Portuguese. Moreover, the proposed method also allows for a future extension of the resource via subclasses of concepts. VerbNet.Br is built in four steps: three automatic and one manual. However, experiments were also carried out without the manual step, which can be discarded without affecting precision and recall. The evaluation of the resource was intrinsic, both qualitative and quantitative. The qualitative evaluation consisted of: (a) manual analysis of some VerbNet classes, resulting in a Brazilian Portuguese gold standard; (b) comparison of this gold standard with the VerbNet.Br results, with promising results (almost 60% f-measure); and (c) comparison of the VerbNet.Br results with verb-clustering results, showing that both methods achieve similar results. The quantitative evaluation considered the acceptance rate of candidate members of VerbNet.Br, with around 90% of members accepted per class. One of the contributions of this research is the first version of VerbNet.Br. Although it still requires linguistic validation, it already provides information to be used in NLP tasks, with precision and recall of 44% and 92.89%, respectively.
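For orientation, the kind of Levin-style class lookup that VerbNet.Br mirrors for Portuguese can be tried against the English VerbNet shipped with NLTK (assuming the 'verbnet' corpus is downloaded); VerbNet.Br itself is not distributed with NLTK.

```python
# Look up the VerbNet (Levin-style) classes of an English verb.
from nltk.corpus import verbnet

for classid in verbnet.classids("give"):
    print(classid, verbnet.lemmas(classid)[:5])  # class id and sample members
```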
50

Semantic Relations in WordNet and the BNC

Ferschke, Oliver January 2009 (has links) (PDF)
From the introduction: It is not always easy to define what a word means. We can choose between a variety of possibilities, from simply pointing at the correct object as we say its name to lengthy definitions in encyclopaedias, which can sometimes fill multiple pages. Although the former approach is pretty straightforward and is also very important for first language acquisition, it is obviously not a practical solution for defining the semantics of the whole lexicon. The latter approach is more widely accepted in this context, but it turns out that writing dictionary and encyclopaedia entries is not an easy task. In order to simplify the challenge of defining the meaning of words, it is of great advantage to organize the lexicon in such a way that its structure tells us about the meaning of words by showing their relations to other words. These semantic relations are the focal point of this paper. In the first chapter, different ways to describe meaning will be discussed, and it will become obvious why semantic relations are a very good instrument for organizing the lexicon. The second chapter deals with WordNet, an electronic lexical database which follows precisely this approach. We will examine the semantic relations used in WordNet, study the distinct characteristics of each of them, and see which contribution each relation makes to the organization of the lexicon. Finally, we will look at the downside of the fact that WordNet is a manually engineered network by examining its shortcomings. In the third chapter, an alternative approach to linguistics is introduced. We will discuss the principles of corpus linguistics and, using the example of the British National Corpus, consider possibilities for extracting semantic relations from language corpora which could help to overcome the deficiencies of the knowledge-based approach. In the fourth chapter, I will describe a project whose goal is to extend WordNet with findings from cognitive linguistics. To this end, I will discuss the development of a piece of software programmed in the course of this thesis. Furthermore, the results of a small-scale study using this software will be analysed and evaluated in order to assess the success of the project. / In his Master's thesis, the author deals in great detail with semantic relations between words. In a project study, Mr Ferschke attempts to filter out certain cognitively relevant objects semi-automatically, on the basis of an existing semantic network. His project is supported by a survey of students on the conceptual classification of these objects. In the first chapter, Oliver Ferschke presents various ways of describing meaning on a very sound linguistic basis, distinguishing and clearly contrasting different views of what constitutes "meaning". The second chapter is devoted to the semantic network WordNet, which describes so-called synsets for English. Building on the semantic relations represented in WordNet, the author uses selected examples to show how English words are integrated into this network, referring to semantic relations such as hyponymy, meronymy, opposites and polysemy and illustrating them with examples. He also addresses some desiderata in WordNet.
The third part of the thesis presents the British National Corpus (BNC) in detail. For the project study, frequency information is drawn from this corpus in order to give later categorizations as quantitatively valid a basis as possible. Mr Ferschke points out the main differences between corpus-linguistic approaches on the one hand and structuralist and generative studies on the other. He concludes his discussion with a syntactically oriented approach based on patterns, frequent syntactic configurations from which particular semantic relations can (potentially) be inferred. The author shows by example how these patterns can be integrated into a CQL query, and likewise how possible constituents of noun and prepositional phrases can be identified in the BNC by automatic procedures. The fourth chapter is devoted to the project study, in which insights from prototype theory are applied to the structure of WordNet. Using software he developed himself, the author attempts to identify certain cognitively relevant levels of semantic description, with the goal of filtering out basic level objects within the WordNet hierarchies by semi-automatic procedures. His study consists of two parts: in a first, fully automatic part, words that meet certain semantic and quantitative criteria are identified by automatic procedures; in the second part, these candidate basic level objects are rated by test subjects with respect to their properties. The author selected three semantic domains for which basic level objects were to be determined: athletics, furniture, vehicle. In his analysis, Mr Ferschke shows which potential basic level objects were selected as such by the participants of the study. He addresses problems concerning the structure of WordNet, which can have a substantial influence on the selection of words as basic level objects, and, as a second problem, the language competence of the test subjects. A further problem, not mentioned by the author, is the extent to which the given word definitions influenced the participants' ratings. A considerable part of the thesis consists in the design and implementation of the software for the project study, which requires not only detailed knowledge of computer science but also a solid grounding in linguistics; the structure of the project makes it very clear that Mr Ferschke has an excellent command of both fields. From a linguistic point of view, the thesis is thoroughly well founded and excellently presented. It covers a broad spectrum of linguistic theories and explanatory models and comprehensively presents the aspects relevant to this topic. The computational-linguistic component is likewise to be judged very good, especially since linking prototype theory with WordNet is not entirely straightforward: the problem lies primarily in making the given structure of WordNet usable for aspects of prototype theory. Oliver Ferschke has without doubt succeeded in this. This Master's thesis deserves the grade "sehr gut" (1.0).
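The semi-automatic filtering step described in the review can be approximated as follows: walk the WordNet hyponym tree under one of the three domains and use depth and lemma shape as crude basic-level cues. The depth threshold and the single-word-lemma filter are illustrative assumptions; the thesis's actual criteria differ.

```python
# Crude candidate filter for basic level objects under 'furniture'.
from nltk.corpus import wordnet as wn

root = wn.synset("furniture.n.01")
candidates = []
for syn in root.closure(lambda s: s.hyponyms()):
    # Heuristic assumption: basic level objects sit at shallow depth below
    # the category root and have short single-word names.
    names = [l for l in syn.lemma_names() if "_" not in l]
    if names and syn.min_depth() - root.min_depth() <= 2:
        candidates.append(names[0])

print(sorted(set(candidates))[:10])  # e.g. 'bed', 'chair', 'table', ...
```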
