41 |
Cross-lingual Information Retrieval On Turkish And English Texts. Boynuegri, Akif, 01 April 2010 (has links) (PDF)
In this thesis, cross-lingual information retrieval (CLIR) approaches are comparatively evaluated for Turkish and English texts. As a complementary study, knowledge-based methods for word sense disambiguation (WSD), one of the most important components of CLIR systems, are compared for Turkish words.
Query translation and sense-indexing-based CLIR approaches are used in this study. In the query translation approach, we use automatic and manual word sense disambiguation methods together with the Google translation service during translation of queries. In the sense-indexing-based approach, documents are indexed according to the meanings of words instead of the words themselves, and retrieval of documents is likewise performed according to the meanings of the query words. During the identification of the intended meaning of query terms, manual and automatic word sense disambiguation methods are used and compared to each other.
Knowledge-based WSD methods that use different gloss enrichment techniques are compared for Turkish words. Turkish WordNet is used as the primary knowledge base, and English WordNet and Turkish Wikipedia are employed as enrichment resources. Meanings of words are identified more precisely by using the semantic relations defined in the WordNets and in Turkish Wikipedia. Also, during the calculation of the semantic relatedness of senses, the cosine similarity metric is used as an alternative to the word overlap count. The effect of using the cosine similarity metric is observed for each WSD method that uses a different knowledge base.
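The contrast between word-overlap counting and cosine similarity for gloss-based relatedness can be sketched roughly as follows. This is a minimal illustration: the glosses, sense ids, and example context are invented, not taken from the thesis or from Turkish WordNet.

```python
from collections import Counter
from math import sqrt

def overlap_score(gloss_a, gloss_b):
    """Word-overlap count between two glosses (Lesk-style)."""
    a, b = Counter(gloss_a.lower().split()), Counter(gloss_b.lower().split())
    return sum((a & b).values())

def cosine_score(gloss_a, gloss_b):
    """Cosine similarity between bag-of-words vectors of two glosses."""
    a, b = Counter(gloss_a.lower().split()), Counter(gloss_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def disambiguate(context_gloss, candidate_senses, score=cosine_score):
    """Pick the sense whose (possibly enriched) gloss is most related to the context."""
    return max(candidate_senses, key=lambda s: score(context_gloss, s["gloss"]))

# Invented toy senses for the ambiguous word "bank"
senses = [
    {"id": "bank.n.01", "gloss": "sloping land beside a body of water river"},
    {"id": "bank.n.02", "gloss": "financial institution that accepts deposits money"},
]
best = disambiguate("the river water flowed past the muddy land", senses)
print(best["id"])  # bank.n.01
```

Gloss enrichment (from a second WordNet or Wikipedia) would simply extend the `gloss` strings before scoring; cosine normalizes for gloss length, which raw overlap counting does not.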
|
42 |
Retrieving information from heterogeneous freight data sources to answer natural language queries. Seedah, Dan Paapanyin Kofi, 09 February 2015 (has links)
The ability to retrieve accurate information from databases without an extensive knowledge of the contents and organization of each database is extremely beneficial to the dissemination and utilization of freight data. The challenges, however, are: 1) correctly identifying only the relevant information and keywords in questions when dealing with multiple sentence structures, and 2) automatically retrieving, preprocessing, and understanding multiple data sources to determine the best answer to a user's query. Current named entity recognition systems can identify entities but require an annotated corpus for training, which does not currently exist in the field of transportation planning. A hybrid approach that combines multiple models to classify specific named entities was therefore proposed as an alternative. The retrieval and classification of freight-related keywords facilitated the process of finding which databases are capable of answering a question. Values in data dictionaries can be queried by mapping keywords to data element fields in various freight databases using ontologies. A number of challenges still arise as a result of different entities sharing the same name, the same entity having multiple names, and differences in classification systems. Dealing with these ambiguities is required to accurately determine which database provides the best answer from the list of applicable sources. This dissertation 1) develops an approach to identifying and classifying keywords from freight-related natural language queries, 2) develops a standardized knowledge representation of freight data sources, using an ontology that both computer systems and domain experts can utilize to identify relevant freight data sources, and 3) provides recommendations for addressing ambiguities in freight-related named entities.
Finally, the use of knowledge-based expert systems to intelligently sift through data sources and determine which ones provide the best answer to a user's question is proposed.
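The keyword-to-field mapping idea can be illustrated with a toy sketch. The ontology entries, database names, and field names below are hypothetical placeholders, not the actual freight sources or their schemas.

```python
# Hypothetical mini-ontology: each concept lists surface synonyms and the
# database fields that hold it. All names here are illustrative assumptions.
ONTOLOGY = {
    "commodity": {
        "synonyms": {"commodity", "freight", "goods", "cargo"},
        "fields": {"FAF": "sctg2", "TRANSEARCH": "commodity_code"},
    },
    "tonnage": {
        "synonyms": {"tonnage", "tons", "weight"},
        "fields": {"FAF": "tons", "WAYBILL": "expanded_tons"},
    },
}

def route_query(question):
    """Map keywords in a natural-language question to candidate databases.

    Returns {database: [(concept, field), ...]} for every concept whose
    synonyms appear in the question.
    """
    words = set(question.lower().replace("?", "").split())
    hits = {}
    for concept, entry in ONTOLOGY.items():
        if words & entry["synonyms"]:
            for db, field in entry["fields"].items():
                hits.setdefault(db, []).append((concept, field))
    return hits

print(route_query("What tonnage of cargo moved by rail?"))
```

A real system would add the named entity classification step before lookup and rank the candidate databases to resolve the ambiguities the dissertation discusses.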
|
43 |
Word meaning in context as a paraphrase distribution: evidence, learning, and inference. Moon, Taesun, Ph.D., 25 October 2011 (has links)
In this dissertation, we introduce a graph-based model of instance-based usage meaning that is cast as a problem of probabilistic inference. The main aim of this model is to provide a flexible platform that can be used to explore multiple hypotheses about usage meaning computation. Our model takes up and extends the proposals of Erk and Pado [2007] and McCarthy and Navigli [2009] by representing usage meaning as a probability distribution over potential paraphrases. We use undirected graphical models to infer this probability distribution for every content word in a given sentence. Graphical models represent complex probability distributions through a graph. In the graph, nodes stand for random variables, and edges stand for direct probabilistic interactions between them; the lack of an edge between two variables reflects an independence assumption. In our model, we represent each content word of the sentence through two adjacent nodes: the observed node represents the surface form of the word itself, and the hidden node represents its usage meaning. The distribution over values that we infer for the hidden node is a paraphrase distribution for the observed word. To encode the fact that lexical semantic information is exchanged between syntactic neighbors, the graph contains edges that mirror the dependency graph of the sentence. Further knowledge sources that influence the hidden nodes are represented through additional edges that, for example, connect to the document topic. The integration of adjacent knowledge sources is accomplished in a standard way by multiplying factors and marginalizing over variables.
Evaluating on a paraphrasing task, we find that our model outperforms the current state-of-the-art usage vector model [Thater et al., 2010] on all parts of speech except verbs, where the previous model wins by a small margin. But our main focus is not on the numbers but on the fact that our model is flexible enough to encode different hypotheses about usage meaning computation. In particular, we concentrate on five questions (with minor variants):
- Nonlocal syntactic context: Existing usage vector models use only a word's direct syntactic neighbors for disambiguation or for inferring some other meaning representation. Would it help to instead let contextual information "flow" along the entire dependency graph, with each word's inferred meaning relying on the paraphrase distributions of its neighbors?
- Influence of collocational information: In some cases, it is intuitively plausible to use the selectional preference of a neighboring word towards the target to determine its meaning in context. How does building selectional preferences into the model affect performance?
- Non-syntactic bag-of-words context: To what extent can non-syntactic information in the form of bag-of-words context help in inferring meaning?
- Effects of parametrization: We experiment with two transformations of the MLE: one interpolates various MLEs, and the other transforms the estimate by exponentiating pointwise mutual information. Which performs better?
- Type of hidden nodes: Our model posits a tier of hidden nodes immediately adjacent to the surface tier of observed words to capture dynamic usage meaning. We examine two variants of the hidden nodes: in one, the nodes take actual words as values; in the other, they take nameless indexes as values. The former has the benefit of interpretability, while the latter allows more standard parameter estimation.
Portions of this dissertation are derived from joint work between the author and Katrin Erk [submitted].
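A minimal runnable instance of this kind of factor multiplication and marginalization, with two content words joined by one dependency edge, can be sketched as follows. All paraphrase candidates and factor values are invented for illustration; the real model infers these from corpora.

```python
import itertools

# Toy instance of the model: two content words linked by a dependency edge.
# Hidden variables range over candidate paraphrases of each observed word.
paraphrases = {
    "coach": ["trainer", "bus"],
    "shouted": ["yelled", "called"],
}
# Unary factor: how well a paraphrase fits the observed word out of context.
unary = {
    ("coach", "trainer"): 0.6, ("coach", "bus"): 0.4,
    ("shouted", "yelled"): 0.7, ("shouted", "called"): 0.3,
}
# Pairwise factor on the dependency edge: agreement between neighbor senses.
pairwise = {
    ("trainer", "yelled"): 0.9, ("trainer", "called"): 0.5,
    ("bus", "yelled"): 0.1, ("bus", "called"): 0.2,
}

def paraphrase_distribution(word, neighbor):
    """Marginal over paraphrases of `word`, summing out the neighbor's variable."""
    scores = {}
    for p, q in itertools.product(paraphrases[word], paraphrases[neighbor]):
        scores[p] = scores.get(p, 0.0) + (
            unary[(word, p)] * unary[(neighbor, q)] * pairwise[(p, q)]
        )
    z = sum(scores.values())  # normalize the marginal
    return {p: s / z for p, s in scores.items()}

dist = paraphrase_distribution("coach", "shouted")
print(dist)  # "trainer" dominates because the edge factor favors it
```

A full sentence graph would chain such edge factors along the dependency tree, with extra factors for document topic or selectional preferences multiplied in the same way.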
|
44 |
Η χρήση σημασιολογικών δικτύων για τη διαχείριση του περιεχομένου του παγκόσμιου ιστού / Managing the web content through the use of semantic networks. Στάμου, Σοφία, 25 June 2007 (has links)
Η παρούσα διατριβή πραγματεύεται την ενσωμάτωση ενός σημασιολογικού δικτύου λημμάτων σ’ ένα σύνολο εφαρμογών Διαδικτύου για την αποτελεσματική διαχείριση του περιεχομένου του Παγκόσμιου Ιστού. Τα δίκτυα σημασιολογικά συσχετισμένων λημμάτων αποτελούν ένα είδος ηλεκτρονικών λεξικών στα οποία καταγράφεται σημασιολογική πληροφορία για τα λήμματα που περιλαμβάνουν, όπου τα τελευταία αποθηκεύονται σε μια δενδρική δομή δεδομένων. Ο τρόπος δόμησης του περιεχομένου των σημασιολογικών δικτύων παρουσιάζει αρκετές ομοιότητες με την οργάνωση που ακολουθούν οι ιστοσελίδες στον Παγκόσμιο Ιστό, με αποτέλεσμα τα σημασιολογικά δίκτυα να αποτελούν έναν σημασιολογικό πόρο άμεσα αξιοποιήσιμο από ένα πλήθος εφαρμογών Διαδικτύου που καλούνται να διαχειριστούν αποδοτικά το πλήθος των δεδομένων που διακινούνται στον Παγκόσμιο Ιστό. Μετά από επισκόπηση των τεχνικών που παρουσιάζονται στη διεθνή βιβλιογραφία για τη διαχείριση του περιεχομένου του Παγκόσμιου Ιστού, προτείνεται και υλοποιείται ένα πρότυπο μοντέλο διαχείρισης ιστοσελίδων, το οποίο κάνοντας εκτεταμένη χρήση ενός εμπλουτισμένου σημασιολογικού δικτύου λημμάτων, εντοπίζει εννοιολογικές ομοιότητες μεταξύ του περιεχομένου διαφορετικών ιστοσελίδων και με βάση αυτές επιχειρεί και κατορθώνει την αυτοματοποιημένη και αποδοτική δεικτοδότηση, κατηγοριοποίηση και ταξινόμηση του πλήθους των δεδομένων του Παγκόσμιου Ιστού. Για την επίδειξη του μοντέλου διαχείρισης ιστοσελίδων που παρουσιάζεται, υιοθετούμε το μοντέλο πλοήγησης στους θεματικούς καταλόγους του Παγκόσμιου Ιστού και καταδεικνύουμε πειραματικά τη συμβολή των σημασιολογικών δικτύων σε όλα τα στάδια της δημιουργίας θεματικών καταλόγων Διαδικτύου. 
Συγκεκριμένα, εξετάζεται η συνεισφορά των σημασιολογικών δικτύων: (i) στον ορισμό και εμπλουτισμό των θεματικών κατηγοριών των καταλόγων του Παγκόσμιου Ιστού, (ii) στην επεξεργασία και αποσαφήνιση του περιεχομένου των ιστοσελίδων, (iii) στον αυτόματο εμπλουτισμό των θεματικών κατηγοριών ενός δικτυακού καταλόγου, (iv) στην ταξινόμηση των ιστοσελίδων που έχουν δεικτοδοτηθεί στις αντίστοιχες θεματικές κατηγορίες ενός καταλόγου, (v) στη διαχείριση των περιεχομένων των θεματικών καταλόγων με τρόπο που να διασφαλίζει την παροχή χρήσιμων ιστοσελίδων προς τους χρήστες, και τέλος (vi) στην αναζήτηση πληροφορίας στους θεματικούς καταλόγους του Παγκόσμιου Ιστού. Η επιτυχία του προτεινόμενου μοντέλου επιβεβαιώνεται από τα αποτελέσματα ενός συνόλου πειραματικών εφαρμογών που διενεργήθηκαν στο πλαίσιο της παρούσας διατριβής, όπου καταδεικνύεται η συμβολή των σημασιολογικών δικτύων στην αποτελεσματική διαχείριση των πολυάριθμων και δυναμικά μεταβαλλόμενων ιστοσελίδων του Παγκόσμιου Ιστού. Η σπουδαιότητα του προτεινόμενου μοντέλου διαχείρισης ιστοσελίδων, έγκειται στο ότι, εκτός από αυτόνομο εργαλείο διαχείρισης και οργάνωσης ιστοσελίδων, συνιστά το πρώτο επίπεδο επεξεργασίας σε ευρύτερο πεδίο εφαρμογών, όπως είναι η εξαγωγή περιλήψεων, η εξόρυξη πληροφορίας, η θεματικά προσανατολισμένη προσκομιδή ιστοσελίδων, ο υπολογισμός του ρυθμού μεταβολής των δεδομένων του Παγκόσμιου Ιστού, η ανίχνευση ιστοσελίδων με παραποιημένο περιεχόμενο, κτλ. / This dissertation addresses the incorporation of a semantic network into a set of Web-based applications for the effective management of Web content. Semantic networks are a kind of machine readable dictionaries, which encode semantic information for the lemmas they contain, where the latter are stored in a tree structure. 
Semantic networks store their contents in a way similar to the organization that Web pages exhibit on the Web graph; a feature that makes semantic networks readily usable by several Web applications that aim at the efficient management of the proliferating and constantly changing Web data. After an overview of the techniques that have been employed for managing Web content, we propose and implement a novel Web data management model, which relies on an enriched semantic network for locating semantic similarities in the contents of distinct Web pages. Based on these similarities, our model achieves the automatic and effective indexing, categorization and ranking of the numerous pages that are available on the Web. To demonstrate the potential of our Web data management model, we adopt the navigation model of Web thematic directories and experimentally show the contribution of semantic networks throughout the construction of Web catalogs. More specifically, we study the contribution of semantic networks in: (i) determining and enriching the thematic categories of Web directories, (ii) processing and disambiguating the contents of Web pages, (iii) automatically improving the thematic categories of Web directories, (iv) ordering the Web pages that have been assigned to the respective categories of a Web directory, (v) managing the contents of Web directories in a way that ensures the availability of useful Web data to the directories' users, and (vi) searching for information in the contents of Web directories. The contribution of our model is certified by the experimental results that we obtained from the numerous test applications that we ran in the framework of our study. The obtained results demonstrate the contribution of semantic networks to the effective management of the dynamically evolving Web content.
The practical outcome of the research presented herein is that, besides offering a fully-fledged infrastructure for the efficient manipulation and organization of Web data, it can play a key role in the development of numerous applications, such as text summarization, information extraction, topic-focused crawling, measuring the Web's evolution, spam detection, and so forth.
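The use of semantic relations to match page content to directory categories can be sketched with a toy hypernym network. All relations, terms, and categories below are invented; the actual system uses a full enriched semantic network of lemmas.

```python
# Illustrative hypernym relations standing in for the enriched semantic network.
HYPERNYMS = {
    "goalkeeper": "football", "football": "sport", "tennis": "sport",
    "browser": "software", "software": "computing",
}

def expand(term):
    """Walk the semantic network upward, collecting the term and its ancestors."""
    seen = {term}
    while term in HYPERNYMS:
        term = HYPERNYMS[term]
        seen.add(term)
    return seen

def categorize(page_terms, categories):
    """Assign the page to the directory category with most conceptual overlap."""
    def score(cat):
        # Concepts reachable from the category's seed terms
        concepts = set().union(*(expand(t) for t in categories[cat]))
        # Overlap of each page term's expanded senses with those concepts
        return sum(len(expand(t) & concepts) for t in page_terms)
    return max(categories, key=score)

categories = {"Sports": ["football", "tennis"], "Computing": ["software", "browser"]}
print(categorize(["goalkeeper", "tennis"], categories))  # Sports
```

The same expanded-concept overlap can serve the other listed tasks, e.g. ranking pages within a category by their score, or enriching a category with terms whose expansions fall inside it.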
|
45 |
Using web texts for word sense disambiguation. Wang, Yuanyong, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2007 (has links)
In all natural languages, ambiguity is a universal phenomenon. When a word has multiple meanings depending on its context, it is called an ambiguous word. The process of determining the correct meaning of a word (formally, its word sense) in a given context is word sense disambiguation (WSD). WSD is one of the most fundamental problems in natural language processing. If properly addressed, it could lead to revolutionary advances in many other technologies, such as text search engines, automatic text summarization and classification, automatic lexicon construction, machine translation, and automatic learning agents. One difficulty that has always confronted WSD researchers is the lack of high-quality sense-specific information. For example, if the word "power" immediately precedes the word "plant", it strongly constrains the meaning of "plant" to be "an industrial facility". If "power" is replaced by the phrase "root of a", then the sense of "plant" is dictated to be "an organism" of the kingdom Plantae. It is obvious that manually building a comprehensive sense-specific information base for each sense of each word is impractical. Researchers have also tried to extract such information from large dictionaries as well as from manually sense-tagged corpora. Most of the dictionaries used for WSD were not built for this purpose and have many inherent peculiarities. While manual tagging is slow and costly, automatic tagging has not delivered reliable performance. Furthermore, it is often the case that for a randomly chosen word (to be disambiguated), the sense-specific context corpora that can be collected from dictionaries are not large enough. Therefore, manually building sense-specific information bases and extracting such information from dictionaries are not effective approaches to obtaining sense-specific information.
Web text, due to its vast quantity and wide diversity, is an ideal source for extracting large quantities of sense-specific information. In this thesis, the impact of Web texts on various aspects of WSD has been investigated. New measures and models are proposed to tame the enormous amount of Web text for the purpose of WSD. They are formally evaluated by testing their disambiguation performance on about 70 ambiguous nouns. The results are very encouraging and help reveal the great potential of using Web texts for WSD. The results were published in three papers at the Australian national and international level (Wang & Hoffmann, 2004, 2005, 2006) [42][43][44].
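The "power plant" versus "root of a plant" intuition can be sketched as a simple evidence lookup. The co-occurrence counts below are invented stand-ins for the kind of statistics one might gather from Web text; they are not figures from the thesis.

```python
# Hypothetical counts of web evidence for each sense under each collocation.
# In a real system these would come from querying large volumes of Web text.
web_counts = {
    ("power", "plant"): {"industrial facility": 9200, "organism": 110},
    ("root of a", "plant"): {"industrial facility": 40, "organism": 7800},
}

def pick_sense(left_context, target):
    """Choose the sense best supported by web evidence for this collocation."""
    counts = web_counts.get((left_context, target))
    if counts is None:
        return None  # no sense-specific evidence collected for this context
    return max(counts, key=counts.get)

print(pick_sense("power", "plant"))      # industrial facility
print(pick_sense("root of a", "plant"))  # organism
```

The thesis's contribution lies in the measures and models that turn raw, noisy Web text into such sense-specific evidence at scale; this sketch only shows the final decision step.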
|
46 |
Interopérabilité Sémantique Multi-lingue des Ressources Lexicales en Données Liées Ouvertes / Semantic Interoperability of Multilingual Lexical Resources in Lexical Linked Data. Tchechmedjiev, Andon, 14 October 2016 (has links)
Lorsqu’il s’agit de la construction de ressources lexico-sémantiques multilingues, la première chose qui vient à l’esprit est la nécessité que les ressources à aligner partagent le même format de données et la même représentation (interopérabilité représentationnelle). Avec l’apparition de standards tels que LMF et leur adaptation au web sémantique pour la production de ressources lexico-sémantiques multilingues en tant que données lexicales liées ouvertes (Ontolex), l’interopérabilité représentationnelle n’est plus un verrou majeur. Cependant, en ce qui concerne l’interopérabilité des alignements multilingues, le choix et la construction du pivot interlingue est l’un des obstacles principaux. Pour nombre de ressources (par ex. BabelNet, EuroWordNet), le choix est fait d’utiliser l’anglais, ou une autre langue, comme pivot interlingue. Ce choix mène à une perte de contraste dans les cas où des sens du pivot ont des lexicalisations différentes dans la même acception dans plusieurs autres langues. L’utilisation d’un pivot à acceptions interlingues, solution proposée il y a déjà plus de 20 ans, pourrait être viable. Néanmoins, leur construction manuelle est trop ardue du fait du manque d’experts parlant assez de langues, et leur construction automatique pose problème du fait de l’absence d’une formalisation et d’une caractérisation axiomatique permettant de garantir leurs propriétés. Nous proposons dans cette thèse de formaliser d’abord l’architecture à pivot interlingue par acceptions, en développant une axiomatisation garantissant leurs propriétés. Nous proposons ensuite des algorithmes de construction initiale automatique utilisant les propriétés combinatoires du graphe des alignements bilingues, mais aussi des algorithmes de mise à jour garantissant l’interopérabilité dynamique.
Dans un deuxième temps, nous étudions de manière plus pratique DBNary, une extraction périodique de Wiktionary dans de nombreuses éditions de langue, afin de cerner les contraintes pratiques à l’application des algorithmes proposés. / When it comes to the construction of multilingual lexico-semantic resources, the first thing that comes to mind is that the resources we want to align should share the same data model and format (representational interoperability). With the emergence of standards such as LMF and their implementation and widespread use for the production of resources as lexical linked data (Ontolex), representational interoperability has ceased to be a major challenge for the production of large-scale multilingual resources. However, as far as the interoperability of sense-level multilingual alignments is concerned, a major challenge is the choice of a suitable interlingual pivot. Many resources make the choice of using English senses as the pivot (e.g. BabelNet, EuroWordNet), although this choice leads to a loss of contrast between English senses that are lexicalized with different words in other languages. The use of acception-based interlingual representations, a solution proposed over 20 years ago, could be viable. However, the manual construction of such language-independent pivot representations is very difficult due to the lack of experts speaking enough languages fluently, and algorithms for their automatic construction have never materialized, mainly because of the lack of a formal axiomatic characterization that ensures the preservation of their correctness properties. In this thesis, we address this issue by first formalizing acception-based interlingual pivot architectures through a set of axiomatic constraints and rules that guarantee their correctness.
Then, we propose algorithms for the initial construction and the update (dynamic interoperability) of interlingual acception-based multilingual resources by exploiting the combinatorial properties of pairwise bilingual translation graphs. Secondly, we study the practical considerations of applying our construction algorithms to a tangible resource, DBNary, which is periodically extracted from Wiktionary in many language editions and published as lexical linked data.
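One way to read "exploiting the combinatorial properties of pairwise bilingual translation graphs" is as grouping senses into connected components, each a candidate interlingual acception. This is a simplified sketch: the senses and translation links are invented, and the real algorithms enforce axiomatic constraints beyond mere connectivity.

```python
from collections import defaultdict

# Invented sense-level translation links across language editions.
# Note how keeping fr 'fleuve' and 'rivière' as distinct senses preserves
# a contrast that an English-only pivot would collapse.
translations = [
    (("en", "river#1"), ("fr", "fleuve#1")),
    (("en", "river#1"), ("fr", "rivière#1")),
    (("fr", "rivière#1"), ("de", "Fluss#1")),
    (("en", "bank#2"), ("fr", "banque#1")),
]

def acceptions(edges):
    """Group senses into connected components; each is a candidate acception."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first traversal of one component
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n] - comp)
        seen |= comp
        components.append(comp)
    return components

for comp in acceptions(translations):
    print(sorted(comp))
```

An update algorithm for dynamic interoperability would then incrementally merge or split components as translation links are added or retracted, rather than recomputing from scratch.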
|
47 |
Desambiguação de autores em bibliotecas digitais utilizando redes sociais e programação genética / Author name disambiguation in digital libraries using social networks and genetic programming. Levin, Felipe Hoppe, January 2010 (has links)
Bibliotecas digitais tornaram-se uma importante fonte de informação para comunidades científicas. Entretanto, por coletar dados de diferentes fontes, surge o problema de informações ambíguas ou duplicadas de nomes de autores. Métodos tradicionais de desambiguação de nomes utilizam informação sintática de atributos. Todavia, recentemente o uso de redes de relacionamentos, que traz informação semântica, tem sido estudado em desambiguação de dados. Em desambiguação de nomes de autores, relações de co-autoria podem ser usadas para criar uma rede social, que pode ser utilizada para melhorar métodos de desambiguação de nomes de autores. Esta dissertação apresenta um estudo do impacto de adicionar análise de redes sociais a métodos de desambiguação de nomes de autores baseados em informação sintática de atributos. Nós apresentamos uma abordagem de aprendizagem de máquina baseada em Programação Genética e a utilizamos para avaliar o impacto de adicionar análise de redes sociais a desambiguação de nomes de autores. Através de experimentos usando subconjuntos de bibliotecas digitais reais, nós demonstramos que o uso de análise de redes sociais melhora de forma significativa a qualidade dos resultados. Adicionalmente, nós demonstramos que as funções de casamento criadas por nossa abordagem baseada em Programação Genética são capazes de competir com métodos do estado da arte. / Digital libraries have become an important source of information for scientific communities. However, by gathering data from different sources, the problem of duplicate and ambiguous information about author names arises. Traditional methods of name disambiguation use syntactic attribute information. However, recently the use of relationship networks, which provides semantic information, has been studied in data disambiguation. In author name disambiguation, the co-authorship relations can be used to create a social network, which can be used to improve author name disambiguation methods. 
This dissertation presents a study of the impact of adding social network analysis to author name disambiguation methods based on syntactic attribute information. We present a machine learning approach based on Genetic Programming and use it to evaluate the impact of social network analysis in author name disambiguation. Through experiments using subsets of real digital libraries, we show that the use of social network analysis significantly improves the quality of results. Also, we demonstrate that match functions created by our Genetic Programming approach are able to compete with state-of-the-art methods.
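The kind of match function that Genetic Programming might evolve, combining a syntactic name-similarity feature with a social (co-authorship) feature, can be sketched as follows. The weights, threshold, similarity measures, and records are illustrative assumptions, not the evolved functions or data from the dissertation.

```python
def name_similarity(a, b):
    """Crude token-overlap (Jaccard) similarity between two author name strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def coauthor_overlap(ca, cb):
    """Jaccard overlap between the two records' co-author sets."""
    if not ca and not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)

def same_author(rec_a, rec_b, w_syntax=0.5, w_social=0.5, threshold=0.4):
    """Evolved-style match function: a weighted blend of both evidence sources."""
    score = (w_syntax * name_similarity(rec_a["name"], rec_b["name"])
             + w_social * coauthor_overlap(rec_a["coauthors"], rec_b["coauthors"]))
    return score >= threshold

# Invented citation records with ambiguous name forms
a = {"name": "F. H. Levin", "coauthors": {"C. Heuser", "M. Moro"}}
b = {"name": "F. Levin", "coauthors": {"C. Heuser", "V. Moreira"}}
print(same_author(a, b))  # True
```

Genetic Programming's role is to search the space of such expression trees (features, operators, weights) and keep the functions that best separate matching from non-matching record pairs on training data.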
|
50 |
Sémantická informace ze sítě FrameNet a možnosti jejího využití pro česká data / Semantic information from FrameNet and the possibility of its transfer to Czech data. Limburská, Adéla, January 2016 (has links)
The thesis focuses on transferring FrameNet annotation from English to Czech and on the possibilities of using the resulting data for automatic frame prediction in Czech. The first part, annotation transfer, was performed in two ways. First, a parallel corpus of English sentences and their human-created Czech translations (PCEDT) was used. Second, a much larger parallel corpus was created using machine translation of FrameNet example sentences; this corpus was then used to transfer the annotation as well. The resulting data were partially evaluated, and some of the automatically detectable errors were filtered out. Subsequently, the data were used as input for two machine learning methods, decision trees and support vector machines. Since neither of the machine learning experiments brought impressive results, further manual correction of the data annotation was performed, which helped increase the accuracy of the prediction. However, as the accuracy reported in related papers is notably higher, the thesis also discusses different approaches to feature selection and the possibility of further improvement of the prediction results using these methods.
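The annotation-transfer step, projecting frame labels through a word alignment from English tokens onto their Czech counterparts, can be sketched minimally as follows. The sentences, alignment pairs, and frame label are invented for illustration; a real pipeline would obtain alignments from a word aligner over the parallel corpus.

```python
def project_frames(en_tokens, cs_tokens, alignment, en_frames):
    """Carry frame labels from aligned English tokens onto Czech tokens.

    alignment: list of (en_index, cs_index) pairs from a word aligner.
    en_frames: {en_index: frame_name} for frame-evoking English tokens.
    """
    cs_frames = [None] * len(cs_tokens)
    for en_i, cs_i in alignment:
        frame = en_frames.get(en_i)
        if frame is not None:
            cs_frames[cs_i] = frame
    return cs_frames

en = ["He", "bought", "a", "house"]
cs = ["Koupil", "dům"]
alignment = [(1, 0), (3, 1)]          # bought -> Koupil, house -> dům
en_frames = {1: "Commerce_buy"}       # 'bought' evokes the Commerce_buy frame
print(project_frames(en, cs, alignment, en_frames))  # ['Commerce_buy', None]
```

The projected (Czech token, frame) pairs would then become training instances for the decision tree and SVM classifiers, which is where alignment errors propagate and why the filtering and manual correction steps matter.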
|