Global ETD Search

1	Combining text-based and vision-based semantics / Combining text-based and vision-based semantics Tran, Binh Giang January 2011 (has links) Learning and representing semantics is one of the most important tasks that significantly contribute to some growing areas, as successful stories in the recent survey of Turney and Pantel (2010). In this thesis, we present an in- novative (and first) framework for creating a multimodal distributional semantic model from state of the art text-and image-based semantic models. We evaluate this multimodal semantic model on simulating similarity judgements, concept clustering and the newly introduced BLESS benchmark. We also propose an effective algorithm, namely Parameter Estimation, to integrate text- and image- based features in order to have a robust multimodal system. By experiments, we show that our technique is very promising. Across all experiments, our best multimodal model claims the first position. By relatively comparing with other text-based models, we are justified to affirm that our model can stay in the top line with other state of the art models. We explore various types of visual features including SIFT and other color SIFT channels in order to have prelim- inary insights about how computer-vision techniques should be applied in the natural language processing domain. Importantly, in this thesis, we show evi- dences that adding visual features (as the perceptual information coming from...
2	Semantic disambiguation using Distributional Semantics / Semantic disambiguation using Distributional Semantics Prodanovic, Srdjan January 2012 (has links) Ve statistických modelů sémantiky jsou významy slov pouze na základě jejich distribuční vlastnosti.Základní zdroj je zde jeden slovník, který lze použít pro různé úkoly, kde se význam slov reprezentovány jako vektory v vektorového prostoru, a slovní podoby jako vzdálenosti mezi jejich vektorových osobnosti. Pomocí silných podobnosti, může vhodnost podmínek uvedených zejména v souvislosti se vypočítá a používá pro celou řadu úkolů, jeden z nich je slovo smysl Disambiguation. V této práci bylo vyšetřeno několik různých přístupů k modelům z vektorového prostoru a prováděny tak, aby k překročení vyhodnocení vlastního výkonu na Word Sense disambiguation úkolem Prague Dependency Treebank.
3	Unsupervised learning for text-to-speech synthesis Watts, Oliver Samuel January 2013 (has links) This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources are in existence. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.
4	Shlukování textových dokumentů a jejich částí / Shlukování textových dokumentů a jejich částí Zápotocký, Radoslav January 2011 (has links) This thesis analyses use of vector-space model and data clustering approaches on parts of single document - on chapters, paragraphs and sentences. A simulation application (SimDIS), written in C# programming language is also part of this thesis. The application implements the adjusted model and provides tools for visualization of vectors and clusters.
5	Shlukování textových dokumentů a jejich částí / Shlukování textových dokumentů a jejich částí Zápotocký, Radoslav January 2011 (has links) This thesis analyses use of vector-space model and data clustering approaches on parts of single document - on chapters, paragraphs and sentences - to allow simple navigation between similar parts. A simulation application (SimDIS), written in C# programming language is also part of this thesis. The application implements the described model and provides tools for visualization of vectors and clusters.
6	Aspect Mining Using Model-Based Clustering Rand McFadden, Renata 01 January 2011 (has links) Legacy systems contain critical and complex business code that has been in use for a long time. This code is difficult to understand, maintain, and evolve, in large part due to crosscutting concerns: software system features, such as persistence, logging, and error handling, whose implementation is spread across multiple modules. Aspect-oriented techniques separate crosscutting concerns from the base code, using separate modules called aspects and, thus, simplifying the legacy code. Aspect mining techniques identify aspect candidates so that the legacy code can be refactored into aspects. This study investigated an automated aspect mining method in which a vector-space model clustering approach was used with model-based clustering. The vector-space model clustering approach has been researched for aspect mining using a number of different heuristic clustering methods and producing mixed results. Prior to this study, this model had not been researched with model-based algorithms, even though they have grown in popularity because they lend themselves to statistical analysis and show results that are as good as or better than heuristic clustering methods. This study investigated the effectiveness of model-based clustering for identifying aspects when compared against heuristic methods, such as k-means clustering and agglomerative hierarchical clustering, using six different vector-space models. The study's results indicated that model-based clustering can, in fact, be more effective than heuristic methods and showed good promise for aspect mining. In general, model-based algorithms performed better in not spreading the methods of the concerns across the multiple clusters but did not perform as well in not mixing multiple concerns in the same cluster. Model-based algorithms were also significantly better at partitioning the data such that, given an ordered list of clusters, fewer clusters and methods would need to be analyzed to find all the concerns. In addition, model-based algorithms automatically determined the optimal number of clusters, which was a great advantage over heuristic-based algorithms. Lastly, the study found that the new vector-space models performed better, relative to aspect mining, than previously defined vector-space models. aspect mining aspect-oriented programming crosscutting concerns model-based clustering vector-space model Computer Sciences
7	Grid-Enabled Automatic Web Page Classification Metikurke, Seema Sreenivasamurthy 12 June 2006 (has links) Much research has been conducted on the retrieval and classification of web-based information. A big challenge is the performance issue, especially for a classification algorithm returning results for a large set of data that is typical when accessing the Web. This thesis describes a grid-enabled approach for automatic web page classification. The basic approach is first described that uses a vector space model (VSM). An enhancement of the approach through the use of a genetic algorithm (GA) is then described. The enhanced approach can efficiently process candidate web pages from a number of web sites and classify them. A prototype is implemented and empirical studies are conducted. The contributions of this thesis are: 1) Application of grid computing to improve performance of both VSM and GA using VSM based web page classification; 2) Improvement of the VSM classification algorithm by applying GA that uniquely discovers a set of training web pages while also generating a near optimal parameter values set for VSM. Automatic Web Page Classification Vector Space Model Genetic Algorithm Grid Computing Computer Sciences
8	Search Queries in an Information Retrieval System for Arabic-Language Texts Albujasim, Zainab Majeed 01 January 2014 (has links) Information retrieval aims to extract from a large collection of data a subset of information that is relevant to user’s needs. In this study, we are interested in information retrieval in Arabic-Language text documents. We focus on the Arabic language, its morphological features that potentially impact the implementation and performance of an information retrieval system and its unique characters that are absent in the Latin alphabet and require specialized approaches. Specifically, we report on the design, implementation and evaluation of the search functionality using the Vector Space Model with several weighting schemes. Our implementation uses the ISRI stemming algorithms as the underlying stemming technique and the general Arabic stop word list for building inverted indices for Arabic-language documents. We evaluate our implementation on a corpus consisting of selected technical papers published in Arabic-language journals. We use the Open Journal Systems (OJS) from the Public Knowledge Project as a repository for the corpus used in the evaluation. We evaluate the performance of our implementation of the search using a classic recall/precision approach and compare it to one of the default multilingual search functions supported in the OJS. Our experimental analysis suggests that stemming is an effective technique for searches in Arabic-language texts that improves the quality of the information retrieval system. Open Journal Arabic language Vector Space Model information retrieval ranking schemes Databases and Information Systems
9	Nyckelordssökning : Baserat på Vector Space Model / Keyword search : Based on Vector Space Model Borg, Oskar January 2013 (has links) Då mängden information bara ökar, så ökar även behovet att ha åtkomst till informationen lättillgängligt. Detta skapar då ett behov för ett gränssnitt som kan söka bland informationen. I detta arbete har det undersökts om en implementation av Vector Space Model ger mera relevanta resultat jämfört mot en enklare implementation som inte baseras på Vector Space Model. Sökningen utförs i en relationsdatabas med ett inverterat index, databasen fylls med data ifrån internetforumet Stack Overflow. Genom att bygga en sökmotor som returnerade två olika resultatlistor för varje sökning så fick tio användare testa och utvärdera resultatens relevans. Resultatet av testerna visade att Vector Space Model ger mer relevanta resultat dock till en kostnad av söktiden. Vector Space Model Inventerad index Nyckelordssökning Sökmotor Computer Sciences Datavetenskap (datalogi)
10	Authorship classification using the Vector Space Model and kernel methods Westin, Emil January 2020 (has links) Authorship identification is the field of classifying a given text by its author based on the assumption that authors exhibit unique writing styles. This thesis investigates the semantic shortcomings of the vector space model by constructing a semantic kernel created from WordNet which is evaluated on the problem of authorship attribution. A multiclass SVM classifier is constructed using the one-versus-all strategy and evaluated in terms of precision, recall, accuracy and F1 scores. Results show that the use of the semantic scores from WordNet degrades the performance compared to using a linear kernel. Experiments are run to identify the best feature engineering configurations, showing that removing stopwords has a positive effect on the financial dataset Reuters while the Kaggle dataset consisting of short extracts of horror stories benefit from keeping the stopwords. vector space model semantic kernel support vector machine bag-of-words Probability Theory and Statistics Sannolikhetsteori och statistik

Search results