About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Measuring Semantic Relatedness Using Salient Encyclopedic Concepts

Hassan, Samer 08 1900 (has links)
While pragmatics, through its integration of situational awareness and real-world knowledge, offers a high level of analysis suitable for the true interpretation of natural dialogue, semantics represents a lower yet more tractable and affordable level of linguistic analysis given current technologies. The understanding of semantic meaning in the literature has generally revolved around the famous quote "You shall know a word by the company it keeps." In this thesis we investigate the role of context constituents in decoding the semantic meaning of the surrounding context; specifically, we probe the role of salient concepts, defined as content-bearing expressions that afford encyclopedic definitions, as a source of semantic clues for an unambiguous interpretation of context. Furthermore, we integrate this world knowledge into a new and robust unsupervised semantic model and apply it to infer semantic relatedness between textual pairs, whether words, sentences, or paragraphs. We also explore the abstraction of semantics across languages and use our findings to build a novel multilingual semantic relatedness model that exploits information acquired from several languages. We demonstrate the effectiveness and superiority of our monolingual and multilingual models through a comprehensive set of evaluations on synthetic datasets specialized for semantic relatedness, as well as on real-world applications such as paraphrase detection and short-answer grading. Our work represents a novel approach to integrating world knowledge into current semantic models and a means of crossing the language boundary toward a better and more robust representation of semantic relatedness, opening the door to an improved abstraction of meaning that could ultimately impart understanding of natural language to machines.
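A rough sketch of the kind of concept-based relatedness the abstract describes, in the spirit of mapping texts onto vectors of encyclopedic concepts and comparing them by cosine. The three "articles" and their definitions below are invented stand-ins for a real tf-idf index over encyclopedia-scale data, not the thesis's actual model:

```python
import math
from collections import Counter, defaultdict

# Toy "encyclopedic" corpus: each salient concept with a one-line definition.
# These are hypothetical stand-ins for real encyclopedia articles.
concepts = {
    "Guitar": "stringed musical instrument played by strumming",
    "Piano": "keyboard musical instrument with hammers and strings",
    "Car": "wheeled motor vehicle used for transportation",
}

# Inverted index: word -> concept weights (raw counts here; tf-idf in practice).
word_to_concepts = defaultdict(Counter)
for concept, text in concepts.items():
    for word in text.split():
        word_to_concepts[word][concept] += 1

def concept_vector(text):
    """Represent a text as the sum of its words' concept vectors."""
    vec = Counter()
    for word in text.lower().split():
        vec.update(word_to_concepts.get(word, {}))
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

a = concept_vector("a strummed stringed instrument")
b = concept_vector("keyboard instrument with strings")
c = concept_vector("motor vehicle")
assert cosine(a, b) > cosine(a, c)  # the two musical texts are closer
```

Because relatedness is measured between concept vectors rather than surface words, two texts sharing no words can still score as related, which is what lets such a model compare words, sentences, or paragraphs uniformly.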
2

The Role of Function, Homogeneity and Syntax in Creative Performance on the Uses of Objects Task

Forster, Evelyn 24 February 2009 (has links)
The Uses of Objects Task is a widely used assessment of creative performance, but it relies on subjective scoring methods for evaluation. A new version of the task was devised using Latent Semantic Analysis (LSA), a computational tool used to measure semantic distance. 135 participants provided as many creative uses as they could for 20 separate objects. Responses were analyzed for strategy use, category switching, variety, and originality, as well as a subjective measure of creativity by independent raters. The LSA originality measure was more reliable than the subjective measure, and values averaged over participants correlated with both subjective evaluations and self-assessment of creativity. The score appeared to successfully isolate the creativity of the people themselves, rather than the potential creativity afforded by a given object.
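An LSA-style originality score of the kind described might be sketched as follows: embed responses in a reduced latent space via a truncated SVD of a term-document matrix, then score a response by its distance (1 − cosine) from other responses. The tiny matrix of hypothetical "uses of a brick" counts is invented for illustration; the thesis's LSA space is trained on a large corpus:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = responses ("uses of a
# brick"); the counts are hypothetical and only illustrate the mechanics.
docs = np.array([
    [2, 0, 0],   # build
    [1, 0, 0],   # wall
    [1, 0, 0],   # house
    [0, 2, 0],   # paperweight
    [0, 1, 1],   # doorstop
    [0, 0, 2],   # art
], dtype=float)

# Truncated SVD projects each response into a k-dimensional latent space.
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one row per response

def semantic_distance(i, j):
    """1 - cosine similarity in the latent space; larger = more distant."""
    u, v = doc_vecs[i], doc_vecs[j]
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The construction-flavored response (0) sits farther from the art-flavored
# one (2) than the doorstop response (1) does.
assert semantic_distance(0, 2) > semantic_distance(1, 2)
```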
4

Semantic Search with Information Integration

Xian, Yikun, Zhang, Liu January 2011 (has links)
Since the first search engine was released in 1993, development has never slowed down, and various search engines have emerged to vie for popularity. However, traditional search engines such as Google and Yahoo! are keyword-based, which leads to imprecise results and information redundancy. A new kind of search engine built on semantic analysis could be the alternative in the future: it is more intelligent and informative, and provides better interaction with users. This thesis discusses semantic search in detail, explains the advantages of semantic search over keyword-based search, and describes how to integrate semantic analysis with common search engines. The thesis ends with an example implementation of a simple semantic search engine.
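As a minimal illustration of the keyword-versus-semantic contrast drawn above (not the thesis's actual implementation), a semantic layer can map surface words onto shared concepts before matching, so a query can find documents that never contain the query word. The concept map below is hand-built and hypothetical, standing in for a real ontology or embedding model:

```python
# Toy contrast between literal keyword matching and concept-level matching.
docs = {
    1: "buy a used automobile online",
    2: "fresh fruit delivered to your door",
}

# Hand-built concept map (hypothetical) in place of a real semantic resource.
concept = {"car": "VEHICLE", "automobile": "VEHICLE",
           "fruit": "FOOD", "apple": "FOOD"}

def keyword_search(query):
    return [d for d, text in docs.items() if query in text.split()]

def semantic_search(query):
    q = concept.get(query)
    if q is None:                      # unknown word: fall back to keywords
        return keyword_search(query)
    return [d for d, text in docs.items()
            if any(concept.get(w) == q for w in text.split())]

assert keyword_search("car") == []     # literal match misses document 1
assert semantic_search("car") == [1]   # concept-level match finds it
```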
5

Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers

Anaya, Leticia H. 12 1900 (has links)
In the Information Age, unstructured electronic text documents have proliferated. Processing these documents is a daunting task for humans, who have limited cognitive capacity for large volumes of often lengthy documents. To address this problem, computer algorithms for processing text data are being developed. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are two such algorithms that have individually received much attention in the literature for topic extraction, but not for document classification or for comparison studies. Since classification is considered an important human function and has been studied in cognitive science and information science, this dissertation presents a study comparing LDA, LSA, and humans as document classifiers. The research questions posed are: R1: How accurate are LDA and LSA in classifying documents in a corpus of textual data over a known set of topics? R2: How accurate are humans in performing the same classification task? R3: How does LDA classification performance compare to LSA classification performance? To address these questions, a classification study involving human subjects was designed in which humans were asked to generate and classify documents (customer comments) at two levels of abstraction for a quality assurance setting. The two computer algorithms, LSA and LDA, were then used to classify the same documents. The results indicate that humans outperformed both computer algorithms, with accuracy rates of 94% at the higher level of abstraction and 76% at the lower level. At the higher level, the accuracy rate was 84% for both LSA and LDA; at the lower level, the accuracy rates were 67% for LSA and 64% for LDA. The findings have strong implications for the improvement of information systems that process unstructured text.
Document classifiers have potential applications in many fields (e.g., fraud detection, information retrieval, national security, and customer management). The development and refinement of algorithms that classify text is a fruitful area of ongoing research, and this dissertation contributes to it.
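A nearest-centroid classifier in an LSA space gives the flavor of one side of this comparison (LDA classification is analogous, with inferred topic proportions in place of latent coordinates). The term counts and class labels below are invented for illustration and do not come from the dissertation's data:

```python
import numpy as np

# Toy corpus with two known topics; rows = terms, columns = training documents.
X = np.array([
    [2, 1, 0, 0],  # broken
    [1, 2, 0, 0],  # defect
    [0, 0, 2, 1],  # late
    [0, 0, 1, 2],  # shipping
], dtype=float)
labels = ["quality", "quality", "delivery", "delivery"]

# LSA: a truncated SVD gives each document k latent coordinates.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
train = X.T @ U[:, :k]                 # fold training documents in
centroids = {lab: train[[i for i, l in enumerate(labels) if l == lab]].mean(axis=0)
             for lab in set(labels)}

def classify(counts):
    """Assign a new term-count vector to the nearest class centroid (cosine)."""
    v = np.asarray(counts, float) @ U[:, :k]
    return max(centroids, key=lambda lab: v @ centroids[lab] /
               (np.linalg.norm(v) * np.linalg.norm(centroids[lab])))

assert classify([1, 1, 0, 0]) == "quality"   # comment about a defect
assert classify([0, 0, 1, 1]) == "delivery"  # comment about shipping
```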
6

Social Tag-based Community Recommendation Using Latent Semantic Analysis

Akther, Aysha 07 September 2012 (has links)
Collaboration and sharing of information are the basis of modern social web systems. Users of these systems establish and join online communities in order to collectively share their content with a group of people having a common topic of interest. Group and community activity has increased exponentially in modern social web systems, and with the explosive growth of social communities, users have experienced considerable difficulty discovering communities relevant to their interests. In this study, we address the problem of recommending communities to individual users. Recommender techniques based solely on community affiliation may fail to find a wide range of suitable communities for users when the available data are insufficient. We instead regard this problem as a tag-based personalized search. Based on the social tags used by members of communities, we first represent communities in a low-dimensional space, the so-called latent semantic space, using Latent Semantic Analysis. Then, to recommend communities to a given user, we capture how each community is relevant both to the user's personal tag usage and to other community members' tagging patterns in the latent space. We focus especially on the challenging problem of recommending communities to users who have joined very few communities or who have no prior community membership. Our evaluation on two heterogeneous datasets shows that our approach can significantly improve recommendation quality.
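The tag-profile matching step might look roughly like this sketch, which ranks communities by cosine similarity between the user's tag profile and each community's aggregate tags. It omits the LSA projection into latent space that the thesis adds on top, and all tags and communities below are hypothetical:

```python
import math
from collections import Counter

# Hypothetical communities described by their members' social tags.
community_tags = {
    "photo_club": Counter(["camera", "lens", "photo", "photo"]),
    "hiking":     Counter(["trail", "mountain", "camera"]),
    "cooking":    Counter(["recipe", "baking"]),
}
user_tags = Counter(["photo", "camera", "lens"])

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user, n=2):
    """Rank communities by similarity between user and community tag profiles."""
    return sorted(community_tags, key=lambda c: -cosine(user, community_tags[c]))[:n]

assert recommend(user_tags) == ["photo_club", "hiking"]
```

Projecting both user and community tag vectors into a shared latent space, as the thesis does, additionally lets near-synonymous tags reinforce each other, which matters for the cold-start users the abstract highlights.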
7

Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling

Shen, Yao 08 1900 (has links)
Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features can be considered the major characteristic of the object for either object recognition or object tracking. Features extracted from a training image can be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image are detectable even under changes in image scale, noise, and illumination. Scale-invariant features have wide applications in image processing, such as image classification, object recognition, and object tracking. In this thesis, color features and SIFT (scale-invariant feature transform) are considered as scale-invariant features. Classification, recognition, and tracking results were evaluated with a novel evaluation criterion and compared with existing methods. I also studied different types of scale-invariant features for solving scene analysis problems. I propose probabilistic models as the foundation for analyzing scene scenarios in images. In order to differentiate the content of images, I develop novel algorithms for the adaptive combination of multiple features extracted from images. I demonstrate the performance of the developed algorithms on several scene analysis tasks, including object tracking, video stabilization, medical video segmentation, and scene classification.
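One standard ingredient of locating a trained object in a cluttered test image is matching scale-invariant descriptors with a nearest-neighbor ratio test (rejecting matches whose best and second-best distances are too close). The sketch below uses synthetic random descriptors as stand-ins for real SIFT output; it is not the thesis's pipeline:

```python
import numpy as np

# Synthetic descriptors: 5 from a "training image", 4 from a "test image",
# one of which is a noisy copy of a training descriptor (the true match).
rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5, 8))                   # from the training image
test_desc = np.vstack([
    train_desc[2] + 0.01 * rng.normal(size=8),         # near-copy: true match
    rng.normal(size=(3, 8)),                           # unrelated clutter
])

def match(query, database, ratio=0.8):
    """Index of the best match, or None if the two nearest are too close."""
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    return int(order[0]) if best < ratio * second else None

assert match(test_desc[0], train_desc) == 2   # the near-copy finds its source
```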
8

Contributions to music semantic analysis and its acceleration techniques

Gao, Boyang 15 December 2014 (has links)
Digitized music production has exploded in the past decade. This huge amount of data drives the development of effective and efficient methods for automatic music analysis and retrieval. This thesis focuses on the semantic analysis of music, in particular mood and genre classification, using low-level and mid-level features, since mood and genre are among the most natural semantic concepts expressed by music and perceivable by audiences. In order to derive semantics from low-level features, feature modeling techniques such as K-means- and GMM-based bag-of-words (BoW) and Gaussian super vectors are applied. Given the quantity of data involved, time and accuracy efficiency are the main issues in low-level feature modeling. Our first contribution therefore focuses on accelerating the k-means, GMM, and UBM-MAP frameworks, both on a single machine and on clusters of workstations. To achieve maximum speed on a single machine, we show that the dictionary-learning procedures can be rewritten in a matrix format that is accelerated efficiently by high-performance parallel computing infrastructures such as multi-core CPUs and GPUs. In particular, with GPU support and careful tuning, we achieved a speedup of two orders of magnitude compared with a single-threaded implementation.
For datasets that cannot fit into the memory of an individual computer, we show that the k-means and GMM training procedures can be divided into a map-reduce pattern executable on Hadoop and Spark clusters. Using our matrix format on such clusters, execution is 5 to 10 times faster than with state-of-the-art acceleration libraries. Besides signal-level features, mid-level features such as musical harmony, the most natural semantics given by the composer, are also important, since they carry a higher level of abstraction of meaning beyond the physical waveform. Our second contribution therefore focuses on recovering note information from the music signal using musical knowledge. This contribution relies on two levels of musical knowledge: instrument note sounds and note co-occurrence/transition statistics. At the instrument-note level, a note dictionary is first built from the MIDI synthesizer of Logic Pro 9. With this musical dictionary in hand, we propose a positive constraint matching pursuit (PCMP) algorithm to perform the decomposition of the music signal. At the inter-note level, we propose a two-stage sparse decomposition approach that integrates note statistics. In the frame-level decomposition stage, note co-occurrence probabilities are embedded to guide atom selection and to build a sparse multiple-candidate graph providing backup choices for later selections. In the global optimal path-searching stage, note transition probabilities are incorporated. Experiments on multiple datasets show that our proposed approaches outperform the state of the art in terms of accuracy and recall for note recovery and music mood/genre classification.
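The matrix rewriting described above can be illustrated with the assignment step of k-means: expanding the squared distance as ||x − c||² = ||x||² − 2x·c + ||c||² lets a single matrix product produce all N×K distances, which is what allows BLAS or GPU kernels to carry the load. This is a generic sketch of the idea, not the thesis's code:

```python
import numpy as np

# Assignment step of k-means in matrix form: all N*K squared distances come
# from one matrix product instead of a double Python loop.
def assign_matrix(X, C):
    sq = (X**2).sum(1)[:, None] - 2 * X @ C.T + (C**2).sum(1)[None, :]
    return sq.argmin(axis=1)

def assign_loop(X, C):
    """Reference double-loop implementation for checking the matrix version."""
    return np.array([min(range(len(C)), key=lambda k: ((x - C[k])**2).sum())
                     for x in X])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))   # 200 feature vectors
C = rng.normal(size=(4, 16))     # 4 cluster centers
assert (assign_matrix(X, C) == assign_loop(X, C)).all()
```

The same expansion map-reduces naturally: each worker computes assignments and partial sums for its shard of X, and a reduce step merges the sums to update C, which is the shape of the Hadoop/Spark division mentioned above.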
9

The role of symbolism in Tshivenda discourse : a semantic analysis

Nengovhela, Rofhiwa Emmanuel January 2010 (has links)
Thesis (M.A.) -- University of Limpopo, 2010.
