21

Clustering Multilingual Documents: A Latent Semantic Indexing Based Approach

Lin, Chia-min 09 February 2006 (has links)
Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates/acquires and subsequently archives documents in different languages, creating the need for multilingual document clustering (MLDC). Motivated by this need, this study designs a Latent Semantic Indexing (LSI) based MLDC technique. Our empirical evaluation results show that the proposed LSI-based multilingual document clustering technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision.
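As an illustration of the general pipeline such a technique builds on, here is a minimal sketch (not the thesis's exact method) of LSI-based clustering: project a tf-idf term-document matrix into a latent space with a truncated SVD and run k-means there. The toy corpus and all parameters are placeholders; the cross-language step, e.g. fitting the SVD on merged parallel documents so that both languages share one latent space, is the part the thesis contributes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Toy bilingual stand-in corpus (English/German); placeholders only.
docs = [
    "stock market shares rise",
    "stocks and shares fall in the market",
    "der aktienmarkt steigt stark",
    "aktien fallen am markt",
    "soccer match ends with late goal",
    "fussball spiel endet mit spaetem tor",
]

lsi = make_pipeline(
    TfidfVectorizer(),              # term-document matrix
    TruncatedSVD(n_components=2),   # project into a latent semantic space
    Normalizer(copy=False),         # unit vectors, so k-means ~ cosine similarity
)
X = lsi.fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)
```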
22

Probabilistic Latent Semantic Analysis Based Framework For Hybrid Social Recommender Systems

Eryol, Erkin 01 June 2010 (has links) (PDF)
Today, user-annotated internet sites, user interaction logs, and online user communities are valuable sources of information for the personalized recommendation problem. In the literature, hybrid social recommender systems have been proposed to reduce the sparsity of the usage data by integrating user-related information sources. In this thesis, a method based on probabilistic latent semantic analysis is used as a framework for a hybrid social recommendation system, and different data hybridization approaches on top of probabilistic latent semantic analysis are evaluated experimentally. Based on this flexible probabilistic model, network regularization and model blending are applied to the probabilistic latent semantic analysis model as a way to exploit the social trust network during collaborative filtering. The proposed model outperformed the baseline methods in our experiments. As a result of the research, it is shown that the proposed methods successfully model the rating and social trust data together in a theoretically principled way.
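For readers unfamiliar with the underlying model, the following is a minimal sketch of plain pLSA fitted by EM on a user-item count matrix; the thesis's hybridization steps (network regularization and model blending over a trust network) are not shown, and the data and parameters are illustrative.

```python
import numpy as np

def plsa(N, n_topics=2, iters=50, seed=0):
    """Fit plain pLSA by EM on a user-item count matrix N (users x items)."""
    rng = np.random.default_rng(seed)
    U, I = N.shape
    Pz = np.full(n_topics, 1.0 / n_topics)              # P(z)
    Puz = rng.random((U, n_topics)); Puz /= Puz.sum(0)  # P(u|z)
    Piz = rng.random((I, n_topics)); Piz /= Piz.sum(0)  # P(i|z)
    for _ in range(iters):
        # E-step: posterior responsibility P(z|u,i) for every (user, item) pair
        joint = Pz[None, None, :] * Puz[:, None, :] * Piz[None, :, :]
        post = joint / joint.sum(-1, keepdims=True)
        # M-step: re-estimate parameters from expected counts
        w = N[:, :, None] * post
        Puz = w.sum(1); Puz /= Puz.sum(0)
        Piz = w.sum(0); Piz /= Piz.sum(0)
        Pz = w.sum((0, 1)); Pz /= Pz.sum()
    return Pz, Puz, Piz

# Two blocks of users/items with overlapping tastes (toy data).
N = np.array([[5, 4, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 5, 4],
              [1, 0, 4, 5]])
Pz, Puz, Piz = plsa(N)
```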
23

Text Summarization Using Latent Semantic Analysis

Ozsoy, Makbule Gulcin 01 February 2011 (has links) (PDF)
Text summarization addresses the problem of presenting the information needed by a user in a compact form. There are different approaches to creating well-formed summaries in the literature. One of the newest methods in text summarization is the Latent Semantic Analysis (LSA) method. In this thesis, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish and English documents, and their performances are compared using their ROUGE scores.
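As background, here is a sketch of the classic Gong and Liu selection step that LSA summarizers commonly start from (the thesis's two new algorithms differ in how they use the SVD factors): decompose the term-sentence matrix and pick, for each leading latent topic, the sentence that loads most heavily on it. The sentences and parameters below are toy placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The economy grew faster than expected this quarter.",
    "Growth was driven by exports and consumer spending.",
    "Meanwhile, the local football team won its match.",
    "Analysts expect the economy to keep expanding.",
]
# Term-sentence matrix, then SVD; each row of Vt scores the sentences on one topic.
A = TfidfVectorizer().fit_transform(sentences).T.toarray()
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # summary length in sentences
# abs() guards against SVD sign ambiguity (a slight variant of the original rule).
chosen = {int(np.argmax(np.abs(Vt[i]))) for i in range(k)}
summary = [sentences[i] for i in sorted(chosen)]
```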
24

Lietuvių kalbos semantinių požymių lentelės valdymo programinė įranga / Software for managing the Lithuanian language semantic attribute table

Boiko, Irena 11 June 2004 (has links)
The purpose of this work was to carry out one stage of the computerization of semantic analysis by developing software able to improve the quality of automated translation. The resulting software, "Lexes", is a browser and editor for Lithuanian words and the semantic attributes related to those words.
25

Unsupervised induction of semantic roles

Lang, Joel January 2012 (has links)
In recent years, a considerable amount of work has been devoted to the task of automatic frame-semantic analysis. Given the relative maturity of syntactic parsing technology, which is an important prerequisite, frame-semantic analysis represents a realistic next step towards broad-coverage natural language understanding and has been shown to benefit a range of natural language processing applications such as information extraction and question answering. Due to the complexity which arises from variations in syntactic realization, data-driven models based on supervised learning have become the method of choice for this task. However, the reliance on large amounts of semantically labeled data, which is costly to produce for every language, genre and domain, presents a major barrier to the widespread application of the supervised approach. This thesis therefore develops unsupervised machine learning methods, which automatically induce frame-semantic representations without making use of semantically labeled data. If successful, unsupervised methods would render manual data annotation unnecessary and therefore greatly benefit the applicability of automatic frame-semantic analysis.
We focus on the problem of semantic role induction, in which all the argument instances occurring together with a specific predicate in a corpus are grouped into clusters according to their semantic role. Our hypothesis is that semantic roles can be induced without human supervision from a corpus of syntactically parsed sentences, by leveraging the syntactic relations conveyed through parse trees with lexical-semantic information. We argue that semantic role induction can be guided by three linguistic principles. The first is the well-known constraint that semantic roles are unique within a particular frame. The second is that the arguments occurring in a specific syntactic position within a specific linking all bear the same semantic role. The third principle is that the (asymptotic) distribution over argument heads is the same for two clusters which represent the same semantic role.
We consider two approaches to semantic role induction based on two fundamentally different perspectives on the problem. Firstly, we develop feature-based probabilistic latent structure models which capture the statistical relationships that hold between the semantic role and other features of an argument instance. Secondly, we conceptualize role induction as the problem of partitioning a graph whose vertices represent argument instances and whose edges express similarities between these instances. The graph thus represents all the argument instances for a particular predicate occurring in the corpus. The similarities with respect to different features are represented on different edge layers and accordingly we develop algorithms for partitioning such multi-layer graphs. We empirically validate our models and the principles they are based on and show that our graph partitioning models have several advantages over the feature-based models. In a series of experiments on both English and German, the graph partitioning models outperform the feature-based models and yield significantly better scores than a strong baseline which directly identifies semantic roles with syntactic positions.
In sum, we demonstrate that relatively high-quality shallow semantic representations can be induced without human supervision, and we foreground a promising direction of future research aimed at overcoming the problem of acquiring large amounts of lexical-semantic knowledge.
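To make the graph view concrete, here is a deliberately simplified sketch, not the thesis's partitioning algorithm: argument instances are vertices, one similarity layer is built per feature, the layers are blended into a single weighted graph, and the graph is cut into role clusters, here with off-the-shelf agglomerative clustering. The instances, layer weights, and cluster count are all illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical argument instances of one predicate: (head lemma, syntactic position)
args = [("police", "subj"), ("thief", "obj"), ("officer", "subj"),
        ("suspect", "obj"), ("police", "subj")]

def layer(idx):
    """One similarity layer: 1 if two instances share the feature, else 0."""
    n = len(args)
    return np.array([[float(args[i][idx] == args[j][idx])
                      for j in range(n)] for i in range(n)])

S = 0.5 * layer(0) + 0.5 * layer(1)   # blend head and position layers
D = 1.0 - S                           # turn similarity into distance
roles = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(D)
```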
26

A Reference Architecture for Providing Latent Semantic Analysis Applications in Distributed Systems. Diploma Thesis

Dietl, Reinhard 12 1900 (has links) (PDF)
With the increasing availability of storage and computing power, Latent Semantic Analysis (LSA) has gained more and more significance in practice over the last decade. This diploma thesis aims to develop a reference architecture which can be utilised to provide LSA based applications in a distributed system. It outlines the underlying problems of generation, processing and storage of large data objects resulting from LSA operations, the problems arising from bringing LSA into a distributed context, suggests an architecture for the software components necessary to perform the tasks, and evaluates the applicability to real world scenarios, including the implementation of a classroom scenario as a proof-of-concept. (author's abstract) / Series: Theses / Institute for Statistics and Mathematics
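One concern such an architecture must address is that the SVD factors are large and expensive to recompute. Below is a minimal sketch of the usual division of labour, with component names that are mine rather than the thesis's: a compute node factors the term-document matrix once and persists the truncated factors, and lightweight query nodes fold new documents into the latent space without redoing the SVD.

```python
import numpy as np

def build_and_store(A, k, path):
    """Compute node: factor the term-document matrix once, keep only rank k."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    np.savez(path, Uk=U[:, :k], sk=s[:k])

def fold_in(doc_vec, path):
    """Query node: project a new document's term vector into the latent space."""
    m = np.load(path)
    return (doc_vec @ m["Uk"]) / m["sk"]

A = np.random.rand(50, 30)            # 50 terms x 30 documents (toy data)
build_and_store(A, k=5, path="lsa_model.npz")
print(fold_in(np.random.rand(50), "lsa_model.npz"))
```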
27

Supporting students in the analysis of case studies for professional ethics education

2015 January 1900 (has links)
Intelligent tutoring systems and computer-supported collaborative environments have been designed to enhance human learning in various domains. While a number of solid techniques have been developed in the Artificial Intelligence in Education (AIED) field to foster human learning in fundamental science domains, there is still a lack of evidence about how to support learning in so-called ill-defined domains that are characterized by the absence of formal domain theories, uncertainty about best solution strategies and teaching practices, and learners' answers represented through text and argumentation. This dissertation investigates how to support students' learning in the ill-defined domain of professional ethics through a computer-based learning system. More specifically, it examines how to support students in the analysis of case studies, which is a common pedagogical practice in the ethics domain. This dissertation describes our design considerations and a resulting system called Umka. In Umka, learners individually and collaboratively analyze case studies that pose ethical or professional dilemmas. Umka provides various types of support to learners in the analysis task. In the individual analysis it provides various kinds of feedback on learners' arguments based on predefined system knowledge. In the collaborative analysis Umka fosters learners' interactions and self-reflection through system suggestions and a specifically designed visualization. The system suggestions offer learners the chance to consider certain helpful arguments of their peers, or to interact with certain helpful peers. The visualization highlights similarities and differences between the learners' positions, and illustrates the learners' level of acceptance of each other's positions. This dissertation reports on a series of experiments in which we evaluated the effectiveness of Umka's support features, and makes several research contributions. Through this work, it is shown that despite the ill-definedness of the ethics domain, and the consequent complications of text processing and domain modelling, it is possible to build effective tutoring systems for supporting students' learning in this domain. Moreover, the techniques developed through this research for the ethics domain can be readily expanded to other ill-defined domains, where argument, qualitative analysis, metacognition and interaction over case studies are key pedagogical practices.
28

Deep Web Collection Selection

King, John Douglas January 2004 (has links)
The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep web collection selection have been proposed, such as the Collection Retrieval Inference Network system and the Glossary of Servers Server system. The drawback of these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three different sampling methods that require neither this communication nor metadata about each collection. It also adapts some traditional information retrieval techniques to this area. In addition, the thesis tests these techniques on the INEX collection, comprising 18 collections (12,232 XML documents in total) and 36 queries. The experiments show that the performance of the sample-based techniques is satisfactory on average.
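Here is a sketch of the sample-based idea in the spirit of what the thesis compares (the scoring details are illustrative, not the thesis's): build a term profile for each collection from a handful of sampled documents, then rank collections by a smoothed query-likelihood score over those profiles.

```python
import math
from collections import Counter

def profile(sampled_docs):
    """Unigram term counts over the documents sampled from one collection."""
    c = Counter()
    for doc in sampled_docs:
        c.update(doc.lower().split())
    return c

def score(query, prof, mu=100.0):
    """Query log-likelihood with crude Dirichlet smoothing (uniform background)."""
    total = sum(prof.values())
    return sum(math.log((prof[t] + mu / len(prof)) / (total + mu))
               for t in query.lower().split())

samples = {"collA": ["xml element retrieval methods", "structured documents"],
           "collB": ["deep web crawling", "hidden database interfaces"]}
profiles = {name: profile(docs) for name, docs in samples.items()}
ranked = sorted(profiles, key=lambda n: score("deep web search", profiles[n]),
                reverse=True)
```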
29

Educational Technology: A Comparison of Ten Academic Journals and the New Media Consortium Horizon Reports for the Period of 2000-2017

Morel, Gwendolyn 12 1900 (has links)
This exploratory and descriptive study provides an increased understanding of the topics being explored in both published research and industry reporting in the field of educational technology. Although literature in the field is plentiful, the task of synthesizing the information for practical use is a massive undertaking. Latent semantic analysis was used to review journal abstracts from ten highly respected journals and the New Media Consortium Horizon Reports to identify trends within the publications. As part of the analysis, 25 topics and technologies were identified in the combined corpus of academic journals and Horizon Reports. The journals tended to focus on pedagogical issues whereas the Horizon Reports tended to focus on technological aspects in education. In addition to differences between publication types, trends over time are also described. Findings may assist researchers, practitioners, administrators, and policy makers with decision-making in their respective educational areas.
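The analysis step can be pictured with a short sketch (the toy corpus and parameters are illustrative, not the study's data): factor the tf-idf abstract-term matrix with a truncated SVD and read each component's top-weighted terms as a topic.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

abstracts = ["mobile learning in the classroom",
             "augmented reality tools for teaching",
             "learning analytics dashboards for instructors",
             "course design and pedagogy online"]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
svd = TruncatedSVD(n_components=2).fit(X)
terms = np.array(vec.get_feature_names_out())
for i, comp in enumerate(svd.components_):
    # Highest-weighted terms characterize each latent topic.
    print(f"topic {i}:", terms[np.argsort(comp)[::-1][:3]])
```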
30

Ανάπτυξη συστήματος παροχής συστάσεων με χρήση τεχνικών σημασιολογικής ανάλυσης / Development of a recommender system using semantic analysis techniques

Τουλιάτος, Γεράσιμος 09 July 2013 (has links)
Due to the large volume of data available on the Web, finding the desired information can be time consuming. Various personalized search systems have been proposed to help resolve this problem. The aim of this work was to study various techniques for improving search results and to develop a system that predicts a user's information need and proposes a set of pages that might satisfy it. Because the Web is a very large system, our study starts at the level of a single website. In developing our system we make use of semantic analysis techniques. Specifically, we use an ontology to describe the contents of the pages of a website, and we also use the ontology to express the information need of the user. While browsing, the user selects those links that he considers will bring him closer to his goal. We characterize each link with concepts associated with the content of the page it points to. Because each user represents information in his own concept network, we adopted an ontology that captures what is called "common knowledge" on a topic. Using the concepts of the hyperlinks the user selected and the relations between the concepts of the ontology, we estimate the user's likely target concepts and thereby determine his information need. Finally, we rank the pages of the website by their conceptual relatedness to the user's interests and generate our recommendations.
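A toy sketch of the goal-inference idea, much simpler than the system described: spread weight from the concepts attached to the user's clicked links along the ontology's relation edges, then rank pages by the scores of their concepts. The ontology, damping weight, and page annotations are all made up for illustration.

```python
clicked = ["camera", "lens"]          # concepts on the user's selected links
ontology = {"camera": ["photography"], "lens": ["photography", "optics"],
            "photography": [], "optics": []}
pages = {"page1": ["photography"], "page2": ["optics"], "page3": ["camera"]}

scores = {c: 0.0 for c in ontology}
for c in clicked:                     # one damped step of spreading activation
    scores[c] += 1.0
    for neighbour in ontology[c]:
        scores[neighbour] += 0.5

# Pages whose concepts collect the most activation are recommended first.
ranking = sorted(pages, key=lambda p: sum(scores[c] for c in pages[p]),
                 reverse=True)
```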
