1

A General Framework For Classification and Similarity Measure of Spatial Relationship

Hung, Tsung-Hsien 19 July 2007 (has links)
No description available.
2

Time series data mining in systems biology

Tapinos, Avraam January 2013 (has links)
Analysis of time series data constitutes an important activity in many scientific disciplines. In recent years there has been an increase in the collection of time series data across scientific fields and industries. Due to the growing size of time series datasets, automated time series data mining techniques have been devised for comparing time series data and presenting information in a logical and easily comprehensible structure.

In systems biology in particular, time series are used to study biological systems. The time series representations of a system's dynamic behaviour are multivariate: a time series is considered multivariate when it contains observations for more than one variable. Time series of biological system dynamics contain observations for every component included in the system, and are thus multivariate. Given the recent increase in the collection of biological time series, it would be beneficial for systems biologists to be able to compare such multivariate time series.

Over the last decade, the field of time series analysis has attracted attention from different scientific disciplines. Researchers from the data mining community have proposed solutions to numerous time series data mining problems, including methods for comparing, indexing, and clustering univariate time series, as well as methods for creating abstract representations of time series data and investigating the benefits of using such representations for data mining tasks. The introduction of more advanced computing resources has made the collection of multivariate time series common practice in various scientific fields.

The increasing volume of multivariate time series data has created demand for methods to compare them, and a small number of well-suited methods have been proposed. The currently available methods are adequate for comparing multivariate time series with the same dimensionality, but they all share the same drawback: they cannot process multivariate time series with different dimensions. One proposed solution for comparing multivariate time series of arbitrary dimensions requires the creation of weighted averages, but the necessary weight data are not always available.

In this project, a new method is proposed that enables the comparison of multivariate time series with arbitrary dimensions. The method is evaluated on multivariate time series from different disciplines in order to test its applicability to data from different fields of science and industry. Lastly, the new method is applied to perform several time series data mining analyses on a set of biological data.
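To illustrate the kind of problem the abstract describes, here is a minimal sketch of one way to compare two multivariate time series with different numbers of variables: project each series onto a common number of principal components, then take a plain Euclidean distance. This is an illustrative stand-in under stated assumptions, not the method proposed in the thesis, and the function names are hypothetical.

```python
import numpy as np

def to_common_dims(series, k):
    """Project a (time x variables) series onto its first k principal
    components, so every series ends up with the same dimensionality."""
    x = series - series.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal axes
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:k].T

def distance(a, b, k=2):
    """Euclidean distance between two equal-length multivariate series
    after both are reduced to k dimensions."""
    ra, rb = to_common_dims(a, k), to_common_dims(b, k)
    return float(np.linalg.norm(ra - rb))

rng = np.random.default_rng(0)
s1 = rng.normal(size=(50, 3))   # 3 variables
s2 = rng.normal(size=(50, 5))   # 5 variables: different dimensionality
print(distance(s1, s2))         # a single comparable score
```

Note that PCA projections are only defined up to sign, so a practical method needs a more careful alignment step; the sketch only shows that series of different dimensionality can be brought into a shared space.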
3

Contribution au recalage d'images de modalités différentes à travers la mise en correspondance de nuages de points : Application à la télédétection

Palmann, Christophe 23 June 2011 (has links)
The use of several images of various modalities has proved quite useful for solving problems arising in many different applications of remote sensing. The main reason is that each image of a given modality conveys its own specific information, which can be integrated into a single model in order to improve our knowledge of a given area. Given the large amount of available data, any such integration must be performed automatically. At the very first stage of an automated integration process, a rather direct problem arises: given a region of interest within a first image, the question is to find its equivalent within a second image acquired over the same scene but with a different modality. This problem is difficult because the decision to match two regions must rely on the part of the information shared by the two images, even if their modalities are quite different. This is the problem we address in this thesis.
4

A Semantic Graph Model for Text Representation and Matching in Document Mining

Shaban, Khaled January 2006 (has links)
The explosive growth in the number of documents produced daily necessitates the development of effective alternatives to explore, analyze, and discover knowledge from documents. Document mining research work has emerged to devise automated means to discover and analyze useful information from documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and utilizing these in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction.

Conventional text representation methodologies consider documents as bags of words and ignore the meanings and ideas their authors want to convey. It is this deficiency that causes similarity measures to fail to perceive contextual similarity of text passages due to the variation of the words the passages contain, or to perceive contextually dissimilar text passages as being similar because of the resemblance of the words the passages share.

This thesis presents a new paradigm for mining documents by exploiting semantic information in their texts. A formal semantic representation of linguistic inputs is introduced and utilized to build a semantic representation scheme for documents. The representation scheme is constructed through accumulation of syntactic and semantic analysis outputs. A new distance measure is developed to determine the similarities between the contents of documents. The measure is based on inexact matching of attributed trees. It involves the computation of all distinct similarity common sub-trees, and can be computed efficiently. It is believed that the proposed representation scheme along with the proposed similarity measure will enable more effective document mining processes.

The proposed techniques to mine documents were implemented as vital components in a mining system. A case study of semantic document clustering is presented to demonstrate the working and the efficacy of the framework. Experimental work is reported, and its results are presented and analyzed.
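As a toy illustration of similarity between attributed trees, here is a sketch that counts label-matching nodes along a greedy in-order alignment of children and normalizes by the larger tree. The thesis computes all distinct similarity common sub-trees; this greedy version is a deliberate simplification, and the tree encoding is assumed.

```python
def size(t):
    """Number of nodes in a tree given as (label, [children])."""
    return 1 + sum(size(c) for c in t[1])

def common(t1, t2):
    """Count label-matching nodes on a greedy in-order alignment
    (a real inexact matcher would search over child alignments)."""
    hit = 1 if t1[0] == t2[0] else 0
    return hit + sum(common(a, b) for a, b in zip(t1[1], t2[1]))

def similarity(t1, t2):
    """Normalize the matched-node count by the larger tree's size."""
    return common(t1, t2) / max(size(t1), size(t2))

# two tiny semantic trees for sentences about a "sell" event
doc1 = ("sell", [("agent", []), ("object", [])])
doc2 = ("sell", [("agent", [])])
print(similarity(doc1, doc2))
```

Identical trees score 1.0, and the score degrades gracefully as structure diverges, which is the behaviour a tree-based document distance needs.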
6

A Mixed Approach for Multi-Label Document Classification

Tsai, Shian-Chi 10 August 2010 (has links)
Unlike single-label document classification, where each document belongs to exactly one category, a document may be classified into two or more categories; how to classify such multi-label documents accurately has become a hot research topic in recent years. In this paper, we propose an algorithm named fuzzy similarity measure multi-label K nearest neighbors (FSMLKNN), which combines a fuzzy similarity measure with the multi-label K nearest neighbors (MLKNN) algorithm for multi-label document classification. The algorithm improves the fuzzy similarity measure used to calculate the similarity between a document and a cluster centre, and it can significantly improve the performance and accuracy of multi-label document classification. In the experiments, we compare FSMLKNN with existing classification methods, including the C4.5 decision tree, support vector machines (SVM), and the MLKNN algorithm; the experimental results show that FSMLKNN outperforms the others.
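A minimal sketch of the general idea, similarity-weighted multi-label k-nearest-neighbour voting, is below. The decay-based fuzzy similarity and all function names are illustrative assumptions, not the FSMLKNN measure from the paper.

```python
import numpy as np

def fuzzy_sim(x, y, scale=1.0):
    """A simple fuzzy similarity in [0, 1]: 1 at distance 0, decaying
    with Euclidean distance (illustrative choice, not the paper's)."""
    return 1.0 / (1.0 + np.linalg.norm(x - y) / scale)

def predict_labels(x, train_x, train_y, k=3, threshold=0.5):
    """Weight each neighbour's label vector by its fuzzy similarity to x
    and assign every label whose weighted vote passes the threshold."""
    sims = np.array([fuzzy_sim(x, t) for t in train_x])
    top = np.argsort(sims)[-k:]                     # k most similar
    votes = (sims[top, None] * train_y[top]).sum(axis=0) / sims[top].sum()
    return (votes >= threshold).astype(int)

train_x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # one label each
print(predict_labels(np.array([0.05, 0.0]), train_x, train_y))
```

The key point the abstract makes is that the quality of the similarity measure dominates: the same voting scheme behaves very differently depending on how similarity between a document and a cluster centre is computed.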
7

A Neuro-Fuzzy Approach for Classificaion

Lin, Wen-Sheng 08 September 2004 (has links)
We develop a neuro-fuzzy network technique to extract TSK-type fuzzy rules from a given set of input-output data for classification problems. Fuzzy clusters are generated incrementally from the training data set, and similar clusters are merged dynamically through input-similarity, output-similarity, and output-variance tests. The associated membership functions are defined with statistical means and deviations. Each cluster corresponds to a fuzzy IF-THEN rule, and the obtained rules can be further refined by a fuzzy neural network with a hybrid learning algorithm that combines a recursive SVD-based least squares estimator and the gradient descent method. The proposed technique has several advantages. The information about input and output data subspaces is considered simultaneously for cluster generation and merging. Membership functions match closely and properly describe the real distribution of the training data points. Redundant clusters are combined, and the sensitivity to the input order of the training data is reduced. Moreover, regenerating the whole set of clusters from scratch can be avoided when new training data are considered.
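The incremental cluster-generation step can be sketched as follows: each point joins the nearest cluster if it passes a similarity test, otherwise it opens a new cluster. This is a minimal stand-in for the input-similarity test only; the output-similarity and output-variance tests and the rule-refinement network are omitted, and the threshold parameter is an assumption.

```python
import numpy as np

def cluster(points, sim_threshold=1.0):
    """Incrementally assign each point to the nearest cluster centre;
    open a new cluster when no centre lies within sim_threshold."""
    centres, members = [], []
    for p in points:
        if centres:
            d = [np.linalg.norm(p - c) for c in centres]
            i = int(np.argmin(d))
            if d[i] <= sim_threshold:
                members[i].append(p)
                centres[i] = np.mean(members[i], axis=0)  # running mean
                continue
        centres.append(np.array(p, dtype=float))
        members.append([p])
    return centres

pts = [np.array(v, dtype=float)
       for v in [[0, 0], [0.2, 0], [10, 10], [10.2, 10]]]
print(len(cluster(pts)))  # two well-separated groups -> two clusters
```

Because assignment is sequential, the result can depend on input order, which is exactly the sensitivity the dynamic merging tests in the abstract are meant to reduce.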
8

Automatic Source Code Classification : Classifying Source Code for a Case-Based Reasoning System

Nordström, Markus January 2015 (has links)
This work has investigated the possibility of classifying Java source code into cases for a case-based reasoning system. Case-based reasoning is a problem-solving method in artificial intelligence that uses knowledge of previously solved problems to solve new ones. A case in case-based reasoning consists of two parts: the problem part, which describes a problem that needs to be solved, and the solution part, which describes how that problem was solved. In this work, the problem is a Java source file described by words characterizing its content, and the solution is a classification of the source file along with the source code. To classify Java source code, a classification system was developed. It consists of four analyzers: a type filter, a documentation analyzer, a syntactic analyzer, and a semantic analyzer. The type filter determines whether a Java source file contains a class or an interface. The documentation analyzer determines the level of documentation in a source file to gauge the file's usefulness. The syntactic analyzer extracts statistics from the source code to be used for similarity, and the semantic analyzer extracts semantics from the source code. The finished classification system is formed as a kd-tree, whose leaf nodes contain the classified source files, i.e. the cases. Furthermore, a vocabulary was developed to contain the domain knowledge about the Java language. The resulting kd-tree was found to be imbalanced when tested, as the majority of the analyzed source files were placed in the left-most leaf nodes. The conclusion was that using documentation as part of the classification made the tree imbalanced, so another approach has to be found; source code is generally not documented to an extent that would make it useful for this purpose.
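A kd-tree over source-file feature vectors can be sketched as below, splitting on alternating feature axes so that similar files end up in the same leaf. The feature choice (lines of code, method count) and the representation of cases are assumptions for illustration, not the thesis's actual feature set.

```python
def build_kdtree(cases, depth=0):
    """Recursively split cases (feature_vector, payload) on alternating
    feature axes; leaves hold the stored case(s)."""
    if len(cases) <= 1:
        return cases  # leaf node: the classified source file(s)
    axis = depth % len(cases[0][0])
    cases = sorted(cases, key=lambda c: c[0][axis])
    mid = len(cases) // 2
    return {
        "axis": axis,
        "split": cases[mid][0][axis],
        "left": build_kdtree(cases[:mid], depth + 1),
        "right": build_kdtree(cases[mid:], depth + 1),
    }

# hypothetical features: (lines of code / 100, number of methods)
cases = [((1, 1), "Util.java"), ((2, 5), "Parser.java"),
         ((8, 1), "Config.java"), ((9, 9), "Server.java")]
tree = build_kdtree(cases)
```

The imbalance the abstract reports arises when one feature (here, a documentation score) pushes most files to the same side of every split; a median-based split like the one above only balances the tree if the feature values are well spread.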
9

Image cultural analytics through feature-based image exploration and extraction

Naeimi, Parisa Unknown Date
No description available.
10

Ontology-based clustering in a Peer Data Management System

Pires, Carlos Eduardo Santos 31 January 2009 (has links)
Peer Data Management Systems (PDMS) are advanced P2P applications that allow users to transparently query several distributed, heterogeneous, and autonomous data sources. Each peer represents a data source and exports its data schema, either in full or in part. This schema, called the exported schema, represents the data to be shared with other peers in the system and is commonly described by an ontology. The two most studied aspects of data management in PDMS concern schema mappings and query processing. Both can be improved if the peers are efficiently arranged in the overlay network according to a semantics-based approach. In this context, the notion of a semantic community of peers is quite important, since it logically brings together peers with common interests in a specific topic. However, due to the dynamic behaviour of peers, creating and maintaining semantic communities is a challenging aspect at the current stage of PDMS development. The main goal of this thesis is to propose a semantics-based process for incrementally clustering the semantically similar peers that make up communities in a PDMS. In this process, peers are clustered according to their exported schemas (ontologies), and ontology management processes (for example, matching and summarization) are used to assist in connecting peers. A PDMS architecture is proposed to facilitate the semantic organization of peers in the overlay network.

To obtain the semantic similarity between two peer ontologies, we propose a global similarity measure produced as the output of an ontology matching process. To optimize the matching between ontologies, an automatic process for ontology summarization is also proposed. A simulator was developed according to the proposed PDMS architecture, and the proposed ontology management processes were implemented and included in it. Experiments with each process in the PDMS context, as well as the results obtained from them, are presented.
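As a toy illustration of a global similarity score between two peer ontologies, the sketch below takes the Jaccard overlap of case-folded concept labels. Real ontology matching pipelines (and the measure proposed in the thesis) also exploit structure and synonyms; the exact-label matching here is a deliberate simplification.

```python
def global_similarity(onto_a, onto_b):
    """Jaccard overlap of concept labels between two ontologies,
    matched by exact (case-folded) label equality."""
    a = {c.lower() for c in onto_a}
    b = {c.lower() for c in onto_b}
    return len(a & b) / len(a | b)

# two hypothetical peer schemas in a publications community
peer1 = ["Article", "Author", "Journal"]
peer2 = ["article", "author", "Conference"]
print(global_similarity(peer1, peer2))
```

A score like this is what lets a PDMS decide, when a peer joins, which semantic community its exported schema is closest to.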
