  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Combining text-based and vision-based semantics

Tran, Binh Giang January 2011 (has links)
Learning and representing semantics is one of the most important tasks contributing to several growing areas, as highlighted by the success stories in the recent survey of Turney and Pantel (2010). In this thesis, we present an innovative (and first) framework for creating a multimodal distributional semantic model from state-of-the-art text- and image-based semantic models. We evaluate this multimodal semantic model on simulating similarity judgements, concept clustering and the newly introduced BLESS benchmark. We also propose an effective algorithm, namely Parameter Estimation, to integrate text- and image-based features in order to obtain a robust multimodal system. Through experiments, we show that our technique is very promising. Across all experiments, our best multimodal model ranks first. Comparison with other text-based models justifies the claim that our model is on par with other state-of-the-art models. We explore various types of visual features, including SIFT and other color SIFT channels, in order to gain preliminary insights into how computer-vision techniques should be applied in the natural language processing domain. Importantly, in this thesis, we show evidence that adding visual features (as the perceptual information coming from...
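The fusion of text- and image-based features described in this abstract can be sketched as a weighted concatenation of normalized modality vectors. This is a minimal illustration, not the thesis's method: the mixing weight alpha stands in for whatever the Parameter Estimation algorithm would learn, and the toy vectors are invented for the example.

```python
import numpy as np

def combine_modalities(text_vec, image_vec, alpha=0.5):
    # Weighted concatenation of L2-normalized modality vectors.
    # alpha is a hypothetical mixing weight standing in for the value
    # a parameter-estimation procedure would learn from data.
    t = text_vec / np.linalg.norm(text_vec)
    v = image_vec / np.linalg.norm(image_vec)
    return np.concatenate([alpha * t, (1.0 - alpha) * v])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy text and image vectors for two concepts (illustrative values only)
cat_text, cat_img = np.array([1.0, 0.2, 0.0]), np.array([0.9, 0.1])
dog_text, dog_img = np.array([0.9, 0.3, 0.1]), np.array([0.8, 0.2])

cat = combine_modalities(cat_text, cat_img, alpha=0.6)
dog = combine_modalities(dog_text, dog_img, alpha=0.6)
print(round(cosine(cat, dog), 3))
```

Similarity between two concepts is then measured on the fused vectors, so both modalities contribute to the judgement in proportion to alpha.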
2

Towards effective geographic ontology semantic similarity assessment

Hess, Guillermo Nudelman January 2008 (has links)
Integration of geographic information is becoming more important every day, due to the ease of exchanging data over the Internet and the high cost of producing such data. With the semantic web, describing geographic information using ontologies is becoming popular. To allow integration, one step on which much research is focusing is the matching of geographic ontologies. Matching consists of measuring the similarity between the elements, namely concepts and instances, of two (or more) given ontologies. The main problem with ontology matching is that the ontologies may be described by different communities, using different vocabularies and different perspectives. For geographic ontologies the difficulties may be even greater, owing to the particularities of geographic information (geometry, location and spatial relationships), the lack of a widely accepted geographic ontology model, and the fact that ontologies are usually described at different semantic granularities. These specificities of geographic ontologies make conventional matchers unsuitable for matching geographic ontologies. On the other hand, existing geographic ontology matchers are considerably limited in their functionality and handle only ontologies described from a particular perspective. 
To overcome these limitations, this work presents a number of similarity measurement expressions and algorithms to effectively match two geographic ontologies, at both the concept and the instance level. These algorithms combine expressions used to assess the similarity of so-called conventional features with expressions tailor-made to cover the geographic particularities. Furthermore, this research also proposes a geographic ontology meta-model to serve as a basis for developing geographic ontologies with a standardized description. The model is compliant with the OGC recommendations and is the basis upon which the algorithms are defined. To evaluate the algorithms, a software architecture called IG-MATCH was created, with the additional ability to enrich geographic ontologies with topological and generalization/specialization (parent-child) relationships through analysis of their instances.
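The combination of conventional and geographic similarity expressions can be illustrated with a small sketch. The lexical metric, the geometry-compatibility table, and the weights below are hypothetical stand-ins for the expressions defined in the thesis, not its actual formulas.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    # conventional (non-geographic) feature: lexical similarity of names
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def geometry_similarity(geom_a, geom_b):
    # geographic feature: full credit for identical geometry types,
    # partial credit for compatible ones (hypothetical compatibility table)
    compatible = {frozenset({"polygon", "multipolygon"}),
                  frozenset({"point", "multipoint"})}
    if geom_a == geom_b:
        return 1.0
    if frozenset({geom_a, geom_b}) in compatible:
        return 0.5
    return 0.0

def concept_similarity(c1, c2, w_name=0.6, w_geom=0.4):
    # illustrative weights; the thesis defines its own expressions
    return (w_name * name_similarity(c1["name"], c2["name"])
            + w_geom * geometry_similarity(c1["geometry"], c2["geometry"]))

river_a = {"name": "River", "geometry": "polygon"}
river_b = {"name": "Rivers", "geometry": "multipolygon"}
print(round(concept_similarity(river_a, river_b), 3))
```

The point of the sketch is the structure: a non-geographic score and a geographic score are computed separately and then combined, which is what makes such a matcher usable where purely lexical matchers fall short.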
5

A top-down approach for creating and implementing data mining solutions

Laurinen, P. (Perttu) 13 June 2006 (has links)
Abstract The information age is characterized by ever-growing amounts of data surrounding us. By refining this data into usable knowledge we can start moving toward the knowledge age. Data mining is the science of transforming measured information into usable knowledge. During the data mining process, the measurements pass through a chain of sophisticated transformations in order to acquire knowledge. Furthermore, in some applications the results are implemented as software solutions so that they can be continuously utilized. It is evident that the quality and amount of the knowledge formed is highly dependent on the transformations and the process applied. This thesis presents an application-independent concept that can be used for managing the data mining process and implementing the acquired results as software applications. The developed concept is divided into two parts – solution formation and solution implementation. The first part presents a systematic way of finding a data mining solution from a set of measurement data. The developed approach allows for easier application of a variety of algorithms to the data, manages the work chain, and differentiates between the data mining tasks. The method is based on storage of the data between the main stages of the data mining process, where the different stages of the process are defined on the basis of the type of algorithms applied to the data. The efficiency of the process is demonstrated with a case study presenting new solutions for resistance spot welding quality control. The second part of the concept presents a component-based data mining application framework, called Smart Archive, designed for implementing the solution. The framework provides functionality that is common to most data mining applications and is especially suitable for implementing applications that process continuously acquired measurements. 
The work also proposes an efficient algorithm for utilizing cumulative measurement data in the history component of the framework. Using the framework, it is possible to build high-quality data mining applications with shorter development times by configuring the framework to process application-specific data. The efficiency of the framework is illustrated using a case study presenting the results and implementation principles of an application developed for predicting steel slab temperatures in a hot strip mill. In conclusion, this thesis presents a concept that proposes solutions for two fundamental issues of data mining, the creation of a working data mining solution from a set of measurement data and the implementation of it as a stand-alone application.
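The store-between-stages idea described above can be sketched as a tiny pipeline in which each stage persists its result before the next stage reads it. The stage names and the in-memory "store" are illustrative assumptions, not the Smart Archive framework's actual components.

```python
# minimal sketch of a staged pipeline that persists intermediate results;
# in a real system the store would be a database rather than a dict
store = {}

def stage(name, depends_on=None):
    def wrap(fn):
        def run():
            data = store[depends_on] if depends_on else None
            store[name] = fn(data)   # persist the stage output
            return store[name]
        return run
    return wrap

@stage("preprocess")
def preprocess(_):
    # drop missing measurements
    return [x for x in [1.0, None, 2.5, 4.0] if x is not None]

@stage("features", depends_on="preprocess")
def features(rows):
    mean = sum(rows) / len(rows)
    return [r - mean for r in rows]  # center the measurements

preprocess()
print(features())  # → [-1.5, 0.0, 1.5]
```

Because every stage reads from and writes to the shared store, individual algorithms can be swapped or re-run in isolation, which is the practical benefit the abstract attributes to storing data between stages.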
6

A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics

Milajevs, Dmitrijs January 2018 (has links)
Representation of sentences that captures semantics is an essential part of natural language processing systems, such as information retrieval or machine translation. The representation of a sentence is commonly built by combining the representations of the words that the sentence consists of. Similarity between words is widely used as a proxy to evaluate semantic representations. Word similarity models are well-studied and are shown to positively correlate with human similarity judgements. Current evaluation of models of sentential similarity builds on the results obtained in lexical experiments. The main focus is how the lexical representations are used, rather than what they should be. It is often assumed that the optimal representations for word similarity are also optimal for sentence similarity. This work discards this assumption and systematically looks for lexical representations that are optimal for similarity measurement between sentences. We find that the best representation for word similarity is not always the best for sentence similarity and vice versa. The best models in word similarity tasks perform best with additive composition. However, the best result on compositional tasks is achieved with Kronecker-based composition. There are representations that are equally good in both tasks when used with multiplicative composition. The systematic study of the parameters of similarity models reveals that the more information lexical representations contain, the more attention should be paid to noise. In particular, the word vectors in models with the feature size at the magnitude of the vocabulary size should be sparse, but if a small number of context features is used then the vectors should be dense. Given the right lexical representations, compositional operators achieve state-of-the-art performance, improving over models that use neural word embeddings. 
To avoid overfitting, either several test datasets should be used or parameter selection should be based on parameters' average behaviours.
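The three composition operators compared in the abstract (additive, element-wise multiplicative, and Kronecker-based) can be written down directly for a pair of toy word vectors; the vectors themselves are invented for illustration.

```python
import numpy as np

def additive(u, v):
    return u + v                 # vector sum

def multiplicative(u, v):
    return u * v                 # element-wise (Hadamard) product

def kronecker(u, v):
    # Kronecker-based composition: the flattened outer product,
    # so the result lives in a space of dimension len(u) * len(v)
    return np.kron(u, v)

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
print(additive(u, v))        # → [4. 6.]
print(multiplicative(u, v))  # → [3. 8.]
print(kronecker(u, v))       # → [3. 4. 6. 8.]
```

Note the dimensionality trade-off visible even in this toy case: additive and multiplicative composition preserve the vector dimension, while the Kronecker product squares it, which matters when scaling word representations up to sentences.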
7

A novel approach for continuous speech tracking and dynamic time warping : adaptive framing based continuous speech similarity measure and dynamic time warping using Kalman filter and dynamic state model

Khan, Wasiq January 2014 (has links)
Dynamic speech properties such as time warping, silence removal and background noise interference are the most challenging issues in continuous speech signal matching. Among them, time-warped speech signal matching is of great interest and has been a tough challenge for researchers. An adaptive-framing-based continuous speech tracking and similarity measurement approach is introduced in this work, following comprehensive research conducted in diverse areas of speech processing. A dynamic state model is introduced, based on a system of linear motion equations, which models the input (test) speech frame as an object moving unidirectionally along the template speech signal. The most similar corresponding frame position in the template speech is estimated and fused with a feature-based similarity observation and the noise variances using a Kalman filter. The Kalman filter provides the final estimated frame position in the template speech at the current time, which is then used to predict a new frame size for the next step. In addition, a keyword spotting approach is proposed, introducing a wavelet-decomposition-based dynamic noise filter and a combination of beliefs. Dempster's theory of belief combination is deployed for the first time in relation to the keyword spotting task. The performance of both the speech tracking and keyword spotting approaches is evaluated using statistical metrics and gold standards for binary classification. Experimental results demonstrated the superiority of the proposed approaches over existing methods.
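A one-dimensional Kalman predict/update cycle of the kind described, fusing a constant-velocity prediction of the template frame position with a similarity-based observation, can be sketched as follows. The noise variances and the observation sequence are hypothetical values, not the thesis's tuned parameters.

```python
def kalman_step(x_est, p_est, velocity, z_obs, q=1e-3, r=0.1):
    """One predict/update cycle of a 1-D Kalman filter tracking a
    template frame position under a constant-velocity motion model.
    q and r are hypothetical process/observation noise variances."""
    # predict: the frame index advances by the assumed velocity
    x_pred = x_est + velocity
    p_pred = p_est + q
    # update: fuse with the feature-similarity observation z_obs
    k = p_pred / (p_pred + r)            # Kalman gain
    x_new = x_pred + k * (z_obs - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for z in [1.1, 2.0, 2.9, 4.2]:           # noisy observed frame positions
    x, p = kalman_step(x, p, velocity=1.0, z_obs=z)
print(round(x, 2))
```

The estimate converges toward the observed trajectory while the variance p shrinks, which is what lets the filter both smooth noisy similarity observations and supply a confident position for choosing the next frame size.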
8

TOWARDS TIME-AWARE COLLABORATIVE FILTERING RECOMMENDATION SYSTEM

Dawei Wang (9216029) 12 October 2021 (has links)
As technological capacity to store and exchange information progresses, the amount of available data grows explosively, which can lead to information overload. The difficulty of making decisions effectively increases when one has too much information about an issue. Recommendation systems are a subclass of information filtering systems that aim to predict a user's opinion of or preference for a topic or item, thereby providing personalized recommendations to users by exploiting historic data. They are widely used in e-commerce such as Amazon.com, online movie streaming companies such as Netflix, and social media networks such as Facebook. Memory-based collaborative filtering (CF) is one of the recommendation system methods used to predict a user's rating or preference by exploring historic ratings, but without incorporating any content information about users or items. Many studies have been conducted on memory-based CFs to improve prediction accuracy, but none of them has achieved better prediction accuracy than state-of-the-art model-based CFs. Furthermore, a product or service is not judged only by its own characteristics but also by the characteristics of other products or services offered concurrently. It can also be judged by anchoring based on users' memories. Rating or satisfaction is viewed as a function of the discrepancy or contrast between expected and obtained outcomes, documented as contrast effects. Thus, a rating given to an item by a user is a comparative opinion based on the user's past experiences. Therefore, the score of ratings can be affected by the sequence and time of ratings. However, in traditional CFs, pairwise similarities measured between items do not consider time factors such as the sequence of rating, which could introduce biases caused by contrast effects. In this research, we propose a new approach that combines both structural and rating-based similarity measurement in memory-based CFs. 
We found that memory-based CF using the combined similarity measurement can achieve better prediction accuracy than model-based CFs in terms of lower MAE, and can reduce memory and time by using fewer neighbors than traditional memory-based CFs on the MovieLens and Netflix datasets. We also propose techniques to reduce the biases caused by user comparison, anchoring and adjustment behaviors by introducing time-aware similarity measurements into memory-based CFs. Finally, we introduce novel techniques to identify, quantify, and visualize user preference dynamics, and show how they can be used to generate dynamic recommendation lists that fit each user's current preferences.
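One way to make an item-item similarity time-aware, in the spirit described above, is to down-weight each co-rating by the time gap between the two ratings. The exponential decay, the decay constant tau, and the toy ratings below are illustrative assumptions, not the dissertation's exact formulation.

```python
import math

def time_aware_similarity(ratings_a, ratings_b, tau=30.0):
    """Pearson-style similarity between two items, with each co-rating
    down-weighted by the time gap between the two ratings.
    ratings_* map user -> (rating, timestamp_in_days);
    tau is a hypothetical decay constant."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a[u][0] for u in common) / len(common)
    mean_b = sum(ratings_b[u][0] for u in common) / len(common)
    num = den_a = den_b = 0.0
    for u in common:
        ra, ta = ratings_a[u]
        rb, tb = ratings_b[u]
        w = math.exp(-abs(ta - tb) / tau)   # temporal decay weight
        num += w * (ra - mean_a) * (rb - mean_b)
        den_a += w * (ra - mean_a) ** 2
        den_b += w * (rb - mean_b) ** 2
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / math.sqrt(den_a * den_b)

item_x = {"u1": (5, 0), "u2": (3, 10), "u3": (1, 40)}
item_y = {"u1": (4, 2), "u2": (3, 12), "u3": (2, 80)}
print(round(time_aware_similarity(item_x, item_y), 3))  # → 1.0
```

Ratings made close together in time keep nearly full weight, while distant pairs fade, so the neighborhood reflects the sequencing effects (contrast, anchoring) that static similarities ignore.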
10

Harmonization of geo-scientific information in industrial databases through automatic similarity measures

Fuga, Alba 05 January 2017 (has links)
In order to harmonize industrial seismic navigation databases, a methodology and a software tool have been developed. The Automatisation des Mesures de Ressemblance (AMR) methodology provides protocols to model and rank the comparison criteria that serve as reference points for automation. Together with a set of tolerance thresholds, the ranked model has been used as a sieve filter in the automatic classification process, whose aim is to find highly similar data as quickly as possible. Similarity is measured by combinations of elementary metrics yielding numerical scores, and also by a more global, contextual procedure operating at three levels: between attributes, between records, and between groups. These similarity estimations make accurate automated analyses by the expert system possible, as well as multi-criteria interpretations by the geophysicist, reducing to about two days work that previously took three weeks. The classification strategies are adaptable to different problems: data harmonization, but also data reconciliation and the geo-referencing of technical documents. The methodology has been implemented in a software tool named LAC (Logiciel Automatique de Comparaisons), developed for the Data Management and Technical Documentation services of TOTAL. The industrialized tool has been in use for three years, although it is no longer under technical maintenance despite its usage. 
The database imaging functionalities developed in this thesis have not yet been integrated into the software, but should allow better visualization of the phenomena. This way of representing data, based on similarity measurement, gives a reasonably clear picture of heavy, complex data while still exposing the information needed for harmonization and for assessing database quality. Could we not seek to characterize, compare, analyze and manage the inbound and outbound flows of databases, monitor their evolution, and derive machine learning methods from the further development of this imaging?
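The sieve-filter classification described in this abstract can be sketched as a weighted aggregation of per-attribute scores checked against a tolerance threshold. The exact-match scoring, the record fields, the weights, and the threshold are all hypothetical stand-ins for the AMR methodology's own metrics.

```python
def attribute_scores(rec_a, rec_b):
    # elementary metric per attribute; exact-match scoring is a
    # hypothetical stand-in for the methodology's metric set
    return {k: 1.0 if rec_a.get(k) == rec_b.get(k) else 0.0
            for k in set(rec_a) | set(rec_b)}

def classify(rec_a, rec_b, weights, threshold=0.75):
    """Weighted aggregation of attribute scores, then a tolerance
    threshold acting as the sieve filter for strong matches."""
    scores = attribute_scores(rec_a, rec_b)
    total = sum(weights.get(k, 0.0) * s for k, s in scores.items())
    return ("match" if total >= threshold else "review", round(total, 2))

# toy seismic navigation records (field names are illustrative)
survey_a = {"line": "L-204", "area": "BlockA", "year": 1998}
survey_b = {"line": "L-204", "area": "BlockA", "year": 1999}
weights = {"line": 0.5, "area": 0.3, "year": 0.2}
print(classify(survey_a, survey_b, weights))  # → ('match', 0.8)
```

Records that clear the threshold are accepted automatically, while borderline pairs are routed to the expert for multi-criteria interpretation, mirroring the division of labour between the expert system and the geophysicist described above.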
