11 |
Conception et évaluation de techniques d'interaction pour l'exploration de données complexes dans de larges espaces d'affichage / Design and evaluation of interaction techniques for exploring complex data in large display spaces
Saïdi, Houssem Eddine, 16 October 2018 (has links)
Les données d'aujourd'hui deviennent de plus en plus complexes à cause de la forte croissance de leurs volumes ainsi que leur multidimensionnalité. Il devient donc nécessaire d'explorer des environnements d'affichage qui aillent au-delà du simple affichage de données offert par les moniteurs traditionnels et ce, afin de fournir une plus grande surface d'affichage ainsi que des techniques d'interaction plus performantes pour l'exploration de données. Les environnements correspondants à cette description sont les suivants : Les écrans large ; les environnements multi-écrans (EME) composés de plusieurs écrans hétérogènes spatialement distribués (moniteurs, smartphones, tablettes, table interactive ...) ; les environnements immersifs. Dans ce contexte, l'objectif de ces travaux de thèse est de concevoir et d'évaluer des solutions d'interaction originales, efficaces et adaptées à chacun des trois environnements cités précédemment. Une première contribution de nos travaux consiste en Split-focus : une interface de visualisation et d'interaction qui exploite les facilités offertes par les environnements multi-écrans dans la visualisation de données multidimensionnelles au travers d'une interface overview + multi-detail multi-écrans. Bien que plusieurs techniques d'interaction offrent plus d'une vue détaillée en simultané, le nombre optimal de vues détaillées n'a pas été étudié. Dans ce type d'interface, le nombre de vues détaillées influe grandement sur l'interaction : avoir une seule vue détaillée offre un grand espace d'affichage mais ne permet qu'une exploration séquentielle de la vue d'ensemble?; avoir plusieurs vues détaillées réduit l'espace d'affichage dans chaque vue mais permet une exploration parallèle de la vue d'ensemble. Ce travail explore le bénéfice de diviser la vue détaillée d'une interface overview + detail pour manipuler de larges graphes à travers une étude expérimentale utilisant la technique Split-focus. Split-focus est une interface overview + multi-détails permettant d'avoir une vue d'ensemble sur un grand écran et plusieurs vues détaillées (1,2 ou 4) sur une tablette. [...] / Today's ever-growing data is becoming increasingly complex due to its large volume and high dimensionality: it thus becomes crucial to explore interactive visualization environments that go beyond the traditional desktop in order to provide a larger display area and offer more efficient interaction techniques to manipulate the data. The main environments fitting the aforementioned description are: large displays, i.e. an assembly of displays amounting to a single space; Multi-display Environments (MDEs), i.e. a combination of heterogeneous displays (monitors, smartphones/tablets/wearables, interactive tabletops...) spatially distributed in the environment; and immersive environments, i.e. systems where everything can be used as a display surface, without imposing any bound between displays and immersing the user within the environment. The objective of our work is to design and experiment original and efficient interaction techniques well suited for each of the previously described environments. First, we focused on the interaction with large datasets on large displays. We specifically studied simultaneous interaction with multiple regions of interest of the displayed visualization. 
We implemented and evaluated an extension of the traditional overview+detail interface to tackle this problem: it consists of an overview+detail interface where the overview is displayed on a large screen and multiple detailed views are displayed on a tactile tablet. The interface allows the user to have up to four detailed views of the visualization at the same time. We studied its usefulness as well as the optimal number of detailed views that can be used efficiently. Second, we designed a novel touch-enabled device, TDome, to facilitate interactions in Multi-display environments. The device is composed of a dome-like base and provides up to 6 degrees of freedom, a touchscreen and a camera that can sense the environment. [...]
|
12 |
Fast and Scalable Outlier Detection with Metric Access Methods / Detecção Rápida e Escalável de Casos de Exceção com Métodos de Acesso Métrico
Bispo Junior, Altamir Gomes, 25 July 2019 (has links)
It is well known that the existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. This dissertation describes an empirical study performed on unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets that refer to a variety of real-world tasks of practical relevance, such as spotting cyberattacks, clinical pathologies and abnormalities occurring in nature. We present our assessment of the results obtained, pointing out the strengths and weaknesses of each technique from the application specialists' point of view, which is a shift from the designer-based point of view that is commonly adopted. Many of the techniques had unfeasibly high runtime requirements or failed to spot what the specialists consider as outliers in their own data. To tackle this issue, we propose MetricABOD: a novel ABOD-based algorithm that makes the analysis up to thousands of times faster, while still being on average 26% more accurate than the most accurate related work. This improvement makes outlier detection practical in many real-world applications for which the existing methods present unstable accuracy or unfeasible runtime requirements. Finally, we studied two collections of text data to show that our MetricABOD also works for adimensional, purely metric data. / É conhecido e notável que os modelos teóricos existentes empregados na detecção de outliers realizam assunções que podem não refletir a verdadeira natureza dos outliers em cada aplicação. Esta dissertação descreve um estudo empírico sobre detecção de outliers não-supervisionada usando 8 algoritmos do estado-da-arte e 8 conjuntos de dados que foram extraídos de uma variedade de tarefas do mundo real de relevância prática, tais como a detecção de ataques cibernéticos, patologias clínicas e anormalidades naturais. Apresentam-se considerações sobre os resultados obtidos, apontando os pontos positivos e negativos de cada técnica do ponto de vista do especialista da aplicação, o que representa uma mudança do embasamento rotineiro no ponto de vista do desenvolvedor da técnica. A maioria das técnicas estudadas apresentou requerimentos de tempo impraticáveis ou falhou em encontrar o que os especialistas consideram como outliers nos conjuntos de dados confeccionados por eles próprios. Para lidar-se com esta questão, foi desenvolvido o método MetricABOD: um novo algoritmo baseado no ABOD que torna a análise milhares de vezes mais veloz, sendo ainda em média 26% mais acurada do que o trabalho relacionado mais acurado. Esta melhoria equivale a tornar a busca por outliers uma tarefa factível em muitas aplicações do mundo real para as quais os métodos existentes apresentam resultados instáveis ou requerimentos de tempo impassíveis de realização. Finalmente, foram também estudadas duas coleções de dados adimensionais para mostrar que o novo MetricABOD funciona também para dados puramente métricos.
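
To make the angle-based idea behind ABOD concrete, the following sketch computes the angle-based outlier factor (ABOF) that MetricABOD accelerates; it is a minimal, illustrative implementation (restricted to each point's k nearest neighbours, in the spirit of FastABOD-style approximations), not the MetricABOD algorithm itself, and all function names and parameters are our own assumptions.

# Minimal sketch of the angle-based outlier factor (ABOF) used by ABOD-style
# methods: outliers have a low variance of the (distance-weighted) angles
# formed with pairs of other points. Illustrative only.
import numpy as np

def abof(point, others):
    """Weighted variance of cosines between difference vectors from `point`
    to every pair of points in `others` (rows of a 2D array)."""
    diffs = others - point                      # difference vectors
    values, weights = [], []
    n = len(diffs)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = diffs[i], diffs[j]
            na, nb = np.linalg.norm(a), np.linalg.norm(b)
            if na == 0 or nb == 0:
                continue
            # angle term weighted by the product of distances
            values.append(np.dot(a, b) / (na ** 2 * nb ** 2))
            weights.append(1.0 / (na * nb))
    values, weights = np.array(values), np.array(weights)
    mean = np.average(values, weights=weights)
    return np.average((values - mean) ** 2, weights=weights)

def rank_outliers(data, k=10):
    """Score every point; the lowest ABOF scores are the strongest outlier
    candidates. Each point is compared only against its k nearest neighbours."""
    scores = []
    for p in data:
        dists = np.linalg.norm(data - p, axis=1)
        neighbours = data[np.argsort(dists)[1:k + 1]]
        scores.append(abof(p, neighbours))
    return np.argsort(scores)                   # most outlying first

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])  # one planted outlier
    print(rank_outliers(X, k=10)[:3])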
|
13 |
Utilização de condições de contorno para combinação de múltiplos descritores em consultas por similaridade
Barroso, Rodrigo Fernandes, 14 March 2014 (links)
Previous issue date: 2014-03-14 / Universidade Federal de Sao Carlos / Complex data, such as images, face semantic problems in their queries that may compromise the quality of the results. Such problems stem from the differences between the semantic interpretation of the data and their low-level computational representation. In this representation, feature vectors describe intrinsic characteristics (such as color, shape and texture) as qualifying attributes. When analyzing similarity in complex data, one perceives that these intrinsic characteristics complement each other in the representation of the data, much as in human perception, and for this reason the use of multiple descriptors tends to improve the ability to discriminate the data. In this context, another relevant fact is that, within a dataset, some subsets may present specific intrinsic characteristics that are essential to better distinguish their elements from the rest of the data. Based on these premises, this work proposes the use of boundary conditions to identify these subsets and then determine the best weighting of the descriptor combination for each of them, aiming to reduce the semantic gap in similarity queries. In all the experiments carried out, the proposed technique achieved better results than individual descriptors under the same boundary conditions and than descriptor combinations applied to the whole dataset without boundary conditions. / Dados complexos, como imagens, enfrentam problemas semânticos em suas consultas que comprometem a qualidade dos resultados. Esses problemas são caracterizados pela divergência entre a interpretação semântica desses dados e a forma como são representados computacionalmente em características de baixo nível. Nessa representação são utilizados vetores de características que descrevem características intrínsecas (como cor, forma e textura) em atributos qualificadores. Ao analisar a similaridade em dados complexos percebe-se que essas características intrínsecas se complementam na representação do dado, bem como é realizada pela percepção humana e por este motivo a utilização de múltiplos descritores tende a melhorar a capacidade de discriminação dos dados. Nesse contexto, outro fato relevante é que em um conjunto de dados, alguns subconjuntos podem apresentar características intrínsecas específicas essenciais que melhor evidenciam seus elementos do restante dos dados. Com base nesses preceitos, este trabalho propõe a utilização de condições de contorno para delimitar estes subconjuntos e determinar o melhor balanceamento de múltiplos descritores para cada um deles, com o objetivo de diminuir o gap semântico nas consultas por similaridade. Em todos os experimentos realizados a utilização da técnica proposta sempre apresentou melhores resultados. Em comparação a utilização de descritores individuais com as mesmas condições de contorno e sem condições de contorno, e também a combinação de descritores para o conjunto todo sem a utilização de condições de contorno.
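
As an illustration of the general idea of combining multiple descriptors under boundary conditions, the sketch below computes a query-to-object distance as a weighted sum of normalized per-descriptor distances, where the weight vector is chosen by a boundary-condition predicate that identifies the subset the object belongs to. The predicate, weight profiles and descriptor names are hypothetical placeholders, not the weighting scheme actually proposed in the dissertation.

# Sketch: per-subset weighting of multiple descriptors for similarity queries.
# A boundary condition routes each object to the weight vector tuned for its
# subset; the weights and the condition itself are illustrative assumptions.
import numpy as np

DESCRIPTORS = ("color", "shape", "texture")

# Hypothetical weight profiles, one per subset identified by a boundary condition.
WEIGHTS = {
    "dark_images":  {"color": 0.2, "shape": 0.5, "texture": 0.3},
    "other_images": {"color": 0.5, "shape": 0.2, "texture": 0.3},
}

def boundary_condition(metadata):
    """Toy predicate: split the collection by mean brightness."""
    return "dark_images" if metadata["mean_brightness"] < 0.4 else "other_images"

def combined_distance(query_feats, obj_feats, obj_metadata):
    """Weighted sum of normalized Euclidean distances, one per descriptor."""
    weights = WEIGHTS[boundary_condition(obj_metadata)]
    total = 0.0
    for name in DESCRIPTORS:
        q, o = np.asarray(query_feats[name]), np.asarray(obj_feats[name])
        d = np.linalg.norm(q - o) / np.sqrt(len(q))   # crude normalization
        total += weights[name] * d
    return total

def knn(query_feats, collection, k=5):
    """collection: list of (features_dict, metadata_dict) pairs."""
    scored = [(combined_distance(query_feats, f, m), i)
              for i, (f, m) in enumerate(collection)]
    return sorted(scored)[:k]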
|
14 |
Análise de Agrupamentos Com Base na Teoria da Informação: Uma Abordagem Representativa
Araújo, Daniel Sabino Amorim de, 18 March 2013 (links)
Previous issue date: 2013-03-18 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they can only achieve good results within specific domains, providing no consensus on the best way to group this kind of data. In general, these techniques fail due to unrealistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on the Cross Information Potential that uses representative points of the dataset and statistics extracted directly from the data to measure the interaction between groups. The proposed approach allows us to use all the advantages of this information-theoretic descriptor and overcomes the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve better performance than the main algorithms in the literature. These results apply both to synthetic data designed to test the algorithms in specific situations and to real data extracted from problems of different fields. / Atualmente, um dos maiores desafios para o campo de mineração de dados é realizar a análise de agrupamentos em dados complexos. Até o momento, diversas técnicas foram propostas mas, em geral, elas só conseguem atingir bons resultados dentro de domínios específicos, não permitindo, dessa maneira, que exista um consenso de qual seria a melhor forma para agrupar dados. Essas técnicas costumam falhar por fazer suposições nem sempre realistas sobre a distribuição de probabilidade que modela os dados. Com base nisso, o trabalho proposto neste documento cria uma nova medida baseada no Potencial de Informação Cruzado que utiliza pontos representativos do conjunto de dados e a estatística extraída diretamente deles para medir a interação entre grupos. A abordagem proposta permite usar todas as vantagens desse descritor de informação e contorna as limitações impostas a ele pela sua própria forma de funcionamento. A partir disso, duas funções custo de otimização e três algoritmos foram construídos para realizar a análise de agrupamentos. Como o uso de Teoria da Informação permite capturar a relação entre diferentes padrões, independentemente de suposições sobre a natureza dessa relação, a abordagem proposta foi capaz de obter um desempenho superior aos principais algoritmos citados na literatura. Esses resultados valem tanto para o contexto de dados sintéticos desenvolvidos para testar os algoritmos em situações específicas quanto em dados extraídos de problemas reais de diferentes naturezas.
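
For readers unfamiliar with the information-theoretic quantity mentioned in the abstract, the sketch below estimates the Cross Information Potential (CIP) between two groups with a Gaussian (Parzen) kernel, optionally restricted to representative points. It illustrates the underlying descriptor only, under our own choice of kernel width and representative selection, and is not the measure or the algorithms proposed in the thesis.

# Sketch: Cross Information Potential between two groups of points, estimated
# with a Gaussian kernel (information-theoretic learning style). Using only a
# few representative points per group mimics the idea of reducing its cost.
import numpy as np

def gaussian_kernel(diff, sigma):
    d = diff.shape[-1]
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2)) / norm

def cross_information_potential(A, B, sigma=1.0):
    """Average pairwise kernel value between groups A and B (a low CIP
    indicates that the groups are well separated)."""
    diffs = A[:, None, :] - B[None, :, :]
    return gaussian_kernel(diffs, sigma).mean()

def representatives(X, m=10, seed=0):
    """Pick m representative points (here simply a random sample; the thesis
    uses a more principled selection)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    return X[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(0.0, 1.0, (200, 2))
    B = rng.normal(5.0, 1.0, (200, 2))
    print(cross_information_potential(representatives(A), representatives(B)))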
|
15 |
Statistical Estimation of Software Reliability and Failure-causing Effect
Shu, Gang, 02 September 2014 (links)
No description available.
|
16 |
Soluções aproximadas para algoritmos escaláveis de mineração de dados em domínios de dados complexos usando GPGPU / On approximate solutions to scalable data mining algorithms for complex data problems using GPGPU
Mamani, Alexander Victor Ocsa, 22 September 2011 (links)
A crescente disponibilidade de dados em diferentes domínios tem motivado o desenvolvimento de técnicas para descoberta de conhecimento em grandes volumes de dados complexos. Trabalhos recentes mostram que a busca em dados complexos é um campo de pesquisa importante, já que muitas tarefas de mineração de dados, como classificação, detecção de agrupamentos e descoberta de motifs, dependem de algoritmos de busca ao vizinho mais próximo. Para resolver o problema da busca dos vizinhos mais próximos em domínios complexos muitas abordagens determinísticas têm sido propostas com o objetivo de reduzir os efeitos da maldição da alta dimensionalidade. Por outro lado, algoritmos probabilísticos têm sido pouco explorados. Técnicas recentes relaxam a precisão dos resultados a fim de reduzir o custo computacional da busca. Além disso, em problemas de grande escala, uma solução aproximada com uma análise teórica sólida mostra-se mais adequada que uma solução exata com um modelo teórico fraco. Por outro lado, apesar de muitas soluções exatas e aproximadas de busca e mineração terem sido propostas, o modelo de programação em CPU impõe restrições de desempenho para esses tipos de solução. Uma abordagem para melhorar o tempo de execução de técnicas de recuperação e mineração de dados em várias ordens de magnitude é empregar arquiteturas emergentes de programação paralela, como a arquitetura CUDA. Neste contexto, este trabalho apresenta uma proposta para buscas kNN de alto desempenho baseada numa técnica de hashing e implementações paralelas em CUDA. A técnica proposta é baseada no esquema LSH, ou seja, usa-se projeções em subespac¸os. O LSH é uma solução aproximada e tem a vantagem de permitir consultas de custo sublinear para dados em altas dimensões. Usando implementações massivamente paralelas melhora-se tarefas de mineração de dados. Especificamente, foram desenvolvidos soluções de alto desempenho para algoritmos de descoberta de motifs baseados em implementações paralelas de consultas kNN. As implementações massivamente paralelas em CUDA permitem executar estudos experimentais sobre grandes conjuntos de dados reais e sintéticos. A avaliação de desempenho realizada neste trabalho usando GeForce GTX470 GPU resultou em um aumento de desempenho de até 7 vezes, em média sobre o estado da arte em buscas por similaridade e descoberta de motifs / The increasing availability of data in diverse domains has created a necessity to develop techniques and methods to discover knowledge from huge volumes of complex data, motivating many research works in databases, data mining and information retrieval communities. Recent studies have suggested that searching in complex data is an interesting research field because many data mining tasks such as classification, clustering and motif discovery depend on nearest neighbor search algorithms. Thus, many deterministic approaches have been proposed to solve the nearest neighbor search problem in complex domains, aiming to reduce the effects of the well-known curse of dimensionality. On the other hand, probabilistic algorithms have been slightly explored. Recently, new techniques aim to reduce the computational cost relaxing the quality of the query results. Moreover, in large-scale problems, an approximate solution with a solid theoretical analysis seems to be more appropriate than an exact solution with a weak theoretical model. 
On the other hand, even though several exact and approximate solutions have been proposed, single-CPU architectures impose performance limits on these kinds of solutions. An approach to improve the runtime of data mining and information retrieval techniques by an order of magnitude is to employ emerging many-core architectures such as CUDA-enabled GPUs. In this work we present a massively parallel kNN query algorithm based on hashing and a CUDA implementation. Our method, based on the LSH scheme, is an approximate method which queries high-dimensional datasets in sub-linear computational time. By using the massively parallel implementation we improve data mining tasks; specifically, we create solutions for (soft) real-time time series motif discovery. Experimental studies on large real and synthetic datasets were carried out thanks to the highly parallel CUDA implementation. Our performance evaluation on a GeForce GTX 470 GPU resulted in average runtime speedups of up to 7x over the state of the art in similarity search and motif discovery solutions.
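
The sketch below illustrates the general LSH idea the abstract refers to (random-projection hashing to shortlist candidates, then exact distances on the shortlist only). It is a plain-CPU NumPy illustration under our own parameter choices; the dissertation's contribution is a CUDA-parallel version, which is not reproduced here.

# Sketch: approximate kNN with random-hyperplane LSH. Points whose signs of
# random projections agree land in the same bucket; exact distances are then
# computed only for bucket collisions. CPU-only illustration.
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = None

    def _key(self, planes, x):
        return tuple((planes @ x > 0).astype(np.int8))

    def fit(self, X):
        self.data = np.asarray(X)
        for planes, table in zip(self.planes, self.tables):
            for i, x in enumerate(self.data):
                table[self._key(planes, x)].append(i)
        return self

    def query(self, q, k=10):
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        if not candidates:                      # fall back to exact search
            candidates = range(len(self.data))
        cand = np.fromiter(candidates, dtype=int)
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dists)[:k]]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(5000, 32))
    index = RandomProjectionLSH(dim=32).fit(X)
    print(index.query(X[0], k=5))               # should include index 0 itself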
|
17 |
Multi-utilisation de données complexes et hétérogènes : application au domaine du PLM pour l'imagerie biomédicale / Multi-use of complex and heterogeneous data: application in the domain of PLM for biomedical imaging
Pham, Cong Cuong, 15 June 2017 (links)
L’émergence des technologies de l’information et de la communication (TIC) au début des années 1990, notamment internet, a permis de produire facilement des données et de les diffuser au reste du monde. L’essor des bases de données, le développement des outils applicatifs et la réduction des coûts de stockage ont conduit à l’augmentation quasi exponentielle des quantités de données au sein de l’entreprise. Plus les données sont volumineuses, plus la quantité d’interrelations entre données augmente. Le grand nombre de corrélations (visibles ou cachées) entre données rend les données plus entrelacées et complexes. Les données sont aussi plus hétérogènes, car elles peuvent venir de plusieurs sources et exister dans de nombreux formats (texte, image, audio, vidéo, etc.) ou à différents degrés de structuration (structurées, semi-structurées, non-structurées). Les systèmes d’information des entreprises actuelles contiennent des données qui sont plus massives, complexes et hétérogènes. L’augmentation de la complexité, la globalisation et le travail collaboratif font qu’un projet industriel (conception de produit) demande la participation et la collaboration d’acteurs qui viennent de plusieurs domaines et de lieux de travail. Afin d’assurer la qualité des données, d’éviter les redondances et les dysfonctionnements des flux de données, tous les acteurs doivent travailler sur un référentiel commun partagé. Dans cet environnement de multi-utilisation de données, chaque utilisateur introduit son propre point de vue quand il ajoute de nouvelles données et informations techniques. Les données peuvent soit avoir des dénominations différentes, soit ne pas avoir des provenances vérifiables. Par conséquent, ces données sont difficilement interprétées et accessibles aux autres acteurs. Elles restent inexploitées ou non exploitées au maximum afin de pouvoir les partager et/ou les réutiliser. L’accès aux données (ou la recherche de données), par définition est le processus d’extraction des informations à partir d’une base de données en utilisant des requêtes, pour répondre à une question spécifique. L’extraction des informations est une fonction indispensable pour tout système d’information. Cependant, cette dernière n’est jamais facile car elle représente toujours un goulot majeur d’étranglement pour toutes les organisations (Soylu et al. 2013). Dans l’environnement de données complexes, hétérogènes et de multi-utilisation de données, fournir à tous les utilisateurs un accès facile et simple aux données devient plus difficile pour deux raisons : - Le manque de compétences techniques. Pour formuler informatiquement une requête complexe (les requêtes conjonctives), l’utilisateur doit connaitre la structuration de données, c’est-à-dire la façon dont les données sont organisées et stockées dans la base de données. Quand les données sont volumineuses et complexes, ce n’est pas facile d’avoir une compréhension approfondie sur toutes les dépendances et interrelations entre données, même pour les techniciens du système d’information. De plus, cette compréhension n’est pas forcément liée au savoir et savoir-faire du domaine et il est donc, très rare que les utilisateurs finaux possèdent les compétences suffisantes. - Différents points de vue des utilisateurs. Dans l’environnement de multi-utilisation de données, chaque utilisateur introduit son propre point de vue quand il ajoute des nouvelles données et informations techniques. 
Les données peuvent être nommées de manières très différentes et les provenances de données ne sont pas suffisamment fournies. / The emergence of Information and Communication Technologies (ICT) in the early 1990s, especially the Internet, made it easy to produce data and disseminate it to the rest of the world. The strength of new Database Management Systems (DBMS) and the reduction of storage costs have led to an exponential increase in the volume of data within enterprise information systems. The large number of correlations (visible or hidden) between data makes them more intertwined and complex. The data are also heterogeneous, as they can come from many sources and exist in many formats (text, image, audio, video, etc.) or at different levels of structuring (structured, semi-structured, unstructured). All companies now have to face data sources that are more and more massive, complex and heterogeneous. In this environment of multiple data uses, each user introduces their own point of view when adding new data and technical information. The data may either have different denominations or may not have verifiable provenances. Consequently, these data are difficult for other actors to interpret and access. They remain unexploited or not maximally exploited for the purpose of sharing and reuse. Data access (or data querying), by definition, is the process of extracting information from a database using queries to answer a specific question. Extracting information is an indispensable function for any information system. However, it is never easy and always represents a major bottleneck for all organizations (Soylu et al. 2013). In an environment of multiple uses of complex and heterogeneous data, providing all users with easy and simple access to data becomes more difficult for two reasons: - Lack of technical skills: in order to correctly formulate a query, a user must know the structure of the data, i.e. how the data is organized and stored in the database. When data is large and complex, it is not easy to have a thorough understanding of all the dependencies and interrelationships between data, even for information system technicians. Moreover, this understanding is not necessarily linked to domain competences, and it is therefore very rare that end users possess sufficient skills. - Different user perspectives: in the multi-use environment, each user introduces their own point of view when adding new data and technical information. Data can be named in very different ways and data provenances are not sufficiently recorded. Consequently, they become difficult for other actors to interpret and access, since they do not have sufficient understanding of the data semantics. The thesis work presented in this manuscript aims to improve the multi-use of complex and heterogeneous data by expert business actors by providing them with semantic and visual access to the data. We find that, although the initial design of the databases has taken the logic of the domain into account (using the entity-association model, for example), it is common practice to modify this design in order to adapt it to specific technical needs. As a result, the final design often diverges from the original conceptual structure, and there is a clear distinction between the technical knowledge needed to extract data and the knowledge that the expert actors use to interpret, process and produce data (Soylu et al. 2013). Based on bibliographical studies about data management tools, knowledge representation, visualization techniques and Semantic Web technologies (Berners-Lee et al.
2001), etc., in order to provide easy data access to the different expert actors, we propose to use a comprehensive and declarative representation of the data that is semantic, conceptual and integrates domain knowledge close to the expert actors.
|
18 |
Uma abordagem de teste estrutural de transformações M2T baseada em hipergrafos
Abade, André da Silva, 05 January 2016 (links)
Previous issue date: 2016-01-05 / Não recebi financiamento / Context: MDD (Model-Driven Development) is a software development paradigm in which the main artefacts are models, from which source code or other artefacts are generated. Even though MDD allows different views of how to decompose a problem and how to design software to solve it, this paradigm introduces new challenges related to the input models, transformations and output artefacts. Problem Statement: Thus, software testing is a fundamental activity to reveal defects and improve confidence in the software products developed in this context. Several techniques and testing criteria have been proposed and investigated. Among them, functional testing has been extensively explored, primarily for M2M (Model-to-Model) transformations, while structural testing for M2T (Model-to-Text) transformations still poses challenges and lacks appropriate approaches. Objective: This work aims to present a proposal for the structural testing of M2T transformations through the characterisation of the complex data of the input models, templates and output artefacts involved in this process. Method: The proposed approach was organised in five phases. Its strategy proposes that the complex data (grammars and metamodels) be represented by directed hypergraphs, allowing a combinatorial traversal algorithm to create subsets of the input models that will be used as test cases for the M2T transformations. From this perspective, we carried out two exploratory studies with the specific purpose of analysing the feasibility of the proposed approach.
Results and Conclusion: The evaluation of results from the exploratory studies, through the analysis of some testing coverage criteria, demonstrated the relevance and feasibility of the approach for characterizing complex data for M2T transformations testing. Moreover, structuring the testing strategy in phases enables the revision and adjustment of activities, in addition to assisting the replication of the approach within different applications that make use of the MDD paradigm. / Contexto: O MDD (Model-Driven Development ou Desenvolvimento Dirigido por Modelos) e um paradigma de desenvolvimento de software em que os principais artefatos são os modelos, a partir dos quais o código ou outros artefatos são gerados. Esse paradigma, embora possibilite diferentes visões de como decompor um problema e projetar um software para soluciona-lo, introduz novos desafios, qualificados pela complexidade dos modelos de entrada, as transformações e os artefatos de saída. Definição do Problema: Dessa forma, o teste de software e uma atividade fundamental para revelar defeitos e aumentar a confiança nos produtos de software desenvolvidos nesse contexto. Diversas técnicas e critérios de teste vem sendo propostos e investigados. Entre eles, o teste funcional tem sido bastante explorado primordialmente nas transformações M2M (Model-to-Model ou Modelo para Modelo), enquanto que o teste estrutural em transformações M2T (Model-to-Text ou Modelo para Texto) ainda possui alguns desafios e carência de novas abordagens. Objetivos: O objetivo deste trabalho e apresentar uma proposta para o teste estrutural de transformações M2T, por meio da caracterização dos dados complexos dos modelos de entrada, templates e artefatos de saída envolvidos neste processo. Metodologia: A abordagem proposta foi organizada em cinco fases e sua estratégia propõe que os dados complexos (gramáticas e metamodelos) sejam representados por meio de hipergrafos direcionados, permitindo que um algoritmo de percurso em hipergrafos, usando combinatória, crie subconjuntos dos modelos de entrada que serão utilizados como casos de teste para as transformações M2T. Nesta perspectiva, realizou-se dois estudos exploratórios com propósito específico da analise de viabilidade quanto a abordagem proposta. Resultados: A avaliação dos estudos exploratórios proporcionou, por meio da analise dos critérios de cobertura aplicados, um conjunto de dados que demonstram a relevância e viabilidade da abordagem quanto a caracterização de dados complexos para os testes em transformações M2T. A segmentação das estratégias em fases possibilita a revisão e adequação das atividades do processo, além de auxiliar na replicabilidade da abordagem em diferentes aplicações que fazem uso do paradigma MDD.
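
To give a flavour of the strategy described above, the sketch below encodes a toy metamodel as a directed hypergraph (each hyperedge maps a set of source element types to the element types it can produce) and enumerates bounded combinations of input elements as candidate test models. The metamodel, bounds and traversal are simplified assumptions of ours, not the dissertation's five-phase approach.

# Sketch: directed hypergraph over model element types, plus a combinatorial
# enumeration of small input-model subsets to use as M2T test cases.
from itertools import combinations

# Hypothetical toy metamodel: hyperedges (source types) -> (producible types).
HYPEREDGES = [
    ({"Class"}, {"Attribute", "Operation"}),
    ({"Class"}, {"Association"}),
    ({"Package"}, {"Class"}),
]

def reachable(start_types):
    """Forward closure: element types derivable from the starting types."""
    known = set(start_types)
    changed = True
    while changed:
        changed = False
        for sources, targets in HYPEREDGES:
            if sources <= known and not targets <= known:
                known |= targets
                changed = True
    return known

def candidate_test_models(element_pool, max_size=3):
    """Enumerate subsets of concrete elements (name, type) up to max_size,
    keeping only those whose types belong to the reachable slice."""
    allowed = reachable({"Package"})
    for size in range(1, max_size + 1):
        for subset in combinations(element_pool, size):
            if {t for _, t in subset} <= allowed:
                yield subset

if __name__ == "__main__":
    pool = [("p1", "Package"), ("c1", "Class"), ("c2", "Class"),
            ("a1", "Attribute"), ("op1", "Operation")]
    for model in candidate_test_models(pool, max_size=2):
        print(model)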
|
19 |
Active XML Data Warehouses for Intelligent, On-line Decision Support / Entrepôts de données XML actifs pour la décision intelligente en ligne
Salem, Rashed, 23 March 2012 (links)
Un système d'aide à la décision (SIAD) est un système d'information qui assiste les décideurs impliqués dans les processus de décision complexes. Les SIAD modernes ont besoin d'exploiter, en plus de données numériques et symboliques, des données hétérogènes (données texte, données multimédia, ...) et provenant de sources diverses (comme le Web). Nous qualifions ces données de complexes. Les entrepôts de données forment habituellement le socle des SIAD. Ils permettent d'intégrer des données provenant de diverses sources pour appuyer le processus décisionnel. Cependant, l'avènement de données complexes impose une nouvelle vision de l'entreposage des données, y compris de l'intégration des données, de leur stockage et de leur analyse. En outre, les exigences d'aujourd'hui imposent l'intégration des données complexes presque en temps réel, pour remplacer le processus ETL traditionnel (Extraction, Transformation et chargement). Le traitement en temps réel exige un processus ETL plus actif. Les tâches d'intégration doivent réagir d'une façon intelligente, c'est-à-dire d'une façon active et autonome, pour s'adapter aux changements rencontrés dans l'environnement d'intégration des données, notamment au niveau des sources de données. Dans cette thèse, nous proposons des solutions originales pour l'intégration de données complexes en temps réel, de façon active et autonome. En effet, nous avons conçu une approche générique basée sur les métadonnées, orientée services et orientée évènements pour l'intégration des données complexes. Pour prendre en charge la complexité des données, notre approche stocke les données complexes à l'aide d'un format unifié en utilisant une approche basée sur les métadonnées et XML. Nous traitons également la distribution des données et leur interopérabilité en utilisant une approche orientée services. Par ailleurs, pour considérer le temps réel, notre approche stocke non seulement des données intégrées dans un référentiel unifié, mais présente aussi des fonctions d'intégration des données à la volée. Nous appliquons également une approche orientée services pour observer les changements de données pertinentes en temps réel. En outre, pour intégrer les données complexes de façon active et autonome, nous proposons une méthode de fouille des évènements journalisés. Pour cela, nous proposons un algorithme incrémental basé sur XML pour la fouille des règles d'association à partir d'évènements. Ensuite, nous définissons des règles actives à l'aide des données provenant de la fouille d'évènements afin de réactiver les tâches d'intégration. Pour valider notre approche d'intégration de données complexes, nous avons développé une plateforme logicielle, à savoir AX-InCoDa (Active XML-based framework for Integrating Complex Data). AX-InCoDa est une application Web implémentée à l'aide d'outils open source. Elle exploite les standards du Web (comme les services Web et XML) et le XML actif pour traiter la complexité et les exigences temps réel. Pour explorer les évènements stockés dans la base d'évènements, nous proposons une méthode de fouille d'évènements afin d'assurer leur autogestion. AX-InCoDa est enrichi de règles actives. L'efficacité d'AX-InCoDa est illustrée par une étude de cas sur des données médicales. Enfin, la performance de notre algorithme de fouille d'évènements est démontrée expérimentalement. / A decision support system (DSS) is an information system that supports decision-makers involved in complex decision-making processes.
Modern DSSs need to exploit data that are not only numerical or symbolic, but also heterogeneously structured (e.g., text and multimedia data) and coming from various sources (e.g., the Web). We term such data complex data. Data warehouses are casually used as the basis of such DSSs. They help integrate data from a variety of sources to support decision-making. However, the advent of complex data imposes another vision of data warehousing, including data integration, data storage and data analysis. Moreover, today's requirements impose integrating complex data in near real-time rather than with traditional snapshot and batch ETL (Extraction, Transformation and Loading). Real-time and near real-time processing requires a more active ETL process. Data integration tasks must react in an intelligent, i.e., active and autonomous way, to encountered changes in the data integration environment, especially data sources. In this dissertation, we propose novel solutions for complex data integration in near real-time, actively and autonomously. We indeed provide a generic metadata-based, service-oriented and event-driven approach for integrating complex data. To address data complexity issues, our approach stores heterogeneous data into a unified format using a metadata-based approach and XML. We also tackle data distribution and interoperability using a service-oriented approach. Moreover, to address near real-time requirements, our approach not only stores integrated data into a unified repository, but also provides functions to integrate data on-the-fly. We also apply a service-oriented approach to track relevant data changes in near real-time. Furthermore, the idea of integrating complex data actively and autonomously revolves around mining logged events of the data integration environment. For this sake, we propose an incremental XML-based algorithm for mining association rules from logged events. Then, we define active rules upon the mined data to reactivate integration tasks. To validate our approach for managing complex data integration, we develop a high-level software framework, namely AX-InCoDa (Active XML-based framework for Integrating Complex Data). AX-InCoDa is implemented as a Web application using open-source tools. It exploits Web standards (e.g., XML and Web services) and Active XML to handle complexity issues and near real-time requirements. Besides warehousing logged events into an event repository to be mined for self-managing purposes, AX-InCoDa is enriched with active rules. AX-InCoDa's feasibility is illustrated by a healthcare case study. Finally, the performance of our incremental event mining algorithm is experimentally demonstrated.
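
The following sketch shows what incremental association-rule mining over logged events can look like: itemset counts are updated as each new batch of events arrives, and rules are (re)derived from the running counts. It is a generic Apriori-style illustration of the idea, not AX-InCoDa's XML-based algorithm, and the event items are invented for the example.

# Sketch: incremental support counting over event batches and rule derivation.
# Counts persist across batches, so new log entries only add their own
# contribution instead of forcing a full re-scan of the event repository.
from itertools import combinations
from collections import Counter

class IncrementalRuleMiner:
    def __init__(self, max_itemset_size=3):
        self.counts = Counter()
        self.n_events = 0
        self.max_size = max_itemset_size

    def add_batch(self, events):
        """events: iterable of sets of items (e.g. tags of one logged event)."""
        for event in events:
            self.n_events += 1
            items = sorted(event)
            for size in range(1, min(self.max_size, len(items)) + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def rules(self, min_support=0.2, min_confidence=0.7):
        out = []
        for itemset, count in self.counts.items():
            if len(itemset) < 2 or count / self.n_events < min_support:
                continue
            for i in range(1, len(itemset)):
                for antecedent in combinations(itemset, i):
                    conf = count / self.counts[antecedent]
                    if conf >= min_confidence:
                        consequent = tuple(x for x in itemset if x not in antecedent)
                        out.append((antecedent, consequent, conf))
        return out

if __name__ == "__main__":
    miner = IncrementalRuleMiner()
    miner.add_batch([{"source_down", "etl_retry"}, {"source_down", "etl_retry"},
                     {"schema_change"}])
    miner.add_batch([{"source_down", "etl_retry", "alert"}])
    for rule in miner.rules():
        print(rule)   # e.g. (('etl_retry',), ('source_down',), confidence)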
|
20 |
Proposition d'un cadre pour l'analyse automatique, l'interprétation et la recherche interactive d'images de bande dessinée / A framework for the automated analysis, interpretation and interactive retrieval of comic books' images
Guérin, Clément, 24 November 2014 (links)
Le paysage numérique de la culture française et mondiale subit de grands bouleversements depuis une quinzaine d’années avec des mutations historiques des médias, de leur format traditionnel au format numérique, tirant avantageusement parti des nouveaux moyens de communication et des dispositifs mobiles aujourd’hui popularisés. Aux côtés de formes culturelles ayant achevé, ou étant en passe d’achever, leur transition vers le numérique, la bande dessinée tâtonne encore pour trouver sa place dans l’espace du tout dématérialisé. En parallèle de l’émergence de jeunes auteurs créant spécifiquement pour ces nouveaux supports de lecture que sont ordinateurs, tablettes et smartphones, plusieurs acteurs du monde socio-économique s’intéressent à la valorisation du patrimoine existant. Les efforts se concentrent autant sur une démarche d’adaptation des œuvres aux nouveaux paradigmes de lecture que sur celle d’une indexation de leur contenu facilitant la recherche d’informations dans des bases d’albums numérisés ou dans des collections d’œuvres rares. La problématique est double, il s’agit premièrement d’être en mesure d’identifier la structure d’une planche de bande dessinée en se basant sur des extractions de primitives, issues d’une analyse d’image, validées et corrigées grâce à l’action conjointe de deux ontologies, la première manipulant les extractions d’images bas-niveau, la deuxième modélisant les règles de composition classiques de la bande dessinée franco-belge. Dans un second temps l’accent est mis sur l’enrichissement sémantique des éléments identifiés comme composants individuels d’une planche en s’appuyant sur les relations spatiales qu’ils entretiennent les uns avec les autres ainsi que sur leurs caractéristiques physiques intrinsèques. Ces annotations peuvent porter sur des éléments seuls (place d’une case dans la séquence de lecture) ou sur des liens entre éléments (texte prononcé par un personnage). / Since the beginning of the twenty-first century, the cultural industry, both in France and worldwide, has been through a massive and historical mutation. They have had to adapt to the emerging digital technology represented by the Internet and the new handheld devices such as smartphones and tablets. Although some industries successfully transfered a piece of their activity to the digital market and are about to find a sound business model, the comic books industry keeps looking for the right solution and has not yet produce anything as convincing as the music or movie offers. While many new young authors and writers use their creativity to produce specifically digital designed pieces of art, some other minds are focused on the preservation and the development of the already existing heritage. So far, efforts have been concentrated on the transfer from printed to digital support, with a special attention given to their specific features and how they can be used to create new reading conventions. There has also been some concerns about the content indexing, which is a hard task regarding the large amount of data created since the very beginning of the comics history. From a scientific point of view, there are several issues related to these goals. First, it implies to be able to identify the underlying structure of a comic books page. This comes through the extraction of the page's components, their validation and their correction based on the representation and reasoning capacities of two ontologies. 
The first one focuses on the representation of the image analysis concepts and the second one represents the comic books domain knowledge. Secondly, special attention is given to the semantic enhancement of the extracted elements, based on their spatial relations to each other and on their own characteristics. These annotations can be related to single elements (e.g. the position of a panel in the reading sequence) or to the link between several elements (e.g. the text pronounced by a character).
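
As a small illustration of the kind of semantic annotation discussed above (the position of a panel in the reading sequence, derived from spatial relations), the sketch below orders panel bounding boxes according to the Franco-Belgian left-to-right, top-to-bottom convention by grouping panels into rows. The box format and the row tolerance are assumptions made for the example; the thesis relies on ontologies and composition rules rather than this ad-hoc heuristic.

# Sketch: derive a reading order for comic panels from their bounding boxes
# (x, y, width, height), assuming left-to-right, top-to-bottom reading.
def reading_order(panels, row_tolerance=0.5):
    """Group panels into rows (panels whose top edges are close enough),
    then read rows top-to-bottom and panels left-to-right inside each row."""
    remaining = sorted(panels, key=lambda p: p[1])      # by top edge
    rows = []
    for panel in remaining:
        x, y, w, h = panel
        placed = False
        for row in rows:
            _, ry, _, rh = row[0]
            # same row if the tops differ by less than a fraction of the height
            if abs(y - ry) < row_tolerance * max(h, rh):
                row.append(panel)
                placed = True
                break
        if not placed:
            rows.append([panel])
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda p: p[0]))  # left to right
    return ordered

if __name__ == "__main__":
    page = [(310, 20, 280, 200), (10, 20, 280, 200),    # first strip
            (10, 240, 580, 220)]                         # full-width strip
    print(reading_order(page))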
|