  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Classificação de dados estacionários e não estacionários baseada em grafos / Graph-based classification for stationary and non-stationary data

João Roberto Bertini Júnior 24 January 2011
Graph-based methods are a powerful form of data representation and abstraction. Among other advantages, they can represent topological relations, visualize structures, represent groups of data with distinct shapes, and supply alternative measures to characterize data. This kind of approach has been increasingly used to solve machine learning problems, mainly in unsupervised learning, such as clustering, and more recently in semi-supervised learning. Graph-based solutions for supervised learning, however, remain underexplored in the literature. This work presents a non-parametric graph-based algorithm for classification problems with a stationary distribution, as well as its extension to problems with non-stationary distributions. The algorithm relies on two concepts: 1) a structure called the optimal K-associated graph, which represents the training set as a sparse graph separated into components; and 2) the purity measure of each component, which uses the graph structure to determine the local level of class mixture in the data. The work also considers classification problems in which the distribution of incoming data changes. This problem is known as concept drift, and it degrades the performance of a static classifier. To maintain good performance, the classifier must therefore keep learning during the application phase, for example through incremental learning. Experimental results suggest that both approaches offer advantages over the tested algorithms in data classification tasks.
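The two concepts named in this abstract can be illustrated with a small sketch. The abstract does not define the graph or the purity measure, so the code below rests on assumptions: each vertex is linked to those of its K nearest neighbours that share its class, components are the connected parts of that graph, and purity is the average degree of a component divided by 2K. The function name `k_associated_components` and the toy data are hypothetical, and the search for the optimal K is not shown.

```python
from math import dist


def k_associated_components(points, labels, k):
    """Return (members, purity) for each component of a K-associated graph.

    Assumed construction: a directed edge i -> j exists when j is one of the
    k nearest neighbours of i and both share the same class label.
    """
    n = len(points)
    edges = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: dist(points[i], points[j]))[:k]
        edges.extend((i, j) for j in nearest if labels[j] == labels[i])

    # Union-find to group vertices into connected components.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)

    result = []
    for members in groups.values():
        member_set = set(members)
        directed = sum(1 for i, _ in edges if i in member_set)
        # Each directed edge adds one to the degree of both endpoints.
        average_degree = 2 * directed / len(members)
        result.append((members, average_degree / (2 * k)))  # assumed purity
    return result


# Three points of class "a" with one point of class "b" sitting among them.
mixed = k_associated_components(
    [(0, 0), (0, 1), (1, 0), (0.4, 0.4)], ["a", "a", "a", "b"], k=2
)
```

Under these assumptions a component whose vertices all find K same-class neighbours has purity 1, and the value drops as other classes intrude into the neighbourhood.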
72

Odpovídání na otázky nad strukturovanými daty / Question Answering over Structured Data

Birger, Mark January 2017
This thesis deals with question answering over structured data. In most cases structured data is represented as linked graphs, but hiding the underlying data structure is essential for using such systems as part of a natural language interface. A question answering system was designed and developed as part of this work. In contrast to traditional question answering systems, which are based on linguistic analysis or statistical methods, our system explores the provided graph and, as a result, generates semantic links from input question-answer pairs. The developed system is independent of the data structure, but for evaluation purposes we used data from Wikidata and DBpedia. The quality of the resulting system and of the examined approach was evaluated using a prepared dataset and standard metrics.
73

Représentation d'images hiérarchique multi-critère / Hierarchical multi-feature image representation

Randrianasoa, Tianatahina Jimmy Francky 08 December 2017
Segmentation is a crucial task in image analysis. Novel acquisition devices produce images with higher resolution that contain more heterogeneous objects. It has also become easier to obtain many images of the same area from different sources. This phenomenon is encountered in many domains (e.g. remote sensing, medical imaging) and makes classical image segmentation methods difficult to use. Hierarchical segmentation approaches provide potential solutions to such issues. In particular, the Binary Partition Tree (BPT) is a hierarchical data structure that models the content of an image at different scales. It is usually built in a mono-feature way (i.e. one image, one metric) by progressively merging similar connected regions. However, the metric has to be defined a priori by the user, and several images are generally handled by gathering the information provided by various spectral bands into a single metric. Our first contribution is a generalized framework for building a BPT in a multi-feature way. It relies on a strategy that establishes a consensus between many metrics, yielding a unified hierarchical segmentation space. Surprisingly, few works have been devoted to the evaluation of such hierarchical structures. Our second contribution is a framework for evaluating the quality of BPTs, relying on both intrinsic and extrinsic quality analysis based on ground-truth examples. We also discuss the use of this evaluation framework both for assessing a given BPT and for determining which BPT should be built for a given application. Experiments on satellite images highlight the relevance of the proposed frameworks for image segmentation.
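The merging process behind a Binary Partition Tree can be sketched on a one-dimensional signal. This is an illustration only, under stated assumptions: regions are contiguous index ranges, adjacency means neighbouring ranges, and the consensus between metrics is taken here as the lowest sum of per-metric ranks, which is one simple choice and not necessarily the strategy used in the thesis. The names `build_bpt`, `mean_gap` and `max_gap` are hypothetical.

```python
def mean_gap(a, b):
    """Dissimilarity: difference between the mean values of two regions."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def max_gap(a, b):
    """Dissimilarity: difference between the maximum values of two regions."""
    return abs(max(a) - max(b))


def build_bpt(values, metrics):
    """Build a binary partition tree over a non-empty 1-D signal.

    At each step every pair of adjacent regions is ranked under each metric,
    and the pair with the lowest total rank (assumed consensus) is merged.
    """
    regions = [{"span": (i, i + 1), "children": None} for i in range(len(values))]
    while len(regions) > 1:
        pairs = range(len(regions) - 1)  # pair p joins regions p and p + 1
        total_rank = [0] * len(pairs)
        for metric in metrics:
            scores = [
                metric(
                    values[slice(*regions[p]["span"])],
                    values[slice(*regions[p + 1]["span"])],
                )
                for p in pairs
            ]
            for rank, p in enumerate(sorted(pairs, key=lambda p: scores[p])):
                total_rank[p] += rank
        best = min(pairs, key=lambda p: total_rank[p])
        left, right = regions[best], regions[best + 1]
        merged = {
            "span": (left["span"][0], right["span"][1]),
            "children": (left, right),
        }
        regions[best:best + 2] = [merged]
    return regions[0]


tree = build_bpt([1, 1, 1, 9, 9, 9], [mean_gap, max_gap])
```

The root of `tree` covers the whole signal, and each level below it is a coarser-to-finer partition, which is what lets a single structure represent the image content at several scales.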
74

Apprentissage machine efficace : théorie et pratique / Efficient machine learning: theory and practice

Delalleau, Olivier 03 1900
Despite constant progress in available computational power, memory and amount of data, machine learning algorithms need to use these resources efficiently. Although minimizing cost is an obvious major concern, another motivation is to design algorithms that can learn as efficiently as intelligent species. This thesis tackles the problem of efficient learning through several papers dealing with a wide range of machine learning algorithms. The topic is viewed both from the point of view of computational efficiency (processing time and memory required by the algorithms) and of statistical efficiency (number of samples necessary to solve a given learning task). The first contribution of this thesis is to shed light on statistical inefficiencies in existing algorithms. We show that decision trees do not generalize well on tasks with some particular properties (chapter 3), and that a similar flaw affects typical graph-based semi-supervised learning algorithms (chapter 5). This flaw is a form of the curse of dimensionality that is specific to each of these algorithms. For a subclass of neural networks, called sum-product networks, we prove that representing some functions with a single hidden layer can be exponentially less efficient than with deep networks (chapter 4). Our analyses help to better understand some inherent flaws of these algorithms and steer research towards approaches that may overcome them. We also identify computational inefficiencies in popular graph-based semi-supervised learning algorithms (chapter 5) as well as in the learning of mixtures of Gaussians with missing data (chapter 6). In both cases we propose new algorithms that scale to much larger datasets. The last two chapters also deal with computational efficiency, but from a different angle. Chapter 7 presents a theoretical analysis of the contrastive divergence algorithm (which has been used for efficient training of restricted Boltzmann machines), giving additional insight into the reasons why this algorithm has been so successful. Finally, in chapter 8 we describe an application of machine learning to video games, where computational efficiency is tied to software and hardware engineering constraints which, although often ignored in research papers, are ubiquitous in practice.
75

Development of predictive analysis solutions for the ESD robustness of integrated circuits in advanced CMOS technologies / Développement de solutions d’analyse prédictive pour la robustesse ESD des circuits intégrés en technologies CMOS avancées

Viale, Benjamin 29 November 2017
As Integrated Circuits (ICs) become more complex and more susceptible to ElectroStatic Discharges (ESD), the ability to reliably verify the presence of ESD design weaknesses over a multi-billion transistor chip prior to tape-out has become a major concern in the semiconductor industry. Commercial Electronic Design Automation (EDA) tools and their associated verification flows provide checks that have proven efficient for circuits with a mainstream architecture. However, they suffer from limitations when confronted with custom designs. Moreover, these verification methods are often run late in the design flow, making any design re-spin costly in terms of corrective effort and time. This Ph.D. thesis proposes a systematic and scalable ESD verification methodology embodied in a tool called ESD IP Explorer, which has been specifically implemented to cover the entire design flow and to handle custom circuit architectures. It is composed of a recognition module and a verification module. The recognition module first automatically identifies ESD protection structures, embedded in integrated circuits to enhance their ESD robustness, using a topology-aware recognition mechanism. The verification module then converts the ESD protection network formed by these structures into a directed graph. On this graph, technology-independent verification mechanisms based on generic graph-theory algorithms perform a chip-scale quasistatic ESD analysis. Machine learning algorithms have been used to infer the quasistatic behavior of ESD IPs from the netlist instance parameters of their primary devices. This approach has the advantage that no electrical simulation is required during the execution of ESD IP Explorer, which simplifies the tool architecture and improves execution times. Implementation efforts focused on the compatibility of ESD IP Explorer with the 28nm Fully Depleted Silicon On Insulator (FD-SOI) technology node. The developed verification tool has been used to successfully analyze a digital and mixed-signal circuit prototype counting more than 1.5 billion transistors in only a few hours, as well as custom designs that could not be analyzed with traditional verification tools because of incompatibility issues.
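The idea of running a graph algorithm over a protection network represented as a directed graph can be sketched as follows. Everything specific here is an assumption made for illustration: nodes are pads or supply rails, each directed edge is a protection device annotated with a hypothetical quasistatic voltage drop in volts, and the check is a shortest-path search (Dijkstra) for the discharge path with the lowest total drop. The function `min_path_drop`, the example network and the failure threshold are not taken from ESD IP Explorer.

```python
import heapq


def min_path_drop(graph, source, target):
    """Return the lowest total voltage drop from source to target, or None.

    graph maps a node to a list of (next_node, drop_in_volts) pairs, one per
    protection device that conducts in that direction (assumed model).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        drop, node = heapq.heappop(heap)
        if node == target:
            return drop
        if drop > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, device_drop in graph.get(node, []):
            candidate = drop + device_drop
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return None  # no discharge path exists


# Hypothetical network: diode from IO to VDD, power clamp from VDD to VSS,
# diode from VSS back to IO.
network = {
    "IO": [("VDD", 1.0)],
    "VDD": [("VSS", 4.0)],
    "VSS": [("IO", 1.0)],
}
drop = min_path_drop(network, "IO", "VSS")
failure_voltage = 6.0  # assumed limit of the protected circuitry
path_is_safe = drop is not None and drop < failure_voltage
```

A missing path (`None`) or a drop above the assumed failure voltage would be flagged as a design weakness in this simplified model.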
77

Fouille de graphes et classification de graphes : application à l’analyse de plans cadastraux / Graph Mining and Graph Classification : application to cadastral map analysis

Raveaux, Romain 25 November 2010
This thesis tackles the problem of technical document interpretation applied to ancient color cadastral maps. The subject lies at the crossroads of several fields, such as signal and image processing, pattern recognition, artificial intelligence, man-machine interaction and knowledge engineering. Although these fields differ in their foundations, they are complementary, and each contributes to building a reliable and efficient document interpretation system. The core of the work is the automatic processing of 19th-century cadastral documents, carried out within a project bringing together historians, geomatics specialists and computer scientists. The thesis points out the need for, and the importance of, dedicated services oriented to historical documents, and presents a related project named ALPAGE. The problem is considered from a systemic angle, covering every stage of the processing chain, with a concern for developing methodologies applicable in other contexts. Cadastral documents have been the subject of many studies; this work emphasizes document interpretation and bases the study on graph-based models. Appropriate processing methods and methodologies are proposed, and an answer is given, for the maps studied, to the semantic gap between the image and its interpretation. The main focus of the work, content-based map retrieval within an ancient collection of color cadastral maps, is then introduced.
78

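Um estudo comparativo de modelos baseados em estatísticas textuais, grafos e aprendizado de máquina para sumarização automática de textos em português / A comparative study of models based on textual statistics, graphs and machine learning for automatic summarization of texts in Portuguese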

Leite, Daniel Saraiva 21 December 2010
Automatic text summarization has been of great interest in Natural Language Processing because of the need to process, in a short time, the huge amount of information delivered through distinct media. Large-scale methods for synthesizing information and simplifying access to it are therefore of utmost importance. They aim at preserving the relevant content of the sources with little or no human intervention. Building upon the extractive summarizer SuPor and focusing on texts in Portuguese, this MSc work explored varied features for automatic summarization, using computational methods based on textual statistics, graphs and machine learning. Applying these methods resulted in a significant extension of the SuPor system, and new summarization models were defined. These are based either on each of the three methodologies in isolation or on hybrid combinations. In this dissertation they are generically named SuPor-2, after the original SuPor. The models were compared with each other in several experiments and with other well-known automatic summarizers for texts in Portuguese. The intrinsic evaluation was carried out entirely automatically and targeted the informativeness of the outputs, i.e., the automatic extracts. The results show a significant improvement of some SuPor-2 variations over the other existing extractive summarizers for Portuguese. The most promising models may be made available in the future for general use. They may also be embedded as tools for other Natural Language Processing purposes, or be useful in related areas such as linguistic studies. Portability to other languages is possible by replacing the language-dependent resources, namely lexicons, part-of-speech taggers and stop word lists. The supervised models have so far been trained on news corpora. Training for other genres may nevertheless be carried out by interested users through the interfaces supplied by the systems themselves.
79

Agrupamento de sequências de miRNA utilizando aprendizado não-supervisionado baseado em grafos / Clustering of miRNA sequences using graph-based unsupervised learning

Kasahara, Viviani Akemi 12 August 2016
Cluster analysis organizes a collection of patterns into clusters based on similarity, which is determined from properties of the data. Clustering techniques are useful in a variety of knowledge domains, such as biotechnology, computer vision and document retrieval. One interesting area of biology concerns microRNAs (miRNAs), non-coding RNA molecules approximately 22 nucleotides long that play important roles in gene regulation. Clustering miRNA sequences can help in exploring and understanding them, because sequences that belong to the same cluster have similar biological functions. This work investigates seven unsupervised graph-based clustering algorithms, which fall into three categories: algorithms based on region of influence, algorithms based on minimum spanning tree, and a spectral algorithm. To assess the contribution of these algorithms, the experiments used data from miRNA families stored in the online miRBase database. The results were analysed and evaluated using clustering validation indexes as well as visual analysis. No funding was received for this work.
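As a rough illustration of the minimum-spanning-tree category mentioned in this abstract, the sketch below clusters short sequences by growing a spanning forest with Kruskal's procedure and stopping once k components remain, which amounts to building the full tree and cutting its k-1 heaviest edges. It is not the thesis's implementation: the function names, the use of edit distance as the dissimilarity, and the toy sequences (invented, not taken from miRBase) are all assumptions made for the example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed dissimilarity)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def mst_clusters(items, k, dist=edit_distance):
    """Split items into k clusters by growing a spanning forest
    (Kruskal) and stopping when k components remain."""
    parent = list(range(len(items)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges of the complete graph, lightest first.
    edges = sorted((dist(items[i], items[j]), i, j)
                   for i, j in combinations(range(len(items)), 2))
    components = len(items)
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for idx, item in enumerate(items):
        groups.setdefault(find(idx), []).append(item)
    return list(groups.values())


# Hypothetical toy sequences, invented for this example.
toy = ["ACGUACGU", "ACGUACGA", "ACGUACCU", "GGGGCCCC", "GGGGCCCA"]
clusters = mst_clusters(toy, k=2)
```

With these toy inputs the three ACGU-like strings end up in one cluster and the two GGGG-like strings in the other. Real miRNA data would likely call for an alignment-based dissimilarity rather than plain edit distance.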
80

A Framework for Secure Structural Adaptation

Saman Nariman, Goran January 2018 (has links)
A (self-)adaptive system is a system that can dynamically change its behavior or structure during execution in response to changes in its environment or in the system itself. From a security standpoint, there has been some research on (self-)adaptive systems in general, but the adaptation itself has received too little attention. The security of a system can be reasoned about using threat models, which help to discover security issues in the system. Essentially, this entails abstracting away details that are not relevant to security in order to focus on the aspects that are. Threat models often enable us to reason about the security of a system quantitatively, using security metrics. The structural adaptation process of a (self-)adaptive system follows a reconfiguration plan, a set of steps leading from the initial state (configuration) to the final state. Usually, the reconfiguration plan consists of multiple strategies for the structural adaptation process, and each strategy consists of several steps, with each step representing a specific configuration of the (self-)adaptive system. Different reconfiguration strategies have different security levels, because each strategy passes through a different sequence of configurations, each with its own security level. To the best of our knowledge, no existing approach guides the reconfiguration process toward the most secure available reconfiguration strategy, and the security issues associated with the structural reconfiguration process itself have not been explicitly studied. In this work, based on an in-depth literature survey, we propose several metrics to measure the security of configurations, reconfiguration strategies and reconfiguration plans based on graph-based threat models. Additionally, we have implemented a prototype to demonstrate our approach and automate the process.
Finally, we have evaluated our approach in a case study of our own making. The preliminary results tend to expose certain security issues that arise during the structural adaptation process and indicate the effectiveness of our proposed metrics.
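The idea of comparing reconfiguration strategies by the security of the configurations they pass through can be sketched as follows. The abstract does not state the actual metrics, so everything here is an assumption: the per-configuration risk scores are hypothetical numbers, a strategy is scored by its riskiest intermediate configuration (a weakest-link rule chosen only for illustration), and the function names are invented.

```python
def strategy_risk(strategy, config_risk):
    """Risk of a strategy, taken here as the risk of its worst step.

    strategy:    ordered list of configuration identifiers
    config_risk: dict mapping configuration identifier -> risk score
                 (hypothetical values; higher means less secure)
    """
    return max(config_risk[c] for c in strategy)


def most_secure_strategy(plan, config_risk):
    """Return the name of the strategy in the plan with the lowest risk.

    plan: dict mapping strategy name -> ordered list of configurations
    """
    return min(plan, key=lambda name: strategy_risk(plan[name], config_risk))


# Hypothetical plan: two strategies from configuration c0 to c3.
plan = {
    "A": ["c0", "c1", "c3"],
    "B": ["c0", "c2", "c3"],
}
# Hypothetical risk scores, e.g. as derived from a threat model.
config_risk = {"c0": 0.2, "c1": 0.9, "c2": 0.5, "c3": 0.1}

best = most_secure_strategy(plan, config_risk)
```

Here strategy B is preferred because its riskiest step (c2) is less exposed than strategy A's (c1). Other aggregations, such as a sum or average over steps, would be equally plausible choices.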
