  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Aprendizado de estruturas de dependência entre fenótipos da síndrome metabólica em estudos genômicos / Structure learning of the metabolic syndrome phenotypes network in family genomic studies

Lilian Skilnik Wilk 26 June 2017
Introduction: The number of studies on Metabolic Syndrome (MetS) has grown in recent years, motivated by the rise in overweight/obesity and Type II diabetes cases, which lead to cardiovascular disease and, as a consequence, acute myocardial infarction, stroke, and other unfavorable outcomes. MetS is a multifactorial condition defined by five characteristics; an individual is diagnosed with MetS when presenting at least three of them. These characteristics are: visceral (truncal) obesity, indicated by increased waist circumference; elevated fasting blood glucose; elevated triglycerides; reduced HDL cholesterol; and elevated blood pressure. Aims: To establish the network of associations among the phenotypes that compose Metabolic Syndrome through dependency-structure learning, to decompose the network into genetic and environmental correlation components, and to evaluate the effect of adjusting for covariates and for genetic variants exclusively related to each phenotype in the network. Materials and Methods: The study sample comprises 79 families (1,666 individuals) from Baependi, a town in a rural area of Brazil. Structure learning will use graph theory and structural equation models, combined with the polygenic linear mixed model, to establish the dependency relations among the MetS phenotypes.
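The "at least three of five" diagnostic rule described above is simple enough to state as code. The sketch below is purely illustrative: the criterion names are hypothetical labels, not variables from the thesis, and each flag is assumed to have been decided elsewhere from the corresponding clinical threshold.

```python
# Hypothetical sketch of the MetS diagnostic rule: an individual is
# diagnosed when at least three of the five component criteria are met.
# Criterion names are illustrative, not taken from the thesis.
METS_CRITERIA = (
    "visceral_obesity",       # increased waist circumference
    "high_fasting_glucose",
    "high_triglycerides",
    "low_hdl_cholesterol",
    "high_blood_pressure",
)

def has_metabolic_syndrome(flags):
    """flags: dict mapping each criterion name to True/False."""
    return sum(bool(flags.get(c, False)) for c in METS_CRITERIA) >= 3
```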

Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation / Inference of directed regulatory networks on the transcription factors of Arabidopsis thaliana and setting up of co-regulation groups

Vasseur, Yann 08 December 2017
In this thesis, we characterise the transcription factors of the plant Arabidopsis thaliana, key genes in the regulation of genome expression. Using expression data, our biological goal is to cluster these transcription factors into groups of co-regulator genes and groups of co-regulated genes. We proceed in two phases: the first infers a regulatory network among the transcription factors; the second classifies the transcription factors according to the regulatory links established by that network. From a statistical point of view, the transcription factors are the variables and the expression data are the observations. The network to be inferred is represented by a directed graph whose nodes are the variables, and the estimation of its edges is cast as a high-dimensional variable-selection problem with few statistical units. We address it with LASSO-type penalised linear regressions. A preliminary approach, which selects a set of variables along the regularisation path using penalised-likelihood criteria, turns out to be unstable and retains too many explanatory variables. To counter this, we propose and compare two selection procedures suited to the high-dimensional setting, combining penalised linear regression with resampling. The parameters of these procedures are tuned to yield stable sets of variables, and stability is assessed on data simulated from our graphical model. We then apply an unsupervised clustering method to each inferred directed graph to form groups of nodes viewed as controllers and groups of nodes viewed as controlled. To evaluate the proximity between the double classifications of nodes obtained on different graphs, we developed an index for comparing pairs of partitions, whose relevance we test and advocate. On the practical side, we propose a cascade simulation method, required by the complexity of our model and inspired by the parametric bootstrap, to simulate datasets consistent with the model. We validated the model by assessing the proximity of the classifications obtained when the statistical procedure is applied to the real data and to these simulated data.
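The per-node penalised regression idea in the abstract (regress each variable on all the others with an L1 penalty and keep the nonzero coefficients as incoming edges, a scheme known as neighbourhood selection) can be sketched as follows. This is a deliberately simplified illustration, not the thesis's actual procedure: the plain coordinate-descent LASSO solver, the fixed penalty level, and the absence of the resampling/stability step are all simplifications of mine.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent LASSO (illustrative, unoptimised)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j's contribution removed
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

def infer_network(data, lam=0.2):
    """data: (n, p) matrix of observations.
    Returns edges[j] = set of parents selected for variable j."""
    n, p = data.shape
    edges = {}
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd(data[:, others], data[:, j], lam)
        edges[j] = {k for k, b in zip(others, beta) if abs(b) > 1e-8}
    return edges
```

In the thesis this per-node fit is wrapped in resampling to obtain stable edge sets; here a single fit is shown for brevity.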

Reducing software complexity by hidden structure analysis : Methods to improve modularity and decrease ambiguity of a software system

Bjuhr, Oscar, Segeljakt, Klas January 2016
Software systems can be represented as directed graphs where components are nodes and dependencies between components are edges. Applying hidden structure analysis can reduce system complexity and interference between development teams. However, since systems can contain thousands of dependencies, a concrete method is needed for selecting the dependencies that are most beneficial to remove. In this thesis two solutions to this problem are introduced: dominator analysis and cluster analysis. Dominator analysis examines the cost/gain ratio of detaching individual components from a cyclic group. Cluster analysis finds the most beneficial subgroups to split within a cyclic group. The aim of both methods is to reduce the size of cyclic groups, which are sets of co-dependent components. As a result, the system architecture becomes less prone to errors propagated by modifications of components. Both techniques derive from graph theory and data science but have not previously been applied to the area of hidden structures. A subsystem at Ericsson is used as a testing environment, and specific dependencies in its structure that might impede the development process have been discovered. The outcome of the thesis is four to-be scenarios of the system, displaying the effect of removing these dependencies. The to-be scenarios show that the architecture can be significantly improved by removing a few direct dependencies.
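The "cyclic groups" in the abstract are what graph theory calls strongly connected components, and finding them is the first step of both analyses described. A minimal sketch using Kosaraju's algorithm is below; the component names in the example are hypothetical, not from the Ericsson system studied in the thesis.

```python
def strongly_connected_components(graph):
    """graph: dict mapping node -> list of nodes it depends on.
    Returns a list of sets; sets of size > 1 are cyclic groups."""
    order, visited = [], set()

    def dfs(node, g, out):
        # iterative depth-first search, appending nodes in post-order
        visited.add(node)
        stack = [(node, iter(g.get(node, ())))]
        while stack:
            n, it = stack[-1]
            for succ in it:
                if succ not in visited:
                    visited.add(succ)
                    stack.append((succ, iter(g.get(succ, ()))))
                    break
            else:
                stack.pop()
                out.append(n)

    for n in graph:
        if n not in visited:
            dfs(n, graph, order)

    reverse = {}
    for n, succs in graph.items():
        reverse.setdefault(n, [])
        for s in succs:
            reverse.setdefault(s, []).append(n)

    visited.clear()
    components = []
    for n in reversed(order):
        if n not in visited:
            comp = []
            dfs(n, reverse, comp)
            components.append(set(comp))
    return components
```

Removing a single dependency edge and re-running the function shows how a cyclic group shrinks, which is the effect the to-be scenarios quantify.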

Spatial analysis of invasive alien plant distribution patterns and processes using Bayesian network-based data mining techniques

Dlamini, Wisdom Mdumiseni Dabulizwe 03 1900
Invasive alien plants have widespread ecological and socioeconomic impacts throughout many parts of the world, including Swaziland, where the government has declared them a national disaster. Control of these species requires knowledge of the invasion ecology of each species, including how it interacts with the invaded environment. Species distribution models are vital for providing solutions to such problems, including prediction of a species' niche and distribution. Various approaches are used for species distribution modelling, albeit with limitations arising from statistical assumptions, implementation, and interpretation of outputs. This study explores the usefulness of Bayesian networks (BNs), given their ability to model stochastic, nonlinear inter-causal relationships and uncertainty. Data-driven BNs were used to explore the patterns and processes influencing the spatial distribution of 16 priority invasive alien plants in Swaziland. Various BN structure learning algorithms were applied within the Weka software to build models from a set of 170 variables incorporating climatic, anthropogenic, topo-edaphic, and landscape factors. While all the BN models produced accurate predictions of alien plant invasion, the globally scored networks, particularly those learned with hill-climbing algorithms, performed relatively well. However, when the probabilistic outputs are considered, the constraint-based Inferred Causation algorithm, which attempts to generate a causal BN structure, performed better. The learned BNs reveal that the main pathways of alien plants into new areas are ruderal areas such as road verges and riverbanks, while humans and human activity are key driving factors and the main dispersal mechanism. However, the distribution of most of the species is constrained by climate, particularly tolerance to very low temperatures and precipitation seasonality. Biotic interactions and/or associations among the species are also prevalent. The findings suggest that most of the species will proliferate by extending their range, putting the whole country at risk of further invasion. The ability of BNs to express uncertain, rather complex conditional and probabilistic dependencies and to combine multi-source data makes them an attractive technique for species distribution modelling, especially as joint invasive species distribution models (JiSDM). Suggestions for further research are provided, including the need for rigorous invasive species monitoring, data stewardship, and testing of more BN learning algorithms. / Environmental Sciences / D. Phil. (Environmental Science)
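Score-based structure search of the kind the abstract describes (hill climbing over candidate networks) rests on a decomposable score: each variable's fit given its parents can be evaluated independently. The sketch below illustrates only that scoring step for discrete data, using a maximum log-likelihood score; the variable names are hypothetical, and real structure learners (as in Weka) use penalised scores such as BIC and search over full parent sets rather than a single parent.

```python
import math
from collections import Counter

def loglik(data, child, parents):
    """Maximum log-likelihood of discrete `child` given `parents`.
    data: list of dicts mapping variable name -> discrete value."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    # sum over observed (parent-configuration, child-value) cells
    return sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())

def best_parent(data, child, candidates):
    """One greedy step: the candidate parent with the highest score."""
    return max(candidates, key=lambda p: loglik(data, child, [p]))
```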

Noções de grafos dirigidos, cadeias de Markov e as buscas do Google / Notions of directed graphs, Markov chains and Google search

Oliveira, José Carlos Francisco de 30 August 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This work aims to highlight some of the mathematical concepts behind the ranking produced by a search on the most widely used search engine in the world: Google. We begin with a brief review of some high-school topics: matrices, linear systems, and probability. We then present basic notions of directed graphs and discrete-time Markov chains, with emphasis on the steady-state vector, since it guarantees long-term prediction results. These concepts are central to this work, as they are used to explain the mathematics behind the Google search engine. Next, we detail how Google ranks the pages returned by a search, i.e., how the results of a query are classified so that they can be presented sequentially in order of relevance. Finally, we derive PageRank, the algorithm that builds the so-called Google matrix and ranks the pages of a search. We close with a brief history of web search engines, from their founders to the rise and hegemony of Google.
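The PageRank computation described above can be sketched as a power iteration with the standard damping formulation (the steady-state vector of the damped Markov chain). The three-page link graph in the usage example is made up for illustration.

```python
def pagerank(links, damping=0.85, n_iter=100):
    """links: dict mapping page -> list of pages it links to.
    Pages with no outgoing links are treated as linking to every page.
    Returns the steady-state rank of each page (ranks sum to 1)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        # teleportation term of the Google matrix
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / (len(outs) or n)
            for q in (outs or pages):
                new[q] += damping * share
        rank = new
    return rank
```

For example, in the graph a -> b, c -> b, b -> {a, c}, page b receives links from both other pages and therefore ends up with the highest rank.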

[en] EVOCATIVE METHODOLOGY FOR CAUSAL MAPPING AND ITS PERSPECTIVE IN THE OPERATIONS MANAGEMENT WITH INTERNET-BASED APPLICATIONS FOR SUPPLY CHAIN MANAGEMENT AND SERVICE MANAGEMENT / [pt] METODOLOGIA EVOCATIVA PARA MAPEAMENTO CAUSAL E SUA PERSPECTIVA NA GERÊNCIA DE OPERAÇÕES COM APLICAÇÕES VIA INTERNET EM GESTÃO DA CADEIA DE SUPRIMENTO E ADMINISTRAÇÃO DE SERVIÇOS

25 August 2004
[en] The understanding of present-day productive processes is essential at a moment when knowledge has become an important creator of value. A holistic view of the knowledge that is spread out and dispersed among practitioners, consultants, and academics is necessary for the synthesis of new theories of production. Operations management researchers often use causal mapping as a key tool for building and communicating theory, particularly in support of empirical research. The widely accepted approaches for capturing cognitive data for a causal map are informal brainstorming and interviews, which are time-consuming and costly to implement. This dissertation creates a methodology (Evocative Causal Mapping Methodology, ECMM), intended for use in operations management research, for collecting and structuring dispersed data: practical and research knowledge and experience contained in the opinions of a large number of demographically and geographically scattered specialists. This is accomplished by evoking opinions, encoding them into variables, and reducing the resulting set to concepts and relationships, with special concern for achieving this goal in a feasible time and at low cost. ECMM consists of two or three rounds of Delphi-like, Internet-based asynchronous data collection, and a data analysis that uses a coding panel of experts, hierarchical cluster analysis, and multidimensional scaling to identify concepts in the form of cognitive maps. Applications illustrate ECMM and demonstrate its feasibility: they were developed in supply chain management (SCM) and service management (SM), involving about 1,300 respondents from companies and universities in nearly 100 countries. Among possible future studies, this dissertation proposes applying ECMM to SCM and SM with the aim of unifying them into a single topic: service supply chain management.
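The hierarchical cluster analysis step in the methodology can be illustrated with a minimal single-linkage agglomeration: clusters are repeatedly merged by their smallest inter-item distance until the desired number remains. The dissertation does not specify this exact variant or distance function; both are placeholders here, as are the one-dimensional "coded opinion" values in the example.

```python
def single_linkage(items, k, dist):
    """Agglomerate `items` into `k` clusters by repeatedly merging the
    pair of clusters with the smallest minimum inter-item distance."""
    clusters = [[x] for x in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters
```

In practice the coded variables would be points in the multidimensional-scaling space rather than scalars, but the merging logic is the same.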

Structural Similarity: Applications to Object Recognition and Clustering

Curado, Manuel 03 September 2018
In this thesis, we propose several developments in the context of structural similarity, addressing both node (local) similarity and graph (global) similarity. Concerning node similarity, we focus on improving the diffusive process used to compute it (e.g. commute times) by modifying or rewiring the structure of the graph (graph densification), although some advances in Laplacian-based ranking are also included in this document. Graph densification is a particular case of what we call graph rewiring, a novel field (analogous to image processing) in which input graphs are rewired to be better conditioned for subsequent pattern recognition tasks (e.g. clustering). We contribute a scalable and effective method driven by Dirichlet processes, proposing both a completely unsupervised and a semi-supervised approach to Dirichlet densification. We also contribute new random walkers (return random walks) that serve as structural filters as well as asymmetry detectors in directed brain networks used for early prediction of Alzheimer's disease (AD). Graph similarity is addressed by designing structural information channels as a means of measuring the mutual information between graphs. To this end, we first embed the graphs by means of commute times. Commute-time embeddings have good properties for Delaunay triangulations (the typical representation for graph matching in computer vision), which means these embeddings can act as encoders in the channel as well as decoders (since they are invertible). Consequently, structural noise can be modelled by the deformation introduced in one of the manifolds to fit the other. This methodology yields a highly discriminative similarity measure, since the mutual information is measured on the manifolds (a vectorial domain) through copulas and bypass entropy estimators. This is consistent with the methodology of decoupling the measurement of graph similarity into two steps: (a) linearising the Quadratic Assignment Problem (QAP) by means of the embedding trick, and (b) measuring similarity in vector spaces. The QAP is also investigated in this thesis. More precisely, we analyse the behaviour of m-best graph matching methods. These methods usually start from a couple of best solutions and then expand the search space locally by excluding previously clamped variables. The next variable to clamp is usually selected at random, but we show that this degrades performance when structural noise (outliers) arises. Alternatively, we propose several heuristics for spanning the search space and evaluate all of them, showing that they are usually better than random selection. These heuristics are particularly interesting because they exploit the structure of the affinity matrix, and efficiency is improved as well. Concerning the application domains explored in this thesis, we focus on object recognition (graph similarity), clustering (rewiring), compression/decompression of graphs (with links to extremal graph theory), 3D shape simplification (sparsification), and early prediction of AD. / Ministerio de Economía, Industria y Competitividad (Referencia TIN2012-32839 BES-2013-064482)
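The commute times underpinning the embeddings above have a standard closed form: C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv), where L+ is the Moore-Penrose pseudoinverse of the graph Laplacian and vol(G) is the sum of degrees. A minimal sketch for undirected graphs follows; the three-node path graph in the example is a toy case, not data from the thesis.

```python
import numpy as np

def commute_times(adj):
    """adj: symmetric (n, n) 0/1 adjacency matrix of a connected
    undirected graph. Returns the (n, n) matrix of expected commute
    times between all node pairs."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    laplacian = np.diag(degree) - adj
    lp = np.linalg.pinv(laplacian)       # Laplacian pseudoinverse L+
    d = np.diag(lp)
    # C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv)
    return degree.sum() * (d[:, None] + d[None, :] - 2 * lp)
```

For the path graph a - b - c (vol = 4), the commute time between adjacent nodes is 4 and between the endpoints is 8, matching vol(G) times the effective resistance.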
