91

Reverse Engineering of Temporal Gene Expression Data Using Dynamic Bayesian Networks And Evolutionary Search

Salehi, Maryam 17 September 2008 (has links)
Capturing the mechanism of gene regulation in a living cell is essential to predicting the behavior of the cell in response to intercellular or extracellular factors. Such prediction capability can potentially lead to the development of improved diagnostic tests and therapeutics [21]. Among the reverse engineering approaches that aim to model gene regulation are Dynamic Bayesian Networks (DBNs). DBNs are of particular interest because these models are capable of discovering causal relationships between genes while dealing with noisy gene expression data. At the same time, the problem of discovering the optimum DBN model makes structure learning of DBNs a challenging topic. This is mainly due to the high dimensionality of the search space of gene expression data, which makes exhaustive search strategies for identifying the best DBN structure impractical. In this work, the application of a covariance-based evolutionary search algorithm is proposed for the first time for structure learning of DBNs. In addition, the convergence time of the proposed algorithm is improved compared to previously reported covariance-based evolutionary search approaches. This is achieved by keeping a fixed number of good sample solutions from previous iterations. Finally, the proposed approach, M-CMA-ES, unlike gradient-based methods, has a high probability of converging to a global optimum. To assess how efficiently this approach works, a temporal synthetic dataset is developed. The proposed approach is then applied to this dataset as well as to the Brainsim dataset, a well-known simulated temporal gene expression dataset [58]. The results indicate that the proposed method is quite efficient in reconstructing the networks in both the synthetic and Brainsim datasets. Furthermore, it outperforms other algorithms in terms of both the predicted structure accuracy and the mean square error of the reconstructed time series of gene expression data. For validation purposes, the proposed approach is also applied to a biological dataset composed of 14 cell-cycle regulated genes in the yeast Saccharomyces cerevisiae. Considering the KEGG pathway as the target network, the efficiency of the proposed reverse engineering approach significantly improves on the results of two previous studies of yeast cell cycle data in terms of capturing the correct interactions. / Thesis (Master, Computing) -- Queen's University, 2008-09-09 11:35:33.312
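The abstract above describes scoring candidate DBN structures against time-series expression data and searching the structure space with a covariance-based evolution strategy. The sketch below illustrates that general idea only; it is not the thesis's M-CMA-ES. It uses a plain least-squares/BIC-style score and a simplified covariance-adapting sampler (no rank-based weights or step-size control), and it carries forward the previous generation's elite samples, loosely echoing the reuse of good solutions mentioned above. All function names and parameters are illustrative assumptions.

```python
# Minimal sketch, not the thesis implementation: score DBN transition structures
# with a BIC-style least-squares fit and refine them with a simple
# covariance-adapting evolutionary search over edge-weight vectors.
import numpy as np

rng = np.random.default_rng(0)

def score_structure(adj, series):
    """BIC-style score of a first-order DBN: each gene at time t+1 is
    regressed (least squares) on its parents at time t. Lower is better."""
    T, n = series.shape
    rss, n_params = 0.0, 0
    for child in range(n):
        parents = np.flatnonzero(adj[:, child])
        y = series[1:, child]
        if parents.size == 0:
            rss += np.sum((y - y.mean()) ** 2)
            n_params += 1
            continue
        X = np.column_stack([series[:-1, parents], np.ones(T - 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss += np.sum((y - X @ beta) ** 2)
        n_params += X.shape[1]
    return (T - 1) * np.log(rss / (T - 1) + 1e-12) + n_params * np.log(T - 1)

def evolve_structure(series, n_iter=50, pop=40, elite=10):
    """Sample edge-weight vectors from a Gaussian, threshold them into
    adjacency matrices, and re-fit mean/covariance on the elite samples."""
    n = series.shape[1]
    dim = n * n
    mean, cov = np.zeros(dim), np.eye(dim)
    best_adj, best_score = None, np.inf
    archive = np.empty((0, dim))          # retained good samples from the previous iteration
    for _ in range(n_iter):
        samples = rng.multivariate_normal(mean, cov, size=pop)
        scores = []
        for s in samples:
            adj = (s.reshape(n, n) > 0).astype(int)
            scores.append(score_structure(adj, series))
        order = np.argsort(scores)
        if scores[order[0]] < best_score:
            best_score = scores[order[0]]
            best_adj = (samples[order[0]].reshape(n, n) > 0).astype(int)
        elites = np.vstack([samples[order[:elite]], archive])
        archive = samples[order[:elite]]  # carry this generation's elite forward
        mean = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False) + 1e-3 * np.eye(dim)
    return best_adj, best_score

# Tiny synthetic example: gene 0 drives gene 1 with one time step of delay.
T = 60
x = rng.normal(size=(T, 3))
for t in range(1, T):
    x[t, 1] = 0.9 * x[t - 1, 0] + 0.1 * rng.normal()
adj, s = evolve_structure(x)
print(adj, s)
```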
92

Modelling soil bulk density using data-mining and expert knowledge

Taalab, Khaled Paul January 2013 (has links)
Data about the spatial variation of soil attributes is required to address a great number of environmental issues, such as improving water quality, flood mitigation, and determining the effects of the terrestrial carbon cycle. The need for a continuum of soils data is problematic, as it is only possible to observe soil attributes at a limited number of locations, beyond which prediction is required. There is, however, disparity between the way in which much of the existing information about soil is recorded and the format in which the data is required. There are two primary methods of representing the variation in soil properties: as a set of distinct classes or as a continuum. The former is how the variation in soils has been recorded historically by the soil survey, whereas the latter is how soils data is typically required. One solution to this issue is to use a soil-landscape modelling approach which relates the soil to the wider landscape (including topography, land-use, geology and climatic conditions) using a statistical model. In this study, the soil-landscape modelling approach has been applied to the prediction of soil bulk density (Db). The original contribution to knowledge of the study is demonstrating that producing a continuous surface of Db using a soil-landscape modelling approach is a viable alternative to the ‘classification’ approach which is most frequently used. The benefit of this method is shown in relation to the prediction of soil carbon stocks, which can be predicted more accurately and with less uncertainty. The second part of this study concerns the inclusion of expert knowledge within the soil-landscape modelling approach. The statistical modelling approaches used to predict Db are data driven, hence it is difficult to interpret the processes which the model represents. In this study, expert knowledge is used to predict Db within a Bayesian network modelling framework, which structures knowledge in terms of probability. This approach creates models which can be more easily interpreted and consequently facilitate knowledge discovery; it also provides a method for expert knowledge to be used as a proxy for empirical data. The contribution to knowledge of this section of the study is twofold: firstly, that Bayesian networks can be used as data-mining tools to predict a continuous soil attribute such as Db, and secondly, that in lieu of data, expert knowledge can be used to accurately predict landscape-scale trends in the variation of Db using a Bayesian modelling approach.
93

Polytopes Arising from Binary Multi-way Contingency Tables and Characteristic Imsets for Bayesian Networks

Xi, Jing 01 January 2013 (has links)
The main theme of this dissertation is the study of polytopes arising from binary multi-way contingency tables and characteristic imsets for Bayesian networks. Firstly, we study three-way tables whose entries are independent Bernoulli random variables with canonical parameters under no three-way interaction generalized linear models. Here, we use the sequential importance sampling (SIS) method with the conditional Poisson (CP) distribution to sample binary three-way tables with the sufficient statistics, i.e., all two-way marginal sums, fixed. Compared with the Markov chain Monte Carlo (MCMC) approach with a Markov basis (MB), the SIS procedure has the advantage that it does not require expensive or prohibitive pre-computations. Note that this problem can also be considered as estimating the number of lattice points inside the polytope defined by the zero-one and two-way marginal constraints. The theorems in Chapter 2 give the parameters for the CP distribution on each column when it is sampled. In this chapter, we also present the algorithms, the simulation results, and the results for Sampson's monks data. Bayesian networks, a part of the family of probabilistic graphical models, are widely applied in many areas, and much work has been done on model selection for Bayesian networks. The second part of this dissertation investigates the problem of finding the optimal graph by using characteristic imsets, where characteristic imsets are defined as 0-1 vector representations of Bayesian networks which are unique up to Markov equivalence. Characteristic imset polytopes are defined as the convex hull of all characteristic imsets we consider. It was proven that the problem of finding the optimal Bayesian network for a specific dataset can be converted into a linear programming problem over the characteristic imset polytope [51]. In Chapter 3, we first consider characteristic imset polytopes for all diagnosis models and show that these polytopes are direct products of simplices. Then we give the combinatorial description of all edges and all facets of these polytopes. At the end of this chapter, we generalize these results to the characteristic imset polytopes for all Bayesian networks with a fixed underlying ordering of nodes. Chapter 4 includes discussion and future work on these two topics.
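As a rough illustration of the sequential importance sampling idea described above (not the thesis's conditional-Poisson procedure for three-way tables), the toy below estimates the number of 0-1 matrices with fixed row and column sums: each column is filled by a uniform proposal over feasible row subsets, and valid samples are weighted by the inverse proposal probability, giving an unbiased lattice-point count. A brute-force count checks the estimate on a tiny example. Everything here is an assumption-laden simplification of the Chapter 2 method.

```python
# Toy SIS estimate of the number of 0-1 matrices with given margins,
# checked against brute-force enumeration on a small instance.
import numpy as np
from itertools import product
from math import comb

rng = np.random.default_rng(1)

def sis_count(row_sums, col_sums, n_samples=20000):
    """Estimate the number of 0-1 matrices with the given row/column sums."""
    weights = []
    for _ in range(n_samples):
        remaining = np.array(row_sums, dtype=int)
        w, ok = 1.0, True
        for j, cj in enumerate(col_sums):
            cols_left = len(col_sums) - j - 1
            feasible = np.flatnonzero(remaining > 0)
            if len(feasible) < cj:
                ok = False
                break
            # uniform proposal over size-cj subsets of rows with remaining capacity
            pick = rng.choice(feasible, size=cj, replace=False)
            w *= comb(len(feasible), cj)       # 1 / proposal probability of this pick
            remaining[pick] -= 1
            if np.any(remaining > cols_left):  # some row can no longer be completed
                ok = False
                break
        weights.append(w if ok and remaining.sum() == 0 else 0.0)
    return float(np.mean(weights))

def brute_force_count(row_sums, col_sums):
    n, m = len(row_sums), len(col_sums)
    count = 0
    for bits in product([0, 1], repeat=n * m):
        M = np.array(bits).reshape(n, m)
        if list(M.sum(1)) == list(row_sums) and list(M.sum(0)) == list(col_sums):
            count += 1
    return count

rows, cols = [2, 1, 1], [1, 2, 1]
print(sis_count(rows, cols), brute_force_count(rows, cols))
```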
94

Automatic Driver Fatigue Monitoring Using Hidden Markov Models and Bayesian Networks

Rashwan, Abdullah 11 December 2013 (has links)
The automotive industry is growing bigger each year. The central concern for any automotive company is driver and passenger safety. Many automotive companies have developed driver assistance systems to help the driver and to ensure driver safety. These systems include adaptive cruise control, lane departure warning, lane change assistance, collision avoidance, night vision, automatic parking, traffic sign recognition, and driver fatigue detection. In this thesis, we aim to build a driver fatigue detection system that advances the research in this area. Vision is commonly the key part of driver fatigue detection systems; we have decided to investigate a different direction and examine the driver's voice, heart rate, and driving performance to assess fatigue level. The system consists of three main modules: the audio module, the heart rate and other signals module, and the Bayesian network module. The audio module analyzes an audio recording of a driver and tries to estimate the level of fatigue for the driver. A Voice Activity Detection (VAD) module is used to extract driver speech from the audio recording. Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from the speech signal, and then Support Vector Machine (SVM) and Hidden Markov Model (HMM) classifiers are used to detect driver fatigue. Both classifiers are tuned for best performance, and the performance of both classifiers is reported and compared. The heart rate and other signals module uses heart rate, steering wheel position, and the positions of the accelerator, brake, and clutch pedals to detect the level of fatigue. These signals' sample rates are adjusted to match, allowing simple features to be extracted from the signals, and SVM and HMM classifiers are again used to detect fatigue level. The performance of both classifiers is reported and compared. Bayesian networks' ability to capture dependencies and uncertainty makes them a sound choice for performing the data fusion. Prior information (day/night driving and the previous decision) is also incorporated into the network to improve the final decision. The accuracies of the audio module and of the heart rate and other signals module are used to calculate certain CPTs for the Bayesian network, while the rest of the CPTs are specified subjectively. The inference queries are calculated using the variable elimination algorithm. For those time steps where the audio module decision is absent, a window is defined and the last decision within this window is used as the current decision. A dataset was built to train and test the system, and its performance is assessed based on the average accuracy per second; the total accuracy of the system is 90.5%, which shows that the system is very promising. The system design can be easily improved by integrating more modules into the Bayesian network.
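To make the fusion step concrete, here is a minimal, hand-specified Bayesian network of the kind sketched above: the audio-module decision, the heart-rate/vehicle-signals decision and a day/night prior feed a binary fatigue node, and the posterior is computed exactly by summing out the hidden variable (equivalent to variable elimination on a network this small). All CPT numbers are invented for illustration; the thesis derives some of its CPTs from the measured module accuracies.

```python
# Hedged sketch of the data-fusion idea only; the probabilities below are assumptions.
# Variables (all binary): F = fatigued, N = night driving,
# A = audio module says "fatigued", H = heart-rate/signals module says "fatigued".
p_night = 0.3
p_fatigue_given_night = {True: 0.5, False: 0.2}    # prior influence of time of day
p_audio_given_fatigue = {True: 0.85, False: 0.10}  # roughly: audio module accuracy
p_hr_given_fatigue = {True: 0.80, False: 0.15}     # roughly: heart-rate module accuracy

def posterior_fatigue(audio_says, hr_says):
    """P(F=1 | A, H), summing out N exactly."""
    joint = {}
    for f in (True, False):
        prior_f = 0.0
        for n in (True, False):
            p = p_night if n else 1 - p_night
            p *= p_fatigue_given_night[n] if f else 1 - p_fatigue_given_night[n]
            prior_f += p
        p_a = p_audio_given_fatigue[f] if audio_says else 1 - p_audio_given_fatigue[f]
        p_h = p_hr_given_fatigue[f] if hr_says else 1 - p_hr_given_fatigue[f]
        joint[f] = prior_f * p_a * p_h
    return joint[True] / (joint[True] + joint[False])

print(posterior_fatigue(audio_says=True, hr_says=True))   # both modules agree: high posterior
print(posterior_fatigue(audio_says=True, hr_says=False))  # conflicting evidence
```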
95

Automated recognition of handwritten mathematics

MacLean, Scott January 2014 (has links)
Most software programs that deal with mathematical objects require input expressions to be linearized using somewhat awkward and unfamiliar string-based syntax. It is natural to desire a method for inputting mathematics using the same two-dimensional syntax employed with pen and paper, and the increasing prevalence of pen- and touch-based interfaces causes this topic to be of practical as well as theoretical interest. Accurately recognizing two-dimensional mathematical notation is a difficult problem that requires not only theoretical advancement over the traditional theories of string-based languages, but also careful consideration of runtime efficiency, data organization, and other practical concerns that arise during system construction. This thesis describes the math recognizer used in the MathBrush pen-math system. At a high level, the two-dimensional syntax of mathematical writing is formalized using a relational grammar. Rather than reporting a single recognition result, all recognizable interpretations of the input are simultaneously represented in a data structure called a parse forest. Individual interpretations may be extracted from the forest and reported one by one as the user requests them. These parsing techniques necessitate robust tree scoring functions, which themselves rely on several lower-level recognition processes for stroke grouping, symbol recognition, and spatial relation classification. The thesis covers the recognition, parsing, and scoring aspects of the MathBrush recognizer, as well as the algorithms and assumptions necessary to combine those systems and formalisms together into a useful and efficient software system. The effectiveness of the resulting system is measured through two accuracy evaluations. One evaluation uses a novel metric based on user effort, while the other replicates the evaluation process of an international accuracy competition. The evaluations show that not only is the performance of the MathBrush recognizer improving over time, but it is also significantly more accurate than other academic recognition systems.
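One of the low-level processes mentioned above is spatial relation classification between symbols. The fragment below is a purely geometric toy classifier (horizontal vs. superscript) over bounding boxes; the thresholds are assumptions, and MathBrush itself uses trained scoring functions rather than fixed rules.

```python
# Toy spatial-relation classifier for two symbol bounding boxes; thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Box:
    left: float
    right: float
    top: float
    bottom: float    # top < bottom (screen coordinates)

    @property
    def height(self):
        return self.bottom - self.top

    @property
    def v_center(self):
        return 0.5 * (self.top + self.bottom)

def classify_relation(a: Box, b: Box):
    """Relation of b with respect to a, for b drawn to the right of a."""
    raised = (a.v_center - b.v_center) / a.height   # > 0 means b sits higher than a
    smaller = b.height < 0.8 * a.height
    return "superscript" if raised > 0.4 and smaller else "horizontal"

x = Box(0, 10, 0, 20)
two = Box(11, 16, -8, 2)     # small and raised: x^2
plus = Box(12, 22, 2, 18)    # roughly the same baseline: x +
print(classify_relation(x, two), classify_relation(x, plus))
```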
96

On the On-line Tools for Treatment of Deterioration in Industrial Processes

Karlsson, Christer January 2008 (has links)
For industrial processes, high availability and efficiency are important goals in plant operation. This thesis presents studies and development of tools for on-line treatment of process deterioration and of model and sensor errors in order to achieve these goals. Deterioration of measurement devices, process components and process models has caused economic losses, plant failures and human losses. The development of on-line methods to prevent such losses is of special interest and has been conducted at the Department of Energy Technology, Mälardalen University. Important technological obstacles to implementing automatic on-line methods have been identified, such as data selection for adaptation and adaptation of data-driven models to new states. A new method has been developed for decision support by combining artificial intelligence methods and heat and mass balance models, and concepts are proposed for decision support in order to detect developing faults and to conduct appropriate maintenance actions. The methods have been implemented in a simulation environment and evaluated on real process data when available. The results can be summarised as the successful development of a decision support method for a steam turbine by combining artificial neural networks and Bayesian networks, and the identification of important obstacles to automating methods for adapting heat and mass balance process models and data-driven models when they are subject to deterioration.
97

Bayesian Networks and Geographical Information Systems for Environmental Risk Assessment for Oil and Gas Site Development

Varela Gonzalez, Patricia Ysolda 03 October 2013 (has links)
The objective of this work is to develop a Bayesian Network (BN) model to produce environmental risk maps for oil and gas site developments and to demonstrate the model's scalability from a point to a collection of points. To reach this objective, a benchmark BN model was formulated as a “proof of concept” using Aquifers, Ecoregions and Land Use / Land Cover maps as local and independent input variables. This model was then used to evaluate the probabilistic geographical distribution of the Environmental Sensibility of Oil and Gas (O&G) developments for a given study area. A Risk index associated with the development of O&G operation activities, based on the spatial environmental sensibility, was also mapped. To facilitate the Risk assessment, these input variables (maps) were discretized into three hazard levels: high, moderate and low. A Geographical Information System (GIS) platform (ESRI ArcMap 10) was used to gather, modify and display the data for the analysis. Once the variables were defined and the hazard data was included in feature classes (shapefile layer format), Python 2.6 was used as the computational platform to calculate the probabilistic state of all the Bayesian Network's variables. This made it possible to define Risk scenarios in both prognostic and diagnostic analyses and to measure the impact of changes or interventions in terms of uncertainty. The resulting Python–ESRI ArcMap computational script, called “BN+GIS”, populated maps describing the spatial variability of the states of the Environmental Sensibility and of the corresponding Risk index. The latter, in particular, represents a tool for decision makers to choose the most suitable location for placing a drilling rig, since it integrates three fundamental environmental variables. The results also show that it is possible to back-propagate the information from the Environmental Sensibility to identify the inherent triggering scenarios (hazard variables). A case study is presented to illustrate the applicability of the proposed methodology in a specific geographical setting. The Barnett Shale was chosen as the benchmark study area because sufficient information on this region was available and because of its importance in the latest developments of unconventional plays in the country. The main contribution of this work lies in combining Bayesian Networks and GIS to define environmental Risk scenarios that can facilitate decision-making for O&G stakeholders such as land owners, industry operators, regulators and Non-Governmental Organizations (NGOs), before and during the development of a given site.
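A minimal sketch of the BN+GIS idea, under assumptions: three discretized hazard layers (aquifer, ecoregion, land use) are combined per grid cell through a hand-specified conditional probability table for "high environmental sensibility", and the result is binned into a three-level Risk index. The CPT values, the thresholds and the tiny array standing in for a raster are illustrative only; the actual script operates on ArcMap feature classes rather than NumPy arrays.

```python
# Illustrative per-cell combination of three discretized hazard layers (0=low, 1=moderate, 2=high).
import numpy as np

rng = np.random.default_rng(2)
shape = (4, 5)                                   # tiny stand-in for a raster grid
aquifer = rng.integers(0, 3, size=shape)         # hazard level per cell
ecoregion = rng.integers(0, 3, size=shape)
landuse = rng.integers(0, 3, size=shape)

# P(sensibility = high | aquifer, ecoregion, land use), indexed by the three hazard levels.
# A simple monotone table: more high-hazard inputs -> higher probability (invented values).
cpt_high = np.zeros((3, 3, 3))
for a in range(3):
    for e in range(3):
        for l in range(3):
            cpt_high[a, e, l] = 0.1 + 0.8 * (a + e + l) / 6.0

p_sensibility_high = cpt_high[aquifer, ecoregion, landuse]   # per-cell probability map
risk_index = np.digitize(p_sensibility_high, [0.35, 0.65])   # 0 = low, 1 = moderate, 2 = high
print(p_sensibility_high.round(2))
print(risk_index)
```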
98

Bayesian networks for high-dimensional data with complex mean structure.

Kasza, Jessica Eleonore January 2010 (has links)
In a microarray experiment, it is expected that there will be correlations between the expression levels of different genes under study. These correlation structures are of great interest from both biological and statistical points of view. From a biological perspective, the identification of correlation structures can lead to an understanding of genetic pathways involving several genes, while the statistical interest, and the emphasis of this thesis, lies in the development of statistical methods to identify such structures. However, the data arising from microarray studies is typically very high-dimensional, with an order of magnitude more genes being considered than there are samples of each gene. This leads to difficulties in the estimation of the dependence structure of all genes under study. Graphical models and Bayesian networks are often used in these situations, providing flexible frameworks in which dependence structures for high-dimensional data sets can be considered. The current methods for the estimation of dependence structures for high-dimensional data sets typically assume the presence of independent and identically distributed samples of gene expression values. However, often the data available will have a complex mean structure and additional components of variance. Given such data, the application of methods that assume independent and identically distributed samples may result in incorrect biological conclusions being drawn. In this thesis, methods for the estimation of Bayesian networks for gene expression data sets that contain additional complexities are developed and implemented. The focus is on the development of score metrics that take account of these complexities for use in conjunction with score-based methods for the estimation of Bayesian networks, in particular the High-dimensional Bayesian Covariance Selection algorithm. The necessary theory relating to Gaussian graphical models and Bayesian networks is reviewed, as are the methods currently available for the estimation of dependence structures for high-dimensional data sets consisting of independent and identically distributed samples. Score metrics for the estimation of Bayesian networks when data sets are not independent and identically distributed are then developed and explored, and the utility and necessity of these metrics are demonstrated. Finally, the developed metrics are applied to a data set consisting of samples of grape genes taken from several different vineyards. / Thesis (Ph.D.) -- University of Adelaide, School of Mathematical Sciences, 2010
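As an illustration of the kind of score metric discussed above, the sketch below computes a Gaussian BIC-style local score for a child node given candidate parents after a known mean structure (for example, a vineyard or batch effect) has been regressed out, so that samples that are not identically distributed do not distort the score. The residualisation step and all numbers are assumptions standing in for the metrics actually developed in the thesis.

```python
# Hedged sketch: a local BN score that adjusts for a fixed-effects design matrix before scoring.
import numpy as np

def local_score(data, child, parents, design):
    """BIC-style score of `child | parents` after removing the mean structure in `design`."""
    n = data.shape[0]
    hat = design @ np.linalg.pinv(design)   # projection onto the design (nuisance) space
    resid = data - hat @ data               # residualise every variable on the design matrix
    y = resid[:, child]
    if parents:
        X = resid[:, parents]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
    else:
        rss = np.sum(y ** 2)
    k = len(parents) + design.shape[1]
    return -0.5 * n * np.log(rss / n + 1e-12) - 0.5 * k * np.log(n)

# Toy data: two vineyards with different baselines; gene 1 depends on gene 0.
rng = np.random.default_rng(3)
n = 100
vineyard = np.repeat([0, 1], n // 2)
design = np.column_stack([vineyard == 0, vineyard == 1]).astype(float)
g0 = rng.normal(size=n) + 2.0 * vineyard
g1 = 0.8 * g0 + rng.normal(scale=0.5, size=n) + 1.0 * vineyard
data = np.column_stack([g0, g1])
print(local_score(data, 1, [0], design), local_score(data, 1, [], design))
```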
99

Signature-based activity detection based on Bayesian networks acquired from expert knowledge

Fooladvandi, Farzad January 2008 (has links)
The maritime industry is experiencing one of its longest and fastest periods of growth. Hence, the global maritime surveillance capacity is in great need of growth as well. The detection of vessel activity is an important objective of the civil security domain. Detecting vessel activity may become problematic if audit data is uncertain. This thesis aims to investigate whether Bayesian networks acquired from expert knowledge can detect activities with a signature-based detection approach. For this, a maritime pilot-boat scenario has been identified with a domain expert. Each of the scenario's activities has been divided into signatures, where each signature relates to a specific Bayesian network information node. The signatures were implemented to find evidence for the Bayesian network information nodes. AIS data with real-world observations has been used for testing, which has shown that it is possible to detect the maritime pilot-boat scenario with the taken approach.
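A small sketch of the signature idea described above, under assumptions: raw AIS position reports are reduced to boolean signature evidence (a speed drop, a close rendezvous between two tracks) that would then be entered into the corresponding information nodes of the expert-built Bayesian network. The signature names and thresholds below are invented for illustration.

```python
# Turn toy AIS tracks into boolean "signature" evidence for BN information nodes.
from dataclasses import dataclass
from math import hypot

@dataclass
class AISRecord:
    t: float        # timestamp (s)
    x: float        # easting (m), simplified flat-earth coordinates
    y: float        # northing (m)
    speed: float    # speed over ground (knots)

def speed_drop_signature(track, low=3.0, high=12.0):
    """True if the vessel decelerates from cruising speed to near-stationary within the track."""
    speeds = [r.speed for r in track]
    return max(speeds) > high and min(speeds) < low

def rendezvous_signature(track_a, track_b, radius=200.0):
    """True if the two vessels come within `radius` metres at the same report times."""
    return any(hypot(a.x - b.x, a.y - b.y) < radius for a, b in zip(track_a, track_b))

pilot = [AISRecord(t, 100.0 * t, 0.0, 15.0 if t < 5 else 2.0) for t in range(10)]
ship = [AISRecord(t, 100.0 * t - 150.0, 50.0, 10.0) for t in range(10)]
evidence = {
    "speed_drop": speed_drop_signature(pilot),
    "rendezvous": rendezvous_signature(pilot, ship),
}
print(evidence)   # this evidence would be entered into the corresponding BN information nodes
```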
100

A Bayesian network-based process for evaluating the application of Scrum in software projects (Um processo baseado em redes bayesianas para avaliação da aplicação do scrum em projetos de software)

PERKUSICH, Mirko Barbosa. 10 September 2018 (has links)
The use of Agile Software Development (ASD) is increasing to satisfy the need to respond to fast-moving market demand and to gain market share. In contrast with traditional plan-driven processes, ASD is people- and communication-oriented, flexible, fast, lightweight, responsive, and driven by learning and continuous improvement. As a consequence, subjective factors such as collaboration, communication and self-management are key to evaluating the maturity of agile adoption. Scrum, which is focused on project management, is the most popular agile method. Whenever adopted, the usage of Scrum must be continuously improved by complementing it with development and management practices and processes. Even though the Retrospective Meeting, a Scrum event, is a period at the end of each sprint for the team to assess the development method, there are no clear and specific procedures to conduct it. In the literature there are several proposed solutions, though none consolidated, to assist with ASD adoption and assessment. Therefore, the research problem is: how can Scrum be instrumented to assist in the continuous improvement of the development method, focusing on the requirements engineering process, the development team and the product increments? In this thesis, we propose a Bayesian network-based process to assist in the assessment of Scrum-based projects, instrumenting the software development method to support its continuous improvement with a focus on the requirements engineering process, the development team and the product increments. We built the Bayesian network using a Knowledge Engineering Bayesian Network (KEBN) process; the network estimates customer satisfaction given factors of the software development method. To evaluate its prediction accuracy, we collected data from 18 industry projects of one organization through a questionnaire. The prediction was correct for fourteen projects (78% accuracy). We therefore conclude that the model is capable of accurately predicting customer satisfaction and is useful for decision support on Scrum projects.
