Global ETD Search

231	Bayesian classification of DNA barcodes Anderson, Michael P. January 1900 Doctor of Philosophy / Department of Statistics / Suzanne Dubnicka / DNA barcodes are short strands of nucleotide bases taken from the cytochrome c oxidase subunit 1 (COI) of the mitochondrial DNA (mtDNA). A single barcode may have the form C C G G C A T A G T A G G C A C T G . . . and typically ranges in length from 255 to around 700 nucleotide bases. Unlike nuclear DNA (nDNA), mtDNA remains largely unchanged as it is passed from mother to offspring. It has been proposed that these barcodes may be used as a method of differentiating between biological species (Hebert, Ratnasingham, and deWaard 2003). While this proposal is sharply debated among some taxonomists (Will and Rubinoff 2004), it has gained momentum and attention from biologists. One issue at the heart of the controversy is the use of genetic distance measures as a tool for species differentiation. Current methods of species classification rely on distance measures that depend heavily on both evolutionary model assumptions and a clearly defined "gap" between intra- and interspecies variation (Meyer and Paulay 2005). We point out the limitations of such distance measures and propose a character-based method of species classification which applies Bayes' rule to overcome these deficiencies. The proposed method is shown to provide accurate species-level classification. It also provides answers to important questions not addressable with current methods. DNA Barcodes Bayesian Classification Species Discovery Naive Bayes Classifier Sequential Analysis High-dimensional Data Statistics (0463)
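As an illustration of what a character-based classifier built on Bayes' rule can look like, here is a minimal Python sketch. It is not the dissertation's method: it assumes aligned barcodes of equal length containing only A, C, G, and T, independence between positions (a naive Bayes assumption), a uniform prior over species, and add-one smoothing. The species names and sequences are hypothetical toy data.

```python
import math
from collections import Counter

BASES = "ACGT"

def train(barcodes_by_species, pseudocount=1.0):
    """Estimate smoothed per-position base probabilities for each species."""
    model = {}
    for species, seqs in barcodes_by_species.items():
        total = len(seqs) + pseudocount * len(BASES)
        positions = []
        for i in range(len(seqs[0])):
            counts = Counter(seq[i] for seq in seqs)
            positions.append({b: (counts[b] + pseudocount) / total for b in BASES})
        model[species] = positions
    return model

def posterior(model, query):
    """Return P(species | query) via Bayes' rule with a uniform prior."""
    log_like = {
        species: sum(math.log(pos[base]) for pos, base in zip(positions, query))
        for species, positions in model.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_like.values())
    weights = {s: math.exp(v - top) for s, v in log_like.items()}
    norm = sum(weights.values())
    return {s: w / norm for s, w in weights.items()}

# Hypothetical reference barcodes (toy data, not real COI sequences).
reference = {
    "species_a": ["CCGGCATA", "CCGGCATA", "CCGGTATA"],
    "species_b": ["CTGGAATG", "CTGGAATG", "CTGGAACG"],
}
model = train(reference)
probs = posterior(model, "CCGGCATG")
best = max(probs, key=probs.get)
print(best, probs)
```

Because the classifier compares characters position by position, it needs no genetic distance measure and no "gap" threshold; the posterior probabilities also give a direct measure of how confident the assignment is.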
232	The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks Julock, Gregory Alan 01 January 2013 Buffer overflows are a common type of network intrusion attack that continues to plague the networked community. Unfortunately, this type of attack is not well detected with current data mining algorithms. This research investigated the use of Random Forests, an ensemble technique that builds multiple decision trees and combines their votes to reach a classification. The research investigated the effectiveness of Random Forests in detecting buffer overflows compared to other data mining methods such as CART and Naïve Bayes. Random Forests was used for variable reduction, cost-sensitive classification was applied, and each method's detection performance was compared and reported along with its receiver operating characteristics. The experiment showed that Random Forests outperformed CART and Naïve Bayes in classification performance. Using a technique to identify the variables most important for buffer overflow detection, Random Forests was also able to improve upon its buffer overflow classification performance. Buffer Overflows Cost sensitive classification Data Mining Decision Trees Naive Bayes Random Forests Computer Sciences
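The voting step of a Random Forest can be sketched in a few lines of Python. In the standard algorithm each tree votes for a class and the majority class wins; no single "best" tree is selected. The sketch below shows only that aggregation step. The three "trees" are hand-written stand-ins, and the feature names `payload_bytes` and `nop_count` are hypothetical; a real forest would learn its trees from bootstrap samples and random feature subsets.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the majority class across all trees, plus the vote tally."""
    votes = Counter(tree(sample) for tree in trees)
    label, _ = votes.most_common(1)[0]
    return label, votes

# Hypothetical stand-ins for trained decision trees.
trees = [
    lambda s: "attack" if s["payload_bytes"] > 1000 else "normal",
    lambda s: "attack" if s["nop_count"] > 20 else "normal",
    lambda s: "attack" if s["payload_bytes"] > 5000 else "normal",
]

sample = {"payload_bytes": 2048, "nop_count": 64}
label, votes = forest_predict(trees, sample)
print(label, dict(votes))
```

Here two of the three trees vote "attack", so the forest labels the sample as an attack even though one tree disagrees.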
233	Information theoretic models of social interaction Salge, Christoph January 2013 This dissertation demonstrates, in a non-semantic information-theoretic framework, how the principles of 'maximisation of relevant information' and 'information parsimony' can guide the adaptation of an agent towards agent-agent interaction. Central to this thesis is the concept of digested information; I argue that an agent is intrinsically motivated to (a) process the relevant information in its environment and (b) display this information in its own actions. From the perspective of similar agents, who require similar information, this differentiates other agents from the rest of the environment, by virtue of the information they provide. This provides an informational incentive to observe other agents and integrate their information into one's own decision-making process. This process is formalized in the framework of information theory, which allows for a quantitative treatment of the resulting effects, specifically how the digested information of an agent is influenced by several factors, such as the agent's performance and the integrated information of other agents. Two specific phenomena based on information maximisation arise in this thesis. One is flocking behaviour similar to boids that results when agents are searching for a location in a gridworld and integrate the information in other agents' actions via Bayes' Theorem. The other is an effect where integrating information from too many agents becomes detrimental to an agent's performance, for which several explanations are provided. 006.3
234	Using Machine Learning to Detect Malicious URLs Cheng, Aidan 01 January 2017 There is a need for a better predictive model that reduces the number of malicious URLs being sent through emails. This system should learn from existing metadata about URLs. The ideal solution for this problem would be able to learn from its predictions. For example, if it predicts a URL to be malicious, and that URL is deemed safe by the sandboxing environment, the predictor should refine its model to account for this data. The problem, then, is to construct a model with these characteristics that can make these predictions for the vast number of URLs being processed. Given that the current system does not employ machine learning methods, we intend to investigate multiple such models and summarize which of those might be worth pursuing on a large scale. computer science machine learning computer security clinic malicious urls naive bayes Computer and Systems Architecture
235	A machine learning approach to fundraising success in higher education Ye, Liang 01 May 2017 New donor acquisition and current donor promotion are the two major programs in fundraising for higher education, and developing proper targeting strategies plays an important role in both programs. This thesis presents machine learning solutions as targeting strategies for both programs, based on alumni data readily available at almost any institution. The targeting strategy for new donor acquisition is modeled as a donor identification problem. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are used and evaluated. The test results show that, having been trained with enough samples, all three algorithms can distinguish donors from rejectors well, and big donors are identified more often than others. While there is a trade-off between the cost of soliciting candidates and the success of donor acquisition, the results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of new donors and more than 90% of new big donors can be acquired when only 40% of the candidates are solicited. The targeting strategy for donor promotion is modeled as a machine learning problem of predicting promising donors (i.e., those who will upgrade their pledge). The Gaussian naïve Bayes, random forest, and support vector machine algorithms are tested. The test results show that all three algorithms can distinguish promising donors from non-promising donors (i.e., those who will not upgrade their pledge). When the age information is known, the best model produces an overall accuracy of 97% on the test set. The results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of promising donors can be acquired when only 26% of candidates are solicited.
/ Graduate / liangye714@gmail.com machine learning fundraising support vector machine random forest naïve bayes predictive analysis prospect research
236	Fonctions de perte en actuariat / Loss functions in actuarial science Craciun, Geanina January 2009 Thesis digitized by the Records Management and Archives Division of the Université de Montréal. Inadmissibility Pareto distribution Bayes estimation LINEX and quadratic loss functions Asymmetric loss functions Lindley's Bayes approximation form Modified reflected normal loss function Expected loss
237	Shape from gradients : a psychophysical and computational study of the role complex illumination gradients, such as shading and mutual illumination, play in three-dimensional shape perception Harding, Glen January 2013 The human visual system gathers information about three-dimensional object shape from a wide range of sources. How effectively we can use these sources, and how they are combined to form a consistent and accurate percept of the 3D world, is the focus of much research. In complex scenes, inter-reflections of light between surfaces (mutual illumination) can occur, creating chromatic illumination gradients. These gradients provide a source of information about 3D object shape, but little research has been conducted into the capabilities of the visual system to use such information. The experiments described here were conducted with the aim of understanding the influence of chromatic gradients from mutual illumination on 3D shape perception. Psychophysical experiments are described that were designed to investigate: whether the human visual system takes account of mutual illumination when estimating 3D object shape, and how this might occur; how colour shading cues are integrated with other shape cues; and the relative influence on 3D shape perception of achromatic (luminance) shading and chromatic shading from mutual illumination. In addition, one chapter explores a selection of mathematical models of cue integration and their applicability in this case. The results of the experiments suggest that the human visual system is able to quickly assess and take account of coloured mutual illumination when estimating 3D object shape, and to use chromatic gradients as an independent and effective cue. Finally, mathematical modelling reveals that the chromatic gradient cue is likely integrated with other shape cues in a way that is close to statistically optimal. 617.7
238	Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data Li, Sichu 15 May 2009 There is a significant need to identify approaches for classifying chemical sensor array data with high success rates that would enhance sensor detection capabilities. The present study attempts to fill this need by investigating six machine learning methods to classify a dataset collected using a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors, associated with the responses from 10 sensor channels, are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results generated with the six different methods are compared and discussed. RF, CART, and KNN are found to have success rates greater than 90% and to outperform the other methods. KNN SVM CART Random Forest Naïve Bayes PCR machine learning classification sensor array
239	Application of Dirichlet Distribution for Polytopic Model Estimation Katkuri, Jaipal 05 August 2010 The polytopic model (PM) structure is often used in the areas of automatic control and fault detection and isolation (FDI). It is an alternative to the multiple model approach which explicitly allows for interpolation among local models. This thesis proposes a novel approach to PM estimation by modeling the set of PM weights as a random vector with a Dirichlet distribution (DD). A new approximate (adaptive) PM estimator, referred to as a Quasi-Bayesian Adaptive Kalman Filter (QBAKF), is derived and implemented. In the QBAKF, the model weights and the state are estimated adaptively by a simple QB weights estimator and a single KF on the PM with the estimated weights. Since the PM estimation problem is nonlinear and non-Gaussian, a DD marginalized particle filter (DDMPF) is also developed and implemented, similar to the MPF. The simulation results show that the newly proposed algorithms perform better in estimation accuracy, design simplicity, and computational requirements for PM estimation. Model interpolation Quasi-Bayes procedure for mixtures Dirichlet distribution Jump-Markov linear systems Polytopic model
240	Uma abordagem bayesiana para mapeamento de QTLs em populações experimentais / A Bayesian approach for mapping QTL in experimental populations Meyer, Andréia da Silva 03 April 2009 Many traits in plants and animals are quantitative in nature, influenced by multiple genes. With the advent of new molecular techniques, it has become possible to map the loci that control quantitative traits, called QTL (Quantitative Trait Loci). Mapping a QTL means identifying its position in the genome as well as estimating its genetic effects. The main difficulty in QTL mapping is that the number of QTL is unknown. Bayesian methods combined with Markov Chain Monte Carlo (MCMC) have been used to jointly infer the number of QTL, their positions in the genome, and their genetic effects. The challenge is to obtain a sample from the joint posterior distribution of these parameters, since the number of QTL may be considered unknown and the dimension of the parameter space therefore changes with the number of QTL in the model. In this study, a Bayesian approach for QTL mapping that includes multiple QTL and epistasis effects in the model was implemented in the statistical program R. Models with increasing numbers of QTL were fitted, and the Bayes factor was used to select the most suitable model and, consequently, to estimate the number of QTL that control the phenotypes of interest. To evaluate the efficiency of the implemented methodology, a simulation study was carried out on two different experimental populations, backcross and F2, considering models with and without epistasis for both populations.
The implemented approach proved very efficient: in every scenario considered, the selected model was the one containing the true number of QTL used to simulate the data. In addition, QTL were mapped for three phenotypes of tropical corn (plant height, ear height, and grain yield) using the implemented methodology, and the results were compared with those found by the CIM method. Bayes factor Bayesian inference Statistical genetics Genetic mapping MCMC Monte Carlo method QTL mapping
