341

Email Classification : An evaluation of Deep Neural Networks with Naive Bayes

Michailoff, John January 2019 (has links)
Machine learning (ML) is an area of computer science that gives computers the ability to learn patterns in data without being explicitly programmed for those patterns. Neural networks approach this by simulating the biological function of neurons in the brain, giving computers a predictive ability to learn how data can be clustered. This research investigates the possibility of using neural networks to classify email, i.e. to work as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input factors (data size, training time, and network structure) affect the accuracy of DNN pattern recognition; to compare the DNN with the statistical ML method Naïve Bayes in terms of prediction accuracy and complexity; and to assess the viability of the resulting DNN as a case manager. Results show that accuracy improves as training time and data size increase. Testing increasingly complex network structures (larger networks of neurons with more layers) shows that overfitting becomes a problem with longer training, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than the DNN in terms of accuracy but better in terms of complexity, making NB viable on mobile platforms. We conclude that our prototype may work well in tandem with existing case management systems, to be tested in future research.
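The comparison described in this abstract can be illustrated in a few lines of Python. The following is a minimal sketch, not the thesis implementation: scikit-learn's MLPClassifier and MultinomialNB stand in for whatever tooling the author used, and the tiny two-class corpus is an invented placeholder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Toy stand-in for an email corpus; the thesis evaluates far larger data sizes.
emails = [
    "please reset my account password",
    "invoice attached for last month",
    "cannot log in to the customer portal",
    "payment for the order is overdue",
] * 50
labels = ["it_support", "billing", "it_support", "billing"] * 50

X = TfidfVectorizer().fit_transform(emails)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

# "Deep" model: two hidden layers of neurons joined by trainable weights.
dnn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
dnn.fit(X_tr, y_tr)

# Statistical baseline: Naive Bayes, far cheaper to train and to run.
nb = MultinomialNB().fit(X_tr, y_tr)

print("DNN accuracy:", accuracy_score(y_te, dnn.predict(X_te)))
print("NB accuracy: ", accuracy_score(y_te, nb.predict(X_te)))
```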
342

Study designs and statistical methods for pharmacogenomics and drug interaction studies

Zhang, Pengyue 01 April 2016 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Adverse drug events (ADEs) are injuries resulting from drug-related medical interventions. ADEs can be induced either by a single drug or by a drug-drug interaction (DDI). In order to prevent unnecessary ADEs, many regulatory agencies in public health maintain pharmacovigilance databases for detecting novel drug-ADE associations. However, pharmacovigilance databases usually contain a significant portion of false associations due to their inherent structure (e.g. false drug-ADE associations caused by co-medications). Besides pharmacovigilance studies, the risks of ADEs can be minimized by understanding their mechanisms, which include abnormal pharmacokinetics/pharmacodynamics due to genetic factors and synergistic effects between drugs. During the past decade, pharmacogenomics studies have successfully identified several predictive markers to reduce ADE risks, but such studies are usually limited by sample size and budget. In this dissertation, we develop statistical methods for pharmacovigilance and pharmacogenomics studies. First, we propose an empirical Bayes mixture model to identify significant drug-ADE associations. The proposed approach can be used for both signal generation and ranking, and the portion of false associations among the detected signals can be well controlled. Second, we propose a mixture dose-response model to investigate the functional relationship between the dimensionality of drug combinations and ADE risks. This approach can also identify high-dimensional drug combinations that are associated with escalated ADE risks at a significantly low local false discovery rate. Finally, we propose a cost-efficient design for pharmacogenomics studies that combines DNA pooling with a two-stage design. Compared to a traditional design, the cost under the proposed design is reduced dramatically with an acceptable compromise in statistical power. The proposed methods are examined by extensive simulation studies. Furthermore, the proposed pharmacovigilance methods are applied to the FDA's Adverse Event Reporting System database and a local electronic medical record (EMR) database. For different pharmacogenomics study scenarios, optimized designs for detecting a functional rare allele are given as well.
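As a point of reference for the signal-detection setting described above, the sketch below computes a classical proportional reporting ratio (PRR) from a 2x2 report table. This is a deliberately simple disproportionality score, not the dissertation's empirical Bayes mixture model, and all counts are invented.

```python
import math

def prr(a, b, c, d):
    """Proportional reporting ratio for a drug-ADE pair.

    a: reports with the drug and the ADE   b: reports with the drug, other ADEs
    c: reports with other drugs, the ADE   d: reports with other drugs, other ADEs
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts from a spontaneous-reporting database.
a, b, c, d = 120, 880, 400, 98600
score = prr(a, b, c, d)

# An approximate standard error for ln(PRR) supports ranking candidate signals.
se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
lower = math.exp(math.log(score) - 1.96 * se)
print(f"PRR = {score:.2f}, 95% CI lower bound = {lower:.2f}")
```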
343

Bayesian Analysis of Systematic Theoretical Errors Models

Billig, Ian A. 10 January 2021 (has links)
No description available.
344

A Geometric Framework for Modeling and Inference using the Nonparametric Fisher–Rao metric

Saha, Abhijoy 02 October 2019 (has links)
No description available.
345

Investigations into the design and dissection of genetic networks

Libby, Eric. January 2007 (has links)
No description available.
346

Predicting SNI Codes from Company Descriptions : A Machine Learning Solution

Lindholm, Erik, Nilsson, Jonas January 2023 (has links)
This study aims to develop an automated solution for assigning industry codes to businesses based on the contents of their business descriptions. The Swedish standard industrial classification (SNI) is a system used by Statistics Sweden (SCB) to categorize businesses for its statistical reports. Assignment of SNI codes has so far been done manually by the person registering a new company, but this is a far from optimal solution: some of the 88 main group areas of industry are hard to tell apart from one another, which often leads to incorrect assignments. Our approach to this problem was to train machine learning models using the Naive Bayes and SVM classifier algorithms and conduct an experiment. In 2019, Dahlqvist and Strandlund attempted this task and reached an accuracy of 52 percent using a gradient boosting classifier, which was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than theirs, a goal we eventually achieved: our best-performing SVM model reached 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores.
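The setup the authors describe (text features, an SVM classifier, and resampling against class imbalance) can be sketched as follows. This is a hypothetical reconstruction, not the study's code: the library choices (scikit-learn, imbalanced-learn) and the three-class mini-corpus are ours.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import RandomOverSampler

# Invented mini-corpus; real SNI data covers 88 main groups and is heavily skewed.
descriptions = (
    ["wholesale of construction machinery"] * 60
    + ["software development and it consultancy"] * 30
    + ["restaurant and mobile food service"] * 10
)
codes = ["46"] * 60 + ["62"] * 30 + ["56"] * 10

X = TfidfVectorizer().fit_transform(descriptions)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, codes, test_size=0.3, random_state=0, stratify=codes
)

# Random oversampling duplicates minority-class examples in the training split only.
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

svm = LinearSVC().fit(X_bal, y_bal)
print("accuracy:", accuracy_score(y_te, svm.predict(X_te)))
```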
347

Machine Learning in the Open World

Yicheng Cheng (11197908) 29 July 2021 (has links)
By Machine Learning in the Open World, we mean building models that can be used in a more realistic setting where something "unknown" can always happen. Beyond traditional machine learning tasks such as classification and segmentation, where all classes are predefined, we deal with the challenges posed by newly emerged classes, irrelevant classes, outliers, and class imbalance.

We first focus on the Non-Exhaustive Learning (NEL) problem from a statistical perspective. In NEL, we assume that our training classes are non-exhaustive, so the test data may contain unknown classes, and we aim to build models that simultaneously perform classification and class discovery. We propose a non-parametric Bayesian model that learns hyper-parameters from both the training classes and the discovered classes (a set that is empty at the beginning), then infers the label partitioning under the guidance of the learned hyper-parameters, repeating this procedure until convergence.

After obtaining good results on applications with plain, low-dimensional data such as flow cytometry and some benchmark datasets, we move on to Non-Exhaustive Feature Learning (NEFL). For NEFL, we extend our work with deep learning techniques to learn representations on datasets with complex structural and spatial correlations. We propose a metric learning approach that learns a feature space which discriminates well among the training classes and generalizes well to unknown classes, and we develop variants of this algorithm to deal with outliers and irrelevant classes. We apply our final model to open-world image classification, image segmentation, and SRS hyperspectral image segmentation, obtaining promising results.

Finally, we explore Out-of-Distribution (OOD) detection to identify irrelevant samples and outliers, completing the story.
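A much-simplified illustration of the open-world setting is shown below: a classifier trained on known classes refuses to label test points whose maximum class probability falls below a threshold, flagging them as "unknown". This confidence heuristic is only a stand-in; the thesis's nonparametric Bayesian and metric-learning models go well beyond it, and all data here is synthetic.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Four clusters; the model trains on three and never sees the fourth "emerging" class.
X, y = make_blobs(n_samples=400, centers=4, cluster_std=1.0, random_state=0)
known = y < 3
clf = LogisticRegression(max_iter=1000).fit(X[known], y[known])

# Reject low-confidence predictions as "unknown" (label -1) instead of forcing a class.
proba = clf.predict_proba(X)
pred = np.where(proba.max(axis=1) < 0.7, -1, clf.predict(X))

print("held-out class flagged unknown:", np.mean(pred[~known] == -1))
print("known classes flagged unknown: ", np.mean(pred[known] == -1))
```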
348

Variant Detection Using Next Generation Sequencing Data

Pyon, Yoon Soo 08 March 2013 (has links)
No description available.
349

Semiparametric Bayesian Joint Modeling with Applications in Toxicological Risk Assessment

Hwang, Beom Seuk 06 August 2013 (has links)
No description available.
350

Improving Accuracy in Microwave Radiometry via Probability and Inverse Problem Theory

Hudson, Derek Lavell 20 November 2009 (has links) (PDF)
Three problems at the forefront of microwave radiometry are solved using probability theory and inverse problem formulations that are themselves grounded in probability theory. Probability theory captures information about random phenomena, while inverse problem theory processes that information. The use of these theories results in more accurate estimates, and more accurate assessments of estimate error, than is possible with previous, non-probabilistic approaches. The benefits of probabilistic approaches are expounded and demonstrated.

The first problem to be solved is a derivation of the error that remains after using a method which corrects radiometric measurements for polarization rotation. Yueh [1] proposed a method of using the third Stokes parameter TU to correct brightness temperatures such as Tv and Th for polarization rotation. This work presents an extended error analysis of Yueh's method. In order to carry out the analysis, a forward model of polarization rotation is developed which accounts for the random nature of thermal radiation, receiver noise, and (to first order) calibration. Analytic formulas are then derived and validated for bias, variance, and root-mean-square error (RMSE) as functions of scene and radiometer parameters. Examination of the formulas reveals that: 1) natural TU from planetary surface radiation, of the magnitude expected on Earth at L-band, has a negligible effect on correction for polarization rotation; 2) RMSE is a function of rotation angle Ω, but the value of Ω which minimizes RMSE is not known prior to instrument fabrication; and 3) if residual calibration errors can be sufficiently reduced via post-launch calibration, then Yueh's method reduces the error incurred by polarization rotation to negligibility.

The second problem addressed in this dissertation is optimal estimation of calibration parameters in microwave radiometers. Algebraic methods for internal calibration of a certain class of polarimetric microwave radiometers are presented by Piepmeier [2]. This dissertation demonstrates that Bayesian estimation of the calibration parameters decreases the RMSE of the estimates by a factor of two compared with algebraic estimation. This improvement is obtained by using knowledge of the noise structure of the measurements and by utilizing all of the information the measurements provide. Furthermore, it is demonstrated that significant information is contained in the covariances between the calibration parameters. This information can be preserved and conveyed by reporting a multidimensional pdf for the parameters rather than merely their means and variances. The proposed method is also extended to estimate several hardware parameters of interest in system calibration.

The final portion of this dissertation demonstrates the advantages of a probabilistic approach in an empirical situation. A recent inverse problem formulation, sketched in [3], is founded on probability theory and is sufficiently general that it can be applied in empirical situations. This dissertation applies that formulation to the retrieval of Antarctic air temperature from satellite measurements of microwave brightness temperature. The new method is contrasted with the curve-fitting approach that was the previous state of the art. The adaptability of the new method not only results in improved estimation but also produces useful estimates of air temperature in areas where the previous method fails due to the occurrence of melt events.
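In the spirit of the second problem, the toy Monte Carlo sketch below compares an algebraic two-point calibration with a Bayesian (MAP) estimate that exploits a Gaussian prior on the gain and offset. The linear measurement model, noise levels, and priors are all invented for illustration and are not taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
g_true, o_true = 1.02, 5.0       # true gain and offset (hypothetical)
T_hot, T_cold = 300.0, 77.0      # reference brightness temperatures (K)
sigma = 0.5                      # radiometric noise per look (K)

g_prior, o_prior = 1.0, 5.0      # Gaussian prior means
sg, so = 0.05, 1.0               # Gaussian prior standard deviations

alg_err, map_err = [], []
for _ in range(2000):
    # Noisy looks at the hot and cold calibration references.
    v_hot = g_true * T_hot + o_true + rng.normal(0, sigma)
    v_cold = g_true * T_cold + o_true + rng.normal(0, sigma)

    # Algebraic two-point calibration: solve the two equations exactly.
    g_alg = (v_hot - v_cold) / (T_hot - T_cold)

    # MAP estimate: Gaussian likelihood + Gaussian prior -> regularized least squares.
    A = np.array([[T_hot, 1.0], [T_cold, 1.0]])
    b = np.array([v_hot, v_cold])
    P = np.diag([1 / sg**2, 1 / so**2])
    H = A.T @ A / sigma**2 + P
    rhs = A.T @ b / sigma**2 + P @ np.array([g_prior, o_prior])
    g_map, _ = np.linalg.solve(H, rhs)

    alg_err.append((g_alg - g_true) ** 2)
    map_err.append((g_map - g_true) ** 2)

print("gain RMSE, algebraic:   ", np.sqrt(np.mean(alg_err)))
print("gain RMSE, Bayesian MAP:", np.sqrt(np.mean(map_err)))
```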
