Global ETD Search

341	Automation of support service using Natural Language Processing : Automation of errands tagging Haglund, Kristoffer January 2020 (has links) In this paper, Natural Language Processing and classification algorithms were used to create a program that can automatically tag different errands connected to the support service of Fortnox (an IT company based in Växjö). Controlled experiments were conducted to find the best classification algorithm and the Bag-of-Words pre-processing algorithms best suited to this problem. All data were provided by Fortnox and were manually labeled with tags for use as training and test data. The final algorithm correctly predicted 69.15% of errands using all original data. Inspection of the incorrectly predicted data revealed a pattern in which many errands have identical text attached to them. After the majority of these errands were removed, the result increased to 94.08%. Natural Language Processing Naïve Bayes Support Vector Machine Neural Network Pre-processing Engineering and Technology Teknik och teknologier
342	Exploration of infectious disease transmission dynamics using the relative probability of direct transmission between patients Leavitt, Sarah Van Ness 06 October 2020 (has links) The question “who infected whom” is a perennial one in the study of infectious disease dynamics. To understand characteristics of infectious diseases such as how many people one case will infect over the course of infection (the reproductive number), how much time elapses between the infection of two connected cases (the generation interval), and what factors are associated with transmission, one must ascertain who infected whom. The current best practices for linking cases are contact investigations and pathogen whole genome sequencing (WGS). However, these data sources cannot perfectly link cases, are expensive to obtain, and are often not available for all cases in a study. This lack of discriminatory data limits the use of established methods in many existing infectious disease datasets. We developed a method to estimate the relative probability of direct transmission between any two infectious disease cases. We used a subset of cases that have pathogen WGS or contact investigation data to train a model and then used demographic, spatial, clinical, and temporal data to predict the relative transmission probabilities for all case-pairs using a simple machine learning algorithm called naive Bayes. We adapted existing methods for estimating the reproductive number and generation interval to use these probabilities. Finally, we explored the associations between various covariates and transmission and how they related to the associations between covariates and pathogen genetic relatedness. We applied these methods to a tuberculosis outbreak in Hamburg, Germany and to surveillance data in Massachusetts, USA.
Through simulations we found that our estimated transmission probabilities accurately classified pairs as links and nonlinks and were able to accurately estimate the reproductive number and the generation interval. We also found that the association between covariates and genetic relatedness captures the direction but not absolute magnitude of the association between covariates and transmission, but the bias was improved by using effect estimates from the naive Bayes algorithm. The methods developed in this dissertation can be used to explore transmission dynamics and estimate infectious disease parameters in established datasets where this was not previously feasible because of a lack of highly discriminatory information, and therefore expand our understanding of many infectious diseases. Biostatistics Generation interval Naive Bayes Noise reduction Reproductive number Risk factors of transmission Tuberculosis
343	Email Classification : An evaluation of Deep Neural Networks with Naive Bayes Michailoff, John January 2019 (has links) Machine learning (ML) is an area of computer science that gives computers the ability to learn data patterns without prior programming for those patterns. Using neural networks in this area is based on simulating the biological functions of neurons in brains to learn patterns in data, giving computers a predictive ability to comprehend how data can be clustered. This research investigates the possibilities of using neural networks for classifying email, i.e. working as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input arguments (data size, training time and neural network structure) affect the accuracy of DNN pattern recognition; to evaluate how the DNN performs compared to the statistical ML method Naïve Bayes (NB) in terms of prediction accuracy and complexity; and finally to assess the viability of the resulting DNN as a case manager. Results show that the accuracy of our networks improves as training time and data size increase. Testing increasingly complex network structures (larger networks of neurons with more layers) shows that overfitting becomes a problem with increased training time, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than the DNN in terms of accuracy, but have lower complexity, making NB viable on mobile platforms. We conclude that our developed prototype may work well in tandem with existing case management systems, which remains to be tested by future research. Machine learning neural network DNN Naive Bayes network complexity Software Engineering Programvaruteknik
344	Study designs and statistical methods for pharmacogenomics and drug interaction studies Zhang, Pengyue 01 April 2016 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Adverse drug events (ADEs) are injuries resulting from drug-related medical interventions. ADEs can be induced either by a single drug or by a drug-drug interaction (DDI). In order to prevent unnecessary ADEs, many regulatory agencies in public health maintain pharmacovigilance databases for detecting novel drug-ADE associations. However, pharmacovigilance databases usually contain a significant portion of false associations due to the nature of their structure (i.e. false drug-ADE associations caused by co-medications). Besides pharmacovigilance studies, the risks of ADEs can be minimized by understanding their mechanisms, which include abnormal pharmacokinetics/pharmacodynamics due to genetic factors and synergistic effects between drugs. During the past decade, pharmacogenomics studies have successfully identified several predictive markers to reduce ADE risks. However, pharmacogenomics studies are usually limited by sample size and budget. In this dissertation, we develop statistical methods for pharmacovigilance and pharmacogenomics studies. Firstly, we propose an empirical Bayes mixture model to identify significant drug-ADE associations. The proposed approach can be used for both signal generation and ranking. Following this approach, the portion of false associations among the detected signals can be well controlled. Secondly, we propose a mixture dose response model to investigate the functional relationship between increased dimensionality of drug combinations and the ADE risks. Moreover, this approach can be used to identify high-dimensional drug combinations that are associated with escalated ADE risks at significantly low local false discovery rates. Finally, we propose a cost-efficient design for pharmacogenomics studies.
To pursue further cost efficiency, the proposed design involves both DNA pooling and a two-stage design approach. Compared to a traditional design, the cost under the proposed design is reduced dramatically with an acceptable compromise on statistical power. The proposed methods are examined by extensive simulation studies. Furthermore, the proposed methods to analyze pharmacovigilance databases are applied to the FDA’s Adverse Event Reporting System (FAERS) database and a local electronic medical record (EMR) database. For different scenarios of pharmacogenomics study, optimized designs to detect a functioning rare allele are given as well. FAERS Drug-drug interaction Empirical Bayes Pharmacogenomics Pharmacovigilance Two-stage design
345	Bayesian Analysis of Systematic Theoretical Errors Models Billig, Ian A. 10 January 2021 (has links) No description available. Physics Statistics Nuclear Physics Bayesian Bayes Theorem Statistics EFT Nuclear Science
346	A Geometric Framework for Modeling and Inference using the Nonparametric Fisher–Rao metric Saha, Abhijoy 02 October 2019 (has links) No description available. Statistics nonparametric Fisher-Rao metric Riemannian manifold geometry statistics variational Bayes sensitivity analysis tumor heterogeneity
347	Investigations into the design and dissection of genetic networks Libby, Eric. January 2007 (has links) No description available. Bayes Theorem. Atrial Fibrillation -- genetics. Gene Expression -- genetics. Oligonucleotide Array Sequence Analysis.
348	Predicting SNI Codes from Company Descriptions : A Machine Learning Solution Lindholm, Erik, Nilsson, Jonas January 2023 (has links) This study aims to develop an automated solution for assigning area of industry codes to businesses based on the contents of their business descriptions. The Swedish standard industrial classification (SNI) is a system used by Statistics Sweden (SCB) for categorizing businesses for their statistics reports. Assignment of SNI codes has so far been done manually by the person registering a new company, but this is a far from optimal solution. Some of the 88 main group areas of industry are hard to tell apart from one another, and this often leads to incorrect assignments. Our approach to this problem was to train a machine learning model using the Naive Bayes and SVM classifier algorithms and conduct an experiment. In 2019, Dahlqvist and Strandlund attempted this and reached an accuracy score of 52 percent using a gradient boosting classifier, but this was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than that of Dahlqvist and Strandlund, which we eventually did: our best-performing SVM model reached a score of 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent on investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores. Machine learning text classification SNI Naive Bayes SVM oversampling undersampling Computer Sciences Datavetenskap (datalogi)
349	Machine Learning in the Open World Yicheng Cheng (11197908) 29 July 2021 (has links) In Machine Learning in the Open World, we aim to build models that can be used in a more realistic setting where something "unknown" could always happen. Beyond traditional machine learning tasks such as classification and segmentation, where all classes are predefined, we deal with the challenges of newly emerged classes, irrelevant classes, outliers, and class imbalance. At the beginning, we focus on the Non-Exhaustive Learning (NEL) problem from a statistical perspective. In NEL, we assume that our training classes are non-exhaustive, so the testing data could contain unknown classes, and we aim to build models that can simultaneously perform classification and class discovery. We proposed a non-parametric Bayesian model that learns some hyper-parameters from both the training classes and the discovered classes (the latter set is empty at the beginning), then infers the label partitioning under the guidance of the learned hyper-parameters, and repeats this procedure until convergence. After obtaining good results on applications with plain and low-dimensional data, such as flow cytometry and some benchmark datasets, we move forward to Non-Exhaustive Feature Learning (NEFL). For NEFL, we extend our work with deep learning techniques to learn representations on datasets with complex structural and spatial correlations. We proposed a metric learning approach to learn a feature space that discriminates well on training classes and generalizes well to unknown classes. Then we developed some variants of this metric learning algorithm to deal with outliers and irrelevant classes.
We applied our final model to applications such as open world image classification, image segmentation, and SRS hyperspectral image segmentation and obtained promising results. Finally, we explored Out of Distribution (OOD) detection to detect irrelevant samples and outliers to complete the story. class discovery non-exhaustive learning non-parametric Bayes Deep Learning
350	Variant Detection Using Next Generation Sequencing Data Pyon, Yoon Soo 08 March 2013 (has links) No description available. Bioinformatics Computer Science Genetics structural variation SNP next generation sequencing naive Bayes classifier
