341 |
Automation of support service using Natural Language Processing : Automation of errands tagging
Haglund, Kristoffer January 2020
In this paper, Natural Language Processing and classification algorithms were used to create a program that can automatically tag the different errands handled by the support service of Fortnox (an IT company based in Växjö). Controlled experiments were conducted to find which combination of classification algorithm and Bag-of-Words pre-processing algorithm was best suited to this problem. All data were provided by Fortnox and had been manually labeled with tags, and were used as training and test data. The final algorithm correctly predicted 69.15% of errands when all original data were used. An inspection of the incorrectly predicted data revealed a pattern in which many errands have identical text attached to them. Removing the majority of these errands increased the result to 94.08%.
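The pipeline described above (Bag-of-Words features fed to a classifier) can be sketched roughly as follows. The abstract does not say which classifier performed best, so multinomial naive Bayes with add-one smoothing is used here only as one plausible choice; the errand texts, the tag names, and the helper functions `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Bag-of-Words: lowercase and split on whitespace, ignoring word order.
    return text.lower().split()


def train_nb(docs):
    """Count tags and per-tag word frequencies from (text, tag) pairs."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tag in docs:
        tag_counts[tag] += 1
        for word in tokenize(text):
            word_counts[tag][word] += 1
            vocab.add(word)
    return tag_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the tag with the highest log posterior (add-one smoothing)."""
    tag_counts, word_counts, vocab = model
    total_docs = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n_docs in tag_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[tag].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log(
                (word_counts[tag][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical toy errands; the real Fortnox data are not public.
train = [
    ("invoice was sent twice to customer", "invoicing"),
    ("cannot create new invoice", "invoicing"),
    ("forgot my password cannot log in", "login"),
    ("login page shows error", "login"),
]
model = train_nb(train)
prediction = predict_nb(model, "customer invoice missing")
print(prediction)
```

Because the abstract reports that many errands share identical text, a real evaluation would probably also deduplicate texts before splitting into training and test data; that step is omitted here.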
|
342 |
Exploration of infectious disease transmission dynamics using the relative probability of direct transmission between patients
Leavitt, Sarah Van Ness 06 October 2020
The question “who infected whom” is a perennial one in the study of infectious disease dynamics. To understand characteristics of infectious diseases such as how many secondary cases one case produces over the course of infection (the reproductive number), how much time elapses between the infections of two connected cases (the generation interval), and what factors are associated with transmission, one must ascertain who infected whom. The current best practices for linking cases are contact investigations and pathogen whole genome sequencing (WGS). However, these data sources cannot perfectly link cases, are expensive to obtain, and are often not available for all cases in a study. This lack of discriminatory data limits the use of established methods in many existing infectious disease datasets.
We developed a method to estimate the relative probability of direct transmission between any two infectious disease cases. We used a subset of cases that have pathogen WGS or contact investigation data to train a model and then used demographic, spatial, clinical, and temporal data to predict the relative transmission probabilities for all case-pairs using a simple machine learning algorithm called naive Bayes. We adapted existing methods to estimate the reproductive number and generation interval to use these probabilities. Finally, we explored the associations between various covariates and transmission and how they related to the associations between covariates and pathogen genetic relatedness. We applied these methods to a tuberculosis outbreak in Hamburg, Germany and to surveillance data in Massachusetts, USA.
Through simulations we found that our estimated transmission probabilities accurately classified pairs as links and nonlinks and were able to accurately estimate the reproductive number and the generation interval. We also found that the association between covariates and genetic relatedness captures the direction but not absolute magnitude of the association between covariates and transmission, but the bias was improved by using effect estimates from the naive Bayes algorithm. The methods developed in this dissertation can be used to explore transmission dynamics and estimate infectious disease parameters in established datasets where this was not previously feasible because of a lack of highly discriminatory information, and therefore expand our understanding of many infectious diseases.
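The core idea (train naive Bayes on the case-pairs whose link status is known, then score every pair) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the covariates `same_city` and `time_gap`, the toy training pairs, and the final step of scaling each infectee's candidate probabilities so they sum to one are all assumptions made for illustration.

```python
from collections import Counter


def train_nb(labelled_pairs):
    """Count covariate values among linked and non-linked training pairs."""
    class_n = Counter()
    value_n = Counter()
    for features, is_link in labelled_pairs:
        class_n[is_link] += 1
        for name, value in features.items():
            value_n[(is_link, name, value)] += 1
    return class_n, value_n


def link_probability(model, features, levels):
    """Posterior P(link | covariates) under naive Bayes, add-one smoothing."""
    class_n, value_n = model
    total = sum(class_n.values())
    joint = {}
    for label in (True, False):
        p = class_n[label] / total
        for name, value in features.items():
            p *= (value_n[(label, name, value)] + 1) / (
                class_n[label] + len(levels[name])
            )
        joint[label] = p
    return joint[True] / (joint[True] + joint[False])


# Hypothetical covariates and their possible values.
levels = {"same_city": ["yes", "no"], "time_gap": ["short", "long"]}

# Hypothetical pairs whose link status is known (e.g. from WGS or contacts).
training = [
    ({"same_city": "yes", "time_gap": "short"}, True),
    ({"same_city": "yes", "time_gap": "short"}, True),
    ({"same_city": "no", "time_gap": "long"}, False),
    ({"same_city": "no", "time_gap": "short"}, False),
    ({"same_city": "yes", "time_gap": "long"}, False),
]
model = train_nb(training)

# Two candidate infectors for one infectee, scored and then rescaled.
candidates = {
    "case_A": {"same_city": "yes", "time_gap": "short"},
    "case_B": {"same_city": "no", "time_gap": "long"},
}
raw = {c: link_probability(model, f, levels) for c, f in candidates.items()}
relative = {c: p / sum(raw.values()) for c, p in raw.items()}
print(relative)
```

The rescaled values are meant to be read only relative to each other within one infectee's set of candidate infectors, which is one way to interpret "relative probability" here.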
|
343 |
Email Classification : An evaluation of Deep Neural Networks with Naive Bayes
Michailoff, John January 2019
Machine learning (ML) is an area of computer science that gives computers the ability to learn data patterns without being explicitly programmed for those patterns. Neural networks in this area simulate the biological functions of neurons in the brain to learn patterns in data, giving computers a predictive ability to comprehend how data can be clustered. This research investigates the possibilities of using neural networks for classifying email, i.e. working as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input arguments (data size, training time and neural network structure) affect the accuracy of DNN pattern recognition; to evaluate how the DNN performs compared to the statistical ML method Naïve Bayes (NB) in terms of prediction accuracy and complexity; and finally to assess the viability of the resulting DNN as a case manager. Results show that the accuracy of our networks improves as training time and data size increase. Testing increasingly complex network structures (larger networks of neurons with more layers) shows that overfitting becomes a problem with increased training time, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than DNNs in terms of accuracy but are less complex, making NB viable on mobile platforms. We conclude that our developed prototype may work well in tandem with existing case management systems, which remains to be tested by future research.
|
344 |
Study designs and statistical methods for pharmacogenomics and drug interaction studies
Zhang, Pengyue 01 April 2016
Indiana University-Purdue University Indianapolis (IUPUI)
Adverse drug events (ADEs) are injuries resulting from drug-related medical interventions. ADEs can be induced either by a single drug or by a drug-drug interaction (DDI). To prevent unnecessary ADEs, many public health regulatory agencies maintain pharmacovigilance databases for detecting novel drug-ADE associations. However, pharmacovigilance databases usually contain a significant portion of false associations because of their inherent structure (i.e. false drug-ADE associations caused by co-medications). Besides pharmacovigilance studies, the risks of ADEs can be minimized by understanding their mechanisms, which include abnormal pharmacokinetics/pharmacodynamics due to genetic factors and synergistic effects between drugs. During the past decade, pharmacogenomics studies have successfully identified several predictive markers to reduce ADE risks. However, pharmacogenomics studies are usually limited by sample size and budget.
In this dissertation, we develop statistical methods for pharmacovigilance and pharmacogenomics studies. First, we propose an empirical Bayes mixture model to identify significant drug-ADE associations. The proposed approach can be used for both signal generation and ranking. With this approach, the proportion of false associations among the detected signals can be well controlled. Second, we propose a mixture dose-response model to investigate the functional relationship between the increased dimensionality of drug combinations and ADE risks. Moreover, this approach can be used to identify high-dimensional drug combinations that are associated with escalated ADE risks at significantly low local false discovery rates. Finally, we propose a cost-efficient design for pharmacogenomics studies. To improve cost-efficiency further, the proposed design combines DNA pooling with a two-stage design approach. Compared with a traditional design, the cost under the proposed design is reduced dramatically with an acceptable compromise in statistical power. The proposed methods are examined in extensive simulation studies. Furthermore, the proposed methods for analyzing pharmacovigilance databases are applied to the FDA's Adverse Event Reporting System database and a local electronic medical record (EMR) database. For different pharmacogenomics study scenarios, optimized designs to detect a functioning rare allele are given as well.
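The abstract does not give the form of the mixture model, so the following is only a generic two-group sketch of how a local false discovery rate can be used to rank and select signals. It assumes each drug-ADE pair has a z-score drawn either from a standard normal null or from a normal alternative; the mixing proportion `pi0`, the alternative's mean and standard deviation, and the pair names are hypothetical, and in practice such parameters would be estimated from the data rather than fixed.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd):
    """Posterior probability that a score z comes from the null component."""
    null_part = pi0 * normal_pdf(z, 0.0, 1.0)
    alt_part = (1 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


# Hypothetical association scores for two drug-ADE pairs.
scores = {"drug_A + event_X": 0.3, "drug_B + event_Y": 4.1}

# Assumed mixture parameters: 90% of pairs are null, alternative ~ N(3, 1).
lfdr = {
    pair: local_fdr(z, pi0=0.9, alt_mean=3.0, alt_sd=1.0)
    for pair, z in scores.items()
}

# Rank pairs from most to least likely to be a true signal, then threshold.
ranking = sorted(lfdr, key=lfdr.get)
signals = [pair for pair in ranking if lfdr[pair] < 0.05]
print(ranking, signals)
```

Sorting by the local false discovery rate gives a ranking, and thresholding it gives a signal list, which matches the two uses (ranking and signal generation) mentioned in the abstract.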
|
345 |
Bayesian Analysis of Systematic Theoretical Errors Models
Billig, Ian A. 10 January 2021
No description available.
|
346 |
A Geometric Framework for Modeling and Inference using the Nonparametric Fisher–Rao metric
Saha, Abhijoy 02 October 2019
No description available.
|
347 |
Investigations into the design and dissection of genetic networks
Libby, Eric. January 2007
No description available.
|
348 |
Predicting SNI Codes from Company Descriptions : A Machine Learning Solution
Lindholm, Erik, Nilsson, Jonas January 2023
This study aims to develop an automated solution for assigning industry codes to businesses based on the contents of their business descriptions. The Swedish standard industrial classification (SNI) is a system used by Statistics Sweden (SCB) for categorizing businesses in its statistics reports. SNI codes have so far been assigned manually by the person registering a new company, which is far from optimal. Some of the 88 main industry groups are hard to tell apart, and this often leads to incorrect assignments. Our approach to this problem was to train machine learning models using the Naive Bayes and SVM classifier algorithms and to evaluate them in an experiment. In 2019, Dahlqvist and Strandlund attempted this and reached an accuracy score of 52 percent using a gradient boosting classifier, but this was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than that of Dahlqvist and Strandlund, and we eventually succeeded: our best-performing SVM model reached a score of 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent on investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores.
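As an illustration of the oversampling strategy mentioned above, here is a minimal sketch of random oversampling, in which minority classes are padded with randomly repeated samples until every class matches the largest one. The abstract does not say which oversampling variant the authors used, so this is an assumed, simplest-case version; the function `random_oversample` and the toy descriptions and codes are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, seed=0):
    """Duplicate minority-class samples until all classes are equally large."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical company descriptions with two-digit industry codes.
data = [
    ("bakery and cafe", "56"),
    ("restaurant", "56"),
    ("pizza restaurant", "56"),
    ("software consulting", "62"),
]
balanced = random_oversample(data)
label_counts = Counter(label for _, label in balanced)
print(label_counts)
```

Oversampling should be applied to the training split only. Since it adds no new information and accuracy is dominated by the majority classes, it is plausible for it to leave accuracy unchanged, as the authors report.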
|
349 |
Machine Learning in the Open World
Yicheng Cheng (11197908) 29 July 2021
With Machine Learning in the Open World, we try to build models that can be used in a more realistic setting where something "unknown" can always happen. Beyond traditional machine learning tasks such as classification and segmentation, where all classes are predefined, we deal with the challenges posed by newly emerged classes, irrelevant classes, outliers, and class imbalance.
At the beginning, we focus on the Non-Exhaustive Learning (NEL) problem from a statistical perspective. In NEL, we assume that our training classes are non-exhaustive, so the testing data could contain unknown classes, and we aim to build models that simultaneously perform classification and class discovery. We propose a non-parametric Bayesian model that learns some hyper-parameters from both the training classes and the discovered classes (an empty set at the beginning), then infers the label partitioning under the guidance of the learned hyper-parameters, and repeats this procedure until convergence.
After obtaining good results on applications with plain, low-dimensional data such as flow cytometry and some benchmark datasets, we move on to Non-Exhaustive Feature Learning (NEFL). For NEFL, we extend our work with deep learning techniques to learn representations on datasets with complex structural and spatial correlations. We propose a metric learning approach that learns a feature space which discriminates well between training classes and generalizes well to unknown classes. We then develop some variants of this metric learning algorithm to deal with outliers and irrelevant classes. We applied our final model to applications such as open world image classification, image segmentation, and SRS hyperspectral image segmentation and obtained promising results.
Finally, we explore Out of Distribution (OOD) detection to detect irrelevant samples and outliers, completing the story.
|
350 |
Variant Detection Using Next Generation Sequencing Data
Pyon, Yoon Soo 08 March 2013
No description available.
|