31

Learning Predictive Models from Electronic Health Records

Zhao, Jing January 2017 (has links)
The ongoing digitization of healthcare, which has been much accelerated by the widespread adoption of electronic health records, generates unprecedented amounts of clinical data in a readily computable form. This, in turn, affords great opportunities for making meaningful secondary use of clinical data in the endeavor to improve healthcare, as well as to support epidemiology and medical research. To that end, there is a need for techniques capable of effectively and efficiently analyzing large amounts of clinical data. While machine learning provides the necessary tools, learning effective predictive models from electronic health records comes with many challenges due to the complexity of the data. Electronic health records contain heterogeneous and longitudinal data that jointly provides a rich perspective of patient trajectories in the healthcare process. The diverse characteristics of the data need to be properly accounted for when learning predictive models from clinical data. However, how best to represent healthcare data for predictive modeling has been insufficiently studied. This thesis addresses several of the technical challenges involved in learning effective predictive models from electronic health records. Methods are developed to address the challenges of (i) representing heterogeneous types of data, (ii) leveraging the concept hierarchy of clinical codes, and (iii) modeling the temporality of clinical events. The proposed methods are evaluated empirically in the context of detecting adverse drug events in electronic health records. Various representations of each type of data that account for its unique characteristics are investigated and it is shown that combining multiple representations yields improved predictive performance. It is also demonstrated how the information embedded in the concept hierarchy of clinical codes can be exploited, both for creating enriched feature spaces and for decomposing the predictive task. 
Moreover, incorporating temporal information leads to more effective predictive models by distinguishing between event occurrences in the patient history. Both single-point representations, using pre-assigned or learned temporal weights, and multivariate time series representations are shown to be more informative than representations in which temporality is ignored. Effective methods for representing heterogeneous and longitudinal data are key for enhancing and truly enabling meaningful secondary use of electronic health records through large-scale analysis of clinical data.
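As a concrete illustration of the single-point temporal representations discussed above, the sketch below weights each occurrence of a clinical event by an exponentially decaying function of its distance from the index date, so recent occurrences dominate over distant ones. The event codes, half-life, and decay form are illustrative assumptions, not the thesis's actual weighting scheme.

```python
import numpy as np

# Hypothetical encounter history for one patient: (days_before_index, event_code).
# Codes and the half-life are invented for illustration.
history = [(3, "A10"), (40, "A10"), (200, "B05")]
vocabulary = ["A10", "B05", "C22"]

def temporal_feature_vector(history, vocabulary, half_life=30.0):
    """Single-point representation: each event contributes a weight that
    decays exponentially with its distance from the index date."""
    idx = {code: i for i, code in enumerate(vocabulary)}
    x = np.zeros(len(vocabulary))
    for days_before, code in history:
        x[idx[code]] += 0.5 ** (days_before / half_life)
    return x

x = temporal_feature_vector(history, vocabulary)
```

Setting all weights to 1 recovers a plain occurrence count, i.e. the representation in which temporality is ignored.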
32

IDENTIFICATION OF NOVEL SLEEP RELATED GENES FROM LARGE SCALE PHENOTYPING EXPERIMENTS IN MICE

Joshi, Shreyas 01 January 2017 (has links)
Humans spend a third of their lives sleeping, but very little is known about the physiological and genetic mechanisms controlling sleep. Increased data from sleep phenotyping studies in mice and other species, genetic crosses, and gene expression databases can all help improve our understanding of the process. Here, we present analysis of our own sleep data from the large-scale phenotyping program at The Jackson Laboratory (JAX) to identify the best gene candidates and phenotype predictors for influencing sleep traits. The original Knockout Mouse Project (KOMP) was a worldwide collaborative effort to produce embryonic stem (ES) cell lines, each with one of the mouse's roughly 21,000 protein-coding genes knocked out. The objective of KOMP2 is to phenotype as many of these lines as feasible, with each mouse studied over a ten-week period (www.mousephenotype.org). Phenotyping for sleep behavior is done using our non-invasive Piezo system for mouse activity monitoring. Thus far, sleep behavior has been recorded in more than 6000 mice representing 343 knockout lines and nearly 2000 control mice. Control and knockout mice have been compared using multivariate statistical approaches to identify genes that exhibit significant effects on sleep variables derived from Piezo data. Using these approaches, significant sleep-affecting genes have been identified, including genes that affect sleep in a specific sex or specifically during the day and/or night. KOMP2 uses a broad-based phenotyping pipeline that collects physiological and biochemical parameters through a variety of assays. Mice enter the pipeline at 4 weeks of age and leave at 18 weeks. Currently, the IMPC (International Mouse Phenotyping Consortium) database contains more than 33 million observations.
Our final dataset, prepared by extracting biological sample data for mice with available sleep recordings, consists of nearly 1.5 million observations from a multitude of phenotyping assays. Through big data analytics and machine learning approaches, we have identified predictor phenotypes that affect sleep in mice. These phenotypes can play a key role in developing our understanding of the mechanisms of sleep regulation.
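A minimal sketch of how knockout lines might be screened against controls: per-variable Welch t-tests with a Bonferroni correction, as a simplified univariate stand-in for the multivariate statistical approaches described above. The data, effect size, and thresholds are synthetic, not KOMP2 values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated sleep variables (columns) for controls and one knockout line.
# Real KOMP2 data has far more variables and mice; values here are synthetic.
controls = rng.normal(0.0, 1.0, size=(200, 3))
knockout = rng.normal(0.0, 1.0, size=(12, 3))
knockout[:, 0] += 2.0  # this knockout shifts the first sleep variable

def flag_sleep_effects(controls, knockout, alpha=0.05):
    """Welch t-test per sleep variable with Bonferroni correction."""
    n_vars = controls.shape[1]
    pvals = np.array([
        stats.ttest_ind(controls[:, j], knockout[:, j], equal_var=False).pvalue
        for j in range(n_vars)
    ])
    return pvals < alpha / n_vars  # Bonferroni-adjusted significance

hits = flag_sleep_effects(controls, knockout)
```

Running the same screen separately per sex, or per day/night recording window, yields the sex- and time-specific gene lists mentioned above.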
33

Predikce prodejností magazínů / Magazine sales prediction

Rajčan, Šimon January 2013 (has links)
Today, many magazine publishing houses face the problem of predicting the future sales of their products. In many cases, these predictions are made by employees based on personal experience and guesswork. The problems with this approach are the high expense of producing the predictions and the increased costs when those predictions are wrong. The aim of this work is to study existing regression methods for automatic prediction and to create a solution for predicting magazine sales at the Russian publishing house Burda.
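The kind of regression-based sales predictor studied here can be sketched as follows; the features (price, page count, month) and all figures are invented for illustration, not the actual Burda predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative per-issue features: price, pages, month of publication.
X = np.array([[2.5, 80, 1], [2.5, 84, 2], [3.0, 90, 3], [3.0, 88, 4], [2.8, 86, 5]])
y = np.array([12000, 11500, 13000, 12800, 12400])  # copies sold per issue

# Fit a linear model on past issues and predict the next one.
model = LinearRegression().fit(X, y)
next_issue = np.array([[2.8, 88, 6]])
predicted_sales = model.predict(next_issue)[0]
```

In practice one would compare several regression methods (as the thesis sets out to do) rather than commit to a single linear model.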
34

Comparação de algoritmos de aprendizagem de máquina para construção de modelos preditivos de diabetes não diagnosticado / Comparison of machine learning algorithms to build predictive models of undiagnosed diabetes

Olivera, André Rodrigues January 2016 (has links)
The aim of this work was to develop and compare predictive models for detecting undiagnosed diabetes using different machine learning algorithms. The data came from the Longitudinal Study of Adult Health (ELSA-Brasil), a fairly complete dataset of approximately 15 thousand participants. The predictor variables were chosen to be simple information about participants, requiring no laboratory tests. The tests were performed in four steps: parameter tuning through cross-validation, automatic feature selection, cross-validation for error estimation, and a generalization test on an independent dataset. The results demonstrate the feasibility of using simple information to detect undiagnosed diabetes in the population. They also compare machine learning algorithms and show that algorithms other than logistic regression can be used to build predictive models. / The aim of this work was to develop and to compare predictive models to detect undiagnosed diabetes using different machine learning algorithms and data from the Longitudinal Study of Adult Health (ELSA-Brasil), which collected an extensive dataset from around 15 thousand participants. The predictor variables were selected from literature research. The tests were performed in four steps: parameter tuning with cross-validation, automatic feature selection, cross-validation for error estimation, and a generalization test on an independent dataset. The results show the feasibility of extracting useful information from ELSA-Brasil as well as the potential to use algorithms other than logistic regression to build predictive models from the ELSA-Brasil dataset.
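The four evaluation steps described above can be sketched as a scikit-learn pipeline. The synthetic data, grid values, and choice of logistic regression as the baseline learner are placeholders, not the ELSA-Brasil variables or the thesis's exact configurations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for simple, non-laboratory participant features.
X, y = make_classification(n_samples=600, n_features=12, n_informative=4, random_state=0)

# Step 4 needs an untouched independent set, so split it off first.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),          # step 2: automatic feature selection
    ("clf", LogisticRegression(max_iter=1000)),  # baseline learner; others can be swapped in
])

# Step 1: tune parameters by cross-validation.
grid = GridSearchCV(pipe, {"select__k": [4, 8, 12], "clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_dev, y_dev)

# Step 3: cross-validated error estimate for the chosen configuration.
cv_acc = cross_val_score(grid.best_estimator_, X_dev, y_dev, cv=5).mean()

# Step 4: generalization test on the independent split.
test_acc = grid.best_estimator_.fit(X_dev, y_dev).score(X_test, y_test)
```

Comparing algorithms then amounts to repeating steps 1–4 with a different estimator in the `clf` slot.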
35

Cross contamination of Listeria monocytogenes in ready-to-eat meat product during slicing: a predictive approach / Contaminação cruzada de Listeria monocytogenes em produto cárneo pronto para o consumo durante o fatiamento: uma abordagem preditiva

Lopes, Janaina Thaís 12 May 2017 (has links)
Ready-to-eat (RTE) meat products are subject to recontamination after industrial processing, mainly by Listeria monocytogenes, a pathogenic microorganism that can persist for a long time in the environment. An RTE meat product contaminated with L. monocytogenes through cross contamination at some stage after industrial processing, such as weighing, slicing or wrapping, can be an important cause of disease, because there is no kill step before consumption. The objective of this project was to measure the transfer of L. monocytogenes during slicing of cooked ham (cross contamination) at retail, simulating commercial slicing practices in the laboratory, and to develop a predictive model capable of describing this transfer. In the first slices obtained after the experimental contamination of the slicer, the counts and transfer rates of L. monocytogenes were higher than in subsequent slices, and the count curves presented a long tail as further slices were obtained. The data demonstrate that the slicer may be a relevant source of cross contamination of L. monocytogenes for RTE meat products, regardless of the level of contamination of the slicer. With the data obtained, a new transfer model, called 4p-2se, was proposed, as it contains four parameters (4p) and two environments (2se), and is independent of the quantification of the pathogen transferred to the slicer. The proposed model was compared to two previously described pathogen transfer models, and its predictions presented lower root mean squared error (RMSE) values than the other models. The 4p-2se model satisfactorily predicted the pathogen transfer data during slicing of cooked ham, which could assist food retail establishments and regulatory agencies in the evaluation and control of cross contamination of RTE foods and in the design of proper risk management strategies.
/ Ready-to-eat meat products are subject to recontamination after industrial processing, mainly by Listeria monocytogenes, a pathogenic microorganism capable of persisting in the environment for a long time. A ready-to-eat meat product that becomes contaminated with L. monocytogenes through cross contamination at some stage after industrial processing, such as weighing, slicing or packaging, can be an important cause of disease, since there is no pathogen-elimination step before consumption. This project aimed to measure the transfer of L. monocytogenes during the slicing of cooked ham (cross contamination), simulating in the laboratory the practices adopted in retail establishments that slice ready-to-eat products, and to develop a predictive model capable of describing this transfer. In the first slices obtained after experimental contamination of the slicer, the counts and transfer rates of L. monocytogenes were higher than in subsequent slices, and the count curves showed a long tail over the course of slicing. The data demonstrate that the slicer can be an important source of cross contamination of L. monocytogenes for sliced ready-to-eat meat products, regardless of the slicer's contamination level. With the data obtained, it was possible to propose a new transfer model, named 4p-2se, consisting of an equation with only four parameters (4p) and two environments (2se), this model being independent of the quantification of the pathogen transferred to the slicer. The proposed model was compared to two previously described transfer models, and its predictions showed lower root mean squared error (RMSE) values than the other models.
The proposed model proved capable of satisfactorily predicting the pathogen transfer data during the slicing of cooked ham, and could assist retail food establishments and regulatory agencies in evaluating and controlling cross contamination of ready-to-eat foods and in designing appropriate risk management strategies.
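The general fitting-and-RMSE workflow can be sketched as follows: a biphasic decay curve (fast initial transfer plus a long constant tail, matching the behavior reported above) is fitted to synthetic slice counts and scored by RMSE. The curve form, its three parameters, and all numbers are illustrative; the actual 4p-2se equation is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic slicing data: log10 CFU per slice vs. slice number, showing a
# steep initial decline and a long tail. Values are invented.
slices = np.arange(1, 31)
log_counts = np.log10(5e4 * 0.6 ** slices + 50.0) \
    + np.random.default_rng(1).normal(0, 0.05, 30)

def two_phase(n, logA, b, logC):
    """Biphasic transfer curve: fast exponential decay plus a constant tail."""
    return np.log10(10 ** logA * 10 ** (-b * n) + 10 ** logC)

params, _ = curve_fit(two_phase, slices, log_counts, p0=[4.0, 0.2, 1.5])
rmse = np.sqrt(np.mean((two_phase(slices, *params) - log_counts) ** 2))
```

Competing transfer models would be fitted to the same data and ranked by their RMSE values, as done in the study.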
36

Site Location Modeling and Prehistoric Rock Shelter Selection on the Upper Cumberland Plateau of Tennessee

Langston, Lucinda M 01 May 2013 (has links)
Using data collected from two archaeological surveys of the Upper Cumberland Plateau (UCP), Pogue Creek Gorge and East Obey, a site location model was developed for prehistoric rock shelter occupation in the region. The UCP model was then used to explore factors related to differential site selection of rock shelters. Unlike traditional approaches that use (aspatial) logistic regression, the UCP model was developed using spatial logistic regression; models were also generated using other regression-based approaches in an effort to demonstrate the need for a spatial approach to archaeological site location modeling. Based on the UCP model, proximity to the vegetation zones of Southern Red Oak and Hickory were the most influential factors in prehistoric site selection of rock shelters on the UCP.
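One common way to make logistic regression spatial, in the spirit of the contrast drawn above, is the autologistic approach: add an autocovariate (mean presence among nearby sites) as an extra predictor. The coordinates, environmental covariate, and neighborhood radius below are all invented, not the UCP variables.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic site data: one environmental covariate plus spatially
# clustered presences on a 10 x 10 study area.
coords = rng.uniform(0, 10, size=(300, 2))
env = rng.normal(size=300)
presence = (env + np.sin(coords[:, 0]) + rng.normal(0, 0.5, 300) > 0).astype(int)

def autocovariate(coords, response, radius=1.5):
    """Mean response among neighbors within `radius` -- the term that turns
    an aspatial logistic regression into an autologistic (spatial) one."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    neigh = (d < radius) & (d > 0)
    counts = neigh.sum(axis=1)
    sums = neigh @ response
    return np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

X_aspatial = env.reshape(-1, 1)
X_spatial = np.column_stack([env, autocovariate(coords, presence)])

acc_aspatial = LogisticRegression().fit(X_aspatial, presence).score(X_aspatial, presence)
acc_spatial = LogisticRegression().fit(X_spatial, presence).score(X_spatial, presence)
```

Because nearby sites share unmeasured conditions, the autocovariate absorbs spatial autocorrelation that an aspatial model would misattribute to its covariates.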
37

Exploring and Describing the Spatial and Temporal Dynamics of Medusahead in the Channeled Scablands of Eastern Washington Using Remote Sensing Techniques

Bateman, Timothy M. 01 December 2017 (has links)
Medusahead is a harmful weed that is invading public lands in the western United States. The invasion is a serious public concern because it can reduce forage for livestock and wildlife, increase fire frequency, alter important ecosystem cycles (such as the water cycle), reduce recreational opportunities, and produce landscapes that are aesthetically unpleasing. Invasions can drive up costs that generally require taxpayer dollars. Medusahead seeds typically spread to new areas by attaching to passing objects (e.g. vehicles, animals, clothing), after which the plant can quickly begin to affect plant communities. To be effective, management plans need to be sustainable, informed, and responsive to invasion levels across large landscapes. Ecological remote sensing analysis uses airborne imagery, taken from drones, aircraft, or satellites, to gather information about ecological systems. This thesis used remote sensing techniques to identify medusahead in the landscape and its changes through time, across an extensive area of rangelands in the Channeled Scabland region of eastern Washington. The thesis provides results that benefit land managers, including: 1) a dispersal map of medusahead, 2) a timeline of medusahead cover through time, 3) "high risk" dispersal areas, 4) climatic factors that influence the medusahead timeline, and 5) a strategy map that land managers can use to direct management needs. This thesis shows how remote sensing applications can be used to detect medusahead in the landscape and understand its invasiveness through time. This information can help create sustainable and effective management plans so land managers can continue to protect and improve western public lands threatened by the invasion of medusahead.
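One simple way to quantify cover change through time, as described above, is a per-pixel least-squares trend across annual cover maps; high-slope pixels flag areas of rapid expansion. The maps below are synthetic stand-ins for classified imagery, and the grid size, years, and risk threshold are invented.

```python
import numpy as np

# Synthetic fraction-of-cover maps for five years on a small grid.
rng = np.random.default_rng(2)
years = np.arange(2013, 2018)
cover = np.clip(
    0.1 + 0.05 * (years - 2013)[:, None, None] + rng.normal(0, 0.02, (5, 20, 20)),
    0, 1,
)  # shape: (year, row, col)

def per_pixel_trend(cover, years):
    """Least-squares slope of cover vs. year for every pixel -- a simple way
    to map where the weed is expanding fastest."""
    t = years - years.mean()
    flat = cover.reshape(len(years), -1)
    slopes = (t @ (flat - flat.mean(axis=0))) / (t @ t)
    return slopes.reshape(cover.shape[1:])

trend = per_pixel_trend(cover, years)
high_risk = trend > 0.04  # pixels gaining more than 4% cover per year
```

A "high risk" map like the one listed above could then be built by overlaying such trend pixels with dispersal corridors (roads, trails, grazing routes).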
38

Predictive Failure Model for Flip Chip on Board Component Level Assemblies

Muncy, Jennifer V. 27 January 2004 (has links)
Environmental stress tests, or accelerated life tests, apply stresses to electronic packages that exceed the stress levels experienced in the field. In theory, these elevated stress levels generate the same failure mechanisms that are seen in the field, only at an accelerated rate. Methods of assessing the reliability of electronic packages can be classified into two categories: statistical-failure-based approaches and physics-of-failure-based approaches. This research uses a statistics-based methodology to identify the critical factors in the reliability performance of a flip chip on board component level assembly, and a physics-of-failure-based approach to develop a low-cycle, strain-based fatigue equation for flip chip component level assemblies. The critical factors in determining reliability performance were established via experimental investigation, and their influence was quantified via regression analysis. This methodology differs from other strain-based fatigue approaches because it is not an empirical fit to experimental data; it uses regression analysis and least squares to obtain correction factors, or correction functions, and constants for a strain-based fatigue equation in which the total inelastic strain is determined analytically. The end product is a general flip chip on board equation rather than one that is specific to a certain test vehicle or material set.
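A strain-based low-cycle fatigue relation of Coffin-Manson form, N_f = C * strain^(-k), can be fitted by least squares in log space, in the spirit of the regression approach described above. The strain/cycle pairs and the specific equation form are illustrative assumptions, not the thesis's equation or data.

```python
import numpy as np

# Illustrative (inelastic strain amplitude, cycles-to-failure) pairs.
strain = np.array([0.020, 0.012, 0.008, 0.005, 0.003])
cycles = np.array([150, 420, 950, 2400, 6700])

# Coffin-Manson form: N_f = C * strain**(-k). Taking logs gives a straight
# line, so ordinary least squares recovers both constants.
slope, logC = np.polyfit(np.log(strain), np.log(cycles), 1)
k = -slope
C = np.exp(logC)

predicted = C * strain ** (-k)
```

The thesis goes further by determining the inelastic strain analytically and adding regression-derived correction functions, rather than fitting raw test data as done here.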
39

Toward a predictive model of tumor growth

Hawkins-Daarud, Andrea Jeanine 16 June 2011 (has links)
In this work, an attempt is made to lay out a framework in which models of tumor growth can be built, calibrated, validated, and differentiated in their level of goodness in such a manner that all the uncertainties associated with each step of the modeling process can be accounted for in the final model prediction. The study can be divided into four basic parts. The first involves the development of a general family of mathematical models of interacting species representing the various constituents of living tissue, which generalizes those previously available in the literature. In this theory, surface effects are introduced by incorporating in the Helmholtz free energy gradients of the volume fractions of the interacting species, thus providing a generalization of the Cahn-Hilliard theory of phase change in binary media and leading to fourth-order, coupled systems of nonlinear evolution equations. A subset of these governing equations is selected as the primary class of models of tumor growth considered in this work. The second component of this study focuses on the emerging and fundamentally important issue of predictive modeling: the study of model calibration, validation, and quantification of uncertainty in predictions of target outputs of models. The Bayesian framework suggested by Babuska, Nobile, and Tempone is employed to embed the calibration and validation processes within the framework of statistical inverse theory. Extensions of the theory, regarded as necessary for applying these methods to models of tumor growth in certain scenarios, are developed. The third part of the study focuses on the numerical approximation of the diffuse-interface models of tumor growth and on the numerical implementation of the statistical inverse methods at the core of the validation process. A class of mixed finite element models is developed for the considered mass-conservation models of tumor growth.
A family of time marching schemes is developed and applied to representative problems of tumor evolution. Finally, in the fourth component of this investigation, a collection of synthetic examples, mostly in two dimensions, is considered to provide a proof of concept of the theory and methods developed in this work.
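The statistical inverse (calibration) step can be sketched with a deliberately tiny forward model: a scalar logistic growth law whose rate parameter is sampled by a Metropolis algorithm from its Bayesian posterior given noisy synthetic observations. This illustrates the calibration framework only, not the diffuse-interface models of the thesis; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def volume(r, t, v0=0.1, K=1.0):
    """Forward model: logistic growth of tumor volume with rate r."""
    return K / (1 + (K / v0 - 1) * np.exp(-r * t))

# Synthetic observations generated from a known "true" rate plus noise.
t_obs = np.linspace(0, 10, 12)
true_r = 0.8
data = volume(true_r, t_obs) + rng.normal(0, 0.02, t_obs.size)

def log_posterior(r, sigma=0.02):
    if r <= 0 or r > 5:  # flat prior on (0, 5]
        return -np.inf
    resid = data - volume(r, t_obs)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# Metropolis sampling of the growth-rate posterior (the inverse step).
samples, r = [], 1.0
for _ in range(5000):
    prop = r + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(r):
        r = prop
    samples.append(r)

posterior_mean = np.mean(samples[1000:])  # discard burn-in
```

The posterior spread around the mean is what carries the parameter uncertainty forward into predictions of target outputs.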
40

Social Approaches to Disease Prediction

Mansouri, Mehrdad 25 November 2014 (has links)
Objective: This thesis focuses on the design and evaluation of a disease prediction system able to detect hidden and upcoming diseases in an individual. Unlike previous work, which has typically relied on precise medical examinations to extract symptoms and risk factors for computing the probability of occurrence of a disease, the proposed system evaluates the risk of a disease from similar patterns of disease comorbidity in the population and the individual. Methods: We combine three machine learning algorithms to construct the prediction system: an item-based recommendation system, a Bayesian graphical model, and a rule-based recommender. We also propose multiple similarity measures for the recommendation system, each useful under particular conditions. We finally show how the best values of the system's parameters can be derived by optimizing a cost function and the ROC curve. Results: A permutation test is designed to evaluate the accuracy of the prediction system. The results showed a considerable advantage of the proposed system compared to an item-based recommendation system alone, and improved predictions when the system is trained for each specific gender and race. Conclusion: The proposed system has been shown to be a competent method for accurately identifying potential diseases in patients with multiple diseases, based only on their disease records. The procedure also contains novel soft computing and machine learning ideas that can be used in prediction problems. The proposed system could use more complex datasets that include disease timelines, disease networks, and social networks, making it an even more capable platform for disease prediction. Hence, this thesis contributes to the improvement of the disease prediction field. / Graduate
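The item-based component can be sketched as cosine similarity between disease columns of a patient-by-disease incidence matrix, scoring a patient's unseen diseases from the diseases they already have. The diseases, matrix values, and the choice of plain cosine similarity are toy assumptions, not the thesis's actual measures.

```python
import numpy as np

# Toy patient-by-disease incidence matrix (1 = diagnosed).
# Columns: 0=diabetes, 1=hypertension, 2=obesity, 3=asthma (illustrative).
R = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between disease columns -- the comorbidity signal."""
    norms = np.linalg.norm(R, axis=0)
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)  # a disease should not recommend itself
    return S

def predict_scores(record, S):
    """Score unseen diseases for one patient from their existing diagnoses."""
    return np.where(record == 0, S @ record, 0.0)

S = item_similarity(R)
new_patient = np.array([1.0, 1.0, 0.0, 0.0])  # has diabetes and hypertension
scores = predict_scores(new_patient, S)
```

In the full system these scores would be combined with the Bayesian graphical model and rule-based recommender, and accuracy checked with the permutation test described above.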
