51

Source determination and predictive model development for Escherichia coli concentrations at F.W. Kent Park Lake, Oxford, Iowa

Simmer, Reid A. 01 July 2016 (has links)
Fecal contamination of Iowa recreational water bodies poses a threat to water quality as well as human health. Concern regarding the health effects of waterborne pathogens resulted in 149 beach advisories across 39 state-owned beaches during the 2015 beach season alone. While the presence of pollution is often clear, its cause and source may be difficult to identify. Furthermore, the current practice in Iowa of sampling once per week leads to high uncertainty and inadequately protects swimmers from exposure. The objective of this study was to determine the influential environmental factors and sources causing spikes in fecal contamination at F.W. Kent Park Lake in Oxford, IA, and to develop a predictive model of beach E. coli concentrations. Water samples were collected at the swimming beach as well as throughout the watershed from May to October 2015. All samples were analyzed for Escherichia coli using the IDEXX Colilert enumeration method. Together with weekly data from 2012 through 2014, two predictive models of E. coli based upon influential environmental and water quality variables were developed using EPA Virtual Beach software. These models proved to be more accurate than the current method used to assess risk to swimmers, which assumes bacterial concentrations remain constant between samples. In addition, through statistical analysis and modeling, this study found evidence that the main source of fecal contamination was wild geese that frequent the beach.
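For illustration, a minimal sketch of the kind of multiple linear regression that EPA Virtual Beach fits is shown below; the predictor names (rainfall, turbidity, water temperature), the synthetic data, and the use of a 235 MPN/100 mL single-sample advisory threshold are assumptions for demonstration, not the thesis's actual variables or final models.

```python
# Minimal sketch of a Virtual Beach-style multiple linear regression for
# beach E. coli; predictor names and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 120  # e.g. weekly samples across several seasons
X = np.column_stack([
    rng.gamma(2.0, 10.0, n),   # 48-h antecedent rainfall (mm)
    rng.gamma(3.0, 5.0, n),    # turbidity (NTU)
    rng.normal(22.0, 3.0, n),  # water temperature (deg C)
])
# Synthetic response: log10 E. coli (MPN/100 mL) rising with rain and turbidity
y = 1.0 + 0.02 * X[:, 0] + 0.03 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 0.3, n)

model = LinearRegression().fit(X, y)
pred = model.predict(X)
rmse = mean_squared_error(y, pred) ** 0.5
# Flag predicted exceedances of an assumed 235 MPN/100 mL advisory threshold
exceed = 10 ** pred > 235
print(f"RMSE (log10 units): {rmse:.2f}, predicted advisories: {exceed.sum()}")
```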
52

Predicting Delays In Delivery Process Using Machine Learning-Based Approach

Shehryar Shahid (9745388) 16 December 2020 (has links)
There has been great interest in applying data science, machine learning, and AI-related technologies in recent years. Industries are adopting these technologies rapidly, which has enabled them to gather valuable data about their businesses. One industry that can leverage this data to improve the output and quality of its business is logistics and transport. This provides an excellent opportunity for companies that rely heavily on air transportation to use this data to gain valuable insights and improve their business operations. This thesis aims to leverage such data to develop techniques for modeling complex business processes and to design a machine learning-based predictive analytical approach for predicting process violations.

The thesis focused on delays in shipment delivery, developing a prediction technique to identify shipments at risk of being delayed. The approach presented here was based on real airfreight shipping data following the International Air Transport Association industry standard for airfreight transportation. By leveraging the structure of the shipment process, this research presents a new approach that handles the complex event-driven structure of airfreight data, which otherwise makes it difficult to model for predictive analytics.

By applying different data mining and machine learning techniques, prediction techniques were developed to predict delays in delivering airfreight shipments, based on random forest and gradient boosting algorithms. To compare and select the best model, the prediction results were interpreted in the form of six confusion matrix-based performance metrics. The results showed that all the predictors had a high specificity of over 90%, but sensitivity was low, under 44%. Accuracy was over 75%, and the geometric mean was between 58% and 64%.

These performance metrics provide evidence that the approach can be used to develop prediction techniques for complex business processes. Additionally, an early prediction method was designed to test the predictors' performance when complete process information is not available. This method delivered compelling evidence that early prediction can be achieved without compromising the predictor's performance.
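As a worked illustration of the confusion matrix-based metrics reported above, the sketch below computes sensitivity, specificity, accuracy, precision, geometric mean, and F1 from a hypothetical confusion matrix; the counts are chosen only to fall within the reported ranges and are not the thesis's actual results.

```python
# Hypothetical confusion matrix for "shipment delayed" (positive class);
# counts are illustrative, chosen to sit in the ranges reported above.
tp, fn = 130, 170   # delayed shipments caught / missed
tn, fp = 1850, 150  # on-time shipments correctly passed / falsely flagged

sensitivity = tp / (tp + fn)              # recall for the delayed class
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
g_mean = (sensitivity * specificity) ** 0.5
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%} "
      f"accuracy={accuracy:.2%} g-mean={g_mean:.2%} f1={f1:.2f}")
```

With these counts, sensitivity is about 43%, specificity about 93%, accuracy about 86%, and the geometric mean about 63%, consistent with the ranges stated in the abstract.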
53

Application of Machine Learning Strategies to Improve the Prediction of Changes in the Airline Network Topology

Aleksandra Dervisevic (9873020) 18 December 2020 (has links)
Predictive modeling allows us to analyze historical patterns to forecast future events. When the data available for this analysis is imbalanced or skewed, many challenges arise. The lack of sensitivity toward the class with less data available hinders the sought-after predictive capabilities of the model. These imbalanced datasets are found across many different fields, including medical imaging, insurance claims, and financial fraud. The objective of this thesis is to identify the challenges of, and the means to assess, applying machine learning to transportation data that is imbalanced and has only one independent variable.

Airlines undergo a decision-making process on air route addition or deletion in order to adjust the services offered with respect to demand and cost, among other criteria. This process greatly affects the topology of the network and results in a continuously evolving Air Traffic Network (ATN). Organizations like the Federal Aviation Administration (FAA) are interested in the network transformation and the influence airlines have as stakeholders. For this reason, they attempt to model the criteria used by airlines to modify routes. The goal is to predict trends and dependencies observed in the network evolution by understanding the relation between the number of passengers per flight leg as the single independent variable and the airline's decision to keep or eliminate that route (the dependent variable). Research to date has used optimization-based methods and machine learning algorithms to model airlines' decision-making process on air route addition and deletion, but these studies demonstrate less than 50% accuracy.

In particular, two machine learning (ML) algorithms are examined: Sparse Gaussian Classification (SGC) and Deep Neural Networks (DNN). SGC is the extension of Gaussian Process Classification models to large datasets. These models use Gaussian Processes (GPs), which are proven to perform well in binary classification problems. A DNN uses multiple processing layers between the input and output layers. It is one of the most popular ML algorithms currently in use, so the results obtained using SGC were compared to the DNN model.

At first glance, these two models appear to perform equally well, giving a high accuracy of 97.77%. However, post-processing the results using a simple Bayes classifier and using the appropriate metrics for measuring the performance of models trained with imbalanced datasets reveals otherwise. Both SGC and DNN produced predictions with 1% precision and 20% recall, an F-score of 0.02, and an AUC (area under the curve) of 0.38 and 0.31, respectively. The low F-score indicates the classifiers are not performing accurately, and the AUC values confirm the models' inability to differentiate between the classes. This is probably due to the existing interaction and competition of the airlines in the market, which is not captured by the models. Interestingly, the behavior of the two models differs considerably across the range of threshold values: the SGC model more effectively captured the low confidence in these results. To validate the models, stratified K-fold cross-validation was run.

The future application of Gaussian Processes in model-building for decision-making will depend on a clear understanding of their limitations and of the imbalanced datasets used in the process, which is the central purpose of this thesis. Future steps in this investigation include further analysis of the training data as well as the exploration of variable-optimization algorithms. The tuning process of the SGC model could be improved by selecting optimal hyperparameters and inducing inputs.
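To make the reported numbers concrete, the short sketch below shows how an F-score of roughly 0.02 follows from 1% precision and 20% recall, and how accuracy near 98% can coexist with barely detecting the minority class on imbalanced data; the class balance used here is an assumption for illustration only.

```python
# Why high accuracy can coexist with an F-score near 0.02 on imbalanced data.
precision, recall = 0.01, 0.20
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # ~0.019, matching the order of magnitude reported

# Assumed class balance: 100 deleted routes among 100,000 retained routes.
n_neg, n_pos = 100_000, 100
tp = int(recall * n_pos)              # 20 true positives
fp = int(tp / precision - tp)         # false positives implied by 1% precision
tn, fn = n_neg - fp, n_pos - tp
accuracy = (tp + tn) / (n_neg + n_pos)
print(f"implied accuracy = {accuracy:.2%}")  # ~98%, despite missing most positives
```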
54

Predicting customer level risk patterns in non-life insurance / Prediktering av riskmönster på kundnivå i sakförsäkring

Villaume, Erik January 2012 (has links)
Several models for predicting future customer profitability early in the customer life-cycle in the property and casualty business are constructed and studied. The objective is to model risk at the customer level with input data available early in a private consumer's lifespan. Two retained models, one using a generalized linear model and the other a multilayer perceptron (a special form of artificial neural network), are evaluated on actual data. Numerical results show that differentiation on estimated future risk is most effective for customers with the highest claim frequencies.
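As a rough illustration of the two retained model families, the sketch below fits a logistic GLM and a small multilayer perceptron to synthetic, imbalanced customer data and compares them by AUC; the features, class balance, and scoring choice are assumptions for demonstration, not the study's actual data or evaluation protocol.

```python
# Sketch: GLM (logistic regression) vs. multilayer perceptron on synthetic
# customer data; features and labels are placeholders, not insurer data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=5000, n_features=8, weights=[0.9, 0.1],
                           random_state=0)  # ~10% assumed "high-risk" customers
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

glm = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

for name, m in [("GLM", glm), ("MLP", mlp)]:
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```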
55

Modeling strategies using predictive analytics: Forecasting future sales and churn management / Strategier för modellering med prediktiv analys

Aronsson, Henrik January 2015 (has links)
This project was carried out for a company named Attollo, a consulting firm specializing in Business Intelligence and Corporate Performance Management. The project explores a new area for Attollo, predictive analytics, which is then applied to Klarna, a client of Attollo. Attollo has a partnership with IBM, which sells services for predictive analytics. The tool with which this project was carried out is a piece of software from IBM: SPSS Modeler. Five examples are given of the predictive work carried out at Klarna, and from these examples the functionality of the different predictive models is described. The result of this project demonstrates how predictive models can be created using predictive analytics. The conclusion is that predictive analytics enables companies to understand their customers better and hence make better decisions. / This project was carried out together with a company called Attollo, a consulting firm specialized in Business Intelligence & Corporate Performance Management. The project is based on Attollo wanting to explore a new area, predictive analytics, which was then applied to Klarna, a customer of Attollo. Attollo has a partnership with IBM, which sells services for predictive analytics. The tool with which this project was carried out is a piece of software from IBM: SPSS Modeler. Five examples describe the predictive work carried out at Klarna, and these examples also describe the functionality of the different predictive models. The result of this project shows how predictive models can be created through predictive analytics. The conclusion is that predictive analytics gives companies a greater ability to understand their customers and thereby make better decisions.
56

Predictive modeling with maximum entropy to predict sites of ancient Norse settlement

Rönnlund, Elias January 2021 (has links)
A complete picture of prehistoric settlement has always been difficult to map, given how time has hidden these sites and remains through the decay of the materials they were made of and the build-up of new layers of sediment. Archaeologists have over time used a wide range of methods and techniques to find traces of these prehistoric remains. In modern times, GIS has become a common tool for assisting this process. In this study, predictive modeling was used to estimate the probability of finding new archaeological sites based on those already found and their relationship to properties of the landscape and environment. Using a relatively new method that applies the principle of maximum entropy in its algorithm, this study aims to demonstrate the potential of this technique in Sweden to facilitate archaeologists' work and to give insight into the past regarding people's success and choice of settlement. By building models with the Maxent software, probability maps of the study area were produced based on 221 find sites and up to 16 factors, together with statistical diagrams that give deeper insight into the model-building process. Validation of the results showed very high performance. Despite the excellent results, there is some scepticism about how helpful this particular model would be to archaeology in finding new prehistoric settlements. Although this study was rather limited in its access to data, it has nevertheless shown the potential that algorithms based on the maximum-entropy principle hold for archaeology in Sweden. With a larger and more precise selection of find sites and factors, covering environment, landscape, and more, models like this one have great potential both to assist archaeology in finding still-hidden ancient Norse settlements and to extract information about the lives and societies of prehistoric people.
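For readers unfamiliar with presence/background modeling, the sketch below mimics the setup in spirit: 221 "presence" points against random background points, with a logistic model standing in for the dedicated Maxent software; the environmental variables and their values are synthetic assumptions, not the study's 16 factors.

```python
# Rough presence/background sketch in the spirit of Maxent: find sites are
# "presence" points, random background points represent the landscape, and a
# logistic model is used only as a stand-in for the actual Maxent software.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_presence, n_background = 221, 5000

# Two illustrative environmental factors: elevation and distance to water.
presence = np.column_stack([rng.normal(60, 10, n_presence),
                            rng.gamma(2.0, 100.0, n_presence)])
background = np.column_stack([rng.normal(100, 40, n_background),
                              rng.gamma(2.0, 400.0, n_background)])

X = np.vstack([presence, background])
y = np.concatenate([np.ones(n_presence), np.zeros(n_background)])

model = LogisticRegression(max_iter=1000).fit(X, y)
# Relative suitability score for one grid cell of the study area
cell = np.array([[65.0, 150.0]])  # elevation (m), distance to water (m)
print(f"relative suitability score: {model.predict_proba(cell)[0, 1]:.3f}")
```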
57

It's All Downhill From Here: A Forecast of Subsidence Rates in the Lower Mississippi River Industrial Corridor

Harris, Joseph B., Joyner, T. Andrew, Rohli, Robert V., Friedland, Carol J., Tollefson, William C. 01 January 2020 (has links)
Southeast Louisiana is susceptible to the impact of subsidence due to natural and anthropogenic processes including sediment compaction and loading, fluid withdrawal, and faulting. Subsidence rates in Southeast Louisiana are higher than anywhere else in the United States, and the impact of subsidence rates on industrial complexes has not been studied. Spatial interpolation methods were analyzed to determine the best fit for subsidence rates and to create a predictive surface for the Lower Mississippi River Industrial Corridor (LMRIC). Empirical Bayesian kriging, ordinary kriging, universal kriging, and inverse distance weighted interpolation methods were applied to the 2004 National Oceanic and Atmospheric Administration (NOAA) published Technical Report #50 dataset, and cross-validation methods were utilized to determine the accuracy of each method. The mean error and root mean square error were calculated for each interpolation method, then used to detect bias and compare the predicted values with the actual observed values. Cross-validation estimates are comparable for each method statistically and visually; however, the results indicate the empirical Bayesian kriging interpolation method is the most accurate of the methods, with the lowest mean error and root mean square error scores. Digital elevation models for the years 2025, 2050, and 2075 were developed based on the predictive surface of subsidence rates using the results from the empirical Bayesian kriging interpolation method. Results indicate that by 2025, 31.4% of landmass in the LMRIC will be below 0 m NAVD88, with 40.4% below 0 m NAVD88 by 2050, and 51.8% by 2075. Subsidence rates in the LMRIC range from less than 1 mm to approximately 16 mm per year. Nine of the 122 industrial complexes located in the LMRIC are estimated to be below 0 m NAVD88 by the year 2075. Limited economic impacts can be inferred based on the number of facilities impacted; however, service disruptions due to subsidence impacting infrastructure surrounding these industrial complexes would have catastrophic economic impacts on a regional, state, and national level.
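The cross-validation comparison described above comes down to computing mean error and root mean square error between held-out observations and interpolated predictions. The sketch below illustrates that calculation with leave-one-out cross-validation of a simple inverse-distance-weighted interpolator on made-up subsidence points; empirical Bayesian kriging itself requires specialized GIS software and is not reproduced here.

```python
# Leave-one-out cross-validation of an IDW interpolator on synthetic
# subsidence-rate points (mm/yr); IDW stands in for EBK purely to
# illustrate the mean-error / RMSE comparison described above.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.uniform(0, 100, size=(40, 2))   # x, y in km (synthetic locations)
rates = rng.uniform(1, 16, size=40)       # subsidence rates, mm/yr (synthetic)

def idw(train_pts, train_vals, target, power=2.0):
    d = np.linalg.norm(train_pts - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * train_vals) / np.sum(w)

errors = []
for i in range(len(pts)):
    mask = np.arange(len(pts)) != i       # hold out point i
    pred = idw(pts[mask], rates[mask], pts[i])
    errors.append(pred - rates[i])

errors = np.array(errors)
mean_error = errors.mean()                # bias of the interpolator
rmse = np.sqrt((errors ** 2).mean())
print(f"ME = {mean_error:.2f} mm/yr, RMSE = {rmse:.2f} mm/yr")
```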
58

Leveraging Machine Learning for Pattern Discovery and Decision Optimization on Last-minute Surgery Cancellation

Liu, Lei January 2021 (has links)
No description available.
59

PROGNOSTIC MODELS OF CLINICAL OUTCOMES AND PREDICTIVE MODELS OF TREATMENT RESPONSE IN PRECISION PSYCHIATRY

Watts, Devon January 2022 (has links)
In this thesis, we developed prognostic models of clinical outcomes, specific to violent and criminal outcomes in psychiatry, and predictive models of treatment response at an individual level. Overall, we demonstrate that 1) evidence-based risk factors, protective factors, and treatment status variables were able to prognosticate prospective physical aggression at an individual level; 2) prognostic models of clinical and violent outcomes in psychiatry have largely focused on clinical and sociodemographic variables and show similar performance in identifying true positives and true negatives, although the error rates of the models are still high and further refinement is needed; 3) within treatment response prediction models in MDD using EEG, greater performance was observed in predicting response to rTMS relative to antidepressants, and across models greater sensitivity (true positives) was observed relative to specificity (true negatives), suggesting that EEG prediction models thus far better identify non-responders than responders; and 4) across randomized clinical trials using data-driven biomarkers in predictive models, based on the consistency of performance across models with large sample sizes, the highest degree of evidence was in predicting response to sertraline and citalopram using fMRI features. / Dissertation / Doctor of Philosophy (PhD)
60

Genetic Algorithm Representation Selection Impact on Binary Classification Problems

Maldonado, Stephen V 01 January 2022 (has links)
In this thesis, we explore the impact of problem representation on the ability of a genetic algorithm (GA) to evolve a binary prediction model that predicts whether a physical therapist is paid above or below the median amount from Medicare. We explore three different problem representations: the vector GA (VGA), the binary GA (BGA), and the proportional GA (PGA). We find that all three representations can produce models with high accuracy and low loss that outperform Scikit-Learn's logistic regression model, and that all three representations select the same features; however, the PGA representation tends to produce lower weights than the VGA and BGA. We also find that a higher mutation rate creates a larger difference in accuracy between the individual with the best fitness (lowest binary cross-entropy loss) and the most accurate solution. We then explore potential biases in the PGA mapping functions that may encourage the lower values. We find that the PGA has biases in the values it can encode depending on the mapping function; however, since we do not find a bias toward lower values for all tested mapping functions, it is more likely that the PGA finds extreme values harder to encode because crossover tends to have an averaging effect on the PGA chromosome.
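As a minimal illustration of a vector-style GA evolving weights for a binary predictor under a binary cross-entropy fitness, the sketch below evolves small real-valued chromosomes on synthetic data; the data, population size, mutation rate, and representation details are assumptions for demonstration, not the thesis's Medicare setup or its VGA/BGA/PGA mappings.

```python
# Minimal vector-GA sketch: evolve logistic-model weights under a binary
# cross-entropy fitness; data and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.7, 0.0])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.uniform(size=n)).astype(float)

def bce(w):
    # binary cross-entropy loss of a logistic model with weights w (lower is fitter)
    p = np.clip(1 / (1 + np.exp(-X @ w)), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

pop = rng.normal(size=(50, d))                   # real-valued (vector) chromosomes
for gen in range(100):
    fitness = np.array([bce(w) for w in pop])
    parents = pop[np.argsort(fitness)[:25]]      # truncation selection
    cut = rng.integers(1, d, size=25)            # one-point crossover positions
    children = np.where(np.arange(d) < cut[:, None],
                        parents, parents[::-1])
    # mutate ~10% of genes with small Gaussian noise
    children += rng.normal(0, 0.1, children.shape) * (rng.uniform(size=children.shape) < 0.1)
    pop = np.vstack([parents, children])

best = pop[np.argmin([bce(w) for w in pop])]
acc = np.mean(((1 / (1 + np.exp(-X @ best))) > 0.5) == y)
print(f"best BCE = {bce(best):.3f}, accuracy = {acc:.2%}")
```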
