Global ETD Search

281	Thesis_deposit.pdf Sehwan Kim (15348235) 25 April 2023 (has links) Adaptive MCMC is advantageous over traditional MCMC because it automatically adjusts its proposal distributions during the sampling process, providing improved sampling efficiency and faster convergence to the target distribution, especially in complex or high-dimensional problems. However, carefully designing and validating the adaptive scheme is crucial to ensure algorithm validity and prevent the introduction of biases. This dissertation focuses on the use of adaptive MCMC for deep learning, specifically addressing the mode collapse issue in Generative Adversarial Networks (GANs) and implementing fiducial inference, along with its application to causal inference in individual treatment effect (ITE) problems. First, the GAN was recently introduced in the literature as a novel machine learning method for training generative models. However, a GAN is very difficult to train due to the issue of mode collapse, i.e., a lack of diversity among the generated data. We identify why GANs suffer from this issue and lay out a new theoretical framework for GANs based on randomized decision rules, under which the mode collapse issue can be essentially overcome. Under the new theoretical framework, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. Second, fiducial inference was generally considered R.A. Fisher's big blunder, but the goal he initially set, making inference for the uncertainty of model parameters on the basis of observations, has been continually pursued by many statisticians. By leveraging advanced statistical computing techniques such as stochastic approximation Markov chain Monte Carlo, we develop a new statistical inference method, called extended fiducial inference, which achieves the initial goal of fiducial inference. Lastly, estimating the ITE is important for decision making in various fields, particularly in health research where precision medicine is being investigated. The conditional average treatment effect (CATE) is often used for this purpose, but uncertainty quantification and an explanation of the variability of the predicted ITE are still needed for fair decision making. We discuss using extended fiducial inference to construct prediction intervals for the ITE, and introduce a double neural net algorithm for efficient prediction and estimation of nonlinear ITE. Computational statistics Statistical data science Statistical theory adaptive MCMC techniques Generative Adversarial Net Fiducial inference Causal inference Individual treatment effects Stochastic Approximation Monte Carlo
282	[en] A CLOUD COMPUTING PLATFORM FOR STORING GEOREFERENCED MOBILITY DATA / [pt] UMA PLATAFORMA NA NUVEM PARA ARMAZENAMENTO DE DADOS GEORREFERENCIADOS DE MOBILIDADE URBANA RAFAEL BARBOSA NASSER 15 December 2016 (has links) [en] The quality of life in large urban centers has been a concern for governments, businesses and the resident population in general. Public transportation services play a central role in this discussion, since they determine, especially for lower-income residents, the time lost daily in commuting. In Brazilian metropolises, municipal buses predominate in public transportation. The users of this service, the passengers, do not have up-to-date information about the buses and bus lines in operation. Offering this kind of information contributes to a better everyday experience of this mode of transport and, consequently, provides greater quality of life for its users. More broadly, buses can be regarded as sensors that make it possible to understand patterns and identify anomalies in vehicle traffic in urban areas, bringing benefits to the whole population. This work presents a cloud platform that captures, enriches, stores and makes available the data from GPS devices installed on buses, allowing the extraction of knowledge from this valuable and voluminous set of information. Experiments are performed with the buses of the Municipality of Rio de Janeiro, with applications focused on passengers and society. The methodologies, discussions and techniques used throughout the work can be reused for different cities, modes of transport and perspectives. [pt] ENGENHARIA DE SOFTWARE [pt] COMPUTACAO NA NUVEM [pt] GEOPROCESSAMENTO [pt] CIENCIA DE DADOS [pt] MOBILIDADE URBANA [pt] COMPUTACAO PARALELA [pt] INFORMATICA [en] SOFTWARE ENGINEERING [en] GEOPROCESSING [en] DATA SCIENCE [en] URBAN MOBILITY [en] PARALLEL COMPUTING [en] COMPUTER SCIENCE
283	Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. / Knowledge Discovery och Data mining med hjälp av demografiska och kliniska data för att diagnostisera hjärtsjukdomar. Fernandez Sanchez, Javier January 2018 (has links) Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Applying data mining techniques to clinical data has become an interesting tool to prevent, diagnose or treat CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data, which could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that elderly female patients with higher levels of cholesterol, maximum achieved heart rate and ST-depression are more prone to be diagnosed with heart disease. Furthermore, patients with atypical angina are more likely to be elderly, with a slightly higher level of cholesterol and maximum achieved heart rate, than patients with asymptomatic chest pain. Moreover, patients with exercise-induced angina had lower values of maximum achieved heart rate than those who do not experience it. We could verify that patients who experience exercise-induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. In addition, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated using stratified 10-fold cross-validation. The learning models achieved an average F-score of 78-83% and a mean AUC of 85-88%. Among all the models, the highest score was obtained by the Radial Basis Function Kernel Support Vector Machine (RBF-SVM), with an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques can support physicians in their interpretations of heart disease diagnosis in addition to clinical and demographic characteristics of patients. machine learning data science artificial intelligence data mining cardiovascular disease CVD exploratory analysis EDA clinical data support vector machines preprocessing decision trees logistic regression KNN adaboost xgboost random forest health healthcare Medical Engineering Medicinteknik
284	Improving The Robustness of Artificial Neural Networks via Bayesian Approaches Jun Zhuang (16456041) 30 August 2023 (has links) Artificial neural networks (ANNs) have achieved extraordinary performance in various domains in recent years. However, some studies reveal that ANNs may be vulnerable in three respects: label scarcity, perturbations, and open-set emerging classes. Noisy labeling and self-supervised learning approaches address the label scarcity issue, but most of this work cannot handle perturbations. Adversarial training methods, topological denoising methods, and mechanism designing methods aim to mitigate the negative effects caused by perturbations. However, adversarial training methods can barely train a robust model under extensive label scarcity; topological denoising methods are not efficient on dynamic data structures; and mechanism designing methods often depend on heuristic explorations. Detection-based methods are devoted to identifying novel or anomalous instances for further downstream tasks. Nonetheless, such instances may belong to newly emerging open-set classes. To meet the aforementioned challenges, we address the robustness issues of ANNs from two aspects. First, we propose a series of Bayesian label transition models to improve the robustness of Graph Neural Networks (GNNs) in the presence of label scarcity and perturbations in the graph domain. Second, we propose a new non-exhaustive learning model, named NE-GM-GAN, to handle both open-set problems and class-imbalance issues in network intrusion datasets. Extensive experiments with several datasets demonstrate that our proposed models can effectively improve the robustness of ANNs. Artificial Neural Networks Graph Neural Networks (GNNs) Generative Adversarial Learning Bayesian Inference Adversarial Defense Noisy Label Learning Open-set Learning
285	Rewiring Police Officer Training Networks to Reduce Forecasted Use of Force Ritika Pandey (9147281) 30 August 2023 (has links) Police use of force has become a topic of significant concern, particularly given the disparate impact on communities of color. Research has shown that police officer-involved shootings, misconduct and excessive use of force complaints exhibit network effects, where officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. Given that use of force and misconduct behavior appear to be transmissible across police networks, we ask whether police networks can be altered to reduce use of force and misconduct events in a limited scope. In this work, we analyze a novel dataset from the Indianapolis Metropolitan Police Department on officer field training, subsequent use of force, and the role of network effects from field training officers (FTOs). We construct a network survival model for analyzing the time to use of force incidents involving new police trainees. The model includes network effects of the diffusion of risk from FTOs to trainees. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. The results show that FTO use of force history is the best predictor of a trainee's time to use of force in the survival model, and that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 8%. We then discuss the potential benefits and challenges associated with implementing such an algorithm in practice. Data mining and knowledge discovery Statistical data science network optimization annealing police use of force
286	SOARNET, Deep Learning Thermal Detection for Free Flight Tallman, Jake T 01 June 2021 (has links) (PDF) Thermals are regions of rising hot air formed on the ground through the warming of the surface by the sun. Thermals are commonly used by birds and glider pilots to extend flight duration, increase cross-country distance, and conserve energy. This kind of unpowered flight using natural sources of lift is called soaring. Once a thermal is encountered, the pilot flies in circles to stay within the thermal, thereby gaining altitude before flying off to the next thermal and towards the destination. A single thermal can net a pilot thousands of feet of elevation gain; however, estimating thermal locations is not an easy task. Pilots look for different indicators: color variation on the ground, because the amount of heat absorbed by the ground varies with its color and composition; birds circling in an area and gaining lift; and certain types of cloud formations (cumulus clouds). These methods are not always reliable enough, so pilots also study the weather for thermals by estimating solar heating of the ground from cloud cover and time of year, together with the lapse rate and dew point of the troposphere. In this paper, we present a machine learning based solution for assisting in forecasting thermals. We created a custom dataset using flight data recorded and uploaded to public databases by soaring pilots. We determine where and when the pilot encountered thermals in order to pull weather and satellite images corresponding to the location and time of the flight. Using this dataset, we train an algorithm to automatically predict the location of thermals, taking as input the current weather conditions and terrain information obtained from Google Earth Engine and using the thermal regions encountered as truth labels. The model converged very well on the training and validation sets, validating our method with an F1 score of around 0.98. These results indicate success in creating a custom dataset and a powerful neural network, while also showing the need to bolster that dataset. Machine Learning Free Flight Image Segmentation Satellite Imagery Deep Learning Gliding Artificial Intelligence and Robotics Aviation Safety and Security Databases and Information Systems Data Science Environmental Monitoring Fluid Dynamics Geology Other Aerospace Engineering
287	ASSESSMENT OF VARIABILITY OF LAND USE IMPACTS ON WATER QUALITY CONTAMINANTS Johann Alexander Vera (14103150), Bernard A. Engel (5644601) 10 December 2022 (has links) The hydrological cycle is affected by land use variability. Spatial and temporal variability in land use can alter watershed runoff, water resource quantity and quality, ecosystems, and environmental sustainability. In recent decades, agricultural lands, pastures, plantations, and urban areas have increased, resulting in significant increases in energy, water, and fertilizer usage, as well as significant biodiversity losses. Land use and environmental planning Surface water hydrology Statistical data science water quality contaminants swat linear mixed model
288	DETERMINING STRUCTURE AND GROWTH CHARACTERISTICS OF OXIDE HETEROSTRUCTURES THROUGH DEPOSITION AND DATA SCIENCE: TOWARDS SINGLE CRYSTAL BATTERIES Fraser, Kimberly 27 January 2023 (has links) No description available. Materials Science Statistics Chemistry Chemical Engineering Computer Science Engineering Pulsed laser deposition thin film data science battery lithium ion transmission electron microscopy RHEED supervised learning unsupervised learning
289	Convolution and Autoencoders Applied to Nonlinear Differential Equations Borquaye, Noah 01 December 2023 (has links) (PDF) Autoencoders, a type of artificial neural network, have gained recognition among researchers in various fields, especially machine learning, due to their many applications in learning representations of input data. Recently, researchers have explored extending the application of autoencoders to solving nonlinear differential equations. Algorithms and methods employed in an autoencoder framework include sparse identification of nonlinear dynamics (SINDy), dynamic mode decomposition (DMD), Koopman operator theory and singular value decomposition (SVD). These approaches use matrix multiplication to represent linear transformations. However, machine learning algorithms often use convolution to represent linear transformations. In our work, we modify these approaches to system identification and forecasting of solutions of nonlinear differential equations by replacing matrix multiplication with convolution. In particular, we develop a convolution-based approach to dynamic mode decomposition and discuss its application to problems not solvable otherwise. convolution autoencoder neural network linear transformation Python system identification eigendecomposition Applied Mathematics Computational Engineering Computer Engineering Data Science Dynamical Systems Education Mathematics Other Mathematics Other Physical Sciences and Mathematics Physical Sciences and Mathematics Systems Science
290	Learning From Data Across Domains: Enhancing Human and Machine Understanding of Data From the Wild Sean Michael Kulinski (17593182) 13 December 2023 (has links) Data is collected everywhere in our world; however, it is often noisy and incomplete. Different sources of data may have different characteristics or quality levels, or may come from dynamic and diverse environments. This poses challenges both for humans who want to gain insights from data and for machines that learn patterns from data. How can we leverage the diversity of data across domains to enhance our understanding and decision-making? In this thesis, we address this question by proposing novel methods and applications that use multiple domains as more holistic sources of information for both human and machine learning tasks. For example, to help human operators understand environmental dynamics, we show how to detect distribution shifts and localize them to problematic features, as well as how interpretable distributional mappings can be used to explain the differences between shifted distributions. To make machine learning more robust, we propose a causal-inspired method to find latent factors that are robust to environmental changes and can be used for counterfactual generation or domain-independent training; we propose a domain generalization framework that allows for fast and scalable models that are robust to distribution shift; and we introduce a new dataset based on human matches in StarCraft II that exhibits complex and shifting multi-agent behaviors. We showcase our methods across various domains, such as healthcare, natural language processing (NLP), and computer vision (CV), to demonstrate that learning from data across domains can lead to more faithful representations of data and its generating environments for both humans and machines. Knowledge representation and reasoning Natural language processing Planning and decision making Data engineering and data science Data mining and knowledge discovery Stream and sensor data Human-computer interaction Mixed initiative and human-in-the-loop Machine Learning Distribution Shifts Domain Generalization Artificial Intelligence

Search results