Global ETD Search

281	Decomposition and Stability of Multiparameter Persistence Modules Cheng Xin (16750956) 04 August 2023 (has links) The only datasets used in my thesis work are from TUDataset (TUD Benchmark datasets, https://chrsmrrs.github.io/datasets/), a collection of public benchmark datasets for graph classification and regression. Data engineering and data science persistence homology computational topology computational geometry persistent homology multiparameter persistence modules persistence module Decomposition (Mathematics) stability algorithm
282	An Open-Source Framework for Large-Scale ML Model Serving Sigfridsson, Petter January 2022 (has links) The machine learning (ML) industry has taken great strides forward and today faces new challenges. Many more models are developed, used and served within the industry, and the datasets that models are trained on are constantly changing. Modern machine learning processes must therefore handle a large number of models and extreme load, and support recurring updates in a scalable manner. Model serving is the concept meant to handle these challenges. It is relatively new, and more effort is required to address both its conceptual and technical challenges. Existing ML model serving solutions aim to be scalable for the purpose of serving one model at a time. The industry, however, requires that the whole ML process, the number of served models, and recurring updates all be scalable. That is why this thesis presents an open-source framework for large-scale ML model serving that aims to meet the requirements of today’s ML industry. The presented framework is proven to handle a large-scale ML model serving environment in a scalable way, but with some limitations. Results show that the number of parallel requests the framework can handle can be optimized, which would make the solution more efficient in terms of resource utilization. One avenue for future improvement could be to integrate the developed framework as an application into the open-source machine learning platform STACKn. Model Serving Machine Learning MLOps DevOps Distributed Computing Infrastructure System Scalability and Performance Cloud Infrastructure Open-Source Software Horizontal Scalability Data Engineering Cloud Computing AI Data Science Engineering and Technology Teknik och teknologier
283	An analysis of text-based machine learning models for vulnerability detection Napier, Kollin Ryne 12 May 2023 (has links) (PDF) With an increase in complexity of software, developers rely more on reuse and dependencies in their source code via code snippets. As a result, it is becoming harder to identify and mitigate vulnerabilities. Although traditional analysis tools are still utilized, machine learning models are being adopted to expand efforts and combat such threats. Given the possibilities towards usage of such models, research in this area has introduced various approaches which vary in usability and prediction. In generalizing models to a more natural language approach, researchers have opted to train models on source code to identify existing and potential vulnerabilities. Exploratory research has been performed by treating source code as plain text, creating “text-based” models. With a motivation to prevent vulnerable code snippets, we present a dissertation on the effectiveness of text-based machine learning models for vulnerability detection. We utilize datasets composed of open-source projects and vulnerability types to generate our own training and testing data via extracted function pairings. Using this data, we evaluate a series of text-based machine learning models, coupled with natural language processing (NLP) techniques and our own data processing methods. Through empirical research, we demonstrate the effectiveness of such models based on statistical evidence. From these results, we determine negative correlations and identify "cross-cutting" features. Finally, we present analysis of models with "cross-cutting" feature removal to improve performance while providing explainability towards model decisions. 
text-based machine learning vulnerability detection natural language processing data analysis explainability text-based analysis machine learning deep learning neural networks Computer Sciences Data Science Information Security Software Engineering
284	Thesis_deposit.pdf Sehwan Kim (15348235) 25 April 2023 (has links) Adaptive MCMC is advantageous over traditional MCMC due to its ability to automatically adjust its proposal distributions during the sampling process, providing improved sampling efficiency and faster convergence to the target distribution, especially in complex or high-dimensional problems. However, the adaptive scheme must be designed and validated carefully to ensure algorithm validity and prevent the introduction of biases. This dissertation focuses on the use of Adaptive MCMC for deep learning, specifically addressing the mode collapse issue in Generative Adversarial Networks (GANs), implementing Fiducial inference, and applying it to Causal inference in individual treatment effect (ITE) problems. First, GAN was recently introduced in the literature as a novel machine learning method for training generative models. However, GAN is very difficult to train due to the issue of mode collapse, i.e., lack of diversity among generated data. We identify the reason why GAN suffers from this issue and lay out a new theoretical framework for GAN based on randomized decision rules such that the mode collapse issue can be essentially overcome. Under the new theoretical framework, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. Second, Fiducial inference was generally considered R.A. Fisher's big blunder, but the goal he initially set, making inference for the uncertainty of model parameters on the basis of observations, has been continually pursued by many statisticians. By leveraging advanced statistical computing techniques such as stochastic approximation Markov chain Monte Carlo, we develop a new statistical inference method, the so-called extended Fiducial inference, which achieves the initial goal of fiducial inference. Lastly, estimating ITE is important for decision making in various fields, particularly in health research where precision medicine is being investigated. Conditional average treatment effect (CATE) is often used for this purpose, but uncertainty quantification and an explanation of the variability of predicted ITE are still needed for fair decision making. We discuss using extended Fiducial inference to construct prediction intervals for ITE, and introduce a double neural net algorithm for efficient prediction and estimation of nonlinear ITE. Computational statistics Statistical data science Statistical theory adaptive MCMC techniques Generative Adversarial Net Fiducial inference Causal inference Individual treatment effects Stochastic Approximation Monte Carlo
285	[en] A CLOUD COMPUTING PLATFORM FOR STORING GEOREFERENCED MOBILITY DATA / [pt] UMA PLATAFORMA NA NUVEM PARA ARMAZENAMENTO DE DADOS GEORREFERENCIADOS DE MOBILIDADE URBANA RAFAEL BARBOSA NASSER 15 December 2016 (has links) [en] The quality of life in large urban centers has been a concern for governments, businesses, and the resident population in general. Public transportation services play a central role in this discussion, since they determine, especially for the lower-income layer of society, the time wasted daily in commuting. In Brazilian metropolises, municipal buses are the predominant mode of public transportation. Users of this service, the passengers, do not have up-to-date information on the buses and bus lines in operation. Offering this kind of information contributes to a better everyday experience of this mode and therefore provides greater quality of life for its users. In a broader view, buses can be considered sensors that enable the understanding of patterns and the identification of anomalies in vehicle traffic in urban areas, bringing benefits to the whole population. This work presents a cloud platform that captures, enriches, stores, and makes available the data from GPS devices installed on buses, allowing the extraction of knowledge from this valuable and voluminous set of information. Experiments are performed with the buses of the Municipality of Rio de Janeiro, with applications focused on the passenger and on society. The methodologies, discussions, and techniques used throughout the work can be reused for different cities, modes, and perspectives. [en] SOFTWARE ENGINEERING [en] CLOUD COMPUTING [en] GEOPROCESSING [en] DATA SCIENCE [en] URBAN MOBILITY [en] PARALLEL COMPUTING [en] COMPUTER SCIENCE
286	Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. / Knowledge Discovery och Data mining med hjälp av demografiska och kliniska data för att diagnostisera hjärtsjukdomar. Fernandez Sanchez, Javier January 2018 (has links) Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Data mining techniques applied to clinical data have become an interesting tool to prevent, diagnose or treat CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data, which could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that female patients at an elderly age with a higher level of cholesterol, maximum achieved heart rate and ST-depression are more prone to be diagnosed with heart disease. Furthermore, patients with atypical angina are more likely to be at an elderly age with a slightly higher level of cholesterol and maximum achieved heart rate than asymptomatic chest pain patients. Moreover, patients with exercise induced angina had lower values of maximum achieved heart rate than those who do not experience it. We could verify that patients who experience exercise induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. In addition, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated by adopting a stratified 10-fold cross-validation approach. The learning models provided an average F-score of 78-83% and a mean AUC of 85-88%. 
Among all the models, the Radial Basis Function Kernel Support Vector Machine (RBF-SVM) achieved the highest scores, with an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques can support physicians in their interpretations of heart disease diagnosis in addition to clinical and demographic characteristics of patients. machine learning data science artificial intelligence data mining cardiovascular disease CVD exploratory analysis EDA clinical data support vector machines preprocessing decision trees logistic regression KNN adaboost xgboost random forest health healthcare Medical Engineering Medicinteknik
287	Improving The Robustness of Artificial Neural Networks via Bayesian Approaches Jun Zhuang (16456041) 30 August 2023 (has links) Artificial neural networks (ANNs) have achieved extraordinary performance in various domains in recent years. However, some studies reveal that ANNs may be vulnerable in three aspects: label scarcity, perturbations, and open-set emerging classes. Noisy labeling and self-supervised learning approaches address the label scarcity issues, but most of this work cannot handle perturbations. Adversarial training methods, topological denoising methods, and mechanism designing methods aim to mitigate the negative effects caused by perturbations. However, adversarial training methods can barely train a robust model under extensive label scarcity; topological denoising methods are not efficient on dynamic data structures; and mechanism designing methods often depend on heuristic explorations. Detection-based methods are devoted to identifying novel or anomalous instances for further downstream tasks. Nonetheless, such instances may belong to newly emerging open-set classes. To address the aforementioned challenges, we approach the robustness issues of ANNs from two aspects. First, we propose a series of Bayesian label transition models to improve the robustness of Graph Neural Networks (GNNs) in the presence of label scarcity and perturbations in the graph domain. Second, we propose a new non-exhaustive learning model, named NE-GM-GAN, to handle both open-set problems and class-imbalance issues in network intrusion datasets. Extensive experiments with several datasets demonstrate that our proposed models can effectively improve the robustness of ANNs. Artificial Neural Networks Graph Neural Networks (GNNs) Generative Adversarial Learning Bayesian Inference Adversarial Defense Noisy Label Learning Open-set Learning
288	Rewiring Police Officer Training Networks to Reduce Forecasted Use of Force Ritika Pandey (9147281) 30 August 2023 (has links) Police use of force has become a topic of significant concern, particularly given the disparate impact on communities of color. Research has shown that police officer involved shootings, misconduct and excessive use of force complaints exhibit network effects, where officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. Given that use of force and misconduct behavior appear to be transmissible across police networks, we ask whether police networks can be altered to reduce use of force and misconduct events in a limited scope. In this work, we analyze a novel dataset from the Indianapolis Metropolitan Police Department on officer field training, subsequent use of force, and the role of network effects from field training officers. We construct a network survival model for analyzing time-to-event of use of force incidents involving new police trainees. The model includes network effects of the diffusion of risk from field training officers (FTOs) to trainees. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. The results show that FTO use of force history is the best predictor of a trainee's time to use of force in the survival model, and that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 8%. We then discuss the potential benefits and challenges associated with implementing such an algorithm in practice. Data mining and knowledge discovery Statistical data science network optimization annealing police use of force
289	SOARNET, Deep Learning Thermal Detection for Free Flight Tallman, Jake T 01 June 2021 (has links) (PDF) Thermals are regions of rising hot air formed on the ground through the warming of the surface by the sun. Thermals are commonly used by birds and glider pilots to extend flight duration, increase cross-country distance, and conserve energy. This kind of powerless flight using natural sources of lift is called soaring. Once a thermal is encountered, the pilot flies in circles to stay within the thermal, gaining altitude before flying off to the next thermal and towards the destination. A single thermal can net a pilot thousands of feet of elevation gain; however, estimating thermal locations is not an easy task. Pilots look for different indicators: color variation on the ground, because the amount of heat absorbed by the ground varies with its color and composition; birds circling in an area gaining lift; and certain types of cloud formations (cumulus clouds). These methods are not always reliable enough, so pilots also study the weather for thermals by estimating solar heating of the ground from cloud cover and time of year, together with the lapse rate and dew point of the troposphere. In this paper, we present a Machine Learning based solution for assisting in forecasting thermals. We created a custom dataset using flight data recorded and uploaded to public databases by soaring pilots. We determine where and when the pilot encountered thermals to pull weather and satellite images corresponding to the location and time of the flight. Using this dataset, we train an algorithm to automatically predict the location of thermals, taking as input the current weather conditions and terrain information obtained from Google Earth Engine and using the thermal regions encountered as truth labels. We were able to converge very well on the training and validation sets, achieving an F1 score of around 0.98. 
These results indicate success in creating a custom dataset and a powerful neural network, while also showing the need to bolster our custom dataset. Machine Learning Free Flight Image Segmentation Satellite Imagery Deep Learning Gliding Artificial Intelligence and Robotics Aviation Safety and Security Databases and Information Systems Data Science Environmental Monitoring Fluid Dynamics Geology Other Aerospace Engineering
290	ASSESSMENT OF VARIABILITY OF LAND USE IMPACTS ON WATER QUALITY CONTAMINANTS Johann Alexander Vera (14103150), Bernard A. Engel (5644601) 10 December 2022 (has links) The hydrological cycle is affected by land use variability. Land use spatial and temporal variability has the power to alter watershed runoff, water resource quantity and quality, ecosystems, and environmental sustainability. In recent decades, agricultural lands, pastures, plantations, and urban areas have increased, resulting in significant increases in energy, water, and fertilizer usage, as well as significant biodiversity losses. Land use and environmental planning Surface water hydrology Statistical data science water quality contaminants swat linear mixed model

Search results