
Automated Machine Learning Framework for EEG/ERP Analysis: Viable Improvement on Traditional Approaches?

Boshra, Rober January 2016 (has links)
Event Related Potential (ERP) measures derived from the electroencephalogram (EEG) have been widely used in research on language, cognition, and pathology. The high dimensionality (time x channel x condition) of a typical EEG/ERP dataset makes it time-consuming to properly analyze, explore, and validate findings without a narrowly restricted hypothesis. This study proposes an automated, empirical, greedy approach that data-mines an EEG dataset for the location, robustness, and latency of any ERPs present. We apply Support Vector Machines (SVM), a well-established machine learning model, on top of a preprocessing pipeline that focuses on detecting differences across experimental conditions. A hybrid of Monte Carlo bootstrapping, cross-validation, and permutation tests ensures the reproducibility of results. This framework reduces researcher bias and analysis time, and provides statistically sound results that are agnostic to dataset specifications, including the ERPs in question. The method has been tested and validated on three datasets featuring different ERPs (N100, Mismatch Negativity (MMN), N2b, Phonological Mapping Negativity (PMN), and P300). Results show statistically significant, above-chance identification of all ERPs in their respective experimental conditions, latencies, and locations. / Thesis / Master of Science (MSc)
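The validation scheme this abstract describes — an SVM whose above-chance accuracy is confirmed by a permutation test under cross-validation — can be sketched with scikit-learn. All shapes, effect sizes, and parameters below are illustrative assumptions on synthetic data, not taken from the thesis:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.RandomState(0)
# Synthetic stand-in for ERP features: two experimental conditions whose
# means differ slightly across 10 hypothetical channel/time features.
X = np.vstack([rng.randn(40, 10), rng.randn(40, 10) + 0.8])
y = np.r_[np.zeros(40), np.ones(40)]

# Cross-validated accuracy, plus a permutation test: labels are shuffled
# n_permutations times to build a null distribution of chance accuracies.
score, perm_scores, p_value = permutation_test_score(
    SVC(kernel="linear"), X, y,
    cv=StratifiedKFold(5), n_permutations=200, random_state=0)
print(score, p_value)  # above-chance accuracy with a small p-value
```

A small p-value here means the real labels are classified better than nearly all label-shuffled versions, which is the sense in which the framework's results are "statistically sound."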

Radar and Camera Fusion in Intelligent Transportation System

Ding, Bao Ming January 2023 (has links)
Modern smart cities often deploy a vast array of all-purpose traffic monitoring systems to understand city status, reduce traffic congestion, and enforce traffic laws. It is critical for these systems to robustly and effectively detect and classify road objects. The majority of current traffic monitoring solutions consist of single RGB cameras. While cost-effective, these RGB cameras can fail in adverse weather or under poor lighting conditions. This thesis explores the viability of fusing an mmWave radar with an RGB camera to increase performance and make the system robust under all operating conditions. The thesis discusses the fusion device's design, build, and sensor selection process. It then proposes the fusion device's processing pipeline, consisting of a novel radar object detection and classification algorithm, state-of-the-art camera processing algorithms, and a practical fusion algorithm that fuses the results from the camera and the radar. The proposed radar detection algorithm includes a novel clustering algorithm based on DBSCAN and a feature-based object classifier, and shows higher accuracy than the baseline. The camera processing algorithms include YOLOv5 and StrongSORT, which are pre-trained on their respective datasets and show high accuracy without the need for transfer learning. Finally, the practical fusion algorithm fuses information from the radar and the camera at the decision level, where camera results are matched with radar results based on probability. The fusion allows the device to combine the high data-association accuracy of the camera with the additional measured states of the radar to form a better understanding of the observed objects. / Thesis / Master of Applied Science (MASc)
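The DBSCAN-based clustering step can be illustrated on synthetic 2-D point detections standing in for radar returns; the coordinates, eps, and min_samples values below are all assumptions for the sketch, not the thesis's tuned parameters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(1)
# Two synthetic "objects" as dense point clusters, plus sparse clutter
# (illustrative stand-ins for radar detections, not real sensor data).
car = rng.randn(30, 2) * 0.3
truck = rng.randn(30, 2) * 0.3 + [8.0, 8.0]
clutter = rng.uniform(-4, 12, (10, 2))
points = np.vstack([car, truck, clutter])

# DBSCAN groups density-connected points; the label -1 marks noise,
# which is how sparse clutter gets rejected without a fixed cluster count.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(points)
n_objects = len(set(labels) - {-1})
print(n_objects)
```

Not needing the number of objects in advance, and discarding clutter as noise, is what makes a DBSCAN-style clusterer a natural fit for grouping raw detections into candidate road objects.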

On the Neural Representation for Adversarial Attack and Defense

Qiuling Xu (17121274) 20 October 2023 (has links)
Neural representations are high-dimensional embeddings generated during the feed-forward process of neural networks. These embeddings compress raw input information and extract abstract features beneficial for downstream tasks. However, effectively utilizing these representations poses challenges due to their inherent complexity. This complexity arises from the non-linear relationship between inputs and neural representations, as well as the diversity of the learning process.

In this thesis, we propose effective methods to utilize neural representations for adversarial attack and defense. Our approach generally involves decomposing complex neural representations into smaller, more analyzable parts. We also seek general patterns emerging during learning to better understand the semantic meaning associated with neural representations.

We demonstrate that formalizing neural representations can reveal models' weaknesses and aid in defending against poison attacks. Specifically, we define a new type of adversarial attack using neural style, a special component of neural representation. This new attack uncovers novel aspects of the models' vulnerabilities.

Furthermore, we develop an interpretation of neural representations by approximating their marginal distribution, treating intermediate neurons as feature indicators. By properly harnessing these rich feature indicators, we address scalability and imperceptibility issues related to pixel-wise bounds.

Finally, we discover that neural representations contain crucial information about how neural networks make decisions. Leveraging the general patterns in neural representations, we design algorithms to remove unwanted and harmful functionalities from neural networks, thereby mitigating poison attacks.
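The thesis's style-based attack operates on internal representations; as general background only, the gradient-driven idea behind adversarial perturbations can be sketched on a toy linear scorer. Everything below is a simplified assumption for illustration, not the method of the thesis:

```python
import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(20)   # weights of a toy linear scorer: score = w.x + b
b = 0.0
x = rng.randn(20)   # a toy "input"

score = w @ x + b   # positive score -> class 1, negative -> class 0
grad = w            # d(score)/dx is simply w for a linear model
eps = 0.1

# FGSM-style step: nudge every input coordinate against the gradient's
# sign, scaled so the score moves toward the decision boundary.
x_adv = x - eps * np.sign(grad) * np.sign(score)
print(np.sign(score) * (w @ x_adv + b) < abs(score))  # prints True
```

The same "small input change, large score change" mechanism is what representation-level attacks exploit, except the perturbation budget is measured in an internal feature space (such as neural style) rather than pixel-wise bounds.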

The role of model implementation in neuroscientific applications of machine learning

Abe, Taiga January 2024 (has links)
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation. Existing research has shown that the performance of large scale machine learning models (more so than traditional models like linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation presents two broad research directions that address these shortcomings. First, I describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom-built cloud resources. I show that this approach efficiently supports a wide variety of existing data analysis tools, as well as novel tools that would not be feasible to build and share outside of a platform like NeuroCAAS. Second, I conduct two large-scale studies on the behavior of deep ensembles.
Deep ensembles are a class of machine learning models that use implementation variability to improve the quality of model predictions; in particular, by aggregating the predictions of deep networks over stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and to understand what kind of predictive diversity is generated by this particular form of implementation variability. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles and the mechanisms behind their success, and show that in many respects the behavior of deep ensembles is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large scale machine learning models, and more generally on the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large scale machine learning in neuroscience, as well as long term goals to speed the adoption and reliability of such methods in a scientific context.
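The aggregation-across-seeds idea behind deep ensembles can be caricatured in a few lines. Here each "ensemble member" is just the true probability plus seed-dependent jitter — a deliberately toy stand-in for networks retrained under different random seeds, not a claim about the dissertation's experiments:

```python
import numpy as np

rng = np.random.RandomState(0)
true_p = 0.7  # the probability an ideal model would output for some input
# Ten "members": the true value plus seed-dependent noise, mimicking the
# prediction spread induced by stochastic initialization and training.
members = [true_p + 0.1 * rng.randn() for _ in range(10)]

single_errors = [abs(p - true_p) for p in members]
ensemble_error = abs(np.mean(members) - true_p)
print(ensemble_error < np.mean(single_errors))  # averaging cancels seed noise
```

In this caricature the per-seed errors partially cancel under averaging, which is the basic sense in which ensembling "controls" implementation variability; the dissertation's studies probe where real deep ensembles do and do not behave like this.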

Crowd and Hybrid Algorithms for Cost-Aware Classification

Krivosheev, Evgeny 28 May 2020 (has links)
Classification is a pervasive problem in research that aims at grouping items into categories according to established criteria. There are two prevalent ways to classify items of interest: i) to train and exploit machine learning (ML) algorithms, or ii) to resort to human classification (via experts or crowdsourcing). Machine learning algorithms have been improving rapidly, with impressive performance on complex problems such as object recognition and natural language understanding. However, in many cases they cannot yet deliver the required levels of precision and recall, typically due to the difficulty of the problem and the (lack of) availability of sufficiently large and clean datasets. Research in crowdsourcing has also made impressive progress in the last few years, and the crowd has been shown to perform well even in difficult tasks [Callaghan et al., 2018; Ranard et al., 2014]. However, crowdsourcing remains expensive, especially when aiming at high levels of accuracy, which often implies collecting more votes per item to make classification more robust to workers' errors. Recently, a third direction has rapidly emerged: hybrid crowd-machine classification, which can achieve superior performance by combining the cost-effectiveness of automatic machine classifiers with the accuracy of human judgment. In this thesis, we focus on designing crowdsourcing strategies and hybrid crowd-machine approaches that optimize item classification in terms of both results and budget. We start by investigating crowd-based classification under a budget constraint with different loss implications, i.e., when false positive and false negative errors carry different harm to the task. Further, we propose and validate a probabilistic crowd classification algorithm that iteratively estimates the statistical parameters of the task and data to efficiently manage the accuracy vs. cost trade-off.
We then investigate how the crowd and machines can support each other in tackling classification problems. We present and evaluate a set of hybrid strategies that balance investing money in building machines against exploiting them jointly with crowd-based classifiers. While analyzing our crowd and hybrid classification results, we found it relevant to study the quality of crowd observations and their confusions, as well as another promising direction: linking entities across structured and unstructured data sources. We propose crowd- and neural-network-grounded algorithms to cope with these challenges, followed by extensive evaluation on synthetic and real-world datasets.
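A minimal flavor of probabilistic vote aggregation: Bayes-combining binary crowd votes under assumed, known, label-symmetric worker accuracies. The thesis's algorithm additionally estimates these parameters iteratively, which this sketch omits; the numbers below are illustrative:

```python
import math

def posterior_positive(votes, accuracies, prior=0.5):
    """Combine binary crowd votes into P(item is positive), assuming each
    worker is correct with a known, label-symmetric accuracy."""
    log_odds = math.log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        lr = math.log(acc / (1.0 - acc))  # evidence carried by one vote
        log_odds += lr if vote == 1 else -lr
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two fairly accurate workers vote "positive"; a near-random worker disagrees.
p = posterior_positive([1, 1, 0], [0.9, 0.8, 0.55])
print(round(p, 3))  # the noisy dissenter barely moves the posterior
```

Because each vote's weight depends on the worker's accuracy, this kind of aggregation can stop collecting votes once the posterior clears a confidence threshold — the lever a budget-aware strategy uses to trade accuracy against cost.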

UNDERSTANDING SPATIOTEMPORAL PATTERNS OF HARMFUL ALGAL BLOOMS: A CITIZEN SCIENCE PERSPECTIVE

Lefaivre, Ryan 01 August 2023 (has links) (PDF)
Harmful Algal Blooms (HABs) occur due to the excessive growth of algae in waterbodies such as lakes, rivers, and ponds. The cyanotoxins produced by HABs are harmful to wildlife, domestic animals, and humans when ingested or upon exposure. Due to the toxicity and rapid growth of HABs, it is essential to assess their potential causes over broad geographical scales. This observational study aims to understand the spatiotemporal patterns and drivers of HABs across the State of Illinois using both regular environmental monitoring and citizen science datasets from the Illinois Environmental Protection Agency (IEPA). The Ambient Lake Monitoring Program and the Illinois Clean Lakes Program regularly conduct chlorophyll-a measurements, collectively referred to as the ALMP + ICLP dataset. Similarly, the Volunteer Lake Monitoring Program of the IEPA organizes volunteer citizens to collect Secchi-disk measurements, known as the VLMP dataset. Machine learning algorithms including Random Forest, Artificial Neural Network, and Support Vector Machine are used to evaluate HABs and their trophic states based on nine meteorological variables, six lake morphological variables, and eight land use and land cover variables. Exploratory analysis showed that the Cook County area accounted for over half of the total VLMP observations. The meteorological variables were most important for accuracy and classification in the Random Forest modeling; the VLMP dataset performed best at trophic state classification; and the Random Forest model performed best overall compared to the other machine learning models. This study concludes that the VLMP is a beneficial and comparable tool when coupled with the ALMP + ICLP data for HAB monitoring in Illinois.
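The Random Forest step can be sketched with scikit-learn. The data below is synthetic, with the label deliberately driven by the first two columns as stand-ins for dominant "meteorological" predictors; none of the feature counts or values come from the study itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n = 300
X = rng.randn(n, 8)                            # 8 hypothetical predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # label driven by columns 0, 1

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
cv_acc = cross_val_score(forest, X, y, cv=5).mean()
top_feature = int(forest.feature_importances_.argmax())
print(cv_acc, top_feature)  # the driving column ranks as most important
```

The `feature_importances_` ranking is the mechanism behind statements like "meteorological variables were most important": the forest's impurity reductions are attributed back to the variables that produced them.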

UNCOVERING TRENDS OF E. COLI TRANSPORT IN PRIVATE DRINKING WATER WELLS: AN ONTARIO CASE-STUDY

White, Katie January 2023 (has links)
Millions of Canadians rely on private groundwater wells to access drinking water, which presents many challenges, including a lack of government regulations and limited resources for maintenance, monitoring, management, and protection. These challenges result in an increased risk of acute gastrointestinal illness in private well users. The goal of this work is to improve the understanding of the drivers of E. coli fate and transport in groundwater using a data-driven approach, to better inform well owners and policy makers. Specifically, the objectives include: exploratory analysis of the physical and human drivers of private well contamination; advancing the understanding of the relationships between land use-land cover and E. coli presence in wells; assessment of rainfall intermittency patterns as a driver of contamination, as an alternative to standard lag times; and the development of data-driven explanatory models for E. coli contamination in private wells that move towards a novel coupled-systems approach. This research utilizes a large dataset of 795,023 contamination observations from 253,136 unique wells, with over 33 variables (i.e., microbiological, hydrogeological, well characteristic, meteorological, geographical, and testing behaviour) across Ontario, Canada between 2010 and 2017. Data used include the Well Water Information Database, Well Water Information System, Daymet, the Provincial Digital Elevation Model, the Ontario Land Cover Compilation, the Southern Ontario Land Resource Information System, and the Roads Network. Data analysis methods range from univariate and bivariate analyses to supervised and unsupervised machine learning techniques, including regression, clustering, and classification. This work has contributed important insights into the relationships between E. coli contamination and well and aquifer characteristics, seasonality, weather, and human behaviour.
Specifically: increased well depth reduced, but did not eliminate, the likelihood of contamination; wells completed in consolidated material had an increased likelihood of contamination; the most significant driver of contamination was land use-land cover, which was categorized into four classes of E. coli contamination potential for wells, ranging from very high to low; latitude was found to drive seasonality and consequent weather patterns, leading to the creation of geographically based seasonal models; liquid water (i.e., rainfall, snowmelt) was a key driver of contamination, with increased water generally increasing the presence of E. coli while decreasing its prevalence; time of year, not habit, drove user testing, which generally peaked in July; and driving time to the closest drop-off location was identified as a surrogate measure of well user stewardship. Further, this work has contributed methodological advancements in identifying drivers of groundwater contamination, including: utilizing literature confidence ratings alongside regression analyses to supply strategic direction to policy makers; demonstrating the value of large datasets, in combination with innovative machine learning techniques and subject matter expertise, in identifying improved physically based understandings of the system; and highlighting the need for coupled-systems approaches, as physical models alone do not capture human behaviour-based factors of contamination. / Thesis / Doctor of Engineering (DEng) / Millions of people globally rely on private groundwater wells to access drinking water. Unfortunately, these wells come with many challenges, including a lack of government regulations and limited resources for maintenance, management, and protection. These challenges also result in an increased risk of illness in private well users.
Groundwater research is often limited by a lack of numerical data, making it extremely difficult to understand how groundwater and contaminants are transported. This research utilizes a large dataset of 795,023 contamination observations from 253,136 unique wells, with over 33 variables (i.e., well and aquifer characteristics, human behaviours, and weather-related factors) across Ontario, Canada between 2010 and 2017. The work in this thesis takes a data-driven approach, using various machine learning techniques combined with subject matter expertise, to uncover trends and insights into when and how contamination events occur in private wells, in order to inform policy makers and empower well users.
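One of the regression-style analyses — relating contamination likelihood to a single well characteristic — can be illustrated with a toy logistic model. The depth-risk relationship below is fabricated to mirror the direction of the reported finding (deeper wells, lower likelihood), and is not the thesis's fitted model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
depth_m = rng.uniform(5, 100, 1000)                 # hypothetical well depths
p_contam = 1 / (1 + np.exp(0.05 * (depth_m - 40)))  # deeper -> less likely
y = (rng.rand(1000) < p_contam).astype(int)         # simulated detections

model = LogisticRegression().fit(depth_m.reshape(-1, 1), y)
print(model.coef_[0][0])  # negative: contamination odds fall with depth
```

The sign and size of such coefficients are what let a data-driven study say depth "reduced, but did not eliminate" risk: the fitted probability stays above zero even at the largest depths.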

Using Machine Learning To Predict Type 2 Diabetes With Self-Controllable Lifestyle Risk Factors

Zhao, Xubin January 2023 (has links)
Globally, the prevalence of diabetes has seen a significant increase, rising from 211 million in 1990 (3.96% of the global population at that time) to 476 million in 2017 (6.31%). Extensive research has been conducted to study the causes of diabetes from a data-driven approach, leading to the development of prospective models for predicting future diabetes risks. These studies have highlighted the strong correlation between diabetes and various biomarker factors, such as BMI, age, and certain blood test measures. However, there is a lack of research on building prospective models that predict future diabetes risk from lifestyle factors. Therefore, this thesis employs popular machine learning methods to investigate whether it is possible to predict future diabetes using prospective models that incorporate self-controllable lifestyle factors. Our analysis produced strong results, with the biomarker model achieving an average validation AUC of 0.78 and the lifestyle model reaching 0.70. Notably, lifestyle features demonstrated a greater predictive capacity for short-term new-onset diabetes than for the long-term endpoint. The biomarker model identified visceral fat as the most significant risk factor, whereas income level and employment emerged as the top risk factors in the lifestyle model. This thesis represents an innovative approach to diabetes prediction by leveraging lifestyle factors, providing valuable data-driven insights into the root causes of diabetes. It addresses a critical research gap by highlighting the significant role of lifestyle factors in predicting the future onset of diabetes, particularly within the context of parametric modeling. / Thesis / Master of Science (MSc)
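The comparison of feature sets by cross-validated AUC can be sketched as follows, with two synthetic predictor sets for one outcome — a noisier copy standing in for the weaker "lifestyle" set. The data, noise level, and AUC gap are all assumptions of the sketch, not the study's values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n = 500
strong = rng.randn(n, 3)               # stand-in "biomarker" features
weak = strong + 2.0 * rng.randn(n, 3)  # same signal buried in more noise
y = (strong[:, 0] > 0).astype(int)     # outcome driven by the strong set

def cv_auc(X):
    # Mean validation AUC over 5 folds, mirroring the study's metric.
    return cross_val_score(
        LogisticRegression(), X, y, cv=5, scoring="roc_auc").mean()

print(cv_auc(strong), cv_auc(weak))  # both above chance; strong set higher
```

AUC is threshold-free, which is why it suits this comparison: a weaker feature set can still carry real, above-chance signal (here the noisy copy) even when it ranks cases less sharply than the stronger set.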

Yield Prediction Using Spatial and Temporal Deep Learning Algorithms and Data Fusion

Bisht, Bhavesh 24 November 2023 (has links)
The world’s population is expected to grow to 9.6 billion by 2050. This exponential growth imposes a significant challenge on food security, making the development of efficient crop production a growing concern. Traditional methods of analyzing soil and crop yield rely on manual field surveys and expensive instruments; this process is not only time-consuming but also requires a team of specialists, making it a costly way to obtain predictions. Yield prediction is an integral part of smart farming, as it enables farmers to make timely, informed decisions and maximize productivity while minimizing waste. Traditional statistical approaches fall short in yield prediction because of the multitude of diverse variables that influence crop production, and because the interactions between these variables are non-linear in ways these methods fail to capture. Recent machine learning approaches and data-driven models are better suited to handling the complexity and variability of crop yield prediction. Maize, also known as corn, is a staple crop in many countries and is used in a variety of food products, including bread, cereal, and animal feed. In 2021-2022, total corn production was around 1.2 billion tonnes, surpassing that of wheat or rice and making it an essential element of food production. With the advent of remote sensing, unmanned aerial vehicles (UAVs) are widely used to capture high-quality field images, making it possible to record minute details for better crop analysis. By combining spatial features, such as topography and soil type, with crop growth information, it is possible to develop a robust and accurate system for predicting crop yield. Convolutional Neural Networks (CNNs) are a type of deep neural network that has shown remarkable success in computer vision tasks, achieving state-of-the-art performance.
Their ability to automatically extract features and patterns from data makes them highly effective in analyzing complex and high-dimensional datasets, such as drone imagery. In this research, we aim to build an effective crop yield predictor using data fusion and deep learning. We propose several deep CNN architectures that can accurately predict corn yield before the end of the harvesting season, which can aid farmers by providing valuable information about potential harvest outcomes and enabling informed decisions about resource allocation. UAVs equipped with RGB (Red Green Blue) and multi-spectral cameras were scheduled to capture high-resolution images over the entire 2021 growth period of three fields located in Ottawa, Ontario, where primarily corn was grown. The ground yield data, meanwhile, was acquired at harvest using a yield monitoring device mounted on the harvester. Several data processing techniques were employed to remove erroneous measurements; the processed data was fed to different CNN architectures, and several analyses were performed to highlight the techniques and methods that lead to the most optimal performance. The final best-performing model was a 3-dimensional CNN that predicts yield from images of the Early (June) and Mid (July) growing stages with a Mean Absolute Percentage Error of 15.18% and a Root Mean Squared Error of 17.63 bushels per acre. The model trained on data from Field 1 demonstrated an average correlation coefficient of 0.57 between the true and predicted yield values on Fields 2 and 3. This research provides a direction for developing an end-to-end yield prediction model. Additionally, by leveraging the results from the experiments presented here, image acquisition and computation costs can be brought down.
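The two error metrics the abstract reports can be computed as follows; the yield values are illustrative, in bushels per acre, and not the thesis's data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the yield itself."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

true_yield = [180.0, 200.0, 150.0, 170.0]  # hypothetical bushels per acre
pred_yield = [170.0, 210.0, 140.0, 180.0]
print(mape(true_yield, pred_yield))  # ≈ 5.78 (%)
print(rmse(true_yield, pred_yield))  # 10.0 (every error here is exactly 10)
```

Reporting both is informative because MAPE is scale-free (comparable across fields with different average yields) while RMSE stays in bushels per acre, the unit a farmer actually plans around.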

Optimizing ERP Recommendations Using Machine Learning Techniques

Jeremiah, Ante January 2023 (has links)
This study explores the application of a recommendation engine in collaboration with Fortnox. The primary focus is to find potential improvements to their recommendation engine in terms of delivering accurate recommendations to users. The study evaluates the performance of various algorithms on imbalanced data without resampling, using EasyEnsemble undersampling, SMOTE oversampling, and weighted-class approaches. The results indicate that LinearSVC is the best algorithm without resampling. Decision Tree performs well when combined with EasyEnsemble, outperforming the other algorithms. When using SMOTE, Decision Tree performs the best with the default sampling strategy, while LinearSVC and MultinomialNB show similar results. Varying the threshold for SMOTE produces mixed results, with LinearSVC and MultinomialNB showing sensitivity to changes in the threshold value, while Decision Tree maintains consistent performance. Finally, when using weighted classes, Decision Tree outperforms LinearSVC in terms of accuracy and F1-score. Overall, the findings provide insights into the performance of different algorithms on imbalanced data, highlight the effectiveness of certain techniques in addressing the class imbalance problem, and characterize the algorithms' sensitivity to changes with resampled data.
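Of the imbalance remedies compared, the weighted-class idea is the simplest to sketch: reweight the loss so the rare class counts as much as the common one. The synthetic 95/5 split below stands in for the imbalanced Fortnox data; all numbers are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# 380 majority-class points around the origin, 20 minority points offset.
X_major = rng.randn(380, 2)
X_minor = rng.randn(20, 2) + np.array([2.5, 2.5])
X = np.vstack([X_major, X_minor])
y = np.r_[np.zeros(380), np.ones(20)]

plain = LinearSVC(dual=False).fit(X, y)
weighted = LinearSVC(dual=False, class_weight="balanced").fit(X, y)

# Recall on the minority class: "balanced" weighting scales each class's
# loss inversely to its frequency, shifting the boundary so fewer rare
# items are missed.
recall_plain = plain.predict(X_minor).mean()
recall_weighted = weighted.predict(X_minor).mean()
print(recall_plain, recall_weighted)
```

Unlike SMOTE or EasyEnsemble, class weighting changes no data at all, which is why it is often the first baseline tried before resampling schemes.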
