Global ETD Search

1	Machine Learning based Methods to Improve Power System Operation under High Renewable Pennetration Bhavsar, Sujal Pradipkumar 19 September 2022 (has links) In an attempt to thwart global warming in a concerted way, more than 130 countries have committed to becoming carbon neutral around 2050. In the United States, the Biden ad- ministration has called for 100% clean energy by 2035. It is estimated that in order to meet that target, the energy production from solar and wind should increase to 50-70% from the current 11% share. Under higher penetration of solar and wind, the intermittency of the energy source poses critical problems in forecasting, uncertainty quantification, reserve man- agement, unit commitment, and economic dispatch, and presents unique challenges to the distribution system, including predicting solar adoption by the user as well as forecasting end-use load profiles. While these problems are complex, advances in machine learning and artificial intelligence provide opportunities for novel paradigms for addressing the challenges. The overall aim of the dissertation is to harness data-driven and model-based techniques and develop computationally efficient tools for improved power systems operation under high re- newables penetration in the next-generation electric grid. Some of the salient contributions of this work are the reduction in the number of uncertain scenarios by 99%; dramatic reduc- tion in the computational overhead to simulate stochastic unit commitment and economic dispatch on a single-node electric-grid system to merely 10 seconds from 24 hours; reduc- tion in the total monthly operating cost of two-stage stochastic economic dispatch by an average of 5%, and reduction in average overall reserve due to intermittency in renewables by 50%; and improvement in the existing end-use load prediction and rooftop PV adopter identification tools by a considerable margin. / Doctor of Philosophy / In an attempt to thwart global warming in a concerted way, more than 130 countries have committed to becoming carbon neutral around 2050. In the United States, the Biden ad- ministration has called for 100% clean energy by 2035. It is estimated that in order to meet that target, the energy production from solar and wind should increase to 50-70% from the current 11% share. Under higher penetration of solar and wind, the intermittency of the energy source poses critical problems in forecasting, uncertainty quantification, reserve man- agement, unit commitment, and economic dispatch, and presents unique challenges to the distribution system, including predicting solar adoption by the user as well as forecasting end-use load profiles. While these problems are complex, advances in machine learning and artificial intelligence provide opportunities for novel paradigms for addressing the challenges. The overall aim of the dissertation is to harness data-driven and model-based techniques and develop computationally efficient tools for improved power systems operation under high re- newables penetration in the next-generation electric grid. Some of the salient contributions of this work are the reduction in the number of uncertain scenarios by 99%; dramatic reduc- tion in the computational overhead to simulate stochastic unit commitment and economic dispatch on a single-node electric-grid system to merely 10 seconds from 24 hours; reduc- tion in the total monthly operating cost of two-stage stochastic economic dispatch by an average of 5%, and reduction in average overall reserve due to intermittency in renewables by 50%; and improvement in the existing end-use load prediction and rooftop PV adopter identification tools by a considerable margin. Applied machine learning Stochastic optimization Renewable integration
2	Early detection of malicious web content with applied machine learning Likarish, Peter F. 01 July 2011 (has links) This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the webs threat landscape is unfavorably tilted towards the attacker. To counter these advantages this thesis describes our novel solutions to web se- curity problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detecting attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing at- tacks in the wild. Progressing from specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to it's occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity. applied machine learning Computer security Domain Name System javascript phishing web-based attacks Computer Sciences
3	Continuous dimensional emotion tracking in music Imbrasaite, Vaiva January 2015 (has links) The size of easily-accessible libraries of digital music recordings is growing every day, and people need new and more intuitive ways of managing them, searching through them and discovering new music. Musical emotion is a method of classification that people use without thinking and it therefore could be used for enriching music libraries to make them more user-friendly, evaluating new pieces or even for discovering meaningful features for automatic composition. The field of Emotion in Music is not new: there has been a lot of work done in musicology, psychology, and other fields. However, automatic emotion prediction in music is still at its infancy and often lacks that transfer of knowledge from the other fields surrounding it. This dissertation explores automatic continuous dimensional emotion prediction in music and shows how various findings from other areas of Emotion and Music and Affective Computing can be translated and used for this task. There are four main contributions. Firstly, I describe a study that I conducted which focused on evaluation metrics used to present the results of continuous emotion prediction. So far, the field lacks consensus on which metrics to use, making the comparison of different approaches near impossible. In this study, I investigated people’s intuitively preferred evaluation metric, and, on the basis of the results, suggested some guidelines for the analysis of the results of continuous emotion recognition algorithms. I discovered that root-mean-squared error (RMSE) is significantly preferable to the other metrics explored for the one dimensional case, and it has similar preference ratings to correlation coefficient in the two dimensional case. Secondly, I investigated how various findings from the field of Emotion in Music can be used when building feature vectors for machine learning solutions to the problem. I suggest some novel feature vector representation techniques, testing them on several datasets and several machine learning models, showing the advantage they can bring. Some of the suggested feature representations can reduce RMSE by up to 19% when compared to the standard feature representation, and up to 10-fold improvement for non-squared correlation coefficient. Thirdly, I describe Continuous Conditional Random Fields and Continuous Conditional Neural Fields (CCNF) and introduce their use for the problem of continuous dimensional emotion recognition in music, comparing them with Support Vector Regression. These two models incorporate some of the temporal information that the standard bag-of-frames approaches lack, and are therefore capable of improving the results. CCNF can reduce RMSE by up to 20% when compared to Support Vector Regression, and can increase squared correlation for the valence axis by up to 40%. Finally, I describe a novel multi-modal approach to continuous dimensional music emotion recognition. The field so far has focused solely on acoustic analysis of songs, while in this dissertation I show how the separation of vocals and music and the analysis of lyrics can be used to improve the performance of such systems. The separation of music and vocals can improve the results by up to 10% with a stronger impact on arousal, when compared to a system that uses only acoustic analysis of the whole signal, and the addition of the analysis of lyrics can provide a similar improvement to the results of the valence model. 786.7
4	Predicting Day-Zero Review Ratings: A Social Web Mining Approach John, Zubin R. January 2015 (has links) No description available. Artificial Intelligence Computer Science
5	Toward Better Understanding and Documentation of Rationale for Code Changes Alsafwan, Khadijah Ahmad 24 August 2023 (has links) Software development is driven by the development team's decisions. Communicating the rationale behind these decisions is essential for the projects success. Although the software engineering community recognizes the need and importance of rationale, there has been a lack of in-depth study of rationale for code changes. To bridge this gap, this dissertation examines the rationale behind code changes in-depth and breadth. This work includes two studies and an experiment. The first study aims to understand software developers' need. It finds that software developers need to investigate code changes to understand their rationale when working on diverse tasks. The study also reveals that software developers decompose the rationale of code commits into 15 separate components that they could seek when searching for rationale. The second study surveys software developers' experiences with rationale. It uncovers issues and challenges that software developers encounter while searching for and recording rationale for code changes. The study highlights rationale components that are needed and hard to find. Additionally, it discusses factors leading software developers to give up their search for the rationale of code changes. Finally, the experiment predicts the documentation of rationale components in pull request templates. Multiple statistical models are built to predict if rationale components' headers will not be filled. The trained models are effective in achieving high accuracy and recall. Overall, this work's findings shed light on the need for rationale and offer deep insights for fulfilling this important information need. / Doctor of Philosophy / Software developers build software by creating and changing the software's code. In this process, developers make decisions and other developers need to understand these decisions. The rationale behind code changes is an important piece of information that leads to development success if well explained and understood. In this work, we study the developers' need for rationale by conducting two studies and an experiment. In the first study, we found that software developers often need to look into the rationale behind code changes to understand them better while working on different tasks. We identified 15 different parts of rationale that developers seek when searching for rationale for code changes. The second study focused on the experiences of software developers when looking for and recording rationale. We discovered some challenges that developers face, like difficulty in finding specific rationale parts and the factors that make developers give up searching for rationale. The experiment predicts if developers would document rationale in specific templates. We built models to predict if certain parts of rationale would be left empty, and the models were effective. Overall, this research provides a better understanding of software developers' need, and it provides valuable insights to help fulfill this important information need. Software Engineering Software Evolution and Maintenance Revision Control Systems Software Changes Rationale Applied Machine Learning.
6	Data-driven Algorithms for Critical Detection Problems: From Healthcare to Cybersecurity Defenses Song, Wenjia 16 January 2025 (has links) Machine learning and data-driven approaches have been widely applied to critical detection problems, but their performance is often hindered by data-related challenges. This dissertation seeks to address three key challenges: data imbalance, scarcity of high-quality labels, and excessive data processing requirements, through studies in healthcare and cybersecurity. We study healthcare problems with imbalanced clinical datasets that lead to performance disparities across prediction classes and demographic groups. We systematically evaluate these disparities and propose a Double Prioritized (DP) bias correction method that significantly improves the model performance for underrepresented groups and reduces biases. Cyber threats, such as ransomware and advanced persistent threats (APTs), have presented growing threats in recent years. Existing ransomware defenses often rely on black-box models trained on unverified traces, providing limited interpretability. To address the scarcity of reliably labeled training data, we experimentally profile runtime ransomware behaviors of real-world samples and identify core patterns, enabling explainable and trustworthy detection. For APT detection, the large size of system audit logs hinders real-time detection. We introduce Madeline, a lightweight system that efficiently processes voluminous logs with compact representations, overcoming real-time detection bottlenecks. These contributions provide deployable and effective solutions, offering insights for future research within and beyond the fields of healthcare and cybersecurity. / Doctor of Philosophy / Machine learning and data-driven methods have been widely used to solve important detection problems, but their effectiveness is often limited by challenges related to the data they rely on. This dissertation focuses on three key challenges: imbalanced data, a lack of high-quality information, and the need to process large amounts of data quickly. We address these issues through studies in healthcare and cybersecurity. Data from clinical studies is often unbalanced, with certain patient groups or outcomes being underrepresented. This imbalance leads to inconsistent prediction accuracies across groups. We address this by developing a method called Double Prioritized (DP) bias correction, which significantly improves the accuracy for minority groups and reduces biases. Cyber threats are becoming increasingly serious risks. One type of prevalent malware is ransomware, which encrypts the victim's data and demands payment for recovery. Current ransomware defenses often learn from unverified data and make decisions without clear explanations. To improve this, we analyze how real-world ransomware behaves, identifying patterns that allow for more explainable and reliable detection. Another type of threat is called advanced persistent threats (APTs), which aim to stay undetected in the victim's system for a long time and exfiltrate data gradually. For APT detection, the challenge lies in analyzing the vast amount of activity data the system generates, which slows down detection. We introduce detectionname, a system designed to process large logs efficiently, enabling fast and accurate threat detection. These contributions provide practical solutions to pressing problems in healthcare and cybersecurity and offer ideas for future improvements within and beyond these fields. Cybersecurity Advanced Persistent Threats (APTs) Anomaly Detection Digital Health AI Fairness Applied Machine Learning
7	Data-based on-board diagnostics for diesel-engine NOx-reduction aftertreatment systems Atharva Tandale (15351352) 27 April 2023 (has links) <p>The NOx conversion efficiency of a combined Selective Catalytic Reduction and</p> <p>Ammonia Slip Catalyst (SCR-ASC) in a Diesel Aftertreatment (AT) system degrades with</p> <p>time. A novel model-informed data-driven On-Board Diagnostic (OBD) binary classification</p> <p>strategy is proposed in this paper to distinguish an End of Useful Life (EUL) SCR-ASC catalyst</p> <p>from Degreened (DG) ones. An optimized supervised machine learning model was used for the</p> <p>classification with a calibrated single-cell 3-state Continuous Stirred Tank Reactor (CSTR)</p> <p>observer used for state estimation. The method resulted in 87.5% classification accuracy when</p> <p>tested on 8 day-files from 4 trucks (2 day-files per truck; 1 DG and 1 EUL) operating in realworld on-road conditions.</p> diesel aftertreatment systems On-board Diagnostics catalyst aging observer design Applied Machine Learning
8	Pinball: Using Machine Learning Based Control in Real-Time, Cyber-Physical System Saranguhewa, Pavan January 2022 (has links) No description available. Electrical Engineering Intelligent Control Real-time Cyber Physical System Computer Vision Model Predictive Control Reinforcement Learning Applied Machine Learning
9	Automatic identification and removal of low quality online information Webb, Steve 17 November 2008 (has links) The advent of the Internet has generated a proliferation of online information-rich environments, which provide information consumers with an unprecedented amount of freely available information. However, the openness of these environments has also made them vulnerable to a new class of attacks called Denial of Information (DoI) attacks. Attackers launch these attacks by deliberately inserting low quality information into information-rich environments to promote that information or to deny access to high quality information. These attacks directly threaten the usefulness and dependability of online information-rich environments, and as a result, an important research question is how to automatically identify and remove this low quality information from these environments. The first contribution of this thesis research is a set of techniques for automatically recognizing and countering various forms of DoI attacks in email systems. We develop a new DoI attack based on camouflaged messages, and we show that spam producers and information consumers are entrenched in a spam arms race. To break free of this arms race, we propose two solutions. One solution involves refining the statistical learning process by associating disproportionate weights to spam and legitimate features, and the other solution leverages the existence of non-textual email features (e.g., URLs) to make the classification process more resilient against attacks. The second contribution of this thesis is a framework for collecting, analyzing, and classifying examples of DoI attacks in the World Wide Web. We propose a fully automatic Web spam collection technique and use it to create the Webb Spam Corpus -- a first-of-its-kind, large-scale, and publicly available Web spam data set. Then, we perform the first large-scale characterization of Web spam using content and HTTP session analysis. Next, we present a lightweight, predictive approach to Web spam classification that relies exclusively on HTTP session information. The final contribution of this thesis research is a collection of techniques that detect and help prevent DoI attacks within social environments. First, we provide detailed descriptions for each of these attacks. Then, we propose a novel technique for capturing examples of social spam, and we use our collected data to perform the first characterization of social spammers and their behaviors. Denial of information Email spam Web spam Social spam Applied machine learning Information security Spam filtering (Electronic mail) World Wide Web Online social networks Computer networks Security measures
10	Applications of Persistent Homology and Cycles Mandal, Sayan 13 November 2020 (has links) No description available. Computer Science Computer Engineering topological data analysis persistent homology image classification protein description persistent cycles gene expression analysis applied machine learning simplicial complex homology feature vector

Search results