1 |
Machine Learning based Methods to Improve Power System Operation under High Renewable Penetration. Bhavsar, Sujal Pradipkumar, 19 September 2022
In an attempt to thwart global warming in a concerted way, more than 130 countries have committed to becoming carbon neutral around 2050. In the United States, the Biden administration has called for 100% clean energy by 2035. It is estimated that in order to meet that target, the energy production from solar and wind should increase to 50-70% from the current 11% share. Under higher penetration of solar and wind, the intermittency of the energy source poses critical problems in forecasting, uncertainty quantification, reserve management, unit commitment, and economic dispatch, and presents unique challenges to the distribution system, including predicting solar adoption by the user as well as forecasting end-use load profiles. While these problems are complex, advances in machine learning and artificial intelligence provide opportunities for novel paradigms for addressing the challenges. The overall aim of the dissertation is to harness data-driven and model-based techniques and develop computationally efficient tools for improved power systems operation under high renewables penetration in the next-generation electric grid. Some of the salient contributions of this work are: a reduction in the number of uncertain scenarios by 99%; a dramatic reduction in the computational overhead of simulating stochastic unit commitment and economic dispatch on a single-node electric-grid system from 24 hours to merely 10 seconds; a reduction in the total monthly operating cost of two-stage stochastic economic dispatch by an average of 5%, and in the average overall reserve due to intermittency in renewables by 50%; and a considerable improvement in existing end-use load prediction and rooftop PV adopter identification tools. / Doctor of Philosophy
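As an illustration of the scenario-reduction idea mentioned above, the following is a minimal Python sketch of one common approach: clustering a large set of sampled renewable-output profiles and keeping weighted cluster centroids as representative scenarios. The clustering method, the synthetic data, and the scenario count are assumptions for illustration only; they are not the dissertation's actual technique or results.

```python
# Hypothetical scenario-reduction sketch: k-means clustering of sampled
# 24-hour renewable-output profiles. Not the author's method; illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# 10,000 synthetic 24-hour wind-power profiles (normalized output per hour).
scenarios = np.clip(rng.normal(loc=0.4, scale=0.15, size=(10_000, 24)), 0.0, 1.0)

# Keep ~1% of the scenarios: cluster and use the centroids as representatives.
k = 100
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scenarios)
representatives = km.cluster_centers_                             # shape (k, 24)
weights = np.bincount(km.labels_, minlength=k) / len(scenarios)   # probability of each representative

print(f"Reduced {len(scenarios)} scenarios to {k} "
      f"({100 * (1 - k / len(scenarios)):.0f}% reduction); weights sum to {weights.sum():.2f}")
```

The reduced, weighted scenario set can then stand in for the full sample in a stochastic unit-commitment or economic-dispatch model, which is where runtime savings of the kind reported above would come from.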
|
2 |
Early detection of malicious web content with applied machine learning. Likarish, Peter F., 01 July 2011
This thesis explores the use of applied machine learning techniques to augment traditional methods of identifying and preventing web-based attacks. Several factors complicate the identification of web-based attacks. The first is the scale of the web. The amount of data on the web and the heterogeneous nature of this data complicate efforts to distinguish between benign sites and attack sites. Second, an attacker may duplicate their attack at multiple, unexpected locations (multiple URLs spread across different domains) with ease. Third, attacks can be hosted nearly anonymously; there is little cost or risk associated with hosting or publishing a web-based attack. In combination, these factors lead one to conclude that, currently, the web's threat landscape is unfavorably tilted towards the attacker.
To counter these advantages, this thesis describes our novel solutions to web security problems. The common theme running through our work is the demonstration that we can detect attacks missed by other security tools as well as detect attacks sooner than other security responses. To illustrate this, we describe the development of BayeShield, a browser-based tool capable of successfully identifying phishing attacks in the wild. Progressing from a specific to a more general approach, we next focus on the detection of obfuscated scripts (one of the most commonly used tools in web-based attacks). Finally, we present TopSpector, a system we've designed to forecast malicious activity prior to its occurrence. We demonstrate that by mining Top-Level DNS data we can produce a candidate set of domains that contains up to 65% of domains that will be blacklisted. Furthermore, on average TopSpector flags malicious domains 32 days before they are blacklisted, allowing the security community ample time to investigate these domains before they host malicious activity.
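The abstract does not include code, but the following minimal Python sketch shows the general flavor of a bag-of-words Bayesian classifier for phishing content. It is not BayeShield; the tiny training set and model choice are invented purely for illustration.

```python
# Generic Naive Bayes text-classification sketch for phishing detection.
# Illustrative only: not the thesis's BayeShield implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_pages = [
    "verify your account password immediately or it will be suspended",
    "update billing information to restore access to your bank account",
    "quarterly engineering report on test coverage and build times",
    "recipe collection with seasonal vegetables and baking tips",
]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_pages, labels)

print(model.predict(["please verify your password to keep your account active"]))  # -> [1]
```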
|
3 |
Continuous dimensional emotion tracking in music. Imbrasaite, Vaiva, January 2015
The size of easily-accessible libraries of digital music recordings is growing every day, and people need new and more intuitive ways of managing them, searching through them and discovering new music. Musical emotion is a method of classification that people use without thinking, and it therefore could be used for enriching music libraries to make them more user-friendly, evaluating new pieces or even for discovering meaningful features for automatic composition. The field of Emotion in Music is not new: there has been a lot of work done in musicology, psychology, and other fields. However, automatic emotion prediction in music is still in its infancy and often lacks that transfer of knowledge from the other fields surrounding it. This dissertation explores automatic continuous dimensional emotion prediction in music and shows how various findings from other areas of Emotion and Music and Affective Computing can be translated and used for this task. There are four main contributions.

Firstly, I describe a study that I conducted which focused on evaluation metrics used to present the results of continuous emotion prediction. So far, the field lacks consensus on which metrics to use, making the comparison of different approaches nearly impossible. In this study, I investigated people's intuitively preferred evaluation metric, and, on the basis of the results, suggested some guidelines for the analysis of the results of continuous emotion recognition algorithms. I discovered that root-mean-squared error (RMSE) is significantly preferable to the other metrics explored for the one-dimensional case, and it has similar preference ratings to the correlation coefficient in the two-dimensional case.

Secondly, I investigated how various findings from the field of Emotion in Music can be used when building feature vectors for machine learning solutions to the problem. I suggest some novel feature vector representation techniques, testing them on several datasets and several machine learning models, showing the advantage they can bring. Some of the suggested feature representations can reduce RMSE by up to 19% when compared to the standard feature representation, and yield up to a 10-fold improvement for the non-squared correlation coefficient.

Thirdly, I describe Continuous Conditional Random Fields and Continuous Conditional Neural Fields (CCNF) and introduce their use for the problem of continuous dimensional emotion recognition in music, comparing them with Support Vector Regression. These two models incorporate some of the temporal information that the standard bag-of-frames approaches lack, and are therefore capable of improving the results. CCNF can reduce RMSE by up to 20% when compared to Support Vector Regression, and can increase squared correlation for the valence axis by up to 40%.

Finally, I describe a novel multi-modal approach to continuous dimensional music emotion recognition. The field so far has focused solely on acoustic analysis of songs, while in this dissertation I show how the separation of vocals and music and the analysis of lyrics can be used to improve the performance of such systems. The separation of music and vocals can improve the results by up to 10%, with a stronger impact on arousal, when compared to a system that uses only acoustic analysis of the whole signal, and the addition of the analysis of lyrics can provide a similar improvement to the results of the valence model.
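For context on the baseline and metrics discussed above, here is a minimal Python sketch of a bag-of-frames Support Vector Regression model evaluated with RMSE and squared correlation. The synthetic features and labels are assumptions for illustration; the dissertation's actual features, datasets, and CCNF models are not reproduced here.

```python
# Bag-of-frames SVR baseline sketch with RMSE and squared-correlation evaluation.
# Synthetic data only; a real system would use acoustic features such as MFCCs.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))                            # 600 one-second frames, 20 features
y = np.tanh(0.8 * X[:, 0] + 0.1 * rng.normal(size=600))   # synthetic arousal labels in [-1, 1]

X_train, X_test, y_train, y_test = X[:480], X[480:], y[:480], y[480:]

pred = SVR(kernel="rbf", C=1.0).fit(X_train, y_train).predict(X_test)

rmse = mean_squared_error(y_test, pred) ** 0.5
r2 = np.corrcoef(y_test, pred)[0, 1] ** 2                 # squared Pearson correlation
print(f"RMSE = {rmse:.3f}, squared correlation = {r2:.3f}")
```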
|
4 |
Predicting Day-Zero Review Ratings: A Social Web Mining Approach. John, Zubin R., January 2015
No description available.
|
5 |
Toward Better Understanding and Documentation of Rationale for Code Changes. Alsafwan, Khadijah Ahmad, 24 August 2023
Software development is driven by the development team's decisions. Communicating the rationale behind these decisions is essential for the project's success. Although the software engineering community recognizes the need for and importance of rationale, there has been a lack of in-depth study of rationale for code changes. To bridge this gap, this dissertation examines the rationale behind code changes in both depth and breadth. This work includes two studies and an experiment. The first study aims to understand software developers' needs. It finds that software developers need to investigate code changes to understand their rationale when working on diverse tasks. The study also reveals that software developers decompose the rationale of code commits into 15 separate components that they could seek when searching for rationale. The second study surveys software developers' experiences with rationale. It uncovers issues and challenges that software developers encounter while searching for and recording rationale for code changes. The study highlights rationale components that are both needed and hard to find. Additionally, it discusses factors leading software developers to give up their search for the rationale of code changes. Finally, the experiment predicts the documentation of rationale components in pull request templates: multiple statistical models are built to predict whether rationale components' headers will be left unfilled, and the trained models are effective in achieving high accuracy and recall. Overall, this work's findings shed light on the need for rationale and offer deep insights for fulfilling this important information need. / Doctor of Philosophy / Software developers build software by creating and changing the software's code. In this process, developers make decisions, and other developers need to understand these decisions. The rationale behind code changes is an important piece of information that leads to development success if well explained and understood. In this work, we study developers' need for rationale by conducting two studies and an experiment. In the first study, we found that software developers often need to look into the rationale behind code changes to understand them better while working on different tasks. We identified 15 different parts of rationale that developers seek when searching for the rationale for code changes. The second study focused on the experiences of software developers when looking for and recording rationale. We discovered some challenges that developers face, like difficulty in finding specific rationale parts, and the factors that make developers give up searching for rationale. The experiment predicts whether developers will document rationale in specific templates. We built models to predict whether certain parts of rationale would be left empty, and the models were effective. Overall, this research provides a better understanding of software developers' needs, and it provides valuable insights to help fulfill this important information need.
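As a rough illustration of the kind of statistical model the experiment describes, the sketch below fits a logistic regression that predicts whether a rationale header in a pull-request template will be left unfilled. The features, data, and threshold are fabricated for the example and are not the dissertation's dataset or model.

```python
# Hypothetical sketch: predict whether a rationale-component header in a
# pull-request template will be left unfilled. Features and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(2)
n = 1_000
X = np.column_stack([
    rng.integers(1, 2_000, n),   # changed lines in the pull request
    rng.integers(0, 300, n),     # prior merged PRs by the author
    rng.integers(0, 2, n),       # template enforced by the repository? (0/1)
])
# Synthetic label: header left empty (1), more likely for large PRs without enforcement.
y = ((X[:, 0] > 800) & (X[:, 2] == 0)).astype(int)

clf = LogisticRegression(max_iter=1_000).fit(X[:800], y[:800])
print("recall on held-out PRs:", recall_score(y[800:], clf.predict(X[800:])))
```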
|
6 |
Data-based on-board diagnostics for diesel-engine NOx-reduction aftertreatment systems. Atharva Tandale, 27 April 2023
The NOx conversion efficiency of a combined Selective Catalytic Reduction and Ammonia Slip Catalyst (SCR-ASC) in a Diesel Aftertreatment (AT) system degrades with time. A novel model-informed, data-driven On-Board Diagnostic (OBD) binary classification strategy is proposed in this paper to distinguish an End of Useful Life (EUL) SCR-ASC catalyst from Degreened (DG) ones. An optimized supervised machine learning model was used for the classification, with a calibrated single-cell, 3-state Continuous Stirred Tank Reactor (CSTR) observer used for state estimation. The method resulted in 87.5% classification accuracy when tested on 8 day-files from 4 trucks (2 day-files per truck; 1 DG and 1 EUL) operating in real-world on-road conditions.
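The following Python sketch illustrates the shape of such a binary DG-vs-EUL classifier built on observer-estimated quantities. The specific features (estimated ammonia storage and NOx-conversion residual), the classifier, and the synthetic data are assumptions for illustration; the thesis's calibrated CSTR observer and optimized model are not reproduced here.

```python
# Illustrative DG-vs-EUL classification sketch on invented observer-derived features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 400
# Per-day-file summary features: mean estimated NH3 storage fraction and
# mean residual between observer-predicted and measured outlet NOx.
nh3_storage = np.concatenate([rng.normal(0.6, 0.05, n // 2),    # degreened catalysts
                              rng.normal(0.4, 0.05, n // 2)])   # end-of-useful-life catalysts
nox_residual = np.concatenate([rng.normal(0.02, 0.01, n // 2),
                               rng.normal(0.08, 0.02, n // 2)])
X = np.column_stack([nh3_storage, nox_residual])
y = np.array([0] * (n // 2) + [1] * (n // 2))                   # 0 = DG, 1 = EUL

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```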
|
7 |
Pinball: Using Machine Learning Based Control in Real-Time, Cyber-Physical System. Saranguhewa, Pavan, January 2022
No description available.
|
8 |
Automatic identification and removal of low quality online information. Webb, Steve, 17 November 2008
The advent of the Internet has generated a proliferation of online information-rich environments, which provide information consumers with an unprecedented amount of freely available information. However, the openness of these environments has also made them vulnerable to a new class of attacks called Denial of Information (DoI) attacks. Attackers launch these attacks by deliberately inserting low quality information into information-rich environments to promote that information or to deny access to high quality information. These attacks directly threaten the usefulness and dependability of online information-rich environments, and as a result, an important research question is how to automatically identify and remove this low quality information from these environments.

The first contribution of this thesis research is a set of techniques for automatically recognizing and countering various forms of DoI attacks in email systems. We develop a new DoI attack based on camouflaged messages, and we show that spam producers and information consumers are entrenched in a spam arms race. To break free of this arms race, we propose two solutions. One solution involves refining the statistical learning process by associating disproportionate weights to spam and legitimate features, and the other solution leverages the existence of non-textual email features (e.g., URLs) to make the classification process more resilient against attacks.

The second contribution of this thesis is a framework for collecting, analyzing, and classifying examples of DoI attacks in the World Wide Web. We propose a fully automatic Web spam collection technique and use it to create the Webb Spam Corpus -- a first-of-its-kind, large-scale, and publicly available Web spam data set. Then, we perform the first large-scale characterization of Web spam using content and HTTP session analysis. Next, we present a lightweight, predictive approach to Web spam classification that relies exclusively on HTTP session information.

The final contribution of this thesis research is a collection of techniques that detect and help prevent DoI attacks within social environments. First, we provide detailed descriptions for each of these attacks. Then, we propose a novel technique for capturing examples of social spam, and we use our collected data to perform the first characterization of social spammers and their behaviors.
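To make the HTTP-session idea concrete, here is a minimal sketch of classifying sessions as spam or legitimate from response-header tokens alone. The header strings, labels, and model are invented for illustration and do not reflect the thesis's classifier or the Webb Spam Corpus.

```python
# Generic sketch: classify web spam from HTTP response-header tokens only.
# Header values are fabricated examples, not real corpus data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sessions = [
    "Server: nginx X-Powered-By: PHP/4.3 Location: http://ad-tracker.example/redirect",
    "Server: Apache X-Powered-By: PHP/4.3 Refresh: 0;url=http://cheap-pills.example",
    "Server: Apache Content-Type: text/html Cache-Control: max-age=3600",
    "Server: nginx Content-Type: text/html Last-Modified: Tue, 01 Apr 2008",
]
labels = [1, 1, 0, 0]  # 1 = spam session, 0 = legitimate

model = make_pipeline(CountVectorizer(token_pattern=r"[^\s]+"), LogisticRegression())
model.fit(sessions, labels)
print(model.predict(["Server: Apache Refresh: 0;url=http://cheap-pills.example"]))
```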
|
9 |
Applications of Persistent Homology and Cycles. Mandal, Sayan, 13 November 2020
No description available.
|
10 |
EXPLORATORY DATA ANALYSIS OF CONSUMER FOOD SAFETY BEHAVIORS. Zachary R Berglund, 27 April 2023
Food safety researchers and extension workers are focused on educating the different actors of the supply chain, from farm to fork. To accomplish this, researchers identify areas of improvement and investigate the factors that cause or explain food safety behaviors. This thesis is divided into a systematic literature review with a meta-analysis and qualitative synthesis (Ch. 2), followed by two case studies that use predictive models to find top predictors of food safety behaviors (Ch. 3 and 4). The systematic review (Ch. 2) investigates online food safety educational programs and their effectiveness, barriers, and recommendations for different subpopulations of students, consumers, and food workers. The findings showed a limited effect on attitudes in the different subpopulations. Several areas for future research and recommendations for educators were identified. The first case study (Ch. 3) developed predictive models of different food safety behaviors at ten time points throughout the COVID-19 pandemic. Findings suggest a relationship between changes in COVID-19 case numbers and how well attitudes related to COVID-19 can make predictions. Additionally, findings suggest the importance of attitudes when predicting food safety behaviors. Lastly, results identified that the belief that handwashing protects against foodborne illness was more important than the belief that handwashing protects against COVID-19 when predicting handwashing at most time points. These findings offer insights into consumer behaviors during the pandemic and several possible areas for future research. The second case study (Ch. 4) developed predictive models of consumer flour handling practices and of consumer awareness of flour-related recalls, and examined how they are affected by the total number of flour-related recalls in the state where the consumer lives. Findings identified the importance of risk perceptions in predicting consumer flour handling practices. Results also showed that younger consumers were predicted to be more likely to be aware of flour recalls than older consumers. Lastly, results show that the total number of flour-related recalls in the state where the consumer lives does not affect predictions. Findings identify potential challenges to recall communication and areas for future studies.
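As a sketch of the general workflow behind finding "top predictors", the example below fits a predictive model of a hypothetical food-safety behavior and ranks survey variables by feature importance. The variable names, data, and outcome are fabricated; they are not the thesis's survey instruments or results.

```python
# Hedged sketch: rank invented survey variables by importance when predicting
# a fabricated food-safety behavior. Not the thesis's data or models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n = 500
features = ["risk_perception", "age", "covid_attitude", "handwashing_belief"]
X = np.column_stack([
    rng.uniform(1, 5, n),        # risk perception (Likert 1-5)
    rng.integers(18, 80, n),     # age in years
    rng.uniform(1, 5, n),        # COVID-19-related attitude (Likert 1-5)
    rng.uniform(1, 5, n),        # belief that handwashing prevents foodborne illness
])
# Synthetic outcome: follows the recommended handling practice (1) or not (0).
y = (0.8 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 1, n) > 4.5).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, importance in sorted(zip(features, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {importance:.2f}")
```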
|