111

Anomaly Detection and Root Cause Analysis for LTE Radio Base Stations / Anomalitetsdetektion och grundorsaksanalys för LTE Radio Base-stationer

López, Sergio January 2018 (has links)
This project aims to detect possible anomalies in the resource consumption of radio base stations within the 4G LTE Radio architecture. This is done by analyzing the statistical data that each node generates every 15 minutes in the form of "performance maintenance counters" (PM counters). In this thesis, we introduce methods that allow resources to be monitored automatically after software updates, in order to detect any anomalies in the consumption patterns of the different resources compared to the reference period before the update. We also attempt to narrow down the origin of anomalies by pointing out parameters potentially linked to the issue.
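As a rough illustration of the reference-period comparison described above, the following sketch flags counters whose post-update average drifts away from the pre-update distribution. The counter names, window lengths, and z-score threshold are invented for the example and are not taken from the thesis.

```python
# Hypothetical sketch of a reference-period comparison for PM counters.
# Counter names, window sizes, and the z-score threshold are assumptions.
import numpy as np

def flag_anomalous_counters(reference, post_update, z_threshold=3.0):
    """Flag counters whose post-update mean deviates from the reference-period
    distribution by more than z_threshold standard deviations.

    reference, post_update: dicts mapping counter name -> 1-D array of
    15-minute samples.
    """
    flagged = {}
    for counter, ref_values in reference.items():
        ref_values = np.asarray(ref_values, dtype=float)
        new_values = np.asarray(post_update[counter], dtype=float)
        mu, sigma = ref_values.mean(), ref_values.std(ddof=1)
        if sigma == 0:            # constant counter in the reference window
            continue
        z = (new_values.mean() - mu) / sigma
        if abs(z) > z_threshold:
            flagged[counter] = z  # sign indicates the direction of the shift
    return flagged

# Synthetic example: 7 days of 15-minute samples before and after an update.
rng = np.random.default_rng(0)
ref = {"cpu_load": rng.normal(40, 5, 7 * 96), "mem_used": rng.normal(60, 4, 7 * 96)}
post = {"cpu_load": rng.normal(41, 5, 7 * 96), "mem_used": rng.normal(75, 4, 7 * 96)}
print(flag_anomalous_counters(ref, post))   # only "mem_used" should be flagged
```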
112

Applicability of GPT models to high-performance compute languages

Icimpaye, Urlich January 2023 (has links)
This thesis aims to investigate the feasibility of generating code in high-performance computing languages such as C++ with neural networks. This has been investigated by applying transfer learning to publicly available pretrained transformers on C++ code. The models chosen for transfer learning are CodeT5, an encoder-decoder model with 770 million parameters, and two decoder-only CodeGen models, one with 350 million parameters and the other with one billion parameters. All models were trained on a labeled dataset where each sample had a prompt in natural language and an answer in C++ code. The CodeT5 model was additionally trained on an unlabeled dataset of C++ code, since it did not come pretrained on C++. The models were evaluated using CodeBERTScore, which measures the cosine similarity of model-generated code to the reference code. The CodeT5 model achieved the best score. However, an inspection of the programming tasks the models solved indicates that they can only handle trivial tasks, likely due to the size of the training corpus and of the models themselves. Given the computing resources available during the thesis, training larger models on a more extensive training corpus, especially labeled data, was not feasible; additional computing resources would be required to train larger models on larger datasets and improve performance.
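For context, a much-simplified stand-in for the CodeBERTScore idea is sketched below: embed the generated and reference snippets with a pretrained code encoder and compare pooled embeddings by cosine similarity. The real metric performs token-level matching; the model choice and mean-pooling here are illustrative assumptions.

```python
# Simplified stand-in for embedding-based code similarity scoring.
# CodeBERTScore itself does token-level matching; this only shows the
# cosine-similarity principle on pooled embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool over tokens

generated = "int add(int a, int b) { return a + b; }"
reference = "int sum(int x, int y) { return x + y; }"
score = torch.nn.functional.cosine_similarity(embed(generated), embed(reference), dim=0)
print(f"similarity: {score.item():.3f}")
```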
113

A Performance Survey of Text-Based Sentiment Analysis Methods for Automating Usability Evaluations

Van Damme, Kelsi 01 June 2021 (has links) (PDF)
Usability testing, or user experience (UX) testing, is increasingly recognized as an important part of the user interface design process. However, evaluating usability tests can be expensive in terms of time and resources and can lack consistency between human evaluators. This makes automation an appealing expansion of, or alternative to, conventional usability techniques. Early usability automation focused on evaluating human behavior through quantitative metrics, but the explosion of opinion mining and sentiment analysis applications in recent decades has led to exciting new possibilities for usability evaluation methods. This paper presents a survey of how useful modern, open-source sentiment analyzers are at extracting and correctly identifying moments of semantic significance in recorded mock usability evaluations. Although none of the text-based sentiment analyzers could identify such moments as well as human evaluators, one analyzer was able to identify positive moments conveyed through audio-only cues as well as human evaluators could. Further research into tuning current sentiment analyzers for usability evaluations, and into using multimodal rather than text-based tools, could produce valuable aids for usability evaluations when used in conjunction with human evaluators.
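As an illustration of the kind of text-based analysis surveyed here, the sketch below runs NLTK's VADER analyzer over transcript utterances and surfaces strongly polarized moments. VADER and the 0.5 threshold are stand-ins chosen for the example, not necessarily among the analyzers evaluated in the thesis.

```python
# One way to surface candidate "moments of significance" in a session
# transcript with an off-the-shelf, open-source sentiment analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

transcript = [
    "Okay, I think I click here to upload the file.",
    "Oh no, it deleted my draft, that's really frustrating.",
    "Oh nice, the preview updates instantly, I love that.",
]

for utterance in transcript:
    compound = analyzer.polarity_scores(utterance)["compound"]  # range -1 .. +1
    if abs(compound) >= 0.5:  # strongly positive or negative moment (assumed cutoff)
        print(f"{compound:+.2f}  {utterance}")
```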
114

TSPOONS: Tracking Salience Profiles of Online News Stories

Paterson, Kimberly Laurel 01 June 2014 (has links) (PDF)
News space is a relatively nebulous term that describes the general discourse concerning events that affect the populace. Past research has focused on qualitatively analyzing news space in an attempt to answer big questions about how the populace relates to the news and how it responds. We want to ask: when do stories begin? Which stories stand out among the noise? In order to answer the big questions about news space, we need to track the course of individual stories in the news. By analyzing the specific articles that comprise a story, we can synthesize the information gained from several stories into a more complete picture of the discourse. The individual articles, the groups of articles that become stories, and the overall themes that connect stories together all complete the narrative about what is happening in society. TSPOONS provides a framework for analyzing news stories and answering two main questions: what were the important stories during a given time frame, and what were the important stories involving a given topic? Drawing technical news stories from Techmeme.com, TSPOONS generates a profile of each news story, quantitatively measuring the importance, or salience, of news stories as well as quantifying the impact of these stories over time.
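A toy sketch of what a salience profile could look like is given below: articles are grouped by story and counted per day, with total and peak coverage as crude salience measures. The story labels, dates, and measures are invented; the actual TSPOONS salience metric is more involved.

```python
# Toy stand-in for a "salience profile": coverage per story per day.
from collections import Counter, defaultdict
from datetime import date

articles = [
    ("story-A", date(2014, 3, 1)), ("story-A", date(2014, 3, 1)),
    ("story-A", date(2014, 3, 2)), ("story-B", date(2014, 3, 2)),
    ("story-A", date(2014, 3, 3)), ("story-B", date(2014, 3, 4)),
]

profiles = defaultdict(Counter)
for story, day in articles:
    profiles[story][day] += 1          # articles published per story per day

for story, counts in profiles.items():
    total = sum(counts.values())       # crude overall salience
    peak_day, peak = counts.most_common(1)[0]
    print(f"{story}: total={total}, peak={peak} on {peak_day}")
```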
115

Disclosure of suicidal drivers on social media: a natural language processing and thematic analysis approach

Donnelly, Hayoung Kim 22 August 2023 (has links)
It is common for people to search for health information on the internet, share their health issues through social media, and ask for advice from people in online communities. Some people report feeling more comfortable sharing their psychological stress online and anonymously asking others for advice. As such, people disclose not only their suicide risk but also the drivers associated with that risk (e.g., suicidal ideation, relational stress, financial crisis). This study aims to identify suicidal drivers in narratives extracted from social media, synthesize the findings with suicide theories, and provide insights for future suicide prevention policies and practices.

This research gathered and analyzed 128,587 posts written by 76,547 people worldwide. The posts were written in English from January 2021 to December 2022 in Reddit's r/SuicideWatch community. Natural language processing and topic modeling, specifically Latent Dirichlet Allocation (LDA), were used to identify clusters of posts based on the similarities and differences between posts. Thematic analysis was used to identify suicidal drivers across the clusters. The web crawler developed by Brandwatch was used for data collection, and Python was used for all analyses.

Six theme clusters were identified. The first theme was Disclosure of Repetitive Suicide Ideation (e.g., "I want to die. I want to die, I want to die…(repeated)"); 36.4% of posts had this theme. The second theme was Disclosure of Relational Stress (e.g., "I don't have any friends"); 31.9% of posts had this theme. The third theme was Disclosure of Suicide Attempts and Negative Healthcare Experiences (e.g., "I've had a suicide attempt before", "The nurses ignored me"); 9.9% of posts had this theme. The fourth theme was Disclosure of Abuse (e.g., "He would beat me black and blue"); 8.8% of posts had this theme. The fifth theme was Disclosure of Contextual Stress, including financial and legal matters (e.g., "every moment was a living fear of the debt collector knocking on the door"); 7.2% of posts had this theme. The last theme was Philosophical and Informative Discussions around suicide (e.g., "After death, the physical begins to deteriorate and life/energy is simply moved to another being"); 5.8% of posts had this theme.

Understanding different suicidal drivers is an essential component of designing individualized intervention plans for people at risk of suicide. The current research identified idiosyncrasies in the suicide drivers people talked about when disclosing their suicidality. Furthermore, the findings from this study's data-inspired and exploratory approach provide additional evidence supporting existing suicide theories and frameworks. This research has the potential to lay the groundwork for designing suicide intervention strategies that target individuals' self-disclosures of their struggles online.
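A minimal sketch of the LDA clustering step, using scikit-learn and invented example posts, is shown below; the six-topic setting simply mirrors the six themes reported in the abstract and is not the study's actual configuration.

```python
# Minimal LDA topic-modeling sketch with scikit-learn. The example posts are
# invented paraphrases of quotes in the abstract, not real data.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "i do not have any friends and feel completely alone",
    "the debt collector keeps calling and i cannot pay rent",
    "the nurses ignored me after my last attempt",
    # ... the study analyzed 128,587 posts
]

vectorizer = CountVectorizer(stop_words="english", min_df=1)
doc_term = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=6, random_state=0)
doc_topics = lda.fit_transform(doc_term)   # per-post topic distributions

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```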
116

Big Data and the Integrated Sciences of the Mind

Faries, Frank January 2022 (has links)
No description available.
117

GRAPH-BASED ANALYSIS OF NON-RANDOM MISSING DATA PROBLEMS WITH LOW-RANK NATURE: STRUCTURED PREDICTION, MATRIX COMPLETION AND SPARSE PCA

Hanbyul Lee (17586345) 09 December 2023 (has links)
<p dir="ltr">In most theoretical studies on missing data analysis, data is typically assumed to be missing according to a specific probabilistic model. However, such assumption may not accurately reflect real-world situations, and sometimes missing is not purely random. In this thesis, our focus is on analyzing incomplete data matrices without relying on any probabilistic model assumptions for the missing schemes. To characterize a missing scheme deterministically, we employ a graph whose adjacency matrix is a binary matrix that indicates whether each matrix entry is observed or not. Leveraging its graph properties, we mathematically represent the missing pattern of an incomplete data matrix and conduct a theoretical analysis of how this non-random missing pattern affects the solvability of specific problems related to incomplete data. This dissertation primarily focuses on three types of incomplete data problems characterized by their low-rank nature: structured prediction, matrix completion, and sparse PCA.</p><p dir="ltr">First, we investigate a basic structured prediction problem, which involves recovering binary node labels on a fixed undirected graph, where noisy binary observations corresponding to edges are given. Essentially, this setting parallels a simple binary rank-1 symmetric matrix completion problem, where missing entries are determined by a fixed undirected graph. Our aim is to establish the fundamental limit bounds of this problem, revealing a close association between the limits and graph properties, such as connectivity.</p><p dir="ltr">Second, we move on to the general low-rank matrix completion problem. In this study, we establish provable guarantees for exact and approximate low-rank matrix completion problems that can be applied to any non-random missing pattern, by utilizing the observation graph corresponding to the missing scheme. We theoretically and experimentally show that the standard constrained nuclear norm minimization algorithm can successfully recover the true matrix when the observation graph is well-connected and has similar node degrees. We also verify that matrix completion is achievable with a near-optimal sample complexity rate when the observation graph has uniform node degrees and its adjacency matrix has a large spectral gap.</p><p dir="ltr">Finally, we address the sparse PCA problem, featuring an approximate low-rank attribute. Missing data is common in situations where sparse PCA is useful, such as single-cell RNA sequence data analysis. We propose a semidefinite relaxation of the non-convex $\ell_1$-regularized PCA problem to solve sparse PCA on incomplete data. We demonstrate that the method is particularly effective when the observation pattern has favorable properties. Our theory is substantiated through synthetic and real data analysis, showcasing the superior performance of our algorithm compared to other sparse PCA approaches, especially when the observed data pattern has specific characteristics.</p>
118

Deep Neural Network Structural Vulnerabilities and Remedial Measures

Yitao Li (9148706) 02 December 2023 (has links)
<p dir="ltr">In the realm of deep learning and neural networks, there has been substantial advancement, but the persistent DNN vulnerability to adversarial attacks has prompted the search for more efficient defense strategies. Unfortunately, this becomes an arms race. Stronger attacks are being develops, while more sophisticated defense strategies are being proposed, which either require modifying the model's structure or incurring significant computational costs during training. The first part of the work makes a significant progress towards breaking this arms race. Let’s consider natural images, where all the feature values are discrete. Our proposed metrics are able to discover all the vulnerabilities surrounding a given natural image. Given sufficient computation resource, we are able to discover all the adversarial examples given one clean natural image, eliminating the need to develop new attacks. For remedial measures, our approach is to introduce a random factor into DNN classification process. Furthermore, our approach can be combined with existing defense strategy, such as adversarial training, to further improve performance.</p>
119

Strainer: State Transcript Rating for Informed News Entity Retrieval

Gerrity, Thomas M 01 June 2022 (has links) (PDF)
Over the past two decades there has been a rapid decline in public oversight of state and local governments. From 2003 to 2014, the number of journalists assigned to cover proceedings in state houses declined by more than 30%. During the same period, non-profit projects such as Digital Democracy sought to collect and store legislative bill and hearing information on behalf of the public. More recently, AI4Reporters, an offshoot of Digital Democracy, seeks to actively summarize interesting legislative data. This thesis presents STRAINER, a project parallel to AI4Reporters, as an active data retrieval and filtering system for surfacing newsworthy legislative data. Within STRAINER, we define and implement a processing pipeline by which information about legislative bill discussion events can be collected from a variety of sources and aggregated into feature sets suitable for machine learning. Using two independent labeling techniques, we trained a variety of SVM and logistic regression models to predict the newsworthiness of bill discussions that took place in the California State Legislature during the 2017-2018 session year. We found that our models were able to correctly retrieve more than 80% of newsworthy discussions.
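As a hedged sketch of the supervised step, the snippet below fits a scikit-learn logistic regression model on per-discussion feature vectors to predict newsworthiness; the feature names and the tiny dataset are hypothetical, not drawn from STRAINER.

```python
# Hypothetical sketch: logistic regression over per-discussion features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# columns (assumed): [num_speakers, minutes_of_debate, num_amendments, bill_passed]
X = np.array([[3, 5, 0, 1], [12, 45, 4, 0], [2, 3, 0, 1],
              [15, 60, 6, 1], [4, 8, 1, 0], [10, 30, 3, 1]])
y = np.array([0, 1, 0, 1, 0, 1])      # 1 = labeled newsworthy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```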
120

Deep Learning for Detecting Trees in the Urban Environment from Lidar

Rice, Julian R 01 August 2022 (has links) (PDF)
Cataloguing and classifying trees in the urban environment is a crucial step in urban and environmental planning. However, manual collection and maintenance of this data is expensive and time-consuming. Algorithmic approaches that rely on remote sensing data have been developed for tree detection in forests, though they generally struggle in the more varied urban environment. This work proposes a novel method for detecting trees in the urban environment that applies deep learning to remote sensing data. Specifically, we train a PointNet-based neural network to predict tree locations directly from LIDAR data augmented with multi-spectral imaging. We compare this model to numerous high-performing baselines on a large and varied dataset from the Southern California region. We find that our best model outperforms all baselines with a 75.5% F-score and a 2.28-meter RMSE, while being highly efficient. We then analyze and compare the sources of error and how they reveal the strengths and weaknesses of each approach.
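One plausible way to score detections like these is sketched below: each predicted tree is greedily matched to the nearest unmatched ground-truth tree within a radius, and F-score and RMSE are computed over the matches. The 3-meter radius, the greedy matching, and the coordinates are assumptions, not the thesis's evaluation protocol.

```python
# Sketch of scoring predicted tree locations against ground truth.
import numpy as np

def evaluate(predicted, truth, max_dist=3.0):
    predicted, truth = np.asarray(predicted, float), np.asarray(truth, float)
    unmatched = list(range(len(truth)))
    errors = []
    for p in predicted:
        if not unmatched:
            break
        dists = np.linalg.norm(truth[unmatched] - p, axis=1)
        j = int(dists.argmin())
        if dists[j] <= max_dist:          # accept match within the radius
            errors.append(dists[j])
            unmatched.pop(j)
    tp = len(errors)
    precision = tp / len(predicted)
    recall = tp / len(truth)
    f_score = 2 * precision * recall / (precision + recall) if tp else 0.0
    rmse = float(np.sqrt(np.mean(np.square(errors)))) if errors else float("nan")
    return f_score, rmse

pred = [(10.2, 4.9), (25.0, 30.5), (60.0, 60.0)]
gt = [(10.0, 5.0), (25.5, 31.0), (40.0, 12.0)]
print(evaluate(pred, gt))   # (F-score, RMSE in the coordinate units)
```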
