Global ETD Search

1	Detecting Faults in Telecom Software Using Diffusion Models : A proof of concept study for the application of diffusion models on Telecom data / Feldetektering av telekom-mjukvaror med hjälp av diffusionsmodeller
Nabeel, Mohamad January 2023
This thesis focuses on software fault detection in the telecom industry, which is crucial for companies like Ericsson to ensure stable and reliable software. Software performance matters greatly to the companies that rely on it, yet automatically detecting faulty behavior in test or operational environments is challenging. Several approaches have been proposed to address this problem. This thesis explores reconstruction-based and forecasting-based anomaly detection using diffusion models to address software failure detection. To this end, it explores the Structured State Space Sequence Diffusion Model, which can handle temporal dependencies of varying lengths. The results on numerical time series data were promising, demonstrating the model’s effectiveness in capturing and reconstructing the underlying patterns, particularly with continuous features. The contributions of this thesis are threefold: (i) a proposed framework for using diffusion models for time series anomaly detection, (ii) a proposed diffusion model architecture that is capable of outperforming existing Ericsson solutions on an anomaly detection dataset, and (iii) experiments and results that give additional insight into the model’s capabilities, expose some of its limitations, and suggest avenues for future research. / This thesis focuses on detecting software faults in the telecom industry, which is essential for companies such as Ericsson to ensure stable and reliable software. Given the importance of software performance to companies that depend on it, automatically detecting faulty behavior in test or operational environments is a challenging task. Several methods have been proposed to solve the problem. The thesis explores generative-based and predictive-based anomaly detection using diffusion models to handle software fault detection. The network architecture chosen for reconstructing time series data was the Structured State Space Sequence Diffusion model. The results for numerical time series data were promising and demonstrated the model’s effectiveness in capturing and reconstructing the underlying patterns. However, the model was observed to have difficulty handling categorical time series columns. The limitations in capturing categorical time series features point to an area where the model’s capabilities can be improved. Future research can focus on improving the model’s ability to handle categorical data effectively.
Diffusion models, Anomaly Detection, Telecommunication, Time Series
Computer Sciences, Computer Engineering
2	Anomaly Detection in Categorical Data with Interpretable Machine Learning : A random forest approach to classify imbalanced data
Yan, Ping January 2019
Metadata refers to "data about data", which contains information needed to understand the process of data collection. In this thesis, we investigate whether metadata features can be used to detect broken data and how a tree-based interpretable machine learning algorithm can be used for effective classification. The goal of this thesis is twofold. First, we apply a classification scheme using metadata features to detect broken data. Second, we generate feature importance rates to understand the model’s logic and reveal the key factors that lead to broken data. The task, given by the Swedish automotive company Veoneer, is a typical problem of learning from an extremely imbalanced data set, with 97 percent of the data belonging to the healthy class and only 3 percent to the broken class. Furthermore, the whole data set contains only categorical variables on nominal scales, which poses challenges for the learning algorithm. Handling imbalance is relatively well studied for continuous data, but for categorical data the solution is not straightforward. In this thesis, we propose a combination of tree-based supervised learning and hyperparameter tuning to identify the broken data in a large data set. Our method is composed of three phases: data cleaning, which eliminates ambiguous and redundant instances; supervised learning with a random forest; and a random search for hyperparameter optimization of the random forest model. Our results show empirically that the tree-based ensemble method, together with a random search for hyperparameter optimization, improves random forest performance in terms of the area under the ROC curve. The model exceeded an acceptable classification result and showed that metadata features are capable of detecting broken data and of providing an interpretable result by identifying the key features for the classification model.
machine learning, decision tree, imbalanced data, anomaly detection, random forest
Probability Theory and Statistics
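The abstract above evaluates by area under the ROC curve on a 97/3 class split. The following is a minimal sketch of why that choice matters, not the thesis’s own code: the labels and scores are made-up illustrative data, and `roc_auc` and `accuracy` are hypothetical helper names. It assumes the pairwise (Mann-Whitney) formulation of AUC.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only: 97 healthy rows (label 0) and 3 broken rows (label 1).
labels = [0] * 97 + [1] * 3

# A useless model that calls every row healthy.
trivial_predictions = [0] * 100
trivial_scores = [0.0] * 100

# A model whose scores separate the two classes perfectly.
separating_scores = [0.1] * 97 + [0.9] * 3

trivial_accuracy = accuracy(labels, trivial_predictions)
trivial_auc = roc_auc(labels, trivial_scores)
separating_auc = roc_auc(labels, separating_scores)

print(trivial_accuracy, trivial_auc, separating_auc)
```

The always-healthy model looks strong on accuracy because of the class split alone, while its AUC shows it has no ranking ability. This is one reason to prefer AUC on imbalanced data.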
3	Detection and Classification of Anomalies in Road Traffic using Spark Streaming
Consuegra Rengifo, Nathan Adolfo January 2018
Road traffic control has been around for a long time to guarantee the safety of vehicles and pedestrians. However, anomalies such as accidents or natural disasters cannot be avoided. It is therefore important to be ready to respond as soon as possible to prevent further loss of life. Nevertheless, no sufficiently accurate system exists that detects and classifies road traffic anomalies in real time. To solve this issue, this study proposes training a machine learning model for detecting and classifying anomalies on the highways of Stockholm. Because no labeled dataset is available, the first phase of the work is to detect the different kinds of outliers that can be found and to label them manually based on the results of a data exploration study. Datasets containing information about accidents and weather are also included to further expand the number of anomalies. All experiments use real-world datasets coming from either the sensors located on the highways of Stockholm or official accident and weather reports. Then, three models (Decision Trees, Random Forest and Logistic Regression) are trained to detect and classify the outliers. The design of an Apache Spark streaming application that uses the best-performing model is also provided. The outcomes indicate that Logistic Regression is better than the rest but still suffers from the imbalanced nature of the dataset. In the future, this project can be used not only to contribute to research on similar topics but also to monitor the highways of Stockholm. / Road traffic control has existed for a long time to guarantee the safety of vehicles and pedestrians. However, anomalies such as accidents or natural disasters cannot be avoided. It is therefore important to be prepared as soon as possible to prevent a greater number of human losses. Still, there is no system that is accurate enough to detect and classify road traffic anomalies in real time. To solve this problem, the following study proposes training a machine learning model for detecting and classifying anomalies on Stockholm’s roads. Because no labeled dataset is available, the first phase of the work is to detect the different kinds of outliers that can be found and to label them manually based on the results of a data exploration study. Datasets containing information about accidents and weather are also included to further increase the number of anomalies. All experiments use real-time datasets from either the sensors on Stockholm’s roads or official accident and weather reports. Three models (decision trees, random forest and logistic regression) are then trained to detect and classify the outliers. The design of an Apache Spark streaming application that uses the best-performing model is also given. The results indicate that logistic regression is better than the rest but still suffers from the imbalanced nature of the dataset. In the future, this project can be used not only to contribute to future research on similar topics but also to monitor Stockholm’s roads.
anomaly detection, traffic flow, accidents, weather, decision tree, random forest, logistic regression, streaming
Computer and Information Sciences
4	Unsupervised Anomaly Detection on Multi-Process Event Time Series
Vendramin, Nicoló January 2018
Establishing whether observed data are anomalous is an important task that has been widely investigated in the literature, and it becomes even more complex when combined with high-dimensional representations and multiple sources independently generating the patterns to be analyzed. The work presented in this master thesis employs a data-driven pipeline to define a recurrent auto-encoder architecture that analyzes, in an unsupervised fashion, high-dimensional event time series generated by multiple and variable processes interacting with a system. Given this problem, the work investigates whether a single model can be used to analyze patterns produced by different sources. The analysis of log files that record interaction events between users and the radio network infrastructure is employed as a real-world case study. The investigation aims to verify the performance of a single machine learning model applied to learning multiple patterns developed over time by distinct sources. The work proposes a pipeline that deals with the complex representation of the data source and with the definition and tuning of the anomaly detection model; the pipeline relies on no domain-specific knowledge and can thus be adapted to different problem settings. The model has been implemented in four variants, which have been evaluated on both normal and anomalous data, gathered partly from real network cells and partly from the simulation of anomalous behaviours. The empirical results show the applicability of the model for detecting anomalous sequences and events in the described conditions, with F1-scores reaching above 80% and varying with the specific threshold setting. In addition, a deeper interpretation of the results gives insight into the differences between the model variants and thus into their limitations and strengths. / Determining whether observed data are anomalous is an important task that has been studied in depth in the literature, and the problem becomes even more complex when it is combined with high-dimensional representations and multiple sources that independently generate the patterns to be analyzed. The work presented in this thesis uses a data-driven pipeline to define a recurrent auto-encoder architecture that analyzes, in an unsupervised manner, high-dimensional event time series generated by multiple and variable processes interacting with a system. Against the background of this problem, the work examines whether a single model can be used to analyze patterns produced by different sources. Analysis of log files that record interaction events between users and the radio network infrastructure is used as a case study for the stated problem. The investigation aims to verify the performance of a single machine learning model applied to learning multiple patterns that develop over time from different sources. The work proposes a pipeline for handling the complex representation of the data sources and the definition and tuning of the anomaly detection model; the pipeline is not based on domain-specific knowledge and can therefore be adapted to different problem settings. The model has been implemented in four variants, which have been evaluated on both normal and anomalous data, collected partly from real network cells and partly from simulation of anomalous behaviors. The empirical results show the model’s applicability for detecting anomalous sequences and events in the proposed framework, with F1-scores above 80%, varying with the specific threshold setting. In addition, a deeper interpretation of the results gives insight into the differences between the model variants and thus into their limitations and strengths.
Anomaly Detection, Recurrent Neural Networks, Time Series Analysis, Unsupervised Learning
Computer and Information Sciences
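The abstract above reports F1-scores that vary with the threshold setting. Here is a minimal sketch of that dependence, assuming a detector that flags a sequence as anomalous when its reconstruction error exceeds a threshold. The error values, labels, and the helper name `f1_at_threshold` are illustrative assumptions and are not taken from the thesis.

```python
def f1_at_threshold(errors, labels, threshold):
    """F1-score when every error strictly above the threshold is flagged as anomalous."""
    predictions = [1 if e > threshold else 0 for e in errors]
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as zero here
    return 2 * tp / (2 * tp + fp + fn)


# Made-up reconstruction errors; 1 marks an anomalous sequence, 0 a normal one.
errors = [0.10, 0.12, 0.15, 0.20, 0.35, 0.80, 0.90, 0.30]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

# Sweep a few candidate thresholds and record the resulting F1-score.
f1_by_threshold = {t: f1_at_threshold(errors, labels, t) for t in (0.05, 0.25, 0.50)}

for threshold, score in f1_by_threshold.items():
    print(f"threshold={threshold:.2f}  F1={score:.3f}")
```

A low threshold flags nearly everything and loses precision, while a high one misses real anomalies and loses recall, so the reported score depends on where the cut is placed.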
