1

Requirement-based Root Cause Analysis Using Log Data

Zawawy, Hamzeh January 2012
Root Cause Analysis for software systems is a challenging diagnostic task due to the complexity arising from the interactions between system components. Furthermore, the sheer size of the logged data often makes it difficult for human operators and administrators to perform problem diagnosis and root cause analysis. The diagnostic task is further complicated by the lack of models that could be used to support the diagnostic process. Traditionally, this task is conducted by human experts who create mental models of systems in order to generate hypotheses and conduct the analysis even in the presence of incomplete logged data. A challenge in this area is to provide the concepts, tools, and techniques that let operators focus their attention on specific parts of the logged data and, ultimately, to automate the diagnostic process. The work described in this thesis proposes a framework of techniques, formalisms, and algorithms aimed at automating the process of root cause analysis. In particular, this work uses annotated requirement goal models to represent the monitored systems' requirements and runtime behavior. The goal models are used in combination with log data to generate a ranked set of diagnostics that represent the combination of tasks whose failure led to the observed failure. In addition, the framework uses a combination of word-based and topic-based information retrieval techniques to reduce the size of the log data, filtering it down to a subset that facilitates the diagnostic process. This filtering and reduction is driven by the goal model annotations and generates a sequence of logical literals that represent the possible system observations. A second level of investigation looks for evidence of malicious activity (i.e., intentionally caused by a third party) leading to task failures. This analysis uses annotated anti-goal models that denote possible actions an external user can take to threaten a given system task. The framework uses a novel probabilistic approach based on Markov Logic Networks. Our experiments show that our approach improves over existing proposals by handling uncertainty in observations, using natively generated log data, and providing ranked diagnoses. The proposed framework has been evaluated using a test environment based on commercial off-the-shelf software components, a publicly available Java-based ATM machine, and a large publicly available dataset (DARPA 2000).
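
As an illustration of the word-based log-reduction step this abstract describes, the sketch below (Python with scikit-learn, a stand-in rather than the author's implementation; the annotation text, log lines and similarity threshold are all hypothetical) keeps only log lines lexically similar to a goal-model task annotation:

# Hypothetical sketch: keep only log lines lexically similar to a goal-model
# task annotation, as a stand-in for the word-based filtering step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

annotation = "transfer funds from checking account to savings account"  # assumed task annotation
log_lines = [
    "10:02:11 INFO  TransferService: funds transfer initiated for account 4411",
    "10:02:12 DEBUG ConnectionPool: reclaiming idle connection",
    "10:02:13 ERROR TransferService: savings account update failed",
]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
matrix = vectorizer.fit_transform([annotation] + log_lines)

# Cosine similarity between the annotation (row 0) and every log line.
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
filtered = [line for line, score in zip(log_lines, scores) if score > 0.1]  # assumed threshold
for line in filtered:
    print(line)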
2

Žiniatinklio įrašų gavybos paruošimo, analizės ir rezultatų pateikimo naudotojui tobulinimas / Enhancements of pre-processing, analysis and presentation techniques in web log mining

Pabarškaitė, Židrina 13 July 2009
Relevance of the research problem – growing market competition drives the search for new forms of work, so a large share of business and non-profit organisations are moving into the online space. This covers relationships of various types – business-to-customer, business-to-business (between different business entities) and others. In addition, the number of governmental institution, library and personal websites has grown over the last decade. Offering goods, providing business services or publishing relevant information online is very convenient, because it does not depend on geographic or time-zone differences. A user located elsewhere than the business or information provider can browse the company's website and make a decision related to that business. This virtual connection between websites and their visitors leaves traces – records, also called web log entries, which accumulate on the server hosting the website. Advances in technology have made it possible to collect and analyse large volumes of data, and so, more than a decade ago, a new research field emerged – web log mining. This knowledge discovery process is similar to that for other kinds of data (e.g. financial, medical), but certain stages of the process are different and unique. The practical benefit of analysing users' navigation paths on a website is to examine the relationships between related pages, to discover the most frequently chosen page sequences, and to find page sequences that are browsed at certain... [see the full text] / Topicality of the problem – the Internet is becoming an important part of our life; therefore more attention is paid to the quality of information on the web and how it is displayed to the user. This knowledge can be extracted by gathering web servers' data – log files, where all users' navigational patterns are recorded. The research area of this work is web log data analysis with the aim of enhancing information presentation on the web. Web log data analysis steps are similar to other kinds of data analysis (e.g. financial, medical), but some processes are different and unique. The research objects of the dissertation are web log data cleaning methods, data mining algorithms and web text mining. The key aim of the work is to improve the pattern discovery steps in mining web log data in order to: 1. improve the quality of the data for researchers who analyse user behaviour, 2. improve the ways information is presented, to speed up information display to the end user.
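
As a hedged illustration of the kind of pre-processing this work deals with (not its actual cleaning rules; the log format, filter list and sample line are assumptions), parsing and cleaning Apache Common Log Format entries in Python might start like this:

# Hypothetical sketch: parse Common Log Format lines and drop requests for
# static resources and failed hits, a typical web-log cleaning step.
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)
STATIC = re.compile(r'\.(gif|jpg|jpeg|png|css|js|ico)$', re.IGNORECASE)

def clean(lines):
    """Yield parsed page requests, skipping images/scripts and non-200 responses."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        if STATIC.search(match.group("path")) or match.group("status") != "200":
            continue
        yield match.groupdict()

sample = ['127.0.0.1 - - [13/Jul/2009:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 2326']
print(list(clean(sample)))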
3

Personalisierungsstrategien im E-Commerce : die Webloganalyse als Instrument der Personalisierung im Rahmen des eCRM / Personalization strategies in e-commerce: web log analysis as an instrument of personalization within eCRM

Mayer, Thomas. January 2007 (has links)
Universität, Diss., 2006--Freiburg (Breisg.).
4

Designing and Evaluating a Visualization System for Log Data

Wang, Xiaohan January 2020
In the engineering field, log data analysis has been conducted by most companies, as it has become a significant step for discovering problems and obtaining insights into a system. Visualization, which brings better comprehension of data, can be used as an effective and intuitive method for data analysis. This study applies a participatory design approach to develop a visualization system for log data, employing design activities including interviews, prototyping, usability testing and questionnaires in the research process, along with a comparative study on the impact of narrative visualization techniques and storytelling on usability and user engagement with exploratory visualizations. The findings showed that using storytelling and narrative visualization techniques seems to increase user engagement, while it does not seem to increase usability. Definitive conclusions could not be drawn due to the low demographic diversity of the participants; however, the results can serve as an initial insight to trigger further research on the impact of storytelling and narrative visualization techniques on user experience. Future research is encouraged to recruit a larger and more diverse set of participants, to pre-process the log data, and to conduct a comparative study on selecting the best visualization for log data.
5

Stealth Assessment of Self-Regulative Behaviors within a Game-Based Environment

January 2014
Students' ability to regulate and control their behaviors during learning has been shown to be a critical skill for academic success. However, researchers often struggle to capture the nuances of this ability, frequently relying solely on self-report measures. This thesis proposal employs a novel approach to investigating variations in students' ability to self-regulate by using process data from the game-based Intelligent Tutoring System (ITS) iSTART-ME. This approach affords a nuanced examination of how students regulate their interactions with game-based features at both coarse-grained and fine-grained levels, and the ultimate impact that those behaviors have on in-system performance and learning outcomes (i.e., self-explanation quality). The thesis comprises two submitted manuscripts that examined how a group of 40 high school students chose to engage with game-based features and how those interactions influenced their target skill performance. Findings suggest that in-system log data has the potential to provide stealth assessments of students' self-regulation while learning. / Dissertation/Thesis / M.A. Psychology 2014
6

Detekce anomalit v log datech / Anomaly Detection on Log Data

Babušík, Jan January 2021
This thesis deals with anomaly detection on log data. Big software systems produce a great amount of log data that is not processed further. There are usually so many logs that it becomes impossible to check every log entry manually. In this thesis we introduce models that primarily minimize the number of false positive predictions, while taking the expected complexity of data annotation into account. The compared models are based on the PCA algorithm, an N-gram model, and recurrent neural networks with LSTM cells. In the thesis we present the results of the models on widely used datasets and also on a real dataset provided by HAVIT, s.r.o.
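
As an illustrative sketch of one of the compared families of models – a PCA-based detector that flags log sessions whose event-count vectors are poorly reconstructed – the idea in Python (with scikit-learn and synthetic data, not the thesis' actual models, datasets or thresholds) is roughly:

# Hypothetical sketch: fit PCA on event-count vectors of normal sessions and
# flag test sessions whose reconstruction error exceeds a percentile threshold.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.poisson(lam=5, size=(200, 10))                             # assumed: 200 normal sessions, 10 event types
test = np.vstack([rng.poisson(5, (5, 10)), rng.poisson(30, (2, 10))])   # last two rows are anomalous

pca = PCA(n_components=3).fit(normal)
recon = pca.inverse_transform(pca.transform(test))
errors = np.linalg.norm(test - recon, axis=1)

normal_errors = np.linalg.norm(normal - pca.inverse_transform(pca.transform(normal)), axis=1)
threshold = np.percentile(normal_errors, 99)   # assumed threshold choice
print(errors > threshold)                      # True marks a suspected anomaly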
7

Advanced Algorithms for Classification and Anomaly Detection on Log File Data : Comparative study of different Machine Learning Approaches

Wessman, Filip January 2021
Background: A problematic area in today's large-scale distributed systems is the exponentially growing amount of log data. Finding anomalies by observing and monitoring this data with manual human inspection becomes progressively more challenging, complex and time consuming, yet it is vital for keeping these systems available around the clock. Aim: The main objective of this study is to determine which Machine Learning (ML) algorithms are most suitable and whether they can live up to the needs and requirements regarding optimization and efficiency in the log data monitoring area, including which specific steps of the overall problem can be improved by using these algorithms for anomaly detection and classification on different real, provided data logs. Approach: An initial pre-study is conducted, logs are collected and then preprocessed with the log parsing tool Drain and regular expressions. The approach consists of a combination of K-Means + XGBoost and, respectively, Principal Component Analysis (PCA) + K-Means + XGBoost. These were trained, tested and individually evaluated with different metrics against two datasets, one being a server data log and the other an HTTP access log. Results: The results showed that both approaches performed very well on both datasets, able to classify, detect and make predictions on log data events with high accuracy and precision and low calculation time. It was further shown that when applied without dimensionality reduction (PCA), the results of the prediction model are slightly better, by a few percent. As for prediction time, there was marginal to no difference when comparing the prediction time with and without PCA. Conclusions: Overall, there are very small differences between the results with and without PCA, but in essence it is better not to use PCA and instead apply the original data to the ML models. The models' performance is generally very dependent on the data being applied, its initial preprocessing steps, its size and its structure, with calculation time being affected the most.
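
A minimal sketch of the kind of pipeline the study compares – K-Means-derived cluster features fed to XGBoost, with PCA as an optional first step – on placeholder data; the features, labels and hyperparameters below are assumptions, not the thesis' configuration:

# Hypothetical sketch: cluster numeric log-event features with K-Means, append
# the cluster id as a feature, then classify events with XGBoost.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))             # assumed numeric features from parsed log events
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # placeholder labels

use_pca = False                            # the study found results slightly better without PCA
if use_pca:
    X = PCA(n_components=10).fit_transform(X)

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

X_train, X_test, y_train, y_test = train_test_split(X_aug, y, test_size=0.3, random_state=0)
model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))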
8

From Log-Data to Regressive Machine Learning Models for Predictive Maintenance : A case study

van Dam, Lucas Christiaan January 2022
There are three ways to deal with component failure: reactive maintenance, preventive maintenance, and predictive maintenance. Reactive maintenance is to repair only once something breaks. Preventive maintenance is to repair before it breaks, independent of actual wear. Predictive maintenance is performed on the basis of real-time operational data, repairing when components cross a certain degradation threshold. With classification models one can determine the health state of a component. Regression models, on the other hand, allow the user to calculate a more precise estimate of remaining useful life. Previous research on regression models has exclusively used sensor data, while classification models have used both sensor data and log-data. Research on predictive maintenance using regression models has found most success using SVM regression, decision trees, random forest regression, artificial neural networks and LSTM models. Companies have more and more data at their disposal about the performance of their machines, but usually in the form of log-data. The goal of this research is to find out whether it is possible to use log-data for regression models. If this is the case, more sophisticated regression models can be used to apply predictive maintenance more accurately and on a broader scale than is currently the case. The project was performed through a case study at a company in the semiconductor industry in the Netherlands, with years of log-data from products that gradually degrade over time. After quantifying the log-data and trying many different regression models in combination with different time scales, the results were uniformly abysmal and no decent prediction could be made. The reason for this, according to several experts in the field of data science, is that there was no in-depth understanding of the data. They say it is necessary to have an integral understanding of the log-data and to collaborate closely with field engineers who know the data inside and out. If a field engineer can say something about the degradation of a machine using only the log-data, a machine learning model can do it too. If a machine learning model is unable to purposefully overfit on the training data and the results are bad, there is no signal in the dataset and the task is impossible. It does not matter whether the data was originally sensor-based or log-based; the only thing that matters is understanding what the data means and whether the degradation signal is present within it.
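
A small sketch of the "purposeful overfitting" sanity check described above, on placeholder log-derived features and a hypothetical remaining-useful-life target (scikit-learn assumed; this is not the case study's data or model):

# Hypothetical sketch: if even an unconstrained model cannot fit its own
# training data, the quantified log features likely carry no usable signal.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.poisson(lam=3, size=(300, 15)).astype(float)   # assumed: counts of 15 log event types per time window
y_rul = rng.uniform(0, 1000, size=300)                 # placeholder remaining-useful-life target

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y_rul)
train_r2 = model.score(X, y_rul)   # R^2 on the training data itself

# A deep random forest can memorize almost anything; a low training R^2 here
# would suggest the features cannot express the target at all.
print("training R^2:", round(train_r2, 3))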
9

Supervised Failure Diagnosis of Clustered Logs from Microservice Tests / Övervakad feldiagnos av klustrade loggar från tester på mikrotjänster

Strömdahl, Amanda January 2023
Pinpointing the source of a software failure based on log files can be a time-consuming process. Automated log analysis tools are meant to streamline such processes and can be used for tasks like failure diagnosis. This thesis evaluates three supervised models for failure diagnosis of clustered log data. The goal of the thesis is to compare the performance of the models on industry data, as a way to investigate whether the chosen ML techniques are suitable in the context of automated log analysis. A Random Forest, an SVM and an MLP are generated from a dataset of 194 failed executions of tests on microservices, each of which resulted in a large collection of logs. The models are tuned with random search and compared in terms of precision, recall, F1-score, hold-out accuracy and 5-fold cross-validation accuracy. The hold-out accuracy is calculated as a mean over 50 hold-out data splits, and the cross-validation accuracy is computed separately from a single set of folds. The results show that the Random Forest scores highest in terms of mean hold-out accuracy (90%), compared to the SVM (86%) and the Neural Network (85%). The mean cross-validation accuracy is highest for the SVM (95%), closely followed by the Random Forest (94%), and lastly the Neural Network (85%). The precision, recall and F1-scores are stable and consistent with the hold-out results, although the precision results are slightly higher than the other two measures. According to this evaluation, the Random Forest has the overall highest performance on the dataset when considering the hold-out and cross-validation accuracies, together with the fact that it has the lowest complexity and thus the shortest training time compared to the other considered solutions. All in all, the results of the thesis demonstrate that supervised learning is a promising approach to automating log analysis.
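
An illustrative sketch of the evaluation protocol this abstract describes – random-search tuning, a mean over repeated hold-out splits, and a separate 5-fold cross-validation – using synthetic placeholder data rather than the 194-execution industry dataset, with scikit-learn as an assumed toolkit:

# Hypothetical sketch: tune a Random Forest with random search, then report
# mean hold-out accuracy over 50 splits and a separate 5-fold CV accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit, cross_val_score

X, y = make_classification(n_samples=194, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)   # stand-in for clustered log features

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]},
    n_iter=5, cv=5, random_state=0,
)
search.fit(X, y)
best = search.best_estimator_

holdout = cross_val_score(best, X, y, cv=ShuffleSplit(n_splits=50, test_size=0.2, random_state=0))
cv5 = cross_val_score(best, X, y, cv=5)
print("mean hold-out accuracy:", holdout.mean())
print("5-fold CV accuracy:", cv5.mean())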
