Global ETD Search

1	Anomalidetektering i loggar med förstärkt inlärning / Anomaly detection in log files with reinforcement learning Lantz, Sofia January 2021 Using machine learning to monitor log data and find deviations makes work easier for developers and can prevent a workflow from stopping. The goal of this project is to investigate whether anomalies in log data can be found using reinforcement learning. An anomaly detection model based on reinforcement learning is compared with a machine learning method traditionally used for anomaly detection. The results show that reinforcement learning has the potential to achieve results similar to or better than the traditional machine learning method. Machine learning Anomaly detection Reinforcement learning Log analysis Computer Sciences
2	HOTDETEKTERING MED LOGGANALYS : En jämförelse mellan Graylog och ManageEngine Eventlog Analyzer Ahmed, Said Hassan January 2023 In today's digital society, companies depend on the internet to deliver services to their customers. As a result, they risk being exposed to cyberattacks that can threaten their operations. Threat detection with log analysis involves collecting and analyzing log data to identify anomalies that indicate potential threats. This thesis compares two of the most popular log analysis tools to determine which of them is most suitable for threat detection. The tools Graylog and ManageEngine Eventlog Analyzer are compared in terms of which threats they can detect, how complex they are, how much load they place on the host, and what they cost. The purpose of this thesis is to facilitate the choice of a log analysis tool for threat detection. The work consists of a literature study and a lab experiment. To evaluate the tools, the literature study gathered information on which threats the tools can detect, what they cost, and what resources they require. The lab experiment evaluated the complexity of the tools as well as the load they place on the host's CPU and RAM. The results show that ManageEngine Eventlog Analyzer is the most suitable log analysis tool for threat detection because it detects more threats than Graylog while being less complex to use thanks to the available technical support. Threat detection Log analysis Graylog ManageEngine Eventlog Analyzer Computer and Information Sciences Electrical Engineering and Electronics
3	Discover patterns within train log data using unsupervised learning and network analysis Guo, Zehua January 2022 With the development of information technology in recent years, log analysis has gradually become a hot research topic. However, manual log analysis requires specialized knowledge and is time-consuming. Therefore, more and more researchers are searching for ways to automate log analysis. In this project, we explore methods for train log analysis using natural language processing and unsupervised machine learning. Multiple language models are used to extract word embeddings: one is the traditional TF-IDF model, and the other three are the popular transformer-based model BERT and its variants DistilBERT and RoBERTa. In addition, we compare two unsupervised clustering algorithms, DBSCAN and Mini-Batch k-means. The silhouette coefficient and the Davies-Bouldin score are used to evaluate clustering performance. Moreover, the metadata of the train logs is used to verify the effectiveness of the unsupervised methods. Apart from unsupervised learning, network analysis is applied to the train log data in order to explore the connections between the patterns, which are identified by train control system experts. Network visualization and centrality analysis are used to analyze the relationships between the patterns and their importance in graph-theoretic terms. In general, this project provides a feasible direction for conducting log analysis and processing in the future. Log analysis Natural language processing Unsupervised learning Clustering Network analysis Computer and Information Sciences
4	Predicting user churn using temporal information : Early detection of churning users with machine learning using log-level data from a MedTech application / Förutsägning av användaravhopp med tidsinformation : Tidig identifiering av avhoppande användare med maskininlärning utifrån systemloggar från en medicinteknisk produkt Marcus, Love January 2023 User retention is a critical aspect of any business or service. Churn is the continuous loss of active users. A low churn rate enables companies to focus more resources on providing better services rather than on recruiting new users. Currently published research on predicting user churn disregards the time of day and the time variability of events and actions, either through feature selection or through data preprocessing. This thesis empirically investigates the practical benefits of including accurate temporal information for binary prediction of user churn by training a set of Machine Learning (ML) classifiers on differently prepared data. One data preparation approach was based on temporally sorted logs (log-level data set), and the other on stacked aggregations (aggregated data set) with additional engineered temporal features. The additional temporal features included information about relative time, time of day, and temporal variability. The inclusion of the temporal information was evaluated by training and evaluating the classifiers with the different features on a real-world dataset from a MedTech application. Artificial Neural Networks (ANNs), Random Forests (RFs), Decision Trees (DTs) and naïve approaches were applied and benchmarked. The classifiers were compared using, among other metrics, the Area Under the Receiver Operating Characteristics Curve (AUC), Positive Predictive Value (PPV) and True Positive Rate (TPR) (a.k.a. precision and recall). The PPV scores the classifiers by their accuracy among the instances labeled as positive, the TPR measures the proportion of the positive class that is recognized, and the AUC is a metric of general performance. The results demonstrate a statistically significant value of including time variation features overall and, in particular, that the classifiers performed better on the log-level data set. An ANN trained on temporally sorted logs performs best, followed by an RF on the same data set. User churn Customer attrition Artificial neural networks Log-level analysis Random forests Decision trees Computer and Information Sciences
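The PPV and TPR definitions given in the abstract above can be illustrated with a small worked example. This is only a sketch under stated assumptions: the function name `ppv_tpr` and the label lists are hypothetical and are not taken from the thesis, and 1 is assumed to mark the positive (churning) class.

```python
def ppv_tpr(y_true, y_pred):
    """Return (PPV, TPR) for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # PPV (precision): share of predicted positives that are truly positive.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # TPR (recall): share of actual positives that were recognized.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, tpr


# Hypothetical labels for eight users (illustrative only, not thesis data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
ppv, tpr = ppv_tpr(y_true, y_pred)
print(f"PPV={ppv:.2f} TPR={tpr:.2f}")
```

With these made-up labels there are three true positives, two false positives and one false negative, so the two metrics diverge: a classifier can recognize most churners while still flagging several users who stay.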
5	Integrating Telecommunications-Specific Language Models into a Trouble Report Retrieval Approach / Integrering av telekommunikationsspecifika språkmodeller i en metod för hämtning av problemrapporter Bosch, Nathan January 2022 In the development of large telecommunications systems, it is imperative to identify, report, analyze and, thereafter, resolve both software and hardware faults. This resolution process often relies on written trouble reports (TRs), which contain information about the observed fault and, after analysis, information about why the fault occurred and the decision on how to resolve it. Due to the scale and number of TRs, a newly reported fault may be very similar to previously reported faults, e.g., a duplicate fault. In this scenario, it can be beneficial to retrieve similar, previously created TRs to aid the resolution process. Previous work at Ericsson [1] introduced a multi-stage BERT-based approach to retrieve similar TRs given a newly written fault observation. This approach significantly outperformed simpler models like BM25, but suffered from two major challenges: 1) it did not leverage the vast non-task-specific telecommunications data at Ericsson, something that had seen success in other work [2], and 2) the model did not generalize effectively to TRs outside of the telecommunications domain it was trained on. In this thesis, we 1) investigate three different transfer learning strategies to attain stronger performance on a downstream TR duplicate retrieval task, notably focusing on effectively integrating existing telecommunications-specific language data into the model fine-tuning process, 2) investigate the efficacy of catastrophic forgetting mitigation strategies when fine-tuning the BERT models, and 3) identify how well the models perform on out-of-domain TR data. We find that integrating existing telecommunications knowledge, in the form of a pretrained telecommunications-specific language model, into our fine-tuning strategies allows us to outperform a domain adaptation fine-tuning strategy. In addition, we find that Elastic Weight Consolidation (EWC) is an effective strategy for mitigating catastrophic forgetting and attaining strong downstream performance on the duplicate TR retrieval task. Finally, we find that the models generalize well enough to perform reasonably effectively on out-of-domain TR data, indicating that the approaches may be suitable for real-world deployment. Information retrieval Neural ranking Trouble reports Log analysis Natural language processing Computer and Information Sciences
6	Anomaly Detection in Telecom Service Provider Network Infrastructure Security Logs using an LSTM Autoencoder : Leveraging Time Series Patterns for Improved Anomaly Detection / Avvikelsedetektering i säkerhetsloggar för nätverksinfrastruktur hos en telekomtjänstleverantör med en LSTM Autoencoder : Uttnyttjande av tidsseriemönster för förbättrad avvikelsedetektering Vlk, Vendela January 2024 New regulations are being placed on Swedish Telecom Service Providers (TSPs) due to a rising concern for safeguarding network security and privacy in the face of ever-evolving cyber threats. These regulations demand that Swedish telecom companies expand their data security strategies with proactive security measures. Logs, serving as digital footprints in IT infrastructure, play a crucial role in identifying anomalies that could indicate security breaches. Deep Learning (DL) has been used to detect anomalies in logs due to its ability to discern intricate patterns within the data. By leveraging deep learning-based models, it is possible not only to identify anomalies but also to predict and mitigate potential threats within the telecom network. An LSTM autoencoder was implemented to detect anomalies in two separate multivariate temporal log datasets: the BETH cybersecurity dataset, and a Cisco log dataset that was created specifically for this thesis. The empirical results in this thesis show that the LSTM autoencoder reached an ROC AUC of 99.5% for the BETH dataset and 76.6% for the Cisco audit dataset. The use of an additional anomaly detection aid in the Cisco audit dataset let the model reach an ROC AUC of 99.6%. The conclusion drawn from this work was that the systematic approach to developing a deep learning model for anomaly detection in log data was efficient. However, the study's findings raise crucial considerations regarding the appropriateness of various log data for deep learning models used in anomaly detection. Anomaly detection Deep Learning LSTM Autoencoder Time series Log analysis Computer Sciences Computer Engineering
7	How to Estimate Local Performance using Machine learning Engineering (HELP ME) : from log files to support guidance / Att estimera lokal prestanda med hjälp av maskininlärning Ekinge, Hugo January 2023 As modern systems become increasingly complex, they also become more cumbersome to diagnose and fix when things go wrong. One domain where it is very important for machinery and equipment to stay functional is medical IT, where technology is used to improve healthcare for people all over the world. This thesis aims to help reduce downtime on critical life-saving equipment by implementing automatic analysis of system logs that, without any domain experts involved, can give an indication of the state the system is in. First, a literature study was performed in which three candidate neural network architectures were identified. Next, the networks were implemented and a data pipeline for collecting and labeling training data was set up. After the networks were trained and tested on a separate data set, the best performing model of the three was the one based on GRU (Gated Recurrent Unit). Lastly, this model was tested on real-world system logs from two different sites, one without known issues and one with slow image import due to network issues. The results showed that it is feasible to build a system that can give indications of external parameters such as network speed, latency and packet loss percentage using only raw system logs as input data. The three models tested were GRU, 1D-CNN (1-Dimensional Convolutional Neural Network) and a Transformer encoder, and the best performing model was shown to produce correct patterns even on the real-world system logs. Machine learning GRU 1D-CNN Transformer Log analysis Parameter estimation Regression Performance monitoring Deep learning Troubleshooting Support Computer Sciences
8	Om informationstekniskt bevis Ekfeldt, Jonas January 2016 Information technology evidence consists of a mix of representations of various applications of digital electronic equipment, and can be brought to the fore in all contexts that result in legal decisions. The occurrence of such evidence in legal proceedings, and in other legal decision-making, is a phenomenon not previously researched within legal science in Sweden. The thesis examines some of the consequences of the occurrence of information technology evidence within Swedish practical legal and judicial decision-making. The thesis has three main focal points. The first consists of a broad identification of the legal problems that information technology evidence entails. The second examines the legal terminology associated with information technology evidence. The third consists of identifying sources of error pertaining to information technology evidence from the adjudicator's point of view. The examination takes a Swedish legal viewpoint, from the perspective of public trust in the courts. The conclusions include a number of legal problems in several areas, primarily with regard to the knowledge of the adjudicator, the qualification of different means of evidence, and the consequences of representational evidence for its evaluation. In order to properly evaluate information technology evidence, judges need, to a greater extent than for other types of evidence, (objective) knowledge supplementary to that provided by the parties and their witnesses and experts. Furthermore, the current Swedish evidence terminology has been identified as a complex of problems in and of itself. The thesis includes suggestions for certain additions to this terminology. 
Several sources of error have been identified as attributable to different procedures associated with the handling of information technology evidence, in particular in relation to computer forensic investigations. There is a general need for future research on both standards of proof for and evaluation of information technology evidence. In addition, a need for deeper legal scientific studies aimed at evidence theory has been identified, inter alia regarding the extent to which frequency theories are applicable to information technology evidence. A need for related further discussion of emerging areas such as negative evidence and predictive evidence is foreseen. accountability audit trail authenticity computer evidence criminal procedure data quality digital evidence digital images digital traces electronic evidence evaluation of evidence evidence law evidence terminology evidential value evidentiary facts forensic evidence handling of evidence ict-evidence informatics information quality information representation information security it-evidence law and information technology legal evidence log log analysis machine identity means of evidence personal identity reliability representation information screen dump sources of error traceability validity electronic traces files ICT information technology information technology evidence IT security principle of free evaluation of evidence correctness legal informatics
