  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Omni SCADA intrusion detection

Gao, Jun 11 May 2020
We investigate deep-learning-based omni intrusion detection systems (IDSs) for supervisory control and data acquisition (SCADA) networks that are capable of detecting both temporally uncorrelated and correlated attacks. Among the IDSs developed here, a feedforward neural network (FNN) detects temporally uncorrelated attacks with an F1 of 99.967±0.005% but correlated attacks with an F1 as low as 58±2%. In contrast, long short-term memory (LSTM) detects correlated attacks at 99.56±0.01% and uncorrelated attacks at 99.3±0.1%. Combining LSTM and FNN through an ensemble approach further improves IDS performance, with an F1 of 99.68±0.04% regardless of the temporal correlations among the data packets. / Graduate
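The abstract does not say how the LSTM and FNN outputs are combined. As one hedged illustration only, a soft-voting ensemble averages the per-packet attack probabilities of the two detectors and thresholds the mean, so each model can cover the attack type the other misses. The function name `soft_vote`, the equal default weights and the 0.5 threshold are assumptions, not details from the thesis.

```python
from typing import Sequence


def soft_vote(p_fnn: Sequence[float], p_lstm: Sequence[float],
              w_fnn: float = 0.5, threshold: float = 0.5) -> list[int]:
    """Combine per-packet attack probabilities from two detectors.

    Hypothetical soft-voting scheme (not necessarily the thesis's method):
    a weighted mean of the FNN and LSTM probabilities, labelled 1 (attack)
    when it reaches the threshold.
    """
    if len(p_fnn) != len(p_lstm):
        raise ValueError("both detectors must score the same packets")
    w_lstm = 1.0 - w_fnn
    return [int(w_fnn * a + w_lstm * b >= threshold)
            for a, b in zip(p_fnn, p_lstm)]


# Invented scores: the FNN is confident on an uncorrelated attack (packet 0),
# the LSTM on a correlated one (packet 2); averaging keeps both detections.
print(soft_vote([0.99, 0.10, 0.40], [0.70, 0.05, 0.95]))  # [1, 0, 1]
```

In practice the weight `w_fnn` could be tuned on validation data to trade off the two detectors.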
102

Purging Sensitive Data in Logs Using Machine Learning

Ljus, Simon January 2020
This thesis investigates how to remove personal data from logs using machine learning when rule-based scripts are not enough and manual scanning is too laborious. Three types of machine learning models were created and compared: a word-level model using logistic regression, a word-level model using LSTM, and a sentence-level model also using LSTM. Data logs were cleaned and annotated using rule-based scripts, datasets from various countries, and dictionaries from various languages. The dataset created for the sentence-based model was imbalanced, so a light form of data augmentation was applied. A hyperparameter optimization library was used to find the best hyperparameter combination. The models learned the training and validation sets well but performed worse on the test set, which consisted of log data from a different server logging other types of data. / This degree project investigates whether it is possible to create a program that automatically identifies and removes personal data from data logs using machine learning. Understanding the meaning of certain words also requires context: the Swedish word "banan" can mean a banana that you eat or the track (bana) that you run on. Can a machine learning model use the preceding and following words in a sequence to determine more accurately whether a word is sensitive? The data appearing in the logs can include names, personal identity numbers, usernames, and email addresses. For a model to learn to recognize such data, pre-annotated data with the correct answers is required. Telephone numbers, personal identity numbers, and email addresses follow fixed formats and do not necessarily require machine learning to be identified. Is it possible to create a general model that works on several types of data logs without using rule-based algorithms?
The results show that the annotated data used for training may have differed too much from the logs used for testing (unseen data), which means that the model does not generalize well.
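The abstract notes that email addresses and personal identity numbers follow fixed formats and need no machine learning. A rule-based pass like the following sketch could redact them before a learned model handles context-dependent items such as names. Both patterns are simplified assumptions (the personnummer pattern accepts YYMMDD-NNNN or YYYYMMDD-NNNN without validating the date or check digit), and `redact` is a hypothetical helper, not the thesis's script.

```python
import re

# Simplified, hypothetical patterns; real logs would need stricter checks
# (e.g. valid personnummer dates and the Luhn check digit).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Swedish personnummer: YYMMDD-NNNN or YYYYMMDD-NNNN, separator optional.
    "PNR": re.compile(r"\b(?:\d{2})?\d{6}[-+]?\d{4}\b"),
}


def redact(line: str) -> str:
    """Replace every match of each pattern with a placeholder tag."""
    for tag, pattern in PATTERNS.items():
        line = pattern.sub(f"<{tag}>", line)
    return line


print(redact("login ok user=anna.svensson@example.se pnr=19850312-1234"))
# login ok user=<EMAIL> pnr=<PNR>
```

Note that the loose PNR pattern would also match any other 10- or 12-digit number, which is exactly the kind of false positive that motivates combining rules with a context-aware model.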
103

DEFENDING BERT AGAINST MISSPELLINGS

Nivedita Nighojkar (8063438) 06 April 2021
Defending models against adversarial attacks in Natural Language Processing (NLP) is challenging because of the discrete nature of text. However, given the variety of NLP applications, it is important to make text processing models more robust and secure. This work aims to develop techniques that help text processing models such as BERT withstand adversarial samples that contain misspellings. The resulting models are more robust than off-the-shelf spelling checkers.
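The abstract does not describe how the misspelled adversarial samples are generated. One common way to produce such samples for training or evaluation is to apply single character-level edits to words; the helpers below are a hypothetical sketch of that idea (adjacent swaps and deletions only), not the technique used in the thesis. The word-length cutoff of 4 and the perturbation rate are assumptions.

```python
import random


def adjacent_swaps(word: str) -> list[str]:
    """All variants obtained by swapping two different neighbouring characters."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)
            if word[i] != word[i + 1]]


def deletions(word: str) -> list[str]:
    """All variants obtained by dropping one character."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def misspell_sentence(sentence: str, rate: float, rng: random.Random) -> str:
    """With probability `rate`, replace each word longer than three
    characters by a random adjacent swap or deletion of it."""
    out = []
    for word in sentence.split():
        if len(word) > 3 and rng.random() < rate:
            word = rng.choice(adjacent_swaps(word) + deletions(word))
        out.append(word)
    return " ".join(out)


print(adjacent_swaps("cat"))  # ['act', 'cta']
print(misspell_sentence("the movie was absolutely wonderful", 0.5, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the perturbations reproducible across experiments.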
104

Předpovědi spotřebitelského chování v eshopech / Predicting purchasing intent on ecommerce websites

Vařeka, Marek January 2020
This thesis analyzes the behavior of customers on an e-commerce website in order to predict whether a customer is willing to buy something or is just window shopping. In addition, a secondary model predicts whether the customer is going to leave the website within the next few clicks. To answer these questions, different frameworks are tested. The baseline is a logit model, which is compared with more sophisticated machine learning methods, namely neural networks. The best results were obtained with a recurrent neural network, the Long Short-Term Memory (LSTM). The results confirm the importance of clickstream data and of computed features that track user behavior on the website: page type (product, category, information), product variance, and category variance. The thesis emphasizes the practical implications of these models, and two possible practical implementations are presented. The models are tested in novel ways to see how they would perform if deployed on a real e-commerce website.
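As a rough illustration of the logit baseline, a session's clickstream can be summarized into features and fed to a logistic regression. The sketch below assumes sessions are lists of `(page_type, item_id)` clicks, reads "product variance" and "category variance" as the number of distinct products and categories viewed, and fits the model by plain batch gradient descent; all of these, and the toy sessions, are assumptions rather than the thesis's actual features or estimation procedure.

```python
import math

PAGE_TYPES = ("product", "category", "information")


def session_features(clicks):
    """Turn a session of (page_type, item_id) clicks into a feature vector.

    Hypothetical features: clicks per page type, then the number of distinct
    products and distinct categories viewed (a stand-in for "variance").
    """
    counts = [sum(1 for t, _ in clicks if t == p) for p in PAGE_TYPES]
    n_products = len({i for t, i in clicks if t == "product"})
    n_categories = len({i for t, i in clicks if t == "category"})
    return [float(v) for v in counts + [n_products, n_categories]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Purchase probability under a logit model; the bias is w[-1]."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logit(X, y, lr=0.1, epochs=2000):
    """Fit logit weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Toy sessions (invented): two ending in a purchase, two window-shopping.
sessions = [
    [("product", "p1"), ("product", "p2"), ("product", "p1"), ("category", "c1")],
    [("product", "p3"), ("product", "p3"), ("product", "p4")],
    [("information", "i1"), ("category", "c1"), ("category", "c2")],
    [("information", "i1"), ("information", "i2")],
]
bought = [1, 1, 0, 0]
X = [session_features(s) for s in sessions]
w = fit_logit(X, bought)
```

An LSTM, by contrast, would consume the click sequence directly instead of these aggregated counts, which is one plausible reason it can exploit ordering information the logit model discards.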
105

Multivariate Time Series Prediction for DevOps : A first Step to Fault Prediction of the CI Infrastructure

Wang, Yiran January 2022
Continuous integration (CI) infrastructure, i.e. CI servers, is commonly used as a shared test environment because software products have grown in scale and complexity in recent years and require collaborative, distributed development. To ensure the stability of CI servers, software development companies are highly interested in fault prediction based on the measurement data that the servers continuously record. However, the lack of fault data is a typical obstacle to learning fault patterns directly. As an alternative, predicting the standard observations that represent the normal behavior of the CI servers can be viewed as an initial step toward fault prediction: once enough fault data is available in the future, faults can be identified and predicted by studying the difference between observed data and predicted standard data. In this thesis, a long short-term memory (LSTM) model, a bidirectional LSTM (BiLSTM) model, and a vector autoregressive (VAR) model are developed. The models are compared on both one-step-ahead prediction and iterative long-range prediction of up to 60 steps (corresponding to 15 minutes for the CI servers analyzed in the thesis). To account for uncertainty in the predictions, the LSTM-based models are trained to estimate predictive variance, and the resulting prediction intervals are compared with those of the VAR model. Moreover, since the CI infrastructure contains many servers, it is of interest to investigate whether a model trained on one server can represent other servers. This is investigated by applying the one-step-ahead LSTM model to a set of other servers and comparing the results. The LSTM model performs best overall, although only slightly better than the VAR model, whereas the BiLSTM model performs worst in one-step-ahead prediction. When uncertainty is taken into account, the LSTM model appears to estimate the assumed distribution best, achieving the highest log-likelihood.
For long-range prediction, the VAR model surprisingly performs best across almost all horizon lengths. Lastly, when the one-step-ahead LSTM model is applied to the other servers, performance varies from server to server, which indicates that a single model applied to all servers is unlikely to achieve competitive performance.
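The abstract contrasts one-step-ahead prediction with iterative long-range prediction, in which each forecast is fed back as the input for the next step, and scores predictive variance by log-likelihood. The sketch below shows both ideas for a VAR(1) model with hand-picked coefficients; the lag order of 1, the coefficient values and the two-metric system are assumptions for illustration, whereas the thesis fits its models to real server measurements. Since 60 steps correspond to 15 minutes, one step here stands for 15 seconds.

```python
import math


def var1_step(A, c, y):
    """One VAR(1) step: y_next = c + A @ y, using plain Python lists."""
    return [ci + sum(aij * yj for aij, yj in zip(row, y)) for ci, row in zip(c, A)]


def iterate_forecast(A, c, y0, steps):
    """Iterative multi-step forecast: each prediction becomes the next input."""
    preds, y = [], y0
    for _ in range(steps):
        y = var1_step(A, c, y)
        preds.append(y)
    return preds


def gaussian_log_likelihood(x, mean, var):
    """Log-density of x under N(mean, var); higher means a better fit."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical 2-metric system (e.g. CPU load and memory use), 15 s per step.
A = [[0.5, 0.1], [0.0, 0.8]]
c = [1.0, 0.5]
path = iterate_forecast(A, c, [2.0, 3.0], steps=60)  # 60 steps = 15 minutes
print(var1_step(A, c, [2.0, 3.0]))  # approximately [2.3, 2.9]
```

Because the eigenvalues of `A` are below 1, the iterated forecast converges to the fixed point (I - A)^-1 c = [2.5, 2.5]; errors in early steps also propagate through the recursion, which is why long-range accuracy can rank models differently from one-step accuracy.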
106

Parasitic Tracking Mobile Wireless Networks / Parasitisk spårning av mobila trådlösa nätverk

Xu, Bowen January 2021
Along with the growth and popularity of mobile networks, users enjoy more convenient connection and communication. However, exposure of user presence in mobile networks has become a major concern, and a plethora of Location Privacy Protection Mechanisms (LPPMs) have been proposed and analysed, notably considering powerful adversaries with rich data at their disposal, e.g., mobile network service providers or Location Based Services (LBS). In this thesis, we consider a complementary challenge: exposure of users to their peers or other nearby devices. In other words, we are concerned with devices in the vicinity that happen to eavesdrop on (or learn, in the course of a peer-to-peer protocol execution) MAC/IP addresses or Bluetooth device names and use them to link user activities over a large area (e.g., a city), especially when a small subset of mobile network devices parasitically log such encounters, even if scattered in space and time, and collaboratively breach user privacy. The eavesdroppers can be honest-but-curious network infrastructure such as wireless routers and base stations, or adversaries equipped with Bluetooth or WiFi sniffers. The goal of this thesis is to simulate location privacy attacks against mobile networks and measure location privacy exposure under these attacks. We consider adversaries with varying capabilities, e.g., the number of eavesdroppers deployable in the network and their coverage, and evaluate the effect of such capabilities on the privacy exposure of mobile users. We evaluate privacy exposure with two metrics: Exposure Degree and Average Displacement Error (ADE). We use Exposure Degree as a preliminary metric for the general coverage of the deployed eavesdroppers in the considered area. ADE measures the average distance between a user’s actual trace points and the predicted trajectory. We simulate three attack cases in our scheme.
In the first case, we assume the attacker only acquires the data collected from users. We vary the number of receivers to test attack capacity, and Exposure Degree is used to evaluate location privacy. For the second and third cases, we assume the attacker also has some knowledge of users’ historical traces and can therefore use machine learning models to predict users’ traces. We leverage a Long Short-Term Memory (LSTM) neural network and a Hidden Markov Model (HMM) for real-time prediction, and a Heuristic LSTM to reconstruct more precise user trajectories. ADE is used to evaluate the degree of location privacy exposure in these cases. The experimental results show that LSTM performs better than HMM on trace prediction in our scheme. A higher number of eavesdroppers decreases the ADE of the LSTM model (i.e., increases user location privacy exposure). Increasing the receivers’ communication range can decrease ADE, but ADE rises again if the range keeps increasing. The Heuristic LSTM model outperforms LSTM at breaching user location privacy when the attacker reconstructs more precise user trajectories from the incomplete observed trace sequence.
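ADE is defined in the abstract as the average distance between a user's actual trace points and the predicted trajectory. A minimal sketch, assuming planar (x, y) coordinates in metres and exactly one prediction per actual point (latitude/longitude data would need a geodesic distance instead):

```python
import math
from typing import Sequence

Point = tuple[float, float]


def average_displacement_error(actual: Sequence[Point],
                               predicted: Sequence[Point]) -> float:
    """Mean Euclidean distance between paired actual and predicted points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equally long, non-empty trajectories")
    return sum(math.dist(a, p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical trace: the prediction is exact at the first point and 5 m off
# at the second, so ADE = (0 + 5) / 2.
print(average_displacement_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))  # 2.5
```

From the attacker's perspective a lower ADE means a more accurate reconstruction, i.e. greater privacy exposure for the user.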
107

HYBRID DATA-DRIVEN AND PHYSICS-BASED FLIGHT TRAJECTORY PREDICTION IN TERMINAL AIRSPACE

Hansoo Kim (10727661) 30 April 2021
With the growing demand for air traffic, it is more critical than ever to develop advanced techniques to control and monitor air traffic safely and efficiently. In particular, trajectory prediction can play a significant role in improving safety and efficiency, because predicted trajectory information is used in air traffic management tasks such as conflict detection and resolution, sequencing, and scheduling. In this work, we propose a new framework, called hybrid data-driven and physics-based trajectory prediction, that integrates the two approaches. The proposed algorithm is applied to real air traffic surveillance data to demonstrate its performance.
108

Sentiment Analysis of YouTube Public Videos based on their Comments

Kvedaraite, Indre January 2021
With the rise of social media and publicly available data, opinion mining is more accessible than ever. It is valuable for content creators, companies, and advertisers to gain insight into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to automatically determine their sentiment. Four Long Short-Term Memory-based models are trained and evaluated. Experiments are performed to determine which model achieves the best accuracy, recall, precision, F1 score, and ROC curve on a labelled YouTube comment dataset. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. Furthermore, the four LSTM-based models are evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, which shows that the models can predict the sentiment of other kinds of text. Finally, a statistical analysis of the YouTube videos reveals that videos with positive sentiment have a significantly higher number of upvotes and views. However, videos with negative sentiment do not have a significantly higher number of downvotes.
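The models are compared on accuracy, precision, recall, F1 and the ROC curve. For a binary setting (1 = positive sentiment, which is an assumption; the label scheme is not stated here), these can be computed as in the sketch below, where the ROC curve is summarized by its area (AUC) using the equivalent pairwise-ranking formulation. This is a generic illustration, not the thesis's evaluation code, and the example labels and scores are invented.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive
    example is scored above a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])["accuracy"])  # 0.6
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise AUC is quadratic in the number of examples; for large test sets a sort-based computation would be preferable.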
109

Forecasting conflict using RNNs

Hellman, Simon January 2021
The rise of machine learning has made it attractive for new types of applications. This Master's thesis implements and evaluates an LSTM-based algorithm on the conflict forecasting problem. Data are structured as country-month pairs, with information about conflict, economy, demography, democracy, and unrest. The goal is to forecast the probability of at least one conflict event in a country based on a window of historical information. Results show that the model performs worse than a Random Forest. There are also indications of insufficient data: the network has difficulty performing consistently, and the learning curves do not flatten. Naive models perform surprisingly well. The conclusion is that the problem needs some restructuring for the model to improve on naive approaches. To support this endeavour, possible paths for future work have been identified.
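The forecasting setup above (country-month data, a window of history, a binary "at least one conflict event" target) can be sketched as follows for a single country: `make_windows` builds (history, label) training pairs, and `persistence_baseline` is one example of a naive model that predicts conflict next month if the last month in the window had an event. The window length, the one-month horizon and the persistence rule are assumptions; the abstract does not specify which naive models were used.

```python
def make_windows(features, events, window, horizon=1):
    """Build (history, label) pairs from one country's monthly series.

    features: per-month feature vectors (economy, demography, ...).
    events:   per-month conflict-event counts.
    The label is 1 if at least one event occurs `horizon` months after
    the end of the window.
    """
    pairs = []
    for end in range(window, len(events) - horizon + 1):
        history = features[end - window:end]
        label = int(events[end + horizon - 1] > 0)
        pairs.append((history, label))
    return pairs


def persistence_baseline(events, window, horizon=1):
    """Naive forecast aligned with make_windows: predict conflict if the
    last month of the window had at least one event."""
    return [int(events[end - 1] > 0)
            for end in range(window, len(events) - horizon + 1)]


# Invented monthly event counts for one country.
events = [0, 2, 0, 0, 1, 3]
features = [[float(m)] for m in range(len(events))]
pairs = make_windows(features, events, window=2)
print([label for _, label in pairs])          # [0, 0, 1, 1]
print(persistence_baseline(events, window=2))  # [1, 0, 0, 1]
```

Because conflict tends to persist from month to month, a rule like this can be hard to beat, which is consistent with the abstract's observation that naive models perform surprisingly well.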
110

Forecasting alarms using machine learning : Predicting tall oil production at Södra Cell

Korsbakke, Andreas, Lidmark, Joel January 2021
Background. Tall oil is an important byproduct produced at Södra Cell's facility in Mörrum. Its production process is monitored by a large system of interconnected sensors that record data continuously. Currently, these systems are operated under manual control without any guidance from data-driven analysis. Therefore, we propose an integrated alarm detection system based on the sensor data. Objectives. This study investigates the possibility of using a data-driven analysis system to detect decreases in the target variable. Three different approaches are investigated and evaluated on their performance to understand how they can be used to improve the production process by predicting changes in the target value. Methods. Three quasi-experiments are conducted to understand how well different machine learning methods can predict, and be used in, the tall oil production process. Each experiment is executed independently, with its own setup. Results. Of the three machine learning methods tested, the neural network performed best, while the two methods that observe historical data trends appear to have problems with this specific data set. Conclusions. From this research, it can be concluded that a neural network algorithm can accurately predict changes in the chemical production process. Multiple other machine learning algorithms could further be used to improve production at Södra Cell.
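Detecting decreases in the target variable can be framed as supervised learning by labelling each time step as an alarm when the target falls by more than some threshold within a look-ahead horizon; a classifier can then be trained to predict these labels from the sensor readings. The helper below is a hypothetical sketch of that labelling step; the relative-drop threshold, the horizon and the assumption of a positive-valued target are illustrative choices, not details from the study.

```python
def label_drops(target, horizon, rel_drop):
    """Mark step t as an alarm (1) if, within the next `horizon` steps, the
    target falls more than `rel_drop` (a fraction) below its value at t.

    Assumes a positive-valued target. The last `horizon` steps lack a full
    look-ahead window, so they are left unlabelled and omitted.
    """
    labels = []
    for t in range(len(target) - horizon):
        future_min = min(target[t + 1:t + 1 + horizon])
        labels.append(int(future_min < target[t] * (1 - rel_drop)))
    return labels


# Invented production values: a drop from ~100 to 80 shortly after step 1.
print(label_drops([100, 100, 98, 80, 85, 90], horizon=2, rel_drop=0.1))  # [0, 1, 1, 0]
```

The choice of `horizon` sets how much advance warning an alarm gives operators, while `rel_drop` controls how severe a decrease must be to count as an alarm.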
