Global ETD Search

1	Filtrace útoků na odepření služeb / Filtering of denial-of-service attacks Klimeš, Jan January 2019 This thesis deals with filtering selected distributed denial-of-service (DDoS) attacks. The theoretical part covers the general mechanisms used in DDoS attacks, defense mechanisms, and mechanisms for detection and filtration. The practical part deals with filtering attacks using the iptables firewall and the Suricata IPS on the Linux operating system in an experimental setup, using a network traffic generator to verify functionality and performance, and includes statistical processing of the output data from the filtering tools using the Elasticsearch database.
2	Anomaly Detection for Insider Threats : Comparative Evaluation of LSTM Autoencoders, Isolation Forest, and Elasticsearch on Two Datasets. / Anomalidetektion för interna hot : Utvärdering av LSTM-autoencoders, Isolation Forest och Elasticsearch på två dataset Fagerlund, Martin January 2024 Insider threat detection is one of cybersecurity’s most challenging and costly problems. Anomalous behaviour can take multiple shapes, which puts a great demand on the anomaly detection system. Significant research has been conducted in the area, but the existing experimental datasets’ absence of real data leaves uncertainty about the proposed systems’ realistic performance. This thesis introduces a new insider threat dataset consisting exclusively of events from real users. The dataset is used to evaluate the performance of various anomaly detection system techniques comparatively. Three anomaly detection techniques were evaluated: LSTM autoencoder, isolation forest, and Elasticsearch’s anomaly detection. The dataset’s properties inhibited any hyperparameter tuning of the LSTM autoencoders since the data lacks sufficient positive instances. Therefore, the architecture and hyperparameter settings are taken from the previously proposed research. The implemented anomaly detection models were also evaluated on the commonly used CERT v4.2 insider threat test dataset. The results show that the LSTM autoencoder provides better anomaly detection on the CERT v4.2 dataset regarding the accuracy, precision, recall, F1 score, and false positive rate compared to the other tested models. However, the investigated systems performed more similarly on the introduced dataset with real data. The LSTM autoencoder achieved the best recall, precision, and F1 score, the isolation forest showed almost as good F1 score with a lower false positive rate, and Elasticsearch’s anomaly detection reported the best accuracy and false positive rate. 
Additionally, the LSTM autoencoder generated the best ROC curve and precision-recall curve. While Elasticsearch’s anomaly detection showed promising results concerning the accuracy, it performed with low precision and was explicitly implemented to detect certain anomalies, which reduced its generalisability. In conclusion, the results show that the LSTM autoencoder is a feasible anomaly detection model for detecting abnormal behaviour in real user-behaviour logs. Secondly, Elasticsearch’s anomaly detection can be used but is better suited for less complex data analysis tasks. Further, the thesis analyzes the introduced dataset and problematizes its application. In the closing chapter, the study provides domains where further research should be conducted. / Insider threats are one of the most difficult and costly problems in cybersecurity. Anomalous behaviour can take many different forms, which places high demands on the systems that must detect it. Much research has been carried out in this area to provide powerful systems. Unfortunately, the existing datasets used in research lack real data, which makes the evaluation of the systems’ real-world capability uncertain. This report introduces a new dataset with data exclusively from real users. The dataset is used to analyze the performance of three anomaly detection systems: LSTM autoencoder, isolation forest, and Elasticsearch’s built-in anomaly detection. The dataset’s properties prevented hyperparameter tuning of the LSTM autoencoders because the dataset contains too few positive data points. The architecture and hyperparameter settings were therefore taken from previous research. The implemented models were also compared on the widely used CERT v4.2 dataset. The results on the CERT v4.2 dataset showed that the LSTM autoencoder provides better anomaly detection than the other models when measured by accuracy, precision, recall, F1 score, and false positive rate. 
When the models were tested on the introduced dataset, they performed more evenly. The LSTM autoencoder achieves the best recall, precision, and F1 score, while the isolation forest reached almost as high an F1 score but with a lower rate of false positive predictions. Elasticsearch’s anomaly detection achieved the highest accuracy with the lowest false positive rate, but with low precision compared with the other two models. Elasticsearch’s anomaly detection also had to be implemented in a way targeted more specifically at the anomalies it was meant to detect, which makes its range of application less general. In summary, the results show that LSTM autoencoders are an adequate option for detecting abnormalities in logs of events from real users. In addition, it is possible to a certain extent to use Elasticsearch’s anomaly detection for these purposes, but it is better suited to tasks of lower complexity. Beyond the models’ results, the dataset produced is analyzed, and some properties are identified that complicate its use and credibility. Finally, interesting related areas where further research should be done are specified. Anomaly Detection LSTM autoencoder Elasticsearch Anomalidetektion LSTM-autoencoder Elasticsearch Computer and Information Sciences Data- och informationsvetenskap
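The abstract of record 2 reports that one detector reached the best accuracy and false positive rate while still showing low precision. A minimal sketch of how the five reported metrics follow from a confusion matrix may help explain why that combination is possible on imbalanced data. The counts below are invented for illustration and are not taken from the thesis.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics named in the abstract from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical, heavily imbalanced example: 10 true anomalies among 1000 events.
metrics = classification_metrics(tp=5, fp=20, tn=970, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few positive events, the large number of true negatives dominates accuracy and keeps the false positive rate small, even though most of the raised alerts in this invented example are false alarms.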
3	Resource utilization comparison of Cassandra and Elasticsearch Selander, Nizar January 2019 Elasticsearch and Cassandra are two widely used databases today, with Elasticsearch showing a more recent resurgence due to its unique full-text search feature, akin to that of a search engine, in contrast to the conventional query-language-based methods used to perform data searching and retrieval operations. The demand for more powerful and better-performing yet more feature-rich and flexible databases has been ever growing. This project attempts to study how the two databases perform under a specific workload of 2,000,000 fixed-size logs and in an environment where the two can be compared while keeping the results of the experiment meaningful for the production environment for which they are intended. A total of three benchmarks were carried out: an Elasticsearch deployment using the default configuration and two Cassandra deployments, one with the default configuration and one with a modified configuration that reflects a configuration currently running in production for the task at hand. The benchmarks showed very interesting performance differences in terms of CPU, memory and disk space usage. Elasticsearch showed the best performance overall, using significantly less memory and disk space as well as CPU to some degree. However, the benchmarks were done with a very specific set of configurations and a very specific data set and workload. Those differences should be considered when comparing the benchmark results. Databases Benchmark Performance Kubernetes Cassandra Elasticsearch Computer Systems Datorsystem
4	Improving an open source geocoding service / 改进开源地理编码服务 Rooth, Anton January 2018 There are many map providers on the market today. Anyone who wishes to use a licensed map service in an application has to pay a license fee. This fee can become a big expense and affect the price that the end customer has to pay. This thesis has investigated how to set up and improve an open source geocoding service so that it can measure up against a licensed map service. Geocoding is the technique of taking an input address and returning a position that consists of a latitude and a longitude coordinate. The investigation has been done by implementing an open source solution as a proof of concept, with the goal of answering the question of to what extent it is feasible to develop an open source geocoding service that is as fast, accurate and complete as a licensed map service. The open source solution has been developed in collaboration with TaxiCaller Nordic AB. In the implementation, the Pelias project has been used as the geocoder, together with map data from the OpenStreetMap and Who’s On First datasets and Elasticsearch as the search engine. The work is based on functional, data and performance requirements set by TaxiCaller. The evaluation has shown that most of the requirements set for this work are achieved with the implemented open source geocoding service solution. Examples of these requirements are correctness of the search results and that the address, street, venue or intersection in the search results should be fully specified. The functional requirement to convert an intersection to coordinates is not achieved when the intersection cannot be uniquely identified. The performance requirement to search for a venue is not achieved. Also, the data requirement that the postal code in the search results should be fully specified is not achieved. Sometimes, but not always, a licensed map service can provide better data. 
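/ There are currently many map providers on the market. Anyone who wants to use a licensed map service in an application must pay a license fee. This fee can become a large expense and affect the price that the end customer has to pay. This thesis investigates how to set up and improve an open source geocoding service so that it can be measured against a licensed map service. Geocoding is the technique of taking an input address and returning a position consisting of latitude and longitude coordinates. The investigation was carried out by implementing an open source solution as a proof of concept, with the aim of answering to what extent it is feasible to develop an open source geocoding service that is as fast, accurate and complete as a licensed map service. The open source solution was developed in collaboration with TaxiCaller Nordic AB. In the implementation, the Pelias project was used as the geocoder, together with map data from the OpenStreetMap and Who’s On First datasets, with Elasticsearch as the search engine. The work is based on functional, data and performance requirements set by TaxiCaller. The evaluation shows that most of the requirements set for this work can be met with the implemented open source geocoding service solution. Examples of these requirements are the correctness of the search results and that the address, street, venue or intersection in the search results should be fully specified. The functional requirement to convert an intersection to coordinates cannot be met when the intersection cannot be uniquely identified. The performance requirement for searching for a venue is not met. In addition, the data requirement that the postal code in the search results should be fully specified is not met. Sometimes, but not always, a licensed map service can provide better data. Pelias Elasticsearch OpenStreetMap Who’s On First Geocoding Computer Engineering Datorteknik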
5	BANDBREDDSREDOVISNING : Sammansättning av bandbreddsövervakningslösning för företagsmiljö / BANDWIDTH ACCOUNTING : Composition of bandwidth monitoring solution for business environment Lindberg, Daniel, Olsson, Kristoffer January 2017 The company Hi5 needs a solution that can monitor bandwidth usage to and from its operations. The solution must be able to measure the traffic used per IP over time. The company wants this solution so that in the future it can, using the data as a basis, charge customers who exceed an agreed limit for bandwidth usage. The solution must also give Hi5 an overview of all traffic to and from the company. This report describes the composition of such a solution using ready-made services and tools. The solution consists of four modules: pmacct, pmacct-to-elasticsearch, Elasticsearch and Grafana. ● Pmacct acts as the collection component that captures and aggregates traffic data. ● Pmacct-to-elasticsearch acts as a bridge and moves data from pmacct to Elasticsearch. ● Elasticsearch is the backend part of the solution, which stores and organizes data in index files. ● Grafana, the frontend part, retrieves data from Elasticsearch and visualizes it in the form of graphs and tables. The result of the work is that Hi5 received a solution that met its requirements, in which customers’ bandwidth usage can be measured and presented in a user-friendly interface. / The company Hi5 is in need of a solution that can monitor the bandwidth usage to and from their operations. This solution should be able to measure used traffic per IP over time. The company wants this solution in order to be able, in the future, to have the data needed to charge customers who exceed an agreed-upon bandwidth limit. The solution should also give Hi5 an overview of all traffic from and to the company. This report describes the composition of such a solution using ready-made services and tools. 
The solution consists of four modules: pmacct, pmacct-to-elasticsearch, Elasticsearch and Grafana. ● Pmacct acts as the collector that catches and aggregates traffic data. ● Pmacct-to-elasticsearch works as a bridge and moves the data from pmacct to Elasticsearch. ● Elasticsearch is the backend part of the solution that stores and organizes the data in index files. ● Grafana, the frontend, collects the data from Elasticsearch and visualizes it in the form of graphs and tables. The result of the project is that Hi5 got a solution that fulfilled their wishes, where customers’ bandwidth usage can be measured and presented in a user-friendly interface. bandwidth accounting pmacct elasticsearch grafana bandbreddsövervakning bandbreddsredovisning Communication Systems Kommunikationssystem
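The accounting idea in record 5 (traffic per IP over time, plus flagging customers who pass an agreed limit) can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (`ip`, `timestamp`, `bytes`), the function names, and the limit are hypothetical and do not reproduce the output format of pmacct or the queries used in the report.

```python
from collections import defaultdict


def aggregate_bytes(flows, bucket_seconds=3600):
    """Sum bytes per (ip, time bucket); timestamps are Unix seconds."""
    totals = defaultdict(int)
    for flow in flows:
        bucket_start = flow["timestamp"] - flow["timestamp"] % bucket_seconds
        totals[(flow["ip"], bucket_start)] += flow["bytes"]
    return dict(totals)


def ips_over_limit(totals, limit_bytes):
    """Return the IPs whose total across all buckets exceeds limit_bytes."""
    per_ip = defaultdict(int)
    for (ip, _bucket), byte_count in totals.items():
        per_ip[ip] += byte_count
    return sorted(ip for ip, byte_count in per_ip.items() if byte_count > limit_bytes)


# Hypothetical flow records, standing in for data exported by a collector.
flows = [
    {"ip": "10.0.0.1", "timestamp": 0, "bytes": 500},
    {"ip": "10.0.0.1", "timestamp": 1800, "bytes": 700},
    {"ip": "10.0.0.1", "timestamp": 3600, "bytes": 100},
    {"ip": "10.0.0.2", "timestamp": 10, "bytes": 50},
]
totals = aggregate_bytes(flows)
print(totals)
print(ips_over_limit(totals, limit_bytes=1000))
```

In the solution the abstract describes, this kind of bucketed aggregation would more plausibly be done by the storage and dashboard layers rather than in application code; the sketch only makes the per-IP, per-interval bookkeeping concrete.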
6	Hodnocení e-Word Of Mouth českých bank na Facebooku a webových komentářích / e-Word Of Mouth Evaluation of Czech banks on Facebook and web comments Škola, Petr January 2013 The diploma thesis analyzes internet discussions and Facebook pages that relate to banks and banking products. The analysis was prepared for its potential value for marketing purposes, but also for users outside the field of marketing. The main objective is to propose a regularly administered overview containing metrics and characteristics that provide information about the current status of the topics discussed in the monitored data sources. First, methods of marketing research are described with regard to the possibility of obtaining data from Internet discussions. Next, the intermediate objectives are described. The starting point is a description of downloading data from the mentioned sources. Two web pages and the Facebook profiles of five banks were identified as the data sources. Another objective is to create Java programs for downloading the data and storing it in Elasticsearch. The data is enriched by sentiment analysis of users' comments. The main objective builds on defined metrics and characteristics that will be displayed. Subsequently, the data is analyzed using the proposed visualization in the Kibana application. The resulting data is interpreted, and a form for its distribution is designed, which is the main objective of this work. The contribution of this work is a description of the processing of data that can be obtained from websites and public Facebook profiles, with emphasis on their content, their further enrichment, and finally data visualization designed for a wide audience.
7	Using clickstream data as implicit feedback in information retrieval systems / Användning av klickströmsdata som implicit återkoppling i informationssökningssystem Johansson, Henrik January 2018 This Master's thesis project aims to investigate whether Wikipedia's clickstream data can be used to improve the retrieval performance of information retrieval systems. The project is conducted under the assumption that a traversal between two articles connects the two articles with regard to content. To extract useful terms from the clickstream data, it needed to be structured so that, given a Wikipedia article, it is possible to find all of the in-going or out-going article traversals. The project settled on using the clickstream data in an automatic query expansion approach. Two expansion methods were investigated: one expanded with full article titles so that the context would be preserved, and the other expanded with individual terms from the article titles. The structure of the data and the two proposed methods were evaluated using a set of queries and relevance judgments. The results of the evaluation show that the method that expands with individual terms performed better than the full article title expansion method and that the individual term method managed to increase the MAP by 11.24%. The expansion method was evaluated on two different query collections, and it was found that the proposed expansion method only improves the results where the average recall of the original queries is low. The thesis conclusion is that the clickstream can be used to improve retrieval performance for an information retrieval system. / The goal of this degree project is to investigate whether Wikipedia's clickstream data can be used to improve the retrieval performance of information retrieval systems. The work was carried out under the assumption that a transition between two Wikipedia articles links the articles' content and is of interest to the user. 
To make use of the clickstream data, it must be structured in a useful way so that, given an article, one can see how readers have moved out from or in towards the article. We chose to exploit the dataset through automatic query expansion. Two methods were developed: the first expands the query with full article titles, while the second expands it with individual words from an article title. The results of the study show that the word-based expansion method performs better than the method that expands with full article titles. The word-based expansion method achieved an improvement in the MAP measure of 11.21%. The work also shows that the expansion method only improves performance when the recall of the original query is low. Regarding the structure of the clickstream data, the out-going structure performed better than the in-going one. The conclusion of the degree project is that this clickstream data is well suited to improving the retrieval performance of an information retrieval system. query expansion search engine elasticsearch clickstream Computer Sciences Datavetenskap (datalogi)
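The term-based query expansion described in record 7 can be illustrated with a small sketch. The abstract does not give the thesis's actual algorithm, so everything here is an assumption made for illustration: the `(source, target, count)` tuple layout, the function name `expand_query`, the weighting by traversal count, and the tiny stopword list.

```python
from collections import Counter


def expand_query(query, article, clickstream, max_terms=3,
                 stopwords=frozenset({"of", "the", "in", "and"})):
    """Append terms from the titles of articles reached from `article`.

    clickstream: iterable of (source_title, target_title, traversal_count).
    Only out-going traversals from `article` are used in this sketch.
    """
    term_weights = Counter()
    for source, target, count in clickstream:
        if source == article:
            for term in target.lower().replace("_", " ").split():
                term_weights[term] += count

    query_terms = query.lower().split()
    seen = set(query_terms)
    candidates = [(term, weight) for term, weight in term_weights.items()
                  if term not in seen and term not in stopwords]
    # Highest traversal count first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in candidates[:max_terms]]


# Hypothetical clickstream rows, not real Wikipedia data.
clickstream = [
    ("Elasticsearch", "Apache_Lucene", 120),
    ("Elasticsearch", "Full-text_search", 80),
    ("Kibana", "Elasticsearch", 40),
]
print(expand_query("elasticsearch", "Elasticsearch", clickstream, max_terms=2))
```

Expanding with whole titles instead would append each target title as a single phrase; the abstract reports that expansion with individual terms performed better in the evaluation.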
8	Leveraging Transformer Models and Elasticsearch to Help Prevent and Manage Diabetes through EFT Cues Shah, Aditya Ashishkumar 16 June 2023 Diabetes in humans is a long-term (chronic) illness that affects how the body converts food into energy. Approximately one in ten individuals residing in the United States is affected by diabetes, and more than 90% of those have type 2 diabetes (T2D). In type 1 diabetes the body fails to produce insulin, so patients must take insulin to survive. With type 2 diabetes, however, the body can't use insulin well. A proven way to manage diabetes is through a positive mindset and a healthy lifestyle. Several studies have been conducted at Virginia Tech and the University of Buffalo on discovering different helpful characteristics in a person's day-to-day life, which relate to important events. They consider Episodic Future Thinking (EFT), where participants identify several events/actions that might occur at multiple future time frames (1 month to 10 years) in text-based descriptions (cues). This research aims to detect content characteristics from these EFT cues. However, class imbalance often presents a challenging issue when dealing with such domain-specific data. To mitigate this issue, this research employs Elasticsearch to address data imbalance and enhance the machine learning (ML) pipeline for improved accuracy of predictions. By leveraging Elasticsearch and transformer models, this study constructs classifiers and regression models, which can be utilized to identify various content characteristics from the cues. To the best of our knowledge, this work represents the first such attempt to employ natural language processing (NLP) techniques to analyze EFT cues and establish a correlation between those characteristics and their impacts on decision-making and health outcomes. / Master of Science / Diabetes is a serious and long-term illness that impacts how the body converts food into energy. 
It affects around one in ten individuals residing in the United States, and over 90% of these individuals have type 2 diabetes (T2D). While a positive attitude and healthy lifestyle can help with management of diabetes, it is unclear exactly which mental attitudes most affect health outcomes. To gain a better understanding of this relationship, researchers from Virginia Tech and the University of Buffalo conducted multiple studies on Episodic Future Thinking (EFT), where participants identify several events or actions that could take place in the future. This research uses natural language processing (NLP) to analyze the descriptions of these events (cues) and identify different characteristics that relate to a person's day-to-day life. With the help of Elasticsearch and transformer models, this work handles the data imbalance and improves the model predictions for different categories within cues. Overall, this research has the potential to provide valuable insights into what can affect a person's diabetes risk, potentially leading to better management and prevention strategies and treatments. Natural Language Processing Deep Learning Elasticsearch Language models Diabetes.
9	Use Case Driven Evaluation of Database Systems for ILDA Thapa, Shova 18 November 2022 No description available. Computer Science Computer Engineering Databases Full-text search MySQL MariaDB MongoDB MongoDB Atlas Atlas Search Elasticsearch Solr Elasticsearch vs. Solr Atlas Search vs. Elasticsearch evaluation use cases ILDA ILDA dictionary dictionary Myaamia Indigenous language search features comparison
10	Utvärdering av sökmotorer i en svensk kontext / Evaluating search engines in a Swedish context Adolfsson, Alexander, Ovesson, Christoffer January 2023 The focus of this study was to evaluate different search engines on Swedish text. Information retrieval is widely used by both people and organizations, and it is important to be able to efficiently retrieve needed information at the right time. The study determined that relevance and speed are the most important factors in search engines. The evaluation measures precision and recall, which are relevance measurements, and the speed of two search engines, Elasticsearch and MarkLogic. The evaluation has determined that there is no significant difference in the relevance of the retrieved results between the engines. The evaluation has also determined that there is a statistically significant difference in speed between the engines, with Elasticsearch outperforming MarkLogic. Both search engines performed very well in terms of successful searches, meaning that a relevant document is returned in the first 20 results. Both engines succeeded in fulfilling the information need 96% of the time. / The focus of this study was to evaluate different search engines on Swedish text. Information retrieval is used extensively by both people and organizations, and it is important to be able to retrieve necessary information efficiently at the right time. The study established that relevance and speed are the most important factors for search engines. The evaluation measures precision and recall as relevance metrics and response time as the speed metric for two search engines, Elasticsearch and MarkLogic. The evaluation showed that there is no significant difference in the relevance of the retrieved results between the engines. The evaluation also showed that there is a statistically significant difference in speed between the engines, with Elasticsearch outperforming MarkLogic. 
Both search engines performed very well in terms of successful searches, meaning returning a relevant document within the first 20 results. Both engines succeed in fulfilling the information need 96% of the time. Elasticsearch MarkLogic search engine search engine evaluation relevance evaluation precision och recall Elasticsearch MarkLogic sökmotor sökmotorsutvärdering relevansutvärdering precision och recall Information Systems

Search results