About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
11

Machine Learning for a Network-based Intrusion Detection System : An application using Zeek and the CICIDS2017 dataset / Maskininlärning för ett Nätverksbaserat Intrångsdetekteringssystem : En tillämpning med Zeek och datasetet CICIDS2017

Gustavsson, Vilhelm January 2019 (has links)
Cyber security is an emerging field in the IT sector. As more devices are connected to the internet, the attack surface available to attackers steadily increases. Network-based Intrusion Detection Systems (NIDS) can be used to detect malicious traffic in networks, and machine learning is a promising approach for improving the detection rate. In this thesis the NIDS Zeek is used to extract features based on time and data size from network traffic. The features are then analysed with machine learning in scikit-learn in order to detect malicious traffic. A Bayesian detection rate of 98.58% was achieved on the CICIDS2017 dataset, which is about the same level as the results from previous work on CICIDS2017 (without Zeek). The best performing algorithms were K-Nearest Neighbors, Random Forest and Decision Tree.
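The pipeline described above (flow features in, benign/malicious label out) can be sketched with a minimal pure-Python nearest-neighbour stand-in; the feature values and labels below are illustrative, not CICIDS2017 data, and the thesis itself uses scikit-learn rather than this hand-rolled classifier.

```python
import math

def one_nn_predict(train, query):
    """Return the label of the training sample closest to `query`.
    `train` is a list of ((duration_s, kilobytes), label) pairs."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    _, label = min(train, key=lambda s: dist(s[0], query))
    return label

# Toy flows: (duration in seconds, transferred kilobytes)
train = [
    ((0.1, 0.5), "malicious"),   # short burst, tiny payload (e.g. a scan)
    ((0.2, 0.4), "malicious"),
    ((30.0, 120.0), "benign"),   # long-lived bulk transfer
    ((45.0, 200.0), "benign"),
]

print(one_nn_predict(train, (0.15, 0.45)))   # -> malicious
print(one_nn_predict(train, (40.0, 150.0)))  # -> benign
```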
12

Validierung einer spezialisierten Studiendatenanalyse für Mobilitätsindikatoren durch Desktop-GIS

Tümmler, Bartholomeus 03 May 2023 (has links)
In this thesis, a TU Berlin study-data analysis for evaluating human movement data from the Charité Berlin study Mobil im Havelland by means of mobility indicators was validated against two test data sets using the desktop GIS packages ArcGIS Pro and QGIS. Based on the evaluation results of ArcGIS Pro and QGIS, the thesis further discusses to what extent such analyses of movement data via mobility indicators can also be carried out off the shelf, under cost-sensitive constraints, with an open-source system such as QGIS. The validation showed that the TU Berlin study-data analysis generated results equivalent to, and in some cases better than, those of the desktop GIS packages. In particular, the performance of the TU Berlin Stop & Go Classifier, which builds on novel procedures, was convincing in the detection of dwell places. The TU Berlin study-data analysis can therefore be certified without restriction as suitable for evaluating the movement data of the Mobil im Havelland study. Regarding the comparison of the desktop GIS packages, such analyses are possible with QGIS, but an off-the-shelf implementation is not yet guaranteed, especially with respect to the central aspect of dwell-place detection; external Python libraries such as MovingPandas or scikit-mobility must be used here.
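The dwell-place detection mentioned above can be sketched in a few lines: a stop is a run of consecutive GPS fixes that stays within some radius for at least some minimum duration. This is a simplified stand-in for what libraries such as MovingPandas provide; the planar distance and the sample track are illustrative assumptions.

```python
import math

def detect_stops(track, max_dist=50.0, min_duration=300.0):
    """`track` is a list of (t_seconds, x_m, y_m) fixes in time order.
    Returns a list of (start_t, end_t) dwell intervals."""
    stops, i = [], 0
    while i < len(track):
        j = i
        # Extend the window while every following fix stays near the anchor fix i.
        while j + 1 < len(track) and math.hypot(
                track[j + 1][1] - track[i][1],
                track[j + 1][2] - track[i][2]) <= max_dist:
            j += 1
        if track[j][0] - track[i][0] >= min_duration:
            stops.append((track[i][0], track[j][0]))
            i = j + 1
        else:
            i += 1
    return stops

track = ([(t, 0.0, 0.0) for t in range(0, 601, 60)] +   # 10 minutes at one place
         [(700, 400.0, 0.0), (800, 900.0, 0.0)])        # then moving away
print(detect_stops(track))  # -> [(0, 600)]
```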
13

Evaluation of computational methods for data prediction

Erickson, Joshua N. 03 September 2014 (has links)
Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered one of the most compelling and challenging problems in data analytics, for both academia and industry. But how is prediction done, what factors make it easier or harder, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near-field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, it provides a careful experimental study of regression, the prediction of continuous numerical values, and classification, the assignment of a class to each sample. The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python libraries used. The evaluation includes the accuracy and running time of different algorithms across several data sets, to establish trade-offs between the approaches and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included. / Graduate / 0984 / erickson@uvic.ca
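The accuracy-versus-running-time comparison described above can be sketched with two deliberately simple classifiers on a toy data set; both the data and the models are placeholders for the algorithms and data sets evaluated in the thesis.

```python
import time
from collections import Counter

def majority_baseline(train_y, test_X):
    """Predict the most frequent training label for every test sample."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_X)

def one_nn(train_X, train_y, test_X):
    """Predict the label of the nearest training sample (squared distance)."""
    preds = []
    for q in test_X:
        idx = min(range(len(train_X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], q)))
        preds.append(train_y[idx])
    return preds

train_X = [(0, 0), (1, 1), (9, 9), (10, 10)]
train_y = ["a", "a", "b", "b"]
test_X, test_y = [(0.5, 0.5), (9.5, 9.5)], ["a", "b"]

for name, fn in [("baseline", lambda: majority_baseline(train_y, test_X)),
                 ("1-NN", lambda: one_nn(train_X, train_y, test_X))]:
    t0 = time.perf_counter()
    preds = fn()
    elapsed = time.perf_counter() - t0
    acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    print(f"{name}: accuracy={acc:.2f}, time={elapsed:.6f}s")
```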
14

3D rekonstrukce z více pohledů kamer / 3D reconstruction from multiple views

Sládeček, Martin January 2019 (has links)
This thesis deals with the task of three-dimensional scene reconstruction using image data obtained from multiple views. It is assumed that the intrinsic parameters of the cameras used are known. The theoretical chapters describe the basic principles of the individual reconstruction steps. Various possible implementations of a data model suitable for this task are also described. The practical part includes a comparison of methods for filtering false keypoint correspondences, an implementation of polar stereo rectification, and a comparison of the disparity-map calculation methods bundled with the OpenCV library. In the final portion of the thesis, examples of reconstructed 3D models are presented and discussed.
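One common filter for false keypoint correspondences is the ratio test: a match is kept only if its best descriptor distance is clearly smaller than the second-best. The sketch below uses synthetic distances; in a pipeline like the one above, such distances would come from OpenCV descriptor matching.

```python
def ratio_test(matches, ratio=0.75):
    """`matches` maps a query keypoint id to its two best descriptor distances.
    Returns the ids of matches that pass the ratio test."""
    kept = []
    for kp_id, (best, second_best) in matches.items():
        if best < ratio * second_best:
            kept.append(kp_id)
    return kept

matches = {
    0: (10.0, 40.0),  # unambiguous: best is much closer -> keep
    1: (30.0, 32.0),  # ambiguous: two near-identical candidates -> reject
    2: (5.0, 50.0),   # unambiguous -> keep
}
print(ratio_test(matches))  # -> [0, 2]
```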
15

Webový simulátor fotbalových lig a turnajů / Web Simulator of Football Leagues and Championships

Urbanczyk, Martin January 2019 (has links)
This thesis describes the creation of a simulator of football leagues and championships. I studied football competitions and their systems, as well as the basics of machine learning. Similar existing solutions were analysed and served as inspiration for my own design. I then designed the overall structure of the simulator and all of its key parts, after which the simulator was implemented and tested. The application can simulate the top five competitions in the UEFA club coefficient ranking.
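The core of such a simulator, double round-robin scheduling plus points accounting, can be sketched as follows. The team names and the uniform random score model are illustrative; a simulator like the one described would use learned models rather than uniform random goals.

```python
import itertools
import random

def simulate_league(teams, rng):
    """Simulate a double round-robin season and return standings
    sorted by points (3 for a win, 1 each for a draw)."""
    table = {t: 0 for t in teams}
    # Every ordered pair plays once: home and away fixtures.
    for home, away in itertools.permutations(teams, 2):
        hg, ag = rng.randint(0, 4), rng.randint(0, 4)
        if hg > ag:
            table[home] += 3
        elif ag > hg:
            table[away] += 3
        else:
            table[home] += 1
            table[away] += 1
    return sorted(table.items(), key=lambda kv: -kv[1])

standings = simulate_league(["Alfa", "Beta", "Gamma", "Delta"], random.Random(1))
for team, pts in standings:
    print(team, pts)
```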
16

Data-driven decision support in digital retailing

Sweidan, Dirar January 2023 (has links)
In the digital era and with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategy formulation. While predictive models in general are fundamental to data-driven decisions, this thesis focuses on binary classifiers. These classifiers are studied through two real-world problems, each marked by particular properties. Specifically, when binary decisions are based on predictions, relying solely on predicted class labels is insufficient because classification accuracy varies. Furthermore, different prediction mistakes carry different costs, which affects the utility of a decision. To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. As a proof of concept, the thesis found that some algorithms are inherently calibrated, while others become reliable once their probabilities are calibrated. In both cases, the thesis concludes that acting only on the top predictions with the highest probabilities increases precision and minimises false positives, and that well-calibrated probabilities are a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, confident and reliable predictions can take centre stage in decision-making.
This enables e-tailers to form distinct strategies based on these predictions and optimise their utility. The thesis highlights the value of calibrated models and probabilistic predictions and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on an automated system that prioritises predictions with high, well-calibrated probabilities, discards the rest, and optimises utility based on the costs and gains associated with the different prediction outcomes. / The thesis is part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and is funded by the Swedish Knowledge Foundation.
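Classification with a rejection option, as described above, can be sketched in a few lines: only predictions whose calibrated probability clears a confidence threshold are acted on, trading coverage for precision. The probabilities and labels below are illustrative, not from the thesis data.

```python
def classify_with_rejection(probs, threshold=0.9):
    """`probs` is a list of calibrated P(positive) values. Returns one decision
    per sample: 1 (positive), 0 (negative) or None (rejected / deferred)."""
    decisions = []
    for p in probs:
        if p >= threshold:
            decisions.append(1)
        elif p <= 1 - threshold:
            decisions.append(0)
        else:
            decisions.append(None)
    return decisions

probs  = [0.97, 0.55, 0.03, 0.91, 0.40]
labels = [1,    0,    0,    0,    0]
decisions = classify_with_rejection(probs)
print(decisions)  # -> [1, None, 0, 1, None]

# Precision is computed only over the accepted positive decisions.
accepted = [(d, y) for d, y in zip(decisions, labels) if d == 1]
precision = sum(d == y for d, y in accepted) / len(accepted)
print(f"precision on accepted positives: {precision:.2f}")  # -> 0.50
```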
17

Využití umělé inteligence v technické diagnostice / Utilization of artificial intelligence in technical diagnostics

Konečný, Antonín January 2021 (has links)
The diploma thesis focuses on the use of artificial intelligence methods for evaluating the fault condition of machinery. The evaluated data come from a vibrodiagnostic model for the simulation of static and dynamic unbalance. Machine learning methods are applied, specifically supervised learning. The thesis describes the Spyder software environment and its alternatives, and the Python programming language in which the scripts are written. It contains an overview and description of the libraries (scikit-learn, SciPy, Pandas, ...) and methods: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT) and Random Forest (RF) classifiers. The classification results are visualised in a confusion matrix for each method. The appendix includes the scripts written for feature engineering, hyperparameter tuning, evaluation of learning success, and classification with visualisation of the result.
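The confusion matrix used to visualise classifier results, as above, can be computed directly from paired true and predicted labels. The class names and label sequences below are illustrative stand-ins for the unbalance classes evaluated in the thesis.

```python
def confusion_matrix(true, pred, classes):
    """Rows are true classes, columns are predicted classes."""
    m = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true, pred):
        m[t][p] += 1
    return m

classes = ["ok", "static", "dynamic"]
true = ["ok", "ok", "static", "static", "dynamic", "dynamic"]
pred = ["ok", "static", "static", "static", "dynamic", "ok"]

m = confusion_matrix(true, pred, classes)
print(f"{'':>8}" + "".join(f"{c:>8}" for c in classes))
for t in classes:
    print(f"{t:>8}" + "".join(f"{m[t][p]:>8}" for p in classes))
```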
18

Improved in silico methods for target deconvolution in phenotypic screens

Mervin, Lewis January 2018 (has links)
Target-based screening projects for bioactive (orphan) compounds have in many cases been shown to be insufficiently predictive of in vivo efficacy, leading to attrition in clinical trials. Partly for this reason, phenotypic screening has undergone a renaissance in both academia and the pharmaceutical industry. One key shortcoming of this paradigm shift is that the protein targets modulated must be elucidated subsequently, which is often a costly and time-consuming procedure. In this work, we have explored both improved methods and real-world case studies of how computational methods can help in target elucidation for phenotypic screens. One limitation of previous methods has been the inability to assess the applicability domain of the models, that is, to determine when the assumptions made by a model are fulfilled and which input chemicals the models can reliably handle. Hence, a major focus of this work was to explore methods for calibrating machine learning algorithms using Platt scaling, isotonic regression scaling and Venn-Abers predictors, since the probabilities from well-calibrated classifiers can be interpreted at a confidence level and predictions specified at an acceptable error rate. Additionally, many current protocols only offer probabilities for affinity; thus, another key area for development was to expand the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important since the activation or inhibition of a target may positively or negatively impact the phenotypic response in a biological system. Furthermore, many existing methods do not utilize the wealth of bioactivity information held for orthologue species. We therefore also focused on an in-depth analysis of orthologue bioactivity data and its relevance and applicability towards expanding compound and target bioactivity space for predictive studies.
The realized protocol was trained with 13,918,879 compound-target pairs and comprises 1,651 targets, and has been made publicly available on GitHub. The methodology was then applied to aid the target deconvolution of AstraZeneca phenotypic readouts, in particular the rationalization of cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has furthered the field of in silico target deconvolution by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.
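Platt scaling, one of the calibration methods examined above, fits a sigmoid P(y=1 | s) = 1 / (1 + exp(A*s + B)) to raw classifier scores. The sketch below fits A and B by plain gradient descent on the log loss; the scores, labels and learning-rate settings are illustrative, and production implementations use a more careful optimiser and regularised targets.

```python
import math

def platt_fit(scores, labels, lr=0.01, steps=5000):
    """Fit the sigmoid parameters (A, B) by gradient descent on log loss."""
    A, B = 0.0, 0.0
    for _ in range(steps):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * s + B))
            gA += (p - y) * (-s)   # d(logloss)/dA
            gB += (p - y) * (-1)   # d(logloss)/dB
        A -= lr * gA
        B -= lr * gB
    return A, B

def platt_predict(A, B, s):
    return 1.0 / (1.0 + math.exp(A * s + B))

# Synthetic raw scores with positives at high scores.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
A, B = platt_fit(scores, labels)
print(round(platt_predict(A, B, 2.0), 2), round(platt_predict(A, B, -2.0), 2))
```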
