301

Video Traffic Classification : A Machine Learning approach with Packet Based Features using Support Vector Machine / Videotrafikklassificering : En Maskininlärningslösning med Paketbasereade Features och Supportvektormaskin

Westlinder, Simon January 2016 (has links)
Internet traffic classification is an important field on which several stakeholders depend for a number of different reasons. Internet Service Providers (ISPs) and network operators benefit from knowing what type of traffic propagates over their networks in order to treat different applications correctly. Today, Deep Packet Inspection (DPI) and port-based classification are two of the more commonly used methods for classifying Internet traffic. However, both of these techniques fail when the traffic is encrypted. This study explores a third method: classifying Internet traffic with machine learning, where the classification is based on Internet traffic flow characteristics instead of actual payloads. Machine learning can overcome the inherent limitations that DPI and port-based classification suffer from. In this study the Internet traffic is divided into two classes of interest: Video and Other. Several machine learning methods exist for classification, and this study focuses on the Support Vector Machine (SVM) to classify traffic. Several traffic characteristics are extracted, such as individual payload sizes and the longest consecutive run of payload packets in the downward direction. Several experiments using different approaches are conducted, and the results show that overall accuracies above 90% are achievable. / HITS, 4707
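
As an illustration of the approach described above (an editorial sketch, not the author's code), the following shows how a flow, once reduced to a fixed-length vector of packet-based features, could be classified as Video or Other with an SVM; the feature vectors and labels here are random placeholders.

```python
# Hedged sketch: SVM classification of flow records, assuming each flow has
# already been summarised into a fixed-length feature vector (e.g. the first N
# payload sizes and the longest consecutive run of downward payload packets).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))        # placeholder flow feature vectors
y = rng.integers(0, 2, size=1000)      # placeholder labels: 1 = Video, 0 = Other

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```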
302

Ichthyoplankton Classification Tool using Generative Adversarial Networks and Transfer Learning

Aljaafari, Nura 15 April 2018 (has links)
The study and analysis of marine ecosystems is a significant part of marine science research. These systems are valuable resources for fisheries, improve water quality, and can even be used in drug production. The investigation of the ichthyoplankton inhabiting these ecosystems is also an important research field. Ichthyoplankton are fish in their early stages of life. At this stage, the fish have relatively similar shapes and are small in size. The currently used way of identifying them is not optimal. Marine scientists typically study such organisms by sending a team that collects samples from the sea, which are then taken to the lab for further investigation. These samples need to be studied by an expert and usually end up requiring DNA sequencing. This method is time-consuming and requires a high level of experience. Recent advances in AI have helped solve and automate several difficult tasks, which motivated us to develop a classification tool for ichthyoplankton. We show that machine learning techniques, such as generative adversarial networks combined with transfer learning, solve this problem with high accuracy, whereas traditional machine learning algorithms fail to do so. We also give a general framework for creating a classification tool when only a limited dataset is available for training. We aim to build a user-friendly tool that anyone can use for the classification task, and to provide a guide that researchers can follow when creating such a classification tool.
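
A minimal transfer-learning sketch in the spirit of the approach above (the GAN-based data augmentation is omitted): an ImageNet-pretrained ResNet-18 backbone is frozen and only a new classification head is trained on the small labelled set. The class count and data loading are hypothetical, not taken from the thesis.

```python
# Hedged sketch: transfer learning with a frozen pretrained backbone.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                                          # hypothetical number of taxa
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():                              # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimisation step on a batch from a (not shown) DataLoader."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```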
303

Learning in the Presence of Skew and Missing Labels Through Online Ensembles and Meta-reinforcement Learning

Vafaie, Parsa 07 September 2021 (has links)
Data streams are large sequences of data, possibly endless and temporally ordered, that are commonplace in Internet of Things (IoT) applications such as intrusion detection in computer networking, fraud detection in financial institutions, real-time tumor tracking in radiotherapy, and social media analysis. Algorithms learning from such streams need to construct near real-time models that continuously adapt to potential changes in patterns, in order to retain high performance throughout the stream. It follows that there are numerous challenges involved in supervised learning (or so-called classification) in such environments. One of the challenges in learning from streams is multi-class imbalance, in which the rates of instances in the different class labels differ substantially. Notably, classification algorithms may become biased towards the classes with more frequent instances, sacrificing the performance of the less frequent, or so-called minority, classes. Further, minority instances often arrive infrequently and in bursts, making accurate model construction problematic. For example, network intrusion detection systems must be able to distinguish between normal traffic and multiple minority classes corresponding to a variety of different types of attacks. Furthermore, having labels for all instances is often infeasible, since labels may be missing or late-arriving. For instance, when learning from a stream for the task of detecting network intrusions, the true label for every instance might not be available, or it might take time until the label is made available, especially for new types of attacks. In this thesis, we contribute to the advancement of online learning from evolving streams by focusing on the above-mentioned areas of multi-class imbalance and missing labels. First, we introduce a multi-class online ensemble algorithm designed to maintain a balanced performance over all classes. Specifically, our approach samples instances with replacement while dynamically increasing the weights of under-represented classes, in order to produce models that benefit all classes. Our experimental results show that our online ensemble method performs well against multi-class imbalanced data in various datasets. We then introduce an approach to dealing with missing labels that utilizes both labelled and unlabelled data to increase a model's performance. That is, our method uses labelled data to pseudo-label unlabelled instances, allowing the model to perform better in environments where labels are scarce. More specifically, our approach features a meta-reinforcement learning agent, trained on multiple source streams, that can effectively select the prediction of a k-nearest neighbours (k-NN) classifier as the label for unlabelled instances. Extensive experiments on benchmark datasets demonstrate the value and effectiveness of our approach and confirm that our method outperforms the state of the art.
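
As a simplified illustration of the pseudo-labelling step (not the thesis implementation), the sketch below lets a k-NN model fitted on labelled instances propose labels for unlabelled ones; the meta-reinforcement-learning acceptance decision is replaced here by a plain confidence threshold.

```python
# Hedged sketch: k-NN pseudo-labelling with a confidence threshold standing in
# for the meta-reinforcement-learning agent described in the abstract.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def pseudo_label(X_labelled, y_labelled, X_unlabelled, k=5, threshold=0.8):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_labelled, y_labelled)
    proba = knn.predict_proba(X_unlabelled)        # neighbour vote proportions per class
    confident = proba.max(axis=1) >= threshold     # accept only confident predictions
    pseudo = knn.classes_[proba.argmax(axis=1)]
    return X_unlabelled[confident], pseudo[confident]
```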
304

Optimization of Insert-Tray Matching using Machine Learning

Hedberg, Karolina January 2021 (has links)
The manufacturing process for carbide inserts at Sandvik Coromant consists of several operations. During some of these, the inserts are positioned on trays. For some inserts the trays are pre-defined, but for others the insert-tray matching is partly improvised. The goal of this thesis project is to examine whether machine learning can be used to predict which tray to use for a given insert. It is also investigated which insert features determine the choice of tray. The study is carried out with insert and tray data from four blasting operations and considers a set of standardized inserts, since the tray matching for these is assumed to be well tuned. The algorithm used for the predictions is the supervised learning algorithm k-nearest neighbors. Identifying the determining features is treated as a feature selection problem and is done with the ReliefF algorithm. The classification results show that the classifiers are overfitting. The main reason for this is probably that the datasets contain features that together uniquely determine which tray is used. This was not detected during feature selection, since ReliefF identifies features that are individually relevant to the output. One idea for avoiding overfitting is to exclude these defining features from the dataset; further work in this direction is thus recommended.
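
A hedged sketch of the prediction setup described above, with placeholder insert features and tray labels: a k-nearest-neighbours classifier evaluated with cross-validation, where a large gap between training and cross-validated accuracy is the kind of overfitting signal discussed in the abstract.

```python
# Hedged sketch: k-NN insert-to-tray classification with a train/validation gap check.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))       # placeholder insert features (geometry, grade, ...)
y = rng.integers(0, 6, size=500)    # placeholder tray classes

knn = KNeighborsClassifier(n_neighbors=5)
cv_acc = cross_val_score(knn, X, y, cv=5).mean()   # held-out performance
train_acc = knn.fit(X, y).score(X, y)              # performance on the training data
print(f"train accuracy {train_acc:.2f} vs cross-validated accuracy {cv_acc:.2f}")
```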
305

Identifying Crime Hotspot: Evaluating the suitability of Supervised and Unsupervised Machine learning

Hussein, Abdul Aziz 05 October 2021 (has links)
No description available.
306

Object Detection and Semantic Segmentation Using Self-Supervised Learning

Gustavsson, Simon January 2021 (has links)
In this thesis, three well-known self-supervised methods have been implemented and trained on road-scene images. The three so-called pretext tasks RotNet, MoCov2, and DeepCluster were used to train a neural network in a self-supervised manner. The self-supervised networks were then evaluated, using different amounts of labeled data, on two downstream tasks: object detection and semantic segmentation. The performance of the self-supervised methods is compared to networks trained from scratch on the respective downstream task. The results show that it is possible to achieve a performance increase using self-supervision on a dataset containing road-scene images only. When only a small amount of labeled data is available, the performance increase can be substantial, e.g., an mIoU improvement from 33 to 39 when training semantic segmentation on 1750 images with a RotNet pre-trained backbone compared to training from scratch. However, when a large number of labeled images is available (>70000 images), the self-supervised pretraining does not seem to increase performance much, or at all.
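
For readers unfamiliar with the RotNet pretext task mentioned above, here is a minimal sketch (with a toy encoder, not the thesis architecture): unlabeled images are rotated by 0/90/180/270 degrees and the network learns to predict which rotation was applied.

```python
# Hedged sketch of the RotNet pretext task for self-supervised pre-training.
import torch
import torch.nn as nn

def rotate_batch(images):
    """Return rotated copies of a (B, C, H, W) batch and their rotation labels."""
    rotated, labels = [], []
    for k in range(4):                                   # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())   # toy encoder
head = nn.Linear(16, 4)                                  # 4 rotation classes
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 96, 96)                       # placeholder unlabeled batch
x, y = rotate_batch(images)
loss = criterion(head(backbone(x)), y)                   # pretext training loss
loss.backward()
```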
307

Self-supervised učení v aplikacích počítačového vidění / Self-supervised learning in computer vision applications

Vančo, Timotej January 2021 (has links)
The aim of this diploma thesis is to survey self-supervised learning in computer vision applications, choose a suitable test task with an extensive dataset, apply self-supervised methods, and evaluate the results. The theoretical part of the work focuses on describing computer vision methods, gives a detailed description of neural and convolutional networks, and provides an extensive explanation and categorization of self-supervised methods. The theoretical part concludes with practical applications of self-supervised methods. The practical part of the thesis describes the code created for working with the datasets and the application of the SSL methods Rotation, SimCLR, MoCo, and BYOL to classification and semantic segmentation. Each application of a method is explained in detail and evaluated for various parameters on the large STL10 dataset. Subsequently, the success of the methods is evaluated on different datasets and the limiting conditions for the classification task are identified. The practical part concludes with the application of SSL methods for pre-training the encoder for semantic segmentation on the Cityscapes dataset.
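
As an aside, the SimCLR method referenced above is built around the NT-Xent contrastive loss; a compact sketch is given below, assuming z1 and z2 are projections of two augmented views of the same image batch (dimensions and temperature are illustrative, not the thesis settings).

```python
# Hedged sketch of the NT-Xent (SimCLR) contrastive loss.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Contrastive loss over 2N embeddings; positives are the (i, i+N) pairs."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2N, D) unit vectors
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))   # placeholder projections
```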
308

How to annotate in video for training machine learning with a good workflow

Jakob, Persson January 2021 (has links)
Artificial intelligence and machine learning are used in many different areas; one of those areas is image recognition. In the production of a TV show or film, image recognition can be used to help the editors find specific objects, scenes, or people in the video content, which speeds up the production. But image recognition does not always work perfectly and cannot yet be used in TV or film production as intended. The image recognition algorithms therefore need to be trained on large datasets to become better. Creating these datasets takes time, however, and tools are needed that let users create specific datasets and retrain algorithms to improve them. The aim of this master thesis was to investigate whether it was possible to create a tool that can annotate objects and people in video content and use the data as training sets, and a tool that can retrain the output of an image recognition system to make it better. It was also important that the tools have a good workflow for the users. The study consisted of a theoretical study to gain more knowledge about annotation and about how to make a good UX design with a good workflow. Interviews were also held to learn more about the requirements for the product. This resulted in a user scenario and a workflow that were used, together with the knowledge from the theoretical study, to create a hi-fi prototype through an iterative process with usability testing. This resulted in a final hi-fi prototype with a good design and a good workflow for the users, where it is possible to annotate objects and people with a bounding box, and where it is possible to retrain an image recognition program that has been used on video content. / Artificiell intelligens och maskininlärning används inom många olika områden, ett av dessa områden är bildigenkänning. Vid produktionen av ett TV-program eller av en film kan bildigenkänning användas för att hjälpa redigerarna att hitta specifika objekt, scener eller personer i videoinnehållet, vilket påskyndar produktionen. Men bildigenkänningsprogram fungerar inte alltid helt perfekt och kan inte användas i produktionen av ett TV-program eller film som det är tänkt att användas i det sammanhanget. För att förbättra bildigenkänningsprogram så behöver dess algoritm tränas på stora datasets av bilder och labels. Men att skapa dessa datasets tar tid och det behövs program som kan skapa datasets och återträna algoritmer för bildigenkänning så att de fungerar bättre. Syftet med detta examensarbete var att undersöka om det var möjligt att skapa ett verktyg som kan markera(annotera) objekt och personer i video och använda datat som träningsdata för algoritmer. Men även att skapa ett verktyg som kan återträna algoritmer för bildigenkänning så att de blir bättre utifrån datat man får från ett bildigenkänningprogram. Det var också viktigt att dessa verktyg hade ett bra arbetsflöde för användarna. Studien bestod av en teoretisk studie för att få mer kunskap om annoteringar i video och hur man skapar bra UX-design med ett bra arbetsflöde. Intervjuer hölls också för att få mer kunskap om kraven på produkten och vilka som skulle använda den. Det resulterade i ett användarscenario och ett arbetsflöde som användes tillsammans med kunskapen från den teoretiska studien för att skapa en hi-fi prototyp, där en iterativ process med användbarhetstestning användes.
Detta resulterade i en slutlig hi-fi prototyp med bra design och ett bra arbetsflöde för användarna där det är möjligt att markera(annotera) objekt och personer med en bounding box och där det är möjligt att återträna algoritmer för bildigenkänning som har körts på video.
309

Arbres de décision et forêts aléatoires pour variables groupées / Decision trees and random forests for grouped variables

Poterie, Audrey 18 October 2018 (has links)
Dans de nombreux problèmes en apprentissage supervisé, les entrées ont une structure de groupes connue et/ou clairement identifiable. Dans ce contexte, l'élaboration d'une règle de prédiction utilisant les groupes plutôt que les variables individuelles peut être plus pertinente tant au niveau des performances prédictives que de l'interprétation. L'objectif de la thèse est de développer des méthodes par arbres adaptées aux variables groupées. Nous proposons deux approches qui utilisent la structure groupée des variables pour construire des arbres de décisions. La première méthode permet de construire des arbres binaires en classification. Une coupure est définie par le choix d'un groupe et d'une combinaison linéaire des variables du dit groupe. La seconde approche, qui peut être utilisée en régression et en classification, construit un arbre non-binaire dans lequel chaque coupure est un arbre binaire. Ces deux approches construisent un arbre maximal qui est ensuite élagué. Nous proposons pour cela deux stratégies d'élagage dont une est une généralisation du minimal cost-complexity pruning. Les arbres de décision étant instables, nous introduisons une méthode de forêts aléatoires pour variables groupées. Outre l'aspect prédiction, ces méthodes peuvent aussi être utilisées pour faire de la sélection de groupes grâce à l'introduction d'indices d'importance des groupes. Ce travail est complété par une partie indépendante dans laquelle nous nous plaçons dans un cadre d'apprentissage non supervisé. Nous introduisons un nouvel algorithme de clustering. Sous des hypothèses classiques, nous obtenons des vitesses de convergence pour le risque de clustering de l'algorithme proposé. / In many problems in supervised learning, inputs have a known and/or obvious group structure. In this context, elaborating a prediction rule that takes the group structure into account can be more relevant than an approach based only on the individual variables, both for prediction accuracy and for interpretation. The goal of this thesis is to develop tree-based methods adapted to grouped variables. Here, we propose two new tree-based approaches that use the group structure to build decision trees. The first approach builds binary decision trees for classification problems. A split of a node is defined by the choice of a splitting group and a linear combination of the inputs belonging to that group. The second method, which can be used for both regression and classification, builds a non-binary tree in which each split is a binary tree. Both approaches build a maximal tree which is then pruned. To this end, we propose two pruning strategies, one of which is a generalization of the minimal cost-complexity pruning algorithm. Since decision trees are known to be unstable, we introduce a random forest method that deals with groups of inputs. In addition to prediction, these new methods can also be used to perform group variable selection thanks to the introduction of group importance measures. This thesis is supplemented by an independent part in which we consider the unsupervised framework. We introduce a new clustering algorithm. Under some classical regularity and sparsity assumptions, we obtain the rate of convergence of the clustering risk for the proposed algorithm.
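
One way to realise a group importance measure of the kind mentioned above (an illustration, not the author's exact estimator) is to permute all variables of a group jointly and record the resulting drop in accuracy of a fitted random forest; the groups and data below are placeholders.

```python
# Hedged sketch: group-level permutation importance for a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def group_permutation_importance(model, X, y, groups, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, cols] = Xp[rng.permutation(len(X))][:, cols]   # shuffle the whole group at once
            drops.append(baseline - model.score(Xp, y))
        importances[name] = float(np.mean(drops))
    return importances

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                  # only group g1 is informative
groups = {"g1": [0, 1], "g2": [2, 3], "g3": [4, 5]}      # placeholder group structure
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(group_permutation_importance(rf, X, y, groups))
```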
310

Injector diagnosis based on engine angular velocity pulse pattern recognition

Nyman, David January 2020 (has links)
In a modern diesel engine, the fuel injector is a vital component. The injectors control the fuel dosing into the combustion chambers. Accuracy in the fuel dosing is very important, as inaccuracies have negative effects on engine-out emissions and controllability. Because of this, a diagnosis that can classify the condition of the injectors with good accuracy is highly desired. A signal that contains information about the injectors' condition is the engine angular velocity. In this thesis, the classification performance of six common machine learning methods is evaluated, with the engine angular velocity as input. In addition to classification performance, the computational cost of the methods in a deployed state is also analysed. The methods are evaluated on data from a Scania truck that has been run just like any similar commercial vehicle. The six methods evaluated are: logistic regression, kernel logistic regression, linear discriminant analysis, quadratic discriminant analysis, fully connected neural networks, and convolutional neural networks. The results show that the neural networks achieve the best classification performance, and that they also perform best relative to their computational cost in a deployed state. The results also indicate that the neural networks can avoid false alarms while maintaining high sensitivity.
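
As a sketch of one of the model families compared above, the following shows a small 1-D convolutional network that maps a window of engine angular-velocity samples to an injector-condition class; layer sizes, window length, and class count are illustrative assumptions, not the thesis configuration.

```python
# Hedged sketch: 1-D CNN over an angular-velocity window.
import torch
import torch.nn as nn

num_classes = 3          # e.g. healthy / over-dosing / under-dosing (hypothetical classes)
window_len = 256         # angular-velocity samples per analysis window (assumed)

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, num_classes),
)

x = torch.randn(4, 1, window_len)   # batch of 4 placeholder angular-velocity windows
logits = model(x)                   # shape (4, num_classes)
print(logits.shape)
```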
