401

Estimating Market Risk of Private Real Estate Assets

Widigsson, Eric, Wolf-Watz, Björn January 2024 (has links)
This study aims to estimate the market risk of private real estate assets, specifically examining Swedish real estate companies, and seeks to identify the best model for estimating the quarterly squared return. An important assumption is that private real estate assets carry the same market risk as publicly traded assets, all else being equal. Under this assumption, the studied methods can be applied to publicly traded companies and evaluated against the realized stock returns of those companies. The study examines two primary techniques for estimating the risk of private real estate assets: desmoothing of appraisal-based returns and supervised learning on listed peers. Desmoothing is a technique for recovering economic returns from smoothed real estate appraisal returns. The original desmoothing method outlined by Geltner (1991) introduces AR desmoothing and is examined along with the MA desmoothing model presented by Getmansky et al. (2004). Applying these desmoothing techniques yields a new time series of returns that can be fed into an EWMA (Exponentially Weighted Moving Average) estimate of the next quarter's squared return. Supervised learning on listed peers, on the other hand, studies similar listed assets and trains a model to predict the squared return from explanatory variables representing selected key figures of the companies' financials. Five supervised learning models are examined: Linear Regression, Lasso Regression, Ridge Regression, Elastic Net Regularization, and Random Forest Regression. The results show that four of the five supervised learning models outperform the desmoothing models; in particular, Random Forest Regression, Ridge Regression, and Lasso Regression yield the best estimates of the quarterly squared return. However, since the study assesses risk over a quarterly horizon, data are scarce, which limits the statistical confidence of the results. Although the superiority of the supervised learning models in predicting the squared return is evident, the desmoothing results reveal some interesting properties of the techniques: AR desmoothing reduces the disparity between the sample variance of the stock returns and that of the original NAV time series, whereas MA desmoothing drastically increases the correlation of the desmoothed returns with the stock returns.
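A minimal sketch of the two estimation steps described above, with assumed parameter values; the smoothing parameter `phi` and EWMA decay `lam` are illustrative defaults, not the thesis's calibrated figures:

```python
import numpy as np

def ar_desmooth(appraisal_returns: np.ndarray, phi: float = 0.4) -> np.ndarray:
    """Recover 'economic' returns r_t from smoothed appraisal returns g_t,
    assuming g_t = phi * r_t + (1 - phi) * g_{t-1} (Geltner, 1991)."""
    g = appraisal_returns
    r = np.empty(len(g) - 1)
    for t in range(1, len(g)):
        r[t - 1] = (g[t] - (1.0 - phi) * g[t - 1]) / phi
    return r

def ewma_squared_return_forecast(returns: np.ndarray, lam: float = 0.94) -> float:
    """One-step-ahead EWMA forecast of the squared return:
    sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2."""
    sigma2 = returns[0] ** 2  # initialize with the first squared return
    for r in returns[1:]:
        sigma2 = lam * sigma2 + (1.0 - lam) * r ** 2
    return sigma2

# Usage with synthetic quarterly appraisal (NAV) returns
rng = np.random.default_rng(0)
g = rng.normal(0.01, 0.02, size=40)      # hypothetical smoothed NAV returns
r = ar_desmooth(g, phi=0.4)              # desmoothed return series
print(ewma_squared_return_forecast(r))   # next-quarter squared-return estimate
```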
402

Semi-supervised learning in exemplar based neural networks

Bharadwaj, Madan 01 October 2003 (has links)
No description available.
403

Μελέτη και σχεδίαση συστήματος ανάλυσης εικόνας κατατμημένου σπερματικού DNA με χρήση τεχνικών υπολογιστικής νοημοσύνης / Study and design of an image analysis system for sperm DNA fragmentation using computational intelligence techniques

Αλμπάνη, Ελένη 13 July 2010 (has links)
Medical studies have shown that male infertility is directly related to the presence of fragmented DNA in the sperm nucleus; the disturbances in sperm concentration, motility, ejaculate volume, and morphology observed in a semen analysis have DNA fragmentation as an underlying cause. The experimental embryology and histology laboratory of the Athens Medical School uses the TUNEL assay (deoxynucleotidyl transferase-mediated dUTP nick end labeling) to label the ends of each DNA fragment with a color different from that used for the rest of the DNA. The result of this treatment of the spermatozoa on a slide is a set of blue fluorescing spermatozoa with possible red staining in the nucleus where fragmented DNA is present: the greater the degree of fragmentation, the larger the red area, the more pathological the spermatozoon, and hence the less capable it is of fertilization. The TUNEL procedure is followed by imaging of the slide with a high-resolution, high-sensitivity camera suitable for fluorescence applications. The images are then processed with dedicated software, as proposed in "Automatic Analysis of TUNEL assay Microscope Images" by Kontaxakis et al. at the 2007 IEEE International Symposium on Signal Processing and Information Technology. This processing classifies the depicted objects into (a) solitary spermatozoa, (b) overlapping spermatozoa, and (c) debris such as leukocytes or sperm fragments; for each solitary spermatozoon the red and blue pixels are then counted, quantifying the extent of DNA fragmentation per cell. The aim of this thesis is the study, design, and implementation of a system that takes into account the image-analysis data together with data known from the semen analysis, such as sperm motility and concentration, and uses computational intelligence techniques to learn to automatically classify patients according to their overall degree of DNA fragmentation. Finally, it will estimate a threshold, or a range of values, above which a patient is characterized as infertile. The ultimate goal is for this whole procedure to become a routine test for laboratories dealing with male infertility and assisted reproduction, protecting couples from pointless attempts at artificial insemination that are harmful to the woman's health.
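As a rough illustration of the pixel-counting step, the sketch below scores one segmented cell by its ratio of red to blue pixels; the color-threshold rules are hypothetical stand-ins, not the actual TUNEL image-analysis pipeline:

```python
import numpy as np

def fragmentation_ratio(cell_rgb: np.ndarray) -> float:
    """Estimate DNA fragmentation of one solitary spermatozoon crop as
    red pixels over (red + blue) pixels."""
    r = cell_rgb[..., 0].astype(int)
    b = cell_rgb[..., 2].astype(int)
    red_mask = (r > 100) & (r > b + 30)    # hypothetical "red-stained" rule
    blue_mask = (b > 100) & (b > r + 30)   # hypothetical "blue-fluorescing" rule
    n_red, n_blue = int(red_mask.sum()), int(blue_mask.sum())
    return n_red / (n_red + n_blue) if (n_red + n_blue) else 0.0

# Usage with a synthetic 32x32 RGB crop standing in for a segmented cell
rng = np.random.default_rng(1)
crop = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(fragmentation_ratio(crop))
```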
404

Análise de similaridade entre classes e padrões de ativação neuronal. / Analysis of similarity between classes and patterns of neuronal activation.

SARAIVA, Eugênio de Carvalho. 04 April 2018 (has links)
There is a growing number of technologies that make use of classification algorithms for automating tasks. In particular, in neuroscience, classification algorithms have been used to test hypotheses about the functioning of the central nervous system. However, the relationship between classes of neuronal activation patterns in specific brain areas, arising from sensory experience, has received little attention. In the context of computational neuroscience, this work presents an analysis of the level of similarity between classes of neuronal activation patterns, using unsupervised and semi-supervised learning approaches, in specific brain areas of rats in contact with objects, obtained during an experiment involving free exploration of objects by the animals.
The classes were defined according to treatments constructed from specific levels of a set of 8 factors (Animal, Brain Region, Object or Pair of Objects, Clustering Algorithm, Metric, Bin, Window, and Contact Interval). In total, 327,680 treatments were analyzed. Hypotheses regarding the relationship of each factor to the level of similarity between treatments were defined and verified through statistical tests between the distributions representing each class: normality tests (Shapiro-Wilk, Q-Q plot), analysis of variance, and a test for differences in central tendency (Kruskal-Wallis). Based on the results of the studies using the unsupervised approach, it was inferred that the processes of acquiring and defining the activation patterns by an observer were subject to a non-significant amount of noise from uncontrollable causes. With the semi-supervised approach, it was observed that not all degrees of similarity between pairs of object classes are equal for a given treatment, which indicates that the similarity between classes of neuronal activation patterns is sensitive to all the factors analyzed and provides evidence of the complexity of neuronal coding.
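A small sketch of the statistical testing workflow named above (Shapiro-Wilk per class, then Kruskal-Wallis across classes), run on synthetic data standing in for per-class similarity scores:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: similarity scores for three treatment classes
rng = np.random.default_rng(2)
classes = [rng.normal(mu, 1.0, size=50) for mu in (0.0, 0.2, 0.8)]

# Normality check per class (Shapiro-Wilk)
for i, sample in enumerate(classes):
    w, p = stats.shapiro(sample)
    print(f"class {i}: Shapiro-Wilk W={w:.3f}, p={p:.3f}")

# Non-parametric test for differences in central tendency (Kruskal-Wallis)
h, p = stats.kruskal(*classes)
print(f"Kruskal-Wallis H={h:.3f}, p={p:.3f}")
```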
405

Bayes Optimal Feature Selection for Supervised Learning

Saneem Ahmed, C G January 2014 (has links) (PDF)
The problem of feature selection is critical in several areas of machine learning and data analysis, such as cancer classification using gene expression data, text categorization, etc. In this work, we consider feature selection for supervised learning problems, where one wishes to select a small set of features that facilitate learning a good prediction model in the reduced feature space. Our interest is primarily in filter methods that select features independently of the learning algorithm to be used and are generally faster to implement compared to other types of feature selection algorithms. Many common filter methods for feature selection make use of information-theoretic criteria such as those based on mutual information to guide their search process. However, even in simple binary classification problems, mutual information based methods do not always select the best set of features in terms of the Bayes error. In this thesis, we develop a general approach for selecting a set of features that directly aims to minimize the Bayes error in the reduced feature space with respect to the loss or performance measure of interest. We show that the mutual information based criterion is a special case of our setting when the loss function of interest is the logarithmic loss for class probability estimation. We give a greedy forward algorithm for approximately optimizing this criterion and demonstrate its application to several supervised learning problems including binary classification (with 0-1 error, cost-sensitive error, and F-measure), binary class probability estimation (with logarithmic loss), bipartite ranking (with pairwise disagreement loss), and multiclass classification (with multiclass 0-1 error). Our experiments suggest that the proposed approach is competitive with several state-of-the-art methods.
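The greedy forward scheme can be sketched as follows; for illustration the candidate score is a per-feature mutual-information sum (related to the logarithmic-loss special case the abstract mentions), not the thesis's general Bayes-error criterion:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def greedy_forward_select(X: np.ndarray, y: np.ndarray, k: int) -> list[int]:
    """Greedily grow a feature set, at each step adding the candidate that
    maximizes an (approximate) information score of the enlarged set."""
    selected: list[int] = []
    remaining = set(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda j: mutual_info_classif(
            X[:, selected + [j]], y, random_state=0).sum())
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage on synthetic data: feature 0 carries signal, the rest are noise
rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] += y  # inject class signal into feature 0
print(greedy_forward_select(X, y, k=2))
```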
406

Bank Customer Churn Prediction: A comparison between classification and evaluation methods

Tandan, Isabelle, Goteman, Erika January 2020 (has links)
This study aims to assess which supervised statistical learning method (random forest, logistic regression, or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach (k-fold cross-validation or leave-one-out cross-validation) yields the most reliable results. Predicting customer churn has grown in popularity since new technology, regulation, and changed demand have increased competition among banks; with all the more reason, banks acknowledge the importance of maintaining their customer base. The findings of this study are that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measurements, computational efficiency, and from a theoretical point of view. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable due to its computational advantages. For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
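A minimal sketch of the comparison on synthetic data; the study's actual customer records and feature set are not shown here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic stand-in for the churn dataset: 200 customers, 10 features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# k-fold: 10 model fits; LOOCV: one fit per observation (computationally costly)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
kfold_acc = cross_val_score(clf, X, y, cv=kfold).mean()
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"10-fold accuracy: {kfold_acc:.3f}, LOOCV accuracy: {loo_acc:.3f}")
```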
407

Online Unsupervised Domain Adaptation / Online-övervakad domänanpassning

Panagiotakopoulos, Theodoros January 2022 (has links)
Deep Learning models have seen great application in demanding tasks such as machine translation and autonomous driving. However, building such models has proved challenging, both from a computational perspective and due to the requirement of a plethora of annotated data. Moreover, when challenged with new situations or data distributions (the target domain), such models may perform inadequately. Examples include transitioning from one city to another, different weather conditions, or changes in sunlight. Unsupervised Domain Adaptation (UDA) exploits unlabelled data, which is easy to obtain, to adapt models to new conditions or data distributions. Inspired by the fact that environmental changes happen gradually, we focus on online UDA. Instead of directly adjusting a model to a demanding condition, we constantly perform minor adaptations to every slight change in the data, creating a soft transition from the current domain to the target one. To perform gradual adaptation, we apply state-of-the-art semantic segmentation approaches to increasing rain intensities (25, 50, 75, 100, and 200 mm of rain). We demonstrate that deep learning models adapt substantially better to hard domains when exploiting intermediate ones. Moreover, we introduce a model-switching mechanism that allows adjusting back to the source domain after adaptation without dropping performance.
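One way to make "constant minor adaptation" concrete is batch-norm statistics adaptation, a common online test-time baseline; the sketch below is illustrative only and is not the thesis's semantic-segmentation pipeline:

```python
import torch
import torch.nn as nn

# Toy model with a BatchNorm layer whose running statistics can adapt
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())

def adapt_bn(model: nn.Module, batches: list[torch.Tensor]) -> None:
    """Run unlabelled batches through the model in train mode so BN layers
    re-estimate their running mean/var for the current domain."""
    model.train()
    with torch.no_grad():
        for batch in batches:
            model(batch)

# Simulated stream of increasing rain intensity: each stage is a small shift
for intensity in (25, 50, 75, 100, 200):
    fake_batches = [torch.randn(4, 3, 32, 32) + intensity / 200.0 for _ in range(3)]
    adapt_bn(model, fake_batches)   # minor adaptation at every stage
    model.eval()                    # ready for inference on this domain
```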
408

The Role of Data in Projected Quantum Kernels: The Higgs Boson Discrimination / Datans roll i projicerade kvantkärnor: Higgs Boson-diskriminering

Di Marcantonio, Francesco January 2022 (has links)
The development of quantum machine learning is bridging the way to fault-tolerant quantum computation by providing algorithms that run on current noisy intermediate-scale quantum devices. However, it is difficult to find use cases where quantum computers exceed their classical counterparts. The high-energy physics community is experiencing rapid growth in the amount of data physicists need to collect, store, and analyze as ever more complex experiments are conceived. Our work approaches the study of a particle physics event involving the Higgs boson from a quantum machine learning perspective. We compare the quantum support vector machine with the best classical kernel method, grounding our study in a new theoretical framework based on metrics observing three different aspects: the geometry between the classical and quantum learning spaces, the dimensionality of the feature space, and the complexity of the ML models. We exploit these metrics as a compass in the parameter space because of their predictive power: we can exclude areas where we do not expect any advantage from quantum models and guide our study toward the best parameter configurations. Indeed, choosing the number of qubits in a quantum circuit and the number of datapoints in a dataset has so far been left to trial and error. We observe, in a vast parameter region, that the classical RBF kernel model overtakes the performance of the devised quantum kernels. We include in this study the projected quantum kernel, a kernel that reduces the expressivity of the traditional fidelity quantum kernel by projecting its quantum state back to an approximate classical representation through measurements of local quantum systems. The Higgs dataset proves to be low-dimensional in the quantum feature space, meaning that the selected quantum encoding is not expressive enough for the dataset under study. Nonetheless, optimizing the parameters of all the proposed kernels, classical and quantum, revealed a quantum advantage for the projected kernel, which classifies the Higgs boson events well and surpasses the classical ML model.
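For reference, the projected quantum kernel introduced by Huang et al. (2021), which this line of work builds on, projects each encoded state to its one-qubit reduced density matrices and applies a Gaussian kernel in that classical feature space; the notation below assumes that standard form rather than the thesis's exact definition:

```latex
% Projected quantum kernel (Huang et al., 2021): gamma is a hyperparameter,
% rho(x) the encoded quantum state, tr_{!=k} the partial trace over all
% qubits except k, and ||.||_F the Frobenius norm.
k^{\mathrm{PQ}}(x_i, x_j) =
  \exp\!\Big(-\gamma \sum_{k} \big\lVert
    \operatorname{tr}_{\neq k}[\rho(x_i)] - \operatorname{tr}_{\neq k}[\rho(x_j)]
  \big\rVert_F^2\Big)
```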
409

[pt] APRENDIZADO SEMI E AUTO-SUPERVISIONADO APLICADO À CLASSIFICAÇÃO MULTI-LABEL DE IMAGENS DE INSPEÇÕES SUBMARINAS / [en] SEMI AND SELF-SUPERVISED LEARNING APPLIED TO THE MULTI-LABEL CLASSIFICATION OF UNDERWATER INSPECTION IMAGE

AMANDA LUCAS PEREIRA 11 July 2023 (has links)
[en] The offshore oil production segment is the main national producer of this commodity. In this context, underwater inspections are crucial for the preventive maintenance of equipment, which remains in the ocean environment for its entire useful life. From the image and sensor data collected in these inspections, experts are able to prevent and repair damage. This process is deeply complex, time-consuming, and costly, as specialized professionals have to watch hours of video, attentive to details. In this scenario, the present work explores the use of image classification models designed to help experts find the event(s) of interest in underwater inspection videos. These models can be embedded in the ROV or on the platform to perform real-time inference, which can speed up the ROV, reducing inspection time and greatly cutting inspection costs. However, the problem of classifying underwater inspection images poses several inherent challenges: balanced labeled data are expensive and scarce; the data are noisy; intraclass variance is high; and physical characteristics of the water give the captured images certain specific properties. Traditional supervised models may therefore not be able to fulfill the task.
Motivated by these challenges, we seek to solve the underwater image classification problem using models that require less supervision during training. In this work, we explore the DINO method (Self-DIstillation with NO labels, self-supervised) and a new multi-label version we propose for PAWS (Predicting View Assignments With Support Samples, semi-supervised), which we call mPAWS (multi-label PAWS). The models are evaluated based on their performance as feature extractors for training a simple classifier formed by a dense layer. In the experiments carried out, for the same architecture, we obtained a performance that exceeds the f1-score of the supervised equivalent by 2.7 percent.
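A minimal sketch of the DINO self-distillation objective mentioned above, with the usual centering and temperature sharpening; the temperatures and the center update are standard-recipe assumptions, not the thesis's hyperparameters:

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out: torch.Tensor, teacher_out: torch.Tensor,
              center: torch.Tensor, t_s: float = 0.1, t_t: float = 0.04) -> torch.Tensor:
    """Cross-entropy between a centered, sharpened teacher distribution and
    the student distribution over projection dimensions."""
    teacher_probs = F.softmax((teacher_out - center) / t_t, dim=-1).detach()
    student_logp = F.log_softmax(student_out / t_s, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

# Usage with random projections standing in for two augmented views
student_out = torch.randn(8, 256, requires_grad=True)
teacher_out = torch.randn(8, 256)
center = teacher_out.mean(dim=0)  # EMA-updated in the full recipe
loss = dino_loss(student_out, teacher_out, center)
loss.backward()
```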
410

Automating debugging through data mining / Automatisering av felsökning genom data mining

Thun, Julia, Kadouri, Rebin January 2017 (has links)
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched, and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status, and execution faults in web applications. iStone AB wants to explore the possibility of automating its debugging process. Since iStone does most of its debugging manually, finding errors within the system takes time. The aim was therefore to find solutions that reduce the time it takes to debug. An analysis of log messages within access and console logs was made so that the most appropriate data mining techniques for iStone's system could be chosen. Data mining algorithms and log management and analysis tools were compared. The comparisons showed that the ELK Stack, as well as a mixture of Eclat and a hybrid algorithm (Eclat and Apriori), were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The results show that data mining and the use of a log analysis platform can facilitate debugging and reduce the time it takes.
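A minimal sketch of the Eclat idea (vertical tid-lists intersected to count support), with hypothetical tokenized log lines standing in for the actual data:

```python
from itertools import combinations

def eclat_pairs(transactions: list[set[str]], min_support: int) -> dict[tuple[str, str], int]:
    """Frequent item pairs via Eclat: build a tid-list per item, then count
    pair support by intersecting tid-lists."""
    tids: dict[str, set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    frequent = {i: t for i, t in tids.items() if len(t) >= min_support}
    return {
        (a, b): len(frequent[a] & frequent[b])
        for a, b in combinations(sorted(frequent), 2)
        if len(frequent[a] & frequent[b]) >= min_support
    }

# Hypothetical parsed log lines as token sets
logs = [{"ERROR", "timeout", "db"}, {"ERROR", "timeout"}, {"WARN", "db"},
        {"ERROR", "timeout", "db"}]
print(eclat_pairs(logs, min_support=2))  # frequent co-occurring log tokens
```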
