Global ETD Search

441	Performance comparison of data mining algorithms for imbalanced and high-dimensional data Rubio Adeva, Daniel January 2023 Artificial intelligence techniques, such as artificial neural networks, random forests, or support vector machines, have been used to address a variety of problems in numerous industries. However, in many cases, models have to deal with issues such as imbalanced data or high dimensionality. This thesis implements and compares the performance of support vector machines, random forests, and neural networks for new bank account fraud detection, a use case defined by imbalanced data and high dimensionality. The neural network achieved both the best AUC-ROC (0.889) and the best average precision (0.192). However, the results of the study indicate that the difference between the models’ performance is not statistically significant, so the initial hypothesis of equal model performance cannot be rejected.
Data science neural network random forest support vector machine imbalanced data average precision ROC Computer and Information Sciences
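The gap between the two reported metrics (a high AUC-ROC next to a low average precision) is typical of imbalanced data. The following is a minimal sketch of both metrics in plain Python, not the thesis's own evaluation code; the function names and the toy data are assumptions made for illustration, and ties between scores are not handled in the average-precision function.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    return total / hits


# Hypothetical toy data: 20 accounts, 2 of them fraudulent, ranked 3rd and 6th.
scores = [20 - i for i in range(20)]
labels = [0] * 20
labels[2] = 1
labels[5] = 1

print(roc_auc(labels, scores))            # about 0.83
print(average_precision(labels, scores))  # about 0.33
```

With only two positives among twenty items, a few negatives ranked above them barely move the AUC-ROC but cut the average precision sharply, which is why both metrics are usually reported together for fraud-detection tasks.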
442	Automatic Pronoun Resolution for Swedish / Automatisk pronomenbestämning på svenska Ahlenius, Camilla January 2020 This report describes a quantitative analysis comparing two methods for pronoun resolution in Swedish. The first method, an implementation of Mitkov’s algorithm, is heuristic-based: the resolution is determined by a number of manually engineered rules covering both syntactic and semantic information. The second method is data-driven: a Support Vector Machine (SVM) using dependency trees and word embeddings as features. Both methods are evaluated on an annotated corpus of Swedish news articles that was created as part of this thesis. The SVM-based methods significantly outperformed the implementation of Mitkov’s algorithm. The best-performing SVM model relies on tree kernels applied to dependency trees. The model achieved an F1-score of 0.76 for the positive class and 0.9 for the negative class, where positives are pairs of pronoun and noun phrase that corefer, and negatives are pairs that do not corefer.
Pronoun resolution Mitkov’s algorithm Support Vector Machine Supervised learning SVM-Light-TK Tree kernels Dependency trees Word embeddings Computer and Information Sciences
443	Real-time Classification of Multi-sensor Signals with Subtle Disturbances Using Machine Learning : A threaded fastening assembly case study / Realtidsklassificering av multi-sensorsignaler med små störningar med hjälp av maskininlärning : En fallstudie inom åtdragningsmontering Olsson, Theodor January 2021 Sensor fault detection is an actively researched area, and there is a plethora of studies on sensor fault detection in applications such as nuclear power plants, wireless sensor networks, weather stations, and nuclear fusion. However, there does not seem to be any study focusing on detecting sensor faults in threaded fastening assembly. Since threaded fastening tools use torque and angle measurements to determine whether a screw or bolt has been fastened properly, faulty measurements from these sensors can have dire consequences. This study investigates the use of machine learning to detect a subtle kind of sensor fault, common in this application, that is difficult to detect using canonical model-based approaches. Because these faults are subtle and infrequent, a two-stage system was designed. The first component receives sensor data from a tightening and tries to classify each data point as normal or faulty, using low-pass filtering to generate residuals and a support vector machine to classify the residual points. The second component uses the output of the first to determine whether the complete tightening is normal or faulty. Despite the modest performance of the first component, with the best model having an F1-score of 0.421 for classifying data points, the design showed promising performance for classifying the tightening signals, with the best model having an F1-score of 0.976. These results indicate that there are indeed patterns in these kinds of torque and angle multi-sensor signals that make machine learning a feasible approach for classifying them and detecting sensor faults.
Multivariate time series classification Residual-based fault detection Low-pass filter Support vector machine Threaded fastening assembly Computer and Information Sciences
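The two-stage design described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: a centred moving average stands in for the low-pass filter, and a fixed residual threshold stands in for the support vector machine that classifies residual points. All function names, the window size, the threshold, and the signal values are hypothetical.

```python
def low_pass(signal, window=5):
    """Centred moving average; the window is truncated at the signal edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def flag_points(signal, threshold=0.5, window=5):
    """Stage 1: mark each point whose residual (raw minus filtered) is large.

    A plain threshold is used here in place of a trained SVM.
    """
    smoothed = low_pass(signal, window)
    return [abs(x - s) > threshold for x, s in zip(signal, smoothed)]


def classify_tightening(signal, min_flags=1, threshold=0.5, window=5):
    """Stage 2: label the whole tightening from the point-level flags."""
    flags = flag_points(signal, threshold, window)
    return "faulty" if sum(flags) >= min_flags else "normal"


# Hypothetical torque curve: a smooth ramp, then the same ramp with one glitch.
clean = [0.1 * i for i in range(30)]
glitched = list(clean)
glitched[10] += 2.0

print(classify_tightening(clean))     # normal
print(classify_tightening(glitched))  # faulty
```

The sketch also shows why a weak point-level classifier can still give a strong signal-level result: the second stage only needs a few correctly flagged points per tightening, not every one.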
444	Efficient Data Driven Multi Source Fusion Islam, Muhammad Aminul 10 August 2018 Data/information fusion is an integral component of many existing and emerging applications, e.g., remote sensing, smart cars, Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than any one individual input can provide, the challenge is often to determine the underlying mathematics for aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature and decision level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has $2^N$ variables and $N(2^{N-1})$ constraints for $N$ inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (which is linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables. As such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) of the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints; therefore there is no need to explicitly enforce the constraints, as is required by traditional GAs. In addition, this algorithm provides an efficient representation of the search space with the minimal set of vertices. Furthermore, I study different strategies for extending the fuzzy integral for missing data, and I propose a GOAL programming framework to aggregate inputs from heterogeneous sources for the ChI learning. Last, my work in remote sensing involves visual clustering based band group selection and Lp-norm multiple kernel learning based feature level fusion in hyperspectral image processing to enhance pixel level classification.
data/information fusion multiple kernel learning Choquet integral genetic algorithm support vector machine classification clustering remote sensing missing data band grouping data-driven learning goal programming fuzzy measure fuzzy integral optimization
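To make the scaling problem in this abstract concrete, here is a small sketch of the discrete Choquet integral and of how quickly the fuzzy measure grows with the number of inputs. It is not the dissertation's learning procedure. The function names and example measures are assumptions for illustration, and the constraint count assumes one monotonicity constraint per pair of a subset and one added element.

```python
from itertools import combinations


def all_subsets(n):
    """Every subset of {0, ..., n-1} as a frozenset; there are 2**n of them."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` with respect to `measure`.

    `measure` maps each frozenset of input indices to a number in [0, 1],
    with the empty set mapped to 0 and the full set mapped to 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total, previous, seen = 0.0, 0.0, frozenset()
    for i in order:
        seen = seen | {i}  # the inputs with the largest values so far
        current = measure[seen]
        total += values[i] * (current - previous)
        previous = current
    return total


def count_monotonicity_constraints(n):
    """Count pairs (A, A plus one new element); each needs g(A) <= g(A + element)."""
    return sum(n - len(a) for a in all_subsets(n))


n = 3
values = [0.2, 0.9, 0.5]
# Three hypothetical measures that reduce the ChI to familiar operators.
max_measure = {a: (1.0 if a else 0.0) for a in all_subsets(n)}
min_measure = {a: (1.0 if len(a) == n else 0.0) for a in all_subsets(n)}
mean_measure = {a: len(a) / n for a in all_subsets(n)}

print(choquet_integral(values, max_measure))   # 0.9, the maximum
print(choquet_integral(values, min_measure))   # 0.2, the minimum
print(choquet_integral(values, mean_measure))  # close to the mean, 0.5333...
print(len(all_subsets(10)), count_monotonicity_constraints(10))  # 1024 5120
```

Only the subsets visited in the sort order of a given sample contribute to its output, which is the intuition behind optimizing only data-supported variables.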
445	A Spatial-Temporal Contextual Kernel Method for Generating High-Quality Land-Cover Time Series Wehmann, Adam 25 September 2014 No description available. Remote Sensing Computer Science Geography Geographic Information Science land cover change detection trajectory classification multi-temporal time series spatial temporal contextual classification hierarchical kernel Markovian Markov random field support vector machine LCC LULCC LCLUC MRF SVM MSVC HMSVC
446	Characterization of the Frictional-Shear Damage Properties of Scaffold-Free Engineered Cartilage and Reduction of Damage Susceptibility by Upregulation of Collagen Content Whitney, G. Adam 09 February 2015 No description available. Biomedical Engineering Engineering Biomedical Research Biomechanics Materials Science engineered cartilage scaffold-free frictional-shear damage biphasic lubrication model collagen upregulation biglycan tribology signal processing compositional-damage model PC-QSM arthritis
447	Estimating Per-pixel Classification Confidence of Remote Sensing Images Jiang, Shiguo 19 December 2012 No description available. Geographic Information Science Geography Remote Sensing spatial data quality GIS remote sensing image classification classification confidence sample design classification error posterior probability entropy maximum likelihood support vector machine neural network boosted decision tree
448	A Comparative Study of Machine Learning Algorithms Le Fort, Eric January 2018 The choice of machine learning algorithm used to solve a problem is an important one. This paper outlines research measuring three performance metrics for eight different algorithms on a prediction task involving undergraduate admissions data. The algorithms tested are k-nearest neighbours, decision trees, random forests, gradient tree boosting, logistic regression, naive Bayes, support vector machines, and artificial neural networks. These algorithms were compared in terms of accuracy, training time, and execution time. / Thesis / Master of Applied Science (MASc)
Machine Learning Comparative Study Data Science University Admissions Software Engineering Computer Science K-Nearest Neighbours Decision Tree Random Forest Gradient Tree Boosting Logistic Regression Naive Bayes Support Vector Machine Neural Network
449	Grön AI : En analys av maskininlärningsalgoritmers prestanda och energiförbrukning Berglin, Caroline, Ellström, Julia January 2024 Despite the advancements made in artificial intelligence (AI) and machine learning (ML), challenges regarding their environmental impact have arisen. The focus on creating advanced and accurate models often requires extensive computational resources, leading to high energy consumption. The purpose of this work is to explore the topic of green AI and the relationship between performance and energy consumption of two ML algorithms. The algorithms evaluated are decision trees and support vector machines (SVM), using two datasets: Bank Marketing and MNIST. Performance is measured using the evaluation metrics accuracy, precision, recall, and F1-score, while energy consumption is measured using the Intel VTune Profiler tool. The results show that higher performance came with higher energy consumption, with SVM performing best but also consuming the most energy in all tests. Furthermore, the results show that optimizing the models resulted in both improved performance and increased energy consumption. The same pattern was observed when a larger dataset was used. This work is not considered to provide results or guidelines that can be generalized to other studies. However, it contributes to an understanding and awareness of the environmental aspects of AI, which can serve as a foundation for further exploration of the topic. Through increased awareness, shared responsibility can be taken to develop AI solutions that are not only powerful and efficient but also sustainable.
Green AI artificial intelligence (AI) machine learning (ML) performance energy consumption decision tree support vector machine (SVM) Software Engineering
450	Analýza cytologických snímků / Analysis of cytology images Pavlík, Jan January 2012 This master’s thesis focuses on automating the differential leukocyte count in peripheral blood using image processing. It covers the design of the digital image processing pipeline, from scanning and image preprocessing, through segmentation of the nucleus and cytoplasm, to feature selection and classification, including testing on a set of images that were scanned as part of this work. The thesis introduces the segmentation methods and classification procedures used, which separate the nucleus and the cytoplasm of leukocytes. A statistical analysis is performed on these structures, and a set of features is chosen according to suitable statistical parameters. These data then go through a classification process implemented with three artificial neural networks. In total, five types of leukocytes were classified: neutrophils, lymphocytes, monocytes, eosinophils, and basophils. The sensitivity and specificity of the classification for four of the five leukocyte types (neutrophils, lymphocytes, monocytes, eosinophils) are higher than 90%. For basophils, sensitivity was 75% and specificity 67%. The overall classification ability was tested on 111 leukocytes and was approximately 91% successful. All algorithms were implemented in MATLAB.

Search results