41

Near Sets in Set Pattern Classification

Uchime, Chidoteremndu Chinonyelum 06 February 2015 (has links)
This research is focused on the extraction of visual set patterns in digital images, using relational properties like nearness and similarity measures, as well as descriptive properties such as texture, colour and image gradient directions. The problem considered in this thesis is the application of topology to visual set pattern discovery and, consequently, pattern generation. A visual set pattern is a collection of motif patterns generated from different unique points, called seed motifs, in the set. Each motif pattern is a descriptive neighbourhood of a seed motif. Such a neighbourhood is a set of points that are descriptively near a seed motif. A new similarity distance measure based on the dot product between image feature vectors is introduced for image classification with the generated visual set patterns. An application of this approach to pattern generation can be useful in content-based image retrieval and image classification.
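The following is a minimal sketch of the kind of construction described above: two image regions are treated as descriptively near when a dot-product (cosine) similarity between their feature vectors exceeds a threshold. The particular descriptor (mean colour plus mean gradient magnitude), the threshold value and the function names are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def describe(region: np.ndarray) -> np.ndarray:
    """Toy descriptor: mean colour per channel plus mean gradient magnitude.
    (An illustrative stand-in for texture, colour and gradient-direction features.)"""
    mean_colour = region.reshape(-1, region.shape[-1]).mean(axis=0)
    grey = region.mean(axis=-1)
    gy, gx = np.gradient(grey)
    grad = np.array([np.hypot(gx, gy).mean()])
    return np.concatenate([mean_colour, grad])

def dot_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity from the dot product of two feature vectors (cosine form)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def descriptively_near(region_a, region_b, eps: float = 0.95) -> bool:
    """Two regions count as 'near' when their descriptors are similar enough."""
    return dot_similarity(describe(region_a), describe(region_b)) >= eps

rng = np.random.default_rng(0)
seed_motif = rng.random((16, 16, 3))                      # neighbourhood around a seed motif
candidate = seed_motif + 0.01 * rng.random((16, 16, 3))   # slightly perturbed region
print(descriptively_near(seed_motif, candidate))
```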
42

Bank Customer Churn Prediction : A comparison between classification and evaluation methods

Tandan, Isabelle, Goteman, Erika January 2020 (has links)
This study aims to assess which supervised statistical learning method (random forest, logistic regression or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has grown in popularity since new technology, regulation and changed demand have led to increased competition among banks. Banks therefore have all the more reason to acknowledge the importance of maintaining their customer base. The findings of this study are that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance, computational efficiency and from a theoretical point of view. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable due to its computational advantages. For future research, methods that generate models with both good interpretability and high predictive performance would be beneficial, in order to identify which customers end their engagement as well as to understand why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
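As a rough illustration of the comparison described above, the sketch below scores the three classifiers under both k-fold and leave-one-out cross-validation with scikit-learn. The synthetic data stands in for the bank's customer records, and the hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic stand-in for a (mildly imbalanced) bank churn dataset.
X, y = make_classification(n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    kfold_acc = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()
    loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()   # many more model fits
    print(f"{name:20s}  10-fold: {kfold_acc:.3f}   LOOCV: {loo_acc:.3f}")
```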
43

Building Information Extraction and Refinement from VHR Satellite Imagery using Deep Learning Techniques

Bittner, Ksenia 26 March 2020 (has links)
Building information extraction and reconstruction from satellite images is an essential task for many applications related to 3D city modeling, planning, disaster management, navigation, and decision-making. Building information can be obtained and interpreted from several data sources, such as terrestrial measurements, airplane surveys, and space-borne imagery. However, the latter acquisition method outperforms the others in terms of cost and worldwide coverage: space-borne platforms can provide imagery of remote places, inaccessible to other missions, at any time. Because the manual interpretation of high-resolution satellite imagery is tedious and time-consuming, its automatic analysis continues to be an intense field of research. At times, however, it is difficult to understand complex scenes with a dense placement of buildings, where parts of buildings may be occluded by vegetation or other surrounding constructions, making their extraction or reconstruction even more difficult. Incorporating several data sources representing different modalities may alleviate the problem. The goal of this dissertation is to integrate multiple high-resolution remote sensing data sources for automatic satellite imagery interpretation, with emphasis on building information extraction and refinement; these challenges are addressed as follows: Building footprint extraction from Very High-Resolution (VHR) satellite images is an important but highly challenging task, due to the large diversity of building appearances and the relatively low spatial resolution of satellite data compared to airborne data. Many algorithms are built on spectral-based or appearance-based criteria from single or fused data sources to perform building footprint extraction. The input features for these algorithms are usually manually extracted, which limits their accuracy. Based on the advantages of recently developed Fully Convolutional Networks (FCNs), i.e., the automatic extraction of relevant features and dense classification of images, an end-to-end framework is proposed which effectively combines the spectral and height information from red, green, and blue (RGB), pan-chromatic (PAN), and normalized Digital Surface Model (nDSM) image data and automatically generates a full-resolution binary building mask. The proposed architecture consists of three parallel networks merged at a late stage, which helps propagate fine detailed information from earlier layers to higher levels in order to produce an output with high-quality building outlines. The performance of the model is examined on new, unseen data to demonstrate its generalization capacity. The availability of detailed Digital Surface Models (DSMs) generated by dense matching and representing the elevation surface of the Earth can improve the analysis and interpretation of complex urban scenarios. The generation of DSMs from VHR optical stereo satellite imagery leads to high-resolution DSMs which often suffer from mismatches, missing values, or blunders, resulting in coarse building shape representation. To overcome these problems, a methodology based on a conditional Generative Adversarial Network (cGAN) is developed for generating a good-quality, Level of Detail (LoD) 2-like DSM with enhanced 3D object shapes directly from the low-quality, half-meter-resolution photogrammetric satellite DSM input.
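As a compact illustration of the late-fusion idea described above, the sketch below builds three small parallel convolutional branches for RGB, PAN and nDSM inputs and merges them before a binary building-mask head. The layer sizes and depths are placeholders and are far smaller than the architecture developed in the dissertation.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Small convolutional encoder for one modality (RGB, PAN or nDSM)."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class LateFusionFCN(nn.Module):
    """Three parallel branches merged at a late stage into a binary building mask."""
    def __init__(self):
        super().__init__()
        self.rgb, self.pan, self.ndsm = Branch(3), Branch(1), Branch(1)
        self.head = nn.Sequential(
            nn.Conv2d(96, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),            # per-pixel building logit
        )
    def forward(self, rgb, pan, ndsm):
        fused = torch.cat([self.rgb(rgb), self.pan(pan), self.ndsm(ndsm)], dim=1)
        return torch.sigmoid(self.head(fused))

model = LateFusionFCN()
mask = model(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(mask.shape)   # torch.Size([1, 1, 64, 64])
```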
Various deep learning applications benefit from multi-task learning with multiple regression and classification objectives by taking advantage of the similarities between individual tasks. This work therefore examines such influences for important remote sensing applications, namely realistic elevation model generation and roof type classification from stereo half-meter-resolution satellite DSMs. Recently published deep learning architectures for both tasks are investigated, and a new end-to-end cGAN-based network is developed which combines different models that provide the best results for their individual tasks. To benefit from information provided by multiple data sources, a different cGAN-based workflow is proposed in which the generative part consists of two encoders and a common decoder that blends the intensity and height information within one network for the DSM refinement task. The inputs to the introduced network are single-channel photogrammetric DSMs with continuous values and pan-chromatic half-meter-resolution satellite images. Information fusion from different modalities helps propagate fine details, completes inaccurate or missing 3D information about building forms, and improves the building boundaries, making them more rectilinear. Lastly, an additional comparison between the proposed methodologies for DSM enhancement is made to discuss and verify the most beneficial workflow and the applicability of the resulting DSMs for different remote sensing approaches.
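Similarly, the two-encoder/common-decoder generator mentioned above can be sketched as follows. In a full cGAN this generator would be trained adversarially against a discriminator, which is omitted here, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

def encoder(in_ch: int) -> nn.Sequential:
    """Tiny downsampling encoder for one input modality."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    )

class TwoEncoderGenerator(nn.Module):
    """Generator blending height (DSM) and intensity (PAN) in one network."""
    def __init__(self):
        super().__init__()
        self.enc_dsm, self.enc_pan = encoder(1), encoder(1)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),   # refined DSM heights
        )
    def forward(self, dsm, pan):
        return self.dec(torch.cat([self.enc_dsm(dsm), self.enc_pan(pan)], dim=1))

g = TwoEncoderGenerator()
refined = g(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
print(refined.shape)   # torch.Size([1, 1, 128, 128])
```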
44

Incorporating speaker’s role in classification of text-based dialogues

Stålhandske, Therese January 2020 (has links)
Dialogues are an interesting type of document, as they contain a speaker role feature not found in other types of texts. Previous work has incorporated speaker role dependency in text generation, but little has been done in the realm of text classification. In this thesis, we incorporate speaker role dependency in a classification model by creating speaker-dependent word representations and simulating a conversation within neural networks. The results show a significant improvement in the performance of binary classification of dialogues when speaker role information is incorporated. Further, by extracting attention weights from the model, we gain insight into how the speaker’s role affects the interpretation of utterances, giving an intuitive explanation of our model. / Konversationer är en speciell typ av text, då den innehåller information om talare som inte hittas i andra typer av dokument. Tidigare arbeten har inkluderat en talares roll i generering av text, men lite har gjorts inom textklassificering. I det här arbetet, introducerar vi deltagarens roller till en klassifikationsmodell. Detta görs genom att skapa ordrepresentationer, som är beroende på deltagaren i konversationen, samt simulering av en konversation inom ett neuralt nätverk. Resultaten visar en signifikant förbättring av prestandan i binär klassificering av dialoger, med talares roll inkluderat. Vidare, genom utdragning av attentionvikterna, kan vi få en bättre överblick över hur en talares roll påverkar tolkningen av yttranden, vilket i sin tur ger en mer intuitiv förklaring av vår modell.
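One simple way to realise speaker-dependent word representations, sketched below under the assumption of a recurrent classifier, is to concatenate a learned role embedding to every token embedding before the sequence model. The dimensions, the concatenation scheme and the class/function names are assumptions rather than the thesis's exact construction.

```python
import torch
import torch.nn as nn

class RoleAwareDialogueClassifier(nn.Module):
    """Binary dialogue classifier whose token representations depend on the speaker role."""
    def __init__(self, vocab_size=5000, n_roles=2, d_word=64, d_role=8, d_hidden=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_word)
        self.role_emb = nn.Embedding(n_roles, d_role)      # e.g. 0 = agent, 1 = customer
        self.rnn = nn.GRU(d_word + d_role, d_hidden, batch_first=True)
        self.out = nn.Linear(d_hidden, 1)

    def forward(self, tokens, roles):
        # tokens, roles: (batch, seq_len); each token is paired with its speaker's role
        x = torch.cat([self.word_emb(tokens), self.role_emb(roles)], dim=-1)
        _, h = self.rnn(x)
        return torch.sigmoid(self.out(h[-1]))

model = RoleAwareDialogueClassifier()
tokens = torch.randint(0, 5000, (2, 20))
roles = torch.randint(0, 2, (2, 20))
print(model(tokens, roles).shape)   # torch.Size([2, 1])
```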
45

Time Series Analysis and Binary Classification in a Car-Sharing Service : Application of data-driven methods for analysing trends, seasonality, residuals and prediction of user demand / Tidsserieanalys och binär klassificering i en bildelningstjänst : Applicering av datadrivna metoder för att analysera trender, säsongsvariationer, residualer samt predicering av användares efterfrågan

Uhr, Aksel January 2023 (has links)
Researchers have estimated a 20-percentage-point increase in the share of the world’s population residing in urban areas between 2011 and 2050. The shift toward denser cities brings both opportunities and challenges. Two of the challenges concern sustainability and mobility. With the advancement of technology, smart mobility and car-sharing have emerged as part of the solution. Research has estimated that car-sharing reduces toxic emissions and car ownership, thus decreasing the need for private cars to some extent. Despite being a possible solution to the future’s mobility challenges in urban areas, car-sharing providers suffer from profitability issues. To keep assisting society in the transformation to sustainable mobility alternatives, profitability needs to be reached. Two central challenges on the way to profitability are user segmentation and demand forecasting. This study focuses on the latter problem, and the aim is to understand the demand for different car types and car-sharing users’ individual demand. Quantitative research was conducted; specifically, time series analysis and binary classification were selected to answer the research questions. It was concluded that there are trend, seasonality and residual patterns in the time series capturing bookings per car type per week. However, the patterns were not extensive. Subsequently, a random forest was trained on a data set built with moving-average feature engineering and consisting of weekly bookings of users having at least 33 journeys during an observation period of 66 weeks (N = 1335705). The final model predicted who is likely to use the service in the upcoming week, in an attempt to predict individual demand. In terms of metrics, the random forest achieved a score of .89 in accuracy (both classes), .91 in precision (positive class), .73 in recall (positive class) and .82 in F1-score (positive class). We therefore concluded that a machine learning model can predict weekly individual demand fairly well. Future research involves further feature engineering and mapping the predictions to business actions. / Forskare har estimerat att världens befolkning som kommer bo i stadsområden kommer öka med 20 procentenheter. Ökningen av mer tätbeboliga städer medför såväl möjligheter som utmaningar. Två av utmaningarna berör hållbarhet och mobilitet. Med teknologiska framsteg har så kallad smart mobilitet och bildelning blivit en del av lösningen. Annan forskning har visat att bildelning minskar utsläpp av skadliga ämnen och minskar ägandet av bilar, vilket därmed till viss del minskar behovet av privata bilar. Trots att det är en möjlig lösning på framtidens mobilitetsutmaningar och behov i stadsområden, lider bildelningstjänster av lönsamhetsproblem. För att fortsätta bidra till samhället i omställningen till hållbara mobilitetsalternativ i framtiden, så måste lönsamhet nås. Två centrala utmaningar för att uppnå lönsamhet är användarsegmentering och efterfrågeprognoser. Denna studie fokuserar på det sistnämnda problemet. Syftet med studien är att förstå efterfrågan på olika typer av bilar samt individuell efterfrågan hos bildelninganvändare. Kvantitativ forskning genomfördes, nämligen tidsserieanalys och binär klassificering för att besvara studiens forskningsfrågor. Efter att ha genomfört statistiska tidsserietester konstaterades det att det finns trender, säsongsvariationer och residualmönster i tidsserier som beskriver bokningar per biltyp per vecka. Dessa mönster var dock inte omfattande.
Därefter tränades ett så kallat random forest på en datamängd med hjälp av rörliga medelvärden (eng. moving average). Denna datamängd bestod av veckovisa bokningar från användare som hade minst 33 resor under en observationsperiod på 66 veckor (N = 1335705). Den slutliga modellen förutsade vilka som sannolikt skulle använda tjänsten kommande vecka i ett försök att prognostisera individuell efterfrågan. Med avseende på metriker uppnådde modellen ett resultat på 0,89 i noggrannhet (för båda klasserna), 0,91 i precision (positiva klassen), 0,73 i recall (positiva klassen) och 0,82 i F1-poäng (positiv klass). Vi drog därför slutsatsen att en maskininlärningsmodell kan förutsäga veckovis individuell efterfrågan relativt bra med avseende på dess slutgiltiga användning. Framtida forskning innefattar ytterligare dataselektion, samt kartläggning av prognosen till affärsåtgärder.
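A minimal sketch of the two analyses described in the English abstract above: a trend/seasonality/residual decomposition of a weekly booking series, and a random forest trained on moving-average usage features. The synthetic series, the window sizes and the in-sample evaluation are simplifying assumptions, not the study's setup.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# Synthetic weekly bookings per car type (66 weeks), standing in for the real series.
weeks = pd.date_range("2021-01-03", periods=66, freq="W")
bookings = pd.Series(50 + 10 * np.sin(2 * np.pi * np.arange(66) / 13) + rng.normal(0, 3, 66), index=weeks)
decomposition = seasonal_decompose(bookings, model="additive", period=13)  # trend / seasonal / resid
print(decomposition.trend.dropna().head())

# Per-user weekly usage: predict next week's use from moving averages of past activity.
usage = pd.DataFrame(rng.integers(0, 2, size=(500, 10)))   # 500 users, 10 weeks (0/1 = used service)
features = pd.DataFrame({
    "ma_3": usage.iloc[:, -4:-1].mean(axis=1),              # moving average over last 3 weeks
    "ma_6": usage.iloc[:, -7:-1].mean(axis=1),              # moving average over last 6 weeks
    "last_week": usage.iloc[:, -2],
})
target = usage.iloc[:, -1]                                  # did the user book in the final week?

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, target)
pred = clf.predict(features)                                # in-sample only, for illustration
print(precision_score(target, pred), recall_score(target, pred))
```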
46

Machine Learning based Predictive Data Analytics for Embedded Test Systems

Al Hanash, Fayad January 2023 (has links)
Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield of data analytics that aims to make accurate predictions. Predictive data analytics extracts insights from data by using machine learning algorithms. This thesis presents supervised learning algorithms for performing predictive data analytics in an Embedded Test System at the Nordic Engineering Partner company. Predictive Maintenance is a concept often used in manufacturing industries that refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers are fitted, and cross-validation, sampling techniques, and a confusion matrix are used to measure their performance accurately. In addition to accuracy, the recall, precision, F1, kappa, MCC, and ROC AUC metrics are used. The fitted prediction models achieve high accuracy.
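The sketch below illustrates the evaluation setup described above: the four classifier families scored with cross-validation over accuracy, precision, recall, F1, kappa, MCC and ROC AUC. The synthetic data stands in for the embedded-test-system data, which is not public, and the model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.metrics import make_scorer, matthews_corrcoef, cohen_kappa_score

# Synthetic stand-in for the embedded test data (binary pass/fail labels).
X, y = make_classification(n_samples=400, n_features=20, weights=[0.7, 0.3], random_state=0)

scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "roc_auc": "roc_auc",
    "kappa": make_scorer(cohen_kappa_score),
    "mcc": make_scorer(matthews_corrcoef),
}

models = {
    "SVM": SVC(probability=True, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    summary = "  ".join(f"{m}={scores['test_' + m].mean():.2f}" for m in scoring)
    print(f"{name:18s} {summary}")
```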
47

Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning / Förutsägelse av Personalavgång inom Sjukvården med hjälp av Övervakad Maskininlärning

Gentek, Anna January 2022 (has links)
Given that employees are one of the most valuable assets of any organization, losing an employee has a detrimental impact on several aspects of business activities. Loss of competence, deteriorated productivity and increased hiring costs are just a small fraction of the consequences associated with high employee churn. To deal with this issue, organizations in many industries rely on machine learning and predictive analytics to model, predict and understand the causes of employee churn so that appropriate proactive retention strategies can be applied. However, to date, the problem of excessive churn prevalent in the healthcare industry has not been addressed. To fill this research gap, this study investigates the applicability of a machine learning-based employee churn prediction model for a Swedish healthcare organization. We start by extracting relevant features from real employee data, followed by a comprehensive feature analysis using the Recursive Feature Elimination (RFE) method. A wide range of prediction models, including traditional classifiers such as Random Forest, Support Vector Machine and Logistic Regression, are then implemented. In addition, we explore the performance of an ensemble machine learning model, XGBoost, and of neural networks, specifically an Artificial Neural Network (ANN). The results of this study show the superiority of an SVM model, with a recall of 94.8% and a ROC-AUC of 91.1%. Additionally, to understand and identify the main churn contributors, model-agnostic interpretability methods are examined and applied on top of the predictions. The analysis has shown that wellness contribution, employment rate, number of vacation days and number of sick days are strong indicators of churn among healthcare employees. / Det sägs ofta att anställda är en verksamhets mest värdefulla tillgång. Att förlora en anställd har därmed ofta skadlig inverkan på flera aspekter av affärsverksamheter. Därtill hör bland annat kompetensförlust, försämrad produktivitet samt ökade anställningskostnader. Dessa täcker endast en bråkdel av konsekvenserna förknippade med en för hög personalomsättningshastighet. För att hantera och förstå hög personalomsättning har många verksamheter och organisationer börjat använda sig av maskininlärning och statistisk analys där de bland annat analyserar beteendedata i syfte att förutsäga personalomsättning samt för att proaktivt skapa en bättre arbetsmiljö där anställda väljer att stanna kvar. Trots att sjukvården är en bransch som präglas av hög personalomsättning finns det i dagsläget inga studier som adresserar detta uppenbara problem med utgångspunkt i maskininlärning. Denna studie undersöker tillämpbarheten av maskininlärningsmodeller för att modellera och förutsäga personalomsättning i en svensk sjukvårdsorganisation. Med utgångspunkt i relevanta variabler från faktisk data på anställda tillämpar vi Recursive Feature Elimination (RFE) som den primära analysmetoden. I nästa steg tillämpar vi flertalet prediktionsmodeller inklusive traditionella klassificerare såsom Random Forest, Support Vector Machine och Logistic Regression. Denna studie utvärderar också hur pass relevanta Neural Networks eller mer specifikt Artificial Neural Networks (ANN) är i syfte att förutse personalomsättning. Slutligen utvärderar vi precisionen av en sammansatt maskininlärningsmodell, Extreme Gradient Boost. Studiens resultat påvisar att SVM är en överlägsen modell med 94.8% noggrannhet.
Resultaten från studien möjliggör även identifiering av variabler som mest bidrar till personalomsättning. Vår analys påvisar att variabler relaterade till avhopp, såsom friskvårdsbidrag, sysselsättningsgrad, antal semesterdagar samt sjuktid, är starkt korrelerade med personalomsättning i sjukvården.
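A minimal sketch of the feature-selection-plus-classifier setup described above: Recursive Feature Elimination feeding an SVM, evaluated with recall and ROC-AUC. The synthetic employee table, its column names and the churn label are invented for illustration and do not reflect the study's actual data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, roc_auc_score

rng = np.random.default_rng(0)
n = 1000
# Synthetic HR-style features loosely named after variables mentioned above.
X = pd.DataFrame({
    "wellness_contribution": rng.random(n),
    "employment_rate": rng.random(n),
    "vacation_days": rng.integers(0, 40, n),
    "sick_days": rng.integers(0, 30, n),
    "tenure_years": rng.random(n) * 20,
    "age": rng.integers(20, 65, n),
})
y = (rng.random(n) < 0.2 + 0.3 * (X["sick_days"] > 20)).astype(int)   # synthetic churn label

# RFE needs an estimator exposing coefficients; a linear model is a common choice.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
pipe = Pipeline([("scale", StandardScaler()), ("rfe", selector), ("svm", SVC(probability=True))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
pipe.fit(X_tr, y_tr)
proba = pipe.predict_proba(X_te)[:, 1]
print("recall:", recall_score(y_te, pipe.predict(X_te)), "roc_auc:", roc_auc_score(y_te, proba))
```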
48

Data-driven decision support in digital retailing

Sweidan, Dirar January 2023 (has links)
In the digital era, with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategy formulation. While predictive models are fundamental for making data-driven decisions, this thesis spotlights binary classifiers as its central focus. These classifiers reveal the complexities of two real-world problems, each marked by particular properties. Specifically, when binary decisions are made based on predictions, relying solely on predicted class labels is insufficient because of variations in classification accuracy. Furthermore, different prediction mistakes carry different costs, which impacts the utility. To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. As a proof of concept, the thesis found that some algorithms are inherently well calibrated, while others demonstrate reliability once their probabilities are calibrated. In both cases, the thesis concludes that utilising the top predictions with the highest probabilities increases precision and minimises false positives. In addition, adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, a pathway emerges wherein confident and reliable predictions take centre stage in decision-making. This enables e-tailers to form distinct strategies based on these predictions and optimise their utility. This thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on producing an automated system that prioritises well-calibrated, high-probability predictions while discarding others and optimises utility based on the costs and gains associated with the different prediction outcomes, in order to enhance decision support for e-tailers. / The current thesis is part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and is funded by the Swedish Knowledge Foundation.
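The sketch below illustrates the two ideas emphasised above: calibrating a classifier's probabilities and then classifying with a rejection option so that only sufficiently confident predictions drive a decision. The calibration method, the confidence threshold and the synthetic data are assumptions, not the thesis's exact experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Calibrate the forest's probabilities on internal folds (isotonic regression here).
model = CalibratedClassifierCV(RandomForestClassifier(n_estimators=200, random_state=0),
                               method="isotonic", cv=5)
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Classification with a rejection option: act only on confident predictions.
threshold = 0.8                                       # assumed confidence level
confident = (proba >= threshold) | (proba <= 1 - threshold)
accepted_pred = (proba[confident] >= 0.5).astype(int)
print("coverage:", confident.mean(),
      "precision on accepted positives:", precision_score(y_te[confident], accepted_pred))
```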
49

La reconnaissance automatique des brins complémentaires : leçons concernant les habiletés des algorithmes d'apprentissage automatique en repliement des acides ribonucléiques

Chasles, Simon 07 1900 (has links)
L’acide ribonucléique (ARN) est une molécule impliquée dans de nombreuses fonctions cellulaires comme la traduction génétique et la régulation de l’expression des gènes. Les récents succès des vaccins à ARN témoignent du rôle que ce dernier peut jouer dans le développement de traitements thérapeutiques. La connaissance de la fonction d’un ARN passe par sa séquence et sa structure lesquelles déterminent quels groupes chimiques (et de quelles manières ces groupes chimiques) peuvent interagir avec d’autres molécules. Or, les structures connues sont rares en raison du coût et de l’inefficacité des méthodes expérimentales comme la résonance magnétique nucléaire et la cristallographie aux rayons X. Par conséquent, les méthodes calculatoires ne cessent d’être raffinées afin de déterminer adéquatement la structure d’un ARN à partir de sa séquence. Compte tenu de la croissance des jeux de données et des progrès incessants de l’apprentissage profond, de nombreuses architectures de réseaux neuronaux ont été proposées afin de résoudre le problème du repliement de l’ARN. Toutefois, les jeux de données actuels et la nature des mécanismes de repliement de l’ARN dressent des obstacles importants à l’application de l’apprentissage statistique en prédiction de structures d’ARN. Ce mémoire de maîtrise se veut une couverture des principaux défis inhérents à la résolution du problème du repliement de l’ARN par apprentissage automatique. On y formule une tâche fondamentale afin d’étudier le comportement d’une multitude d’algorithmes lorsque confrontés à divers contextes statistiques, le tout dans le but d’éviter le surapprentissage, problème dont souffre une trop grande proportion des méthodes publiées jusqu’à présent. / Ribonucleic acid (RNA) is a molecule involved in many cellular functions like translation and regulation of gene expression. The recent success of RNA vaccines demonstrates the role RNA can play in the development of therapeutic treatments. The function of an RNA depends on its sequence and structure, which determine which chemical groups (and in what ways these chemical groups) can interact with other molecules. However, only a few RNA structures are known due to the high cost and low throughput of experimental methods such as nuclear magnetic resonance and X-ray crystallography. As a result, computational methods are constantly being refined to accurately determine the structure of an RNA from its sequence. Given the growth of datasets and the constant progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, the nature of current datasets and RNA folding mechanisms hinders the application of statistical learning to RNA structure prediction. Here, we cover the main challenges one can encounter when solving the RNA folding problem by machine learning. With an emphasis on overfitting, a problem that affects too many of the methods published so far, we formulate a fundamental RNA problem to study the behaviour of a variety of algorithms when confronted with various statistical contexts.
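To make the fundamental task concrete, and assuming from the title that it concerns the automatic recognition of complementary strands, the sketch below labels RNA sequence pairs as complementary or not, trains a simple classifier on a crude positional encoding, and compares training and test scores as a basic overfitting check. The encoding, the classifier and the data generation are illustrative assumptions, not the memoir's actual protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
BASES = "ACGU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")   # Watson-Crick pairing (ignoring wobble pairs)

def random_strand(length=20):
    return "".join(rng.choice(list(BASES), size=length))

def encode_pair(s1, s2):
    """Crude positional encoding: per-position flag of whether the bases pair (antiparallel)."""
    return np.array([1 if b1.translate(COMPLEMENT) == b2 else 0 for b1, b2 in zip(s1, reversed(s2))])

# Build a balanced toy dataset of complementary vs. random strand pairs.
pairs, labels = [], []
for _ in range(1000):
    s = random_strand()
    pairs.append(encode_pair(s, s.translate(COMPLEMENT)[::-1])); labels.append(1)
    pairs.append(encode_pair(s, random_strand())); labels.append(0)
X, y = np.stack(pairs), np.array(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# A large gap between these two scores would be a symptom of overfitting.
print("train:", clf.score(X_tr, y_tr), "test:", clf.score(X_te, y_te))
```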
50

Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case

Koseler, Kaan Tamer 27 April 2018 (has links)
No description available.
