281 |
Optimisation of autoencoders for prediction of SNPs determining phenotypes in wheat
Nair, Karthik January 2021
The increase in demand for food has resulted in increased demand for tools that help streamline the plant breeding process in order to create new varieties of crops. Identifying the underlying genetic mechanism of favourable characteristics is essential in order to make the best breeding decisions. In this project we have developed a modified autoencoder model which allows for lateral phenotype injection into the latent layer, in order to identify causal SNPs for phenotypes of interest in wheat. SNP and phenotype data for 500 samples of Lantmännen SW Seed, provided by Lantmännen, were used to train the network. An artificial phenotype created using a single SNP was used during training instead of a real phenotype, since the relationship between the artificial phenotype and the SNP is already known. The modified training model with lateral phenotype injection showed a significant increase in genotype concordance of the artificial phenotype when compared to the control model without phenotype injection. The causal SNP was successfully identified by using a concordance terrain graph, where the difference in concordance of individual SNPs between the modified model and the control model was plotted against the genomic position of each SNP. The model requires further testing to elucidate its behaviour for phenotypes linked to multiple SNPs.
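As a rough illustration of the concordance terrain described in this abstract, the sketch below computes per-SNP genotype concordance for two sets of reconstructed genotypes and locates the SNP where the phenotype-injected model gains most over the control. The 0/1/2 genotype encoding, the toy matrices, and every function name are assumptions made for illustration; none of them are taken from the thesis.

```python
def per_snp_concordance(true_genotypes, predicted_genotypes):
    """Fraction of samples whose predicted genotype equals the true one, per SNP."""
    n_samples = len(true_genotypes)
    n_snps = len(true_genotypes[0])
    return [
        sum(true_genotypes[i][j] == predicted_genotypes[i][j] for i in range(n_samples))
        / n_samples
        for j in range(n_snps)
    ]


def concordance_terrain(true_genotypes, injected_pred, control_pred):
    """Per-SNP concordance of the injected model minus that of the control model."""
    injected = per_snp_concordance(true_genotypes, injected_pred)
    control = per_snp_concordance(true_genotypes, control_pred)
    return [a - b for a, b in zip(injected, control)]


# Hypothetical toy data: 4 samples x 3 SNPs, genotypes coded 0/1/2.
truth = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
control_pred = [[0, 1, 0], [1, 1, 2], [2, 0, 1], [0, 2, 0]]   # SNP 2 right in 1 of 4 samples
injected_pred = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 0]]  # SNP 2 right in 3 of 4 samples

terrain = concordance_terrain(truth, injected_pred, control_pred)
# Index of the SNP with the largest concordance gain (the candidate causal SNP).
peak = max(range(len(terrain)), key=terrain.__getitem__)
print(terrain, peak)
```

In a real analysis the terrain values would be plotted against genomic position rather than array index, and a single peak would only be expected for a phenotype driven by one SNP.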
|
282 |
[en] A MIP APPROACH FOR COMMUNITY DETECTION IN THE STOCHASTIC BLOCK MODEL / [pt] UMA ABORDAGEM DE PROGRAMAÇÃO INTEIRA MISTA PARA DETECÇÃO DE COMUNIDADES NO STOCHASTIC BLOCK MODEL
BRENO SERRANO DE ARAUJO 04 November 2020
[pt] O Degree-Corrected Stochastic Block Model (DCSBM) é um modelo popular para geração de grafos aleatórios com estrutura de comunidade, dada uma sequência de graus esperados. O princípio básico de algoritmos que utilizam o DCSBM para detecção de comunidades é ajustar os parâmetros do modelo a dados observados, de forma a encontrar a estimativa de máxima verossimilhança, ou maximum likelihood estimate (MLE), dos parâmetros do modelo. O problema de otimização para o MLE é comumente resolvido por meio de heurísticas. Neste trabalho, propomos métodos de programação matemática, para resolver de forma exata o problema de otimização descrito, e comparamos os métodos propostos com heurísticas baseadas no algoritmo de expectation-maximization (EM). Métodos exatos são uma ferramenta fundamental para a avaliação de heurísticas, já que nos permitem identificar se uma solução heurística é sub-ótima e medir seu gap de otimalidade. / [en] The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model to generate random graphs with community structure given an expected degree sequence. The standard approach of community detection algorithms based on the DCSBM is to search for the model parameters which are the most likely to have produced the observed network data, via maximum likelihood estimation (MLE). Current techniques for the MLE problem are heuristics and therefore do not guarantee convergence to the optimum. We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph. We compare the proposed exact methods with classical heuristic algorithms based on expectation-maximization (EM). The solutions given by exact methods give us a principled way of recognizing when heuristic solutions are sub-optimal and measuring how far they are from optimality.
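To make the idea of an exact reference solution concrete, the sketch below evaluates a DCSBM objective and finds its maximiser by exhaustive enumeration on a tiny graph. Two assumptions are made here: the objective is taken to be the commonly used profile log-likelihood sum over r, s of m_rs * log(m_rs / (kappa_r * kappa_s)), where m_rs counts edge ends between groups r and s and kappa_r is the total degree of group r, and brute force stands in for the mixed-integer programming methods of the thesis, which are not reproduced. The function names and the toy graph are hypothetical.

```python
import math
from itertools import product


def dcsbm_log_likelihood(edges, assignment, n_groups):
    """Profile log-likelihood sum_rs m_rs * log(m_rs / (kappa_r * kappa_s))."""
    m = [[0] * n_groups for _ in range(n_groups)]
    kappa = [0] * n_groups
    for u, v in edges:
        r, s = assignment[u], assignment[v]
        m[r][s] += 1
        m[s][r] += 1  # an edge inside one group is therefore counted twice
        kappa[r] += 1
        kappa[s] += 1
    total = 0.0
    for r in range(n_groups):
        for s in range(n_groups):
            if m[r][s] > 0:  # terms with m_rs = 0 contribute nothing
                total += m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
    return total


def exhaustive_mle(edges, n_nodes, n_groups):
    """Try every assignment; only feasible for very small graphs."""
    best_assignment, best_score = None, -math.inf
    for assignment in product(range(n_groups), repeat=n_nodes):
        score = dcsbm_log_likelihood(edges, assignment, n_groups)
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_assignment, best_score = exhaustive_mle(edges, n_nodes=6, n_groups=2)
print(best_assignment, round(best_score, 3))
```

Because the enumeration visits all assignments, any heuristic solution on the same graph can be compared against `best_score` to measure its optimality gap, which is the role the abstract assigns to exact methods.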
|
283 |
Customer segmentation of retail chain customers using cluster analysis / Kundsegmentering av detaljhandelskunder med klusteranalys
Bergström, Sebastian January 2019
In this thesis, cluster analysis was applied to data consisting of customer spending habits at a retail chain in order to perform customer segmentation. The method used was a two-step cluster procedure. The first step consisted of feature engineering, a square root transformation of the data in order to handle big spenders in the data set, and finally principal component analysis in order to reduce the dimensionality of the data set. This was done to reduce the effects of high dimensionality. The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models in the MCLUST family, t-distributed mixture models in the tEIGEN family and non-negative matrix factorization (NMF). For the NMF clustering a slightly different data pre-processing step was taken; specifically, no PCA was performed. Clustering partitions were compared on the basis of the Silhouette index, the Davies-Bouldin index and subject matter knowledge, which revealed that K-means clustering with K = 3 produced the most reasonable clusters. This algorithm was able to separate the customers into different segments depending on how many purchases they made overall, and within these clusters some minor differences in spending habits are also evident. In other words, there is some support for the claim that the customer segments have some variation in their spending habits. / I denna uppsats har klusteranalys tillämpats på data bestående av kunders konsumtionsvanor hos en detaljhandelskedja för att utföra kundsegmentering. Metoden som använts bestod av en två-stegs klusterprocedur där det första steget bestod av att skapa variabler, tillämpa en kvadratrotstransformation av datan för att hantera kunder som spenderar långt mer än genomsnittet och slutligen principalkomponentanalys för att reducera datans dimension. Detta gjordes för att mildra effekterna av att använda en högdimensionell datamängd.
Det andra steget bestod av att tillämpa klusteralgoritmer på den transformerade datan. Metoderna som användes var K-means klustring, gaussiska blandningsmodeller i MCLUST-familjen, t-fördelade blandningsmodeller från tEIGEN-familjen och icke-negativ matrisfaktorisering (NMF). För klustring med NMF användes förbehandling av datan, mer specifikt genomfördes ingen PCA. Klusterpartitioner jämfördes baserat på silhuettvärden, Davies-Bouldin-indexet och ämneskunskap, som avslöjade att K-means klustring med K=3 producerar de rimligaste resultaten. Denna algoritm lyckades separera kunderna i olika segment beroende på hur många köp de gjort överlag och i dessa segment finns vissa skillnader i konsumtionsvanor. Med andra ord finns visst stöd för påståendet att kundsegmenten har en del variation i sina konsumtionsvanor.
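The sketch below illustrates two ingredients named in this abstract: the square root transformation that dampens big spenders, and the Silhouette index used to compare partitions. It is a minimal pure-Python illustration under stated assumptions: the spending figures and function names are invented, Euclidean distance is assumed, and the PCA and K-means steps of the thesis are omitted, with cluster labels supplied by hand instead.

```python
import math


def sqrt_transform(rows):
    """Element-wise square root; compresses very large spending values."""
    return [[math.sqrt(x) for x in row] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points, labels):
    """Mean silhouette over all points; needs at least two clusters.

    A point that is alone in its cluster is given a score of 0.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        other_means = []
        for c in clusters:
            if c == labels[i]:
                continue
            dists = [euclidean(p, q) for j, q in enumerate(points) if labels[j] == c]
            other_means.append(sum(dists) / len(dists))
        b = min(other_means)  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical spending per customer in two product categories.
small_spenders = [[1, 4], [4, 1], [4, 4]]
big_spenders = [[400, 900], [900, 400], [900, 900]]
points = sqrt_transform(small_spenders + big_spenders)

good = silhouette_index(points, [0, 0, 0, 1, 1, 1])   # segments by spending level
mixed = silhouette_index(points, [0, 1, 0, 1, 0, 1])  # arbitrary labelling
print(round(good, 3), round(mixed, 3))
```

A higher index indicates tighter, better separated clusters, so the labelling that follows spending level should score clearly above the arbitrary one.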
|
284 |
Deep Scenario Generation of Financial Markets / Djup scenario generering av finansiella marknader
Carlsson, Filip, Lindgren, Philip January 2020
The goal of this thesis is to explore a new clustering algorithm, VAE-Clustering, and examine whether it can be applied to find differences in the distribution of stock returns, augment the distribution of a current portfolio of stocks, and show how the portfolio performs in different market conditions. The VAE-clustering method is, as mentioned, newly introduced and not widely tested, especially not on time series. The first step is therefore to see if and how well the clustering works. We first apply the algorithm to a dataset containing monthly time series of the power demand in Italy. The purpose of this part is to focus on how well the method works technically. Once the model works well and generates proper results with the Italian power demand data, we move forward and apply the model to stock return data. In the latter application we are unable to find meaningful clusters and are therefore unable to move forward towards the goal of the thesis. The results show that the VAE-clustering method is applicable to time series. The power demand has clear differences from season to season and the model can successfully identify those differences. When it comes to the financial data, we hoped that the model would be able to find different market regimes based on time periods. The model is, however, not able to distinguish different time periods from each other. We therefore conclude that the VAE-clustering method is applicable to time series data, but that the structure and setting of the financial data in this thesis make it too hard to find meaningful clusters. The major finding is that the VAE-clustering method can be applied to time series. We highly encourage further research into whether the method can be successfully used on financial data in settings other than those tested in this thesis.
/ Syftet med den här avhandlingen är att utforska en ny klustringsalgoritm, VAE-Clustering, och undersöka om den kan tillämpas för att hitta skillnader i fördelningen av aktieavkastningar och förändra distributionen av en nuvarande aktieportfölj och se hur den presterar under olika marknadsvillkor. VAE-klusteringsmetoden är som nämnts en nyinförd metod och inte testad i stort, särskilt inte på tidsserier. Det första steget är därför att se om och hur klusteringen fungerar. Vi tillämpar först algoritmen på ett datasätt som innehåller månatliga tidsserier för strömbehovet i Italien. Syftet med denna del är att fokusera på hur väl metoden fungerar tekniskt. När modellen fungerar bra och ger tillfredställande resultat, går vi vidare och tillämpar modellen på aktieavkastningsdata. I den senare applikationen kan vi inte hitta meningsfulla kluster och kan därför inte gå framåt mot målet som var att simulera olika marknader och se hur en nuvarande portfölj presterar under olika marknadsregimer. Resultaten visar att VAE-klustermetoden är väl tillämpbar på tidsserier. Behovet av el har tydliga skillnader från säsong till säsong och modellen kan framgångsrikt identifiera dessa skillnader. När det gäller finansiell data hoppades vi att modellen skulle kunna hitta olika marknadsregimer baserade på tidsperioder. Modellen kan dock inte skilja olika tidsperioder från varandra. Vi drar därför slutsatsen att VAE-klustermetoden är tillämplig på tidsseriedata, men att strukturen på den finansiella data som undersöktes i denna avhandling gör det svårt att hitta meningsfulla kluster. Den viktigaste upptäckten är att VAE-klustermetoden kan tillämpas på tidsserier. Vi uppmuntrar ytterligare forskning för att hitta om metoden framgångsrikt kan användas på finansiell data i andra former än de testade i denna avhandling
|
285 |
TEMPORAL DIET AND PHYSICAL ACTIVITY PATTERN ANALYSIS, UNSUPERVISED PERSON RE-IDENTIFICATION, AND PLANT PHENOTYPING
Jiaqi Guo (18108289) 06 March 2024
Both diet and physical activity are known to be risk factors for obesity and chronic diseases such as diabetes and metabolic syndrome. We explore a distance-based approach for clustering daily physical activity time series to find temporal physical activity patterns among U.S. adults (ages 20-65). We further extend this approach to integrate both diet and physical activity, and find joint temporal diet and physical activity patterns. Our experiments indicate that the integration of diet, physical activity, and time has the potential to discover joint patterns associated with health.
Unsupervised domain adaptive (UDA) person re-identification (re-ID) aims to learn identity information from labeled images in source domains and apply it to unlabeled images in a target domain. We propose a deep learning architecture called Synthesis Model Bank (SMB) to deal with illumination variation in unsupervised person re-ID. In our experiments, the proposed SMB outperforms other synthesis methods on several re-ID benchmarks.
Recent technological advances have introduced modern high-throughput methodologies such as Unmanned Aerial Vehicles (UAVs) to replace traditional, labor-intensive phenotyping. For many UAV phenotyping analyses, the first step is to extract the smallest groups of plants, called “plots”, that have the same genotype. We propose an optimization-based, rotation-adaptive approach for extracting plots in a UAV RGB orthomosaic image. In our experiments, the proposed method achieves better plot extraction accuracy than existing approaches and does not require training data.
|
286 |
Quantifying Gait Characteristics and Neurological Effects in people with Spinal Cord Injury using Data-Driven Techniques / Kvantifiering av gångens egenskaper och neurologisk funktionens effekt hos personer med ryggmärgsskada med hjälp av datadrivna metoder
Truong, Minh January 2024
Spinal cord injury, whether traumatic or nontraumatic, can partially or completely damage sensorimotor pathways between the brain and the body, leading to heterogeneous gait abnormalities. Mobility impairments also depend on other factors such as age, weight, time since injury, pain, and walking aids used. The ASIA Impairment Scale is recommended to classify injury severity, but is not designed to characterize individual ambulatory capacity. Other standardized tests based on subjective or timing/distance assessments also have only limited ability to determine an individual's capacity. Data-driven techniques have demonstrated effectiveness in analysing complexity in many domains and may provide additional perspectives on the complexity of gait performance in persons with spinal cord injury. The studies in this thesis aimed to address the complexity of gait and functional abilities after spinal cord injury using data-driven approaches. The aim of the first manuscript was to characterize the heterogeneous gait patterns in persons with incomplete spinal cord injury. Dissimilarities among gait patterns in the study population were quantified with multivariate dynamic time warping. Gait patterns were classified into six distinct clusters using hierarchical agglomerative clustering. Through random forest classifiers with explainable AI, peak ankle plantarflexion during swing was identified as the feature that most often distinguished most clusters from the controls. By combining clinical evaluation with the proposed methods, it was possible to provide comprehensive analyses of the six gait clusters. The aim of the second manuscript was to quantify sensorimotor effects on walking performance in persons with spinal cord injury. The relationships between 11 input features and 2 walking outcome measures - distance walked in 6 minutes and net energy cost of transport - were captured using 2 Gaussian process regression models. 
Explainable AI revealed the importance of muscle strength on both outcome measures. Use of walking aids also influenced distance walked, and cardiovascular capacity influenced energy cost. Analyses for each person also gave useful insights into individual performance. The findings from these studies demonstrate the large potential of advanced machine learning and explainable AI to address the complexity of gait function in persons with spinal cord injury. / Skador på ryggmärgen, oavsett om de är traumatiska eller icke-traumatiska, kan helt eller delvis skada sensoriska och motoriska banor mellan hjärnan och kroppen, vilket påverkar gången i varierande grad. Rörelsenedsättningen beror också på andra faktorer såsom ålder, vikt, tid sedan skadan uppstod, smärta och gånghjälpmedel. ASIA-skalan används för att klassificera ryggmärgsskadans svårighetsgrad, men är inte utformad för att karaktärisera individens gångförmåga. Andra standardiserade tester baserade på subjektiva eller tids och avståndsbedömningar har också begränsad möjlighet att beskriva individuell kapacitet. Datadrivna metoder är kraftfulla och kan ge ytterligare perspektiv på gångens komplexitet och prestation. Studierna i denna avhandling syftar till att analysera komplexa relationer mellan gång, motoriska samt sensoriska funktion efter ryggmärgsskada med hjälp av datadrivna metoder. Syftet med den första studien är att karaktärisera de heterogena gångmönster hos personer med inkomplett ryggmärgsskada. Multivariat dynamisk tidsförvrägning (eng: Multivariate dynamic time warping) användes för att kvantifiera gångskillnader i studiepopulationen. Hierarkisk agglomerativ klusteranalys (eng: hierarchical agglomerative clustering) delade upp gång i sex distinkta kluster, varav fyra hade lägre hastighet än kontroller. Med hjälp av förklarbara AI (eng: explainable AI) identifierades det att fotledsvinkeln i svingfasen hade störst påverkan om vilken kluster som gångmönstret hamnat i. 
Genom att kombinera klinisk undersökning med datadrivna metoder kunde vi beskriva en omfattande bild av de sex gångklustren. Syftet med den andra manuskriptet är att kvantifiera sensoriska och motoriska faktorerans påverkan på gångförmåga efter ryggmärgsskada. Med hjälp av två Gaussian process-regressionsmodeller identiferades sambanden mellan 11 beskrivande faktorer och 2 gång prestationsmått, nämligen gångavstånd på 6 minuter samt metabola energiåtgång. Med hjälp av förklarbar AI påvisades det stora påverkan av muskelstyrka på både gångsträckan och energiåtgång. Gånghjälpmedlet samt kardiovaskulär kapaciteten hade också betydande påverkan på gångprestation. Enskilda analyser gav insiktsfull information om varje individ. Resultaten från dessa studier visar på potentiella tillämpningar av avancerad maskininlärning och AI metoder för att analysera komplexa relationer mellan funktion och motorisk prestation efter ryggmärgsskada. / QC 20240221
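The first manuscript in this entry quantifies dissimilarity between gait patterns with multivariate dynamic time warping. The sketch below shows one common form of that measure, in which each time frame is a vector and the local cost is the Euclidean distance between frames. This "dependent" variant, the absence of a warping window, the toy trajectories, and the function names are all assumptions for illustration; the thesis may use different choices.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two multivariate time frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multivariate_dtw(series_a, series_b):
    """Classic dynamic-programming DTW over sequences of vectors."""
    n, m = len(series_a), len(series_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(series_a[i - 1], series_b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical two-channel trajectories (for example, two joint angles per frame).
reference = [(0, 0), (1, 1), (2, 2)]
slower = [(0, 0), (1, 1), (1, 1), (2, 2)]   # same shape, one frame repeated
deviating = [(0, 0), (1, 1), (2, 5)]        # differs in the final frame

print(multivariate_dtw(reference, slower))
print(multivariate_dtw(reference, deviating))
```

A matrix of such pairwise distances is the kind of input that hierarchical agglomerative clustering operates on, which is how the abstract describes the gait clusters being formed.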
|
287 |
DETERMINING STRUCTURE AND GROWTH CHARACTERISTICS OF OXIDE HETEROSTRUCTURES THROUGH DEPOSITION AND DATA SCIENCE: TOWARDS SINGLE CRYSTAL BATTERIES
Fraser, Kimberly 27 January 2023
No description available.
|
288 |
Intelligence Extraction Using Machine Learning for Threat Identification Purposes : An Overview / Inhämtande av underrättelseinformation genom maskininlärning för identifikation av hot
Lindgren, Jonatan January 2022
Radar is an invaluable tool for detecting and assessing threats on land, at sea and in the air. To properly evaluate threats, radar operators construct threat libraries in which the signal characteristics of emitters are stored and mapped to specific types of platforms. In this project, methods for constructing these threat detection libraries from data obtained during real-life scenarios are investigated. A number of machine learning approaches are investigated and validated using general and method-specific scoring methods. Using density-based clustering methods and non-linear data transformation, it is shown that Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) and spatial consistency metrics can be used to deinterleave signals and group them by the radar-emitting platform that produced them, from which suitable library parameters can be extracted. The results show that traditional metrics for evaluating cluster methods are not suited for evaluating data containing spatial information. / Radar är ett ovärderligt verktyg för att upptäcka och identifiera hot på land, till havs och i luften. För att kunna utvärdera olika former av hot använder sig radaroperatörer av hotbibliotek, vilka består av olika radarplattformers signalparametrar. I det här projektet undersöks olika metoder för att bygga hotbibliotek med hjälp av verkliga data insamlat under flygningar i Sverige. Olika maskininlärningsmetoder undersöks och utvärderas med hjälp av både generella och specifika utvärderingsmetoder. Genom att använda sig av densitetsbaserade klustringsmetoder och olinjära metoder för att transformera data så visas att hierarkisk densitetsbaserad spatial klustring för tillämpningar med störningar (HDBSCAN) och utvärderingsmetoder som baseras på spatial karaktäristik kan användas för att separera och gruppera radarkällor, vilka kan användas för att finna parametrar för att bygga hotbibliotek.
Det visas även att traditionella metoder för att utvärdera klustringsresultat inte lämpar sig för att utvärdera spatiala data.
|
289 |
Matching Sticky Notes Using Latent Representations / Matchning av klisterlappar med hjälp av latent representation
García San Vicent, Javier January 2022
This project addresses the issue of accurately identifying repeated images of sticky notes. Due to environmental conditions and the 3D location of the camera, different pictures taken of sticky notes may look distinct enough that it is hard to determine whether they belong to the same note. More specifically, this thesis aims to create latent representations of these pictures of sticky notes that encode their content, so that all the pictures of the same note have a similar representation that allows them to be identified. Thus, those representations must be invariant to light conditions, blur and camera position. To that end, a Siamese neural architecture is trained based on data augmentation methods. The method consists of learning to embed two augmented versions of the same image into similar representations. This architecture has been trained with unsupervised learning and fine-tuned with supervised learning to detect whether or not two representations belong to the same note. The performance of ResNet, EfficientNet and Vision Transformers in encoding the images into their representations has been compared with different configurations. The results show that, while the most complex models overfit small amounts of data, the simplest encoders are capable of properly identifying more than 95% of the sticky notes in grey scale. Those models can create invariant representations that are close to each other in the latent space for pictures of the same sticky note. Gathering more data could improve the performance of the model and make it possible to apply it to other fields such as handwritten documents. / Detta projekt tar upp frågan om att identifiera upprepade bilder av klisterlappar. På grund av miljöförhållanden och kamerans 3D-placering kan olika bilder som tagits till klisterlappar se tillräckligt distinkta ut för att det ska vara svårt att avgöra om de faktiskt tillhör samma klisterlappar.
Mer specifikt är syftet med denna avhandling att skapa latenta representationer av bilder av klisterlappar som kodar deras innehåll, så att alla bilder av en klisterlapp har en liknande representation som gör det möjligt att identifiera dem. Sålunda måste representationerna vara oföränderliga för ljusförhållanden, oskärpa och kameraposition. För det ändamålet kommer en enkel siamesisk neural arkitektur att tränas baserad på dataförstärkningsmetoder. Metoden går ut på att lära sig att göra representationerna av två förstärkta versioner av en bild så lika som möjligt. Genom att tillämpa vissa förbättringar av arkitekturen kan oövervakat lärande användas för att träna nätverket. Prestandan hos ResNet, EfficientNet och Vision Transformers när det gäller att koda bilderna till deras representationer har jämförts med olika konfigurationer. Resultaten visar att även om de mest komplexa modellerna överpassar små mängder data, kan de enklaste kodarna korrekt identifiera mer än 95% av klisterlapparna. Dessa modeller kan skapa oföränderliga representationer som är nära i det latenta utrymmet för bilder av samma klisterlapp. Att samla in mer data kan resultera i en förbättring av modellens prestanda och möjligheten att tillämpa den på andra områden som till exempel handskrivna dokument.
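The matching idea in this entry, that pictures of the same note should land close together in the latent space, can be sketched as a similarity test between embedding vectors. The code below assumes cosine similarity with a fixed threshold; the thesis instead fine-tunes a supervised component for this decision, so the threshold value, the three-dimensional toy embeddings, and the function names are purely hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def same_note(embedding_a, embedding_b, threshold=0.9):
    """Declare a match when the embeddings are similar enough (assumed rule)."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Hypothetical encoder outputs for three photographs.
note_view_1 = [0.9, 0.1, 0.4]   # a note under bright light
note_view_2 = [0.8, 0.2, 0.5]   # the same note, blurred and tilted
other_note = [0.1, 0.9, 0.2]    # a different note

print(same_note(note_view_1, note_view_2))
print(same_note(note_view_1, other_note))
```

In practice the embeddings would come from the trained encoder (ResNet, EfficientNet or a Vision Transformer in the abstract), and the decision rule would be tuned on labeled pairs rather than fixed by hand.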
|
290 |
DEVELOPING A RESPONSIBLE AI INSTRUCTIONAL FRAMEWORK FOR ENHANCING AI LEGISLATIVE EFFICACY IN THE UNITED STATES
Kylie Ann Kristine Leonard (17583945) 09 December 2023
Artificial Intelligence (AI) is anticipated to exert a considerable impact on the global Gross Domestic Product (GDP), with projections estimating a contribution of 13 trillion dollars by the year 2030 (IEEE Board of Directors, 2019). In light of this influence on economic, societal, and intellectual realms, it is imperative for policymakers to acquaint themselves with the ongoing developments and consequential impacts of AI. Their preparedness is urgent because AI may evolve in unpredicted directions if proactive measures are not promptly instituted.
This paper endeavors to address a pivotal research question: "Do United States Policy Makers have a sufficient knowledgebase to understand Responsible AI in relation to Machine Learning to pass Artificial Intelligence legislation; and if they do not, how should a pedagogical instructional framework be created to give them the necessary knowledge?" The question was pursued through a systematic review, a gap analysis, and the formulation of an instructional framework specifically tailored to explain the intricacies of Machine Learning. The findings of this study underscore the need for policymakers to undergo educational initiatives in the realm of artificial intelligence. Such educational interventions are deemed essential to give policymakers the understanding required to formulate effective regulatory frameworks that ensure the development of Responsible AI. The ethical dimensions inherent in this technological landscape warrant consideration, and policymakers must be equipped with the necessary cognitive tools to navigate these ethical quandaries adeptly.
In response to this need, the present study has undertaken the design and development of an instructional framework. This framework is conceived as a strategic intervention to address the evident knowledge gap among policymakers concerning the nuances of AI. By imparting an understanding of AI-related concepts, the framework aspires to cultivate a more informed and discerning governance ethos among policymakers, thus contributing to the responsible and ethical deployment of AI technologies.
|