Global ETD Search

41	Využití fuzzy množin ve shlukové analýze se zaměřením na metodu Fuzzy C-means Clustering / Fuzzy Sets Use in Cluster Analysis with a Special Attention to a Fuzzy C-means Clustering Method Camara, Assa January 2020 (has links) This master's thesis deals with cluster analysis, more specifically with clustering methods that use fuzzy sets. Basic clustering algorithms and the necessary multivariate transformations are described in the first chapter. In the practical part, which forms the third chapter, we apply fuzzy c-means clustering and k-means clustering to real data. The data used for clustering are the inputs of the chemical transport model CMAQ, which is used to approximate the concentration of air pollutants in the atmosphere. Two different clustering methods are applied to the data. We used two different methods to select the optimal weighting exponent for finding structure in our data. We compared all three resulting data structures. The structures resembled each other, but with fuzzy c-means clustering one of the clusters did not resemble any of the clustering inputs. The end of the third chapter is dedicated to an attempt to find a regression model that captures the relationship between the inputs and outputs of the CMAQ model.
42	ASSESSING THE PERFORMANCE OF PROCEDURALLY GENERATED TERRAINS USING HOUDINI’S CLUSTERING METHOD Varisht Raheja (8797292) 05 May 2020 (has links) Terrain generation is a convoluted and popular topic in the VFX industry. Whether you are part of the film/TV or gaming industry, a terrain is a highly nuanced feature that is usually present. Whether it is walking on a desert-like terrain in the film Blade Runner 2049 or fighting on different planets as in Avatar, 3D terrains are a major part of any digital media. The purpose of this thesis is to develop a workflow for large-scale terrains using complex data sets and to use this workflow to maintain a balance between procedural content and artistic input, especially for smaller companies that cannot afford an enhanced pipeline to deal with major technical complications. The workflow consists of two major elements: development of the tool used to optimize the workflow, and recording and maintaining its efficiency in comparison to the older workflow. My research findings indicate that, despite the increase in overall computational abilities, one of the many issues still present is generating a highly advanced terrain while retaining the benefit of artists’ and users’ creative variations. Reducing the overall time to simulate and compute a highly realistic and detailed terrain is the main goal; thus this thesis presents a method to overcome the speed deficiency while preserving the details of the terrain. Computer Graphics Computer Gaming and Animation Houdini FX Terrain generation Realistic terrain Satellite data lidar VEX Python Clustering K-means clustering algorithms HDA performance measure
43	Using Machine Learning for Predictive Maintenance in Modern Ground-Based Radar Systems / Användning av maskininlärning för förutsägbart underhåll i moderna markbaserade radarsystem Faraj, Dina January 2021 (has links) Military systems are often part of critical operations where unplanned downtime should be avoided at all costs. Using modern machine learning algorithms, it could be possible to predict when and where a fault is likely to occur, which allows time for ordering replacement parts and scheduling maintenance. This thesis is a proof-of-concept study of anomaly detection in monitoring data, i.e., sensor data from a ground-based radar system, as an initial experiment to showcase predictive maintenance. The data in this thesis was generated by a Giraffe 4A during normal operation, i.e., no anomalous data with known failures was provided. The problem setting is originally an unsupervised machine learning problem since the data is unlabeled. Speculative binary labels (start-up state and steady state) are introduced to approximate a classification accuracy. The system functions correctly in both phases, but the monitoring data looks different. By showing that the two phases can be distinguished, it is reasonable to assume that anomalous data during a breakdown can be detected as well. Three machine learning classifiers, namely two unsupervised classifiers (K-means clustering and isolation forest) and one supervised classifier (logistic regression), are evaluated on their ability to detect the start-up phase each time the system is turned on. The classifiers are evaluated graphically and based on their accuracy score. All three classifiers recognize a start-up phase for at least four out of seven subsystems. Judging by accuracy score alone, logistic regression appears to outperform the other models.
The collected results demonstrate that it is possible to distinguish between start-up and steady state in both a supervised and an unsupervised setting. To select the most suitable classifier, further experiments on larger data sets are necessary. / Militära system är ofta en del av kritiska operationer där oplanerade driftstopp bör undvikas till varje pris. Med hjälp av moderna maskininlärningsalgoritmer kan det vara möjligt att förutsäga när och var ett fel kommer att inträffa. Detta möjliggör tid för beställning av reservdelar och schemaläggning av underhåll. Denna uppsats är en konceptstudie för detektion av anomalier i övervakningsdata från ett markbaserat radarsystem som ett initialt experiment för att studera prediktivt underhåll. Datat som används i detta arbete kommer från en Saab Giraffe 4A radar under normal operativ drift, dvs. ingen avvikande data med kända brister tillhandahölls. Problemställningen är ursprungligen ett oövervakat maskininlärningsproblem eftersom datat saknar etiketter. Spekulativa binära etiketter introduceras (uppstart och stabil fas) för att uppskatta klassificeringsnoggrannhet. Systemet fungerar korrekt i båda faserna men övervakningsdatat ser annorlunda ut. Genom att visa att de två faserna kan urskiljas, kan man anta att avvikande data också går att detektera när fel uppstår. Tre olika klassificeringsmetoder dvs. två oövervakade maskininlärningmodeller, K-means klustring och isolation forest samt en övervakad modell, logistisk regression utvärderas utifrån deras förmåga att upptäcka uppstartfasen varje gång systemet slås på. Metoderna utvärderas grafiskt och baserat på deras träffsäkerhet. Alla tre metoderna känner igen en startfas för minst fyra av sju delsystem. Genom att endast analysera deras noggrannhetspoäng, överträffar logistisk regression de andra modellerna. De insamlade resultaten demonstrerar möjligheten att skilja mellan uppstartfas och stabil fas, både i en övervakad och oövervakad miljö.
För att välja den bästa metoden är det nödvändigt med ytterligare experiment på större datamängder. Predictive Maintenance Machine learning Isolation forest K-means clustering Logistic regression Radar systems. Prediktivt underhåll Maskininlärning Isolation forest K-means klustring Logistisk regression Radarsystem. Mathematics Matematik
44	Clustering Methods as a Recruitment Tool for Smaller Companies / Klustermetoder som ett verktyg i rekrytering för mindre företag Thorstensson, Linnea January 2020 (has links) With the help of new technology, it has become much easier to apply for a job. Reaching a larger audience also results in many more applications to consider when hiring for a new position. As a result, many big companies use statistical learning methods as a tool in the first step of the recruiting process. Smaller companies that do not have access to the same amount of historical data and large data sets do not have the same opportunities to digitalise their recruitment process. Using topological data analysis, this thesis explores how clustering methods can be used on smaller data sets in the early stages of the recruitment process. It also studies how the level of abstraction in data representation affects the results. The methods seem to perform well on higher-level job announcements but struggle with basic-level positions. The thesis also shows that the representation of candidates and jobs has a large impact on the results. / Ny teknologi har förenklat processen för att söka arbete. Detta har resulterat i att företag får tusentals ansökningar som de måste ta hänsyn till. För att förenkla och påskynda rekryteringsprocessen har många stora företag börjat använda sig av maskininlärningsmetoder. Mindre företag, till exempel start-ups, har inte samma möjligheter för att digitalisera deras rekrytering. De har oftast inte tillgång till stora mängder historisk ansökningsdata. Den här uppsatsen undersöker därför med hjälp av topologisk dataanalys hur klustermetoder kan användas i rekrytering på mindre datauppsättningar. Den analyserar också hur abstraktionsnivån på datan påverkar resultaten. Metoderna visar sig fungera bra för jobbpositioner av högre nivå men har problem med jobb på en lägre nivå.
Det visar sig också att valet av representation av kandidater och jobb har en stor inverkan på resultaten. Statistics Clustering Mapper K-means clustering Hierarchical clustering Principal component analysis recruitment Statistik Klustermetoder PCA rekrytering Probability Theory and Statistics Sannolikhetsteori och statistik
45	Implementation of Hierarchical and K-Means Clustering Techniques on the Trend and Seasonality Components of Temperature Profile Data Ogedegbe, Emmanuel 01 December 2023 (has links) (PDF) In this study, time series decomposition techniques are applied to climate data in conjunction with K-means clustering and hierarchical clustering, two well-known clustering algorithms. Their implementation and comparison are then examined. The main objective is to identify similar climate trends and group geographical areas with similar environmental conditions. Climate data from specific places are collected and analyzed as part of the project. The time series is then split into trend, seasonality, and residual components. To categorize growing regions according to their climatic tendencies, the decomposed time series are then submitted to K-means clustering and hierarchical clustering with dynamic time warping. The resulting clusters are evaluated to understand how different regions’ climates compare to one another, and how regions cluster based on the general trend of the temperature profile over the full growing season as opposed to the seasonality component for the various locations. Time series data K-Means Clustering Hierarchical Clustering Applied Mathematics Computer Sciences Data Science Statistics and Probability
46	Identification of spatiotemporal nutrient patterns and associated ecohydrological trends in the Tampa Bay coastal region Wimberly, Brent 01 May 2012 (has links) Improvements for environmental monitoring and assessment were achieved to advance our understanding of sea-land interactions and nutrient cycling in a coastal bay. The comprehensive assessment techniques for monitoring the water quality of a coastal bay can be diversified via an extensive investigation of the spatiotemporal nutrient patterns and the associated eco-hydrological trends in a coastal urban region. This work is intended to thoroughly investigate the spatiotemporal nutrient patterns and associated eco-hydrological trends via a two-part inquiry into the watershed and its adjacent coastal bay. The findings show that the onset of drought lags the crest of the evapotranspiration (ET) and precipitation curve during each year of drought. During the transition year, ET and precipitation appear to start shifting back into a temporal pattern analogous to that of the 2005 wet year. NDVI shows a flat receding tail for the September crest in 2005 due to the hurricane impact, signifying that the hurricane event in October dampened the severity of the winter dry season, which alludes to relative system memory. The k-means model with 8 clusters is the optimal choice, in which cluster 2 at Lower Tampa Bay had the minimum values of total nitrogen (TN) concentrations, chlorophyll a (Chl-a) concentrations, and ocean color values in every season as well as the minimum concentration of total phosphorus (TP) in three consecutive seasons in 2008. Cluster 5, located in Middle Tampa Bay, displayed elevated TN concentrations, ocean color values, and Chl-a concentrations, suggesting that high colored dissolved organic matter values are linked with some nutrient sources.
The data presented by the gravity modeling analysis indicate that the Alafia River Basin is the major contributor of nutrients in terms of both TP and TN values in all seasons. Such ecohydrological evaluation can be applied to support the LULC management of climatically vulnerable regions, and can further enrich the comprehensive assessment techniques for estimating and examining the multi-temporal impacts and dynamic influence of urban land use and land cover. Civil Engineering
47	Development of novel unsupervised and supervised informatics methods for drug discovery applications Mohiddin, Syed B. 22 February 2006 (has links) No description available. Engineering, Chemical Unsupervised Classification Supervised Classification Principal Component Analysis Partial Least Squares Hierarchical K-means Clustering Identifying Diverse Molecular Targets
48	Epigenetic Responses of Arabidopsis to Abiotic Stress Laliberte, Suzanne Rae 17 March 2023 (has links) Weed resistance to control measures, particularly herbicides, is a growing problem in agriculture. In the case of herbicides, resistance is sometimes connected to genetic changes that directly affect the target site of the herbicide. Other cases are less straightforward where resistance arises without such a clear-cut mechanism. Understanding the genetic and gene regulatory mechanisms that may lead to the rapid evolution of resistance in weedy species is critical to securing our food supply. To study this phenomenon, we exposed young Arabidopsis plants to sublethal levels of one of four weed management stressors, glyphosate herbicide, trifloxysulfuron herbicide, mechanical clipping, and shading. To evaluate responses to these stressors we collected data on gene expression and regulation via epigenetic modification (methylation) and small RNA (sRNA). For all of the treatments except shade, the stress was limited in duration, and the plants were allowed to recover until flowering, to identify changes that persist to reproduction. At flowering, DNA for methylation bisulfite sequencing, RNA, and sRNA were extracted from newly formed rosette leaf tissue. Analyzing the individual datasets revealed many differential responses when compared to the untreated control for gene expression, methylation, and sRNA expression. All three measures showed increases in differential abundance that were unique to each stressor, with very little overlap between stressors. Herbicide treatments tended to exhibit the largest number of significant differential responses, with glyphosate treatment most often associated with the greatest differences and contributing to overlap. 
To evaluate how large datasets from methylation, gene expression, and sRNA analyses could be connected and mined to link regulatory information with changes in gene expression, the information from each dataset and for each gene was united in a single large matrix and mined with classification algorithms. Although our models were able to differentiate patterns in a set of simulated data, the raw datasets were too noisy for the models to consistently identify differentially expressed genes. However, by focusing on responses at a local level, we identified several genes with differential expression, differential sRNA, and differential methylation. While further studies will be needed to determine whether these epigenetic changes truly influence gene expression at these sites, the changes detected at the treatment level could prime the plants for future incidents of stress, including herbicides. / Doctor of Philosophy / Growing resistance to herbicides, particularly glyphosate, is one of the many problems facing agriculture. The rapid rise of resistance across herbicide classes has caused some to wonder if there is a mechanism of adaptation that does not involve mutations. Epigenetics is the study of changes in the phenotype that cannot be attributed to changes in the genotype. Typically, studies revolve around two features of the chromosomes: cytosine methylation and histone modifications. The former can influence how proteins interact with DNA, and the latter can influence protein access to DNA. Both can affect each other in self-reinforcing loops. They can affect gene expression, and DNA methylation can be directed by small RNA (sRNA), which can also influence gene expression through other pathways. To study these processes and their role in abiotic stress response, we aimed to analyze sRNA, RNA, and DNA from Arabidopsis thaliana plants under stress. 
The stresses applied were sublethal doses of the herbicides, glyphosate and trifloxysulfuron, as well as mechanical clipping and shade to represent other weed management stressors. The focus of the project was to analyze these responses individually and together to find epigenetic responses to stresses routinely encountered by weeds. We tested RNA for gene expression changes under our stress conditions and identified many, including some pertaining to DNA methylation regulation. The herbicide treatments were associated with upregulated defense genes and downregulated growth genes. Shade treated plants had many downregulated defense and other stress response genes. We also detected differential methylation and sRNA responses when compared to the control plants. Changes to methylation and sRNA only accounted for about 20% of the variation in gene expression. While attempting to link the epigenetic process of methylation to gene expression, we connected all the data sets and developed computer programs to try to make correlations. While these methods worked on a simulated dataset, we did not detect broad patterns of changes to epigenetic pathways that correlated strongly with gene expression in our experiment's data. There are many factors that can influence gene expression that could create noise that would hinder the algorithms' abilities to detect differentially expressed genes. This does not, however, rule out the possibility of epigenetic influence on gene expression in local contexts. Through scoring the traits of individual genes, we found several that interest us for future studies. epigenetics weeds bioinformatics RNA Seq differential expression analysis whole genome bisulfite sequencing data mining k-means clustering decision tree random forest multi-'omics
49	Detection and Classification of Sparse Traffic Noise Events / Detektering och klassificering av bullerhändelser från gles trafik Golshani, Kevin, Ekberg, Elias January 2023 (has links) Noise pollution is a major health hazard for people living in urban areas, and its effects on humans are a growing field of research. One of the major contributors to urban noise pollution is the noise generated by traffic. Noise simulations can be made in order to build noise maps used for noise management action plans, but in order to test their accuracy real measurements need to be made, in this case in the form of noise measurements taken adjacent to a road. The aim of this project is to test machine learning based methods in order to develop a robust way of detecting and classifying vehicle noise in sparse traffic conditions. The primary focus is to detect traffic noise events, and the secondary focus is to classify what kind of vehicle is producing the noise. The data used in this project comes from sensors installed on a testbed at a street in southern Stockholm. The sensors include a microphone that continuously measures the local noise environment, a radar that detects each time a vehicle passes by, and a camera that also detects a vehicle by capturing its license plate. Only sparse traffic noise is considered in this thesis; as such, the audio recordings used are those where the radar has detected only one vehicle in a 40-second window. This makes the gathered data weakly labeled. The resulting detection method is a two-step process: first, the unsupervised learning method k-means is used to generate strong labels; second, a supervised learning method, random forest or support vector machine, uses the strong labels to classify audio features. The detection system for sparse traffic noise achieved satisfactory results.
However, the unsupervised vehicle classification method produced inadequate results and the clustering could not differentiate different vehicle classes based on the noise data. / Buller är en stor hälsorisk för människor som bor i stadsområden, och dess effekter på människor är ett växande forskningsfält. En av de största bidragen till stadsbuller är oljud som genereras av trafiken. Man kan utföra simuleringar i syfte att skapa bullerkartor som kan användas till planer för att minska dessa ljud. För att testa deras noggrannhet måste verkliga mätningar tas, i detta fall i formen av ljudmätningar tagna intill en väg. Syftet med detta projekt är att testa maskininlärningsmetoder för att utveckla ett robust sätt att detektera och klassificera fordonsljud i glesa trafikförhållanden. Primärt fokus ligger på att detektera bullerhändelser från trafiken, och sekundärt fokus är att försöka klassificera vilken typ av fordon som producerade ljudet. Datan som används i detta projekt kommer från sensorer installerade på en testbädd på en gata i södra Stockholm. Sensorerna inkluderar en mikrofon som kontinuerligt mäter den lokala ljudmiljön, en radar som detekterar varje gång ett fordon passerar, och en kamera som också detekterar ett fordon genom att ta bild på dess registreringsskylt. Endast ljud från gles trafik kommer att beaktas och användas i detta arbete, och därför används bara de ljudinspelningar där radarn har upptäckt ett enskilt fordon under ett 40 sekunders intervall. Detta gör att den insamlade datan har svaga etiketter. Den resulterande detekteringsmetoden är en tvåstegsprocess: För det första används den oövervakade inlärningsmetoden k-means för att generera starka etiketter. För det andra används de starka etiketterna av den övervakade inlärningsmetoden slumpmässig beslutsskog eller stödvektormaskin i syfte att klassificera ljudegenskaper. Detekteringssystemet av glest trafikljud uppnådde tillfredsställande resultat. 
Däremot producerade den oövervakade klassificeringsmetoden för fordonsljud otillräckliga resultat, och klustringen kunde inte urskilja mellan olika fordonsklasser baserat på ljuddatan. Noise pollution Machine learning Sound event detection SED Support vector machine SVM Random forest RF Decision tree K-means clustering Spherical k-means clustering Traffic noise Buller Maskininlärning Ljudhändelsedetektering Stödvektormaskin SVM Slumpmässiga beslutsskogar RF K-means klustring Sfärisk k-means klustring Trafikljud Bullerhändelse Other Mathematics Annan matematik
50	Classification of Carpiodes Using Fourier Descriptors: A Content Based Image Retrieval Approach Trahan, Patrick 06 August 2009 (has links) Taxonomic classification has always been important to the study of any biological system. Many biological species will go unclassified and become lost forever at the current rate of classification. The current state of computer technology makes image storage and retrieval possible on a global level. As a result, computer-aided taxonomy is now possible. Content-based image retrieval techniques utilize visual features of the image for classification. By utilizing image content and computer technology, the gap between taxonomic classification and species destruction is shrinking. This content-based study utilizes the Fourier Descriptors of fifteen known landmark features on three Carpiodes species: C. carpio, C. velifer, and C. cyprinus. Classification analysis involves both unsupervised and supervised machine learning algorithms. Fourier Descriptors of the fifteen known landmarks provide strong classification power on image data. Feature reduction analysis indicates that feature reduction is possible. This proves useful for increasing the generalization power of classification. Content-Based Image Retrieval Alpha Taxonomy Beta Taxonomy Gamma Taxonomy Principal Component Analysis K-Means Clustering Hierarchical Clustering K-Nearest Neighbor Support Vector Machine Random Forest Quadratic Discriminant Analysis Feature Reduction Variable Importance

Search results