31

Optimization Approaches for Modeling Sustainable Food Waste Management Systems

Kuruppuarachchi, Lakshika Nishadhi 15 September 2022 (has links)
No description available.
32

Disaster detection using real-time and historical Twitter data analysis

Åslund, Emelie January 2022 (has links)
No description available.
33

A comparison of clustering techniques for short social text messages

Ranby, Erik January 2016 (has links)
The amount of social text messages authored each day is huge, and the information contained within them is potentially very valuable. Software that can cluster, and thereby help analyze, these messages would consequently be useful. This thesis explores several ways of clustering social text messages. Two algorithms, in several setups, have been tested and evaluated with the same input data. Based on these evaluations, a comparison has been conducted to answer the question of which algorithm setup is best suited for the task. The two clustering algorithms that have been the main subjects of the comparison are K-means and agglomerative hierarchical clustering. All setups were run with 3-grams as well as with only single words as features. The evaluation measures used were intra-cluster distance, inter-cluster distance, and silhouette value. Intra-cluster distance is the distance between data points in the same cluster, while inter-cluster distance is the distance between the clusters; silhouette value is a more general measure often used to estimate the quality of a clustering. The results showed that if running time is a high priority, using K-means without 3-grams is preferred. On the other hand, if the quality of the clusters is important and performance less so, introducing 3-grams together with either of the two algorithms is the better choice.
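As a rough illustration of the comparison described in this abstract, the sketch below clusters a handful of toy messages with both algorithms, with and without 3-grams, and scores each run by silhouette value. The TF-IDF features, toy texts, and parameters are assumptions for illustration, not the thesis's actual setup.

```python
# Sketch: K-means vs. agglomerative hierarchical clustering on short
# texts, with single words vs. added 3-grams as features, scored by
# silhouette value. Illustrative only; corpus and parameters assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

texts = ["storm hits coast", "coastal storm damage", "new phone release",
         "phone sales rise", "storm warning issued", "gadget review out"]
k = 2

for ngrams, label in [((1, 1), "single words"), ((1, 3), "with 3-grams")]:
    X = TfidfVectorizer(ngram_range=ngrams).fit_transform(texts).toarray()
    for algo in (KMeans(n_clusters=k, n_init=10, random_state=0),
                 AgglomerativeClustering(n_clusters=k)):
        labels = algo.fit_predict(X)
        print(f"{type(algo).__name__:24s} {label:13s} "
              f"silhouette: {silhouette_score(X, labels):.3f}")
```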
34

Approximation to K-Means-Type Clustering

Wei, Yu 05 1900 (has links)
Clustering involves partitioning a given data set into several groups based on some similarity/dissimilarity measurement. Cluster analysis has been widely used in information retrieval, text and web mining, pattern recognition, image segmentation, and software reverse engineering.

K-means is the most intuitive and popular clustering algorithm and the workhorse of clustering. However, classical K-means suffers from several flaws. First, the algorithm is very sensitive to the initialization method and can easily be trapped at a local minimum of the objective (the sum of squared errors) used in the model. Moreover, finding the global minimum of the sum of squared errors has been proved NP-hard even when k = 2. In the standard model for K-means clustering, all the variables are required to be discrete and the objective is nonlinear and nonconvex.

In the first part of the thesis, we consider how to derive an optimization model for the minimum sum of squared errors of a given data set based on continuous convex optimization. To this end, we first transform K-means clustering into a novel optimization model, 0-1 semidefinite programming, in which the eigenvalues of the matrix argument must be 0 or 1. This provides a unified framework for many other clustering approaches, such as spectral clustering and normalized cut, and it also allows us to attack the original problem via relaxed linear and semidefinite programming.

We then consider how to obtain a feasible solution of the original clustering problem from an approximate solution of the relaxed problem. Using principal component analysis, we construct a rounding procedure that extracts a feasible clustering, and we show that our algorithm provides a 2-approximation to the global solution of the original problem. The complexity of our rounding procedure is O(n^(k^2(k-1)/2)), which improves substantially on a similar rounding procedure in the literature with complexity O(n^(k^3/2)). In particular, when k = 2, our rounding procedure runs in O(n log n) time. To the best of our knowledge, this is the lowest complexity reported in the literature for finding a solution to K-means clustering with guaranteed quality.

In the second part of the thesis, we consider approximation methods for so-called balanced bi-clustering. Using a simple heuristic, we prove that the constrained K-means for bi-clustering can be slightly improved. For the special case where the size of each cluster is fixed, we develop a new algorithm, called Q-means, that finds a 2-approximation solution to balanced bi-clustering, and we prove that Q-means has complexity O(n^2).

Numerical results based on our approaches are reported in the thesis as well. / Thesis / Master of Science (MSc)
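For reference, the 0-1 semidefinite programming reformulation mentioned above is commonly written in the following form. This is a hedged reconstruction of the standard model (with W the n-by-d data matrix, Z the matrix argument, and e the all-ones vector); the thesis's exact notation may differ.

```latex
% Standard 0-1 SDP form of K-means (a sketch; notation may differ from
% the thesis). W: n x d data matrix, Z: n x n matrix argument whose
% eigenvalues are forced to 0 or 1, e: all-ones vector, k: clusters.
\begin{align*}
  \min_{Z}\quad & \operatorname{tr}\!\left( W W^{T} (I - Z) \right) \\
  \text{s.t.}\quad & Z e = e, \qquad \operatorname{tr}(Z) = k, \\
                   & Z \geq 0, \qquad Z = Z^{T}, \qquad Z^{2} = Z.
\end{align*}
% Relaxing Z^2 = Z (eigenvalues in {0,1}) to 0 \preceq Z \preceq I
% yields the tractable semidefinite relaxation referred to above.
```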
35

A comparison of driving characteristics and environmental characteristics using factor analysis and k-means clustering algorithm

Jung, Heejin 19 September 2012 (has links)
The dissertation aims to classify drivers based on driving and environmental behaviors. The research determined significant factors using factor analysis, identified different driver types using k-means clustering, and studied how the same drivers map in each classification domain. The research consists of two case studies. In the first case study, a new variable is proposed and then used for classification; the drivers were divided into three groups, and two alternatives were designed to evaluate the environmental impact of driving behavior changes. In the second case study, two types of data sets were constructed: driving data, representing the driving behavior of individual drivers, and environmental data, representing emissions and fuel consumption estimated by microscopic energy and emissions models. Significant factors were explored in each data set using factor analysis, and a pair of factors was defined for each data set. Each pair of factors was used for its own k-means clustering: driving clustering and environmental clustering. In the driving clustering, drivers were grouped into three clusters; in the environmental clustering, into two. The groups from the driving clustering were compared to the groups from the environmental clustering in terms of emissions and fuel consumption, and the three driver groups from the driving clustering were also mapped in the environmental domain. The results indicate that the differences in driving patterns among the three driver groups significantly influenced the emissions of HC, CO, and NOx; in particular, the average target operating acceleration and braking essentially determined the amount of HC, CO, and NOx emissions. Therefore, if drivers were to change their driving behavior to be more defensive, emissions of HC, CO, and NOx would be expected to decrease. It was also found that spacing-based driving tended to produce fewer emissions but consumed more fuel than the other groups, while speed-based driving produced relatively more emissions. Defensively moderate drivers, on the other hand, consumed less fuel and produced fewer emissions. / Ph. D.
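The two-stage design described here (factor analysis, then k-means on the resulting factors) can be sketched as below. The synthetic data, feature count, and scikit-learn tooling are illustrative assumptions, not the dissertation's actual inputs.

```python
# Illustrative pipeline: factor analysis to extract a pair of factors,
# then k-means on the factor scores. Data and dimensions are synthetic
# stand-ins for the dissertation's driving/environmental variables.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
driving_data = rng.normal(size=(100, 6))  # e.g., acceleration, braking, spacing...

# A pair of factors per data set, as in the dissertation's design.
factors = FactorAnalysis(n_components=2).fit_transform(driving_data)

# Three driver groups in the driving domain (two in the environmental one).
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(factors)
print(np.bincount(groups))  # cluster sizes
```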
36

Wide Area Power System Monitoring Device Design and Data Analysis

Khan, Kevin Jamil Hiroshi 14 September 2006 (has links)
The frequency disturbance recorder (FDR) is a cost-effective data acquisition device used to measure power system frequency at the distribution level. FDRs are time-synchronized via global positioning system (GPS) timing, and the data they record are time-stamped to allow for comparative analysis between FDRs. The data are transmitted over the internet to a central server, where they are collected and stored for post-mortem analysis. Currently, most of the analysis is done with power system frequency. The purpose of this study is to take a first in-depth look at the angle data collected by FDRs. Different data conditioning techniques are proposed and tested before one is chosen; the chosen technique is then used to extract usable angle data for angle analysis on eight generation trip events. The angle differences are then used to create surface-plot angle-difference movies for further analysis. A new event detection algorithm based on k-means clustering is also presented, proposed as a simple and fast alternative to the current detection method. Next, this thesis examines several GPS modules and recommends one as a replacement for the current GPS chip, which is no longer in production. Finally, the manufacturing process for creating an FDR is documented. This thesis may raise more questions than it answers, and it is hoped that this work will lay the foundation for further analysis of angles from FDR data. / Master of Science
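The abstract does not spell out the k-means event-detection algorithm, but a plausible minimal sketch is to cluster short windows of frequency measurements by their deviation statistics and flag the high-deviation cluster as events. The window length, features, and synthetic signal below are assumptions, not the thesis's actual method.

```python
# Plausible sketch of k-means event detection on FDR-style frequency
# data: cluster fixed-length windows by deviation statistics, then
# treat the higher-deviation cluster as "event" windows. Assumed setup.
import numpy as np
from sklearn.cluster import KMeans

freq = 60.0 + 0.002 * np.random.default_rng(1).standard_normal(3000)
freq[1500:1560] -= 0.03                      # synthetic generation-trip dip

win = 60
windows = freq[: len(freq) // win * win].reshape(-1, win)
feats = np.c_[windows.std(axis=1), np.ptp(windows, axis=1)]  # std, range

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
event_cluster = np.argmax([feats[labels == c, 1].mean() for c in (0, 1)])
print("event windows:", np.where(labels == event_cluster)[0])
```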
37

Using UAV Mounted LiDAR to Estimate Plant Height and Growth

Dhami, Harnaik Singh 09 September 2019 (has links)
In this thesis, we develop algorithms to estimate crop heights as well as to detect plots in farms. Plant height estimation is needed in precision agriculture to monitor plant health and growth cycles. We use a 3D LiDAR mounted on an Unmanned Aerial Vehicle (UAV) and use the LiDAR data for height and plot estimation. We present a general methodology for extracting plant heights from 3D LiDAR with two specific variants for the two environments: row crops and pasture. The main algorithm is based on ground plane estimation from 3D LiDAR scans, which is then used to determine the height of plants in the scans. For row crops, the plot detection uses a K-means clustering algorithm to find the bounding boxes of these clusters, and a voting scheme to determine the best-fit width, height, and orientation of the clusters/plots. This best-fit box is then used to create a grid over the LiDAR data, and the plots are extracted. For pasture, relative heights are estimated using data collected weekly. Both algorithms were evaluated using data collected from actual farms and pasture. The accuracy in plot height estimation was +/- 5.36% and that for growth estimates was +/- 7.91%. / Master of Science / Plant height estimation and measurement is a vital task when it comes to farming. Knowing these characteristics helps determine whether the plants are growing healthily and when to harvest them. Along similar lines, accurate estimates of plant heights can be used to prevent overgrazing and undergrazing of pastures. However, as farm and plot size increases, getting consistent and accurate measurements becomes a more time-consuming and manually intensive task. Using robots can help solve this problem because they can be used to estimate the height. With sensors that are already available, such as the 3D LiDAR that we use, we can use aerial robots to fly over the farm and collect plant data. This data can then be processed to estimate the plant height, eliminating the need to go out and manually measure every single plant. This thesis discusses a methodology for doing exactly this, as well as detecting plots within a farm. The algorithms are evaluated using data collected from actual farms and pasture.
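The core ground-plane idea can be sketched as follows: fit a plane to the point cloud RANSAC-style and read plant height as the offset above it. The synthetic cloud and thresholds are assumptions; the thesis's actual pipeline, including the K-means plot detection and voting scheme, is more involved.

```python
# Sketch: RANSAC-style ground plane fit (z = ax + by + c from random
# point triples), then plant height = offset above the fitted plane.
# Synthetic data and thresholds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
ground = np.c_[rng.uniform(0, 10, (500, 2)), rng.normal(0.0, 0.02, 500)]
plants = np.c_[rng.uniform(0, 10, (200, 2)), rng.uniform(0.3, 0.9, 200)]
cloud = np.vstack([ground, plants])

best_inliers, best_coef = 0, None
for _ in range(200):
    p = cloud[rng.choice(len(cloud), 3, replace=False)]
    A = np.c_[p[:, :2], np.ones(3)]
    try:
        coef = np.linalg.solve(A, p[:, 2])      # plane coefficients (a, b, c)
    except np.linalg.LinAlgError:
        continue                                 # degenerate (collinear) triple
    resid = np.abs(cloud[:, :2] @ coef[:2] + coef[2] - cloud[:, 2])
    inliers = (resid < 0.05).sum()
    if inliers > best_inliers:
        best_inliers, best_coef = inliers, coef

heights = cloud[:, 2] - (cloud[:, :2] @ best_coef[:2] + best_coef[2])
print("estimated max plant height:", round(heights.max(), 2), "m")
```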
38

Classification of ADHD Using Heterogeneity Classes and Attention Network Task Timing

Hanson, Sarah Elizabeth 21 June 2018 (has links)
Throughout the 1990s, ADHD diagnosis and medication rates increased rapidly, and this trend continues today. These sharp increases have been met with both public and clinical criticism: detractors state that over-diagnosis is a problem and that healthy children are being unnecessarily medicated and labeled as disabled, while others argue that ADHD is under-diagnosed in some populations. Critics often state that multiple factors introduce subjectivity into the diagnosis process, meaning that a final diagnosis may be influenced by more than the desire to protect a patient's wellbeing. Some of these factors include standardized testing, legislation affecting special education funding, and the diagnostic process itself. In an effort to circumvent these extraneous factors, this work aims to further develop a method of using EEG signals to accurately discriminate between ADHD and non-ADHD children, using features that capture spectral and perhaps temporal information from evoked EEG signals. KNN has been shown in prior research to be an effective tool for discriminating between ADHD and non-ADHD, so several different KNN models are created using features derived in a variety of ways: one takes into account the heterogeneity of ADHD, and another seeks to exploit differences in executive functioning between ADHD and non-ADHD subjects. The results of this classification method vary widely depending on the sample used to train and test the KNN model. With unfiltered Dataset 1 data over the entire ANT1 period, the most accurate EEG channel pair achieved an overall vector classification accuracy of 94%, and the 5th percentile of classification confidence was 80%. These metrics suggest that KNN applied to EEG signals taken during the ANT task would be a useful diagnostic tool. However, the most accurate channel pair for unfiltered Dataset 2 data achieved an overall accuracy of 65% and a 5th percentile of classification confidence of 17%. The method that worked so well for Dataset 1 did not work well for Dataset 2, and no conclusive reason for this difference was identified, although several methods to remove possible sources of noise were applied. Using target-time-linked intervals did appear to marginally improve results in both Dataset 1 and Dataset 2, although the changes in accuracy of intervals relative to target presentation vary between the two datasets. Separating subjects into heterogeneity classes does yield good classification accuracy (up to 83%) for some classes, but results are poor (about 50%) for other heterogeneity classes. A much larger data set is necessary to determine whether the very positive results found with Dataset 1 extend to a wider population. / Master of Science
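A minimal sketch of the KNN classification scheme discussed here follows, assuming band-power-style spectral features from one channel pair. The synthetic data, feature extraction, and k value are illustrative, not the thesis's exact configuration.

```python
# Sketch: k-nearest-neighbors classification of per-trial EEG feature
# vectors into ADHD vs. non-ADHD. Features and labels are synthetic
# stand-ins; the thesis's actual feature pipeline differs.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))        # pretend band-power features, one channel pair
y = rng.integers(0, 2, 120)          # 1 = ADHD, 0 = non-ADHD (synthetic labels)

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=5)
print("cross-validated accuracy:", scores.mean().round(2))
```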
39

Text grouping using Kullback-Leibler divergence

Willian Darwin Junior 22 February 2016 (has links)
This work proposes a methodology for grouping texts that can be used both for textual search in general and, more specifically, for distributing legal cases with the aim of reducing the time needed to resolve judicial disputes. The proposed methodology uses the Kullback-Leibler divergence applied to the frequency distributions of the word stems (semantemes) present in the texts. Several groups of stems are considered, formed according to how frequently they occur across the texts, and the distributions are taken with respect to each of those groups. For each group, divergences are computed against the distribution of a reference text formed by aggregating all the sample texts, yielding one value for each text with respect to each group of stems. Finally, these values are used as attributes of each text in a clustering process driven by a K-Means algorithm implementation, which produces the grouping of the texts. The methodology is tested on simple toy examples and applied to real cases involving electrical failure records, texts with common themes, and legal texts, and the results are compared with a classification performed by an expert. As byproducts of the research, a graphical environment for developing models based on Pattern Recognition and Bayesian Networks was built, along with a study of the possibilities of using parallel processing in Bayesian Network learning.
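The methodology can be sketched in a few lines: build each text's stem-frequency distribution, compute its Kullback-Leibler divergence from the aggregated reference distribution, and cluster the resulting values with K-means. The toy texts below are assumptions, and the thesis's grouping of stems by frequency is collapsed to a single group for brevity.

```python
# Sketch: per-text KL divergence from an aggregated reference
# distribution, used as the attribute for K-means. Stemming and the
# frequency-based stem groups are simplified away here.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

texts = ["court ruling appeal case", "appeal case judge court",
         "power line failure outage", "outage failure grid power"]

counts = CountVectorizer().fit_transform(texts).toarray().astype(float)
eps = 1e-9
P = (counts + eps) / (counts + eps).sum(axis=1, keepdims=True)  # per-text dist.
ref = counts.sum(axis=0) + eps
Q = ref / ref.sum()                                             # reference dist.

kl = (P * np.log(P / Q)).sum(axis=1)        # D_KL(text || reference)
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(kl.reshape(-1, 1))
print(dict(zip(texts, labels)))
```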
