Global ETD Search

271	An Ordinary Differential Equation Based Model For Clustering And Vector Quantization Cheng, Jie 01 January 2009 (has links) (PDF) This research focuses on the development of a novel adaptive dynamical system approach to vector quantization or clustering based only on ordinary differential equations (ODEs), with potential for real-time implementation. The ODE-based approach has the advantage of making real-time implementation of the system possible with either electronic or photonic analog devices. This dynamical system consists of a set of energy functions that create valleys for representing clusters. Each valley represents a cluster of similar input patterns. The proposed system includes a dynamic parameter, called the vigilance parameter, which approximately reflects the radius of the generated valleys. Through several examples of different pattern clusters, it is shown that the model can successfully quantize/cluster these types of input patterns. A hardware implementation using photonic and/or electronic analog devices is also given. In addition, we analyze the stability of our dynamical system. By finding the equilibrium points for certain input patterns and analyzing their stability, we show the quantizing behavior of the system with respect to its parameters. We also extend our model to include a competition mechanism and vigilance dynamics. The competition mechanism causes only one label to be assigned to a group of patterns. The vigilance dynamics adjust the vigilance parameter so that the cluster size, or the quantizing resolution, adapts to the density and distribution of the input patterns. This reduces the burden of re-tuning the vigilance parameter for a given input pattern set and also better represents the input pattern space. The vigilance parameter approximately reflects the radius of the generated valley for each cluster. 
Making this parameter dynamic allows a bigger cluster to have a bigger radius and, as a result, yields a better cluster. Furthermore, an alternative dynamical system to our proposed system is also introduced. This system utilizes sigmoid and competitive functions. Although the results of this system are encouraging, the use of the sigmoid function makes analyzing the stability of the system extremely difficult. Clustering ODE Real Time Vector Quantization
272	Classifying Previous Covid-19 Infection : Advanced Logistic Regression Approach / Klassifiering av tidigare Covid-19 infektion : Avancerad logistisk regressionsmetodik Westerholm, Daniel January 2023 (has links) The study aimed to develop a logistic model, based on antibody proteins, vaccinations and demographic factors, that predicts previous Covid-19 infection. The data set comprised 2750 individuals from eldercare homes in Sweden, with four test dates between October 2021 and August 2022. Exploratory data analysis revealed bimodal patterns in the antibodies against nucleocapsid protein within the non-infected group, raising suspicions of false negatives in the data. Due to the binary nature of the response, and to keep the results interpretable for further research, logistic regression was used to model the relation between the predictors and the logit of the response. Because of low performance scores and the high probability of false negatives, the K-means clustering algorithm was applied to the data. The base-2 logarithm of the nucleocapsid protein was used as the clustering variable because of its theoretical relationship with previous Covid-19 infection. Observations were reclassified using the clustering technique, and two new logistic models were fitted to the data. The final model contained polynomial terms to handle the non-linear relationship between the logit of the response and the predictors. We found a significant relationship between the base-2 logarithm of nucleocapsid protein and previous Covid-19 infection in the final model, with high prediction results. We reached an F1-score of 0.94, indicating a well-performing model. Additionally, an algorithm was created to predict the days since infection, using the change in nucleocapsid protein from one test date to the next and a GAM that fits a smooth curve with nucleocapsid protein as the response against the days since infection. 
Using this algorithm, we reached a mean absolute error of 23 days between the predicted and actual days since infection. This algorithm was later applied to observations reclassified in the clustering process. In conclusion, the study successfully reclassified false negative observations with previous Covid-19 infection, and fitted a logistic model with a high prediction score (F1-score of 0.94). Finally, an algorithm was created that estimated the days since infection with a mean absolute error of 23 days. Covid-19 logistic regression clustering Mathematics
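The reclassification step in entry 272 (K-means on the base-2 logarithm of one antibody measurement, then scoring with F1) can be illustrated with a minimal sketch. This is not the author's code: the function names `kmeans_1d` and `f1_score`, the evenly spaced starting centers, and the antibody levels are all assumptions made up for illustration.

```python
import math

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on scalar data (k >= 2); returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Assumption: start with centers spread evenly over the data range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up antibody levels: three low values and three high values.
levels = [1.0, 2.0, 1.5, 64.0, 128.0, 90.0]
log_levels = [math.log2(x) for x in levels]
centers, labels = kmeans_1d(log_levels)  # label 1 marks the high-antibody cluster
```

In this sketch, observations falling in the high-antibody cluster would be the candidates for relabeling as previously infected before the logistic model is refitted.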
273	A Minimum Spanning Tree Based Clustering Algorithm for High throughput Biological Data Pirim, Harun 30 April 2011 (has links) A new minimum spanning tree (MST) based heuristic for clustering biological data is proposed. The heuristic uses MSTs to generate initial solutions and applies a local search to improve the solutions. The local search transfers nodes to the clusters with which they have the most connections, if this transfer improves the objective function value. A new objective function is defined and used in the heuristic. The objective function considers both tightness and separation of the clusters. Tightness is obtained by minimizing the maximum diameter among all clusters. Separation is obtained by minimizing the maximum number of connections of a gene with other clusters. The objective function value is calculated on a binary graph generated using the threshold value and keeping the minimum percentage of edges for which the binary graph remains connected. Shortest paths between nodes are used as distance values between gene pairs. The efficiency and the effectiveness of the proposed method are tested, using external and biological validation, on fourteen different data sets. The method finds clusters that are similar to the actual ones on the 12 data sets for which actual clusters are known. The method also finds biologically meaningful clusters on the 2 data sets for which real clusters are not known. A mixed integer programming model for clustering biological data is also proposed for future studies. optimization heuristics networks integer programming clustering
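One common way to obtain an initial clustering from an MST, as entry 273 describes in general terms, is to drop the heaviest MST edges. The sketch below assumes that technique; the abstract does not say how the thesis derives its initial solutions, and the local search and objective function are not shown. The name `mst_clusters` and the sample points are hypothetical.

```python
from itertools import combinations

def mst_clusters(points, k):
    """Split points into k clusters (1 <= k <= len(points)) via Kruskal's algorithm.

    Stopping Kruskal after n - k accepted edges gives the same partition as
    building the full MST and removing its k - 1 heaviest edges.
    """
    n = len(points)

    def dist(a, b):
        # Euclidean distance between two coordinate tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    accepted = 0
    for _, i, j in edges:
        if accepted == n - k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            accepted += 1

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Made-up 2-D points forming two well-separated groups.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
sample_labels = mst_clusters(sample, k=2)
```

A local search like the one in the abstract would then start from `sample_labels` and move individual nodes between clusters whenever that improves the objective.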
274	Longitudinal Data Clustering Via Kernel Mixture Models Zhang, Xi January 2021 (has links) Kernel mixture models are proposed to cluster univariate, independent multivariate and dependent bivariate longitudinal data. The Gaussian distribution in finite mixture models is replaced by the Gaussian and gamma kernel functions, and the expectation-maximization algorithm is used to estimate bandwidths and compute log-likelihood scores. For dependent bivariate longitudinal data, the bivariate Gaussian copula is used to reveal the correlation between two attributes. After that, we use AIC, BIC and ICL to select the best model. In addition, we also introduce a kernel distance-based clustering method to compare with the kernel mixture models. A simulation is performed to illustrate the performance of this mixture model, and results show that the gamma kernel mixture model performs better than the kernel distance-based clustering method based on misclassification rates. Finally, these two models are applied to COVID-19 data, and sixty countries are classified into ten clusters based on growth rates and death rates. / Thesis / Master of Science (MSc) kernel mixture model longitudinal data clustering
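The model-selection step in entry 274 relies on information criteria. A minimal sketch of AIC and BIC using their standard definitions follows (ICL is omitted); the candidate log-likelihoods, parameter counts, and sample size are invented for illustration and do not come from the thesis.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: name -> (maximized log-likelihood, free parameters).
candidates = {
    "2 clusters": (-520.0, 5),
    "3 clusters": (-500.0, 8),
    "4 clusters": (-498.0, 11),
}
n_obs = 200  # hypothetical number of observations

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
```

BIC penalizes each extra parameter by ln n rather than 2, so once the sample has more than about 7 observations it favors smaller models than AIC does.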
275	Evolutionary Algorithms for Model-Based Clustering Kampo, Regina S. January 2021 (has links) Cluster analysis is used to detect underlying group structure in data. Model-based clustering is the process of performing cluster analysis which involves the fitting of finite mixture models. However, parameter estimation in mixture model-based approaches to clustering is notoriously difficult. To this end, this thesis focuses on the development of evolutionary computation as an alternative technique for parameter estimation in mixture models. An evolutionary algorithm is proposed and illustrated on the well-established Gaussian mixture model with missing values. Next, the family of Gaussian parsimonious clustering models is considered, and an evolutionary algorithm is developed to estimate the parameters. Next, an evolutionary algorithm is developed for latent Gaussian mixture models and to facilitate the flexible clustering of high-dimensional data. For all models and families of models considered in this thesis, the proposed algorithms used for model-fitting and parameter estimation are presented and the performance illustrated using real and simulated data sets to assess the clustering ability of all models. This thesis concludes with a discussion and suggestions for future work. / Dissertation / Doctor of Philosophy (PhD) Evolutionary Algorithm Model-based Clustering EM Algorithm
276	Clustering Methods for Delineating Regions of Spatial Stationarity Collings, Jared M. 30 November 2007 (has links) (PDF) This paper further investigates data obtained from Functional Magnetic Resonance Imaging (FMRI) of brain tissue, which measures blood flow to certain areas of the brain following the application of a stimulus. As a precursor to detailed spatial analysis of this kind of data, this paper develops methods of grouping data based on the necessary conditions for spatial statistical analysis. The purpose of this paper is to examine and develop methods that can be used to delineate regions of stationarity. One of the major assumptions used in spatial estimation is that the data field is homogeneous with respect to the mean and the covariance function. As such, any spatial estimation presupposes that these criteria are met. With respect to analyses that may be considered new or experimental, however, there is no evidence that these assumptions will hold. clustering FMRI spatial stationarity SFMRI Statistics and Probability
277	Assessment of aCGH Clustering Methodologies Baker, Serena F. 18 October 2010 (has links) (PDF) Array comparative genomic hybridization (aCGH) is a technique for identifying duplications and deletions of DNA at specific locations across a genome. Potential objectives of aCGH analysis are the identification of (1) altered regions for a given subject, (2) altered regions across a set of individuals, and (3) clinically relevant clusters of hybridizations. aCGH analysis can be particularly useful when it identifies previously unknown clusters with clinical relevance. This project focuses on the assessment of existing aCGH clustering methodologies. Three methodologies are considered: hierarchical clustering, weighted clustering of called aCGH data, and clustering based on probabilistic recurrent regions of alteration within subsets of individuals. Assessment is conducted first through the analysis of aCGH data obtained from patients with ovarian cancer and then through simulations. Performance assessment for the data analysis is based on cluster assignment correlation with clinical outcomes (e.g., survival). For each method, 1,000 simulations are summarized with Cohen's kappa coefficient, interpreted as the proportion of correct cluster assignments beyond random chance. Both the data analysis and the simulation results suggest that hierarchical clustering tends to find more clinically relevant clusters when compared to the other methods. Additionally, these clusters are composed of more patients who belong in the clusters to which they are assigned. array CGH hierarchical clustering WECCA Statistics and Probability
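Cohen's kappa, used in entry 277 to summarize the simulations, compares observed agreement with the agreement expected by chance. The sketch below uses the standard formula and assumes cluster labels have already been matched to the reference labels; the function name and the toy label vectors are illustrative only.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two equal-length label sequences.

    p_o is the observed agreement rate; p_e is the agreement expected if the
    two labelings were independent with the same marginal frequencies.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example: true cluster membership versus an assigned clustering.
true_clusters = [0, 0, 1, 1]
assigned = [0, 0, 1, 0]
kappa = cohens_kappa(true_clusters, assigned)
```

A value of 1 means perfect agreement, while 0 means the assignments agree no more often than chance would predict.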
278	Easy to Find: Creating Query-Based Multi-Document Summaries to Enhance Web Search Qumsiyeh, Rani Majed 15 March 2011 (has links) (PDF) Current web search engines, such as Google, Yahoo!, and Bing, rank the set of documents S retrieved in response to a user query Q and display each document with a title and a snippet, which serves as an abstract of the corresponding document in S. Snippets, however, are not as useful as intended, i.e., in helping search engine users quickly identify results of interest, if they exist, without browsing through the documents in S, since they (i) often include very similar information and (ii) do not capture the main content of the corresponding documents. Moreover, when the information need specified in a search query is ambiguous, it is difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request. Furthermore, a document title retrieved by web search engines is not always a good indicator of the content of the corresponding document, since it is not always informative. All these design problems can be solved by our proposed query-based, web informative summarization engine, denoted Q-WISE. Q-WISE clusters documents in S, which allows users to view segregated document collections created according to the specific topic covered in each collection, and generates a concise/comprehensive summary for each collection/cluster of documents. Q-WISE is also equipped with a query suggestion module that guides its users in formulating a keyword query, which facilitates the web search and improves the precision and recall of the search results. Experimental results show that Q-WISE is highly effective and efficient in generating a high-quality summary for each cluster of documents on a specific topic, retrieved in response to a Q-WISE user's query. 
The empirical study also shows that Q-WISE's clustering algorithm is highly accurate, labels generated for the clusters are useful and often reflect the topic of the corresponding clustered documents, and the performance of the query suggestion module of Q-WISE is comparable to commercial web search engines. clustering summarization query suggestion Computer Sciences
279	A Performance Comparison Of Clustering Algorithms In Ad Hoc Networks Yeung, Chun 01 January 2006 (has links) An ad hoc network comprises wireless mobile nodes that do not need a wired network infrastructure. Due to the limited transmission range of nodes, the exchange of data between them may not be possible using direct communication. Partitioning the network into clusters and electing a clusterhead for each cluster to assist with the resource allocation and data packet transmissions among its members and neighboring clusterheads is one of the most common ways of providing support for the existing ad hoc routing protocols. This thesis presents the performance comparison of four ad hoc network clustering protocols: Dynamic Mobile Adaptive Clustering (DMAC), Highest-Degree and Lowest-ID algorithms, and Weighted Clustering Algorithm (WCA). Yet Another Extensible Simulation (YAES) was used as the simulator to carry out the simulations. Ad hoc Clustering Algorithm Computer Engineering Engineering
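Of the four protocols compared in entry 279, Lowest-ID is the simplest to sketch. The version below is a simplified, centralized rendering of the heuristic as it is commonly described, not the distributed protocol simulated in the thesis: nodes are visited in increasing ID order, and a node becomes a clusterhead unless a neighbor with a lower ID already is one. The function name and the five-node topology are made up.

```python
def lowest_id_clustering(neighbors):
    """Assign each node a clusterhead.

    neighbors: dict mapping node ID -> set of adjacent node IDs (symmetric links).
    Returns a dict mapping node ID -> ID of its clusterhead.
    """
    head_of = {}
    for node in sorted(neighbors):
        # Neighbors already decided to be clusterheads (they all have lower IDs,
        # because higher-ID nodes have not been visited yet).
        heads_nearby = [n for n in neighbors[node] if head_of.get(n) == n]
        # Join the lowest-ID neighboring clusterhead, or become one.
        head_of[node] = min(heads_nearby) if heads_nearby else node
    return head_of

# Made-up topology: a chain 2 - 1 - 3 - 4 - 5.
topology = {1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3, 5}, 5: {4}}
heads = lowest_id_clustering(topology)
```

Here node 4 becomes a clusterhead even though it is adjacent to node 3, because node 3 is only a member of cluster 1 and not a clusterhead itself.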
280	PV Hosting Analysis and Demand Response Selection for handling Modern Grid Edge Capability Abraham, Sherin Ann 27 June 2019 (has links) Recent technological developments have led to significant changes in the power grid. Increasing consumption, widespread adoption of Distributed Energy Resources (DER), and installation of smart meters are some of the many factors that characterize the changing distribution network. These transformations taking place at the edge of the grid call for improved planning and operation practices. In this context, this thesis aims to improve the grid edge functionality by putting forth a method to address the problem of high demand during the peak period by identifying customer groups for participation in demand response programs, which can lead to significant peak shaving for the utility. A possible demand response strategy for peak shaving makes use of photovoltaic (PV) generation and a battery energy storage system (BESS). In the process, this work also examines the approach to computing hosting capacity (HC) for small PV and quantifies the difference obtained in HC when a detailed low-voltage (LV) network is available and included in HC studies. Most PV hosting studies assess the impact on system feeders with aggregated LV loads. However, as more residential customers adopt rooftop solar, the need to include secondary network models in the analysis is studied by performing a comparative study of hosting capacity for a feeder with varying loading information available. / Master of Science / Today, with significant technological advancements, as we proceed towards a modern grid, a mere change in physical infrastructure will not be enough. With the changes in the kinds of equipment installed on the grid, a wave of transformation has also begun to flow in the planning and operation practices for a smarter grid. Today, the edge of the grid, where the customer is interfaced to the power system, has become extremely complex. 
Customers can use rooftop solar PV to generate their own electricity; they are more informed about their consumption behavior thanks to the installation of smart meters, and they also have options to integrate other technologies such as battery energy storage systems and electric vehicles. As with any good technology, adoption of these advancements in the system brings with it a greater need for reform in the operation and planning of the system. For instance, increasing installation of rooftop solar at the customer end calls for a review of existing methods that determine the maximum level of PV deployment possible in the network without violating the operating conditions. So, in this work, a comparative study is done to review the PV hosting capacity of a network with varying levels of information available. The importance of utilities having secondary network models available is also emphasized. With PV deployed in the system, enhanced demand response strategies can be formulated by utilities to tackle high demand during the peak period. In a bid to identify customers for participation in such programs, a computationally efficient strategy is developed in this work to identify customers with high demand during the peak period, who can be incentivized to participate in demand response programs. With this, significant peak shaving can be achieved by the utility, and in turn stress on the distribution network is reduced during peak hours. Smart grid AMI data clustering PV hosting
