
Clustering of nonstationary data streams: a survey of fuzzy partitional methods

Abdullatif, Amr R.A., Masulli, F., Rovetta, S. 20 January 2020 (has links)
Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available on data stream clustering; however, less attention has been devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms, focusing mainly on fuzzy methods, including their treatment of outliers and of concept drift and shift. / Ministero dell'Istruzione, dell'Università e della Ricerca.
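The fuzzy partitional approach the survey focuses on replaces hard cluster assignments with membership degrees. As an illustration only, here is a minimal batch fuzzy c-means sketch; it is a generic textbook formulation, not any specific streaming algorithm from the survey:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Minimal batch fuzzy c-means: each point gets a degree of membership
    in every cluster instead of a single hard label."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1
    for _ in range(iters):
        W = U ** m                                 # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1))                  # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
centers, U = fuzzy_c_means(X, c=2)
```

The soft memberships in `U` are what makes the approach attractive for nonstationary streams: a point near a drifting boundary is not forced into a single cluster.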

A Data Clustering Approach to Support Modular Product Family Design

Sahin, Asli 14 November 2007 (has links)
Product Platform Planning is an emerging philosophy that calls for the planned development of families of related products. It is markedly different from the traditional product development process and relatively new in engineering design. Product families and platforms can offer a multitude of benefits when applied successfully, such as economies of scale from producing larger volumes of the same modules, lower design costs from not having to redesign similar subsystems, and many other advantages arising from the sharing of modules. While advances in this area are promising, significant challenges remain in designing product families and platforms. This is particularly true for defining the platform components, platform architecture, and significantly different platform and product variants in a systematic manner. The lack of precise definitions for platform design assets in terms of relevant customer requirements, distinct differentiations, engineering functions, components, component interfaces, and the relations among them is a major obstacle preventing companies from taking full advantage of the potential benefits of a product platform strategy. The main purpose of this research is to address the above-mentioned challenges during the design and development of modular platform-based product families. It focuses on providing answers to a fundamental question, namely, how can a decision support approach, from product module definition to the determination of platform alternatives and product variants, be integrated into product family design? The method presented in this work emphasizes the incorporation of critical design requirements and specifications for the design of distinctive product modules to create platform concepts and product variants using a data clustering approach.
A case application developed in collaboration with a tire manufacturer is used to verify that this research approach is suitable for reducing the complexity of design results by determining design commonalities across multiple design characteristics. The method was found helpful for determining and integrating critical design information (i.e., component dimensions, material properties, modularization driving factors, and functional relations) systematically into the design of product families and platforms. It supported decision-makers in defining distinctive product modules within the families and in determining multiple platform concepts and derivative product variants. / Ph. D.
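The general idea of the clustering step, grouping components by similarity of design characteristics to suggest shared modules, can be sketched on toy data. The component names and attribute values below are hypothetical, and the thesis's actual clustering method and tire design data are not reproduced:

```python
import numpy as np

# Hypothetical component attributes (e.g. size, material class, stiffness);
# components that cluster together are candidates for a shared module.
components = {
    "tread_a": [0.30, 1.0, 0.90], "tread_b": [0.32, 1.0, 0.85],
    "bead_a":  [0.05, 2.0, 0.20], "bead_b":  [0.06, 2.0, 0.25],
    "liner":   [0.10, 3.0, 0.50],
}

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose closest members are nearest (single linkage)."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(np.subtract(points[a], points[b]))
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)             # merge the closest pair
    return clusters

modules = single_linkage(components, 3)            # candidate module groups
```

On this toy data the two tread components and the two bead components group together, which is the kind of commonality-across-characteristics signal the method uses to define modules.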

Classification Analysis for Environmental Monitoring: Combining Information across Multiple Studies

Zhang, Huizi 29 September 2006 (has links)
Environmental studies often employ data collected over large spatial regions. Although it is convenient, the conventional single-model approach may fail to accurately describe the relationships between variables. Two alternative modeling approaches are available: one applies separate models for different regions; the other applies hierarchical models. The separate modeling approach has two major difficulties: first, we often do not know the underlying clustering structure of the data; second, it usually ignores possible dependence among clusters. To deal with the first problem, we propose a model-based clustering method to partition the data into subgroups according to the empirical relationships between the response and the predictors. To deal with the second, we propose Bayesian hierarchical models. We illustrate the use of the Bayesian hierarchical model in two situations. First, we apply the hierarchical model based on the empirical clustering structure. Second, we integrate the model-based clustering result to help determine the clustering structure used in the hierarchical model. The problem is one of classification, since the response is categorical rather than continuous, and logistic regression models are used to model the relationships between variables. / Ph. D.
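The separate-models idea, fitting a distinct logistic regression to each region or cluster, can be sketched on hypothetical two-region data where the response-predictor relationship differs by region; the actual environmental data and the Bayesian hierarchical models are not reproduced here:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Logistic regression fit by gradient ascent on the log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * (X.T @ (y - p)) / len(y)
    return w

# Hypothetical two-region data: the slope flips sign between regions,
# so a single pooled model would blur both effects.
rng = np.random.default_rng(0)
X1 = np.column_stack([np.ones(200), rng.normal(size=200)])
X2 = np.column_stack([np.ones(200), rng.normal(size=200)])
y1 = (rng.random(200) < 1 / (1 + np.exp(-2 * X1[:, 1]))).astype(float)  # slope +2
y2 = (rng.random(200) < 1 / (1 + np.exp(+2 * X2[:, 1]))).astype(float)  # slope -2
w1, w2 = fit_logistic(X1, y1), fit_logistic(X2, y2)
```

The two fitted slopes recover opposite signs, which a single pooled model could not; a hierarchical model would additionally share information across such region-level fits.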

Detecting code duplications in the NPM community

Liu, Hanwen 09 September 2021 (has links)
In the modern software development process, it has become mainstream practice to build software projects on top of third-party packages to simplify development. In this development method, it is quite common to copy existing code or files from other libraries instead of making regular calls. Although this approach can reduce a project's dependence on other libraries and make the project more streamlined, it also makes maintenance and understanding more difficult. Ignorance of code duplication in a third-party library community can even be exploited for malicious purposes, such as typo-squatting attacks. This paper serves as a starting point for analyzing the growing code duplication issues surrounding third-party open source packages and the root causes of code duplication. In this paper, I conducted code duplication-related research on some popular packages in a third-party open source package community, the NPM community, using a tokenizer tool and a code comparison tool to compute code similarity, quantitatively analyzed the prevalence of code duplication in the NPM community, and performed related experiments based on this similarity. In the experiments, I found that code duplication is very common in the NPM community: 17.1% of all files have between 1 and 93 similar files in other packages when the similar-file threshold is set to 0.5, and 29.3% of all packages have at least one "similar package" when the similar-package threshold is set to 0.5. Of the 951 similar package pairs, 33.9% (323 package pairs) come from the same domain. The ultimate goal of this paper is to promote awareness of the commonness and importance of code duplication in the third-party package community and the reasonable use of code duplication by developers in project development. / In the modern software development process, developers often call other people's completed code to build their own programs. 
There are generally two ways to do this: indirectly call other people's code through "import" or similar instructions in the program, or directly copy and paste other people's code and make slight modifications. The second method can make a program more independent and easy to use, but the code duplication it causes also carries great security risks. This paper serves as a starting point for analyzing the growing code duplication issues and the root causes of code duplication. In this paper, I conducted code duplication-related research on some popular code packages in the NPM community. I used tools to compute a value that measures how similar different pieces of code are to each other, quantitatively analyzed the prevalence of code duplication in the NPM community, and performed related experiments based on this similarity. In the experiments, I found that code duplication is very common in the NPM community: 17.1% of all files have between 1 and 93 similar files in other packages, and 29.3% of all packages have at least one "similar package", when the definitions of similar files and packages are not that "strict". Of the 951 similar package pairs, 33.9% (323 package pairs) come from the same domain. The ultimate goal of this paper is to promote awareness of the commonness and importance of code duplication in the third-party package community and the reasonable use of code duplication by developers in project development.
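One simple way to compute a file-level similarity of the kind this abstract describes (tokenize each file, then compare) is a Jaccard similarity over token sets. This is an illustrative assumption: the tokenizer below is a crude regex, not necessarily the tokenizer or comparison tool used in the thesis:

```python
import re

def tokenize(source: str) -> set:
    """Crude lexical tokenizer: identifiers, numbers, and operator chars."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source))

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two files' token sets, in [0, 1]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

file_a = "function add(a, b) { return a + b; }"
file_b = "function add(x, y) { return x + y; }"
sim = similarity(file_a, file_b)   # above the 0.5 "similar file" threshold
```

A copied-and-lightly-modified file, as in the example, scores well above the 0.5 threshold used in the thesis's experiments, while unrelated files score near zero.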

Quantifying and Mapping Spatial Variability in Simulated Forest Plots

Corral, Gavin Richard 11 December 2015 (has links)
Spatial analysis is of primary importance in forestry. Many factors that affect tree development have spatial components and can be sampled across geographic space. Some examples of spatially structured factors that affect tree growth include soil composition, water availability, and growing space. Our goals for this dissertation were to test the efficacy of spatial analysis tools in a forestry setting and make recommendations for their use. Reliable spatial analysis tools will lead to more effective statistical testing and can lead to useful mapping of spatial patterns. The data for this project come from simulated even-aged loblolly pine stands (Pinus taeda L.). These simulated stands are grown at regular spacing, and we impose a range of parameters on the stands to simulate many possible scenarios. In chapter 3 of this dissertation we perform a sensitivity analysis to determine if our methods are suitable for further research and applications. In chapter 4 we perform our analysis on more realistic data generated by a spatially explicit stand simulator, PTAEDA 4.1. In chapter 3 we performed a statistical simulation of plantation stands without the effects of competition and mortality. We used redundancy analysis (RDA) to quantify spatial variability, partial redundancy analysis (pRDA) to test for spatial dependence, and spatially constrained cluster analysis to map soil productivity. Our results indicated that RDA and pRDA are reliable methods and that further evaluation is appropriate. The results from the spatially constrained cluster analysis were less clear: the success or failure of the clustering algorithm could not be disentangled from the success or failure of the selection criterion used to predict the number of clusters. Further investigations should address this concern. In chapter 4 we used PTAEDA 4.1, a loblolly pine stand simulator, to simulate a range of site conditions and produce data for analysis. 
The results showed that RDA and pRDA were neither reliable methods nor ready for field use. Spatially constrained cluster analysis performed poorly when more realistic data were used, leaving its further use uncertain. It was clear from the results that the levels of variation and the spatial pattern complexity of microsites influenced the success rate of the methods. Both RDA and pRDA were less successful with higher levels of variation in the data and with increased spatial pattern complexity. In chapter 5 we related the coefficient of variation from our simulations (chapters 3 and 4) to two sets of real plot data: a clonal set and an open-pollinated set. We then implemented a spatial analysis of the real plot data. Our spatial analysis results for the two comparable data sets were unaffected by genetic variability, indicating that the primary source of variability across plots appears to be soil and other factors rather than genetics. / Ph. D.

Generating Random Graphs with Tunable Clustering Coefficient

Parikh, Nidhi Kiranbhai 29 April 2011 (has links)
Most real-world networks exhibit a high clustering coefficient, the probability that two neighbors of a node are also neighbors of each other. We propose four algorithms, CONF-1, CONF-2, THROW-1, and THROW-2, which are based on the configuration model and take a triangle degree sequence (representing the number of triangles/corners at a node) and a single-edge degree sequence (representing the number of single edges/stubs at a node) as input and generate a random graph with a tunable clustering coefficient. We analyze them theoretically and empirically for the case of a regular graph. CONF-1 and CONF-2 generate a random graph with the degree sequence and the clustering coefficient anticipated from the input triangle and single-edge degree sequences. At each time step, CONF-1 chooses each node for creating triangles or single edges with the same probability, while CONF-2 chooses a node for creating triangles or single edges with a probability proportional to its number of unconnected corners or unconnected stubs, respectively. Experimental results match the anticipated clustering coefficient quite well except for highly dense graphs, in which case the experimental clustering coefficient is higher than the anticipated value. THROW-2 chooses three distinct nodes for creating triangles and two distinct nodes for creating single edges, while they need not be distinct for THROW-1. For THROW-1 and THROW-2, the degree sequence and the clustering coefficient of the generated graph vary from the input. However, the expected degree distribution and the clustering coefficient of the generated graph can still be predicted using analytical results. Experiments show that, for THROW-1 and THROW-2, the results match the analytical results quite well. Typically, only information about the degree sequence or degree distribution is available. 
We also propose an algorithm DEG that takes degree sequence and clustering coefficient as input and generates a graph with the same properties. Experiments show results for DEG that are quite similar to those for CONF-1 and CONF-2. / Master of Science
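A minimal sketch in the spirit of the THROW-2 variant described above (three distinct nodes per triangle, two distinct nodes per single edge), together with a global clustering coefficient check, might look as follows; the actual algorithms and their stub-matching details are not reproduced:

```python
import random
from itertools import combinations

def throw_style_graph(n, triangles, single_edges, seed=0):
    """Sketch in the spirit of THROW-2: pick three distinct nodes per
    triangle and two distinct nodes per single edge; duplicate edges merge."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(triangles):
        trio = rng.sample(range(n), 3)
        edges |= {frozenset(p) for p in combinations(trio, 2)}
    for _ in range(single_edges):
        edges.add(frozenset(rng.sample(range(n), 2)))
    return edges

def global_clustering(n, edges):
    """Global clustering coefficient: closed triples / connected triples."""
    adj = {i: set() for i in range(n)}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    closed = sum(1 for v in adj
                 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triples if triples else 0.0

E = throw_style_graph(60, triangles=80, single_edges=40)
cc = global_clustering(60, E)
```

Raising the ratio of triangles to single edges in the input is what tunes the clustering coefficient upward, which is the core idea shared by all four proposed algorithms.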

Efficient Community Detection for Large Scale Networks via Sub-sampling

Bellam, Venkata Pavan Kumar 18 January 2018 (has links)
Many real-world systems can be represented as network graphs. Some of these networks have an inherent community structure based on interactions. The problem of identifying this grouping structure from a graph is termed the community detection problem, for which several algorithms exist. This thesis contributes specific improvements to various community detection algorithms, such as spectral clustering and the extreme points algorithm. One of the main contributions is a new sub-sampling method that makes the existing spectral clustering method scalable by reducing its computational complexity. We have also implemented the extreme points algorithm for the general case of multiple communities, along with a sub-sampling-based version to reduce the computational complexity. In addition, we have developed a spectral clustering algorithm for graphs based on the popularity-adjusted block model (PABM) to make the algorithm exact, thus improving its accuracy. / Master of Science / We live in an increasingly interconnected world, where agents constantly interact with each other. This general agent-interaction framework describes many important systems, such as social interpersonal systems, protein interaction systems, trade and financial systems, power grids, and the World Wide Web, to name a few. By denoting agents as nodes and their interconnections as links, any such system can be represented as a network. Such networks or graphs provide a powerful and universal representation for analyzing a wide variety of systems spanning a remarkable range of scientific disciplines. Networks act as conduits for many kinds of transmissions. For instance, they are influential in the dissemination of ideas, the adoption of technologies, finding jobs, and the spread of diseases. Networks thus play a critical role both in providing information and in helping make decisions, making them a crucial part of the Data and Decisions Destination Area. 
A well-known feature of many networks is community structure. Nodes in a network are often found to belong to groups or communities that exhibit similar behavior. The identification of this community structure, called community detection, is an important problem with many critical applications. For example, communities in a protein interaction network often correspond to functional groups. This thesis focuses on cutting-edge methods for community detection in networks. The main theme is efficient community detection via sub-sampling, applied to two different approaches. The first approach optimizes a modularity function using a low-rank approximation for multiple communities. The second approach is spectral clustering, where we formulate an algorithm for community detection by exploiting the eigenvectors of the network adjacency matrix.
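The sub-sampling idea can be sketched as follows: run spectral clustering (leading eigenvectors of the sampled adjacency submatrix) on a random node sample, then assign each remaining node by a majority vote over its sampled neighbors. This is an illustrative approximation under those assumptions, not the thesis's exact algorithm:

```python
import numpy as np

def kmeans(X, k, rng, iters=50):
    """Tiny Lloyd's k-means used on the spectral embedding."""
    C = X[rng.choice(len(X), size=k, replace=False)]
    lab = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(axis=0)
    return lab

def spectral_subsample(A, k, sample_size, seed=0):
    """Cluster a random node subsample via the leading eigenvectors of the
    sampled adjacency submatrix, then assign each remaining node to the
    community where most of its sampled neighbors ended up."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.choice(n, size=sample_size, replace=False)
    vals, vecs = np.linalg.eigh(A[np.ix_(S, S)])
    X = vecs[:, np.argsort(-np.abs(vals))[:k]]     # top-k eigenvectors
    labels_S = kmeans(X, k, rng)
    labels = np.full(n, -1)
    labels[S] = labels_S
    for v in set(range(n)) - set(S.tolist()):
        mask = A[v, S] > 0
        if mask.any():
            labels[v] = np.bincount(labels_S[mask], minlength=k).argmax()
        else:
            labels[v] = rng.integers(k)            # no sampled neighbors
    return labels

# Toy adjacency matrix with two planted communities.
rng = np.random.default_rng(1)
n = 60
A = (rng.random((n, n)) < 0.05).astype(float)
A[:30, :30] = (rng.random((30, 30)) < 0.5).astype(float)
A[30:, 30:] = (rng.random((30, 30)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T
labels = spectral_subsample(A, k=2, sample_size=30)
```

The eigendecomposition runs on the sampled submatrix only, which is the source of the computational savings: its cost scales with the sample size rather than with the full network.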

Investigation of molecular and mesoscale clusters in undersaturated glycine aqueous solutions

Zimbitas, G., Jawor-Baczynska, A., Vesga, M.J., Javid, Nadeem, Moore, B.D., Parkinson, J., Sefcik, J. 08 August 2019 (has links)
In this work, DLS, NTA, SAXS and NMR were used to investigate the populations, size distributions and structure of clusters in undersaturated aqueous solutions of glycine. Molecular and colloidal-scale (mesoscale) clusters with radii around 0.3–0.5 nm and 100–150 nm, respectively, were observed using complementary experimental techniques. The molecular clusters are consistent with hydrated glycine dimers present in equilibrium with glycine monomers in aqueous solutions. Mesoscale clusters previously observed in supersaturated glycine solutions appear to be indefinitely stable, in mutual equilibrium within mesostructured undersaturated solutions across all glycine concentrations investigated here, down to as low as 1 mg/g of water. / Supported by EPSRC funding via the SynBIM project (Grant Reference EP/P0068X/1) and by the Synchrotron SOLEIL.

Overlapped schedules with centralized clustering for wireless sensor networks

Ammar, Ibrahim A.M., Miskeen, Guzlan M.A., Awan, Irfan U. January 2013 (has links)
The main attributes that have been used to conserve energy in wireless sensor networks (WSNs) are clustering, synchronization and low-duty-cycle operation. Clustering is an energy-efficient mechanism that divides sensor nodes into many clusters; it is a standard approach for achieving energy efficiency and hence extending the network lifetime. Synchronizing the schedules of these clusters is one of the primary challenges in WSNs. Several factors cause synchronization errors, among them clock drift, which accumulates at each hop over time. Synchronization by means of scheduling allows the nodes to cooperate and transmit data in a scheduled manner under the duty cycle mechanism. Duty cycling is an approach to efficiently utilize the sensors' limited energy supplies; it is used to reduce idle listening. Duty cycle, node clustering and schedule synchronization are the main attributes we have considered in designing a new medium access control (MAC) protocol. The proposed OLS-MAC protocol is designed so that the schedules of the clusters overlap, with a small shift time introduced between adjacent clusters' schedules to compensate for clock drift. The OLS-MAC algorithm is simulated in NS-2 and compared to some S-MAC-derived protocols. We verified that our proposed algorithm outperforms these protocols on a number of performance metrics.

The biometric characteristics of a smile

Ugail, Hassan, Aldahoud, Ahmad 20 March 2022 (has links)
Facial expressions have been studied for their diagnostic capabilities in mental health and for clues to longevity, gender and other personality traits. The use of facial expressions, especially the smile, as a biometric has not been examined in great detail. However, research shows that a person can be identified from their behavioural traits, including their emotional expressions. In this chapter, we discuss a novel computational biometric model which can be derived from the smile expression. We discuss how the temporal components of a smile can be utilised to show that similarities in the smile exist for an individual, enabling the creation of a tool that can be utilised as a biometric.
