Global ETD Search

91	Incorporating Physical Information into Clustering for FPGAs Chen, Doris Tzu Lang January 2007 (has links) The traditional approach to FPGA clustering and CLB-level placement has been shown to yield significantly worse overall placement quality than approaches which allow BLEs to move during placement. In practice, however, modern FPGA architectures require computationally-expensive Design Rule Checks (DRC) which render BLE-level placement impractical. This thesis research addresses this problem by proposing a novel clustering framework that produces better initial clusters that help to reduce the dependence on BLE-level placement. The work described in this dissertation includes: (1) a comparison of various clustering algorithms used for FPGAs, (2) the introduction of a novel hybridized clustering framework for timing-driven FPGA clustering, (3) the addition of physical information to make better clusters, (4) a comparison of the implemented approaches to known clustering tools, and (5) the implementation and evaluation of cluster improvement heuristics. The proposed techniques are quantified across accepted benchmarks and show that the implemented DPack produces results with 16% less wire length, 19% smaller minimum channel widths, and 8% less critical delay, on average, than known academic tools. The hybridized approach, HDPack, is found to achieve 21% less wire length, 24% smaller minimum channel widths, and 6% less critical delay, on average. FPGA computer-aided design clustering
92	Dissimilarity Plots. A Visual Exploration Tool for Partitional Clustering. Hahsler, Michael, Hornik, Kurt January 2009 (has links) (PDF) For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is independent of the dimensionality of the data. A major advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows for judging cluster quality but also makes mis-specification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. / Series: Research Report Series / Department of Statistics and Mathematics
93	Real-time data stream clustering over sliding windows Badiozamany, Sobhan January 2016 (has links) In many applications, e.g. urban traffic monitoring, stock trading, and industrial sensor data monitoring, clustering algorithms are applied to data streams in real time to find current patterns. Here, sliding windows are commonly used as they capture concept drift. Real-time clustering over sliding windows is the early detection of continuously evolving clusters as soon as they occur in the stream, which requires efficient maintenance of cluster memberships that change as windows slide. Data stream management systems (DSMSs) provide high-level query languages for searching and analyzing streaming data. In this thesis we extend a DSMS with a real-time data stream clustering framework called the Generic 2-phase Continuous Summarization framework (G2CS). G2CS modularizes data stream clustering by taking as input clustering algorithms which are expressed in terms of a number of functions and indexing structures. G2CS supports real-time clustering through an efficient window-sliding mechanism and algorithm-transparent indexing. A particular challenge for real-time detection of a high number of rapidly evolving clusters is the efficiency of window slides for clustering algorithms where deletion of expired data is not supported, e.g. BIRCH. To that end, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM). To further improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing where indexing structures suitable for the clustering algorithms can be plugged in.
94	A Risk-Oriented Clustering Approach for Asset Categorization and Risk Measurement Liu, Lu 18 July 2019 (has links) When faced with market risk for investments and portfolios, practitioners often calculate a risk measure, which maps each random payoff to a real number. There are many ways to quantify the potential risk, and the most important input to them is the features of future performance. Future distributions are unknown and thus always estimated from historical Profit and Loss (P&L) distributions. However, past data may not be appropriate for estimating the future; risk measures generated from single historical distributions can be subject to error. To overcome these shortcomings, one natural approach is to identify and categorize similar assets whose Profit and Loss distributions can be used as alternative scenarios. In practice, one of the most common and intuitive categorizations is sector, based on industry. It is widely agreed that companies in the same sector share the same, or related, business types and operating characteristics. But in the field of risk management, sector-based categorization does not necessarily mean assets are grouped in terms of their risk profiles, and we show that risk measures in the same sector tend to have large variation. Although improved risk measures related to distribution ambiguity have been discussed at length, we seek to develop a more risk-oriented categorization by providing a new clustering approach. Furthermore, our method can better inform us of the potential risk and the extreme worst-case scenario within the same category. clustering risk measures Wasserstein distance
95	Digging deeper into clustering and covering problems Bandyapadhyay, Sayan 01 May 2019 (has links) Clustering problems often arise in fields like data mining, machine learning and computational biology, where the task is to partition a collection of objects into groups of similar objects with respect to a similarity measure. For example, clustering can be used to group genes with related expression patterns. Covering problems are another important class of problems, where the task is to select a subset of objects from a larger set, such that the objects in the subset "cover" (or contain) a given set of elements. Covering problems have found applications in various fields including wireless and sensor networks, VLSI, and image processing. For example, covering can be used to find placement locations of the minimum number of mobile towers to serve all the customers of a region. In this dissertation, we consider an interesting collection of geometric clustering and covering problems, which are modeled as optimization problems. These problems are known to be $\mathsf{NP}$-hard, i.e. no efficient algorithms are expected to be found for these problems that return optimal solutions. Thus, we focus our effort on designing efficient approximation algorithms for these problems that yield near-optimal solutions. In this work, we study three clustering problems: $k$-means, $k$-clustering and Non-Uniform-$k$-center and one covering problem: Metric Capacitated Covering. $k$-means is one of the most studied clustering problems and probably the most frequently used clustering problem in practical applications. In this problem, we are given a set of points in a Euclidean space and we want to choose $k$ center points from the same Euclidean space. Each input point is assigned to its nearest chosen center, and points assigned to a center form a cluster. The cost per input point is the square of its distance from its nearest center. The total cost is the sum of the costs of the points. 
The goal is to choose $k$ center points so that the total cost is minimized. We give a local search based algorithm for this problem that always returns a solution of cost within a $(1+\epsilon)$-factor of the optimal cost for any $\epsilon > 0$. However, our algorithm uses $(1+\epsilon)k$ center points. The best known approximation before our work was about 9, using exactly $k$ centers. The result appears in Chapter \ref{sec:kmeanschap}. $k$-clustering is another popular clustering problem studied mainly by the theory community. In this problem, each cluster is represented by a ball in the input metric space. We would like to choose $k$ balls whose union contains all the input points. The cost of each ball is its radius to the power $\alpha$ for some given parameter $\alpha \ge 1$. The total cost is the sum of the costs of the chosen $k$ balls. The goal is to find $k$ balls such that the total cost is minimized. We give a probabilistic metric partitioning based algorithm for this problem that always returns a solution of cost within a $(1+\epsilon)$-factor of the optimal cost for any $\epsilon > 0$. However, our algorithm uses $(1+\epsilon)k$ balls, and the running time is quasi-polynomial. The best known approximation in polynomial time is $c^{\alpha}$, using exactly $k$ balls, where $c$ is a constant. The result appears in Chapter \ref{sec:kcluster}. Non-Uniform-$k$-center is another clustering problem, which was posed very recently. As in $k$-clustering, each cluster is again represented by a ball. Additionally, we are given $k$ integers $r_1,\ldots,r_k$, and we want to find the minimum dilation $\alpha$ and choose $k$ balls with radius $\alpha\cdot r_i$ for $1\le i\le k$ whose union contains all the input points. This problem is known to be notoriously hard. No approximation is known even in the special case when the $r_i$'s belong to a set of three integers. 
We give an LP rounding based algorithm for this special case that always returns a solution of cost within a constant factor of the optimal cost. However, our algorithm uses $(2+\epsilon)k$ balls for some constant $\epsilon$. We also show that this special case can be solved in polynomial time under a practical assumption. Moreover, we prove that the Euclidean version of the problem is as hard as the general version. These results appear in Chapter \ref{sec:nukc}. Capacitated Covering is a generalization of the classical set cover problem. In the Metric Capacitated Covering problem, we are given a set of balls and a set of points in a metric space. Additionally, we are given an integer that is referred to as the capacity. The goal is to find a minimum subset of the input set of balls, such that each point can be assigned to the chosen balls in a manner so that the number of points assigned to each ball is bounded by the capacity. We give an LP rounding based algorithm for this problem that always returns a solution of cost within a constant factor of the optimal cost. However, we assume that we are allowed to expand the balls by a fairly small constant. If no expansion is allowed, then the problem is known to not admit any constant approximation. We discuss our findings in Chapter \ref{sec:capa}. As mentioned above, for many of the problems we consider, we obtain results that improve the best known approximation bounds. Our findings make significant progress towards better understanding the internals of these problems, which have impact across disciplines. Also, during the course of our work, we have designed tools and techniques, which might be of independent interest for solving similar optimization problems. Finally, in Chapter \ref{sec:conclude}, we conclude our discussion and pose some open questions, which we consider potential future work. Approximation Clustering Covering Inapproximability Optimization
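The two cost functions defined in entry 95 can be illustrated with a minimal Python sketch. It only evaluates the cost of a given solution; it is not the dissertation's approximation algorithms. The function names `kmeans_cost` and `ball_cover_cost` are hypothetical and chosen here for illustration.

```python
def kmeans_cost(points, centers):
    """k-means cost: each point pays the squared Euclidean distance
    to its nearest chosen center; the total is the sum over points."""
    total = 0.0
    for p in points:
        # Squared distance from p to every center; keep the smallest.
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers
        )
    return total


def ball_cover_cost(radii, alpha):
    """k-clustering cost: each chosen ball pays radius**alpha (alpha >= 1);
    the total is the sum over the chosen balls."""
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    return float(sum(r ** alpha for r in radii))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    ctrs = [(0.0, 0.0), (10.0, 0.0)]
    print(kmeans_cost(pts, ctrs))           # cost of this particular choice of centers
    print(ball_cover_cost([1.0, 2.0], 2))   # cost of two balls with alpha = 2
```

An approximation algorithm in the sense of the abstract would search for centers (or balls) whose cost under these functions is provably close to the minimum.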
96	Exploring Age-Related Metamemory Differences Using Modified Brier Scores and Hierarchical Clustering Parlett, Chelsea 17 May 2019 (has links) Older adults (OAs) typically experience memory failures as they age. However, with some exceptions, studies of OAs’ ability to assess their own memory functions, known as metamemory (MM), find little evidence that this function is susceptible to age-related decline. Our study examines OAs’ and young adults’ (YAs) MM performance and strategy use. Groups of YAs (N = 138) and OAs (N = 79) performed an MM task that required participants to place bets on how likely they were to remember words in a list. Our analytical approach includes hierarchical clustering, and we introduce a new measure of MM, the modified Brier score, in order to adjust for differences in scale usage between participants. Our data indicate that OAs and YAs differ in the strategies they use to assess their memory and in how well their MM matches their memory performance. However, there was no evidence that the chosen strategies were associated with differences in MM match, indicating that there are multiple strategies that might be effective (i.e. lead to similar match) in this MM task. Metamemory Aging Clustering Cognitive Psychology
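For orientation, the standard (unmodified) Brier score that entry 96 builds on can be sketched in Python as below. The abstract does not describe how the modified version adjusts for scale usage, so that adjustment is not reproduced here. The function name `brier_score` and the assumption that bets are expressed as probabilities in [0, 1] are illustrative.

```python
def brier_score(forecasts, outcomes):
    """Standard Brier score: mean squared difference between a stated
    probability (0..1) and the binary outcome (1 = remembered, 0 = not).
    Lower is better; 0 means perfectly calibrated and certain."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # A participant who bets 0.5 on every word, with one hit and one miss.
    print(brier_score([0.5, 0.5], [1, 0]))
```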
97	A Confidence-based Hierarchical Word Clustering for Document Classification Yin, Kai-Tai 09 August 2007 (has links) We propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between any two different words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is iterated until all the mutual confidences between unprocessed pairs of words are smaller than a predefined threshold or only one cluster exists. In this way, a hierarchy of word clusters is obtained. The user can choose the clusters at a certain level of the hierarchy to be used as new features for document classification. Experimental results have shown that our method can perform better than other methods. Classification Word Clustering Confidence Hierarchical
98	Effect of pendant distribution on the dispersancy of maleated ethylene propylene Araya, Andrea 27 September 2011 (has links) This study describes how changes made to the modification of a polyolefin affect the solution properties of these modified polyolefins in apolar solvents. The modified polyolefins of interest are maleated ethylene-propylene random copolymers (EP-MAH) reacted with N-phenyl-p-phenylenediamine (NP3D) to yield NP3D-EP-MAH. NP3D-EP-MAH is used as a dispersant by the oil-additive industry and solution properties such as self-aggregation, rheological behaviour, and its efficiency at stabilizing carbon black particles (CBPs) were investigated. The maleation of the polyolefin was characterized in terms of succinic anhydride (SAH) content and level of SAH clustering along the polymer backbone by FT-IR and UV-Vis absorption and steady-state and time-resolved fluorescence. The self-aggregation of the modified polyolefins was characterized in hexane by replacing NP3D with 1-pyrenemethylamine and using fluorescence to probe excimer formation between an excited and a ground-state pyrene. The rheological behaviour exhibited by the solutions of modified polyolefins was characterized from the viscosity profiles of the solutions obtained as a function of polymer concentration. Finally, the adsorption of the modified polyolefins onto CBPs was characterized by analysis of Langmuir isotherms, which yields both the equilibrium constant and the maximum coverage for the binding of the modified polyolefins onto CBPs. The conclusions reached in this thesis are that clustering of the SAH pendants along the EP backbone enhances the ability of the modified polyolefin to self-aggregate in apolar solution. In turn, self-aggregation led to enhanced thickening of the NP3D-EP-MAH solutions and stronger adsorption onto CBPs. 
This thesis establishes how the level of SAH clustering affects self-association and establishes its consequence on the rheological properties and adsorption isotherms of NP3D-EP-MAH samples in apolar solvents. SAH clustering effect Adsorption Chemistry
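The Langmuir isotherm mentioned in entry 98 relates surface coverage to solution concentration through exactly the two parameters the abstract names, the equilibrium constant and the maximum coverage. A minimal Python sketch of the textbook form follows; the function and parameter names are illustrative, and the thesis may use a different parameterization or fitting procedure.

```python
def langmuir_coverage(conc, k_eq, gamma_max):
    """Textbook Langmuir isotherm:
        gamma = gamma_max * K * c / (1 + K * c)
    conc      : equilibrium concentration of free polymer in solution
    k_eq      : binding equilibrium constant K (inverse concentration units)
    gamma_max : maximum coverage of the particle surface
    """
    if conc < 0 or k_eq < 0:
        raise ValueError("concentration and equilibrium constant must be non-negative")
    return gamma_max * k_eq * conc / (1.0 + k_eq * conc)


if __name__ == "__main__":
    # Coverage rises with concentration and levels off towards gamma_max.
    for c in (0.0, 0.5, 1.0, 10.0, 1000.0):
        print(c, langmuir_coverage(c, k_eq=1.0, gamma_max=2.0))
```

In this form, a larger equilibrium constant means the surface saturates at lower concentration, which is one way stronger adsorption would show up in the fitted parameters.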
99	A New Measure For Clustering Model Selection McCrosky, Jesse January 2008 (has links) A new method for determining the number of k-means clusters in a given data set is presented. The algorithm is developed from a theoretical perspective and then its implementation is examined and compared to existing solutions. Clustering Model Selection Computer Science
100	A new framework for clustering Zhou, Wu January 2010 (has links) The difficulty of clustering and the variety of clustering methods suggest the need for a theoretical study of clustering. Using the idea of a standard statistical framework, we propose a new framework for clustering. For a well-defined clustering goal we assume that the data to be clustered come from an underlying distribution and we aim to find a high-density cluster tree. We regard this tree as a parameter of interest for the underlying distribution. However, it is not obvious how to determine a connected subset in a discrete distribution whose support is located in a Euclidean space. Building a cluster tree for such a distribution is an open problem and presents interesting conceptual and computational challenges. We solve this problem using graph-based approaches and further parameterize clustering using the high-density cluster tree and its extension. Motivated by the connection between clustering outcomes and graphs, we propose a graph family framework. This framework plays an important role in our clustering framework. A direct application of the graph family framework is a new cluster-tree distance measure. This distance measure can be written as an inner product or kernel. It makes our clustering framework able to perform statistical assessment of clustering via simulation. Other applications such as a method for integrating partitions into a cluster tree and methods for cluster tree averaging and bagging are also derived from the graph family framework. clustering cluster tree distance Statistics
