171

High Level Debugging Techniques for Modern Verification Flows

Poulos, Zissis Paraskevas 04 July 2014 (has links)
Early functional-correctness closure of the final chip has become a crucial success factor in the semiconductor industry. In this context, the tedious task of functional debugging poses a significant bottleneck in modern electronic design processes, where new debugging problems constantly arise and are still predominantly tackled manually. This dissertation proposes methodologies that address two emerging debugging problems in modern design flows. First, it proposes a novel, automated triage framework for Register-Transfer-Level (RTL) debugging. The proposed framework employs clustering techniques to automate the grouping of the plethora of failures that occur during regression verification. Experiments demonstrate accuracy improvements of up to 40% compared to existing triage methodologies. Next, it introduces new techniques for Field Programmable Gate Array (FPGA) debugging that leverage reconfigurability to let debugging proceed without iterative executions of computationally intensive design re-synthesis tools. Experiments demonstrate productivity improvements of up to 30× over conventional approaches.
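The triage idea — grouping regression failures so engineers debug one representative per group — can be sketched as follows. This is a hypothetical simplification, not the thesis's actual framework: here each failure is reduced to a set of suspect signal names, and failures with strongly overlapping suspect sets share a bin.

```python
# Hedged sketch of clustering-based failure triage. The suspect-set
# representation and the Jaccard threshold are assumptions for
# illustration, not the dissertation's method.

def jaccard(a, b):
    """Jaccard similarity of two sets of suspect signals."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def triage(failures, threshold=0.5):
    """Greedy single-pass grouping: each failure joins the first
    existing group whose representative is similar enough, else it
    starts a new group. failures: dict id -> set of suspect names."""
    groups = []  # list of (representative suspect set, [failure ids])
    for fid, suspects in failures.items():
        for rep, members in groups:
            if jaccard(rep, suspects) >= threshold:
                members.append(fid)
                break
        else:
            groups.append((suspects, [fid]))
    return [members for _, members in groups]
```

Two failures implicating `alu` and `ctrl` would land in one bin, while a failure implicating only `mem` would get its own.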
172

An Improved Regional Honey Production Model for the United States

Trimboli, Anthony B 01 April 2017 (has links)
Currently three systems are used to categorize honey production regions in the United States: one from the United States Department of Agriculture, one from the American Bee Journal used for its monthly U.S. Honey Crop and Markets report, and one from Bee Culture's monthly regional honey price report. These systems follow political state boundaries and are based upon climate, bee forage, and regional beekeeping practices. While these systems are popular with the general beekeeping community, to our knowledge their accuracy has not been studied. Although differing geographic regions can vary in bee forage species availability, states with similar geography and flora should have similar honey production. This, however, is not the case: states within the same honey production region vary in honey production, possibly due to smaller ecotype divisions within the larger honey production regions. Because of this ecotype gradient, some models divide the United States into far more regions based upon ecotypes and disregard political boundaries. While a model based on ecotypes that disregards state political boundaries may be more accurate, it is not currently possible to evaluate such models statistically because of how honey production data are collected. This study developed nine novel regional honey production models that respect political boundaries while attempting to satisfy ecotype similarity. The first four alternative models are based solely on Level II ecoregions and were developed by a best-fit manual approach that minimized the number of ecoregions per honey production region. The five remaining models were created using statistical k-means partitioning cluster analysis and are purely data based. Also discussed is a linear regression model produced by Page et al. Differences within and between the models were analyzed using descriptive statistics and ANOVA in order to determine an improved model that describes regional honey production in the United States.
Many of the models, both preexisting and those developed for this study, had statistically insignificant means and are not viable. Of those with significant means, a k-means-cluster-based model was determined to be statistically superior and can be considered an improved regional honey production model for the United States.
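The k-means partitioning step can be illustrated with a toy one-dimensional example: partitioning states by average honey yield per colony. This is an illustrative sketch only; the state names are omitted and the yield figures are invented, not the study's data.

```python
# Minimal 1-D k-means (Lloyd's algorithm) on per-state honey yields.
# Assumes k >= 2; initial centers are spread across the sorted data
# so the run is deterministic for the example.

def kmeans_1d(values, k, iters=20):
    data = sorted(values)
    centers = [data[int(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        # assignment step: each value joins its nearest center
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[j].append(v)
        # update step: each center moves to its cluster mean
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters
```

Six hypothetical state yields split cleanly into a low-yield and a high-yield region, which is the kind of grouping the study's k-means models formalize across all states.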
173

Voting in clustering and finding the number of clusters

Dimitriadou, Evgenia, Weingessel, Andreas, Hornik, Kurt January 1999 (has links) (PDF)
In this paper we present an unsupervised algorithm which performs clustering on a given data set and can also find the number of clusters existing in it. This algorithm consists of two techniques. The first, the voting technique, allows us to combine several runs of clustering algorithms, with the number of clusters predefined, into a common partition. We introduce the idea that there are cases where an input point participates in a structure with a certain degree of confidence and may belong to more than one cluster with a certain degree of "belongingness". The second part consists of an index measure which receives the results of every voting process for different numbers of clusters and decides in favor of one. This algorithm is a complete clustering scheme which can be applied to any clustering method and to any type of data set. Moreover, it helps us to overcome instabilities of the clustering algorithms and to improve the ability of a clustering algorithm to find structures in a data set. / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
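The voting idea can be sketched in a few lines. This is a hedged simplification: it assumes cluster labels are already aligned across runs, whereas the paper's voting scheme itself handles the relabeling problem; the "belongingness" of a point to a label is just the fraction of runs voting for it.

```python
# Toy consensus voting over several clustering runs. Each run is a
# list of labels, one per point; labels are assumed pre-aligned
# across runs (an assumption, not part of the paper's algorithm).

def membership_degrees(runs):
    """Returns, per point, a dict mapping label -> vote fraction,
    i.e. the point's degree of belongingness to each cluster."""
    n_points = len(runs[0])
    degrees = []
    for i in range(n_points):
        votes = {}
        for labels in runs:
            votes[labels[i]] = votes.get(labels[i], 0) + 1
        degrees.append({lab: c / len(runs) for lab, c in votes.items()})
    return degrees
```

A point on which all runs agree gets degree 1.0 for one label; a point on a cluster boundary splits its degrees, which is exactly the fuzzy membership the paper motivates.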
174

Contributions to Gene Set Analysis of Correlated, Paired-Sample Transcriptome Data to Enable Precision Medicine

Schissler, Alfred Grant January 2017 (has links)
This dissertation serves as a unifying document for three related articles developed during my dissertation research. The projects involve the development of single-subject transcriptome (i.e. gene expression data) methodology for precision medicine and related applications. Traditional statistical approaches are largely unavailable in this setting due to prohibitive sample size and lack of independent replication. This leads one to rely on informatic devices including knowledgebase integration (e.g., gene set annotations) and external data sources (e.g., estimation of inter-gene correlation). Common statistical themes include multivariate statistics (such as Mahalanobis distance and copulas) and large-scale significance testing. Briefly, the first work describes the development of clinically relevant single-subject metrics of gene set (pathway) differential expression, N-of-1-pathways Mahalanobis distance (MD) scores. Next, the second article describes a method which overcomes a major shortcoming of the MD framework by accounting for inter-gene correlation. Lastly, the statistics developed in the previous works are re-purposed to analyze single-cell RNA-sequencing data derived from rare cells. Importantly, these works represent an interdisciplinary effort and show that creative solutions for pressing issues become possible at the intersection of statistics, biology, medicine, and computer science.
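The Mahalanobis distance score at the heart of the first article can be illustrated with a toy computation. This is a sketch under stated assumptions: the two-gene set, the fold-change vector, and the covariance matrix are invented for illustration and are not from the articles.

```python
# Squared Mahalanobis distance d^T S^{-1} d for a two-gene set, where
# d is the subject's vector of paired log fold changes and S is an
# assumed inter-gene covariance matrix (inverted directly for 2x2).

def mahalanobis2(d, cov):
    (a, b), (c, e) = cov
    det = a * e - b * c
    inv = ((e / det, -b / det), (-c / det, a / det))
    # multiply inv by d, then dot with d
    t0 = inv[0][0] * d[0] + inv[0][1] * d[1]
    t1 = inv[1][0] * d[0] + inv[1][1] * d[1]
    return d[0] * t0 + d[1] * t1
```

With an identity covariance the score reduces to the squared Euclidean norm of the fold changes; a non-diagonal covariance is what lets the second article's correction for inter-gene correlation change the score.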
175

Spatio-temporal and neighborhood characteristics of two dengue outbreaks in two arid cities of Mexico.

Reyes-Castro, Pablo A, Harris, Robin B, Brown, Heidi E, Christopherson, Gary L, Ernst, Kacey C 03 1900 (has links)
Little is currently known about the spatio-temporal dynamics of dengue epidemics in arid areas. This study assesses dengue outbreaks that occurred in two arid cities of Mexico, Hermosillo and Navojoa, located in the northern state of Sonora. Laboratory-confirmed dengue cases from Hermosillo (N=2730) and Navojoa (N=493) were geocoded by residence and assigned neighborhood-level characteristics from the 2010 Mexican census. Kernel density estimation and space-time cluster analysis were performed to detect high-density areas and space-time clusters of dengue. Ordinary Least Squares regression was used to assess the changing socioeconomic characteristics of cases over the course of the outbreaks. Both cities exhibited contiguous patterns of space-time clustering. Initial areas of dissemination were characterized in both cities by high population density, a high percentage of occupied houses, and lack of healthcare. Future research and control efforts in these regions should consider these space-time and socioeconomic patterns.
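The notion of a space-time cluster can be sketched with a crude neighbour count: a case sits in a focus when many other cases fall within a spatial radius and a time window of it. This is an illustrative toy, not the study's scan-statistic analysis; coordinates and onset days below are fabricated.

```python
# For each case (x, y, day), count the other cases within a given
# spatial radius and time window -- a rough space-time density
# measure that flags clustered cases.

def spacetime_neighbours(cases, radius, window):
    counts = []
    for i, (x1, y1, t1) in enumerate(cases):
        n = 0
        for j, (x2, y2, t2) in enumerate(cases):
            if i == j:
                continue
            close_in_space = (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2
            close_in_time = abs(t1 - t2) <= window
            if close_in_space and close_in_time:
                n += 1
        counts.append(n)
    return counts
```

Two cases a block and a day apart count each other as neighbours; an isolated case far away scores zero, mirroring the contiguous clustering the study observed.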
176

Construction and Analysis of Vector Space Models for Use in Aspect Mining

Tribbey, William 01 January 2011 (has links)
A legacy software system can be taken to consist of N methods which contain within their implementations the intended activities and functions of the system. These activities and functions are referred to as concerns. Some of these concerns are typically implemented and used in multiple methods throughout the system, and these are deemed to be crosscutting concerns. Through the use of an aspect-oriented programming paradigm, the implementation and use of these crosscutting concerns can be abstracted into aspects. In order to refactor the system, the process of aspect mining is carried out to identify the crosscutting concerns in the software system. Once identified, the crosscutting concerns can then be refactored into aspects. Clustering-based aspect mining techniques make use of a vector space model to represent the source code to be mined. In this investigation, each method M of the software system was represented by a d-dimensional vector V whose components were the values obtained by applying source code metrics to M. These vector space models were then processed through the k-means++ clustering algorithm, and the resulting cluster configurations were evaluated to assess the quality of the results with respect to the identification of crosscutting concerns. This research studied the effect that the number of dimensions of a vector space model has on the results of a clustering-based aspect mining algorithm. Several vector space models were defined and principal component analysis was used to reduce the dimensionality of the models. Each of the models was processed multiple times through the aspect mining algorithm and the distributions of the collected measures were tested for statistically significant differences using the Wilcoxon rank sum test.
The results indicate that changes in the number of dimensions of a vector space model can produce significant effects in the collected measures. In addition, the measures used to assess the performance of an aspect mining process need to be analyzed for underlying relationships.
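The vector space model construction can be sketched concretely: each method is mapped to a vector of metric values. This is a hedged toy, assuming two stand-in metrics (line count and a naive fan-in proxy) and fabricated method bodies, none of which are the study's actual metrics or corpus.

```python
# Build a vector space model: map each method to a tuple of metric
# values, one component per source code metric.

def build_vsm(methods, metrics):
    """methods: dict name -> source text; metrics: list of callables.
    Returns dict name -> d-dimensional vector (d = len(metrics))."""
    return {name: tuple(m(src) for m in metrics)
            for name, src in methods.items()}

# Two toy metrics (assumptions for illustration only):
line_count = lambda src: src.count("\n") + 1        # lines in the body
fan_in = lambda src: src.count("call(")             # crude call count
```

The resulting vectors are what would then be fed to k-means++ (and, for higher-dimensional models, reduced first with PCA) in the mining pipeline the study evaluates.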
177

The Distance to a University and Regional Output : A Study of how Distance to a University Impacts the Economic Productivity of a Municipality

Hovander, Sebastian January 2016 (has links)
The educational level of the Swedish population has risen rapidly over the past two decades, and educational level has long been a topic of interest for labor productivity. This increase raises an interesting question: does the remoteness of a university affect productivity, and if so by how much? This study attempts to explain the impact of university proximity on regional productivity, using the distance from each municipality in Sweden to its closest university, and conditioning on the quality of that university. Simple OLS regressions identified some factors associated with productivity, both positively and negatively, while distance turned out not to matter for regional productivity at all. This field is somewhat untouched, and with further research incorporating other geographical economic theories, it could become an interesting line of study.
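The study's simple OLS setup can be sketched with a closed-form bivariate regression of productivity on distance. The data points below are invented purely to exercise the formula; they do not reflect the thesis's data or its null result on distance.

```python
# Closed-form simple ordinary least squares: regress y (municipal
# productivity) on x (distance to the nearest university).

def ols(xs, ys):
    """Returns (intercept, slope) minimizing squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope
```

In the thesis's multivariate setting more regressors (university quality, municipal controls) enter the design matrix, but the estimator is the same least-squares principle.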
178

Design and implementation of approximate balanced clustering algorithms in PSO

Lai, Chun-Hau January 2012 (has links)
Magíster en Ciencias, Mención Computación / This thesis is devoted to the design and implementation of approximate algorithms that explore the best solutions to the Balanced Clustering problem, which consists of dividing a set of n points into k clusters such that each cluster has at least ⌊n/k⌋ points, all of them as close as possible to the centroid of their cluster. We study the existing algorithms for this problem, and our analysis shows that they can fail to deliver an optimal result because they do not evaluate their results at each iteration. We therefore turn to the concept of Particle Swarms, originally introduced to simulate human social behaviour, which allows exploring the space of possible solutions so as to approach the optimum quickly. We propose four algorithms based on Particle Swarm Optimization (PSO): PSO-Húngaro, PSO-Gale-Shapley, PSO-Absorción-Punto-Cercano and PSO-Convex-Hull, which exploit the random generation of centroids by the PSO algorithm when assigning points to those centroids, achieving a solution closer to the optimum. We evaluated these four algorithms on data sets with uniform and non-uniform distributions. We found that for non-uniformly distributed data sets it is unpredictable which of the four proposed algorithms achieves the best result according to the set of metrics used (intra-cluster distance, Davies-Bouldin index and Dunn index). We therefore focused in depth on their behaviour for uniformly distributed data sets. During the evaluation it was found that the formation of balanced clusters by the PSO-Absorción-Punto-Cercano and PSO-Convex-Hull algorithms depends strongly on the order in which the centroids begin absorbing their nearest points.
In contrast, the PSO-Húngaro and PSO-Gale-Shapley algorithms depend only on the generated centroids and not on the order in which the clusters are created. We conclude that PSO-Gale-Shapley shows the weakest performance for creating balanced clusters, while PSO-Húngaro performs most efficiently in reaching the expected result, although it is limited by the size of the data and the shape of the distribution. Finally, it was found that for large data sets, regardless of the distribution, PSO-Convex-Hull outperforms the others, delivering the best result according to the metrics used.
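One ingredient of balanced clustering can be sketched in isolation: given fixed centroids (in PSO these come from the swarm particles), assign points so that cluster sizes stay balanced. The greedy capacity-limited rule below is a simplification for illustration; the thesis's PSO-Hungaro variant uses Hungarian matching, which this sketch does not implement.

```python
# Capacity-limited nearest-centroid assignment: visit point-centroid
# pairs in order of increasing squared distance and cap each cluster
# at ceil(n/k), keeping cluster sizes nearly equal.

def balanced_assign(points, centroids):
    """points, centroids: lists of (x, y). Returns a cluster label
    per point."""
    n, k = len(points), len(centroids)
    cap = -(-n // k)  # ceil(n / k)
    pairs = sorted(
        ((px - cx) ** 2 + (py - cy) ** 2, i, j)
        for i, (px, py) in enumerate(points)
        for j, (cx, cy) in enumerate(centroids))
    label = [None] * n
    load = [0] * k
    for _, i, j in pairs:
        if label[i] is None and load[j] < cap:
            label[i] = j
            load[j] += 1
    return label
```

Because the closest pairs are honored first, points near a centroid keep it; once a cluster is full, remaining points spill to the next-nearest centroid with room, which is the balancing behaviour the thesis's algorithms enforce more rigorously.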
179

Finding all maximal cliques in dynamic graphs

Stix, Volker January 2002 (has links) (PDF)
Clustering applications dealing with perception-based or biased data lead to models with non-disjoint clusters. There, objects to be clustered are allowed to belong to several clusters at the same time, which results in a fuzzy clustering. It can be shown that this is equivalent to searching for all maximal cliques in dynamic graphs G_t = (V, E_t), where E_{t-1} ⊆ E_t for t = 1, …, T and E_0 = ∅. In this article algorithms are provided to track all maximal cliques in a fully dynamic graph. Having all maximal cliques, it is natural to ask about the maximum clique; therefore this article discusses potentials and drawbacks for this problem as well. (author's abstract) / Series: Working Papers on Information Systems, Information Business and Operations
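For a single snapshot G_t, all maximal cliques can be enumerated with the classic Bron-Kerbosch recursion, sketched below. The article's contribution is tracking these cliques incrementally as edges arrive; this static toy version omits that machinery.

```python
# Bron-Kerbosch enumeration of all maximal cliques in an undirected
# graph given as an adjacency dict: vertex -> set of neighbours.

def maximal_cliques(adj):
    out = []
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            out.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return out
```

On a triangle {1,2,3} with a pendant edge 3-4, the two maximal cliques {1,2,3} and {3,4} are found; the maximum clique is then just the largest entry of that list, which is the follow-up question the article weighs.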
180

"Semi-supervised machine learning: proposal of an algorithm for labeling examples from a few labeled examples"

Sanches, Marcelo Kaminski 11 August 2003 (has links)
In order to use Machine Learning algorithms for classification tasks, the existence of a set of labeled examples, known as the training set, is assumed; this set is used to train the classifier. In real cases, however, the training set may not contain enough examples to induce a good classifier. Recently, the scientific community has shown great interest in a variation of this supervised learning approach. This new approach, known as semi-supervised learning, assumes that, together with the training set, a second set of unlabeled examples is also available during training. One goal of semi-supervised learning is to train classifiers when a large quantity of unlabeled examples is available along with only a small set of labeled ones. The motivation for semi-supervised learning is that, in many real-world applications, sets of unlabeled examples are easily found or very cheap to collect compared with sets of labeled examples. Another factor is that unlabeled examples can be collected automatically, while labeled ones require specialists or other costly classification resources. Unlabeled examples can be used in several ways. This work explores a mechanism by which unlabeled examples can improve classification tasks, and proposes a semi-supervised algorithm, called k-meanski, which makes the use of unlabeled examples viable in supervised learning. The technique used by the proposed algorithm rests on two premises. The first is that examples tend to group naturally into clusters rather than being distributed uniformly over the example description space.
Moreover, each example in the initial set of labeled examples should be located near the center of one of the clusters in the description space. The second premise is that most of the examples in each cluster belong to a single specific class. Obviously, the validity of these premises depends on the data set used. The k-meanski algorithm works well when the data conform to both premises; if they are violated, its performance will be poor. Experiments are shown using real-world data sets, with examples randomly chosen from those sets to act as the labeled examples.
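The core intuition — labeled examples sit near cluster centers and clusters are class-pure, so labels can be propagated cluster-wide — can be sketched in its most degenerate form, where clustering collapses to nearest-seed assignment. This is a hedged illustration borrowing only the k-meanski premises, not its actual algorithm.

```python
# Label propagation under the two premises: each unlabeled point takes
# the class of its nearest labeled seed, relying on labeled examples
# lying near the centers of class-pure clusters.

def propagate_labels(labeled, unlabeled):
    """labeled: list of ((x, y), cls); unlabeled: list of (x, y).
    Returns one predicted class per unlabeled point."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return [min(labeled, key=lambda lc: d2(lc[0], p))[1]
            for p in unlabeled]
```

When the premises hold (well-separated, class-pure clusters), even two seeds label all their cluster-mates correctly; when clusters mix classes, exactly the failure mode the abstract warns about appears.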
