181

A Centralized Energy Management System for Wireless Sensor Networks

Skowyra, Richard William 05 May 2009 (has links)
This document presents the Centralized Energy Management System (CEMS), a dynamic fault-tolerant reclustering protocol for wireless sensor networks. CEMS reconfigures a homogeneous network both periodically and in response to critical events (e.g. cluster head death). A global TDMA schedule prevents costly retransmissions due to collision, and a genetic algorithm running on the base station computes cluster assignments in concert with a head selection algorithm. CEMS' performance is compared to the LEACH-C protocol in both normal and failure-prone conditions, with an emphasis on each protocol's ability to recover from unexpected loss of cluster heads.
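As an illustration of the kind of base-station optimization mentioned above, the sketch below shows a toy genetic algorithm that selects cluster heads for a set of sensor nodes by minimizing a crude distance-based energy proxy. The node layout, fitness function and GA parameters are invented for illustration; this is not CEMS's actual head-selection or cluster-assignment scheme.

```python
# Toy genetic algorithm for picking K cluster heads among sensor nodes.
# Fitness = total distance from each node to its nearest head (energy proxy).
import random
import math

random.seed(0)
NODES = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(60)]
K = 5              # number of cluster heads (assumed)
POP, GENS = 30, 80

def fitness(heads):
    # total distance from every node to its nearest head (lower is better)
    return sum(min(math.dist(node, NODES[h]) for h in heads) for node in NODES)

def random_individual():
    return tuple(random.sample(range(len(NODES)), K))

def mutate(ind):
    # swap one head for a randomly chosen non-head node
    ind = list(ind)
    pos = random.randrange(K)
    ind[pos] = random.choice([i for i in range(len(NODES)) if i not in ind])
    return tuple(ind)

def crossover(a, b):
    # child draws K distinct heads from the union of both parents
    merged = list(dict.fromkeys(a + b))
    return tuple(random.sample(merged, K))

population = [random_individual() for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness)              # elitist selection: keep best half
    survivors = population[: POP // 2]
    children = []
    while len(survivors) + len(children) < POP:
        p1, p2 = random.sample(survivors, 2)
        children.append(mutate(crossover(p1, p2)))
    population = survivors + children

best = min(population, key=fitness)
print("cluster heads:", sorted(best), "energy proxy:", round(fitness(best), 1))
```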
182

Fouille de données par contraintes / Data mining by constraints

Boudane, Abdelhamid 13 September 2018 (has links)
In this thesis, we address the well-known clustering and association rule mining problems. Our first contribution introduces a new clustering framework in which complex objects are described by propositional formulas. First, we extend the two well-known k-means and hierarchical agglomerative clustering techniques to deal with these complex objects. Second, we introduce a new divisive algorithm for clustering objects represented explicitly by sets of models. Finally, we propose a propositional satisfiability based encoding of the problem of clustering propositional formulas without the need for an explicit representation of their models. In a second contribution, we propose a new propositional satisfiability based approach to mine association rules in a single step. The task is modeled as a propositional formula whose models correspond to the rules to be mined. To highlight the flexibility of our proposed framework, we also address other variants, namely the closed, minimal non-redundant, most general and indirect association rule mining tasks. Experiments on many datasets show that on the majority of the association rule mining tasks considered, our declarative approach achieves better performance than the state-of-the-art specialized techniques.
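As a concrete illustration of one ingredient mentioned above, clustering objects represented explicitly by sets of models, the sketch below computes Jaccard distances between hand-made model sets and feeds them to an off-the-shelf agglomerative clustering routine. The tiny model sets are hypothetical; the thesis's SAT encodings and divisive algorithm are not reproduced here.

```python
# Hierarchical clustering of objects given as explicit sets of models,
# using Jaccard distance between model sets.
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster

# each object = a frozenset of its propositional models,
# a model being a tuple of truth values for the variables
objects = {
    "o1": frozenset({(1, 0, 1), (1, 1, 1)}),
    "o2": frozenset({(1, 0, 1), (1, 1, 0)}),
    "o3": frozenset({(0, 0, 0), (0, 1, 0)}),
    "o4": frozenset({(0, 0, 0)}),
}
names = list(objects)

def jaccard_distance(a, b):
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

# condensed pairwise distance vector, in the pair order SciPy expects
condensed = [jaccard_distance(objects[a], objects[b])
             for a, b in combinations(names, 2)]

Z = linkage(condensed, method="average")          # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two clusters
print(dict(zip(names, labels)))
```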
183

CONFIRM: Clustering of Noisy Form Images using Robust Matching

Tensmeyer, Christopher Alan 01 May 2016 (has links)
Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Contrary to the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain-specific features, which we show outperform the Bag of Visual Words (BoVW) features employed by the current state of the art. To scale to large document collections, we use a bootstrap approach to clustering, where only a small subset of the data is clustered directly, while the rest of the data is assigned to clusters in linear time. We show that CONFIRM reduces average cluster impurity by 44% compared to the state of the art on five collections of historical forms that contain significant noise. We also show competitive performance on the relatively clean NIST tax form collection.
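The bootstrap clustering idea, clustering only a small subset directly and assigning the remainder in linear time, can be sketched in a few lines. The synthetic feature vectors below stand in for CONFIRM's form-matching features, which are not reproduced here.

```python
# Bootstrap clustering: quadratic-cost clustering on a small sample,
# then linear-time assignment of the rest to the nearest centroid.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 0.5, size=(500, 8)) for c in (0.0, 3.0, 6.0)])

subset_idx = rng.choice(len(data), size=100, replace=False)
subset = data[subset_idx]

# cluster only the subset (expensive step, but on a small sample)
labels_subset = AgglomerativeClustering(n_clusters=3).fit_predict(subset)
centroids = np.vstack([subset[labels_subset == k].mean(axis=0) for k in range(3)])

# assign every item to its nearest centroid -- linear in the data size
dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
labels_all = dists.argmin(axis=1)
print("cluster sizes:", np.bincount(labels_all))
```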
184

A framework for emerging topic detection in biomedicine

Madlock-Brown, Charisse Renee 01 December 2014 (has links)
Emerging topic detection algorithms have the potential to assist researchers in maintaining awareness of current trends in biomedical fields, a feat not easily achieved with existing methods. Though topic detection algorithms for news cycles exist, several aspects of this particular area make applying them directly to scientific literature problematic. This dissertation offers a framework for emerging topic detection in biomedicine. The framework includes a novel set of weightings based on the historical importance of each topic identified. Features such as journal impact factor and funding data are used to develop a fitness score that identifies which topics are likely to burst in the future. Bursts were characterized by discipline over an extended planning horizon to establish what a typical burst trend looks like in this space and, in turn, how to identify important or emerging trends. Cluster analysis was used to create an overlapping hierarchical structure of scientific literature at the discipline level. This allows the granularity of emerging topic detection to be adjusted (e.g., discipline level or research area level) for different users. Using cluster analysis also allows for the identification of terms that may not be included in annotated taxonomies because they are new or were not considered relevant when the taxonomy was last updated. Weighting topics by historical frequency allows for better identification of bursts associated with less well-known areas, which are therefore more surprising, while the fitness score allows bursty terms to be identified early. This framework will benefit policy makers, clinicians and researchers.
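A minimal sketch of the historical-frequency weighting idea follows: a topic's recent activity is discounted by the log of its historical frequency so that bursts in less well-known areas stand out. The counts and the scoring formula are hypothetical and should not be read as the dissertation's fitness score, which also folds in impact factor and funding data.

```python
# Weight current topic activity by (inverse log) historical frequency
# so that rare-in-the-past topics need fewer new mentions to stand out.
import math

history = {          # topic -> total mentions over the planning horizon
    "deep learning": 12000,
    "crispr": 800,
    "microbiome": 950,
    "obscure-method": 40,
}
current = {          # topic -> mentions in the most recent window
    "deep learning": 900,
    "crispr": 300,
    "microbiome": 120,
    "obscure-method": 60,
}

def burst_score(topic):
    # current frequency discounted by log of historical frequency
    return current[topic] / math.log1p(history[topic])

for topic in sorted(current, key=burst_score, reverse=True):
    print(f"{topic:15s} {burst_score(topic):7.2f}")
```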
185

Contextualisation, Visualisation et Evaluation en Apprentissage Non Supervisé / Contextualization, Visualization and Evaluation in Unsupervised Learning

Candillier, Laurent 15 September 2006 (has links) (PDF)
This thesis is set in the framework of unsupervised learning, which consists of forming groups from a set of data in such a way that the data considered most similar are assigned to the same group while data considered different end up in distinct groups, thereby extracting knowledge from the data. We first propose two new methods that take into account the context in which the groups are created, that is, the fact that the characteristics of the different groups may be defined over different subsets of the attributes describing the data. In designing these methods, we also addressed the problems of minimizing the amount of prior knowledge required from the user and of presenting the results in an understandable, visual form. We then present several possible extensions of these methods, first to supervised learning and then to semi-structured data represented as trees. Various experiments on artificial and then real data are presented that highlight the value of these methods. Since the evaluation of the results produced by an unsupervised learning method, and the comparison of such methods, remains an open problem, we finally propose a new evaluation method that is more objective and quantitative than those traditionally used, and whose relevance is shown experimentally.
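To make the notion of cluster characteristics defined on subsets of attributes more concrete, the sketch below runs a plain k-means and then inspects per-cluster attribute variance to see which attributes actually characterize each group. It is a toy probe on invented data, not one of the methods proposed in the thesis.

```python
# After an ordinary k-means run, low per-cluster variance on an attribute
# suggests that attribute helps define the cluster (a subspace-style view).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# cluster A is tight on attributes 0-1, cluster B is tight on attributes 2-3
a = np.hstack([rng.normal(0, 0.1, (100, 2)), rng.normal(0, 2.0, (100, 2))])
b = np.hstack([rng.normal(5, 2.0, (100, 2)), rng.normal(5, 0.1, (100, 2))])
X = np.vstack([a, b])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for k in range(2):
    var = X[labels == k].var(axis=0)
    relevant = np.where(var < 1.0)[0]        # low variance = defining attribute
    print(f"cluster {k}: defining attributes {relevant.tolist()}")
```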
186

Identifying Deviating Systems with Unsupervised Learning

Panholzer, Georg January 2008 (has links)
We present a technique to identify deviating systems among a group of systems in a self-organized way. A compressed representation of each system is used to compute similarity measures, which are combined in an affinity matrix of all systems. Deviation detection and clustering are then used to identify deviating systems based on this affinity matrix. The compressed representation is computed with Principal Component Analysis and Kernel Principal Component Analysis. The similarity measure between two compressed representations is based on the angle between the spaces spanned by the principal components, but other ways of calculating a similarity measure are suggested as well. The subsequent deviation detection is carried out by computing the probability of each system being observed given all the other systems. Clustering of the systems is done with hierarchical clustering and spectral clustering. The whole technique is demonstrated on four data sets of mechanical systems: two of a simulated cooling system and two of human gait. The results show its applicability to these mechanical systems.
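A minimal sketch of the subspace-angle idea follows: each system is compressed with PCA, pairwise similarities are derived from the principal angles between the resulting subspaces, and the system with the lowest average affinity is flagged. The random data and the simple averaging score are assumptions; the thesis's probabilistic deviation detection and clustering steps are not reproduced.

```python
# Compare systems by the angle between their principal subspaces,
# collect similarities in an affinity matrix, flag the worst-fitting system.
import numpy as np

rng = np.random.default_rng(0)

def principal_subspace(X, n_components=2):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T            # columns span the principal subspace

def subspace_similarity(U, V):
    # singular values of U^T V are the cosines of the principal angles
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(cosines.mean())          # 1.0 = identical subspaces

# four "normal" systems share one covariance structure, the fifth deviates
normal = [rng.normal(size=(200, 5)) @ np.diag([3, 2, 1, 0.1, 0.1]) for _ in range(4)]
deviant = rng.normal(size=(200, 5)) @ np.diag([0.1, 0.1, 1, 2, 3])
systems = normal + [deviant]

subspaces = [principal_subspace(X) for X in systems]
n = len(systems)
affinity = np.array([[subspace_similarity(subspaces[i], subspaces[j])
                      for j in range(n)] for i in range(n)])

# a simple deviation score: average affinity to all other systems
scores = (affinity.sum(axis=1) - 1.0) / (n - 1)
print("deviating system:", int(scores.argmin()), "scores:", np.round(scores, 2))
```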
187

none

Chan, I-ju 23 August 2007 (has links)
With the growth of information technology and the World Wide Web, sellers can now observe customers' browsing behaviour online, whereas in the past they could draw only on sales records. The volume of available data has grown enormously, and advances in computer hardware make it feasible to store it; yet when the data describing buyers' decision processes go unanalysed, decision makers cannot exploit the information hidden in them. This research takes a 3C retail chain (company A) as its case study. We assume that consumers' needs and their acceptance of products are already reflected in their past purchase choices and search records, so uncovering the correlation between search behaviour and actual purchases can help the business deliver the most appropriate marketing information on the web and raise response rates, which is highly valuable for marketing decision making. On this basis, we propose a data integration analysis model for members that identifies market segments for company A's website promotions, with the aim of improving its current marketing communication and offering practical value. Past work on identifying target customers has usually segmented consumers by demographics, geography or socioeconomic status; there has been no suitable model that lets a brick-and-click business segment members by their purchase behaviour for internet marketing promotions. This research therefore segments members by their consumption behaviour and by their usage of and response to the website. Principal components analysis is applied to extract behaviour variables, and a two-stage classification method then divides members into groups. The exploratory segmentation model separates members into lifestyle groups of similar and dissimilar profiles, yielding a natural market segmentation that helps the business decide which products to sell on the web and how to differentiate them, segment members effectively, and distinguish regional differences in website service and usage. The business no longer has to market to members blindly and can make well-founded e-marketing communication decisions.
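One common reading of such a pipeline, PCA to compress the behaviour variables followed by a two-stage grouping (a hierarchical pass to suggest the number of segments, then k-means for the final assignment), is sketched below. The synthetic behaviour matrix, the cut height and the choice of stages are assumptions, not the thesis's data or exact procedure.

```python
# PCA + two-stage segmentation: hierarchical pass suggests the segment count,
# k-means produces the final member groups on the reduced behaviour data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# rows = members, columns = behaviour variables (page views, searches, orders, ...)
behaviour = np.vstack([rng.poisson(lam, size=(150, 6))
                       for lam in ([2, 1, 0, 5, 1, 0], [8, 6, 3, 1, 0, 2])]).astype(float)

# stage 0: compress correlated behaviour variables
reduced = PCA(n_components=3).fit_transform(behaviour)

# stage 1: hierarchical clustering on a sample to pick a segment count
Z = linkage(reduced[rng.choice(len(reduced), 100, replace=False)], method="ward")
k = len(set(fcluster(Z, t=15.0, criterion="distance")))   # cut height is a guess

# stage 2: k-means on all members with that segment count
segments = KMeans(n_clusters=max(k, 2), n_init=10, random_state=0).fit_predict(reduced)
print("segment sizes:", np.bincount(segments))
```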
188

Determining and characterizing immunological self/non-self

Li, Ying 15 February 2007
The immune system has the ability to discriminate self from non-self proteins and also make appropriate immune responses to pathogens. A fundamental problem is to understand the genomic differences and similarities among the sets of self peptides and non-self peptides. The sequencing of human, mouse and numerous pathogen genomes and cataloging of their respective proteomes allows host self and non-self peptides to be identified. T-cells make this determination at the peptide level based on peptides displayed by MHC molecules.

In this project, peptides of specific lengths (k-mers) are generated from each protein in the proteomes of various model organisms. The set of unique k-mers for each species is stored in a library and defines its "immunological self". Using the libraries, organisms can be compared to determine the levels of peptide overlap. The observed levels of overlap can also be compared with levels which can be expected "at random" and statistical conclusions drawn.

A problem with this procedure is that sequence information in public protein databases (Swiss-PROT, UniProt, PIR) often contains ambiguities. Three strategies for dealing with such ambiguities have been explored in earlier work and the strategy of removing ambiguous k-mers is used here.

Peptide fragments (k-mers) which elicit immune responses are often localized within the sequences of proteins from pathogens. These regions are known as "immunodominants" (i.e., hot spots) and are important in immunological work. After investigating the peptide universes and their overlaps, the question of whether known regions of immunological significance (e.g., epitope) come from regions of low host-similarity is explored. The known regions of epitopes are compared with the regions of low host-similarity (i.e., non-overlaps) between HIV-1 and human proteomes at the 7-mer level. Results show that the correlation between these two regions is not statistically significant. In addition, pairs involving human and human viruses are explored. For these pairs, one graph for each k-mer level is generated showing the actual numbers of matches between organisms versus the expected numbers. From the graphs for the 5-mer and 6-mer levels, we can see that the number of overlapping occurrences increases as the size of the viral proteome increases.

A detailed investigation of the overlaps/non-overlaps between viral proteome and human proteome reveals that the distribution of the locations of these overlaps/non-overlaps may have "structure" (e.g., locality clustering). Thus, another question that is explored is whether the locality clustering is statistically significant. A chi-square analysis is used to analyze the locality clustering. Results show that the locality clusterings for HIV-1, HIV-2 and Influenza A virus at the 5-mer, 6-mer and 7-mer levels are statistically significant. Also, for self-similarity of human protein Desmoglein 3 to the remaining human proteome, it shows that the locality clustering is not statistically significant at the 5-mer level while it is at the 6-mer and 7-mer levels.
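The basic operation behind these comparisons, building a per-species k-mer library and intersecting libraries to measure overlap, can be sketched as follows. The toy sequences and the list of ambiguity codes are placeholders; a real run would read whole proteomes from databases such as UniProt or Swiss-PROT.

```python
# Build per-species k-mer libraries from protein sequences and intersect them,
# dropping k-mers that contain ambiguous/non-standard residue codes.
def kmer_library(proteins, k=7, ambiguous="BJOUXZ"):
    """Unique k-mers across all proteins, skipping any k-mer with ambiguous codes."""
    library = set()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not any(c in ambiguous for c in kmer):
                library.add(kmer)
    return library

host = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MSLLTEVETPIRNEWGCRCNDSSDP"]
virus = ["MSLLTEVETYVLSIIPSGPLKAEIAQRLGLIEVQ"]

self_kmers = kmer_library(host, k=7)
viral_kmers = kmer_library(virus, k=7)
shared = self_kmers & viral_kmers

print(f"host library: {len(self_kmers)} 7-mers, "
      f"viral: {len(viral_kmers)}, shared: {len(shared)}")
```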
189

Model Calibration, Drainage Volume Calculation and Optimization in Heterogeneous Fractured Reservoirs

Kang, Suk Sang 1975- 14 March 2013 (has links)
We propose a rigorous approach for well drainage volume calculations in gas reservoirs based on the flux field derived from dual porosity finite-difference simulation and demonstrate its application to optimize well placement. Our approach relies on a high frequency asymptotic solution of the diffusivity equation and emulates the propagation of a 'pressure front' in the reservoir along gas streamlines. The proposed approach is a generalization of the radius of drainage concept in well test analysis (Lee 1982), which allows us not only to compute rigorously the well drainage volumes as a function of time but also to examine the potential impact of infill wells on the drainage volumes of existing producers. Using these results, we present a systematic approach to optimizing well placement to maximize the Estimated Ultimate Recovery. A history matching algorithm is proposed that sequentially calibrates reservoir parameters from the global to the local scale, considering parameter uncertainty and the resolution of the data. Parameter updates are constrained to the prior geologic heterogeneity and performed parsimoniously, down to the smallest spatial scales at which they can be resolved by the available data. In the first step of the workflow, a Genetic Algorithm is used to assess the uncertainty in global parameters that influence field-scale flow behavior, specifically reservoir energy. To identify the reservoir volume over which each regional multiplier is applied, we have developed a novel approach to heterogeneity segmentation based on spectral clustering theory. The proposed clustering captures the main features of the prior model using the second eigenvector of the graph affinity matrix. In the second stage of the workflow, we parameterize the high-resolution heterogeneity in the spectral domain using the Grid Connectivity based Transform (GCT) to severely compress the dimension of the calibration parameter set. The GCT implicitly imposes geological continuity and promotes minimal changes to each prior model in the ensemble during the calibration process. The field-scale utility of the workflow is then demonstrated with the calibration of a model characterizing a structurally complex and highly fractured reservoir.
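As an illustration of segmentation with a second eigenvector, the sketch below applies the standard spectral-graph recipe: build an affinity matrix over a toy 1-D permeability field, form the graph Laplacian, and split the cells by the sign of the second-smallest eigenvector (the Fiedler vector). The field, the Gaussian affinity and the Laplacian variant are assumptions, not the workflow's actual segmentation.

```python
# Spectral bipartition of a toy permeability field using the Fiedler vector
# of the unnormalised graph Laplacian built from a Gaussian affinity matrix.
import numpy as np

rng = np.random.default_rng(0)
# toy 1-D permeability field with a low-perm and a high-perm region
perm = np.concatenate([rng.normal(10, 1, 50), rng.normal(100, 5, 50)])

# affinity between cells: high when permeabilities are similar
diff = perm[:, None] - perm[None, :]
W = np.exp(-diff**2 / (2 * 25.0**2))

# unnormalised graph Laplacian and its eigen-decomposition
D = np.diag(W.sum(axis=1))
L = D - W
eigvals, eigvecs = np.linalg.eigh(L)

fiedler = eigvecs[:, 1]                 # second-smallest eigenvector
regions = (fiedler > 0).astype(int)     # sign gives the two-region segmentation
print("cells per region:", np.bincount(regions))
```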
190

Unsupervised learning of relation detection patterns

Gonzàlez Pellicer, Edgar 01 June 2012 (has links)
Information extraction is the natural language processing area whose goal is to obtain structured data from the relevant information contained in textual fragments. Information extraction requires a significant amount of linguistic knowledge. The specificity of such knowledge is a drawback for the portability of the systems, as a change of language, domain or style demands a costly human effort. Machine learning techniques have been applied for decades to overcome this portability bottleneck, progressively reducing the amount of human supervision involved. However, as the availability of large document collections increases, completely unsupervised approaches become necessary in order to mine the knowledge contained in them. The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation detection. The achievement of this ultimate goal has required, first, considering the different strategies in which this combination could be carried out; second, developing or adapting clustering algorithms suited to our needs; and third, devising pattern learning procedures which incorporate clustering information. By the end of this thesis, we had been able to develop and implement an approach for learning relation detection patterns which, using clustering techniques and minimal human supervision, is competitive with and even outperforms other comparable approaches in the state of the art.
