Global ETD Search

1	Communities in semantic peer-to-peer networks (Communautés dans les réseaux sémantiques pairs-à-pairs)
Ismail, Anis. 13 July 2010
The first part of this thesis reviews the state of the art on peer-to-peer networks, information retrieval in such networks, and data mining in the peer-to-peer context, focusing in particular on clustering methods and decision trees.
The second part deals with networks in which peers have their own data schemas. We examine in particular the foundations and operation of the SenPeer system. We then propose an architecture that supports a community-based organization of semantic peer-to-peer networks. This lets us build semantic peer-to-peer networks structured into communities, called cSON (Community Semantic Overlay Network). This raises the questions of how to make the communities explicit and how to exploit them to improve performance (response time, number of messages, precision and recall). To build the communities, we study two alternatives: (1) semantic mediation, in which communities are built from the semantic links between super-peers and the trust they place in one another; and (2) clustering, in which a clustering algorithm based on the analysis of the queries processed by the super-peers drives community construction. We then propose two methods for computing characterizations of the communities, drawing on two research fields: (1) data mining, in which each community is characterized by knowledge extracted from the queries processed by its own super-peers, called CK (Community Knowledge); and (2) hypergraphs, in which, unlike the previous method, the goal is to characterize the communities collectively. We formalize this problem as the search for MCS (minimal covering shortcuts), which are minimal shortcuts between super-peers that cover all communities. We then develop two query-routing methods, CK-rooting and MCS-rooting, which use community knowledge and the MCS respectively to identify the super-peers likely to be able to process a given query.
In the third part, we present the simulator developed to support the cSON approach. We then present empirical results from simulations, which show a significant improvement in the performance of the approach based solely on semantic mediation. This part ends with a description of an information-retrieval application based on the sharing of enriched scientific documents.
Peer-to-peer Communities Simulation Hypergraphs Clustering Data mining P2P
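The idea of minimal covering shortcuts can be illustrated with a small brute-force sketch. It is not the thesis's algorithm: the abstract does not say how a shortcut "covers" a community, so the mapping from each shortcut to a set of communities, and the names `shortcuts` and `minimal_covers`, are assumptions made only for illustration. The search is exhaustive and suitable for small inputs only.

```python
from itertools import combinations


def minimal_covers(shortcuts, communities):
    """Return every minimal set of shortcut names that covers all communities.

    `shortcuts` maps a hypothetical shortcut name to the set of communities
    it is assumed to cover.
    """
    names = sorted(shortcuts)
    target = set(communities)
    found = []
    # Try candidate sets by increasing size, so smaller covers are found first.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            chosen = set(combo)
            # A superset of an already-found cover cannot be minimal.
            if any(cover <= chosen for cover in found):
                continue
            covered = set().union(*(shortcuts[name] for name in combo))
            if target <= covered:
                found.append(chosen)
    return found
```

Because candidates are examined in order of size and supersets of known covers are skipped, every returned set is minimal in the sense that removing any shortcut from it leaves some community uncovered.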
2	High impedance fault detection method in multi-grounded distribution networks
Valero Masa, Alicia. 07 December 2012
High Impedance Faults (HIFs) are undetectable by conventional protection technology under certain conditions. These faults occur when an energized conductor makes undesired contact with a quasi-insulating object, such as a tree or a road. This contact restricts the fault current to a very low value, from a few mA up to 75 A. In solidly grounded distribution networks, where the residual current under normal conditions is considerable, overcurrent devices do not protect against HIFs. Such protection is nevertheless essential for public safety, because a fallen conductor may be within reach of people and poses a fire risk.
Doctorate in Engineering Sciences (Doctorat en Sciences de l'ingénieur) / info:eu-repo/semantics/nonPublished
Electricity Impedance (Electricity) Electrical impedance laboratory tests SVM classification data mining protection distribution High-impedance fault
3	Statistical methods for insurance fraud detection
Poissant, Mathieu. January 2008
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
Fraud Car insurance Data mining Classification Cluster analysis Principal components
5	Model selection in data mining (資料採礦中之模型選取)
孫莓婷. Unknown Date
With rapid advances in computer technology, organizations can store ever larger amounts of data. An abundance of data, however, does not necessarily mean that valuable information is at hand: without meaningful analysis, even a large database system merely serves as a means of storing and accessing data. With this in mind, we focus on the completeness of the database. We adopt database value-adding methods (imputation, sampling, and model evaluation) in which missing values are imputed while the original data structure is left unmodified. The aim of this study is to determine, across three main workflows (regression, logistic regression, and the C5.0 decision tree), whether data that have gone through value-adding at different stages remain consistent with the original data structure.
The results are as follows. In the regression workflow, regression-based imputation yields a value-added database closer to the original; when a smaller dataset is desired, systematic sampling gives better results than simple random sampling. In the C5.0 decision tree workflow, imputation based on a neural network algorithm increases the amount of information while also bringing the imputed data closer to the original. For logistic regression, the class proportions of the discrete variables are highly imbalanced, which makes it difficult to reach a sound conclusion. The empirical analysis shows that the best-performing value-adding technique differs with the modeling approach, but that, compared with the unimputed database, value-adding techniques do increase the amount of information and bring the value-added database closer to the original data structure.
Data Mining Imputation Method Sampling Model Selection
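The two generic techniques named in this abstract, regression-based imputation and systematic versus simple random sampling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: it assumes a single predictor `x`, marks missing responses with `None`, and all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_regression(rows):
    """Fill missing y values (None) with predictions from the complete rows."""
    complete = [(x, y) for x, y in rows if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y if y is not None else slope * x + intercept) for x, y in rows]


def systematic_sample(rows, step, rng=random):
    """Pick a random start in [0, step) and then take every step-th row."""
    start = rng.randrange(step)
    return rows[start::step]


def simple_random_sample(rows, size, rng=random):
    """Draw `size` rows uniformly at random without replacement."""
    return rng.sample(rows, size)
```

Systematic sampling preserves the ordering structure of the data at regular intervals, whereas simple random sampling ignores position entirely; which one better preserves the original structure depends on the data.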
6	SSDM algorithm for mining semantically similar data (Algoritmo SSDM para a mineração de dados semanticamente similares)
Escovar, Eduardo Luís Garcia. 28 May 2004
Financiadora de Estudos e Projetos
This work presents the SSDM algorithm, created to enable the mining of semantically similar data. Using fuzzy logic concepts, the algorithm analyzes the degree of similarity between items and takes it into account when it is greater than a user-defined parameter. When this occurs, fuzzy associations between items are established and are expressed in the association rules obtained. Thus, besides the associations discovered by conventional algorithms, SSDM also discovers semantic associations and shows them together with the other rules obtained. To do so, strategies are defined for discovering these associations and for calculating the support and confidence of the rules in which they appear.
Databases Data mining Fuzzy logic Association rules Semantic similarity
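A rough sketch of the idea described in this abstract: pairs of items whose similarity degree exceeds a user-defined minimum are treated as one fuzzy group when counting support and confidence. The abstract does not give SSDM's actual formulas, so the unweighted counting below is a simplifying assumption, and the names `fuzzy_groups`, `support` and `confidence` are hypothetical.

```python
def fuzzy_groups(similarity, min_sim):
    """Return the item pairs whose similarity degree exceeds min_sim."""
    return [set(pair) for pair, degree in similarity.items() if degree > min_sim]


def support(transactions, groups):
    """Fraction of transactions holding at least one item from every group.

    A plain item is written as a one-element set; a fuzzy association
    between similar items is a set containing all of them.
    """
    hits = sum(1 for t in transactions if all(group & t for group in groups))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, antecedent + consequent) / base
```

For example, with a fuzzy group `{"bread", "roll"}`, a transaction containing either item counts toward the rule "bread~roll implies butter", which a conventional algorithm would count separately for each item.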
7	A genome-scale mining strategy for recovering novel rapidly-evolving nuclear single-copy genes for addressing shallow-scale phylogenetics in Hydrangea
Wanke, Stefan; Granados Mendoza, Carolina; Naumann, Julia; Samain, Marie-Stéphanie; Goetghebeur, Paul; De Smet, Yannick. 04 January 2016
Background: Identifying orthologous molecular markers that potentially resolve relationships at and below species level has been a major challenge in molecular phylogenetics over the past decade. Non-coding regions of nuclear low- or single-copy markers are a vast and promising source of data providing information for shallow-scale phylogenetics. Taking advantage of public transcriptome data from the One Thousand Plant Project (1KP), we developed a genome-scale mining strategy for recovering potentially orthologous single-copy markers to address low-scale phylogenetics. Our marker design targeted the amplification of intron-rich nuclear single-copy regions from genomic DNA. As a case study we used Hydrangea section Cornidia, one of the most recently diverged lineages within Hydrangeaceae (Cornales), to compare the performance of three of these nuclear markers with that of other "fast"-evolving plastid markers.
Results: Our data mining and filtering process retrieved 73 putative nuclear single-copy genes that are potentially useful for resolving phylogenetic relationships at a range of divergence depths within Cornales. The three assessed nuclear markers showed considerably more phylogenetic signal for shallow evolutionary depths than conventional plastid markers. Phylogenetic signal in plastid markers increased less markedly towards deeper evolutionary divergences. Potential phylogenetic noise introduced by nuclear markers was lower than their respective phylogenetic signal across all evolutionary depths. In contrast, plastid markers showed higher probabilities of introducing phylogenetic noise than signal at the deepest evolutionary divergences within the tribe Hydrangeeae (Hydrangeaceae).
Conclusions: While nuclear single-copy markers are highly informative for shallow evolutionary depths without introducing phylogenetic noise, plastid markers might be more appropriate for resolving deeper-level divergences, such as the backbone relationships of the Hydrangeaceae family and deeper, at which non-coding parts of nuclear markers could potentially introduce noise due to elevated rates of evolution. The transcriptome-based mining strategy developed and demonstrated here has great potential for the design of novel and highly informative nuclear markers for a range of plant groups and evolutionary scales.
Data mining Fine-scale phylogenetics Biology Hydrangea sect. Cornidia Phylogenetic signal Phylogenetic noise Technical University Dresden Publication funds ddc:570 rvk:WH 3100
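One core step in a mining and filtering process of this kind, keeping only candidate genes present as a single copy in every taxon of interest, can be sketched as below. This is an assumed simplification, not the paper's pipeline: the input layout (a mapping from a candidate group name to the sequence identifiers found per taxon) and the function name are hypothetical.

```python
def single_copy_groups(groups, taxa):
    """Keep the groups that hold exactly one sequence for every taxon in `taxa`.

    `groups` maps a candidate gene group name to a dict of
    taxon name -> list of sequence identifiers found in that taxon.
    """
    return {
        name: members
        for name, members in groups.items()
        # A missing taxon counts as zero copies and disqualifies the group.
        if all(len(members.get(taxon, [])) == 1 for taxon in taxa)
    }
```

Groups with duplicated copies in any taxon, or with a taxon missing altogether, are discarded, which is the sense in which the retained candidates are "putatively" single-copy.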
