Global ETD Search

11	Discovery and Analysis of Aligned Pattern Clusters from Protein Family Sequences Lee, En-Shiun Annie 28 April 2015 (has links) Protein sequences are essential for encoding molecular structures and functions. Consequently, biologists invest substantial resources and time discovering functional patterns in proteins. Using high-throughput technologies, biologists are generating an increasing amount of data. Thus, the major challenge in biosequencing today is the ability to conduct data analysis in an effi cient and productive manner. Conserved amino acids in proteins reveal important functional domains within protein families. Conversely, less conserved amino acid variations within these protein sequence patterns reveal areas of evolutionary and functional divergence. Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, thus pattern search is used. However, at present, combinatorial methods of pattern search generate a large set of solutions, and probabilistic methods require richer representations. They require biological ground truth of the input sequences, such as gene name or taxonomic species, as class labels based on traditional classi fication practice to train a model for predicting unknown sequences. However, these algorithms are inherently biased by mislabelling and may not be able to reveal class characteristics in a detailed and succinct manner. A novel pattern representation called an Aligned Pattern Cluster (AP Cluster) as developed in this dissertation is compact yet rich. It captures conservations and variations of amino acids and covers more sequences with lower entropy and greatly reduces the number of patterns. AP Clusters contain statistically signi cant patterns with variations; their importance has been confi rmed by the following biological evidences: 1) Most of the discovered AP Clusters correspond to binding segments while their aligned columns correspond to binding sites as verifi ed by pFam, PROSITE, and the three-dimensional structure. 2) By compacting strong correlated functional information together, AP Clusters are able to reveal class characteristics for taxonomical classes, gene classes and other functional classes, or incorrect class labelling. 3) Co-occurrence of AP Clusters on the same homologous protein sequences are spatially close in the protein's three-dimensional structure. These results demonstrate the power and usefulness of AP Clusters. They bring in similar statistically signifi cance patterns with variation together and align them to reveal protein regional functionality, class characteristics, binding and interacting sites for the study of protein-protein and protein-drug interactions, for diff erentiation of cancer tumour types, targeted gene therapy as well as for drug target discovery. pattern discovery knowledge discovery clustering protein functionality sequence pattern co-occurrence spectral clustering hierarchical clustering classification unsupervised learning cluster validity measure Jaccard Index Mutual Information Information gain pattern alignment
12	Design, Implementation and Analysis of a Description Model for Complex Archaeological Objects / Elaboration, mise en œuvre et analyse d’un mod`ele de description d’objets arch´eologiques complexes Ozturk, Aybuke 09 July 2018 (has links) La céramique est l'un des matériaux archéologiques les plus importants pour aider à la reconstruction des civilisations passées. Les informations à propos des objets céramiques complexes incluent des données textuelles, numériques et multimédias qui posent plusieurs défis de recherche abordés dans cette thèse. D'un point de vue technique, les bases de données de céramiques présentent différents formats de fichiers, protocoles d'accès et langages d'interrogation. Du point de vue des données, il existe une grande hétérogénéité et les experts ont différentes façons de représenter et de stocker les données. Il n'existe pas de contenu et de terminologie standard, surtout en ce qui concerne la description des céramiques. De plus, la navigation et l'observation des données sont difficiles. L'intégration des données est également complexe en raison de laprésence de différentes dimensions provenant de bases de données distantes, qui décrivent les mêmes catégories d'objets de manières différentes.En conséquence, ce projet de thèse vise à apporter aux archéologues et aux archéomètres des outils qui leur permettent d'enrichir leurs connaissances en combinant différentes informations sur les céramiques. Nous divisons notre travail en deux parties complémentaires : (1) Modélisation de données archéologiques complexes, et (2) Partitionnement de données (clustering) archéologiques complexes. La première partie de cette thèse est consacrée à la conception d'un modèle de données archéologiques complexes pour le stockage des données céramiques. Cette base de donnée alimente également un entrepôt de données permettant des analyses en ligne (OLAP). La deuxième partie de la thèse est consacrée au clustering (catégorisation) des objets céramiques. Pour ce faire, nous proposons une approche floue, dans laquelle un objet céramique peut appartenir à plus d'un cluster (d'une catégorie). Ce type d'approche convient bien à la collaboration avec des experts, enouvrant de nouvelles discussions basées sur les résultats du clustering.Nous contribuons au clustering flou (fuzzy clustering) au sein de trois sous-tâches : (i) une nouvelle méthode d'initialisation des clusters flous qui maintient linéaire la complexité de l'approche ; (ii) un indice de qualité innovant qui permet de trouver le nombre optimal de clusters ; et (iii) l'approche Multiple Clustering Analysis qui établit des liens intelligents entre les données visuelles, textuelles et numériques, ce qui permet de combiner tous les types d'informations sur les céramiques. Par ailleurs, les méthodes que nous proposons pourraient également être adaptées à d'autres domaines d'application tels que l'économie ou la médecine. / Ceramics are one of the most important archaeological materials to help in the reconstruction of past civilizations. Information about complex ceramic objects is composed of textual, numerical and multimedia data, which induce several research challenges addressed in this thesis. From a technical perspective, ceramic databases have different file formats, access protocols and query languages. From a data perspective, ceramic data are heterogeneous and experts have differentways of representing and storing data. There is no standardized content and terminology, especially in terms of description of ceramics. Moreover, data navigation and observation are difficult. Data integration is also difficult due to the presence of various dimensions from distant databases, which describe the same categories of objects in different ways.Therefore, the research project presented in this thesis aims to provide archaeologists and archaeological scientists with tools for enriching their knowledge by combining different information on ceramics. We divide our work into two complementary parts: (1) Modeling of Complex Archaeological Data and (2) Clustering Analysis of Complex Archaeological Data. The first part of this thesis is dedicated to the design of a complex archaeological database model for the storage of ceramic data. This database is also used to source a data warehouse for doing online analytical processing (OLAP). The second part of the thesis is dedicated to an in-depth clustering (categorization) analysis of ceramic objects. To do this, we propose a fuzzy approach, where ceramic objects may belong to more than one cluster (category). Such a fuzzy approach is well suited for collaborating with experts, by opening new discussions based on clustering results.We contribute to fuzzy clustering in three sub-tasks: (i) a novel fuzzy clustering initialization method that keeps the fuzzy approach linear; (ii) an innovative quality index that allows finding the optimal number of clusters; and (iii) the Multiple Clustering Analysis approach that builds smart links between visual, textual and numerical data, which assists in combining all types ofceramic information. Moreover, the methods we propose could also be adapted to other application domains such as economy or medicine. Objets complexes Archéologie Archéométrie Céramique Bases de données Entrepôts de données OLAP Clustering Initialisation linéaire MaxMin Validité des clusters Visual TSFD Clustering ensembliste Clustering de partitions combinées Complex Objects Archaeology Archaeometry Ceramics Databases Data Warehouses OLAP Clustering MaxMin Linear Initialization Cluster Validity Visual TSFD Cluster Ensemble Combined Partition Clustering 006 930.102
13	Détection dynamique des intrusions dans les systèmes informatiques / Dynamic intrusion detection in computer systems Pierrot, David 21 September 2018 (has links) La démocratisation d’Internet, couplée à l’effet de la mondialisation, a pour résultat d’interconnecter les personnes, les états et les entreprises. Le côté déplaisant de cette interconnexion mondiale des systèmes d’information réside dans un phénomène appelé « Cybercriminalité ». Des personnes, des groupes mal intentionnés ont pour objectif de nuire à l’intégrité des systèmes d’information dans un but financier ou pour servir une cause. Les conséquences d’une intrusion peuvent s’avérer problématiques pour l’existence d’une entreprise ou d’une organisation. Les impacts sont synonymes de perte financière, de dégradation de l’image de marque et de manque de sérieux. La détection d’une intrusion n’est pas une finalité en soit, la réduction du delta détection-réaction est devenue prioritaire. Les différentes solutions existantes s’avèrent être relativement lourdes à mettre place aussi bien en matière de compétence que de mise à jour. Les travaux de recherche ont permis d’identifier les méthodes de fouille de données les plus performantes mais l’intégration dans une système d’information reste difficile. La capture et la conversion des données demandent des ressources de calcul importantes et ne permettent pas forcément une détection dans des délais acceptables. Notre contribution permet, à partir d’une quantité de données relativement moindre de détecter les intrusions. Nous utilisons les événements firewall ce qui réduit les besoins en terme de puissance de calcul tout en limitant la connaissance du système d’information par les personnes en charge de la détection des intrusions. Nous proposons une approche prenant en compte les aspects techniques par l’utilisation d’une méthode hybride de fouille de données mais aussi les aspects fonctionnels. L’addition de ces deux aspects est regroupé en quatre phases. La première phase consiste à visualiser et identifier les activités réseau. La deuxième phase concerne la détection des activités anormales en utilisant des méthodes de fouille de données sur la source émettrice de flux mais également sur les actifs visés. Les troisième et quatrième phases utilisent les résultats d’une analyse de risque et d’audit technique de sécurité pour une prioritisation des actions à mener. L’ensemble de ces points donne une vision générale sur l’hygiène du système d’information mais aussi une orientation sur la surveillance et les corrections à apporter. L’approche développée a donné lieu à un prototype nommé D113. Ce prototype, testé sur une plate-forme d’expérimentation sur deux architectures de taille différentes a permis de valider nos orientations et approches. Les résultats obtenus sont positifs mais perfectibles. Des perspectives ont été définies dans ce sens. / The expansion and democratization of the digital world coupled with the effect of the Internet globalization, has allowed individuals, countries, states and companies to interconnect and interact at incidence levels never previously imagined. Cybercrime, in turn, is unfortunately one the negative aspects of this rapid global interconnection expansion. We often find malicious individuals and/or groups aiming to undermine the integrity of Information Systems for either financial gain or to serve a cause. The consequences of an intrusion can be problematic for the existence of a company or an organization. The impacts are synonymous with financial loss, brand image degradation and lack of seriousness. The detection of an intrusion is not an end in itself, the reduction of the delta detection-reaction has become a priority. The different existing solutions prove to be cumbersome to set up. Research has identified more efficient data mining methods, but integration into an information system remains difficult. Capturing and converting protected resource data does not allow detection within acceptable time frames. Our contribution helps to detect intrusions. Protect us against Firewall events which reduces the need for computing power while limiting the knowledge of the information system by intrusion detectors. We propose an approach taking into account the technical aspects by the use of a hybrid method of data mining but also the functional aspects. The addition of these two aspects is grouped into four phases. The first phase is to visualize and identify network activities. The second phase concerns the detection of abnormal activities using data mining methods on the source of the flow but also on the targeted assets. The third and fourth phases use the results of a risk analysis and a safety verification technique to prioritize the actions to be carried out. All these points give a general vision on the hygiene of the information system but also a direction on monitoring and corrections to be made.The approach developed to a prototype named D113. This prototype, tested on a platform of experimentation in two architectures of different size made it possible to validate our orientations and approaches. The results obtained are positive but perfectible. Prospects have been defined in this direction. Securité Détection d'intrusions Pare-feu (firewall) Apprentissage supervisé Collecte de renseignements Journaux (logs) Évènements Attaques Clustering Validité des clusters Disponibilité Intégrité Confidentialité Traçabilité Apprentissage non-surpervisé Security Intrusion detection Firewall Supervised machine learning Unsupervised machine learning Information gathering Logs Events Attacks Clustering Cluster validity Avaibility Integrity Privacy Traceability 004

Page generated in 0.0839 seconds