51

Drug Repositioning through the Development of Diverse Computational Methods using Machine Learning, Deep Learning, and Graph Mining

Thafar, Maha A. 30 June 2022
The rapidly increasing number of existing drugs with genomic, biomedical, and pharmacological data makes computational analyses possible, which reduces the search space for drugs and facilitates drug repositioning (DR). Thus, artificial intelligence, machine learning, and data mining have been used to identify biological interactions such as drug-target interactions (DTI), drug-disease associations, and drug response. The prediction of these biological interactions is seen as a critical phase needed to make drug development more sustainable. Furthermore, late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed. In this dissertation, we address three crucial problems associated with the DR pipeline and present several novel computational methods developed for DR. First, we developed three network-based DTI prediction methods using machine learning, graph embedding, and graph mining. These methods significantly improved prediction performance, and the best-performing method reduces the error rate by more than 33% across all datasets compared to the best state-of-the-art method. Second, because it is more insightful to predict continuous values that indicate how tightly the drug binds to a specific target, we conducted a comparison study of current regression-based methods that predict drug-target binding affinities (DTBA). We discussed how to develop more robust DTBA methods and subsequently developed Affinity2Vec, the first regression-based method that formulates the entire task as a graph-based method and combines several computational techniques from feature representation learning, graph mining, and machine learning with no 3D structural data of proteins. Affinity2Vec outperforms the state-of-the-art methods.
Finally, since drug development failure is associated with sub-optimal target identification, we developed the first DL-based computational method (OncoRTT) to identify cancer-specific therapeutic targets for the ten most common cancers worldwide. Implementing our approach required creating a suitable dataset that could be used by the computational method to identify oncology-related DTIs. Thus, we created the OncologyTT datasets to build and evaluate our OncoRTT method. Our methods demonstrated their efficiency by achieving high prediction performance and identifying therapeutic targets for several cancer types. Overall, in this dissertation, we developed several computational methods to solve biomedical domain problems, specifically drug repositioning, and demonstrated their efficiencies and capabilities.
52

Connaissance et optimisation de la prise en charge des patients : la science des réseaux appliquée aux parcours de soins / Understanding and optimization of patient care and services : networks science applied to healthcare pathways

Jaffré, Marc-Olivier 26 October 2018
In France, the necessary streamlining of the means assigned to hospitals has resulted in a concentration of resources and a growing complexity of healthcare facilities. Steering them and planning their territorial distribution turn out to be all the more difficult, which raises the problem of optimizing healthcare systems. The use of the massive data produced by these systems, in association with network science, could be an alternative approach for analysis and decision-making support in healthcare. Method: Starting from a reflection on the notion of performance, various preexisting optimization approaches are first highlighted, with operating theaters chosen as the experimental site. An analysis of the merger of two hospitals follows as an example of optimization by massification. These two steps make it possible to defend an alternative approach that combines the use of big data, network science, and data visualization techniques. Two sets of patient stays in orthopedic surgery in the ex-Midi-Pyrénées region in France are used to create a network of all sequences of care. The whole is displayed in a visual environment developed in JavaScript that allows dynamic mining of the graph. Results: The possibility of visualizing healthcare pathways as node-link graphs is demonstrated. The graphs provide an additional perception of the sequences of stays and of the redundancies of healthcare pathways. The dynamic character of the graphs also allows their direct mining. The initial, subjective visual approach is supplemented by a series of objective measures from network science. Conclusion: Healthcare facilities produce massive data that are valuable for their analysis and potentially for their optimization.
Data visualization together with an analysis framework such as network science gives preliminary encouraging indicators, notably by uncovering redundant healthcare pathway patterns. Further experimentation with various and larger sets of data is required to validate, strengthen, and disseminate these observations and this method.
53

Time series data mining using complex networks / Mineração de dados em séries temporais usando redes complexas

Ferreira, Leonardo Nascimento 15 September 2017
A time series is a time-ordered dataset. Due to its ubiquity, time series analysis is interesting for many scientific fields. Time series data mining is a research area that is intended to extract information from these time-related data. To achieve it, different models are used to describe series and search for patterns. One approach for modeling temporal data is by using complex networks. In this case, temporal data are mapped to a topological space that allows data exploration using network techniques. In this thesis, we present solutions for time series data mining tasks using complex networks. The primary goal was to evaluate the benefits of using network theory to extract information from temporal data. We focused on three mining tasks. (1) In the clustering task, we represented every time series by a vertex and we connected vertices that represent similar time series. We used community detection algorithms to cluster similar series. Results show that this approach presents better results than traditional clustering results. (2) In the classification task, we mapped every labeled time series in a database to a visibility graph. We performed classification by transforming an unlabeled time series to a visibility graph and comparing it to the labeled graphs using a distance function. The new label is the most frequent label in the k-nearest graphs. (3) In the periodicity detection task, we first transform a time series into a visibility graph. Local maxima in a time series are usually mapped to highly connected vertices that link two communities. We used the community structure to propose a periodicity detection algorithm in time series. This method is robust to noisy data and does not require parameters. With the methods and results presented in this thesis, we conclude that network science is beneficial to time series data mining. Moreover, this approach can provide better results than traditional methods. 
It is a new form of extracting information from time series and can be easily extended to other tasks.
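The classification and periodicity tasks above both start by mapping a time series to a visibility graph. The abstract does not say which variant is used, so the following is only a minimal sketch assuming the natural visibility criterion: two samples are linked when every sample between them lies strictly below the straight line joining them. The function name `visibility_graph` is illustrative and not taken from the thesis.

```python
def visibility_graph(series):
    """Return the edge set of the natural visibility graph of `series`.

    Assumption: samples are equally spaced, so the index is used as time.
    Vertices are sample indices; an edge (a, b) with a < b means the two
    samples can "see" each other.
    """
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # Every intermediate sample c must lie strictly below the
            # line segment joining (a, series[a]) and (b, series[b]).
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    # The two local maxima (indices 1 and 3) end up as the best-connected vertices.
    print(sorted(visibility_graph([1, 3, 1, 4, 1])))
```

This brute-force version is cubic in the series length in the worst case, which is acceptable for illustration but not for long series.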
54

Mining Tera-Scale Graphs: Theory, Engineering and Discoveries

Kang, U 01 May 2012
How do we find patterns and anomalies on graphs with billions of nodes and edges that do not fit in memory? How can we use parallelism for such Tera- or Peta-scale graphs? In this thesis, we propose PEGASUS, a large-scale graph mining system implemented on top of the HADOOP platform, the open source version of MAPREDUCE. PEGASUS includes algorithms that help us spot patterns and anomalous behaviors in large graphs. PEGASUS enables structure analysis on large graphs. We unify many different structure analysis algorithms, including the analysis of connected components, PageRank, and radius/diameter, into a general primitive called GIM-V. GIM-V is highly optimized, achieving good scale-up on the number of edges and available machines. We discover surprising patterns using GIM-V, including the 7-degrees of separation in one of the largest publicly available Web graphs, with 7 billion edges. PEGASUS also enables inference and spectral analysis on large graphs. We design an efficient distributed belief propagation algorithm that infers the states of unlabeled nodes given a set of labeled nodes. We also develop an eigensolver for computing the top k eigenvalues and eigenvectors of the adjacency matrices of very large graphs. We use the eigensolver to discover anomalous adult advertisers in the who-follows-whom Twitter graph with 3 billion edges. In addition, we develop an efficient tensor decomposition algorithm and use it to analyze a large knowledge base tensor. Finally, PEGASUS allows the management of large graphs. We propose efficient graph storage and indexing methods to answer graph mining queries quickly. We also develop an edge layout algorithm for better compressing graphs.
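The abstract names GIM-V as the primitive that unifies connected components, PageRank, and radius/diameter, but does not define it. The sketch below assumes the usual reading of GIM-V as a generalized matrix-vector multiplication in which the multiply, sum, and write-back steps are replaceable functions. It runs on a single machine rather than on HADOOP, and the names `gim_v`, `combine2`, `combine_all`, and `assign` are illustrative.

```python
def gim_v(matrix, v, combine2, combine_all, assign):
    """One generalized matrix-vector step.

    `matrix` is a sparse matrix stored as {(i, j): m_ij}. For each row i,
    the partial results combine2(m_ij, v[j]) are merged with combine_all
    and written back with assign(old_value, merged_value).
    """
    partial = {}
    for (i, j), m_ij in matrix.items():
        partial.setdefault(i, []).append(combine2(m_ij, v[j]))
    return [
        assign(v[i], combine_all(partial[i])) if i in partial else v[i]
        for i in range(len(v))
    ]


def connected_components(n, undirected_edges):
    """Label each node with the smallest node id in its component."""
    matrix = {}
    for a, b in undirected_edges:
        matrix[(a, b)] = 1
        matrix[(b, a)] = 1
    labels = list(range(n))
    while True:
        # combine2 passes the neighbor's label through, combine_all takes
        # the minimum over neighbors, assign keeps the smaller label.
        new_labels = gim_v(matrix, labels, lambda m, vj: vj, min, min)
        if new_labels == labels:
            return labels
        labels = new_labels


if __name__ == "__main__":
    print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```

Swapping the three functions changes the algorithm without touching the iteration itself: ordinary multiplication and summation give a plain matrix-vector product, which is the core of a PageRank-style power iteration.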
55

Δομική ανάλυση χρονικά εξελισσόμενων γραφημάτων : ιδιότητες, μοντέλα και εφαρμογές / Structural analysis of time evolving graphs : properties, models and applications

Μαλλιαρός, Φραγκίσκος 07 October 2011
Over the last few years there has been a lot of interest in the study of complex network structures (or graphs) arising in many diverse settings.
Characteristic examples are networks from the domain of sociology (e.g., social networks), technological and information networks (e.g., the Internet, the Web, email exchange networks, social interaction networks over social media applications), biological networks (e.g., protein interactions), collaboration and citation networks (e.g., coauthorship networks), and many more. A basic characteristic of these networks is their large scale (size), which in many cases hinders their study. Moreover, the graphs are usually not static; they evolve over time with the addition and deletion of nodes and edges. A large amount of research has been devoted to understanding the structure, the organization, and the evolution of these networks, with many interesting results. One important aspect related to the structure of such graphs is the notion of robustness. Generally, a graph is characterized as robust if it is capable of retaining its structure and its connectivity properties after the loss of a portion of its nodes and edges. The property of robustness in real-world graphs is closely related to the notion of community structure, where the network is organized based on a modular architecture, presenting well-defined clusters with large intra-cluster and small inter-cluster edge density. We expect that the robustness of a network with good community structure will be poor, since it can easily become disconnected with the removal of the edges that connect the different clusters. How can we estimate robustness quickly, without removing edges and nodes and measuring the connectivity? In other words, is there a robustness and community structure index (metric) that can be computed fast enough, even for graphs with millions of nodes and edges? Moreover, if the network evolves over time, what can we say about its robustness and, by extension, about its community structure?
Is there a common pattern in social graphs that governs the time evolution of these properties? In this thesis, we tackle the problem of estimating the robustness properties of a graph quickly, studying the expansion properties of several real-world time-evolving social graphs. First, we present a metric that can be used to characterize both the robustness and the community structure properties of a graph. We present how to compute this measure efficiently and effectively, exploiting the special spectral properties of real-world graphs. Then, we apply this method to several large static social graphs, and we observe some interesting properties that are related to their robustness. We study how these properties change over time as the graph evolves, and we observe interesting patterns. Finally, we show how to spot outliers and detect anomalies in graphs that evolve over time by examining the change in the robustness properties of a graph.
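For context, the slow procedure that the abstract wants to avoid (remove nodes step by step and measure connectivity after each removal) can be sketched as follows. This is not the spectral metric proposed in the thesis, which the abstract does not specify. The removal order (highest initial degree first) and the connectivity measure (size of the largest connected component) are assumptions chosen for illustration.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def removal_curve(adj):
    """Remove nodes by decreasing initial degree and track connectivity."""
    order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    removed = set()
    curve = []
    for u in order:
        removed.add(u)
        curve.append(largest_component_size(adj, removed))
    return curve


if __name__ == "__main__":
    # Two triangles joined by the single bridge 2-3: a graph with clear
    # community structure that falls apart as soon as a bridge endpoint goes.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(removal_curve(adj))
```

Each step costs a full traversal, so the whole curve is quadratic in graph size, which is the cost that motivates a fast index for graphs with millions of nodes and edges.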
56

Contribution à la fouille de données spatio-temporelles : application à l'étude de l'érosion / Contribution to spatio-temporal data mining : application to erosion study

Sanhes, Jeremy 25 September 2014
Spatio-temporal events cover a wide range of phenomena with different characteristics. For example, studying migration flows is very different from studying the spread of disease: the former focuses on tracking trajectories, whereas the latter is about finding the factors of spread.
Moreover, each class of spatio-temporal problem can be tackled differently, depending on which parameters are considered: whether a spatial neighbourhood is taken into account, whether the objects under study carry one or several characteristics, and whether events are assumed to be correlated or independent. As a result, data mining techniques are often specific to a sub-class of spatio-temporal problems, that is, to a limited set of hypotheses. To bring out new knowledge from data, it is necessary to enlarge this set of hypotheses, that is, to widen the range of possible correlations between events. To this end, we propose a new model that takes more considerations into account than existing studies. For example, this representation can model the complex spatio-temporal dynamics of erosion phenomena: an object can split into several other objects, or can merge with other objects into one. More precisely, we use a single directed graph, which the temporal component of the problem makes acyclic, and whose vertices are attributed with several characteristics.
58

Identification des motifs de voisinage conservés dans des contextes métaboliques et génomiques / Mining conserved neighborhood patterns in metabolic and genomic contexts

Zaharia, Alexandra 28 September 2018
This thesis fits within the field of systems biology and addresses a problem related to heterogeneous biological networks. It focuses on the relationship between metabolism and genomic context through a graph mining approach. It is commonly accepted that successive enzymatic steps involving products of genes in close proximity on the chromosome reflect an evolutionary advantage in maintaining this neighborhood relationship at both the metabolic and genomic levels. We therefore choose to focus on the detection of neighboring reactions catalyzed by products of neighboring genes, where the notion of neighborhood may be modulated by allowing the omission of several reactions and/or genes. More specifically, the sought motifs are trails of reactions (meaning reaction sequences in which reactions may be repeated, but not the links between them) catalyzed by products of neighboring genes. Such neighborhood motifs are referred to as metabolic and genomic patterns. In addition, we are interested in detecting conserved metabolic and genomic patterns, meaning similar patterns across multiple species. Among the possible variations for a conserved pattern, the presence or absence of reactions and/or genes may be considered, as well as a different order of reactions and/or genes.
A first development proposes algorithms and methods for the identification of conserved metabolic and genomic patterns. These methods are implemented in an open-source pipeline called CoMetGeNe (COnserved METabolic and GEnomic NEighborhoods). By means of this pipeline, we analyze a data set of 50 bacterial species, using data extracted from the KEGG knowledge base. A second development explores the detection of conserved patterns by taking into account the chemical similarity between reactions. This allows for the detection of a class of conserved metabolic modules characterized by the neighborhood of the genes involved.
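The definition of a trail used above (reactions may repeat, but the links between them may not) can be checked with a few lines of code. This is a minimal sketch that assumes the reaction network is given as a set of directed links between reaction identifiers. It is not code from CoMetGeNe, and the identifiers `R1` to `R4` are hypothetical.

```python
def is_trail(reactions, links):
    """Return True if `reactions` is a trail in the directed graph `links`.

    A trail may visit the same reaction several times, but every
    consecutive pair (link) must exist in the graph and be used only once.
    """
    used = set()
    for link in zip(reactions, reactions[1:]):
        if link not in links or link in used:
            return False
        used.add(link)
    return True


if __name__ == "__main__":
    # Hypothetical reaction network: a cycle R1 -> R2 -> R3 -> R1 plus R1 -> R4.
    links = {("R1", "R2"), ("R2", "R3"), ("R3", "R1"), ("R1", "R4")}
    print(is_trail(["R1", "R2", "R3", "R1", "R4"], links))  # R1 repeats, no link repeats
    print(is_trail(["R1", "R2", "R3", "R1", "R2"], links))  # link R1 -> R2 is reused
```

Allowing repeated reactions is what distinguishes a trail from a simple path, so a cycle in the metabolic network can be traversed once as part of a single pattern.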
59

On the discovery of relevant structures in dynamic and heterogeneous data

Preti, Giulia 22 October 2019 (has links)
We are witnessing an explosion of available data coming from a huge number of sources and domains, which is leading to the creation of ever larger and richer datasets. Understanding, processing, and extracting useful information from those datasets requires specialized algorithms that take into consideration both the dynamism and the heterogeneity of the data they contain. Although several pattern mining techniques have been proposed in the literature, most of them fall short in providing interesting structures when the data can be interpreted differently from user to user, when it can change from time to time, and when it has different representations. In this thesis, we propose novel approaches that go beyond the traditional pattern mining algorithms, and can effectively and efficiently discover relevant structures in dynamic and heterogeneous settings. In particular, we address the task of pattern mining in multi-weighted graphs, pattern mining in dynamic graphs, and pattern mining in heterogeneous temporal databases. In pattern mining in multi-weighted graphs, we consider the problem of mining patterns for a new category of graphs called multi-weighted graphs. In these graphs, nodes and edges can carry multiple weights that represent, for example, the preferences of different users or applications, and that are used to assess the relevance of the patterns. We introduce a novel family of scoring functions that assign a score to each pattern based on both the weights of its appearances and their number, and that respect the anti-monotone property, which is pivotal for efficient implementations. We then propose a centralized and a distributed algorithm that solve the problem both exactly and approximately. The approximate solution has better scalability in terms of the number of edge weighting functions, while achieving good accuracy in the results found. 
An extensive experimental study shows the advantages and disadvantages of our strategies, and demonstrates their effectiveness. Then, in pattern mining in dynamic graphs, we focus on the particular task of discovering structures that are both well-connected and correlated over time, in graphs where nodes and edges can change over time. These structures represent edges that are topologically close and exhibit a similar behavior of appearance and disappearance in the snapshots of the graph. To this aim, we introduce two measures for computing the density of a subgraph whose edges change in time, and a measure to compute their correlation. The density measures are able to detect subgraphs that are silent in some periods of time but highly connected in the others, and thus they can detect events or anomalies that occurred in the network. The correlation measure can identify groups of edges that tend to co-appear, as well as edges that are characterized by similar levels of activity. For both variants of the density measure, we provide an effective solution that enumerates all the maximal subgraphs whose density and correlation exceed given minimum thresholds, but can also return a more compact subset of representative subgraphs that exhibit high levels of pairwise dissimilarity. Furthermore, we propose an approximate algorithm that scales well with the size of the network, while achieving a high accuracy. We evaluate our framework with an extensive set of experiments on both real and synthetic datasets, and compare its performance with the main competitor algorithm. The results confirm the correctness of the exact solution, the high accuracy of the approximate one, and the superiority of our framework over the existing solutions. In addition, they demonstrate the scalability of the framework and its applicability to networks of different nature. 
Finally, we address the problem of entity resolution in heterogeneous temporal databases, which are datasets that contain records that give different descriptions of the status of real-world entities at different periods of time, and thus are characterized by different sets of attributes that can change over time. Detecting records that refer to the same entity in such a scenario requires a record similarity measure that takes into account the temporal information and that is aware of the absence of a common fixed schema between the records. However, existing record matching approaches either ignore the dynamism in the attribute values of the records, or assume that all the records share the same set of attributes throughout time. In this thesis, we propose a novel time-aware schema-agnostic similarity measure for temporal records to find pairs of matching records, and integrate it into an exact and an approximate algorithm. The exact algorithm can find all the maximal groups of pairwise similar records in the database. The approximate algorithm, on the other hand, can achieve higher scalability with the size of the dataset and the number of attributes, by relying on a technique called meta-blocking. This algorithm can find a good-quality approximation of the actual groups of similar records, by adopting an effective and efficient clustering algorithm.
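The dynamic-graph part of the abstract above describes edges that appear and disappear together across snapshots. The abstract does not define the thesis's correlation measure, so the following Python sketch uses the Pearson correlation of binary edge-activity vectors purely as an assumed stand-in; the function `pearson` and the toy `activity` data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (assumed convention),
    since the correlation is undefined in that case.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical activity vectors: entry t is 1 if the edge exists in snapshot t.
activity = {
    ("a", "b"): [1, 1, 0, 0, 1],
    ("b", "c"): [1, 1, 0, 0, 1],
    ("c", "d"): [0, 0, 1, 1, 0],
}

# Edges (a,b) and (b,c) always co-appear; (c,d) appears exactly when they do not.
print(pearson(activity[("a", "b")], activity[("b", "c")]))
print(pearson(activity[("a", "b")], activity[("c", "d")]))
```

Under this assumed measure, a group of adjacent edges with pairwise correlation above a chosen threshold would be a candidate for the kind of correlated structure the abstract describes.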
60

Defect Localization using Dynamic Call Tree Mining and Matching and Request Replication: An Alternative to QoS-aware Service Selection

Yousefi, Anis 04 1900 (has links)
This thesis is concerned with two separate subjects: (i) defect localization using tree mining and tree matching, and (ii) quality-of-service-aware service selection. It is divided into two parts accordingly. In the first part of this thesis we present a novel technique for defect localization which is able to localize call-graph-affecting defects using tree mining and tree matching techniques. In this approach, given a set of successful executions and a failing execution, we follow a series of analyses to generate an extended report of suspicious method calls. The proposed defect localization technique is implemented as a prototype and evaluated using four subject programs of various sizes, developed in Java or C. Our experiments show results comparable to those of similar defect localization tools, but unlike most of its counterparts, our technique does not require the availability of multiple failing executions to localize the defects. We believe that this is a major advantage, since it is often the case that we have only a single failing execution to work with. Potential risks of the proposed technique are also investigated. In the second part of this thesis we present an alternative strategy for service selection in service-oriented architecture, which provides better quality services for less cost. The proposed Request Replication technique replicates a client’s request over a number of cheap, low-quality services to gain the required quality of service. Following this approach, we also present a number of recommendations about how service providers should advertise non-functional properties of their services. / Doctor of Philosophy (PhD)
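The intuition behind Request Replication in the abstract above can be sketched with a short reliability calculation. The abstract does not state how quality of service is modeled, so this Python sketch assumes that the quality attribute is the probability of a successful response and that replicas fail independently; the function names and the numbers are hypothetical.

```python
def combined_reliability(p, n):
    """Probability that at least one of n independent replicas succeeds,
    when each succeeds with probability p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** n


def replicas_needed(p, target):
    """Smallest number of replicas whose combined reliability reaches `target`."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    while combined_reliability(p, n) < target:
        n += 1
    return n


# Hypothetical scenario: cheap services that succeed 80% of the time,
# and a client that requires at least 99% overall.
n = replicas_needed(0.8, 0.99)
print(n, combined_reliability(0.8, n))
```

Under these assumptions, replication pays off whenever the total price of the replicas is below the price of a single service that meets the target on its own.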
