351

Étude exhaustive de voies de signalisation de grande taille par clustering des trajectoires et caractérisation par analyse sémantique / Comprehensive study of large signaling pathways by clustering trajectories and characterization by semantic analysis

Coquet, Jean 20 December 2017 (has links)
Signaling pathways describe a cell's responses to external stimuli. They are essential to biological processes such as differentiation, proliferation, and apoptosis. Systems biology attempts to study these pathways exhaustively using static or dynamic models. The number of solutions explaining a biological phenomenon (for example, the reaction of a cell to a stimulus) can be very high in large models. This thesis first proposes several strategies for grouping the solutions describing stimulus signaling, based on clustering methods and Formal Concept Analysis. It then presents the characterization of these groups using semantic web methods. These strategies were applied to the TGF-beta signaling network, an extracellular stimulus playing an important role in cancer development, and made it possible to identify five large groups of trajectories, each involved in different biological processes. Second, this thesis addresses the problem of converting heterogeneous data from different databases into a single formalism in order to generalize the previous study. It proposes a strategy for merging the signaling networks of a database into a single model, making it possible to compute all signaling trajectories of a stimulus.
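As an illustration of the clustering step, the minimal Python sketch below groups Boolean signaling trajectories by the reactions they share, using Jaccard distance and hierarchical clustering. The random data, the distance choice, and the cut into five groups are illustrative assumptions; the thesis's actual pipeline and its Formal Concept Analysis component are not reproduced here.

```python
# Minimal sketch: grouping Boolean signaling trajectories by the reactions
# they use, with Jaccard distance and average-linkage clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical data: each trajectory is a binary vector over candidate
# reactions (1 = the reaction participates in that solution).
rng = np.random.default_rng(0)
trajectories = rng.integers(0, 2, size=(200, 50))  # 200 solutions, 50 reactions

# Jaccard distance between binary membership vectors.
dist = pdist(trajectories, metric="jaccard")

# Cut the dendrogram into five groups (the thesis reports five large
# groups of trajectories for TGF-beta).
tree = linkage(dist, method="average")
labels = fcluster(tree, t=5, criterion="maxclust")
for k in range(1, 6):
    print(f"cluster {k}: {np.sum(labels == k)} trajectories")
```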
352

Découverte et exploitation de proportions analogiques dans les bases de données relationnelles / Discovering and exploiting analogical proportions in a relational database context

Correa Beltran, William 18 July 2016 (has links)
In this thesis, we are interested in analogical proportions in a relational database context. An analogical proportion is a statement of the form "A is to B as C is to D", expressing that the relation between A and B is the same as the relation between C and D; for instance, "Paris is to France as Rome is to Italy". We first studied the problem of imputing missing values in a relational database by means of analogical proportions, modifying a classification algorithm based on them to perform this task. We then studied how analogical classifiers work in order to see whether their processing could be simplified, and showed that some types of analogical proportions are more useful than others for classification. Exploiting this information allowed us to considerably reduce the size of the training set used by the analogical classification algorithm, and hence its execution time. The second part of the thesis aims at discovering new analogy-based patterns, i.e., parallels, in databases. We showed that such patterns, combinations of four tuples bound by an analogical relationship, can be extracted using clustering approaches, proposing modifications to several clustering algorithms so that each resulting cluster represents a set of analogical proportions. These clusters are also useful for evaluating queries that search for analogical patterns in databases. In this perspective, we proposed an extension of the SQL query language for retrieving the quadruples of tuples of a database that satisfy an analogical proportion. We proposed several evaluation strategies for such queries and experimentally compared their performance.
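To make the core notion concrete, the sketch below retrieves analogical quadruples from a small numeric table under the common arithmetic reading "a is to b as c is to d" iff a − b = c − d componentwise. This interpretation and the brute-force search are illustrative assumptions, not the thesis's SQL extension or its evaluation strategies.

```python
# Illustrative sketch: find quadruples (a, b, c, d) of numeric tuples
# satisfying the arithmetic analogical proportion a - b == c - d.
from itertools import permutations
import numpy as np

rows = np.array([
    [10, 1], [12, 3], [20, 5], [22, 7], [30, 2],
])

quadruples = []
for i, j, k, l in permutations(range(len(rows)), 4):
    if np.array_equal(rows[i] - rows[j], rows[k] - rows[l]):
        quadruples.append((i, j, k, l))

# E.g. rows 0,1,2,3: (10,1)-(12,3) == (20,5)-(22,7) == (-2,-2).
print(f"{len(quadruples)} analogical quadruples found, e.g. {quadruples[0]}")
```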
353

Diagrammes d’Euler pour la visualisation de communautés et d’ensembles chevauchants / Visualisation of overlapping sets and clusters with Euler diagrams

Simonetto, Paolo 02 December 2011 (has links)
In this thesis, we propose a method based on Euler diagrams for the visualization of overlapping sets and fuzzy graph clusterings. Euler diagrams are probably the most intuitive and most widely used way to depict sets that share elements, which makes this visual metaphor a powerful information visualization tool. However, the automatic generation of such diagrams still presents many challenging problems. First, not all overlapping clusterings can be drawn with standard Euler diagrams. Second, most existing algorithms handle only diagrams of modest size, while real-world applications typically feature much larger data. Third, real applications require a generation process that is reliable and reasonably fast. In this thesis, we describe an extended version of Euler diagrams that can model the whole class of overlapping clusterings and can be produced for every input instance. We then propose an automatic generation procedure for these extended diagrams that specifically targets large instances. Finally, we present a software implementation of the method and report experiments on real-world data.
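As a small sketch of the combinatorial core of Euler diagram generation, the code below computes the non-empty "zones" (distinct intersection regions) induced by an overlapping clustering. The data are hypothetical, and the drawing stage, the hard part addressed by the thesis, is not shown.

```python
# Sketch: enumerate the non-empty zones of an overlapping clustering.
# A zone is identified by the exact subset of clusters an element belongs to.
from collections import Counter

clusters = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 5, 6},
}

def zone_of(e):
    return frozenset(name for name, members in clusters.items() if e in members)

elements = set().union(*clusters.values())
zones = Counter(zone_of(e) for e in elements)

# Each non-empty zone must appear as a region in the diagram.
for zone, size in sorted(zones.items(), key=lambda kv: -kv[1]):
    print(sorted(zone), "->", size, "element(s)")
```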
354

Visualização, kernels e subespaços: um estudo prático / Visualization, kernels and subspace: a practical study

Adriano Oliveira Barbosa 16 December 2016 (has links)
High-dimensional data are typically handled as lying in a single subspace of the ambient space. However, data involved in real applications are usually spread across independent subspaces, possibly of different dimensions. Two questions arise from this observation: how can such a subspace structure be used to support visualization tasks, and, if the data appear tangled in the high-dimensional space, how can we visualize their patterns or accomplish tasks such as classification? One could, for example, map the data into another space using a function capable of untangling them, so that the intrinsic patterns become clearer and visualization or classification becomes easier. This dissertation presents a study of both problems. For the former, we use subspace clustering techniques to identify, when it exists, a subspace structure, and study how this information can support visualization tasks based on multidimensional projections. For the latter, we employ kernel methods, well known in the literature, using the similarity measure given by the kernel to develop a new multidimensional projection technique capable of dealing with data embedded in the feature space implicitly induced by the kernel.
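As a hedged illustration of kernel-induced "untangling", the sketch below uses scikit-learn's KernelPCA, a standard kernel projection, as a stand-in for the dissertation's own kernel-based multidimensional projection technique, which is not reproduced here.

```python
# Illustration: data that are tangled in the input space can become
# separable after projection through a kernel-induced feature space.
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles are not linearly separable in the input space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# An RBF kernel untangles the circles in the implicit feature space;
# gamma=10 is an illustrative choice for this data scale.
proj = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

plt.scatter(proj[:, 0], proj[:, 1], c=y, s=10)
plt.title("KernelPCA projection (RBF kernel)")
plt.show()
```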
355

Intégration des facteurs prédictifs de l'effet d'un traitement dans la conception et l'analyse des essais cliniques de petite taille : application à la maladie de Huntington / Integration of predictive factors of treatment effect in the design and analysis of small clinical trials: application to Huntington's disease

Schramm, Catherine 06 July 2016 (has links)
Huntington's disease is a rare, genetic, multifaceted neurodegenerative disease with a long course, inducing great heterogeneity in patients' conditions and in disease progression. Current biotherapy trials are performed on small samples, with a treatment effect that is heterogeneous and measurable only in the long term. Identifying markers of disease progression and of treatment response would help to better understand and improve the results of future clinical trials. We developed a clustering method for treatment efficacy in the setting of longitudinal data, in order to distinguish treatment responders from non-responders. Our method, robust for small samples, combines a linear mixed model with two slopes and a classical clustering algorithm. The mixed model generates patient-specific random effects associated with treatment response; the clustering algorithm then defines subgroups according to the values of these random effects. Finding subgroups of responders makes it possible to define predictive markers of treatment response, which can be used to give each patient the most appropriate treatment. We discussed the integration of (i) predictive markers in the design of future clinical trials, assessing their impact on study power; and (ii) prognostic markers of disease progression, by studying the impact of the COMT polymorphism on patients' cognitive decline. Finally, we evaluated the learning effect of neuropsychological tests measuring cognitive abilities, and showed how a double baseline assessment at inclusion in a clinical trial can account for it when the primary outcome is cognitive decline.
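A rough sketch of the responder-identification idea follows: fit a mixed model with a patient-specific post-treatment slope, then cluster the resulting random effects. The simulated data, visit schedule, and model formula are assumptions, not the thesis's exact specification.

```python
# Sketch: two-slope mixed model + clustering of random effects to split
# simulated patients into responders and non-responders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
rows = []
for pid in range(40):
    responder = pid < 20
    base_slope = rng.normal(1.0, 0.1)                      # pre-treatment decline rate
    change = rng.normal(-0.8 if responder else 0.0, 0.1)   # slope change after treatment
    for t in range(8):                                     # visits; treatment starts at t=4
        post = max(t - 4, 0)
        y = 50 + base_slope * t + change * post + rng.normal(0, 0.3)
        rows.append({"patient": pid, "t": t, "post": post, "y": y})
df = pd.DataFrame(rows)

# Random intercept and random post-treatment slope per patient.
model = smf.mixedlm("y ~ t + post", df, groups=df["patient"],
                    re_formula="~post").fit()

# Cluster the patient-specific post-treatment slopes into two subgroups.
patients = sorted(df["patient"].unique())
re_post = np.array([model.random_effects[p]["post"] for p in patients])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    re_post.reshape(-1, 1))
print(labels)  # responders vs. non-responders
```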
356

Utilisation des modèles de co-clustering pour l'analyse exploratoire des données / Using co-clustering models for exploratory data analysis (no official English title)

Guigourès, Romain 04 December 2013 (has links)
Co-clustering is a classification technique consisting in simultaneously partitioning the rows and the columns of a data matrix. Among existing approaches, MODL can process large data sets and partition several variables, whether continuous or nominal. We use this approach as the reference throughout the thesis and show the diversity of data mining problems it can address, such as graph partitioning, temporal graph segmentation, and curve clustering. MODL yields fine-grained results on large data sets, which makes them difficult to interpret; exploratory analysis tools are therefore needed to exploit them. To guide the user in interpreting such results, we define several tools that simplify fine-grained results to provide a global interpretation, detect the most noteworthy clusters, determine the values most representative of their clusters, and visualize the results. The asymptotic behavior of these exploratory analysis tools is studied in order to make the connection with existing approaches. Finally, an application to call detail records from the telecom operator Orange, collected in Ivory Coast, demonstrates the value of the approach and of the exploratory analysis tools in an industrial context.
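For a general feel of co-clustering, the sketch below uses scikit-learn's SpectralCoclustering on a synthetic block matrix. Note that MODL, the thesis's reference approach, is a different, MDL-based method and is not shown here; the cluster count and noise level are illustrative assumptions.

```python
# Illustration of co-clustering: simultaneous row and column partitioning
# of a matrix with a planted block structure.
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Synthetic matrix with four planted biclusters plus Gaussian noise.
data, rows, cols = make_biclusters(shape=(120, 80), n_clusters=4,
                                   noise=5, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0).fit(data)
print("row cluster sizes:", np.bincount(model.row_labels_))
print("column cluster sizes:", np.bincount(model.column_labels_))
```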
357

Vocation Clustering for Heavy-Duty Vehicles

Daniel Patrick Kobold Jr (9719936) 07 January 2021 (has links)
The identification of the vocation of an unknown heavy-duty vehicle is valuable to parts manufacturers, who may not otherwise have consistent access to this information. This study proposes a methodology for vocation identification based on clustering techniques. Two clustering algorithms are considered: K-Means and Expectation Maximization. These algorithms are first used to construct the operating profile of each vocation from a set of vehicles with known vocations. The vocation of an unknown vehicle is then determined using different assignment methods.

These methods fall under two main categories: one-versus-all and one-versus-one. The one-versus-all approach compares an unknown vehicle to all potential vocations. The one-versus-one approach compares the unknown vehicle to two vocations at a time in a tournament fashion. Two types of tournaments are investigated: round-robin and bracket. The accuracy and efficiency of each method are evaluated using the NREL FleetDNA dataset.

The study revealed that some vocations have unique operating profiles and are therefore easily distinguishable from others, while other vocations can have confounding profiles. This indicates that different vocations may benefit from profiles with varying numbers of clusters. Determining the optimal number of clusters for each vocation can not only improve assignment accuracy but also enhance the computational efficiency of the application. The optimal number of clusters for each vocation is determined using both static and dynamic techniques: static approaches are completed prior to training and may require multiple iterations, whereas dynamic techniques split or remove clusters during training. The results show that the accuracy of dynamic techniques is comparable to that of static approaches while benefiting from a reduced computational time.
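A minimal sketch of the one-versus-all assignment described above: one K-Means profile per known vocation, and an unknown vehicle assigned to the vocation whose cluster centers best match its trips. The features, cluster counts, and distance-based score are placeholder assumptions; the study's EM variant and tournament schemes are omitted.

```python
# Sketch: vocation profiles via K-Means and one-versus-all assignment.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-trip features (mean speed, idle fraction) for vehicles
# of two known vocations.
fleet = {
    "delivery": rng.normal([30, 0.4], [5, 0.05], size=(300, 2)),
    "long_haul": rng.normal([90, 0.1], [5, 0.05], size=(300, 2)),
}

# One K-Means operating profile per vocation.
profiles = {v: KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
            for v, X in fleet.items()}

def assign(vehicle_trips):
    """One-versus-all: pick the vocation with the smallest mean distance
    from the vehicle's trips to that vocation's nearest cluster center."""
    scores = {v: km.transform(vehicle_trips).min(axis=1).mean()
              for v, km in profiles.items()}
    return min(scores, key=scores.get)

unknown = rng.normal([32, 0.38], [5, 0.05], size=(20, 2))  # looks like delivery
print(assign(unknown))
```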
358

Sparse subspace clustering-based motion segmentation with complete occlusion handling

Mattheus, Jana January 2021 (has links)
Motion segmentation is part of the computer vision field and aims to find the moving parts in a video sequence. It is used in applications such as autonomous driving, surveillance, robotics, human motion analysis, and video indexing. Since there are so many applications, motion segmentation is ill-defined and the research field is vast. Despite advances in research over the years, existing methods still fall far behind human capabilities; changes in illumination, camera motion, noise, mixtures of motion, missing data, and occlusion remain challenges. Feature-based approaches have grown in popularity over the years, especially manifold clustering methods, owing to their strong mathematical foundations. Methods exploiting sparse and low-rank representations are often used, since they reduce the dimensionality of the data while extracting useful information about the motion segments. However, these methods cannot effectively handle large or complete occlusions and missing data, as they tend to fail when the amount of missing data becomes too large. An algorithm based on Sparse Subspace Clustering (SSC) is proposed to address occlusions and missing data so that SSC can handle these cases with high accuracy. A frame-to-frame analysis is adopted as a pre-processing step to identify motion segments between consecutive frames, called inter-frame motion segments. This pre-processing step, called Multiple Split-And-Merge (MSAM), is based on the classic top-down split-and-merge algorithm. Only points present in both frames of a pair are segmented, which means that a point undergoing an occlusion is only assigned to a motion class once it has been visible for two consecutive frames after re-entering the camera view. Once all inter-frame segments have been extracted, the results are combined in a single matrix used as the input to the classic SSC algorithm; SSC therefore segments inter-frame motion segments rather than point trajectories. The resulting algorithm, MSAM-SSC, outperformed some of the most popular manifold clustering methods on the Hopkins155 and KT3DMoSeg datasets. It was also able to handle complete occlusions, sequences with 50% missing data, and outliers, as well as mixtures of motions and different numbers of motions. However, MSAM-SSC was found to be best suited to traffic and articulated-motion scenes, which often arise in applications such as robotics, surveillance, and autonomous driving. Future work could optimise the algorithm to reduce its execution time for real-time applications, and estimate the number of moving objects in the scene to obtain a method that does not rely on prior knowledge. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2021. / CSIR / Electrical, Electronic and Computer Engineering / MEng (Computer Engineering) / Unrestricted
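For context, here is a minimal sketch of the SSC core that MSAM-SSC builds on: each point is written as a sparse combination of the other points (here via Lasso), and the coefficient magnitudes define an affinity for spectral clustering. The synthetic subspaces and the regularization value are assumptions; the MSAM pre-processing and occlusion handling are not reproduced.

```python
# Minimal SSC sketch: sparse self-representation + spectral clustering.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Two synthetic 1-D subspaces (lines) in R^3, lightly perturbed.
X = np.vstack([np.outer(rng.normal(size=30), [1.0, 0.0, 0.5]),
               np.outer(rng.normal(size=30), [0.0, 1.0, -0.5])])
X += rng.normal(scale=0.01, size=X.shape)
n = len(X)

# Sparse self-representation: x_i ~ sum_j c_ij x_j with c_ii = 0.
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    lasso = Lasso(alpha=0.01, max_iter=5000).fit(X[others].T, X[i])
    C[i, others] = lasso.coef_

# Symmetric non-negative affinity from the coefficients.
affinity = np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)  # points from the same subspace share a label
```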
359

Segmentace stránky ve webovém prohlížeči / Page Segmentation in a Web Browser

Zubrik, Tomáš January 2021 (has links)
This thesis deals with web page segmentation in a web browser. An implementation of the Box Clustering Segmentation (BCS) method was created in JavaScript using an automated browser. The implementation consists of two main steps: the extraction of boxes (leaf DOM nodes) from the browser context, and their subsequent clustering based on the similarity model defined in BCS. The main result of this thesis is a functional implementation of the BCS method usable for web page segmentation. Its functionality and accuracy are evaluated by comparison with a reference implementation written in Java.
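A toy sketch of the clustering step on extracted leaf boxes follows, grouping boxes by a simple distance over position and size. BCS's actual similarity model (which also uses visual features such as color and font) and its JavaScript/browser context are not reproduced, so treat the feature scaling and the threshold as assumptions.

```python
# Toy sketch: cluster extracted leaf boxes (x, y, w, h) by spatial proximity
# and size similarity, approximating the grouping stage of a segmenter.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical extracted boxes: x, y, width, height in pixels.
boxes = np.array([
    [10, 10, 100, 20], [10, 35, 100, 20], [10, 60, 100, 20],  # a text column
    [400, 10, 80, 80], [400, 95, 80, 80],                     # an image stack
])

centers = boxes[:, :2] + boxes[:, 2:] / 2
sizes = boxes[:, 2:]

# Distance mixes normalized center gap and size dissimilarity.
features = np.hstack([centers / 100.0, sizes / 100.0])
d = pdist(features)
labels = fcluster(linkage(d, method="single"), t=1.0, criterion="distance")
print(labels)  # boxes in the same visual block share a label
```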
360

Sparse Similarity and Network Navigability for Markov Clustering Enhancement

Durán Cancino, Claudio Patricio 29 September 2021 (has links)
Markov clustering (MCL) is an effective unsupervised pattern-recognition algorithm for data clustering in high-dimensional feature spaces: it simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it presents two main drawbacks: (1) its community detection performance in complex networks has been far from state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work both aspects, although closely related, are treated as separate issues and addressed as such.

Regarding community detection, a field within network science, the crucial issue is to convert the unweighted network topology into a 'smart enough' pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here a conceptual innovation is introduced and discussed, focusing on how to leverage notions of network latent geometry in order to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art community detection methods. These findings emerge on both synthetic 'realistic' networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced through missing or spurious links.

Regarding nonlinearity, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science. Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, computed as transversal paths over the minimum spanning tree and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed: the first nonlinear kernel extension of MCL, it exploits Minimum Curvilinearity to enhance the performance of MCL on real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, were evaluated with positive outcomes. The different nonlinear MCL versions are compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework for estimating nonlinear distances when the kernel is applied in combination with MCL; indeed, the nonlinear MCL variants overcome classical MCL and even state-of-the-art clustering algorithms on different nonlinear datasets.

This dissertation discusses these enhancements and develops a general understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability.
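For reference, here is a compact numpy sketch of the classical MCL iteration the dissertation enhances, alternating expansion (matrix squaring) and inflation (elementwise powering with column re-normalization). The inflation value, iteration count, and attractor-extraction threshold are conventional assumptions.

```python
# Compact MCL sketch: expansion + inflation on a column-stochastic matrix.
import numpy as np

def mcl(adjacency, inflation=2.0, iterations=50):
    # Add self-loops and normalize columns to make the matrix stochastic.
    M = adjacency + np.eye(len(adjacency))
    M /= M.sum(axis=0)
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, 2)  # expansion: spread the flow
        M = M ** inflation                # inflation: favor strong currents
        M /= M.sum(axis=0)                # re-normalize columns
    # Attractor rows with residual mass define the clusters.
    attractors = {frozenset(np.flatnonzero(row > 1e-6)) for row in M}
    return [c for c in attractors if c]

# Two triangles joined by a single edge: MCL should find two clusters.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(mcl(A))  # e.g. [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```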
