41

Sequential selection in a random environment applied to supervised learning / Sélection séquentielle en environnement aléatoire appliquée à l'apprentissage supervisé

Caelen, Olivier 25 September 2009
This thesis addresses decision problems that must be solved sequentially within a random environment. At each step of such a decision problem, an alternative must be selected from a set of alternatives. Each alternative has its own mean gain, and when one of them is selected it yields a random gain. The selection can pursue two types of objectives.

In the first case, the trials aim to maximize the sum of the gains collected. A proper trade-off must then be found between exploitation and exploration. This problem is commonly known in the scientific literature as the "multi-armed bandit problem".

In the second case, a maximum number of selections is imposed, and the objective is to distribute these selections so as to increase the chances of finding the alternative with the highest mean gain. This second problem is commonly referred to in the scientific literature as "selecting the best".

Greedy selection plays an important role in solving these decision problems: it operates by choosing the alternative that has appeared optimal so far. However, the generally random nature of the environment makes the outcome of such a selection uncertain.

In this thesis, we introduce a new quantity, called the "expected gain of a greedy action". Based on several properties of this quantity, new algorithms for solving the two decision problems above are proposed.

Particular attention is paid to applying the presented techniques to model selection in supervised machine learning.

A collaboration with the anesthesiology department of the Erasme Hospital allowed us to apply the proposed algorithms to real data from the medical domain. We also developed a decision-support system, a prototype of which has already been tested in real conditions on a small sample of patients. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
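By way of illustration, here is a minimal sketch (not code from the thesis) of the exploitation/exploration trade-off described above, using the common epsilon-greedy rule; the arm means, epsilon and horizon are assumed values:

```python
import random

# Minimal multi-armed bandit simulation with epsilon-greedy selection.
# Arm means, epsilon and horizon are illustrative assumptions.

def epsilon_greedy(means, horizon=1000, epsilon=0.1):
    n_arms = len(means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # empirical mean gain per arm
    total_gain = 0.0
    for _ in range(horizon):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # greedy
        gain = random.gauss(means[arm], 1.0)                      # random gain
        counts[arm] += 1
        estimates[arm] += (gain - estimates[arm]) / counts[arm]
        total_gain += gain
    return total_gain, estimates

total, est = epsilon_greedy([0.2, 0.5, 0.8])
print(total, est)   # estimates converge toward the true means
```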
42

Combinatorial aspects of genome rearrangements and haplotype networks / Aspects combinatoires des réarrangements génomiques et des réseaux d'haplotypes

Labarre, Anthony 12 September 2008
The dissertation covers two problems motivated by computational biology: genome rearrangements and haplotype networks.

Genome rearrangement problems are a particular case of edit distance problems, where one seeks to transform two given objects into one another using as few operations as possible, with the additional constraint that the set of allowed operations is fixed beforehand; we are also interested in computing the corresponding distances between those objects, i.e. merely computing the minimum number of operations rather than an optimal sequence. Genome rearrangement problems can often be formulated as sorting problems on permutations (viewed as linear orderings of {1,2,...,n}) using as few (allowed) operations as possible. In this thesis, we focus, among other operations, on "transpositions", which displace intervals of a permutation. Many questions related to sorting by transpositions are open, related in particular to its computational complexity. We use the disjoint cycle decomposition of permutations, rather than the "standard tools" used in genome rearrangements, to prove new upper bounds on the transposition distance, as well as formulae for computing the exact distance in polynomial time in many cases. This decomposition also allows us to solve a counting problem related to the "cycle graph" of Bafna and Pevzner, and to construct a general framework for obtaining lower bounds on any edit distance between permutations by recasting their computation as factorisation problems on related even permutations.

Haplotype networks are graphs in which a subset of vertices is labelled, used in comparative genomics as an alternative to trees. We formalise a new method due to Cassens, Mardulyn and Milinkovitch, which consists in building a graph containing a given set of partially labelled trees and with as few edges as possible. We give exact algorithms for solving the problem on two graphs, with an exponential running time in the general case but with a polynomial running time if at least one of the graphs belongs to a particular class. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
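As a small illustration of the thesis's central tool, the following sketch computes the disjoint cycle decomposition of a permutation; the example permutation is assumed:

```python
# Disjoint cycle decomposition of a permutation of {1, ..., n},
# given as a list where perm[i-1] is the image of i.

def cycle_decomposition(perm):
    n = len(perm)
    seen = [False] * (n + 1)
    cycles = []
    for start in range(1, n + 1):
        if not seen[start]:
            cycle, x = [], start
            while not seen[x]:          # follow images until the cycle closes
                seen[x] = True
                cycle.append(x)
                x = perm[x - 1]
            cycles.append(tuple(cycle))
    return cycles

# pi maps 1->3, 2->1, 3->2, 4->5, 5->4: two disjoint cycles.
print(cycle_decomposition([3, 1, 2, 5, 4]))   # [(1, 3, 2), (4, 5)]
```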
43

Fault detection in autonomous robots

Christensen, Anders Lyhne 27 June 2008
In this dissertation, we study two new approaches to fault detection for autonomous robots. The first approach involves the synthesis of software components that give a robot the capacity to detect faults which occur in itself. Our hypothesis is that hardware faults change the flow of sensory data and the actions performed by the control program. By detecting these changes, the presence of faults can be inferred. In order to test our hypothesis, we collect data in three different tasks performed by real robots. During a number of training runs, we record sensory data from the robots both while they are operating normally and after a fault has been injected. We use back-propagation neural networks to synthesize fault detection components based on the data collected in the training runs. We evaluate the performance of the trained fault detectors in terms of the number of false positives and the time it takes to detect a fault. The results show that good fault detectors can be obtained. We extend the set of possible faults and go on to show that a single fault detector can be trained to detect several faults in both a robot's sensors and actuators. We show that fault detectors can be synthesized that are robust to variations in the task. Finally, we show how a fault detector can be trained to allow one robot to detect faults that occur in another robot.

The second approach involves the use of firefly-inspired synchronization to allow the presence of faulty robots to be determined by other, non-faulty robots in a swarm robotic system. We take inspiration from the synchronized flashing behavior observed in some species of fireflies. Each robot flashes by lighting up its on-board red LEDs, and neighboring robots are driven to flash in synchrony. The robots always interpret the absence of flashing by a particular robot as an indication that the robot has a fault. A faulty robot can stop flashing periodically for one of two reasons. The fault itself can render the robot unable to flash periodically. Alternatively, the faulty robot might be able to detect the fault itself using endogenous fault detection and decide to stop flashing. Thus, catastrophic faults in a robot can be directly detected by its peers, while the presence of less serious faults can be detected by the faulty robot itself and actively communicated to neighboring robots. We explore the performance of the proposed algorithm both on a real-world swarm robotic system and in simulation. We show that failed robots are detected correctly and in a timely manner, and we show that a system composed of robots with simulated self-repair capabilities can survive relatively high failure rates.

We conclude that i) fault injection and learning can give robots the capacity to detect faults that occur in themselves, and that ii) firefly-inspired synchronization can enable robots in a swarm robotic system to detect and communicate faults. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
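A rough sketch of the first approach under stated assumptions: synthetic readings stand in for the logged sensor data, and scikit-learn's MLPClassifier stands in for the back-propagation networks used in the thesis:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Train a neural network to flag faults from sensory data.
# The synthetic readings below stand in for real robot logs.

rng = np.random.default_rng(0)
n, d = 2000, 12                               # samples, sensor channels

normal = rng.normal(0.0, 1.0, (n, d))         # nominal sensor flow
faulty = rng.normal(0.0, 1.0, (n, d))
faulty[:, 3] = 0.0                            # e.g. one sensor stuck at zero

X = np.vstack([normal, faulty])
y = np.array([0] * n + [1] * n)               # 0 = normal, 1 = fault injected

detector = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
detector.fit(X, y)

# At run time each new reading is classified; requiring a run of consecutive
# "fault" outputs before raising an alarm would limit false positives.
print(detector.predict(faulty[:5]))
```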
44

Division of labour in groups of robots

Labella, Thomas Halva 09 February 2007
In this thesis, we examine algorithms for the division of labour in a group of robots. The algorithms make no use of direct communication. Instead, they are based only on the interactions among the robots and between the group and the environment.

Division of labour is the mechanism that decides how many robots shall be used to perform a task. The efficiency of the group of robots in fact depends on the number of robots involved in a task. If too few robots are used to achieve a task, they might not be successful or might perform poorly. If too many robots are used, resources might be wasted. The number of robots to use might be decided a priori by the system designer. More interestingly, the group of robots might autonomously select how many and which robots to use. In this thesis, we study algorithms of the latter type.

The robotics literature already offers some solutions, but most of them use a form of direct communication between agents. Direct, or explicit, communication between the robots is usually considered a necessary condition for co-ordination. Recent studies have questioned this assumption. The claim is based on observations of animal colonies, e.g. ants and termites. They can co-operate effectively without communicating directly, instead using indirect forms of communication like stigmergy. Because they do not rely on communication, such colonies show robust behaviours at the group level, a condition that one wishes for groups of robots as well. Algorithms for robot co-ordination without direct communication have been proposed in the last few years. They are interesting not only because they are a stimulating intellectual challenge, but also because they address a situation that is likely to occur when using robots for real-world outdoor applications. Unfortunately, they are still poorly studied.

This thesis helps the understanding and the development of such algorithms. We start from a specific case to learn its characteristics. Then we improve our understanding through comparisons with other solutions, and finally we port everything into another domain.

We first study an algorithm for division of labour that was inspired by ants' foraging. We test the algorithm in an application similar to ants' foraging: prey retrieval. We prove that the model used for ants' foraging can also be effective in real conditions. Our analysis allows us to understand the underlying mechanisms of the division of labour and to define ways of measuring it.

Using this knowledge, we continue by comparing the ant-inspired algorithm with similar solutions that can be found in the literature and by assessing their differences. In performing these comparisons, we take care to use a formal methodology that allows us to spare resources. Namely, we use concepts of experiment design to reduce the number of experiments with real robots without losing significance in the results.

Finally, we apply and port what we previously learnt to another application: Sensor/Actor Networks (SANETs). We develop an architecture for division of labour that is based on the same mechanisms as the ants' foraging model. Although the individuals in a SANET can communicate, the communication channel might be overloaded. Therefore, the agents of a SANET shall be able to co-ordinate without accessing the communication channel. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
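A minimal sketch, in the spirit of the ant-foraging model studied here, of robots adapting their probability of engaging in prey retrieval from individual success and failure alone; the adaptation constants and success rate are assumed, not the thesis's values:

```python
import random

# Each robot leaves the nest with probability p, raised after successful
# retrievals and lowered after failures; no robot communicates directly.

def simulate(n_robots=10, steps=200, delta=0.05, p0=0.5, success_rate=0.3):
    probs = [p0] * n_robots
    active_history = []
    for _ in range(steps):
        active = 0
        for i in range(n_robots):
            if random.random() < probs[i]:           # robot i attempts the task
                active += 1
                if random.random() < success_rate:   # prey retrieved
                    probs[i] = min(1.0, probs[i] + delta)
                else:                                # failed attempt
                    probs[i] = max(0.05, probs[i] - delta)
        active_history.append(active)
    return active_history

print(simulate()[-10:])   # the number of active robots settles over time
```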
45

Regions, distances and graphs

Collette, Sébastien 22 November 2006
We present new approaches to define and analyze geometric graphs.

The region-counting distances, introduced by Demaine, Iacono and Langerman, associate to any pair of points (p,q) the number of items of a dataset S contained in a region R(p,q) surrounding (p,q). We define region-counting disks and circles, and study the complexity of these objects. Algorithms to compute epsilon-approximations of region-counting distances and approximations of region-counting circles are presented.

We propose a definition of locality for properties of geometric graphs. We measure the local density of graphs using the region-counting distances between pairs of vertices, and we use this density to define local properties of classes of graphs. We illustrate locality by introducing the local diameter of geometric graphs: we define it as the upper bound on the size of the shortest path between any pair of vertices, expressed as a function of the density of the graph around those vertices. We determine the local diameter of several well-studied graphs such as the Theta-graph, the Ordered Theta-graph and the Skip List Spanner. We also show that various operations, such as path and point queries using geometric graphs as data structures, have complexities which can be expressed as local properties.

A family of proximity graphs, called Empty Region Graphs (ERG), is presented. The vertices of an ERG are points in the plane, and two points are connected if their neighborhood, defined by a region, does not contain any other point. The region defining the neighborhood of two points is a parameter of the graph. This family of graphs includes several known proximity graphs such as Nearest Neighbor Graphs, Beta-Skeletons and Theta-Graphs. We concentrate on ERGs that are invariant under translations, rotations and uniform scaling of the vertices. We give conditions on the region defining an ERG that ensure a number of properties that may be desirable in applications, such as planarity, connectivity, triangle-freeness, cycle-freeness, bipartiteness and bounded degree. These conditions take the form of what we call tight regions: maximal or minimal regions that a region must contain or be contained in to make the graph satisfy a given property. We show that every monotone property has at least one corresponding tight region; we discuss possibilities and limitations of this general model for constructing a graph from a point set.

We introduce and analyze sigma-local graphs, based on a definition of locality by Erickson, to illustrate an efficient construction algorithm on a subclass of ERGs. / Doctorat en sciences, Spécialisation Informatique / info:eu-repo/semantics/nonPublished
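To make the notion concrete, a small sketch of a region-counting distance with an assumed choice of region, the disk having segment pq as diameter (one of many possible regions R(p,q)):

```python
# The distance from p to q is the number of points of S inside R(p, q);
# here R(p, q) is illustratively taken as the disk with pq as diameter.

def region_counting_distance(p, q, points):
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2          # disk center
    r2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4     # squared radius
    return sum(
        (x - cx) ** 2 + (y - cy) ** 2 <= r2
        for (x, y) in points
        if (x, y) != p and (x, y) != q
    )

S = [(0, 0), (4, 0), (2, 1), (2, -1), (5, 5)]
print(region_counting_distance((0, 0), (4, 0), S))   # counts points near pq
```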
46

Ant colony optimization and local search for the probabilistic traveling salesman problem: a case study in stochastic combinatorial optimization

Bianchi, Leonora 29 June 2006
In this thesis we focus on Stochastic Combinatorial Optimization Problems (SCOPs), a wide class of combinatorial optimization problems under uncertainty, where part of the information about the problem data is unknown at the planning stage, but some knowledge about its probability distribution is assumed.

Optimization problems under uncertainty are complex and difficult, and often classical algorithmic approaches based on mathematical and dynamic programming are able to solve only very small problem instances. For this reason, in recent years metaheuristic algorithms such as Ant Colony Optimization, Evolutionary Computation, Simulated Annealing, Tabu Search and others are emerging as successful alternatives to classical approaches.

In this thesis, metaheuristics that have been applied so far to SCOPs are introduced and the related literature is thoroughly reviewed. In particular, two properties of metaheuristics emerge from the survey: they are a valid alternative to exact classical methods for addressing real-sized SCOPs, and they are flexible, since they can be quite easily adapted to solve different SCOP formulations, both static and dynamic. On the basis of the current literature, we identify the following as the key open issues in solving SCOPs via metaheuristics:
(1) the design and integration of ad hoc, fast and effective objective function approximations inside the optimization algorithm;
(2) the estimation of the objective function by sampling when no closed-form expression for the objective function is available, and the study of methods to reduce the time complexity and noise inherent to this type of estimation;
(3) the characterization of the efficiency of metaheuristic variants with respect to different levels of stochasticity in the problem instances.

We investigate the above issues by focusing in particular on a SCOP belonging to the class of vehicle routing problems: the Probabilistic Traveling Salesman Problem (PTSP). For the PTSP, we consider the Ant Colony Optimization metaheuristic and we design efficient local search algorithms that can enhance its performance. We obtain state-of-the-art algorithms, but we show that they are effective only for instances above a certain level of stochasticity; otherwise it is more convenient to solve the problem as if it were deterministic. The algorithmic variants based on an estimation of the objective function by sampling obtain worse results, but qualitatively behave like the algorithms based on the exact objective function with respect to the level of stochasticity. Moreover, we show that the performance of algorithmic variants based on ad hoc approximations is strongly correlated with the absolute error of the approximation, and that the effect of ad hoc approximations on local search can be very degrading.

Finally, we briefly address another SCOP belonging to the class of vehicle routing problems: the Vehicle Routing Problem with Stochastic Demands (VRPSD). For this problem, we have implemented and tested several metaheuristics, and we have studied the impact of integrating different ad hoc approximations into them. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
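For concreteness, a minimal sketch of the sampling-based estimation of the PTSP objective mentioned in issue (2): customers appear independently with given probabilities, absent customers are skipped along the a priori tour, and lengths are averaged over sampled scenarios. Coordinates and probabilities are assumed:

```python
import math
import random

def tour_length(points):
    # cyclic tour length over the points in order
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))

def ptsp_objective_estimate(tour, probs, n_samples=10000):
    total = 0.0
    for _ in range(n_samples):
        # sample which customers are present in this scenario
        present = [pt for pt, p in zip(tour, probs) if random.random() < p]
        if len(present) >= 2:
            total += tour_length(present)   # absent customers are skipped
    return total / n_samples

tour = [(0, 0), (1, 0), (1, 1), (0, 1)]
probs = [0.9, 0.5, 0.9, 0.5]
print(ptsp_objective_estimate(tour, probs))
```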
47

Semantic analysis in web usage mining

Norguet, Jean-Pierre 20 March 2006
With the emergence of the Internet and of the World Wide Web, the Web site has become a key communication channel in organizations. To satisfy the objectives of the Web site and of its target audience, adapting the Web site content to the users' expectations has become a major concern. In this context, Web usage mining, a relatively new research area, and Web analytics, the part of Web usage mining that has emerged most in the corporate world, offer many Web communication analysis techniques. These techniques include prediction of the user's behaviour within the site, comparison between expected and actual Web site usage, adjustment of the Web site with respect to the users' interests, and mining and analyzing Web usage data to discover interesting metrics and usage patterns. However, Web usage mining and Web analytics suffer from significant drawbacks when it comes to supporting the decision-making process at the higher levels of the organization.

Indeed, according to organization theory, the higher levels in organizations need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless. Indeed, most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager, but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.

Web usage mining research projects, for their part, have mostly left aside Web analytics and its limitations and have focused on other research paths. Examples of these paths are usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research about reverse clustering analysis, a technique based on self-organizing feature maps. This technique integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not answer the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail at delivering summarized and conceptual results.

An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution to provide site-wide summarized and conceptual audience metrics.

In this dissertation, we present our solution to answer the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers. These methods include content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. Then, the occurrences of taxonomy terms in these pages can be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.

According to the first experiments with our prototype and SQL Server OLAP Analysis Services, concept-based metrics prove extremely summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and make it possible to adapt the Web site communication to the organization objectives. The Web site chief editor, for his part, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
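A toy sketch of the aggregation step described above: occurrences of taxonomy terms in served pages, weighted by page views, yield concept-based metrics. The taxonomy, pages and counts are invented for illustration:

```python
from collections import Counter

# Aggregate taxonomy-term occurrences into concept-level audience metrics.
# Taxonomy, page texts and view counts are illustrative assumptions.

taxonomy = {
    "admissions": ["enroll", "application", "tuition"],
    "research":   ["laboratory", "publication", "grant"],
}

page_views = {"index.html": 500, "labs.html": 120}
page_text = {
    "index.html": "application deadlines and tuition information",
    "labs.html":  "laboratory teams and publication lists",
}

metrics = Counter()
for page, views in page_views.items():
    words = page_text[page].lower().split()
    for concept, terms in taxonomy.items():
        hits = sum(words.count(t) for t in terms)
        metrics[concept] += hits * views    # concept weighted by audience

print(metrics.most_common())   # summarized, concept-level audience metrics
```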
48

The problem of tuning metaheuristics as seen from a machine learning perspective

Birattari, Mauro 20 December 2004
A metaheuristic is a generic algorithmic template that, once properly instantiated, can be used for finding high-quality solutions of combinatorial optimization problems. For obtaining a fully functioning algorithm, a metaheuristic needs to be configured: typically some modules need to be instantiated and some parameters need to be tuned. For the sake of precision, we use the expression "parametric tuning" to refer to the tuning of numerical parameters, either continuous or discrete but in any case ordinal. On the other hand, we use the expression "structural tuning" to refer to the problem of defining which modules should be included and, in general, to the problem of tuning parameters that are either boolean or categorical. Finally, with "tuning" we refer to the composite structural and parametric tuning.

Tuning metaheuristics is a very sensitive issue both in practical applications and in academic studies. Nevertheless, a precise definition of the tuning problem is missing in the literature. In this thesis, we argue that the problem of tuning a metaheuristic can be profitably described and solved as a machine learning problem.

Indeed, looking at the problem of tuning metaheuristics from a machine learning perspective, we are in the position of giving a formal statement of the tuning problem and of proposing an algorithm, called F-Race, for tackling the problem itself. Moreover, from this standpoint, we are able to highlight and discuss some catches and faults in the current research methodology in the metaheuristics field, and to propose some guidelines.

The thesis contains experimental results on the use of F-Race and some examples of practical applications. Among others, we present a feasibility study carried out by the German-based software company SAP, concerning the possible use of F-Race for tuning a commercial computer program for vehicle routing and scheduling problems. Moreover, we discuss the successful use of F-Race for tuning the best performing algorithm submitted to the International Timetabling Competition organized in 2003 by the Metaheuristics Network and sponsored by PATAT, the international series of conferences on the Practice and Theory of Automated Timetabling. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
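A rough sketch of the racing idea behind F-Race: candidate configurations are evaluated on a growing stream of instances and eliminated once a Friedman test flags significant differences. The elimination rule, thresholds and cost model here are simplifying assumptions, not the thesis's exact procedure:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Simplified racing loop: evaluate all surviving candidates on each new
# instance; once a Friedman test detects significant differences, drop
# candidates clearly worse than the current best. Values are assumed.

rng = np.random.default_rng(1)
true_quality = [1.0, 1.2, 2.0, 2.5]            # hidden mean cost per candidate
alive = list(range(len(true_quality)))
results = {c: [] for c in alive}

for instance in range(100):
    for c in alive:
        results[c].append(true_quality[c] + rng.normal(0, 0.5))
    if instance >= 5 and len(alive) > 2:       # Friedman test needs >= 3 groups
        _, p = friedmanchisquare(*(results[c] for c in alive))
        if p < 0.05:                           # significant differences found
            best = min(alive, key=lambda c: np.mean(results[c]))
            alive = [c for c in alive
                     if np.mean(results[c]) - np.mean(results[best]) < 0.3]

print("surviving candidates:", alive)          # evaluation effort concentrates
```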
49

Adiabatic quantum computation

Roland, Jérémie 28 September 2004
The development of the theory of quantum computation stems from the idea that a computer is first and foremost a physical system, so that the laws of Nature themselves constitute an ultimate limit on what can and cannot be computed. Interest in this discipline was stimulated by Peter Shor's discovery of a fast quantum algorithm for factoring a number, whereas no such algorithm is currently known in classical computation theory. Another important result was Lov Grover's construction of an algorithm able to find an element in an unstructured database with a quadratic complexity gain over any classical algorithm. While these quantum algorithms are expressed in the "standard" model of quantum computation, where the register evolves discretely in time under the successive application of quantum gates, a new type of algorithm has recently been introduced, in which the register evolves continuously in time under the action of a Hamiltonian. The idea underlying adiabatic quantum computation, proposed by Edward Farhi and his collaborators, is thus to use a traditional tool of quantum mechanics, namely the adiabatic theorem, to design quantum algorithms in which the register evolves under the influence of a slowly varying Hamiltonian, ensuring an adiabatic evolution of the system. In this thesis, we first show how to reproduce the quadratic gain of Grover's algorithm by means of an adiabatic quantum algorithm. We then show that it is possible to translate this new adiabatic algorithm, as well as another search algorithm with Hamiltonian evolution, into the quantum circuit formalism, yielding three quantum search algorithms that are very similar in principle. We then use these results to build an adiabatic algorithm for solving structured problems, using a technique called "nesting" developed earlier in the context of circuit-based quantum algorithms. Finally, we analyze the robustness of these adiabatic algorithms to noise, introducing a noise model based on random matrix theory and studying its effect with perturbation theory. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
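The interpolation at the heart of the adiabatic search algorithm can be sketched as follows (standard formulation from the adiabatic-search literature, stated here as background rather than quoted from the thesis):

```latex
% Interpolate between an initial Hamiltonian whose ground state is the
% uniform superposition |psi_0> and a final one whose ground state |m>
% marks the searched item among N:
\[
  H(s) \;=\; (1-s)\,\bigl(I - |\psi_0\rangle\langle\psi_0|\bigr)
           \;+\; s\,\bigl(I - |m\rangle\langle m|\bigr),
  \qquad s = t/T \in [0,1],
\]
\[
  g(s) \;=\; \sqrt{\,1 - 4\Bigl(1-\tfrac{1}{N}\Bigr)s(1-s)\,},
  \qquad g_{\min} \;=\; g\!\left(\tfrac12\right) \;=\; \tfrac{1}{\sqrt{N}} .
\]
% The adiabatic theorem requires, roughly,
\[
  T \;\gg\; \max_{s}\,
  \frac{\bigl|\langle E_1(s)|\,\tfrac{dH}{ds}\,|E_0(s)\rangle\bigr|}{g(s)^{2}},
\]
% so a uniform schedule costs T = O(N), while adapting the speed of s(t)
% to the local gap recovers Grover's quadratic gain, T = O(\sqrt{N}).
```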
50

Information-theoretic variable selection and network inference from microarray data

Meyer, Patrick E. 16 December 2008
Statisticians are used to modeling interactions between variables on the basis of observed data. In a number of emerging fields, like bioinformatics, they are confronted with datasets having thousands of variables, a lot of noise, non-linear dependencies and only tens of samples. The detection of functional relationships, when such uncertainty is contained in the data, constitutes a major challenge.

Our work focuses on variable selection and network inference from datasets having many variables and few samples (a high variable-to-sample ratio), such as microarray data. Variable selection is the topic of machine learning whose objective is to select, among a set of input variables, those that lead to the best predictive model. The application of variable selection methods to gene expression data allows, for example, improving cancer diagnosis and prognosis by identifying a new molecular signature of the disease. Network inference consists in representing the dependencies between the variables of a dataset by a graph. Hence, when applied to microarray data, network inference can reverse-engineer the transcriptional regulatory network of the cell with a view to discovering new drug targets to cure diseases.

In this work, two original tools are proposed: MASSIVE (Matrix of Average Sub-Subset Information for Variable Elimination), a new method of feature selection, and MRNET (Minimum Redundancy NETwork), a new algorithm for network inference. Both tools rely on the computation of mutual information, an information-theoretic measure of dependency. More precisely, MASSIVE and MRNET use approximations of the mutual information between a subset of variables and a target variable based on combinations of mutual information between sub-subsets of variables and the target. These approximations make it possible to estimate a series of low-variate densities instead of one large multivariate density. Low-variate densities are well suited to datasets with a high variable-to-sample ratio, since they are rather cheap in terms of computational cost and do not require a large number of samples to be estimated accurately. Numerous experimental results show the competitiveness of these new approaches. Finally, our thesis has led to freely available source code for MASSIVE and an open-source R and Bioconductor package for network inference. / Doctorat en sciences, Spécialisation Informatique / info:eu-repo/semantics/nonPublished
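A minimal sketch in the spirit of mutual-information network inference: pairwise mutual information is estimated from discretized expression profiles and the strongest dependencies are kept as edges. The binning, threshold and data are assumptions, and this is not the MRNET algorithm or package API itself:

```python
import numpy as np

# Estimate pairwise mutual information by discretizing each variable into
# histogram bins, then keep the strongest dependencies as network edges.

def mutual_information(x, y, bins=8):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                       # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)       # marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(0)
n_samples, n_genes = 50, 5                          # few samples, as in microarrays
X = rng.normal(size=(n_samples, n_genes))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n_samples)   # gene 1 tracks gene 0

mim = np.zeros((n_genes, n_genes))                  # mutual information matrix
for i in range(n_genes):
    for j in range(i + 1, n_genes):
        mim[i, j] = mim[j, i] = mutual_information(X[:, i], X[:, j])

edges = [(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)
         if mim[i, j] > 0.5]                        # threshold kept edges
print(edges)                                        # expect the (0, 1) dependency
```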
