Global ETD Search

191	Prediction of Protein-Protein Interaction Sites with Conditional Random Fields / Vorhersage der Protein-Protein Wechselwirkungsstellen mit Conditional Random Fields Dong, Zhijie 27 April 2012 (has links) No description available. 004 Informatik EGIT 050 Learning and adaptive systems Mathematics and Computer Science Conditional Random Fields (CRFs) Maschinelles Lernen Bioinformatik Mathematische Modelle Stochastische Modelle Conditional Random Fields (CRFs) protein-protein interaction prediction machine learning bioinformatics mathematical models stochastical models 54.80 Angewandte Informatik
192	[en] EXTRACTING AND CONNECTING PLAINTIFF S LEGAL CLAIMS AND JUDICIAL PROVISIONS FROM BRAZILIAN COURT DECISIONS / [pt] EXTRAÇÃO E CONEXÃO ENTRE PEDIDOS E DECISÕES JUDICIAIS DE UM TRIBUNAL BRASILEIRO WILLIAM PAULO DUCCA FERNANDES 03 November 2020 (has links) [pt] Neste trabalho, propomos uma metodologia para anotar decisões judiciais, criar modelos de Deep Learning para extração de informação, e visualizar de forma agregada a informação extraída das decisões. Instanciamos a metodologia em dois sistemas. O primeiro extrai modificações de um tribunal de segunda instância, que consiste em um conjunto de categorias legais que são comumente modificadas pelos tribunais de segunda instância. O segundo (i) extrai as causas que motivaram uma pessoa a propor uma ação judicial (causa de pedir), os pedidos do autor e os provimentos judiciais dessas ações proferidas pela primeira e segunda instância de um tribunal, e (ii) conecta os pedidos com os provimentos judiciais correspondentes. O sistema apresenta seus resultados através de visualizações. Extração de Informação para textos legais tem sido abordada usando diferentes técnicas e idiomas. Nossas propostas diferem dos trabalhos anteriores, pois nossos corpora são compostos por decisões de primeira e segunda instância de um tribunal brasileiro. Para extrair as informações, usamos uma abordagem tradicional de Aprendizado de Máquina e outra usando Deep Learning, tanto individualmente quanto como uma solução combinada. Para treinar e avaliar os sistemas, construímos quatro corpora: Kauane Junior para o primeiro sistema, e Kauane Insurance Report, Kauane Insurance Lower e Kauane Insurance Upper para o segundo. Usamos dados públicos disponibilizados pelo Tribunal de Justiça do Estado do Rio de Janeiro para construir os corpora. 
Para o Kauane Junior, o melhor modelo (Fbeta=1 de 94,79 por cento) foi uma rede neural bidirecional Long Short-Term Memory combinada com Conditional Random Fields (BILSTM-CRF); para o Kauane Insurance Report, o melhor (Fbeta=1 de 67,15 por cento) foi uma rede neural bidirecional Long Short-Term Memory com embeddings de caracteres concatenados a embeddings de palavras combinada com Conditional Random Fields (BILSTM-CE-CRF). Para o Kauane Insurance Lower, o melhor (Fbeta=1 de 89,12 por cento) foi uma BILSTM-CE-CRF; para o Kauane Insurance Upper, uma BILSTM-CRF (Fbeta=1 de 83,66 por cento). / [en] In this work, we propose a methodology to annotate Court decisions, create Deep Learning models to extract information, and visualize the aggregated information extracted from the decisions. We instantiate our methodology in two systems we have developed. The first one extracts Appellate Court modifications, a set of legal categories that are commonly modified by Appellate Courts. The second one (i) extracts the plaintiff's legal claims and each specific provision in legal opinions issued by lower and Appellate Courts, and (ii) connects each legal claim with the corresponding judicial provision. The system presents the results through visualizations. Information Extraction for legal texts has previously been addressed using different techniques and languages. Our proposals differ from previous work, since our corpora are composed of Brazilian lower and Appellate Court decisions. To automatically extract that information, we use a traditional Machine Learning approach and a Deep Learning approach, both as alternative solutions and as a combined solution. In order to train and evaluate the systems, we have built the Kauane Junior corpus for the first system, and three corpora for the second system – Kauane Insurance Report, Kauane Insurance Lower, and Kauane Insurance Upper. We used public data disclosed by the State Court of Rio de Janeiro to build the corpora. 
For Kauane Junior, the best model, which is a Bidirectional Long Short-Term Memory network combined with Conditional Random Fields (BILSTM-CRF), obtained an (F)beta=1 score of 94.79 percent. For Kauane Insurance Report, the best model, which is a Bidirectional Long Short-Term Memory network with character embeddings concatenated to word embeddings combined with Conditional Random Fields (BILSTM-CE-CRF), obtained an (F)beta=1 score of 67.15 percent. For Kauane Insurance Lower, the best model, which is a BILSTM-CE-CRF, obtained an (F)beta=1 score of 89.12 percent. For Kauane Insurance Upper, the best model, which is a BILSTM-CRF, obtained an (F)beta=1 score of 83.66 percent. [pt] APRENDIZADO DE MAQUINA [pt] PROVISOES MODIFICATORIAS [pt] CONDITIONAL RANDOM FIELDS [pt] LONG SHORT-TERM MEMORY [pt] REDES NEURAIS RECORRENTES [pt] APRENDIZADO PROFUNDO [pt] PROCESSAMENTO DE LINGUAGEM NATURAL [pt] EXTRACAO DE INFORMACAO [pt] DIREITO [en] MACHINE LEARNING [en] MODIFICATORY PROVISIONS [en] CONDITIONAL RANDOM FIELDS [en] LONG SHORT-TERM MEMORY [en] RECURRENT NEURAL NETWORKS [en] DEEP LEARNING [en] NATURAL LANGUAGE PROCESSING [en] EXTRATION OF INFORMATION [en] LAW
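The scores reported for entry 192 are F-measures with beta = 1, i.e. the harmonic mean of precision and recall. As a minimal sketch of that standard formula (the function name `f_beta` and its arguments are illustrative, not taken from the thesis):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta measure; beta = 1 weights precision and recall equally."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # No true positives at all: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator
```

With beta = 1 the expression reduces to 2PR / (P + R), so a model with equal precision and recall has an F-score equal to both.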
193	3D real time object recognition Amplianitis, Konstantinos 01 March 2017 (has links) Die Objekterkennung ist ein natürlicher Prozess im menschlichen Gehirn. Sie findet im visuellen Kortex statt und nutzt die binokulare Eigenschaft der Augen, die eine dreidimensionale Interpretation von Objekten in einer Szene erlaubt. Kameras ahmen das menschliche Auge nach. Bilder von zwei Kameras, in einem Stereokamerasystem, werden von Algorithmen für eine automatische, dreidimensionale Interpretation von Objekten in einer Szene benutzt. Die Entwicklung von Hard- und Software verbessern den maschinellen Prozess der Objekterkennung und erreicht qualitativ immer mehr die Fähigkeiten des menschlichen Gehirns. Das Hauptziel dieses Forschungsfeldes ist die Entwicklung von robusten Algorithmen für die Szeneninterpretation. Sehr viel Aufwand wurde in den letzten Jahren in der zweidimensionalen Objekterkennung betrieben, im Gegensatz zur Forschung zur dreidimensionalen Erkennung. Im Rahmen dieser Arbeit soll demnach die dreidimensionale Objekterkennung weiterentwickelt werden: hin zu einer besseren Interpretation und einem besseren Verstehen von sichtbarer Realität wie auch der Beziehung zwischen Objekten in einer Szene. In den letzten Jahren aufkommende low-cost Verbrauchersensoren, wie die Microsoft Kinect, generieren Farb- und Tiefendaten einer Szene, um menschenähnliche visuelle Daten zu generieren. Das Ziel hier ist zu zeigen, wie diese Daten benutzt werden können, um eine neue Klasse von dreidimensionalen Objekterkennungsalgorithmen zu entwickeln - analog zur Verarbeitung im menschlichen Gehirn. / Object recognition is a natural process of the human brain. It is performed in the visual cortex and relies on a binocular depth perception system that renders a three-dimensional representation of the objects in a scene. 
Computer and software systems have so far been used to simulate the perception of three-dimensional environments with the aid of sensors that capture real-time images. These images serve as input data for further analysis and for the development of algorithms, an essential ingredient for simulating the complexity of human vision, so as to achieve scene interpretation for object recognition in a way similar to how the human brain perceives it. Rapid technological advances in hardware and software are continuously bringing the machine-based process for object recognition nearer to the human vision prototype. The key in this field is the development of algorithms that achieve robust scene interpretation. Considerable effort has been successfully invested over the years in 2D object recognition, as opposed to 3D. This dissertation therefore aims to contribute to the enhancement of 3D object recognition: a better interpretation and understanding of reality and of the relationships between objects in a scene. Using low-cost commodity sensors such as the Microsoft Kinect, RGB and depth data of a scene have been retrieved and manipulated in order to generate human-like visual perception data. The goal is to show how RGB and depth information can be used to develop a new class of 3D object recognition algorithms, analogous to the perception processed by the human brain. 3D Objekt Erkennung 3D Mensch Segmentierung Objekt Erkennung Conditional Random Fields Kinect-Sensor RGBD-Daten 3D Rekonstruktionen Bundle Adjustment ICP Registrierung 3D Object Recognition 3D Human Segmentation Object Detection Conditional Random Fields Kinect Sensor RGBD Data 3D Reconstructions Bundle Adjustment ICP registration 004 Informatik 28 Informatik, Datenverarbeitung ST 330 ddc:004
194	Échantillonnage dynamique de champs markoviens Breuleux, Olivier 11 1900 (has links) L'un des modèles d'apprentissage non-supervisé générant le plus de recherche active est la machine de Boltzmann --- en particulier la machine de Boltzmann restreinte, ou RBM. Un aspect important de l'entraînement ainsi que l'exploitation d'un tel modèle est la prise d'échantillons. Deux développements récents, la divergence contrastive persistante rapide (FPCD) et le herding, visent à améliorer cet aspect, se concentrant principalement sur le processus d'apprentissage en tant que tel. Notamment, le herding renonce à obtenir un estimé précis des paramètres de la RBM, définissant plutôt une distribution par un système dynamique guidé par les exemples d'entraînement. Nous généralisons ces idées afin d'obtenir des algorithmes permettant d'exploiter la distribution de probabilités définie par une RBM pré-entraînée, par tirage d'échantillons qui en sont représentatifs, et ce sans que l'ensemble d'entraînement ne soit nécessaire. Nous présentons trois méthodes: la pénalisation d'échantillon (basée sur une intuition théorique) ainsi que la FPCD et le herding utilisant des statistiques constantes pour la phase positive. Ces méthodes définissent des systèmes dynamiques produisant des échantillons ayant les statistiques voulues et nous les évaluons à l'aide d'une méthode d'estimation de densité non-paramétrique. Nous montrons que ces méthodes mixent substantiellement mieux que la méthode conventionnelle, l'échantillonnage de Gibbs. / One of the most active topics of research in unsupervised learning is the Boltzmann machine --- particularly the Restricted Boltzmann Machine or RBM. In order to train, evaluate or exploit such models, one has to draw samples from it. Two recent algorithms, Fast Persistent Contrastive Divergence (FPCD) and Herding aim to improve sampling during training. 
In particular, herding gives up on obtaining a point estimate of the RBM's parameters, instead defining the model's distribution through a dynamical system guided by training samples. We generalize these ideas in order to obtain algorithms capable of exploiting the probability distribution defined by a pre-trained RBM, by sampling from it, without needing to make use of the training set. We present three methods: Sample Penalization, which is based on a theoretical argument, and FPCD and Herding using constant statistics for their positive phases. These methods define dynamical systems producing samples with the right statistics, and we evaluate them using non-parametric density estimation. We show that these methods mix substantially better than Gibbs sampling, which is the conventional sampling method used for RBMs. Apprentissage machine Champs markoviens Machine de Boltzmann MCMC Modèles probabilistes Machine learning Markov random fields Boltzmann machine MCMC Probabilistic models
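For readers unfamiliar with the baseline that entry 194 compares against, the following is a minimal sketch of block Gibbs sampling in a binary RBM. It illustrates the conventional sampler only, not the thesis's Sample Penalization, FPCD or Herding variants; the names `W`, `b`, `c`, `gibbs_step` and `sample_chain` are assumptions made for this example.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep of a binary RBM.

    W[i][j] couples visible unit i with hidden unit j;
    b and c are the visible and hidden biases (illustrative names).
    """
    n_v, n_h = len(b), len(c)
    # Sample every hidden unit given the visible layer.
    h = [
        1 if rng.random() < sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(n_v))) else 0
        for j in range(n_h)
    ]
    # Sample every visible unit given the freshly sampled hidden layer.
    v_new = [
        1 if rng.random() < sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(n_h))) else 0
        for i in range(n_v)
    ]
    return v_new, h


def sample_chain(W, b, c, n_steps, seed=0):
    """Run a single Gibbs chain from a random start and return the visible samples."""
    rng = random.Random(seed)
    v = [rng.randint(0, 1) for _ in b]
    samples = []
    for _ in range(n_steps):
        v, _ = gibbs_step(v, W, b, c, rng)
        samples.append(v)
    return samples
```

Consecutive samples from such a chain are correlated, and the chain can stay near one mode for a long time; this slow mixing is the behaviour the abstract reports improving on.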
195	Inférence topologique Prévost, Noémie 02 1900 (has links) Les données provenant de l'échantillonnage fin d'un processus continu (champ aléatoire) peuvent être représentées sous forme d'images. Un test statistique permettant de détecter une différence entre deux images peut être vu comme un ensemble de tests où chaque pixel est comparé au pixel correspondant de l'autre image. On utilise alors une méthode de contrôle de l'erreur de type I au niveau de l'ensemble de tests, comme la correction de Bonferroni ou le contrôle du taux de faux-positifs (FDR). Des méthodes d'analyse de données ont été développées en imagerie médicale, principalement par Keith Worsley, utilisant la géométrie des champs aléatoires afin de construire un test statistique global sur une image entière. Il s'agit d'utiliser l'espérance de la caractéristique d'Euler de l'ensemble d'excursion du champ aléatoire sous-jacent à l'échantillon au-delà d'un seuil donné, pour déterminer la probabilité que le champ aléatoire dépasse ce même seuil sous l'hypothèse nulle (inférence topologique). Nous exposons quelques notions portant sur les champs aléatoires, en particulier l'isotropie (la fonction de covariance entre deux points du champ dépend seulement de la distance qui les sépare). Nous discutons de deux méthodes pour l'analyse des champs anisotropes. La première consiste à déformer le champ puis à utiliser les volumes intrinsèques et les compacités de la caractéristique d'Euler. La seconde utilise plutôt les courbures de Lipschitz-Killing. Nous faisons ensuite une étude de niveau et de puissance de l'inférence topologique en comparaison avec la correction de Bonferroni. Finalement, nous utilisons l'inférence topologique pour décrire l'évolution du changement climatique sur le territoire du Québec entre 1991 et 2100, en utilisant des données de température simulées et publiées par l'Équipe Simulations climatiques d'Ouranos selon le modèle régional canadien du climat. 
/ Data coming from a fine sampling of a continuous process (random field) can be represented as images. A statistical test aiming at detecting a difference between two images can be seen as a group of tests in which each pixel is compared to the corresponding pixel in the other image. We then use a method to control the type I error over all the tests, such as the Bonferroni correction or the control of the false discovery rate (FDR). Methods of data analysis have been developed in the field of medical imaging, mainly by Keith Worsley, using the geometry of random fields in order to build a global statistical test over the whole image. The expected Euler characteristic of the excursion set of the random field underlying the sample above a given threshold is used to determine the probability that the random field exceeds this same threshold under the null hypothesis (topological inference). We present some notions relevant to random fields, in particular isotropy (the covariance function between two given points of a field depends only on the distance between them). We discuss two methods for the analysis of non-isotropic random fields. The first one consists of deforming the field and then using the intrinsic volumes and the Euler characteristic densities. The second one uses the Lipschitz-Killing curvatures. We then study the level and power of the topological inference technique, comparing it to the Bonferroni correction. Finally, we use topological inference to describe the evolution of climate change over Quebec territory between 1991 and 2100 using temperature data simulated and published by the Climate Simulation Team at Ouranos, with the Canadian Regional Climate Model CRCM4.2. 
Comparaisons multiples Caractéristique d'Euler Champs aléatoires Isotropie Courbures de Lipschitz-Killing Inférence topologique Changement climatique Multiple comparisons Euler characteristic Random fields Isotropy Lipschitz-Killing curvatures Climate change
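The two pixel-wise baselines named in entry 195, the Bonferroni correction and FDR control, can be sketched as follows. This is a hedged illustration of the textbook procedures (the Benjamini-Hochberg step-up rule is assumed for FDR control, which the abstract does not specify); the function names and the list of p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each test whose p-value is at most alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject
```

Both procedures treat each pixel as a separate test and ignore the spatial correlation between neighbouring pixels, which is the information the random-field approach described in the abstract uses.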
196	Développement de modèles géostatistiques à l’aide d’équations aux dérivées partielles stochastiques / Development of geostatistical models using stochastic partial differential equations Carrizo Vergara, Ricardo 18 December 2018 (has links) Ces travaux présentent des avancées théoriques pour l'application de l'approche EDPS (Équation aux Dérivées Partielles Stochastique) en Géostatistique. On considère dans cette approche récente que les données régionalisées proviennent de la réalisation d'un Champ Aléatoire satisfaisant une EDPS. Dans le cadre théorique des Champs Aléatoires Généralisés, l'influence d'une EDPS linéaire sur la structure de covariance de ses éventuelles solutions a été étudiée avec une grande généralité. Un critère d'existence et d'unicité des solutions stationnaires pour une classe assez large d'EDPSs linéaires a été obtenu, ainsi que des expressions pour les mesures spectrales associées. Ces résultats permettent de développer des modèles spatio-temporels présentant des propriétés non-triviales grâce à l'analyse d'équations d'évolution présentant un ordre de dérivation temporel fractionnaire. Des paramétrisations adaptées de ces modèles permettent de contrôler leur séparabilité et leur symétrie ainsi que leur régularité spatiale et temporelle séparément. Des résultats concernant des solutions stationnaires pour des EDPSs issues de la physique telles que l'équation de la Chaleur et l'équation d'Onde sont présentés. Puis, une méthode de simulation non-conditionnelle adaptée à ces modèles est étudiée. Cette méthode est basée sur le calcul d'une approximation de la Transformée de Fourier du champ, et elle peut être implémentée de façon efficace grâce à la Transformée de Fourier Rapide. La convergence de cette méthode a été montrée théoriquement dans un sens faible et dans un sens fort. Cette méthode est appliquée à la résolution numérique des EDPSs présentées dans ces travaux. 
Des illustrations de modèles présentant des propriétés non-triviales et reliés à des équations de la physique sont alors présentées. / This dissertation presents theoretical advances in the application of the Stochastic Partial Differential Equation (SPDE) approach in Geostatistics. This recently developed approach consists in interpreting a regionalised data set as a realisation of a Random Field satisfying an SPDE. Within the theoretical framework of Generalized Random Fields, the influence of a linear SPDE on the covariance structure of its potential solutions can be studied in great generality. A criterion for the existence and uniqueness of stationary solutions for a wide class of linear SPDEs has been obtained, together with an expression for the related spectral measures. These results make it possible to develop spatio-temporal covariance models with non-trivial properties through the analysis of evolution equations with a fractional temporal derivative order. Suitable parametrizations of such models make it possible to control their separability and symmetry, and to control their spatial and temporal regularity separately. Results concerning stationary solutions for physically inspired SPDEs such as the Heat equation and the Wave equation are also presented. A method of non-conditional simulation adapted to these models is then studied. This method is based on the computation of an approximation of the Fourier Transform of the field, and it can be implemented efficiently thanks to the Fast Fourier Transform algorithm. The convergence of this method has been theoretically proven in suitable weak and strong senses. This method is applied to numerically solve the SPDEs studied in this work. Illustrations of models presenting non-trivial properties and related to physically driven equations are then given. 
Modèles géostatistiques Champs aléatoires généralisés Approche EDPS Géostatistique spatio-temporelle Simulation Geostatistical models Generalized random fields SPDE Approach Space-time geostatistics Simulation 551.015
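As a hedged illustration of how a linear SPDE constrains the covariance structure of its stationary solutions (a classical textbook case, not a result taken from entry 196): if a stationary field U solves the equation below on R^d with W a Gaussian white noise, then the spectral density of U is the flat white-noise density divided by the squared modulus of the operator's symbol, which yields a Matérn covariance. The symbols kappa, alpha and d are assumptions made for this example, and the proportionality constant depends on the Fourier convention.

```latex
\[
(\kappa^2 - \Delta)^{\alpha/2}\, U = W
\quad\Longrightarrow\quad
f_U(\xi) \;\propto\; \frac{1}{\left(\kappa^2 + |\xi|^2\right)^{\alpha}},
\qquad \xi \in \mathbb{R}^d,\ \kappa > 0,\ \alpha > d/2 .
\]
```

The condition alpha > d/2 makes the spectral density integrable, so that U has finite variance.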
197	Microstructural optimization of Solid Oxide Cells : a coupled stochastic geometrical and electrochemical modeling approach applied to LSCF-CGO electrode / Optimisation microstructurale des cellules à oxydes solides : approche numérique couplant modélisation géométrique et électrochimique appliquée à l'électrode LSCF-CGO Moussaoui, Hamza 29 April 2019 (has links) Ce travail porte sur la compréhension de l’impact de la microstructure sur les performances des Cellules à Oxyde Solide (SOC), avec une illustration sur l’électrode à oxygène en LSCF-CGO. Une approche couplant de la modélisation géométrique et électrochimique a été adoptée pour cet effet. Le modèle des champs aléatoires plurigaussiens et un autre basé sur des empilements de sphères ont été développés et adaptés pour les microstructures des SOCs. Ces modèles 3D de géométrie stochastique ont été ensuite validés sur différentes électrodes reconstruites par nano-holotomographie aux rayons X au synchrotron ou par tomographie avec un microscope électronique à balayage couplé à une sonde ionique focalisée. Ensuite, des corrélations semi-analytiques ont été proposées et validées sur une large base de microstructures synthétiques. Ces relations permettent de relier les paramètres ‘primaires’ de l’électrode (la composition, la porosité et les diamètres des phases) aux paramètres qui pilotent les réactions électrochimiques (la densité de points triples, les surfaces spécifiques interphases) et sont particulièrement pertinents pour les équipes de mise-en-forme des électrodes qui ont plus de contrôle sur ce premier ensemble de paramètres. Concernant la partie portant sur l’électrochimie, des tests sur une cellule symétrique en LSCF-CGO ont permis de valider un modèle déjà développé au sein du laboratoire, et qui permet de simuler la réponse électrochimique d’une électrode à oxygène à partir des données thermodynamiques et de microstructure. 
Finalement, le couplage des deux modèles validés a permis d’étudier l’impact de la composition des électrodes, leur porosité ou encore taille des grains sur leurs performances. Ces résultats pourront guider les équipes de mise-en-forme des électrodes vers des électrodes plus optimisées. / This work aims at better understanding the impact of Solid Oxide Cells (SOC) microstructure on their performance, with an illustration on an LSCF-CGO electrode. A coupled 3D stochastic geometrical and electrochemical modeling approach has been adopted. In this frame, a plurigaussian random field model and an in-house sphere packing algorithm have been adapted to simulate the microstructure of SOCs. The geometrical models have been validated on different electrodes reconstructed by synchrotron X-ray nano-holotomography or focused ion-beam tomography. Afterwards, semi-analytical microstructural correlations have been proposed and validated on a large dataset of representative synthetic microstructures. These relationships allow establishing the link between the electrode ‘basic’ parameters (composition, porosity and grain size), to the ‘key’ electrochemical parameters (Triple Phase Boundary length density and Specific surface areas), and are particularly useful for cell manufacturers who can easily control the first set of parameters. Concerning the electrochemical part, a reference symmetrical cell made of LSCF-CGO has been tested in a three-electrode setup. This enabled the validation of an oxygen electrode model that links the electrode morphological parameters to its polarization resistance, taking into account the thermodynamic data. Finally, the coupling of the validated models has enabled the investigation of the impact of electrode composition, porosity and grain size on the cell electrochemical performance, and thus providing useful insights to cell manufacturers. 
Cellule à Oxyde solide Champs aléatoires plurigaussiens Modèle d’empilement de sphères Caractérisation 3D des microstructures Corrélations microstructurales Solid Oxide Cell Plurigaussian random fields Sphere packing model 3D microstructure characterization Microstructural correlations 530
198	Segmentation d'images de documents manuscrits composites : application aux documents de chimie / Heterogenous handwritten document image segmentation : application to chemistry document Ghanmi, Nabil 30 September 2016 (has links) Cette thèse traite de la segmentation structurelle de documents issus de cahiers de chimie. Ce travail est utile pour les chimistes en vue de prendre connaissance des conditions des expériences réalisées. Les documents traités sont manuscrits, hétérogènes et multi-scripteurs. Bien que leur structure physique soit relativement simple, une succession de trois régions représentant : la formule chimique de l’expérience, le tableau des produits utilisés et un ou plusieurs paragraphes textuels décrivant le déroulement de l’expérience, les lignes limitrophes des régions portent souvent à confusion, ajouté à cela des irrégularités dans la disposition des cellules du tableau, rendant le travail de séparation un vrai défi. La méthodologie proposée tient compte de ces difficultés en opérant une segmentation à plusieurs niveaux de granularité, et en traitant la segmentation comme un problème de classification. D’abord, l’image du document est segmentée en structures linéaires à l’aide d’un lissage horizontal approprié. Le seuil horizontal combiné avec une tolérance verticale avantage le regroupement des éléments fragmentés de la formule sans trop fusionner le texte. Ces structures linéaires sont classées en Texte ou Graphique en s’appuyant sur des descripteurs structurels spécifiques, caractéristiques des deux classes. Ensuite, la segmentation est poursuivie sur les lignes textuelles pour séparer les lignes du tableau de celles de la description. Nous avons proposé pour cette classification un modèle CAC qui permet de déterminer la séquence optimale d’étiquettes associées à la séquence des lignes d’un document. 
Le choix de ce type de modèle a été motivé par sa capacité à absorber la variabilité des lignes et à exploiter les informations contextuelles. Enfin, pour le problème de la segmentation de tableaux en cellules, nous avons proposé une méthode hybride qui fait coopérer deux niveaux d’analyse : structurel et syntaxique. Le premier s’appuie sur la présence des lignes graphiques et de l’alignement de texte et d’espaces ; et le deuxième tend à exploiter la cohérence de la syntaxe très réglementée du contenu des cellules. Nous avons proposé, dans ce cadre, une approche contextuelle pour localiser les champs numériques dans le tableau, avec reconnaissance des chiffres isolés et connectés. La thèse étant effectuée dans le cadre d’une convention CIFRE, en collaboration avec la société eNovalys, nous avons implémenté et testé les différentes étapes du système sur une base conséquente de documents de chimie / This thesis deals with chemistry document segmentation and structure analysis. This work aims to help chemists by providing the information on the experiments which have already been carried out. The documents are handwritten, heterogeneous and multi-writers. Although their physical structure is relatively simple, since it consists of a succession of three regions representing: the chemical formula of the experiment, a table of the used products and one or more text blocks describing the experimental procedure, several difficulties are encountered. In fact, the lines located at the region boundaries and the imperfections of the table layout make the separation task a real challenge. The proposed methodology takes into account these difficulties by performing segmentation at several levels and treating the region separation as a classification problem. First, the document image is segmented into linear structures using an appropriate horizontal smoothing. 
The horizontal threshold, combined with a vertical overlap tolerance, favors the consolidation of fragmented elements of the formula without merging too much of the text. These linear structures are classified as text or graphic based on discriminant structural features. Segmentation then continues on the text lines to separate the rows of the table from the lines of the raw text blocks. For this classification we proposed a CRF model that determines the optimal labelling of the line sequence. The choice of this kind of model was motivated by its ability to absorb the variability of lines and to exploit contextual information. For the segmentation of tables into cells, we proposed a hybrid method that combines two levels of analysis: structural and syntactic. The first relies on the presence of graphic lines and the alignment of both text and spaces. The second exploits the coherence of the highly regulated cell content syntax. In this context, we proposed a recognition-based approach that uses contextual knowledge to detect the numeric fields present in the table. The thesis was carried out under a CIFRE agreement, in collaboration with the company eNovalys. We implemented and tested all the steps of the proposed system on a sizeable dataset of chemistry documents. Document de chimie Structures linéaires Séparation texte/graphique Classification Extraction de tableaux Champs aléatoires conditionnels Extraction de numériques Chemistry document Linear structure Text/Graphic separation Classification Table extraction Conditional Random Fields Numeric extraction 006.32 006.4
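Entry 198 describes labelling a sequence of text lines (table row versus description) with a CRF that finds the optimal label sequence. Decoding in a linear-chain CRF is typically done with the Viterbi algorithm; the sketch below assumes the per-line and transition scores are already given, and its names (`viterbi`, `emissions`, `transitions`) and example values are hypothetical rather than taken from the thesis.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence of a linear-chain model.

    emissions[t][y]      : score of label y for line t
    transitions[p][y]    : score of moving from label p to label y
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    back = []  # back[t - 1][y] = best previous label when line t has label y
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


# Labels: 0 = table row, 1 = description line (illustrative values).
emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0], [0.0, 2.0]]
transitions = [[1.0, 0.0], [-5.0, 1.0]]  # going back from description to table is penalised
labels = viterbi(emissions, transitions)
```

In this example the second line looks slightly more like description text when considered alone, but the transition scores make the decoder label it as a table row, which is the kind of contextual information the abstract refers to.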
199	Modélisation stochastique de l'expression des gènes et inférence de réseaux de régulation / From stochastic modelling of gene expression to inference of regulatory networks Herbach, Ulysse 27 September 2018 (has links) L'expression des gènes dans une cellule a longtemps été observable uniquement à travers des quantités moyennes mesurées sur des populations. L'arrivée des techniques «single-cell» permet aujourd'hui d'observer des niveaux d'ARN et de protéines dans des cellules individuelles : il s'avère que même dans une population de génome identique, la variabilité entre les cellules est parfois très forte. En particulier, une description moyenne est clairement insuffisante étudier la différenciation cellulaire, c'est-à-dire la façon dont les cellules souches effectuent des choix de spécialisation. Dans cette thèse, on s'intéresse à l'émergence de tels choix à partir de réseaux de régulation sous-jacents entre les gènes, que l'on souhaiterait pouvoir inférer à partir de données. Le point de départ est la construction d'un modèle stochastique de réseaux de gènes capable de reproduire les observations à partir d'arguments physiques. Les gènes sont alors décrits comme un système de particules en interaction qui se trouve être un processus de Markov déterministe par morceaux, et l'on cherche à obtenir un modèle statistique à partir de sa loi invariante. Nous présentons deux approches : la première correspond à une approximation de champ assez populaire en physique, pour laquelle nous obtenons un résultat de concentration, et la deuxième se base sur un cas particulier que l'on sait résoudre explicitement, ce qui aboutit à un champ de Markov caché aux propriétés intéressantes / Gene expression in a cell has long been only observable through averaged quantities over cell populations. 
The recent development of single-cell transcriptomics has enabled gene expression to be measured in individual cells: it turns out that even in an isogenic population, the molecular variability can be very large. In particular, an averaged description is not sufficient to account for cell differentiation. In this thesis, we are interested in the emergence of such cell decision-making from underlying gene regulatory networks, which we would like to infer from data. The starting point is the construction of a stochastic gene network model that is able to explain the data using physical arguments. Genes are then seen as an interacting particle system that happens to be a piecewise-deterministic Markov process, and our aim is to derive a tractable statistical model from its stationary distribution. We present two approaches: the first one is a popular field approximation, for which we obtain a concentration result, and the second one is based on an analytically tractable particular case, which provides a hidden Markov random field with interesting properties. Expression stochastique des gènes Réseaux de régulation Champs de Markov Stochastic gene expression Single-cell data Gene regulatory networks Piecewise-deterministic Markov processes Markov random fields 519.8
200	Reconnaissance d’activités humaines à partir de séquences vidéo / Human activity recognition from video sequences Selmi, Mouna 12 December 2014 (has links) This thesis falls within the context of activity recognition from video sequences, one of the major concerns in computer vision. The application domains for such vision systems are numerous, notably video surveillance, automatic search and indexing of videos, and assistance to the elderly. The task remains difficult given the large variations in the way activities are performed, in the appearance of the person, and in the acquisition conditions. The main objective of this thesis is to propose a recognition method that is effective across these different factors of variability. Representations based on interest points have proven effective in state-of-the-art work; they have generally been coupled with global classification methods, since these primitives are temporally and spatially unordered. The most recent work reaches high performance by modeling the spatio-temporal context of interest points; for example, some approaches encode the neighborhood of interest points at several scales. We propose an activity recognition method that explicitly models the sequential aspect of activities while exploiting the robustness of interest points in real conditions. We begin by extracting interest points, whose robustness with respect to the person's identity we show through a tensor-based study. 
These primitives are then represented as a sequence of local bags of words (BOW): the video sequence is segmented temporally using a sliding window, and each segment thus obtained is represented by the BOW of the interest points belonging to it. The first level of our hybrid sequential classification system applies support vector machines (SVM) as a low-level classifier to convert the local BOWs into vectors of activity-class probabilities. The resulting sequences of probability vectors are used as input to the sequential classifier, a hidden conditional random field (HCRF). The latter classifies time series discriminatively while modeling their internal structure through hidden states. We evaluated our approach on public datasets with diverse characteristics. The results obtained appear promising compared with those of state-of-the-art work. In addition, we showed that using a low-level classifier improves the performance of the recognition system, since the HCRF sequential classifier directly processes semantic information from the local BOWs, namely the probability of each activity for the segment in question. Moreover, the probability vectors have low dimension, which helps avoid the overfitting that can arise when the dimension of the feature vector exceeds the number of data points, as is the case with BOWs, which are generally high-dimensional. Estimating the HCRF parameters in a reduced-dimension space also reduces the training time. / Human activity recognition (HAR) from video sequences is one of the major active research areas of computer vision. 
HAR systems have numerous applications, including video surveillance, automatic search and indexing of videos, and assistance for frail elderly people. This task remains a challenge because of the large variations in how activities are performed, in the appearance of the person, and in the acquisition conditions. The main objective of this thesis is to develop an efficient HAR method that is robust to different sources of variability. Approaches based on interest points have shown excellent state-of-the-art performance in recent years. They are generally coupled with global classification methods, as these primitives are temporally and spatially unordered. More recent studies have achieved high performance by modeling the spatial and temporal context of interest points, for instance by encoding the neighborhood of the interest points over several scales. In this thesis, we propose an activity recognition method based on a hybrid Support Vector Machine and Hidden Conditional Random Field (SVM-HCRF) model that captures the sequential aspect of activities while exploiting the robustness of interest points in real conditions. We first extract the interest points and show their robustness with respect to the person's identity by a multilinear tensor analysis. These primitives are then represented as a sequence of local "Bags of Words" (BOW): the video is temporally segmented using the sliding window technique, and each of the segments thus obtained is represented by the BOW of the interest points belonging to it. The first layer of our hybrid sequential classification system is a Support Vector Machine that converts each local BOW extracted from the video sequence into a vector of activity-class probabilities. The sequence of probability vectors thus obtained is used as input to the HCRF. The latter permits a discriminative classification of time series while modeling their internal structures via the hidden states. 
We have evaluated our approach on various human activity datasets. The results achieved are competitive with those of the current state of the art. We have also shown that the use of a low-level classifier (SVM) improves the performance of the recognition system, since the sequential classifier HCRF directly exploits the semantic information from local BOWs, namely the probability of each activity relative to the current local segment, rather than mere raw information from interest points. Furthermore, the probability vectors are low-dimensional, which significantly reduces the risk of overfitting that can occur if the feature vector dimension is high relative to the training data size; this is precisely the case when using BOWs, which generally have a very high dimension. Estimating the HCRF parameters in a low-dimensional space also significantly reduces the duration of the HCRF training phase. Reconnaissance des activités; Points d’intérêt; Points denses; Analyse tensorielle multilinéaire; Séparateurs à vaste marge; Champs aléatoires conditionnels cachés; Human activity recognition; Interest points; Dense points; Multilinear tensor analysis; Classification of sequential data; Support vector machines; Hidden conditional random fields
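The sliding-window step described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the author's implementation: it assumes interest points have already been quantized into integer visual-word IDs per frame, and the function name `sliding_window_bows` and its parameters `vocab_size`, `window`, and `step` are hypothetical. The SVM and HCRF stages are not shown.

```python
from collections import Counter


def sliding_window_bows(frame_words, vocab_size, window, step):
    """Turn per-frame visual-word IDs into a sequence of local BOW histograms.

    frame_words[i] is the list of visual-word IDs (ints in [0, vocab_size))
    found in frame i. Each run of `window` consecutive frames yields one
    L1-normalised histogram, and windows advance by `step` frames.
    A sequence shorter than `window` yields a single partial window.
    """
    bows = []
    last_start = max(len(frame_words) - window, 0)
    for start in range(0, last_start + 1, step):
        segment = frame_words[start:start + window]
        counts = Counter(word for frame in segment for word in frame)
        total = sum(counts.values())
        # An empty segment gives an all-zero histogram instead of dividing by zero.
        hist = [counts.get(k, 0) / total if total else 0.0 for k in range(vocab_size)]
        bows.append(hist)
    return bows


if __name__ == "__main__":
    frames = [[0, 1], [1], [2, 2], [0]]
    for hist in sliding_window_bows(frames, vocab_size=3, window=2, step=1):
        print(hist)
```

In a pipeline of the kind described above, each histogram would then be mapped by the low-level classifier to a vector of activity-class probabilities, and the resulting sequence would be passed to the sequential classifier.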
