Spelling suggestions: "subject:"multirelational data"" "subject:"multirrelational data""
1 |
Mineração multirrelacional de regras de associação em grandes bases de dadosOyama, Fernando Takeshi [UNESP] 22 February 2010 (has links) (PDF)
Made available in DSpace on 2014-06-11T19:29:40Z (GMT). No. of bitstreams: 0
Previous issue date: 2010-02-22Bitstream added on 2014-06-13T20:39:07Z : No. of bitstreams: 1
oyama_ft_me_sjrp.pdf: 1107324 bytes, checksum: 0977db2af1589dece4aa46b5882d84d6 (MD5) / O crescente avanço e a disponibilidade de recursos computacionais viabilizam o armazenamento e a manipulação de grandes bases de dados. As técnicas típicas de mineração de dados possibilitam a extração de padrões desde que os dados estejam armazenados em uma única tabela. A mineração de dados multirrelacional, por sua vez, apresenta-se como uma abordagem mais recente que permite buscar padrões provenientes de múltiplas tabelas, sendo indicada para a aplicação em bases de dados relacionais. No entanto, os algoritmos multirrelacionais de mineração de regras de associação existentes tornam-se impossibilitados de efetuar a tarefa de mineração em grandes volumes de dados, uma vez que a quantia de memória exigida para a conclusão do processamento ultrapassa a quantidade disponível. O objetivo do presente trabalho consiste em apresentar um algoritmo multirrelacional de extração de regras de associação com o foco na aplicação em grandes bases de dados relacionais. Para isso, o algoritmo proposto, MR-RADIX, apresenta uma estrutura denominada Radix-tree que representa comprimidamente a base de dados em memória. Além disso, o algoritmo utiliza-se do conceito de particionamento para subdividir a base de dados, de modo que cada partição possa ser processada integralmente em memória. Os testes realizados demonstram que o algoritmo MR-RADIX proporciona um desempenho superior a outros algoritmos correlatos e, ainda, efetua com êxito, diferentemente dos demais, a mineração de regras de associação em grandes bases de dados. / The increasing spread and availability of computing resources make feasible storage and handling of large databases. Traditional techniques of data mining allows the extraction of patterns provided that data is stored in a single table. The multi- relational data mining presents itself as a more recent approach that allows search patterns from multiple tables, indicated for use in relational databases. However, the existing multi-relational association rules mining algorithms become unable to make mining task in large data, since the amount of memory required for the completion of processing exceed the amount available. The goal of this work is to present a multi- relational algorithm for extracting association rules with focus application in large relational databases. For this the proposed algorithm MR-RADIX presents a structure called Radix-tree that represents compressly the database in memory. Moreover, the algorithm uses the concept of partitioning to subdivide the database, so that each partition can be processed entirely in memory. The tests show that the MR-RADIX algorithm provides better performance than other related algorithms, and also performs successfully, unlike others, the association rules mining in large databases.
|
2 |
Mineração multirrelacional de regras de associação em grandes bases de dados /Oyama, Fernando Takeshi. January 2010 (has links)
Orientador: Carlos Roberto Valêncio / Banca: Cristina Dutra de Aguiar Ciferri / Banca: Rogéria Cristiane Gratão de Souza / Resumo: O crescente avanço e a disponibilidade de recursos computacionais viabilizam o armazenamento e a manipulação de grandes bases de dados. As técnicas típicas de mineração de dados possibilitam a extração de padrões desde que os dados estejam armazenados em uma única tabela. A mineração de dados multirrelacional, por sua vez, apresenta-se como uma abordagem mais recente que permite buscar padrões provenientes de múltiplas tabelas, sendo indicada para a aplicação em bases de dados relacionais. No entanto, os algoritmos multirrelacionais de mineração de regras de associação existentes tornam-se impossibilitados de efetuar a tarefa de mineração em grandes volumes de dados, uma vez que a quantia de memória exigida para a conclusão do processamento ultrapassa a quantidade disponível. O objetivo do presente trabalho consiste em apresentar um algoritmo multirrelacional de extração de regras de associação com o foco na aplicação em grandes bases de dados relacionais. Para isso, o algoritmo proposto, MR-RADIX, apresenta uma estrutura denominada Radix-tree que representa comprimidamente a base de dados em memória. Além disso, o algoritmo utiliza-se do conceito de particionamento para subdividir a base de dados, de modo que cada partição possa ser processada integralmente em memória. Os testes realizados demonstram que o algoritmo MR-RADIX proporciona um desempenho superior a outros algoritmos correlatos e, ainda, efetua com êxito, diferentemente dos demais, a mineração de regras de associação em grandes bases de dados. / Abstract: The increasing spread and availability of computing resources make feasible storage and handling of large databases. Traditional techniques of data mining allows the extraction of patterns provided that data is stored in a single table. The multi- relational data mining presents itself as a more recent approach that allows search patterns from multiple tables, indicated for use in relational databases. However, the existing multi-relational association rules mining algorithms become unable to make mining task in large data, since the amount of memory required for the completion of processing exceed the amount available. The goal of this work is to present a multi- relational algorithm for extracting association rules with focus application in large relational databases. For this the proposed algorithm MR-RADIX presents a structure called Radix-tree that represents compressly the database in memory. Moreover, the algorithm uses the concept of partitioning to subdivide the database, so that each partition can be processed entirely in memory. The tests show that the MR-RADIX algorithm provides better performance than other related algorithms, and also performs successfully, unlike others, the association rules mining in large databases. / Mestre
|
3 |
Learning probabilistic relational models: a novel approach. / Aprendendo modelos probabilísticos relacionais: uma nova abordagem.Mormille, Luiz Henrique Barbosa 17 August 2018 (has links)
While most statistical learning methods are designed to work with data stored in a single table, many large datasets are stored in relational database systems. Probabilistic Relational Models (PRM) extend Bayesian networks by introducing relations and individuals, thus making it possible to represent information in a relational database. However, learning a PRM from relational data is a more complex task than learning a Bayesian Network from \"flat\" data. The main difficulties that arise while learning a PRM are establishing what are the legal dependency structures, searching for possible structures, and scoring them. This thesis focuses on the development of a novel approach to learn the structure of a PRM, describes a package in the R language to support the learning framework, and applies it to a real, large scale scenario of a city named Atibaia, in the state of São Paulo, Brazil. The research is based on a database combining three different tables, each representing one class in the domain of study. The first table contains 27 attributes from 110,816 citizens of Atibaia. The second table contains 9 attributes from 20,162 companies located in the city. And finally, the third table has 8 attributes from 327 census sectors (small territorial units that comprise the city of Atibaia). The proposed framework is applied to learn a PRM structure and parameters from the database. The model is used to verify if the Social Class of a person can be explained by the location where they live, their neighbors, and the companies nearby. Preliminary experiments have been conducted and a paper published in the 2017 Symposium on Knowledge Discovery, Mining and Learning (KDMiLe). The algorithm performance was further evaluated by extensive experimentation, and a broader study using Serasa Experian data was conducted. Finally, the package in the R language that supports our method was refined along with proper documentation and a tutorial. / Embora a maioria dos métodos de aprendizado estatístico tenha sido desenvolvida para se trabalhar com dados armazenados em uma única tabela, muitas bases de dados estão armazenadas em bancos de dados relacionais. Modelos Probabilísticos Relacionai (PRM) estendem Redes Bayesianas introduzindo relações e indivíduos, tornando possível a representação de informação em uma base de dados relacional. Entretanto, aprender um PRM através de dados relacionais é uma tarefa mais complexa que aprender uma Rede Bayesiana de uma única tabela. As maiores dificuldades que se impõe enquanto se aprende um PRM são estabelecer quais são as estruturas de dependência legais, procurar por possíveis estruturas, e avalia-las. Esta tese foca em desenvolver um novo método de aprendizado de estruturas de PRM, descrever um pacote na linguagem R que suporte este método e aplica-lo a um cenário real e de grande escala, a cidade de Atibaia, no estado de São Paulo, Brasil. Esta pesquisa está baseada em uma base de dados combinando três tabelas distintas, cada uma representando uma classe no domínio de estudo. A primeira tabela contém 27 atributos de 110.816 habitantes de Atibaia, e a segunda tabela contém 9 atributos de 20.162 empresas da cidade. Por fim, a terceira tabela possui 8 atributos para 327 setores censitários (pequenas unidades territoriais que formam a cidade de Atibaia). A proposta é aplicada para aprender-se a estrutura de um PRM e seus parâmetros através desta base de dados. O modelo foi utilizado para verificar se a classe social de uma pessoa pode ser explicada pelo local onde ela vive, seus vizinhos e as companhias próximas. Experimentos preliminares foram conduzidos e um artigo foi publicado no Symposium on Knowledge Discovery, Mining and Learning (KDMiLe). O desempenho do algoritmo foi reavaliada através de extensiva experimentação, e um estudo mais amplo foi conduzido com os dados da Serasa Experian. Por fim, o pacote em R que suporta o método proposto foi refinado, e documentação e tutorial apropriado foram descritos.
|
4 |
An Ilp-based Concept Discovery System For Multi-relational Data MiningKavurucu, Yusuf 01 July 2009 (has links) (PDF)
Multi Relational Data Mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. However, as patterns involve multiple relations, the search space of possible hypothesis becomes
intractably complex. In order to cope with this problem, several relational knowledge discovery systems have been developed employing various search strategies, heuristics and
language pattern limitations.
In this thesis, Inductive Logic Programming (ILP) based concept discovery is studied and two systems based on a hybrid methodology employing ILP and APRIORI, namely Confidence-based Concept Discovery and Concept Rule Induction System, are proposed. In Confidence-based Concept Discovery and Concept Rule Induction System, the main aim
is to relax the strong declarative biases and user-defined specifications. Moreover, this new method directly works on relational databases. In addition to this, the traditional definition
of confidence from relational database perspective is modified to express Closed World Assumption in first-order logic. A new confidence-based pruning method based on the improved definition is applied in the APRIORI lattice. Moreover, a new hypothesis evaluation criterion is used for expressing the quality of patterns in the search space. In addition to this, in Concept
Rule Induction System, the constructed rule quality is further improved by using an improved generalization metod.
Finally, a set of experiments are conducted on real-world problems to evaluate the performance of the proposed method with similar systems in terms of support and confidence.
|
5 |
Probabilistic Modeling of Multi-relational and Multivariate Discrete DataWu, Hao 07 February 2017 (has links)
Modeling and discovering knowledge from multi-relational and multivariate discrete data is a crucial task that arises in many research and application domains, e.g. text mining, intelligence analysis, epidemiology, social science, etc. In this dissertation, we study and address three problems involving the modeling of multi-relational discrete data and multivariate multi-response count data, viz. (1) discovering surprising patterns from multi-relational data, (2) constructing a generative model for multivariate categorical data, and (3) simultaneously modeling multivariate multi-response count data and estimating covariance structures between multiple responses.
To discover surprising multi-relational patterns, we first study the ``where do I start?'' problem originating from intelligence analysis. By studying nine methods with origins in association analysis, graph metrics, and probabilistic modeling, we identify several classes of algorithmic strategies that can supply starting points to analysts, and thus help to discover interesting multi-relational patterns from datasets. To actually mine for interesting multi-relational patterns, we represent the multi-relational patterns as dense and well-connected chains of biclusters over multiple relations, and model the discrete data by the maximum entropy principle, such that in a statistically well-founded way we can gauge the surprisingness of a discovered bicluster chain with respect to what we already know. We design an algorithm for approximating the most informative multi-relational patterns, and provide strategies to incrementally organize discovered patterns into the background model. We illustrate how our method is adept at discovering the hidden plot in multiple synthetic and real-world intelligence analysis datasets. Our approach naturally generalizes traditional attribute-based maximum entropy models for single relations, and further supports iterative, human-in-the-loop, knowledge discovery.
To build a generative model for multivariate categorical data, we apply the maximum entropy principle to propose a categorical maximum entropy model such that in a statistically well-founded way we can optimally use given prior information about the data, and are unbiased otherwise. Generally, inferring the maximum entropy model could be infeasible in practice. Here, we leverage the structure of the categorical data space to design an efficient model inference algorithm to estimate the categorical maximum entropy model, and we demonstrate how the proposed model is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application.
Modeling data with multivariate count responses is a challenging problem due to the discrete nature of the responses. Existing methods for univariate count responses cannot be easily extended to the multivariate case since the dependency among multiple responses needs to be properly accounted for. To model multivariate data with multiple count responses, we propose a novel multivariate Poisson log-normal model (MVPLN). By simultaneously estimating the regression coefficients and inverse covariance matrix over the latent variables with an efficient Monte Carlo EM algorithm, the proposed model takes advantages of association among multiple count responses to improve the model prediction accuracy. Simulation studies and applications to real world data are conducted to systematically evaluate the performance of the proposed method in comparison with conventional methods. / Ph. D. / In this decade of big data, massive data of various types are generated every day from different research areas and industry sectors. Among all these types of data, text data, i.e. text documents, are important to many research and real world applications. One challenge faced when analyzing massive text data is which documents we should investigate first to initialize the analysis and how to identify stories and plots, if any, that hide inside the massive text documents. For example, in intelligence analysis, when analyzing intelligence documents, some common questions that analysts ask are ‘How is a suspect connected to the passenger manifest on this flight?’ and ‘How do distributed terrorist cells interface with each other?’. This is a crucial task so called storytelling. In the first half of this dissertation, we will study this problem and design mathematical models and computer algorithms to automatically identify useful information from text data to help analysts to discover hidden stories and plots from massive text documents. We also incorporate visual analytics techniques and design a visualization system to support human-in-the-loop exploratory data analysis so that analysts could interact with the algorithms and models iteratively to investigate given datasets.
In the second half of this dissertation, we study two problems that arise from the domain of public health. When epidemic of certain disease happens, e.g. flu seasons, public health officials need to make certain policies in advance to prevent or alleviate the epidemic. A data-driven approach would be to make such public health policies using simulation results and predictions based on historical data. One problem usually faced in epidemic simulation is that researchers would like to run simulations with real-world data so that the simulation results can be close to real-world scenarios but at the same time protect the private information of individuals. To solve this problem, we design and implement a mathematical model that could generate realistic sythetic population using U.S. Census Survey to help conduct the epidemic simulation. Using flus as an example, we also propose a mathematical model to study associations between different types of flus with the information collected from social media, like Twitter. We believe that identifying such associations between different types of flus will help officials to make appropriate public health policies.
|
6 |
Learning with Markov logic networks : transfer learning, structure learning, and an application to Web query disambiguationMihalkova, Lilyana Simeonova 18 March 2011 (has links)
Traditionally, machine learning algorithms assume that training data is provided as a set of independent instances, each of which can be described as a feature vector. In contrast, many domains of interest are inherently multi-relational, consisting of entities connected by a rich set of relations. For example, the participants in a social network are linked by friendships, collaborations, and shared interests. Likewise, the users of a search engine are related by searches for similar items and clicks to shared sites. The ability to model and reason about such relations is essential not only because better predictive accuracy is achieved by exploiting this additional information, but also because frequently the goal is to predict whether a set of entities are related in a particular way. This thesis falls within the area of Statistical Relational Learning (SRL), which combines ideas from two traditions within artificial intelligence, first-order logic and probabilistic graphical models to address the challenge of learning from multi-relational data. We build on one particular SRL model, Markov logic networks (MLNs), which consist of a set of weighted first-order-logic formulae and provide a principled way of defining a probability distribution over possible worlds. We develop algorithms for learning of MLN structure both from scratch and by transferring a previously learned model, as well as an application of MLNs to the problem of Web query disambiguation. The ideas we present are unified by two main themes: the need to deal with limited training data and the use of bottom-up learning techniques. Structure learning, the task of automatically acquiring a set of dependencies among the relations in the domain, is a central problem in SRL. We introduce BUSL, an algorithm for learning MLN structure from scratch that proceeds in a more bottom-up fashion, breaking away from the tradition of top-down learning typical in SRL. Our approach first constructs a novel data structure called a Markov network template that is used to restrict the search space for clauses. Our experiments in three relational domains demonstrate that BUSL dramatically reduces the search space for clauses and attains a significantly higher accuracy than a structure learner that follows a top-down approach. Accurate and efficient structure learning can also be achieved by transferring a model obtained in a source domain related to the current target domain of interest. We view transfer as a revision task and present an algorithm that diagnoses a source MLN to determine which of its parts transfer directly to the target domain and which need to be updated. This analysis focuses the search for revisions on the incorrect portions of the source structure, thus speeding up learning. Transfer learning is particularly important when target-domain data is limited, such as when data on only a few individuals is available from domains with hundreds of entities connected by a variety of relations. We also address this challenging case and develop a general transfer learning approach that makes effective use of such limited target data in several social network domains. Finally, we develop an application of MLNs to the problem of Web query disambiguation in a more privacy-aware setting where the only information available about a user is that captured in a short search session of 5-6 previous queries on average. This setting contrasts with previous work that typically assumes the availability of long user-specific search histories. To compensate for the scarcity of user-specific information, our approach exploits the relations between users, search terms, and URLs. We demonstrate the effectiveness of our approach in the presence of noise and show that it outperforms several natural baselines on a large data set collected from the MSN search engine. / text
|
7 |
Modèles d'embeddings à valeurs complexes pour les graphes de connaissances / Complex-Valued Embedding Models for Knowledge GraphsTrouillon, Théo 29 September 2017 (has links)
L'explosion de données relationnelles largement disponiblessous la forme de graphes de connaissances a permisle développement de multiples applications, dont les agents personnels automatiques,les systèmes de recommandation et l'amélioration desrésultats de recherche en ligne.La grande taille et l'incomplétude de ces bases de donnéesnécessite le développement de méthodes de complétionautomatiques pour rendre ces applications viables.La complétion de graphes de connaissances, aussi appeléeprédiction de liens, se doit de comprendre automatiquementla structure des larges graphes de connaissances (graphes dirigéslabellisés) pour prédire les entrées manquantes (les arêtes labellisées).Une approche gagnant en popularité consiste à représenter ungraphe de connaissances comme un tenseur d'ordre 3, etd'utiliser des méthodes de décomposition de tenseur pourprédire leurs entrées manquantes.Les modèles de factorisation existants proposent différentscompromis entre leur expressivité, et leur complexité en temps et en espace.Nous proposons un nouveau modèle appelé ComplEx, pour"Complex Embeddings", pour réconcilier expressivité etcomplexité par l'utilisation d'une factorisation en nombre complexes,dont nous explorons le lien avec la diagonalisation unitaire.Nous corroborons notre approche théoriquement en montrantque tous les graphes de connaissances possiblespeuvent être exactement décomposés par le modèle proposé.Notre approche, basées sur des embeddings complexesreste simple, car n'impliquant qu'un produit trilinéaire complexe,là où d'autres méthodes recourent à des fonctions de compositionde plus en plus compliquées pour accroître leur expressivité.Le modèle proposé ayant une complexité linéaire en tempset en espace est passable à l'échelle, tout endépassant les approches existantes sur les jeux de données de référencepour la prédiction de liens.Nous démontrons aussi la capacité de ComplEx àapprendre des représentations vectorielles utiles pour d'autres tâches,en enrichissant des embeddings de mots, qui améliorentles prédictions sur le problème de traitement automatiquedu langage d'implication entre paires de phrases.Dans la dernière partie de cette thèse, nous explorons lescapacités de modèles de factorisation à apprendre lesstructures relationnelles à partir d'observations.De part leur nature vectorielle,il est non seulement difficile d'interpréter pourquoicette classe de modèles fonctionne aussi bien,mais aussi où ils échouent et comment ils peuventêtre améliorés. Nous conduisons une étude expérimentalesur les modèles de l'état de l'art, non pas simplementpour les comparer, mais pour comprendre leur capacitésd'induction. Pour évaluer les forces et faiblessesde chaque modèle, nous créons d'abord des tâches simplesreprésentant des propriétés atomiques despropriétés des relations des graphes de connaissances ;puis des tâches représentant des inférences multi-relationnellescommunes au travers de généalogies synthétisées.À partir de ces résultatsexpérimentaux, nous proposons de nouvelles directionsde recherches pour améliorer les modèles existants,y compris ComplEx. / The explosion of widely available relational datain the form of knowledge graphsenabled many applications, including automated personalagents, recommender systems and enhanced web search results.The very large size and notorious incompleteness of these data basescalls for automatic knowledge graph completion methods to make these applicationsviable. Knowledge graph completion, also known as link-prediction,deals with automatically understandingthe structure of large knowledge graphs---labeled directed graphs---topredict missing entries---labeled edges. An increasinglypopular approach consists in representing knowledge graphs as third-order tensors,and using tensor factorization methods to predict their missing entries.State-of-the-art factorization models propose different trade-offs between modelingexpressiveness, and time and space complexity. We introduce a newmodel, ComplEx---for Complex Embeddings---to reconcile both expressivenessand complexity through the use of complex-valued factorization, and exploreits link with unitary diagonalization.We corroborate our approach theoretically and show that all possibleknowledge graphs can be exactly decomposed by the proposed model.Our approach based on complex embeddings is arguably simple,as it only involves a complex-valued trilinear product,whereas other methods resort to more and more complicated compositionfunctions to increase their expressiveness. The proposed ComplEx model isscalable to large data sets as it remains linear in both space and time, whileconsistently outperforming alternative approaches on standardlink-prediction benchmarks. We also demonstrateits ability to learn useful vectorial representations for other tasks,by enhancing word embeddings that improve performanceson the natural language problem of entailment recognitionbetween pair of sentences.In the last part of this thesis, we explore factorization models abilityto learn relational patterns from observed data.By their vectorial nature, it is not only hard to interpretwhy this class of models works so well,but also to understand where they fail andhow they might be improved. We conduct an experimentalsurvey of state-of-the-art models, not towardsa purely comparative end, but as a means to get insightabout their inductive abilities.To assess the strengths and weaknesses of each model, we create simple tasksthat exhibit first, atomic properties of knowledge graph relations,and then, common inter-relational inference through synthetic genealogies.Based on these experimental results, we propose new researchdirections to improve on existing models, including ComplEx.
|
8 |
Apprentissage des réseaux de neurones profonds et applications en traitement automatique de la langue naturelleGlorot, Xavier 11 1900 (has links)
En apprentissage automatique, domaine qui consiste à utiliser des données pour apprendre une solution aux problèmes que nous voulons confier à la machine, le modèle des Réseaux de Neurones Artificiels (ANN) est un outil précieux. Il a été inventé voilà maintenant près de soixante ans, et pourtant, il est encore de nos jours le sujet d'une recherche active. Récemment, avec l'apprentissage profond, il a en effet permis d'améliorer l'état de l'art dans de nombreux champs d'applications comme la vision par ordinateur, le traitement de la parole et le traitement des langues naturelles.
La quantité toujours grandissante de données disponibles et les améliorations du matériel informatique ont permis de faciliter l'apprentissage de modèles à haute capacité comme les ANNs profonds. Cependant, des difficultés inhérentes à l'entraînement de tels modèles, comme les minima locaux, ont encore un impact important. L'apprentissage profond vise donc à trouver des solutions, en régularisant ou en facilitant l'optimisation. Le pré-entraînnement non-supervisé, ou la technique du ``Dropout'', en sont des exemples.
Les deux premiers travaux présentés dans cette thèse suivent cette ligne de recherche. Le premier étudie les problèmes de gradients diminuants/explosants dans les architectures profondes. Il montre que des choix simples, comme la fonction d'activation ou l'initialisation des poids du réseaux, ont une grande influence. Nous proposons l'initialisation normalisée pour faciliter l'apprentissage. Le second se focalise sur le choix
de la fonction d'activation et présente le rectifieur, ou unité rectificatrice linéaire. Cette étude a été la première à mettre l'accent sur les fonctions d'activations linéaires par morceaux pour les réseaux de neurones profonds en apprentissage supervisé. Aujourd'hui, ce type de fonction d'activation est une composante essentielle des réseaux de neurones profonds.
Les deux derniers travaux présentés se concentrent sur les applications des ANNs en traitement des langues naturelles. Le premier aborde le sujet de l'adaptation de domaine pour l'analyse de sentiment, en utilisant des Auto-Encodeurs Débruitants. Celui-ci est encore l'état de l'art de nos jours. Le second traite de l'apprentissage de données multi-relationnelles avec un modèle à base d'énergie, pouvant être utilisé pour la tâche
de désambiguation de sens. / Machine learning aims to leverage data in order for computers to solve problems of interest. Despite being invented close to sixty years ago, Artificial Neural Networks (ANN) remain an area of active research and a powerful tool. Their resurgence in the context of deep learning has led to dramatic improvements in various domains from computer vision and speech processing to natural language processing.
The quantity of available data and the computing power are always increasing, which is desirable to train high capacity models such as deep ANNs. However, some intrinsic learning difficulties, such as local minima, remain problematic. Deep learning aims to find solutions to these problems, either by adding some regularisation or improving optimisation. Unsupervised pre-training or Dropout are examples of such solutions.
The two first articles presented in this thesis follow this line of research. The first analyzes the problem of vanishing/exploding gradients in deep architectures. It shows that simple choices, like the activation function or the weights initialization, can have an important impact. We propose the normalized initialization scheme to improve learning. The second focuses on the activation function, where we propose the rectified linear unit. This work was the first to emphasise the use of linear by parts activation functions for deep supervised neural networks, which is now an essential component of such models.
The last two papers show some applications of ANNs to Natural Language Processing. The first focuses on the specific subject of domain adaptation in the context of sentiment analysis, using Stacked Denoising Auto-encoders. It remains state of the art to this day. The second tackles learning with multi-relational data using an energy based model which can also be applied to the task of word-sense disambiguation.
|
Page generated in 0.1031 seconds