11. A Mixture-of-Experts Approach for Gene Regulatory Network Inference (Shao, Borong, January 2014)
Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase GRN inference accuracy, and ensemble learning methods have been shown to yield higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. The proposed method quantitatively measures the accuracy of individual GRN inference algorithms at the network-motif level. Based on how accurately each algorithm predicts different types of network motifs, weights are assigned to the individual algorithms so as to exploit their strengths and compensate for their weaknesses, thereby improving the accuracy of the ensemble prediction. Methods. The research methodology is a controlled experiment. The independent variable is the method, with eight groups: the five individual algorithms, the generic average ranking method used in the DREAM5 challenge, the proposed ensemble method with four types of network motifs, and the proposed ensemble method with five types of network motifs. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has a training phase and a testing phase. In the training phase, we analyze the accuracy of the five individual algorithms at the network-motif level to decide their weights. In the testing phase, the weights are used to combine the predictions of the five individual algorithms into ensemble predictions. We compare the accuracy of the eight method groups on an Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms for predicting each type of network motif. In the testing phase, we collect the AUPR values of the eight methods for predicting the GRN of the Escherichia coli microarray dataset. Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experimental results show that the proposed method yields significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in the GRN, whose inclusion significantly increases the accuracy of the proposed method. / Genes are segments of DNA that control the biological traits and biochemical processes that comprise life. They interact with each other to realize the precise regulation of life activities. Biologists aim to understand the regulatory network among genes with the help of high-throughput technologies such as microarrays and RNA-seq. These technologies produce large amounts of gene expression data containing useful information, so effective data mining is necessary to extract that information and advance biological research. Gene regulatory network (GRN) inference is the task of inferring gene interactions from gene expression data, such as microarray datasets. The inference results can be used to guide further experiments that discover or validate gene interactions. A variety of machine learning (data mining) methods have been proposed to solve this problem. In recent years, experiments have shown that ensemble learning methods achieve higher accuracy than individual learning methods, because ensemble methods can combine the strengths of different individual methods and are robust to different network structures.
In this thesis, we propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. By quantitatively measuring the accuracy of the individual methods at the network-motif level, the proposed method is able to take advantage of the complementarity among the individual methods. It yields significantly higher accuracy than the generic average ranking method, which was the most accurate of the 35 GRN inference methods in the DREAM5 challenge.
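The motif-level weighting described above lends itself to a compact implementation. The sketch below is a minimal illustration, assuming each algorithm's per-motif AUPR from the training phase is available as a lookup table; the function name and data layout are hypothetical, not the thesis's actual code.

```python
import numpy as np

def motif_weighted_ensemble(pred_scores, motif_of_edge, motif_aupr):
    """Combine edge-confidence matrices from several GRN inference
    algorithms, weighting each algorithm on each edge by its
    training-phase AUPR for that edge's motif type.

    pred_scores   : dict, algorithm name -> (n, n) confidence matrix
    motif_of_edge : (n, n) integer array, motif type of each candidate edge
    motif_aupr    : dict, algorithm name -> {motif type: AUPR}
    """
    n = next(iter(pred_scores.values())).shape[0]
    ensemble = np.zeros((n, n))
    weight_sum = np.zeros((n, n))
    for algo, scores in pred_scores.items():
        # Per-edge weight: this algorithm's AUPR on the motif the edge belongs to.
        lookup = np.vectorize(motif_aupr[algo].get)
        w = lookup(motif_of_edge).astype(float)
        ensemble += w * scores
        weight_sum += w
    # Normalize so each edge score is a convex combination of the algorithms.
    return ensemble / np.clip(weight_sum, 1e-12, None)
```

Normalizing by the summed weights makes each ensemble edge score a convex combination of the individual predictions, so an algorithm that is accurate on, say, feed-forward loops dominates exactly on those edges.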
12. Restructuring partitioned knowledge: evidence of strategy retention in category learning (Sewell, David K., January 2008)
A recurring theme in the cognitive development literature is the notion that people restructure their task knowledge as they develop increasingly sophisticated strategies. A large body of empirical work spanning several domains suggests that in some cases, knowledge restructuring is best characterized as the sequential replacement of old strategies with newer ones. In other cases, restructuring appears to be better characterized as changes in the way partial knowledge elements are selectively applied to a task. Critically, the former position, but not the latter, suggests that it may be quite difficult for people to revert to an old strategy after restructuring has already occurred. The three experiments reported herein suggest that knowledge restructuring observed in experimental settings is aptly characterized by strategy retention. Specifically, people are shown to readily revert to an old categorization strategy even after demonstrably having restructured their knowledge, suggesting that knowledge is best conceptualized as having a heterogeneous structure. Formal modeling further supports this interpretation of the empirical results and highlights the important role of selective attention in determining the manifest response strategy. The implications of these findings are discussed in terms of an overarching mixture-of-experts framework of knowledge representation.
13. Essays in empirical finance (Faria, Adriano Augusto de, 16 March 2017)
This thesis is a collection of essays in empirical finance, mainly focused on term structure models. In the first three chapters, we develop methods to extract the yield curve from government and corporate bonds, and we measure the performance of these methods in pricing, Value at Risk, and forecasting exercises. The last chapter discusses the effects of different proxies for the optimal portfolio on the estimation of a CCAPM model.

In the first chapter, we propose a segmented model to deal with the seasonalities appearing in real yield curves. In different markets, the short end of the real yield curve is influenced by seasonalities of the price index, which imply a lack of smoothness in this segment. Borrowing from the flexibility of spline models, a B-spline function is used to fit the short end of the yield curve, while the medium and long end are captured by a parsimonious parametric four-factor exponential model. We illustrate the benefits of the proposed term structure model by estimating real yield curves in one of the biggest government index-linked bond markets in the world. Our model is simultaneously able to fit the yield curve and to provide unbiased Value at Risk estimates for different portfolios of bonds negotiated in this market.

Chapter 2 introduces a novel framework for the estimation of corporate bond spreads based on mixture models. The modeling methodology enhances the informational content used to estimate the firm-level term structure by clustering firms together using observable firm characteristics. Our model builds on the previous literature linking firm-level characteristics to credit spreads. Specifically, we show that by clustering firms using their observable variables, instead of the traditional matrix pricing (clustering by rating/sector), it is possible to achieve gains of several orders of magnitude in bond pricing. Empirically, we construct a large panel of firm-level explanatory variables based on results from previous research and evaluate their performance in explaining credit spread differences. Relying on panel data regressions, we identify the most significant factors driving credit spreads to include in our term structure model. Using this selected sample, we show that our methodology significantly improves in-sample fit and produces reliable out-of-sample price estimates compared with traditional models.

Chapter 3 contains the paper "Forecasting the Brazilian Term Structure Using Macroeconomic Factors", published in the Brazilian Review of Econometrics (BRE). The paper studies forecasting of the Brazilian interest rate term structure using common factors extracted from a wide database of macroeconomic series covering January 2000 to May 2012. First, the model proposed by Moench (2008) is implemented, in which the dynamics of the short-term interest rate are modeled with a factor-augmented VAR and the term structure is derived using the restrictions implied by no-arbitrage. As in the original study, this model delivers better predictive performance than the usual benchmarks, but the results deteriorate as maturity increases. To avoid this problem, we propose modeling the dynamics of each rate jointly with the macroeconomic factors, thus eliminating the no-arbitrage restrictions; this produces superior forecasting results. Finally, the macro factors are inserted into a parsimonious parametric three-factor exponential model.

The last chapter presents the paper "Empirical Selection of Optimal Portfolios and its Influence in the Estimation of Kreps-Porteus Utility Function Parameters", also published in the BRE. It investigates how the choice of portfolio used to represent the optimal aggregate wealth endogenously derived in equilibrium models with Kreps-Porteus recursive utility affects the estimation of the parameters governing the elasticity of intertemporal substitution and risk aversion. We argue that the usual market-wide stock index is not a good portfolio to represent the optimal wealth of the representative agent, and we propose as an alternative the portfolio of the investment fund industry. Especially for Brazil, where that industry invests most of its resources in fixed income, this substitution of the optimal proxy portfolio causes a significant increase in the estimated risk aversion coefficient and elasticity of intertemporal substitution in consumption.
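The abstract does not spell out the four-factor exponential form; the Svensson (1994) extension of Nelson-Siegel is the standard four-factor exponential yield curve model and serves here as an illustrative stand-in for the parametric segment, with the spline-fitted short end only indicated in a comment.

```python
import numpy as np

def svensson_yield(tau, b0, b1, b2, b3, l1, l2):
    """Svensson four-factor exponential yield curve, evaluated at
    maturities tau > 0 (in years). Shown only as an example of the
    model family the chapter builds on.
    """
    tau = np.asarray(tau, dtype=float)
    h1 = (1 - np.exp(-tau / l1)) / (tau / l1)                      # slope loading
    h2 = h1 - np.exp(-tau / l1)                                    # first curvature
    h3 = (1 - np.exp(-tau / l2)) / (tau / l2) - np.exp(-tau / l2)  # second curvature
    return b0 + b1 * h1 + b2 * h2 + b3 * h3

# In the segmented model, maturities below some knot would instead be fitted
# with a B-spline (e.g. scipy.interpolate.make_lsq_spline), with the two
# segments joined so the curve stays smooth at the knot.
```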
14. Cooperative coevolutionary mixture of experts: a neuro ensemble approach for automatic decomposition of classification problems (Nguyen, Minh Ha; Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW; January 2006)
Artificial neural networks have been widely used for machine learning and optimization. A neuro ensemble is a collection of neural networks that work cooperatively on a problem. The literature shows that by combining several neural networks, the generalization of the overall system can be enhanced beyond the generalization ability of the individual members. Evolutionary computation can be used to search for a suitable architecture and weights for neural networks; when it is used to evolve a neuro ensemble, the result is usually known as an evolutionary neuro ensemble. In most real-world problems, we either know little about the problem or the problem is too complex to decompose by hand. It is therefore usually desirable to have a method that automatically decomposes a complex problem into a set of overlapping or non-overlapping sub-problems and assigns one or more specialists (i.e. experts, learning machines) to each sub-problem. An important feature of neuro ensembles is automatic problem decomposition: some neuro ensemble methods can generate networks in which each individual network specializes on a unique sub-task, such as mapping a subspace of the feature space. In real-world problems this feature matters for several reasons: (1) it provides an understanding of the decomposition nature of a problem; (2) if a problem changes, one can replace the network associated with the subspace where the change occurs without affecting the overall ensemble; (3) if one network fails, the rest of the ensemble can still function in their subspaces; (4) if one learns the structure of one problem, it can potentially be transferred to other similar problems. In this thesis, I focus on classification problems and present a systematic study of a novel evolutionary neuro ensemble approach which I call cooperative coevolutionary mixture of experts (CCME). Cooperative coevolution (CC) is a branch of evolutionary computation in which individuals in different populations cooperate to solve a problem, and their fitness is calculated based on their reciprocal interaction. The mixture-of-experts model (ME) is a neuro ensemble approach that can generate networks specialized on different subspaces of the feature space. Combining CC and ME yields a powerful framework that is able to automatically form the experts and train each of them. I show that the CCME method produces competitive results in terms of generalization ability without increasing the computational cost compared with traditional training approaches. I also propose two different mechanisms for visualizing the resultant decomposition in high-dimensional feature spaces. The first is a simple one in which data are grouped based on the specialization of each expert and a color map of the data records is visualized. The second relies on principal component analysis to project the feature space onto lower dimensions, where the decision boundaries generated by each expert are visualized through convex approximations. I also investigate the regularization effect of learning-by-forgetting on the proposed CCME, and show that learning-by-forgetting helps CCME generate neuro ensembles of low structural complexity while maintaining their generalization abilities. Overall, the thesis presents an evolutionary neuro ensemble method whereby (1) the generated ensemble generalizes well; (2) it is able to automatically decompose the classification problem; and (3) it generates networks with small architectures.
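The interplay of cooperative coevolution and the mixture of experts can be summarized in a few lines. The sketch below shows one generation of the cooperative scheme under stated assumptions: an Expert object with a fitness attribute (initialized before the first generation) and a mutate() method, plus an evaluate_ensemble callback that scores a full ensemble. The actual CCME representation and operators are those of the thesis, not this toy loop.

```python
def coevolve_generation(populations, evaluate_ensemble):
    """One generation of cooperative coevolution for a neuro ensemble.

    populations       : list of populations, one per ensemble slot;
                        each population is a list of Expert objects
    evaluate_ensemble : callable(list of experts) -> fitness score
    """
    # Current best collaborator from every population.
    best = [max(pop, key=lambda e: e.fitness) for pop in populations]
    for i, pop in enumerate(populations):
        for indiv in pop:
            # Reciprocal interaction: an individual is scored by the
            # performance of the full ensemble it forms with the best
            # members of the other populations.
            team = best[:i] + [indiv] + best[i + 1:]
            indiv.fitness = evaluate_ensemble(team)
        # Toy selection/variation: keep the better half, refill by mutation.
        pop.sort(key=lambda e: e.fitness, reverse=True)
        survivors = pop[: len(pop) // 2]
        pop[:] = survivors + [s.mutate() for s in survivors]
```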
15. [en] The Linear Local-Global Neural Network Model / [pt] O Modelo de Redes Neurais Globais-Locais (Suarez Farinas, Mayte, 02 July 2003)
[pt] In this thesis, the Local-Global Neural Network (RNGL) model is presented within the context of time series models. This formulation encompasses some existing nonlinear models and also admits the Mixture of Experts approach. Special attention is devoted to the case of linear experts, and theoretical aspects of the model are discussed extensively: stationarity conditions, model identifiability, and the existence, consistency, and asymptotic normality of the parameter estimators. A model building strategy is also considered, and the numerical estimation procedures are discussed, including a solution for the computation of initial values. Finally, the methodology is illustrated on two real time series that are widely used in the nonlinear models literature. / [en] In this thesis, the Local Global Neural Networks model is
proposed within the context of time series models. This
formulation encompasses some already existing nonlinear
models and also admits the Mixture of Experts approach. We
place emphasis on the linear expert case and extensively
discuss the theoretical aspects of the model: stationarity
conditions, existence, consistency and asymptotic normality
of the parameter estimates, and model identifiability. A
model building strategy is also considered and the whole
procedure is illustrated with two real time-series.
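For the linear-expert case emphasized in both abstracts, a one-step forecast reduces to a gated convex combination of local linear predictors. The following is a generic sketch with a softmax gate; the thesis's actual local-global parameterization differs in detail, and all names here are illustrative.

```python
import numpy as np

def mixture_forecast(x, expert_coefs, gate_coefs):
    """One-step forecast from a mixture of linear (autoregressive) experts.

    x            : (p,) lagged observations, e.g. (y_{t-1}, ..., y_{t-p})
    expert_coefs : (m, p+1) array; row j holds expert j's intercept and
                   AR coefficients
    gate_coefs   : (m, p+1) array; parameters of the softmax gate
    """
    z = np.concatenate(([1.0], x))         # prepend intercept term
    local = expert_coefs @ z               # each expert's linear forecast
    logits = gate_coefs @ z
    gates = np.exp(logits - logits.max())  # numerically stable softmax
    gates /= gates.sum()
    return float(gates @ local)            # convex combination of experts
```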
16. Distributed conditional computation (Léonard, Nicholas, 08 1900)
The objective of this thesis is to present different applications of the distributed conditional computation research program. It is hoped that these applications, along with the theory presented here, will lead to a general solution of the problem of artificial intelligence, particularly with regard to the need for efficiency. The vision of distributed conditional computation is to accelerate the evaluation and training of deep models, which is very different from the usual objective of improving their generalization and optimization capacity. The work presented here is closely related to mixture-of-experts models.

In Chapter 2, we present a new deep learning algorithm that uses a simple form of reinforcement learning on a neural-network-based decision tree model. We demonstrate the need for a balancing constraint to keep the distribution of examples across the experts uniform and to prevent monopolies. To make the computation efficient, training and evaluation are constrained to be sparse by using a gater that samples experts from a multinomial distribution given an example.

In Chapter 3, we present a new deep model consisting of a sparse representation divided into segments of experts. A neural network language model is built from the sparse transformations between these segments. The block-sparse operation is implemented for use on graphics cards, and its speed is compared with two dense operations of the same caliber to demonstrate the real computational gain that can be obtained. A deep model using sparse operations controlled by a gater distinct from the experts is trained on a dataset of one billion words. A new data partitioning algorithm is applied to a set of words to organize the output layer of a language model into a hierarchy, making it much more efficient.

The work presented in this thesis is central to the vision of distributed conditional computation put forward by Yoshua Bengio. It attempts to apply research on mixtures of experts to deep models in order to improve their speed and their optimization capacity. We believe that the theory and experiments of this thesis are an important step on the path to distributed conditional computation, because they frame the problem well, especially with regard to the competitiveness of systems of experts. / The objective of this thesis is to present different applications of the distributed conditional computation research program.
It is hoped that these applications and the theory presented here will lead to a general solution of the problem of
artificial intelligence, especially with regard to the need for efficiency.
The vision of distributed conditional computation is to accelerate the evaluation and training of deep models
which is very different from the usual objective of improving their generalization and optimization capacity.
The work presented here has close ties with mixture of experts models.
In Chapter 2, we present a new deep learning algorithm that
uses a form of reinforcement learning on a novel neural network decision tree model.
We demonstrate the need for a balancing constraint to keep the
distribution of examples to experts uniform and to prevent monopolies. To make the calculation efficient,
the training and evaluation are constrained to be sparse by using a gater that
samples experts from a multinomial distribution given each example.
In Chapter 3 we present a new deep model consisting of a
sparse representation divided into segments of experts.
A neural network language model is constructed from blocks of sparse transformations between these expert segments.
The block-sparse operation is implemented for use on graphics cards.
Its speed is compared with two dense operations of the same caliber to demonstrate
and measure the actual efficiency gain that can be obtained. A deep model using
these block-sparse operations controlled by a distinct gater is trained on a dataset of one billion words.
A new algorithm for data partitioning (clustering) is applied to a set of words to
organize the output layer of a language model into a conditional hierarchy, thereby making it much more efficient.
The work presented in this thesis is central to the vision of distributed conditional computation
as issued by Yoshua Bengio. It attempts to apply research in the area of
mixture of experts to deep models to improve their speed and their optimization capacity.
We believe that the theory and experiments of this thesis are an important step
on the path to distributed conditional computation, because they frame the problem well,
especially concerning the competitiveness inherent in systems of experts.
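The gater and the balancing constraint from Chapter 2 can be sketched in a few lines of PyTorch. Everything here is an assumption for illustration: the thesis's exact penalty and sampling scheme may differ, and the tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def gate_with_balance(gater_logits, n_active=1, balance_coef=0.01):
    """Sample which experts fire for each example and compute a
    balancing penalty that pushes the batch-average gate distribution
    toward uniform, discouraging expert monopolies.

    gater_logits : (batch, n_experts) raw gater outputs
    """
    probs = F.softmax(gater_logits, dim=-1)
    # Sparse assignment: each example activates only the sampled experts.
    chosen = torch.multinomial(probs, n_active)        # (batch, n_active)
    # Squared distance between the mean gate distribution and uniform.
    mean_gate = probs.mean(dim=0)
    uniform = torch.full_like(mean_gate, 1.0 / mean_gate.numel())
    balance_loss = balance_coef * ((mean_gate - uniform) ** 2).sum()
    return chosen, balance_loss
```

In training, the balance_loss term would be added to the task loss, so the gater learns to spread examples across experts while the multinomial sampling keeps each forward pass sparse.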
17. Predicting stock market trends using time-series classification with dynamic neural networks (Mocanu, Remus, 09 1900)
The objective of this research was to evaluate the effectiveness of the classification setting for predicting stock market trends. Traditional forecasting-based methods, which target the immediate next time step, often run into difficulties with non-stationary data, compromising model accuracy and stability. In contrast, our classification approach predicts broader stock price movements spanning multiple time steps, aiming to reduce the non-stationarity of the data. Our dataset, derived from various NASDAQ-100 stocks and informed by several technical indicators, used a mixture of experts composed of a soft gating mechanism and a transformer-based architecture. Although the main method of this experiment did not prove as successful as we had initially hoped and observed, the methodology was able to surpass all baselines in certain cases within a few epochs, demonstrating the lowest false discovery rate while maintaining an acceptable, non-zero recall rate. Given these results, our approach not only encourages further research in this direction, in which finer tuning of the model can be implemented, but also offers those who invest with the help of machine learning a different tool for predicting stock market trends, using a classification framework and a problem defined differently from the norm. It is important to note, however, that our study is based on NASDAQ-100 data, which limits the immediate applicability of the model to other stock markets or to varying economic conditions. Future research could improve performance by integrating company fundamentals and performing sentiment analysis on stock-related news, since our current work considers only technical indicators and stock-specific numerical features. / The objective of this research was to evaluate the classification setting's efficacy in predicting stock market trends. Traditional forecasting-based methods, which target the immediate next time step, often encounter challenges due to non-stationary data, compromising model accuracy and stability. In contrast, our classification approach predicts broader stock price movements over multiple time steps, aiming to reduce data non-stationarity. Our dataset, derived from various NASDAQ-100 stocks and informed by multiple technical indicators, utilized a Mixture of Experts composed of a soft gating mechanism and a transformer-based architecture. Although the main method of this experiment did not prove to be as successful as we had initially hoped, the methodology was capable of surpassing all baselines in certain instances within a few epochs, demonstrating the lowest false discovery rate while still having an acceptable recall rate. Given these results, our approach not only encourages further research in this direction, in which further fine-tuning of the model can be implemented, but also offers traders a different tool for predicting stock market trends, using a classification setting and a differently defined problem.
It's important to note, however, that our study is based on NASDAQ-100 data, limiting our model's immediate applicability to other stock markets or varying economic conditions. Future research could enhance performance by integrating company fundamentals and conducting sentiment analysis on stock-related news, as our current work solely considers technical indicators and stock-specific numerical features.
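A soft-gated mixture of experts over transformer-encoded features, as described in both abstracts, can be sketched as follows; the layer sizes, depth, and binary up/down output are assumptions, since the abstract fixes only the overall architecture family.

```python
import torch
import torch.nn as nn

class SoftMoETrendClassifier(nn.Module):
    """Transformer encoder over a window of technical-indicator features,
    followed by a soft-gated mixture of expert classification heads."""

    def __init__(self, d_model=64, n_heads=4, n_experts=4, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, n_classes) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):              # x: (batch, window, d_model)
        h = self.encoder(x)[:, -1]     # representation of the last time step
        gates = torch.softmax(self.gate(h), dim=-1)           # (batch, n_experts)
        outs = torch.stack([e(h) for e in self.experts], 1)   # (batch, n_experts, n_classes)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)        # soft mixture of logits

# model = SoftMoETrendClassifier()
# logits = model(torch.randn(8, 30, 64))   # 8 windows of 30 time steps
```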