361 |
Apprentissage machine efficace : théorie et pratique
Delalleau, Olivier, 03 1900
Malgré des progrès constants en termes de capacité de calcul, mémoire et quantité de données disponibles, les algorithmes d'apprentissage machine doivent se montrer efficaces dans l'utilisation de ces ressources. La minimisation des coûts est évidemment un facteur important, mais une autre motivation est la recherche de mécanismes d'apprentissage capables de reproduire le comportement d'êtres intelligents. Cette thèse aborde le problème de l'efficacité à travers plusieurs articles traitant d'algorithmes d'apprentissage variés : ce problème est vu non seulement du point de vue de l'efficacité computationnelle (temps de calcul et mémoire utilisés), mais aussi de celui de l'efficacité statistique (nombre d'exemples requis pour accomplir une tâche donnée).
Une première contribution apportée par cette thèse est la mise en lumière d'inefficacités statistiques dans des algorithmes existants. Nous montrons ainsi que les arbres de décision généralisent mal pour certains types de tâches (chapitre 3), de même que les algorithmes classiques d'apprentissage semi-supervisé à base de graphe (chapitre 5), chacun étant affecté par une forme particulière de la malédiction de la dimensionnalité. Pour une certaine classe de réseaux de neurones, appelés réseaux sommes-produits, nous montrons qu'il peut être exponentiellement moins efficace de représenter certaines fonctions par des réseaux à une seule couche cachée, comparé à des réseaux profonds (chapitre 4). Nos analyses permettent de mieux comprendre certains problèmes intrinsèques liés à ces algorithmes, et d'orienter la recherche dans des directions qui pourraient permettre de les résoudre.
Nous identifions également des inefficacités computationnelles dans les algorithmes d'apprentissage semi-supervisé à base de graphe (chapitre 5), et dans l'apprentissage de mélanges de Gaussiennes en présence de valeurs manquantes (chapitre 6). Dans les deux cas, nous proposons de nouveaux algorithmes capables de traiter des ensembles de données significativement plus grands. Les deux derniers chapitres traitent de l'efficacité computationnelle sous un angle différent. Dans le chapitre 7, nous analysons de manière théorique un algorithme existant pour l'apprentissage efficace dans les machines de Boltzmann restreintes (la divergence contrastive), afin de mieux comprendre les raisons qui expliquent le succès de cet algorithme. Finalement, dans le chapitre 8 nous présentons une application de l'apprentissage machine dans le domaine des jeux vidéo, pour laquelle le problème de l'efficacité computationnelle est relié à des considérations d'ingénierie logicielle et matérielle, souvent ignorées en recherche mais ô combien importantes en pratique. / Despite constant progress in terms of available computational power, memory and amount of data, machine learning algorithms need to be efficient in how they use them. Although minimizing cost is an obvious major concern, another motivation is to attempt to design algorithms that can learn as efficiently as intelligent species. This thesis tackles the problem of efficient learning through various papers dealing with a wide range of machine learning algorithms: this topic is seen both from the point of view of computational efficiency (processing power and memory required by the algorithms) and of statistical efficiency (number of samples necessary to solve a given learning task).
The first contribution of this thesis is in shedding light on various statistical inefficiencies in existing algorithms. Indeed, we show that decision trees do not generalize well on tasks with some particular properties (chapter 3), and that a similar flaw affects typical graph-based semi-supervised learning algorithms (chapter 5). This flaw is a form of curse of dimensionality that is specific to each of these algorithms. For a subclass of neural networks, called sum-product networks, we prove that using networks with a single hidden layer can be exponentially less efficient than when using deep networks (chapter 4). Our analyses help better understand some inherent flaws found in these algorithms, and steer research towards approaches that may potentially overcome them.
We also exhibit computational inefficiencies in popular graph-based semi-supervised learning algorithms (chapter 5) as well as in the learning of mixtures of Gaussians with missing data (chapter 6). In both cases we propose new algorithms that make it possible to scale to much larger datasets. The last two chapters also deal with computational efficiency, but in different ways. Chapter 7 presents a new view on the contrastive divergence algorithm (which has been used for efficient training of restricted Boltzmann machines). It provides additional insight on the reasons why this algorithm has been so successful. Finally, in chapter 8 we describe an application of machine learning to video games, where computational efficiency is tied to software and hardware engineering constraints which, although often ignored in research papers, are ubiquitous in practice.
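The depth-efficiency result for sum-product networks (chapter 4) concerns functions computed by alternating layers of sum and product units. Below is a minimal Python sketch of such a deep network; the specific function family and unit weights are illustrative assumptions, not the exact construction analysed in the thesis.

```python
import numpy as np

def deep_sum_product(x):
    """Evaluate a toy deep sum-product network: layers alternately
    multiply and sum adjacent pairs of units, so depth grows only
    logarithmically with the input size. A single-hidden-layer
    sum-product network computing the same polynomial would need a
    number of product units growing exponentially with len(x).
    """
    layer = np.asarray(x, dtype=float)   # len(x) must be a power of 2
    take_product = True
    while layer.size > 1:
        pairs = layer.reshape(-1, 2)
        layer = pairs.prod(axis=1) if take_product else pairs.sum(axis=1)
        take_product = not take_product
    return layer[0]

print(deep_sum_product([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))  # 1204.0
```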
|
362 |
Evaluating perceptual maps of asymmetries for gait symmetry quantification and pathology detection
Moevus, Antoine, 12 1900
Le mouvement de la marche est un processus essentiel de l'activité humaine et aussi le résultat de nombreuses interactions collaboratives entre les systèmes neurologiques, articulaires et musculo-squelettiques fonctionnant ensemble efficacement. Ceci explique pourquoi une analyse de la marche est aujourd'hui de plus en plus utilisée pour le diagnostic (et aussi la prévention) de différents types de maladies (neurologiques, musculaires, orthopédiques, etc.). Ce rapport présente une nouvelle méthode pour visualiser rapidement les différentes parties du corps humain liées à une possible asymétrie (temporellement invariante par translation) existant dans la démarche d'un patient, en vue d'une utilisation clinique quotidienne. L'objectif est de fournir une méthode à la fois facile et peu dispendieuse permettant la mesure et l'affichage visuel, d'une manière intuitive et perceptive, des différentes parties asymétriques d'une démarche. La méthode proposée repose sur l'utilisation d'un capteur de profondeur peu dispendieux (la Kinect), très bien adapté à un diagnostic rapide effectué dans de petites salles médicales, car ce capteur est facile à installer et ne nécessite aucun marqueur. L'algorithme que nous présentons est basé sur le fait que la marche saine possède des propriétés de symétrie (relativement à une invariance temporelle) dans le plan coronal. / The gait movement is an essential process of human activity and also the result of coordinated effort between the neurological, articular and musculoskeletal systems. This motivates why gait analysis is important and increasingly used nowadays for the (possibly early) diagnosis of many different types of diseases (neurological, muscular, orthopedic, etc.). This paper introduces a novel method to quickly visualize the different parts of the body related to an asymmetric movement in the human gait of a patient, for daily clinical use. The goal is to provide a cheap and easy-to-use method to measure gait asymmetry and display the results in a perceptually relevant manner. The method relies on an affordable consumer depth sensor, the Kinect, chosen because it is amenable for use in small, confined areas, like a living room, and because it is marker-less and therefore provides a fast, non-invasive diagnostic. The algorithm we introduce relies on the fact that a healthy walk has (temporally shift-invariant) symmetry properties in the coronal plane.
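To make the idea concrete, the Python sketch below computes a crude per-pixel asymmetry map by mirroring segmented depth frames across the body midline. The actual method compares temporally shifted half gait cycles rather than a spatial reflection, so this is only an illustrative proxy, and the data layout and names are hypothetical.

```python
import numpy as np

def asymmetry_map(depth_frames):
    """Given depth frames (T, H, W) of a walking subject seen in the
    coronal plane, return an (H, W) map of mean absolute left/right
    asymmetry, a rough stand-in for the perceptual maps. Assumes the
    frames are segmented (background = 0) and the body midline is
    the vertical centre of the image.
    """
    frames = np.asarray(depth_frames, dtype=float)
    mirrored = frames[:, :, ::-1]        # reflect across the midline
    return np.mean(np.abs(frames - mirrored), axis=0)

# Toy usage with random values standing in for Kinect depth data.
rng = np.random.default_rng(0)
frames = rng.random((120, 64, 48))       # hypothetical frame stack
amap = asymmetry_map(frames)
print(amap.shape, round(float(amap.mean()), 3))
```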
|
363 |
Machine learning via dynamical processes on complex networks / Aprendizado de máquina via processos dinâmicos em redes complexas
Cupertino, Thiago Henrique, 20 December 2013
Extracting useful knowledge from data sets is a key concept in modern information systems. Consequently, the need of efficient techniques to extract the desired knowledge has been growing over time. Machine learning is a research field dedicated to the development of techniques capable of enabling a machine to "learn" from data. Many techniques have been proposed so far, but there are still issues to be unveiled, especially in interdisciplinary research. In this thesis, we explore the advantages of network data representation to develop machine learning techniques based on dynamical processes on networks. The network representation unifies the structure, dynamics and functions of the system it represents, and thus is capable of capturing the spatial, topological and functional relations of the data sets under analysis. We develop network-based techniques for the three machine learning paradigms: supervised, semi-supervised and unsupervised. The random walk dynamical process is used to characterize the access of unlabeled data to data classes, configuring a new heuristic we call ease of access in the supervised paradigm. We also propose a classification technique which combines the high-level view of the data, via network topological characterization, and the low-level relations, via similarity measures, in a general framework. Still in the supervised setting, the modularity and Katz centrality network measures are applied to classify multiple observation sets, and an evolving network construction method is applied to the dimensionality reduction problem. The semi-supervised paradigm is covered by extending the ease of access heuristic to the cases in which just a few labeled data samples and many unlabeled samples are available. A semi-supervised technique based on interacting forces is also proposed, for which we provide parameter heuristics and stability analysis via a Lyapunov function. Finally, an unsupervised network-based technique uses the concepts of pinning control and consensus time from dynamical processes to derive a similarity measure used to cluster data. The data is represented by a connected and sparse network in which nodes are dynamical elements. Simulations on benchmark data sets and comparisons to well-known machine learning techniques are provided for all proposed techniques. Advantages of network data representation and dynamical processes for machine learning are highlighted in all cases. / A extração de conhecimento útil a partir de conjuntos de dados é um conceito chave em sistemas de informação modernos. Por conseguinte, a necessidade de técnicas eficientes para extrair o conhecimento desejado vem crescendo ao longo do tempo. Aprendizado de máquina é uma área de pesquisa dedicada ao desenvolvimento de técnicas capazes de permitir que uma máquina "aprenda" a partir de conjuntos de dados. Muitas técnicas já foram propostas, mas ainda há questões a serem reveladas especialmente em pesquisas interdisciplinares. Nesta tese, exploramos as vantagens da representação de dados em rede para desenvolver técnicas de aprendizado de máquina baseadas em processos dinâmicos em redes. A representação em rede unifica a estrutura, a dinâmica e as funções do sistema representado e, portanto, é capaz de capturar as relações espaciais, topológicas e funcionais dos conjuntos de dados sob análise. Desenvolvemos técnicas baseadas em rede para os três paradigmas de aprendizado de máquina: supervisionado, semissupervisionado e não supervisionado.
O processo dinâmico de passeio aleatório é utilizado para caracterizar o acesso de dados não rotulados às classes de dados, configurando uma nova heurística no paradigma supervisionado, a qual chamamos de facilidade de acesso. Também propomos uma técnica de classificação de dados que combina a visão de alto nível dos dados, por meio da caracterização topológica de rede, com relações de baixo nível, por meio de medidas de similaridade, em uma estrutura geral. Ainda no aprendizado supervisionado, as medidas de rede modularidade e centralidade de Katz são aplicadas para classificar conjuntos de múltiplas observações, e um método de construção evolutiva de rede é aplicado ao problema de redução de dimensionalidade. O paradigma semissupervisionado é abordado por meio da extensão da heurística de facilidade de acesso para os casos em que apenas algumas amostras de dados rotuladas e muitas amostras não rotuladas estão disponíveis. É também proposta uma técnica semissupervisionada baseada em forças de interação, para a qual fornecemos heurísticas para selecionar parâmetros e uma análise de estabilidade mediante uma função de Lyapunov. Finalmente, uma técnica não supervisionada baseada em rede utiliza os conceitos de controle pontual e tempo de consenso de processos dinâmicos para derivar uma medida de similaridade usada para agrupar dados. Os dados são representados por uma rede conectada e esparsa na qual os vértices são elementos dinâmicos. Simulações com dados de referência e comparações com técnicas de aprendizado de máquina conhecidas são fornecidas para todas as técnicas propostas. As vantagens da representação de dados em rede e de processos dinâmicos para o aprendizado de máquina são evidenciadas em todos os casos.
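As a rough illustration of a random-walk access measure of the kind described above, the Python sketch below scores classes by the probability mass that a short walk started at a new point deposits on each class. The network construction and the exact ease-of-access definition in the thesis differ; this is only a conceptual stand-in.

```python
import numpy as np

def ease_of_access_classify(X_train, y_train, x_new, k_steps=5, sigma=1.0):
    """Toy 'ease of access' classifier: attach the new point to a
    similarity network over the training data, run a k-step random
    walk from it, and pick the class whose nodes accumulate the most
    visit probability.
    """
    X = np.vstack([X_train, x_new])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian similarity network
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transitions
    p = np.zeros(len(X))
    p[-1] = 1.0                               # walker starts at the new point
    for _ in range(k_steps):
        p = p @ P
    scores = {c: p[:-1][y_train == c].sum() for c in np.unique(y_train)}
    return max(scores, key=scores.get)

# Example with two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(ease_of_access_classify(X, y, np.array([1.8, 2.1])))  # expected: 1
```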
|
364 |
"Resultados analíticos para as distribuições estatísticas relacionadas à caminhada determinista do turista sem memória: efeito da dimensionalidade do sistema e modelos de campo médio". / Analytical results for the statistical distribution related to a memoryless deterministic walk: Dimensionality effect and mean-field modelsTerçariol, César Augusto Sangaletti 21 December 2004 (has links)
Considere um meio caracterizado por $N$ pontos cujas coordenadas são geradas aleatoriamente de maneira uniforme nas arestas unitárias de um hipercubo $d$-dimensional. Um caminhante parte de cada ponto deste meio desordenado e se movimenta obedecendo à regra determinista de ir para o ponto mais próximo que não tenha sido visitado nos últimos $\mu$ passos. Este processo foi denominado de caminhada determinista do turista. Cada trajetória gerada por esta dinâmica possui uma parte inicial não-periódica de $t$ passos (transiente) e uma parte final periódica de $p$ passos (atrator). As probabilidades de vizinhança são expressas através da fórmula de Cox, que é parametrizada pela função beta incompleta normalizada $I_d = I_{1/4}[1/2,(d+1)/2]$. Enfatizamos aqui que a distribuição relevante é $S_{\mu,d}^{(N)}(t,p)$, a distribuição conjunta de $t$ e $p$, que tem como casos particulares as distribuições marginais, previamente estudadas. O objetivo deste estudo é obter analiticamente estas distribuições para a caminhada determinista do turista sem memória no espaço euclidiano, no modelo de distâncias aleatórias (que corresponde ao limite $d \rightarrow \infty$) e no modelo de mapeamento aleatório (que é um caso limite das redes de Kauffman). As distribuições analíticas obtidas foram validadas através de experimentos numéricos. A distribuição conjunta de tempos de transiente e período de atratores, no limite termodinâmico para uma dimensionalidade arbitrária, vale: $S_{1,d}^{(\infty)}(t,p) = [\Gamma(1+I_d^{-1}) \cdot (t+I_d^{-1})/\Gamma(t+p+I_d^{-1})] \cdot \delta_{p,2}$, onde $t=0,1,2,\ldots,\infty$; $\Gamma(z)$ é a função gama e $\delta_{i,j}$ é o delta de Kronecker. A caminhada determinista do turista sem memória no modelo de mapeamento aleatório produz uma distribuição de períodos não-trivial ($S_{0,\mathrm{rm}}^{(N)}(p) \propto p^{-1}$), que é obtida de $S_{0,\mathrm{rm}}^{(N)}(t,p) = \Gamma(N)/\{\Gamma[N+1-(t+p)]\,N^{t+p}\}$, onde enfatizamos que o número de pontos explorados $n_e=t+p$ é a grandeza fundamental nos problemas considerados. / Consider a medium characterized by $N$ points whose coordinates are randomly generated by a uniform distribution along the unitary edges of a $d$-dimensional hypercube. A walker leaves from each point of this disordered medium and moves according to the deterministic rule to go to the nearest point which has not been visited in the preceding $\mu$ steps. This process has been called the deterministic tourist walk. Each trajectory generated by this dynamics has an initial non-periodic part of $t$ steps (transient) and a final periodic part of $p$ steps (attractor). The neighborhood probabilities are given by the Cox formula, which is parameterized by the normalized incomplete beta function $I_d = I_{1/4}[1/2,(d+1)/2]$. Here we stress that the relevant distribution is the joint $t$ and $p$ distribution $S_{\mu,d}^{(N)}(t,p)$, which has as particular cases the marginal distributions previously studied. The objective of this study is to obtain analytically these distributions for the memoryless deterministic tourist walk in the Euclidean space, the random link model (which corresponds to the $d \rightarrow \infty$ limit) and the random map model (which is a limiting case of the Kauffman model). The obtained distributions have been validated by numerical experiments. The joint transient time and attractor period distribution in the thermodynamic limit for an arbitrary dimensionality is: $S_{1,d}^{(\infty)}(t,p) = [\Gamma(1+I_d^{-1}) \cdot (t+I_d^{-1})/\Gamma(t+p+I_d^{-1})] \cdot \delta_{p,2}$, where $t=0,1,2,\ldots,\infty$; $\Gamma(z)$ is the gamma function and $\delta_{i,j}$ is the Kronecker delta. The memoryless deterministic tourist walk in the random map leads to a non-trivial cycle distribution ($S_{0,\mathrm{rm}}^{(N)}(p) \propto p^{-1}$), which is obtained from $S_{0,\mathrm{rm}}^{(N)}(t,p) = \Gamma(N)/\{\Gamma[N+1-(t+p)]\,N^{t+p}\}$, where we stress that the number of explored points $n_e=t+p$ is the fundamental quantity in the considered problems.
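The dynamics defined above is straightforward to simulate. The Python sketch below runs the deterministic tourist walk with memory $\mu$ on random points in the unit hypercube and extracts the transient length $t$ and attractor period $p$; for the memoryless walk ($\mu=0$) the attractor is always a pair of mutually nearest points, consistent with the $\delta_{p,2}$ factor. The point count and dimension are arbitrary choices for illustration.

```python
import numpy as np

def tourist_walk(points, mu=0, start=0):
    """Deterministic tourist walk: from the current point, move to
    the nearest point not visited in the last mu steps; return the
    transient length t and attractor period p of the trajectory.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    path = [start]
    seen = {}
    while True:
        state = tuple(path[-(mu + 1):])   # last mu+1 points fix the future
        if state in seen:
            t = seen[state]               # steps before entering the cycle
            p = len(path) - 1 - seen[state]
            return t, p
        seen[state] = len(path) - 1
        dists = d[path[-1]].copy()
        dists[path[-(mu + 1):]] = np.inf  # forbid current + last mu points
        path.append(int(np.argmin(dists)))

rng = np.random.default_rng(2)
pts = rng.random((500, 3))                # N = 500 points, d = 3
print(tourist_walk(pts, mu=0))            # p should be 2 for mu = 0
```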
|
365 |
Categorical structural optimization : methods and applications / Optimisation structurelle catégorique : méthodes et applications
Gao, Huanhuan, 07 February 2019
La thèse se concentre sur une recherche méthodologique sur l'optimisation structurelle catégorielle au moyen de l'apprentissage de variétés. Dans cette thèse, les variables catégorielles non ordinales sont traitées comme des variables discrètes multidimensionnelles. Afin de réduire la dimensionnalité, les techniques d'apprentissage de variétés sont introduites pour trouver la dimensionnalité intrinsèque et mapper l'espace de conception d'origine sur un espace d'ordre réduit. Les mécanismes des techniques d'apprentissage à la fois linéaires et non linéaires sont d'abord étudiés. Ensuite, des exemples numériques sont testés pour comparer les performances de ces techniques d'apprentissage. Sur la base de la représentation d'ordre réduit obtenue par Isomap, les opérateurs de mutation et de croisement évolutifs basés sur les graphes sont proposés pour traiter des problèmes d'optimisation structurelle catégorielle, notamment la conception du dôme, du cadre rigide de six étages et des structures en forme de dame. Ensuite, la méthode de recherche continue des asymptotes mobiles est exécutée et fournit une solution compétitive, mais inadmissible, en quelques itérations seulement. Lors de la deuxième étape, une stratégie de recherche discrète est proposée pour rechercher de meilleures solutions basées sur une recherche de voisins. Afin de traiter le cas dans lequel les instances de conception catégorielles sont réparties sur plusieurs variétés, nous proposons une méthode d'apprentissage de k variétés basée sur l'analyse en composantes principales pondérée. / The thesis concentrates on methodological research on categorical structural optimization by means of manifold learning. The main difficulty in handling categorical optimization problems lies in the description of the categorical variables: they belong to categories and do not have any ordering. Thus the treatment of the design space is a key issue. In this thesis, the non-ordinal categorical variables are treated as multi-dimensional discrete variables, and the dimensionality of the corresponding design space becomes high. In order to reduce the dimensionality, manifold learning techniques are introduced to find the intrinsic dimensionality and map the original design space to a reduced-order space. The mechanisms of both linear and non-linear manifold learning techniques are firstly studied. Then numerical examples are tested to compare the performance of the manifold learning techniques mentioned above. It is found that PCA and MDS can only deal with linear or globally approximately linear cases. Isomap preserves geodesic distances on a non-linear manifold; however, it is the most time-consuming. LLE preserves the neighbour weights and can yield good results in a short time. KPCA works like a non-linear classifier, and we prove why it cannot preserve distances or angles in some cases. Based on the reduced-order representation obtained by Isomap, graph-based evolutionary crossover and mutation operators are proposed to deal with categorical structural optimization problems, including the design of dome, six-story rigid frame and dame-like structures. The results show that the proposed graph-based evolutionary approach constructed on the reduced-order space performs more efficiently than traditional methods, including the simplex approach or an evolutionary approach without the reduced-order space.
In chapter 5, LLE is applied to reduce the data dimensionality, and a polynomial interpolation helps to construct the response surface from the lower-dimensional representation to the original data. Then the continuous search method of moving asymptotes is executed and yields a competitively good but inadmissible solution within only a few iterations. In the second stage, a discrete search strategy is proposed to find better solutions based on a neighbour search. The ten-bar truss and dome structural design problems are tested to show the validity of the method. In the end, this method is compared to the Simulated Annealing algorithm and the Covariance Matrix Adaptation Evolutionary Strategy, showing its better optimization efficiency. In chapter 6, in order to deal with the case in which the categorical design instances are distributed on several manifolds, we propose a k-manifolds learning method based on the Weighted Principal Component Analysis. The obtained manifolds are integrated in the lower-dimensional design space. Then the method introduced in chapter 4 is applied to solve the ten-bar truss, the dome and the dame-like structural design problems.
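As a small illustration of the reduced-order representation step, the sketch below embeds a hypothetical catalogue of categorical design instances into a two-dimensional space with scikit-learn's Isomap. The catalogue contents and parameters are toy assumptions; the structural catalogues used in the thesis are different.

```python
import numpy as np
from sklearn.manifold import Isomap

# Hypothetical catalogue: each categorical design instance (e.g. a
# section type) is encoded as a multi-dimensional vector of
# properties (area, inertia, ...), playing the role of the
# non-ordinal categorical variables.
rng = np.random.default_rng(3)
catalog = rng.random((60, 5))              # 60 instances, 5 properties

# Map the instances to a low-dimensional ordered space on which
# graph-based evolutionary operators or continuous search can act.
embedding = Isomap(n_neighbors=6, n_components=2).fit_transform(catalog)
print(embedding.shape)                     # (60, 2)
```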
|
366 |
Shozo Ohmori’s 'Fancy' : A Third Mode of Awareness
Lagelius, Robin, January 2019
This thesis is an investigation into the phenomenon which Shozo Ohmori (1921-1997) considered "a peculiar manner of awareness", and to which he attributed the term 'fancy'. The objective is to achieve an approximate understanding of Ohmori's theory of 'fancy', as it relates to awareness of entities in three-dimensional space, and the extensions mentioned in his only publication in English: "Beyond Hume's Fancy" (1974). This objective will be realized by asking three questions. The first question is how we are to understand the demarcation of the different phenomena of awareness which Ohmori identifies. The second question that this thesis asks is what applications the phenomenon 'fancy' mentioned in Ohmori's account has, as Ohmori saw it. Having answered these questions, I will then make an assessment of another salient consideration: how does Ohmori's employment of the term 'fancy' relate to Hume's employment of the same term (seeing as the title of Ohmori's article makes such a reference). As we shall see, Ohmori is attempting to identify a more specific phenomenon than the widely discussed issue of thinking about something that is not currently perceivable in our perceptual field. The third and final question that this thesis asks is whether there are any salient issues with Ohmori's theory of 'fancy' and, if so, whether those issues can be resolved.
When we are aware of entities in three-dimensional space, we are subject to various mental processes. Our awareness, seemingly, uses different modes of interpretation and orientation. In other words, our 'point of view' (which is something that not only pertains to the use of our visual sensory organs) determines both our place and relation towards other entities. One salient issue when considering the notion of awareness is how and by which order awareness emerges. Impressions, as David Hume would call them, seemingly precede our ideas. Sense-data, as Shozo Ohmori phrased it, are unquestionably inseparable from conceptions. Our conceptions, in turn, seem to inform our perceptions with expectations and predictions of how things are.
When we perceive an entity, we are ready to make judgements about its being at this moment. When we see the front of a desk, we are ready to claim awareness of said desk-front as part of a desk (which entails the ontology of a desk, namely, being a three-dimensional construction of a particular variety). In everyday situations we simply speak of such an awareness as 'perception' when in actuality, all we see (which constitutes the sense-data or content of a perception) is the front of a desk. It seems we cannot regard our awareness of a desk (a three-dimensional entity) as a perception simpliciter. Of course, by having a notion of what a desk is, our awareness is pregnant with a 'conception' in the form of an idea that is informing our awareness of said desk. But our conceptual understanding of the notion of something being a desk is not enough to explain what our awareness of a desk-at-this-moment is. At least, that is what Ohmori thought.
|
368 |
Efficient estimation using the characteristic function : theory and applications with high frequency data
Kotchoni, Rachidi, 05 1900
Nous abordons deux sujets distincts dans cette thèse : l'estimation de la volatilité des prix d'actifs financiers à partir des données à haute fréquence, et l'estimation des paramètres d'un processus aléatoire à partir de sa fonction caractéristique.
Le chapitre 1 s'intéresse à l'estimation de la volatilité des prix d'actifs. Nous supposons que les données à haute fréquence disponibles sont entachées de bruit de microstructure. Les propriétés que l'on prête au bruit sont déterminantes dans le choix de l'estimateur de la volatilité. Dans ce chapitre, nous spécifions un nouveau modèle dynamique pour le bruit de microstructure qui intègre trois propriétés importantes : (i) le bruit peut être autocorrélé ; (ii) le retard maximal au-delà duquel l'autocorrélation est nulle peut être une fonction croissante de la fréquence journalière d'observations ; (iii) le bruit peut avoir une composante corrélée avec le rendement efficient. Cette dernière composante est alors dite endogène. Ce modèle se différencie de ceux existants en ceci qu'il implique que l'autocorrélation d'ordre 1 du bruit converge vers 1 lorsque la fréquence journalière d'observation tend vers l'infini.
Nous utilisons le cadre semi-paramétrique ainsi défini pour dériver un nouvel estimateur de la volatilité intégrée baptisé "estimateur shrinkage". Cet estimateur se présente sous la forme d'une combinaison linéaire optimale de deux estimateurs aux propriétés différentes, l'optimalité étant définie en termes de minimisation de la variance. Les simulations indiquent que l'estimateur shrinkage a une variance plus petite que le meilleur des deux estimateurs initiaux. Des estimateurs sont également proposés pour les paramètres du modèle de microstructure. Nous clôturons ce chapitre par une application empirique basée sur des actifs du Dow Jones Industrials. Les résultats indiquent qu'il est pertinent de tenir compte de la dépendance temporelle du bruit de microstructure dans le processus d'estimation de la volatilité.
Les chapitres 2, 3 et 4 s'inscrivent dans la littérature économétrique qui traite de la méthode des moments généralisés. En effet, on rencontre en finance des modèles dont la fonction de vraisemblance n'est pas connue. On peut citer en guise d'exemple la loi stable ainsi que les modèles de diffusion observés en temps discret. Les méthodes d'inférence basées sur la fonction caractéristique peuvent être envisagées dans ces cas. Typiquement, on spécifie une condition de moment basée sur la différence entre la fonction caractéristique (conditionnelle) théorique et sa contrepartie empirique. Le défi ici est d'exploiter au mieux le continuum de conditions de moment ainsi spécifié pour atteindre la même efficacité que le maximum de vraisemblance dans les inférences.
Ce défi a été relevé par Carrasco et Florens (2000), qui ont proposé la procédure CGMM (continuum GMM). La fonction objectif que ces auteurs proposent est une forme quadratique hilbertienne qui fait intervenir l'opérateur inverse de covariance associé au continuum de conditions de moment. Cet opérateur inverse est régularisé à la Tikhonov pour en assurer l'existence globale et la continuité. Carrasco et Florens (2000) ont montré que l'estimateur obtenu en minimisant cette forme quadratique est asymptotiquement aussi efficace que l'estimateur du maximum de vraisemblance si le paramètre de régularisation (α) tend vers zéro lorsque la taille de l'échantillon tend vers l'infini. La nature de la fonction objectif du CGMM soulève deux questions importantes. La première est celle de la calibration de α en pratique, et la seconde est liée à la présence d'intégrales multiples dans l'expression de la fonction objectif. C'est à ces deux problématiques qu'essaient de répondre les trois derniers chapitres de la présente thèse.
Dans le chapitre 2, nous proposons une méthode de calibration de α basée sur la minimisation de l'erreur quadratique moyenne (EQM) de l'estimateur. Nous suivons une approche similaire à celle de Newey et Smith (2004) pour calculer un développement d'ordre supérieur de l'EQM de l'estimateur CGMM de sorte à pouvoir examiner sa dépendance en α en échantillon fini. Nous proposons ensuite deux méthodes pour choisir α en pratique. La première se base sur le développement de l'EQM, et la seconde se base sur des simulations Monte Carlo. Nous montrons que la méthode Monte Carlo délivre un estimateur convergent de α optimal. Nos simulations confirment la pertinence de la calibration de α en pratique.
Le chapitre 3 essaye de vulgariser la théorie du chapitre 2 pour les modèles univariés ou bivariés. Nous commençons par passer en revue les propriétés de convergence et de normalité asymptotique de l'estimateur CGMM. Nous proposons ensuite des recettes numériques pour l'implémentation. Enfin, nous conduisons des simulations Monte Carlo basées sur la loi stable. Ces simulations démontrent que le CGMM est une méthode fiable d'inférence. En guise d'application empirique, nous estimons par CGMM un modèle de variance autorégressif Gamma. Les résultats d'estimation confirment un résultat bien connu en finance : le rendement est positivement corrélé au risque espéré et négativement corrélé au choc sur la volatilité.
Lorsqu'on implémente le CGMM, une difficulté majeure réside dans l'évaluation numérique itérative des intégrales multiples présentes dans la fonction objectif. Les méthodes de quadrature sont en principe parmi les plus précises que l'on puisse utiliser dans le présent contexte. Malheureusement, le nombre de points de quadrature augmente exponentiellement en fonction de la dimensionnalité (d) des intégrales. L'utilisation du CGMM devient pratiquement impossible dans les modèles multivariés et non markoviens où d≥3. Dans le chapitre 4, nous proposons une procédure alternative baptisée "rééchantillonnage dans le domaine fréquentiel" qui consiste à fabriquer des échantillons univariés en prenant une combinaison linéaire des éléments du vecteur initial, les poids de la combinaison linéaire étant tirés aléatoirement dans un sous-espace normalisé de ℝ^{d}. Chaque échantillon ainsi généré est utilisé pour produire un estimateur du paramètre d'intérêt. L'estimateur final que nous proposons est une combinaison linéaire optimale de tous les estimateurs ainsi obtenus. Finalement, nous proposons une étude par simulation et une application empirique basées sur des modèles autorégressifs Gamma.
Dans l'ensemble, nous faisons une utilisation intensive du bootstrap, une technique selon laquelle les propriétés statistiques d'une distribution inconnue peuvent être estimées à partir d'un estimé de cette distribution. Nos résultats empiriques peuvent donc en principe être améliorés en faisant appel aux connaissances les plus récentes dans le domaine du bootstrap. / In estimating the integrated volatility of financial assets using noisy high frequency data, the time series properties assumed for the microstructure noise determine the proper choice of the volatility estimator. In the first chapter of the current thesis, we propose a new model for the microstructure noise with three important features. First of all, our model assumes that the noise is L-dependent. Secondly, the memory lag L is allowed to increase with the sampling frequency. And thirdly, the noise may include an endogenous part, that is, a piece that is correlated with the latent returns. The main difference between this microstructure model and existing ones is that it implies a first order autocorrelation that converges to 1 as the sampling frequency goes to infinity.
We use this semi-parametric model to derive a new shrinkage estimator for the integrated volatility. The proposed estimator makes an optimal signal-to-noise trade-off by combining a consistent estimator with an inconsistent one. Simulation results show that the shrinkage estimator behaves better than the best of the two combined ones. We also propose some estimators for the parameters of the noise model. An empirical study based on stocks listed in the Dow Jones Industrials shows the relevance of accounting for possible time dependence in the noise process.
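For illustration, a minimum-variance linear combination of two estimators takes the generic form sketched below in Python; the weights and the derivation of the actual signal-to-noise trade-off in chapter 1 are specific to the microstructure model and are not reproduced here.

```python
import numpy as np

def shrinkage_combination(a, b):
    """Minimum-variance combination w*a + (1 - w)*b of two estimator
    sequences a and b (e.g. replications over subsamples). This is
    the textbook formula, not the exact estimator of chapter 1:
    w* = (var(b) - cov(a, b)) / (var(a) + var(b) - 2 cov(a, b)).
    """
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)
    cov_ab = np.cov(a, b, ddof=1)[0, 1]
    w = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)
    return w * np.mean(a) + (1 - w) * np.mean(b), w

rng = np.random.default_rng(4)
a = 1.0 + rng.normal(0, 0.05, 200)   # low-variance estimator replications
b = 1.0 + rng.normal(0, 0.20, 200)   # noisier alternative
print(shrinkage_combination(a, b))    # combined estimate near 1, w near 1
```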
Chapters 2, 3 and 4 pertain to the generalized method of moments based on the characteristic function. In fact, the likelihood functions of many financial econometrics models are not known in closed form. For example, this is the case for the stable distribution and a discretely observed continuous time model. In these cases, one may estimate the parameter of interest by specifying a moment condition based on the difference between the theoretical (conditional) characteristic function and its empirical counterpart. The challenge is then to exploit the whole continuum of moment conditions hence defined to achieve the maximum likelihood efficiency.
This problem has been solved in Carrasco and Florens (2000), who propose the CGMM procedure. The objective function of the CGMM is a quadratic form on the Hilbert space defined by the moment function. That objective function depends on a Tikhonov-type regularized inverse of the covariance operator associated with the moment function. Carrasco and Florens (2000) have shown that the estimator obtained by minimizing the proposed objective function is asymptotically as efficient as the maximum likelihood estimator provided that the regularization parameter (α) converges to zero as the sample size goes to infinity. However, the nature of this objective function raises two important questions. First of all, how do we select α in practice? And secondly, how do we implement the CGMM when the multiplicity (d) of the integrals embedded in the objective function is large? These questions are tackled in the last three chapters of the thesis.
In Chapter 2, we propose to choose α by minimizing the approximate mean square error (MSE) of the estimator. Following an approach similar to Newey and Smith (2004), we derive a higher-order expansion of the estimator from which we characterize the finite sample dependence of the MSE on α. We provide two data-driven methods for selecting the regularization parameter in practice. The first one relies on the higher-order expansion of the MSE whereas the second one uses only simulations. We show that our simulation technique delivers a consistent estimator of α. Our Monte Carlo simulations confirm the importance of the optimal selection of α.
The goal of Chapter 3 is to illustrate how to efficiently implement the CGMM for d≤2. To start with, we review the consistency and asymptotic normality properties of the CGMM estimator. Next we suggest some numerical recipes for its implementation. Finally, we carry out a simulation study with the stable distribution that confirms the accuracy of the CGMM as an inference method. An empirical application based on the autoregressive variance Gamma model led to a well-known conclusion: investors require a positive premium for bearing the expected risk while a negative premium is attached to the unexpected risk.
In implementing the characteristic function based CGMM, a major difficulty lies in the evaluation of the multiple integrals embedded in the objective function. Numerical quadratures are among the most accurate methods that can be used in the present context. Unfortunately, the number of quadrature points grows exponentially with d. When the data generating process is Markov or dependent, the accurate implementation of the CGMM becomes practically unfeasible when d≥3. In Chapter 4, we propose a strategy that consists in creating univariate samples by taking a linear combination of the elements of the original vector process. The weights of the linear combinations are drawn from a normalized set of ℝ^{d}. Each univariate index generated in this way is called a frequency domain bootstrap sample that can be used to compute an estimator of the parameter of interest. Finally, all the possible estimators obtained in this fashion can be aggregated to obtain the final estimator. The optimal aggregation rule is discussed in the paper. The overall method is illustrated by a simulation study and an empirical application based on autoregressive Gamma models.
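A schematic Python version of this resampling strategy is sketched below, with a plain average standing in for the optimal aggregation rule discussed in the chapter; the univariate estimator and the data-generating process are toy assumptions.

```python
import numpy as np

def frequency_domain_resampling(X, estimator, n_draws=50, seed=0):
    """Draw random unit weight vectors in R^d, form univariate
    indexes w'X_t, apply a univariate `estimator` to each index,
    and aggregate. A plain average replaces the optimal aggregation
    rule of the paper.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    estimates = []
    for _ in range(n_draws):
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)             # weights on the unit sphere of R^d
        estimates.append(estimator(X @ w))
    return np.mean(estimates)

# Toy example: for isotropic noise, Var(w'X) recovers the common
# variance regardless of the projection direction.
rng = np.random.default_rng(5)
X = rng.normal(0.0, 1.5, size=(1000, 3))
print(frequency_domain_resampling(X, lambda u: np.var(u, ddof=1)))  # ~2.25
```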
This thesis makes extensive use of the bootstrap, a technique according to which the statistical properties of an unknown distribution can be estimated from an estimate of that distribution. It is thus possible to improve our simulations and empirical results by using the state-of-the-art refinements of the bootstrap methodology.
|
369 |
Robot navigation in sensor space
Keeratipranon, Narongdech, January 2009
This thesis investigates the problem of robot navigation using only landmark bearings. The proposed system allows a robot to move to a ground target location specified by the sensor values observed at this ground target position. The control actions are computed based on the difference between the current landmark bearings and the target landmark bearings. No Cartesian coordinates with respect to the ground are computed by the control system. The robot navigates using solely information from the bearing sensor space.
Most existing robot navigation systems require a ground frame (2D Cartesian coordinate system) in order to navigate from a ground point A to a ground point B. The commonly used sensors such as laser range scanners, sonar, infrared, and vision do not directly provide the 2D ground coordinates of the robot. The existing systems use the sensor measurements to localise the robot with respect to a map, a set of 2D coordinates of the objects of interest. It is more natural to navigate between the points in the sensor space corresponding to A and B without requiring the Cartesian map and the localisation process.
Research on animals has revealed how insects are able to exploit very limited computational and memory resources to successfully navigate to a desired destination without computing Cartesian positions. For example, a honeybee balances the left and right optical flows to navigate in a narrow corridor. Unlike many other ants, Cataglyphis bicolor does not secrete pheromone trails in order to find its way home but instead uses the sun as a compass to keep track of its home direction vector. The home vector can be inaccurate, so the ant also uses landmark recognition. More precisely, it takes snapshots and compass headings of some landmarks. To return home, the ant tries to line up the landmarks exactly as they were before it started wandering.
This thesis introduces a navigation method based on reflex actions in sensor space. The sensor vector is made of the bearings of some landmarks, and the reflex action is a gradient descent with respect to the distance in sensor space between the current sensor vector and the target sensor vector. Our theoretical analysis shows that except for some fully characterized pathological cases, any point is reachable from any other point by reflex action in the bearing sensor space provided the environment contains three landmarks and is free of obstacles.
The trajectories of a robot using reflex navigation, like other image-based visual control strategies, do not necessarily correspond to the shortest paths on the ground, because the sensor error is minimized, not the moving distance on the ground. However, we show that the use of a sequence of waypoints in sensor space can address this problem. In order to identify relevant waypoints, we train a Self Organising Map (SOM) from a set of observations uniformly distributed with respect to the ground. This SOM provides a sense of location to the robot, and allows a form of path planning in sensor space. The proposed navigation system is analysed theoretically, and evaluated both in simulation and with experiments on a real robot.
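A minimal simulation of reflex navigation in bearing space is sketched below in Python: the controller only compares current and target landmark bearings and descends the squared angular error by finite differences. The landmark layout, step size, and gradient probing are illustrative assumptions rather than the robot's actual control law.

```python
import numpy as np

LANDMARKS = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # three landmarks

def bearings(pos):
    """Bearings (angles, radians) from pos to each landmark."""
    d = LANDMARKS - pos
    return np.arctan2(d[:, 1], d[:, 0])

def ang_diff(a, b):
    """Smallest signed angular differences a - b."""
    return np.arctan2(np.sin(a - b), np.cos(a - b))

def reflex_navigate(start, target, step=0.5, n_iters=1500, eps=1e-4):
    """Gradient descent purely in bearing sensor space: reduce the
    squared angular distance between current and target bearings,
    estimating the gradient by finite differences (a stand-in for a
    robot probing small moves). The controller itself never uses
    ground coordinates.
    """
    b_star = bearings(target)
    pos = start.astype(float).copy()
    for _ in range(n_iters):
        def err(p):
            return np.sum(ang_diff(bearings(p), b_star) ** 2)
        grad = np.array([
            (err(pos + [eps, 0]) - err(pos - [eps, 0])) / (2 * eps),
            (err(pos + [0, eps]) - err(pos - [0, eps])) / (2 * eps),
        ])
        pos -= step * grad
    return pos

print(reflex_navigate(np.array([1.0, 1.0]), np.array([7.0, 5.0])))  # ~[7, 5]
```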
|