1 |
Reinforcement learning with time perception. Liu, Chong, January 2012.
Classical value estimation reinforcement learning algorithms do not perform very well in dynamic environments. The reinforcement learning of animals, on the other hand, is quite flexible: they adapt to dynamic environments very quickly and deal with noisy inputs very effectively. One feature that may contribute to animals' good performance in dynamic environments is that they learn and perceive the time to reward. In this research, we attempt to learn and perceive the time to reward and explore situations where the learned time information can be used to improve the performance of the learning agent in dynamic environments. The dynamic environments we are interested in are switching environments, which stay the same for a long time, change abruptly, and then hold for a long time before the next change. The type of dynamics we mainly focus on is the time to reward, though we also extend the ideas to learning and perceiving other criteria of optimality, e.g. the discounted return, so that they still work even when the amount of reward may also change. Specifically, both the mean and the variance of the time to reward are learned and then used to detect changes in the environment and to decide whether the agent should give up a suboptimal action. When a change in the environment is detected, the learning agent responds specifically to that change in order to recover from it quickly. When the current action is found to be still worse than the optimal one, the agent gives up the current exploration of that action and remakes its decision, avoiding longer-than-necessary exploration. The results of our experiments on two real-world problems show that these mechanisms effectively speed up learning, reduce the time taken to recover from environmental changes, and improve the performance of the agent after learning converges in most of the test cases, compared with classical value estimation reinforcement learning algorithms. In addition, we have used spiking neurons to implement various phenomena of classical conditioning, the simplest form of animal reinforcement learning in dynamic environments, and pointed out a possible implementation of instrumental conditioning and general reinforcement learning using similar models.
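Below is a minimal sketch, in Python, of the kind of mechanism described above. It assumes an online (Welford-style) estimate of the mean and variance of the time to reward, a z-score test for change detection, and a one-standard-deviation margin for the give-up rule; these particular update rules, thresholds, and names are illustrative assumptions, not the algorithm actually used in the thesis.

import math

class TimeToRewardTracker:
    # Hypothetical sketch: tracks the mean and variance of the time to
    # reward for one action, flags abrupt environmental changes, and
    # supports a give-up decision. The concrete rules are assumptions.

    def __init__(self, change_threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.change_threshold = change_threshold

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else float("inf")

    def observe(self, time_to_reward):
        # Flag a change when the new observation lies far from the learned
        # mean relative to the learned standard deviation.
        changed = (self.n > 1 and
                   abs(time_to_reward - self.mean)
                   > self.change_threshold * self.std())
        if changed:
            # Respond specifically to the change: reset the statistics so
            # the new time to reward is learned quickly.
            self.n, self.mean, self.m2 = 0, 0.0, 0.0
        # Welford's online update of the running mean and variance.
        self.n += 1
        delta = time_to_reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (time_to_reward - self.mean)
        return changed

    def should_give_up(self, time_elapsed, best_alternative_mean):
        # Give up the current exploration of this action if the time already
        # spent exceeds the expected time of the best known action by more
        # than one standard deviation (an assumed margin).
        return time_elapsed > best_alternative_mean + self.std()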
|
2 |
Algebraic and analytic properties of some sequences over prime numbers. Devin, Lucile, 26 June 2017.
In the first part of this thesis, we study the sequence N_X(p) [mod p], where X is a reduced separated scheme of finite type over Z and, for every prime p, N_X(p) is the number of F_p-points of the reduction modulo p of X. Under some hypotheses on the geometry of X, we give a simple condition ensuring that this sequence differs at a positive proportion of indices from the zero sequence, or more generally from sequences whose coordinates are obtained by reduction modulo p of finitely many integers. In the case where X runs through a family of hyperelliptic curves, we give a bound on average for the least prime p such that N_X(p) [mod p] avoids the reduction modulo p of finitely many fixed integers. The second part deals with generalizations of Chebyshev's bias. We consider an L-function satisfying some analytic properties that generalize those satisfied by Dirichlet L-functions, and we study the sequence of its Fourier coefficients a_p as p runs through the set of prime numbers. More precisely, we study the sign of the summatory function of the Fourier coefficients of the L-function. Under some classical conditions, we show that this function admits a limiting logarithmic distribution. Under stronger hypotheses, we obtain regularity and symmetry of this distribution and information about its support.
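For reference, the central object of the second part can be stated as follows; the notation and the normalization are assumptions based on standard conventions for generalizations of Chebyshev's bias, not formulas taken from the thesis.

% Summatory function of the Fourier coefficients (possibly after a
% suitable normalization), whose sign is studied:
\[
  S(x) = \sum_{p \le x} a_p .
\]
% Saying that S admits a limiting logarithmic distribution means that
% there is a probability measure \mu on \mathbb{R} such that, for every
% bounded continuous function f,
\[
  \lim_{X \to \infty} \frac{1}{\log X} \int_{2}^{X} f\bigl(S(t)\bigr)\,\frac{dt}{t}
  = \int_{\mathbb{R}} f \, d\mu .
\]
% The bias in the sign of S is then read off by comparing
% \mu((0, \infty)) with \mu((-\infty, 0)).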
|
3 |
Prime number races. Haddad, Tony, 08 1900.
Under the Generalized Riemann Hypothesis and the Linear Independence Hypothesis, Rubinstein and Sarnak proved that the set of values of x > 1 for which there are more primes of the form 4n + 3 than primes of the form 4n + 1 below x has a logarithmic density of approximately 99.59%. In general, the study of the difference #{p < x : p ∈ A} − #{p < x : p ∈ B} for two subsets A and B of the primes is called the prime number race between A and B. In this thesis, we ultimately aim to analyze, from a numerical and statistical point of view, the race between the primes p such that 2p + 1 is also prime (known as Sophie Germain primes) and the primes p such that 2p − 1 is also prime. To do so, we first present Rubinstein and Sarnak's analysis in order to pinpoint where the bias in the race between primes that are 1 (mod 4) and primes that are 3 (mod 4) comes from, and we give a conjecture on the distribution of Sophie Germain primes.
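A small numerical sketch of this race, assuming a sympy-based primality test and arbitrary cutoffs; the function name, the cutoffs, and the choice of library are illustrative and not taken from the thesis.

from sympy import isprime, primerange

def race_counts(x):
    # Count, for primes p < x, how many have 2p + 1 prime (Sophie Germain
    # primes) and how many have 2p - 1 prime. Illustrative sketch only.
    sophie_germain = 0
    twop_minus_one = 0
    for p in primerange(2, x):
        if isprime(2 * p + 1):
            sophie_germain += 1
        if isprime(2 * p - 1):
            twop_minus_one += 1
    return sophie_germain, twop_minus_one

if __name__ == "__main__":
    for x in (10**4, 10**5, 10**6):
        sg, other = race_counts(x)
        # The race studied in the thesis concerns the sign of this difference.
        print(x, sg, other, sg - other)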
|