
Software lock elision for x86 machine code

Roy, Amitabha January 2011
More than a decade after becoming a topic of intense research, there is still no transactional memory hardware and no example of software transactional memory use outside the research community. Using software transactional memory in large pieces of software requires copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, the overheads associated with software transactional memory fail to motivate programmers to expend the effort needed to use it. The only way around these overheads, in the case of general unmanaged code, is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a 'niche' programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from the programs that use it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, making it immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls, such as those in the Pthreads or OpenMP libraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without transactions by acquiring the protecting lock. The dissertation makes a careful analysis of the impact on performance of the demands of the x86 memory consistency model and of the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it.
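The elide-then-fall-back mechanism described above can be sketched independently of the x86 details. Below is a minimal, purely conceptual Python illustration of the idea — a version-validated software transaction whose read set includes the lock word, with a fallback that really acquires the lock after repeated aborts. It is not the SLE_x86 runtime (which instruments machine-code binaries rather than source), and all names in it are hypothetical.

```python
# Conceptual sketch of speculative lock elision (SLE): run a critical section
# as a software transaction against versioned shared memory and fall back to
# really taking the lock word after repeated aborts. Illustration only.
import threading

class Memory:
    """Shared words with version counters, plus a single elidable lock word."""
    def __init__(self):
        self._words = {"lock": (0, 0)}     # name -> (version, value)
        self._mutex = threading.Lock()     # protects the metadata only

    def read(self, name):
        with self._mutex:
            return self._words.get(name, (0, 0))

    def try_commit(self, read_set, writes):
        """Publish `writes` only if every read version is still current."""
        with self._mutex:
            for name, version in read_set.items():
                if self._words.get(name, (0, 0))[0] != version:
                    return False                      # conflict: abort
            for name, value in writes.items():
                v = self._words.get(name, (0, 0))[0]
                self._words[name] = (v + 1, value)
            return True

def run_critical_section(mem, body, attempts=4):
    """Speculate while the lock word stays free; then really take the lock."""
    for _ in range(attempts):
        lock_version, held = mem.read("lock")
        if held:
            continue                                  # someone holds the lock
        reads, writes = {"lock": lock_version}, {}
        def load(name):
            if name in writes:
                return writes[name]
            version, value = mem.read(name)
            reads.setdefault(name, version)
            return value
        body(load, writes.__setitem__)
        if mem.try_commit(reads, writes):
            return                                    # elided successfully
    # Fallback: acquire the lock word, which makes concurrent speculation abort.
    while True:
        version, held = mem.read("lock")
        if not held and mem.try_commit({"lock": version}, {"lock": 1}):
            break
    writes = {}
    body(lambda n: writes.get(n, mem.read(n)[1]), writes.__setitem__)
    mem.try_commit({}, dict(writes, lock=0))          # publish and release

if __name__ == "__main__":
    mem = Memory()
    def increment(load, store):
        store("counter", load("counter") + 1)
    threads = [threading.Thread(target=run_critical_section, args=(mem, increment))
               for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(mem.read("counter")[1])   # 8: every increment committed exactly once
```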

A Model for Managing Data Integrity

Mallur, Vikram January 2011
Consistent, accurate and timely data are essential to the functioning of a modern organization. Managing the integrity of an organization's data assets in a systematic manner is a challenging task in the face of continuous update, transformation and processing to support business operations. Classic approaches to constraint-based integrity focus on logical consistency within a database and reject any transaction that violates consistency, but leave unresolved how to fix or manage violations. More ad hoc approaches focus on the accuracy of the data and attempt to clean data assets after the fact, using queries to flag records with potential violations and manual effort to repair them. Neither approach satisfactorily addresses the problem from an organizational point of view. In this thesis, we provide a conceptual model of constraint-based integrity management (CBIM) that flexibly combines both approaches in a systematic manner to provide improved integrity management. We perform a gap analysis that examines the criteria that are desirable for efficient management of data integrity. Our approach involves creating a Data Integrity Zone and an On Deck Zone in the database to separate clean data from data that violates integrity constraints. We provide tool support for specifying constraints in a tabular form and generating triggers that flag violations of dependencies. We validate this approach by performing case studies on two systems used to manage healthcare data: PAL-IS and iMED-Learn. Our case studies show that using views to implement the zones does not cause any significant increase in the running time of a process.
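A minimal sketch of the zone idea, assuming SQLite: violations are flagged by a generated trigger rather than rejected, and the Data Integrity Zone and On Deck Zone are exposed as views over the same table. The schema, column names, and constraints below are illustrative, not the PAL-IS or iMED-Learn schemas from the case studies.

```python
# Route rows into a "Data Integrity Zone" view or an "On Deck Zone" view using
# a trigger that flags constraint violations instead of rejecting the insert.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    id        INTEGER PRIMARY KEY,
    age       INTEGER,
    admit     DATE,
    discharge DATE,
    violates  INTEGER NOT NULL DEFAULT 0     -- set by the trigger, not rejected
);

-- Trigger generated from a tabular constraint specification such as
-- "0 <= age <= 120" and "admit <= discharge".
CREATE TRIGGER flag_violations AFTER INSERT ON patients
BEGIN
    UPDATE patients
       SET violates = 1
     WHERE id = NEW.id
       AND (NEW.age NOT BETWEEN 0 AND 120 OR NEW.admit > NEW.discharge);
END;

-- The two zones are plain views over the same table.
CREATE VIEW data_integrity_zone AS SELECT * FROM patients WHERE violates = 0;
CREATE VIEW on_deck_zone        AS SELECT * FROM patients WHERE violates = 1;
""")

conn.executemany(
    "INSERT INTO patients (id, age, admit, discharge) VALUES (?, ?, ?, ?)",
    [(1, 42, "2011-01-03", "2011-01-09"),    # clean
     (2, 137, "2011-02-01", "2011-02-04"),   # impossible age -> On Deck
     (3, 65, "2011-03-10", "2011-03-02")],   # discharge before admission -> On Deck
)
conn.commit()

print("clean:  ", conn.execute("SELECT id FROM data_integrity_zone").fetchall())
print("on deck:", conn.execute("SELECT id FROM on_deck_zone").fetchall())
# clean:   [(1,)]
# on deck: [(2,), (3,)]
```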

CSR-implementeringens inverkan på bolags finansiella prestation : En kvantitativ studie som belyser vikten av takten, konsekvensen och vägen i ett bolags CSR-arbete / The impact of CSR implementation on corporate financial performance : A quantitative study highlighting the importance of pace, consistency and path in a company's CSR work

Funke Jansson, Matilda, Forsberg, Sofia January 2017
Syfte: Ett flertal studier har undersökt om socialt ansvarstagande (CSR) är lönsamt, men inga entydiga bevis finns. Tidigare undersökningar har dock implicit antagit att CSR-aktiviteter är lönsamma oavsett hur de implementeras. Vi ifrågasätter detta antagande och undersöker hur implementeringen av CSR påverkar dess lönsamhet. Vi undersöker om variablerna takt, konsekvens och väg förklarar relationen mellan CSR och bolagens lönsamhet. Metod: Studien är utförd enligt ett positivistiskt perspektiv med en hypotetisk-deduktiv ansats och en longitudinell design med fem års observationer. Data har insamlats från Thomson Reuters Eikon och Thomson Reuters Datastream. Data har analyserats med statistiska metoder. Resultat & slutsats: Studien visar att implementeringen av CSR påverkar bolags CFP och studien diskuterar de teoretiska förutsättningarna för sådana effekter. Bolag kan påverka lönsamheten av CSR genom att implementera den på rätt sätt. Förslag till fortsatt forskning: Ett förslag till fortsatta studier av CSR-implementering är att inte bara mäta bolagens redovisningsmässiga mått utan även se på de marknadsmässiga. Att mäta bolags CSR-aktiviteter med start i tidigare årtal än vi gjort kan dessutom öka möjligheten att mäta effekter av implementering. Uppsatsens bidrag: Föreliggande studie utvidgar CSR-litteraturen och redovisningslitteraturen genom att (1) diskutera hur implementering av CSR kan påverka dess lönsamhet och (2) empiriskt undersöka effekten av implementering av CSR på lönsamhet. Det praktiska bidraget är att kunskap om hur implementering av CSR kan påverka lönsamheten är till nytta för all implementering av CSR och för behovet av lagstiftning om socialt ansvarstagande. / Aim: A number of studies have investigated whether corporate social responsibility (CSR) is profitable, but no unambiguous evidence exists. Previous studies have, however, implicitly assumed that CSR activities are profitable regardless of how they are implemented. We question this assumption and investigate how the implementation of CSR affects its profitability. Specifically, we examine whether the variables pace, consistency and path explain the relationship between CSR and firm profitability. Method: The study adopts a positivist perspective with a hypothetico-deductive approach and a longitudinal design with five years of observations. Secondary data were collected from Thomson Reuters Eikon and Thomson Reuters Datastream and analyzed statistically in MiniTab. Result & Conclusions: The study shows that the way CSR is implemented affects corporate financial performance (CFP), and it discusses the theoretical conditions for such effects. Companies can thus influence the profitability of CSR by implementing it in the right way. Suggestions for future research: Further studies of CSR implementation could consider not only accounting-based performance measures but also market-based ones. Measuring companies' CSR activities from earlier years than we did could also make it easier to capture the effects of implementation. Contribution of the thesis: The present study extends the CSR and accounting literatures by (1) discussing how the implementation of CSR can affect its profitability and (2) empirically investigating the impact of CSR implementation on profitability. The practical contribution is that knowledge about how the implementation of CSR affects profitability is useful for any implementation of CSR and relevant to the need for legislation on social responsibility.
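A rough sketch of the kind of analysis described, assuming yearly panel data of CSR scores and a financial performance measure per firm; the construction of pace, consistency and path below is an illustrative simplification, not the operationalisation used in the study.

```python
# Derive "pace" (average yearly CSR change), "consistency" (how stable those
# changes are), and "path" (net direction), then regress CFP on them.
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years = 120, 5
csr = np.cumsum(rng.normal(1.0, 2.0, size=(n_firms, n_years)), axis=1) + 50

changes = np.diff(csr, axis=1)                 # yearly CSR changes per firm
pace = changes.mean(axis=1)                    # how fast CSR work develops
consistency = -changes.std(axis=1)             # higher = steadier implementation
path = np.sign(csr[:, -1] - csr[:, 0])         # overall direction of CSR work

# Synthetic CFP that actually depends on the implementation variables.
cfp = 0.8 * pace + 0.5 * consistency + 0.3 * path + rng.normal(0, 1, n_firms)

X = np.column_stack([np.ones(n_firms), pace, consistency, path])
coef, *_ = np.linalg.lstsq(X, cfp, rcond=None)
print(dict(zip(["intercept", "pace", "consistency", "path"], coef.round(2))))
```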

Managing consistency for big data applications : tradeoffs and self-adaptiveness / Gérer la cohérence pour les applications big data : compromis et auto-adaptabilité

Chihoub, Houssem Eddine 10 December 2013
Dans l'ère du Big Data, les applications intensives en données gèrent des volumes de données extrêmement grands. De plus, elles ont besoin de temps de traitement rapides. Une grande partie de ces applications sont déployées sur des infrastructures cloud, afin de bénéficier de l'élasticité des clouds, des déploiements sur demande et des coûts réduits strictement relatifs à l'usage. Dans ce contexte, la réplication est un moyen essentiel dans le cloud afin de surmonter les défis du Big Data. En effet, la réplication fournit les moyens d'assurer la disponibilité des données à travers de nombreuses copies, des accès plus rapides aux copies locales et la tolérance aux fautes. Cependant, la réplication introduit le problème majeur de la cohérence des données. La gestion de la cohérence est primordiale pour les systèmes de Big Data. Les modèles à cohérence forte présentent de grandes limitations en matière de performances et de passage à l'échelle à cause des besoins de synchronisation. En revanche, les modèles à cohérence faible et éventuelle promettent de meilleures performances ainsi qu'une meilleure disponibilité des données. Toutefois, ces derniers modèles peuvent tolérer, sous certaines conditions, trop d'incohérence temporelle. Dans le cadre de cette thèse, on s'intéresse particulièrement aux problèmes liés aux compromis de cohérence dans les systèmes de Big Data à large échelle. Premièrement, on étudie la gestion de cohérence au niveau du système de stockage. On introduit un modèle de cohérence auto-adaptative (nommé Harmony). Ce modèle augmente et diminue de manière automatique le niveau de cohérence et le nombre de copies impliquées dans les opérations. Ceci permet de fournir de meilleures performances tout en satisfaisant les besoins de cohérence de l'application. De plus, on introduit une étude détaillée sur l'impact de la gestion de la cohérence sur le coût financier dans le cloud. On emploie cette étude afin de proposer une gestion de cohérence efficace qui réduit les coûts. Dans une troisième direction, on étudie les effets de la gestion de cohérence sur la consommation en énergie des systèmes de stockage distribués. Cette étude nous mène à analyser les gains potentiels des reconfigurations adaptatives des systèmes de stockage en matière de réduction de la consommation. Afin de compléter notre travail au niveau du système de stockage, on aborde la gestion de cohérence au niveau de l'application. Les applications de Big Data sont de nature différente et ont des besoins de cohérence différents. Par conséquent, on introduit une approche de modélisation du comportement de l'application lors de ses accès aux données. Le modèle résultant facilite la compréhension des besoins en cohérence. De plus, ce modèle est utilisé afin de délivrer une cohérence personnalisée, spécifique à l'application. / In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployments, and cost-efficient Pay-As-You-Go usage. In this context, replication is an essential feature in the cloud for dealing with Big Data challenges: it enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies.
Consistency management is critical for Big Data systems. Strong consistency models introduce serious limitations to system scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this Ph.D. thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first focus on consistency management at the storage system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up or down at runtime, when needed, in order to provide performance as high as possible while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud, and we leverage this study to propose a cost-efficient consistency tuning approach (named Bismar). In a third direction, we study the impact of consistency management on energy consumption within the data center. Based on our findings, we investigate adaptive configurations of the storage system cluster that target energy saving. In order to complete our system-side study, we focus on the application level. Applications are different and so are their consistency requirements, and understanding such requirements at the storage system level alone is not possible. Therefore, we propose an application behavior model that captures the consistency requirements of an application. Based on this model, we propose an online prediction approach, named Chameleon, that adapts to the application's specific needs and provides customized consistency.
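The behaviour attributed to Harmony — scaling the consistency level with the workload — can be illustrated with a toy controller. The staleness estimator, thresholds, and parameter names below are illustrative assumptions, not the model from the thesis.

```python
# Toy adaptive-consistency controller: estimate the likelihood of stale reads
# under the current workload and raise or lower the read consistency level
# (number of replicas consulted) to stay under an application-defined tolerance.
import math

def stale_read_probability(write_rate, replication_delay, read_level):
    """Chance that a read at `read_level` misses the latest write: roughly the
    chance that every consulted replica has not yet received it (toy model,
    assuming Poisson writes and an exponential propagation window)."""
    p_replica_stale = 1.0 - math.exp(-write_rate * replication_delay)
    return p_replica_stale ** read_level

def choose_read_level(write_rate, replication_delay, tolerance, n_replicas=3):
    """Pick the weakest (cheapest) read level whose staleness stays tolerable."""
    for level in range(1, n_replicas + 1):
        if stale_read_probability(write_rate, replication_delay, level) <= tolerance:
            return level
    return n_replicas        # fall back to reading all replicas

# Light workload: reading a single replica (eventual consistency) is enough.
print(choose_read_level(write_rate=5, replication_delay=0.005, tolerance=0.05))    # 1
# Write-heavy workload with slow propagation: the controller scales up.
print(choose_read_level(write_rate=800, replication_delay=0.02, tolerance=0.05))   # 3
```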

Decision making under compound uncertainty : experimental study of ambiguity attitudes and sequential choice behavior / Prise de décision en situation d'incertitude composée : étude expérimentale des attitudes face à l'ambiguïté et des comportements de choix séquentiels

Nebout, Antoine 02 December 2011
Cette thèse appartient au domaine de la théorie de la décision en situation d'incertitude. Elle vise à comprendre, décrire, et représenter les choix individuels dans différents contextes de décision. Notre travail se concentre sur le fait que le comportement économique est souvent influencé par la structure et le déroulement de la résolution de l'incertitude. Dans une première expérience nous avons confronté nos sujets à différents types d'incertitude – à savoir du risque (probabilités connues), de l'incertain (probabilités inconnues), du risque composé et de l'incertain composé – en utilisant des mécanismes aléatoires particuliers. Le chapitre 1 analyse l'hétérogénéité des attitudes individuelles face à l'ambiguïté, au risque composé et à l'incertain composé alors que dans le chapitre 2, le modèle d'espérance d'utilité à dépendance du rang est utilisé comme outil de mesure afin d'étudier en détail ces attitudes au niveau individuel. Le chapitre 3 confronte à l'expérience l'interprétation de l'ambiguïté en termes de croyances de second ordre et propose une méthode d'élicitation de la fonction qui caractérise l'attitude face à l'ambiguïté dans les modèles « récursifs » de décision face à l'incertain. La seconde partie de la thèse s'intéresse aux comportements de décision individuelle dans un contexte dynamique et est composée de deux études expérimentales indépendantes. Néanmoins, elles reposent toutes deux sur la décomposition de l'axiome d'indépendance en trois axiomes dynamiques : conséquentialisme, cohérence dynamique et réduction des loteries composées. Le chapitre 4 rapporte les résultats d'une expérience de décision individuelle sur les facteurs de violations de chacun de ces axiomes. Le chapitre 5 présente une catégorisation conceptuelle des comportements individuels dans des problèmes de décision séquentiels face au risque. Le cas des agents ne se conformant pas à l'axiome d'indépendance y est étudié de façon systématique et les résultats d'une expérience spécialement conçue pour tester cette catégorisation sont présentés. / This thesis belongs to the domain of decision theory under uncertainty and aims to understand, describe and represent individual choices in various decision contexts. Our work focuses on the fact that economic behavior is often influenced by the structure and the timing of resolution of uncertainty. In a first experimental part, we confronted subjects with different types of uncertainty, namely risk (known probabilities), uncertainty (unknown probabilities), compound risk and compound uncertainty, which were generated using special random devices. In chapter 1 we analyze the heterogeneity of attitudes towards ambiguity, compound risk and compound uncertainty, whereas in chapter 2 we use rank-dependent expected utility as a measuring tool in order to investigate these attitudes at the individual level. Chapter 3 confronts the interpretation of ambiguity in terms of second-order beliefs with the experimental data and proposes a method for eliciting the function that encapsulates attitudes toward ambiguity in the "recursive" or multistage models of decision under uncertainty. The second part of the thesis deals with individual decision making under risk in a dynamic context and is composed of two independent experimental studies. Both of them rely on the decomposition of the independence axiom into three dynamic axioms: consequentialism, dynamic consistency and reduction of compound lotteries. Chapter 4 reports experimental data about violations of each of the three axioms.
Chapter 5 presents a conceptual categorization of individual behavior in sequential decision problems under risk, with a systematic treatment of agents who do not conform to the independence axiom. We propose an experiment specially designed to test the predictions of this categorization.
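For reference, the rank-dependent expected utility functional used as the measuring tool in Chapter 2 is conventionally written as follows (a standard statement, not the thesis's specific parameterisation):

```latex
% Rank-dependent expected utility of a lottery (x_1, p_1; ...; x_n, p_n)
% with outcomes ranked x_1 <= x_2 <= ... <= x_n:
RDEU(x_1,p_1;\dots;x_n,p_n)
  \;=\; \sum_{i=1}^{n}
        \left[\, w\!\left(\sum_{j=i}^{n} p_j\right)
               - w\!\left(\sum_{j=i+1}^{n} p_j\right) \right] u(x_i),
\qquad w(0)=0,\; w(1)=1 .
```

Here u is a utility function and w an increasing probability-weighting function; expected utility is the special case where w is the identity.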

Análise da influência e características das vias no número e na severidade dos acidentes: estudo de caso na autoestrada Grajaú-Jacarepaguá / Analysis of the influence of road characteristics on the number and severity of accidents: case study on the Grajaú-Jacarepaguá Highway

Bruno Alexandre Brandimarte Leal 27 April 2017
O objetivo deste trabalho é analisar a relação de características das vias com os acidentes de trânsito. A principal motivação para o desenvolvimento deste estudo está na ampla variedade de características que podem ter influência nos acidentes e na complexidade da relação entre elas, que pode tornar a análise distinta em cada caso. As vantagens decorrentes da adoção de ações baseadas no gerenciamento de riscos e no tratamento preventivo da segurança viária priorizam atuações sobre veículos e, em especial, sobre o esquema viário como alternativa viável para reduzir o peso do fator humano nos acidentes. Para isso foi elaborado um longo referencial teórico, enumerando potenciais características e as relacionando com acidentes, ressaltando a importância de avaliar e tratar aspectos concatenados ao veículo e à via, com o intuito de conseguir uma análise mais eficaz nas condições de segurança e contribuindo nas áreas acadêmica e profissional fornecendo dados e informações para ajudar na identificação e escolhas de características que mais influenciem a segurança. Também foi desenvolvido um estudo de caso, considerando a Autoestrada Grajaú-Jacarepaguá (Avenida Menezes Cortes), localizada na cidade do Rio de Janeiro. A partir de dois bancos de dados de acidentes, fornecidos pela Companhia de Engenharia de Tráfego do Rio de Janeiro (CET-Rio) e uma investigação de campo realizada na autoestrada, foram identificadas relações dos acidentes com as características viárias. Concluiu-se que as características das vias têm grande relevância sobre os acidentes ocorridos. Os resultados indicam que a consistência geométrica está relacionada aos acidentes de trânsito, porém ainda não existe um modelo de previsão que ajude a entender tal comportamento. De forma geral, as atividades de reconstrução e manutenção da via também dão apoio nas questões de segurança de tráfego. Apesar das limitações, espera-se que o estudo apresentado nesse projeto sirva como referência para futuras intervenções e melhorias das vias. / This study analyzes the relationship between road characteristics and traffic accidents. The main motivation is the wide variety of characteristics that may influence accidents, and the complexity of the relationships among them, which can make the analysis different in each case. The advantages of adopting actions based on risk management and on the preventive treatment of road safety favour interventions on vehicles and, in particular, on the road system as a viable alternative for reducing the role of the human factor in accidents. To this end, an extensive theoretical review was developed, listing potential characteristics and relating them to accidents, and emphasizing the importance of evaluating and treating aspects related to the vehicle and to the road in order to achieve a more effective analysis of safety conditions; the review also contributes to the academic and professional fields by providing data and information that help identify and select the characteristics that most influence safety. A case study was also carried out on the Grajaú-Jacarepaguá Highway (Menezes Cortes Avenue), located in the city of Rio de Janeiro. Based on two accident databases provided by the Traffic Engineering Company of Rio de Janeiro (CET-Rio) and a field investigation carried out on the highway, relationships between accidents and road characteristics were identified. It can be concluded that road characteristics have great relevance to the accidents that occurred. The results indicate that geometric consistency is related to traffic accidents; however, there is still no predictive model to help understand this behavior. In general, road reconstruction and maintenance activities also support traffic safety. Despite its limitations, the study presented here is expected to serve as a reference for future interventions and road improvements.

SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS

Bernabe Gisbert, Jose Maria 20 March 2014
La replicación de bases de datos aporta fiabilidad y escalabilidad aunque hacerlo de forma transparente no es una tarea sencilla. Una base de datos replicada es transparente si puede reemplazar a una base de datos centralizada tradicional sin que sea necesario adaptar el resto de componentes del sistema. La transparencia en bases de datos replicadas puede obtenerse siempre que (a) la gestión de la replicación quede totalmente oculta a dichos componentes y (b) se ofrezca la misma funcionalidad que en una base de datos tradicional. Para mejorar el rendimiento general del sistema, los gestores de bases de datos centralizadas actuales permiten ejecutar de forma concurrente transacciones bajo distintos niveles de aislamiento. Por ejemplo, la especificación del benchmark TPC-C permite la ejecución de algunas transacciones con niveles de aislamiento débiles. No obstante, este soporte todavía no está disponible en los protocolos de replicación. En esta tesis mostramos cómo estos protocolos pueden ser extendidos para permitir la ejecución de transacciones con distintos niveles de aislamiento. / Database replication provides reliability and scalability, although doing so transparently is not a simple task. A replicated database is transparent if it can replace a traditional centralized database without the rest of the system's components having to be adapted. Transparency in replicated databases can be achieved as long as (a) replication management is kept completely hidden from those components and (b) the same functionality as a traditional database is offered. To improve overall system performance, current centralized database managers allow transactions to run concurrently under different isolation levels; for example, the TPC-C benchmark specification allows some transactions to be executed under weak isolation levels. However, this support is not yet available in replication protocols. In this thesis we show how these protocols can be extended to allow transactions to be executed under different isolation levels. / Bernabe Gisbert, JM. (2014). SUPPORTING MULTIPLE ISOLATION LEVELS IN REPLICATED ENVIRONMENTS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/36535 / TESIS

Decision consistency and accuracy indices for the bifactor and testlet response theory models

LaFond, Lee James 01 July 2014
The primary goal of this study was to develop a new procedure for estimating decision consistency and accuracy indices using the bifactor and testlet response theory (TRT) models. This study is the first to investigate decision consistency and accuracy from a multidimensional perspective, and the results show that the bifactor model behaved in a way that met the author's expectations and represents a potentially useful procedure. The TRT model, on the other hand, did not meet the author's expectations and generally showed poor model performance. The multidimensional decision consistency and accuracy indices proposed in this study appear to provide good performance, at least for the bifactor model, in the case of a substantial testlet effect. For practitioners examining a test containing testlets for decision consistency and accuracy, a recommended first step is to check for dimensionality. If the testlets show a significant degree of multidimensionality, then the multidimensional indices proposed here can be recommended, as the simulation study showed improved performance over unidimensional IRT models. However, if there is not a significant degree of multidimensionality, then unidimensional IRT models and indices perform as well as, or even better than, the multidimensional models. Another goal of this study was to compare methods of numerical integration used in the calculation of decision consistency and accuracy indices. This study investigated a new method (the M method) that samples ability estimates through a Monte Carlo approach. In summary, the M method appears to be just as accurate as the other commonly used methods of numerical integration, while having some practical advantages over the D and P methods: it is not nearly as computationally intensive as the D method, and the P method requires large sample sizes. In addition, the P method has a conceptual disadvantage in that the conditioning variable, in theory, should be the true theta, not an estimated theta. The M method avoids both of these issues and seems to provide equally accurate estimates of decision consistency and accuracy indices, which makes it a strong option, particularly in multidimensional cases.
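The Monte Carlo idea behind the M method can be illustrated with a deliberately simplified simulation: sample abilities, simulate two parallel administrations under a normal measurement-error model, and count classification agreements. The constant error model and cut score are assumptions for illustration only, not the bifactor or TRT machinery used in the study.

```python
# Monte-Carlo style estimation of decision consistency and decision accuracy.
import numpy as np

rng = np.random.default_rng(42)
n_examinees = 200_000
cut = 0.5          # pass/fail cut score on the theta scale
sem = 0.4          # conditional standard error of measurement (assumed constant)

theta = rng.standard_normal(n_examinees)                  # sampled true abilities
form_a = theta + rng.normal(0.0, sem, n_examinees)        # first administration
form_b = theta + rng.normal(0.0, sem, n_examinees)        # parallel administration

pass_true, pass_a, pass_b = theta >= cut, form_a >= cut, form_b >= cut
consistency = np.mean(pass_a == pass_b)     # same decision on both forms
accuracy = np.mean(pass_a == pass_true)     # decision matches true status

print(f"decision consistency ~ {consistency:.3f}")
print(f"decision accuracy    ~ {accuracy:.3f}")
```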

Quelques propriétés asymptotiques en estimation non paramétrique de fonctionnelles de processus stationnaires en temps continu / Some asymptotic properties for nonparametric estimation of functionals of stationary continuous-time processes

Didi, Sultana 15 September 2014
Les travaux de cette thèse portent sur les problèmes d'estimation non paramétrique des fonctions de densité, de régression et du mode conditionnel associés à des processus stationnaires à temps continu. La motivation essentielle est d'établir des propriétés asymptotiques tout en considérant un cadre de dépendance des données assez général qui puisse être facilement utilisé en pratique. Cette contribution se compose de quatre parties. La première partie est consacrée à l'état de l'art relatif à la problématique, qui situe bien notre contribution dans la littérature. Dans la deuxième partie, nous nous intéressons à l'estimation, par la méthode du noyau, de la densité, pour laquelle nous établissons des résultats de convergence presque sûre, ponctuelle et uniforme, avec des vitesses de convergence. Dans les parties suivantes, les données sont supposées stationnaires et ergodiques. Dans la troisième partie, des propriétés asymptotiques similaires sont établies pour l'estimation à noyau de la fonction de régression. Dans le même esprit, nous étudions dans la quatrième partie l'estimation à noyau de la fonction mode conditionnel, pour laquelle nous établissons des propriétés de consistance avec des vitesses de convergence. L'estimateur proposé ici se positionne comme une alternative à celui de la fonction de régression dans les problèmes de prévision. / The work of this thesis focuses on some nonparametric estimation problems. More precisely, considering kernel estimators of the density, the regression and the conditional mode functions associated with a stationary continuous-time process, we aim at establishing some asymptotic properties while adopting a dependency framework for the data that is general enough to be easily used in practice. The present manuscript includes four parts. The first one gives the state of the art related to the field of our concern and identifies our contribution with respect to the existing results in the literature. In the second part, we focus on kernel density estimation. In a rather general dependency setting, where we use a martingale difference device and a technique based on a sequence of projections on σ-fields, we establish the almost sure pointwise and uniform consistency, with rates, of our estimate. In the third part, similar asymptotic properties are established for the kernel estimator of the regression function; here and below, the processes are assumed to be ergodic. In the same spirit, we study in the fourth part the kernel estimate of the conditional mode function, for which we establish consistency properties with rates of convergence. The proposed estimator may be viewed as an alternative to the usual regression function in prediction problems.
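For orientation, the kernel density estimator for a stationary continuous-time process (X_t) observed up to time T is typically written as below; this is a standard form, and results of the kind described concern the conditions on the kernel K and bandwidth h_T under which it is consistent, with rates.

```latex
% Kernel density estimator for a stationary R^d-valued process (X_t) on [0, T]:
\hat{f}_T(x) \;=\; \frac{1}{T\,h_T^{d}} \int_{0}^{T}
    K\!\left(\frac{x - X_t}{h_T}\right) \mathrm{d}t ,
\qquad h_T \to 0,\quad T\,h_T^{d} \to \infty .
```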

Apprentissage et forêts aléatoires / Learning with random forests

Scornet, Erwan 30 November 2015
Cette thèse est consacrée aux forêts aléatoires, une méthode d'apprentissage non paramétrique introduite par Breiman en 2001. Très répandues dans le monde des applications, les forêts aléatoires possèdent de bonnes performances et permettent de traiter efficacement de grands volumes de données. Cependant, la théorie des forêts ne permet pas d'expliquer à ce jour l'ensemble des bonnes propriétés de l'algorithme. Après avoir dressé un état de l'art des résultats théoriques existants, nous nous intéressons en premier lieu au lien entre les forêts infinies (analysées en théorie) et les forêts finies (utilisées en pratique). Nous proposons en particulier une manière de choisir le nombre d'arbres pour que les erreurs des forêts finies et infinies soient proches. D'autre part, nous étudions les forêts quantiles, un type d'algorithme proche des forêts de Breiman. Dans ce cadre, nous démontrons l'intérêt d'agréger des arbres : même si chaque arbre de la forêt quantile est inconsistant, grâce à un sous-échantillonnage adapté, la forêt quantile est consistante. Dans un deuxième temps, nous prouvons que les forêts aléatoires sont naturellement liées à des estimateurs à noyau que nous explicitons. Des bornes sur la vitesse de convergence de ces estimateurs sont également établies. Nous démontrons, dans une troisième approche, deux théorèmes sur la consistance des forêts de Breiman élaguées et complètement développées. Dans ce dernier cas, nous soulignons, comme pour les forêts quantiles, l'importance du sous-échantillonnage dans la consistance de la forêt. Enfin, nous présentons un travail indépendant portant sur l'estimation de la toxicité de certains composés chimiques. / This thesis is devoted to random forests, a nonparametric estimation method introduced by Breiman in 2001. Extensively used in a variety of areas, random forests exhibit good empirical performance and can handle massive data sets. However, the mathematical forces driving the algorithm remain largely unknown. After reviewing the theoretical literature, we focus on the link between infinite forests (analyzed in theory) and finite forests (used in practice), aiming at narrowing the gap between theory and practice. In particular, we propose a way to select the number of trees such that the errors of finite and infinite forests are similar. On the other hand, we study quantile forests, a type of algorithm close in spirit to Breiman's forests. In this context, we prove the benefit of tree aggregation: while each individual tree of the quantile forest is inconsistent, the forest is consistent provided a proper subsampling step is used. Next, we show the connection between forests and some particular kernel estimates, which can be made explicit in some cases, and we establish upper bounds on the rate of convergence for these kernel estimates. We then prove two theorems on the consistency of both pruned and unpruned Breiman forests, stressing the importance of subsampling for the consistency of the unpruned forests. Finally, we present an independent piece of work on a DREAM challenge whose goal was to predict the toxicity of several compounds for several patients based on their genetic profiles.
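The two practical levers discussed above — the number of trees (finite versus infinite forests) and subsampling — can be illustrated with an off-the-shelf implementation. The sketch below assumes scikit-learn and synthetic data; it is not the thesis's code, and the stopping rule for the number of trees is a simple illustrative heuristic.

```python
# Grow forests on subsampled data and stop adding trees once the test error
# has stabilised, i.e. the finite forest is close to its "infinite" limit.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

previous_error = np.inf
for n_trees in [10, 50, 100, 200, 400]:
    forest = RandomForestRegressor(
        n_estimators=n_trees,
        max_samples=0.5,        # each tree sees a random half of the data
        bootstrap=True,
        random_state=0,
    ).fit(X_train, y_train)
    error = np.mean((forest.predict(X_test) - y_test) ** 2)
    print(f"{n_trees:4d} trees: test MSE = {error:.4f}")
    if abs(previous_error - error) < 1e-3:   # error has stopped improving:
        break                                # adding trees changes little
    previous_error = error
```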
