731 |
Imbalanced Learning and Feature Extraction in Fraud Detection with Applications / Obalanserade Metoder och Attribut Aggregering för Upptäcka Bedrägeri, med Appliceringar. Jacobson, Martin January 2021 (has links)
This thesis deals with fraud detection in a real-world environment with datasets coming from Svenska Handelsbanken. The goal was to investigate how well machine learning can classify fraudulent transactions and how new additional features affected classification. The models used were EFSVM, RUTSVM, CS-SVM, ELM, MLP, Decision Tree, Extra Trees, and Random Forests. To determine the best results, the Matthews Correlation Coefficient was used as the performance metric, which has been shown to have only a moderate bias for imbalanced datasets. Each model could deal with highly imbalanced datasets, which is common in fraud detection. The best results were achieved with Random Forest and Extra Trees. The best scores were around 0.4 for the real-world datasets, though the score itself says little on its own, being more a testimony to the dataset's separability. These scores were obtained when using aggregated features rather than the standard raw dataset. Recall scores were around 0.88-0.93, with an increase in precision of 34.4%-67%, resulting in a large decrease in false positives. Evaluation results showed a great difference compared to the test runs, either a substantial increase or a substantial decrease. Two theories as to why are discussed: a large distribution change in the evaluation set, and the sample-size increase (100%) for evaluation, which could have led to the tests not being representative of true performance. Feature aggregation was a central topic of this thesis, with the main focus on behaviour features which can describe patterns and habits of customers. These fell into five categories: sender's fraud history, sender's transaction history, sender's time-of-transaction history, sender's history with the receiver, and receiver's history. Of these, the best performance increase came from the first, which gave the top score; the other datasets did not show as much potential, with most not improving the results. Further studies need to be done before discarding these features, to be certain they do not improve performance. Together with the data aggregation, a tool (t-SNE) to visualize high-dimensional data was used to great success. With it, an early understanding could be formed of what newly added features would bring to classification. For the best dataset it could be seen that a new sub-cluster of transactions had been created, leading to the belief that classification scores could improve, which they did. Feature selection and PCA reduction techniques were also studied; PCA showed good results and increased performance, while feature selection showed no conclusive improvements. Over- and under-sampling were used and neither improved the scores, though undersampling could maintain the results, which is interesting when increasing the dataset size. / This thesis deals with detecting fraud in a real-world environment with data from Svenska Handelsbanken. The goal was to investigate how good machine learning is at classifying fraudulent transactions, and how new features help classification. The methods used were EFSVM, RUTSVM, CS-SVM, ELM, MLP, Decision Tree, Extra Trees and Random Forests. For evaluating results, the Matthews Correlation Coefficient is used, which has been shown to have only a slight dependence on class imbalance. Each model has built-in mechanisms for handling imbalanced datasets, which is important for fraud detection.
In terms of results, Random Forest and Extra Trees proved best, although no p-tests were performed, because the datasets were relatively small, so that small differences in results are not reliable. The highest results were around 0.4; the absolute value says nothing more than serving as an indication of the degree of separation between the classes. The best results were obtained when the new aggregated features were used rather than the standard dataset. These results had recall values of 0.88-0.93, for which precision increased by 34.4%-67%, giving a large reduction in false positives. The evaluation results differed greatly from the test results, with either a substantial increase or decrease. Two possible reasons were discussed: a change in the evaluation data relative to the test data, or that the size increase (100%) for evaluation meant the tests were not representative. Feature aggregation was a central topic, with a focus on behavioural patterns describing customers' habits. There were five categories: the sender's fraud history, the sender's transaction history, the sender's history of transaction times, the sender's history with the receiver, and the receiver's history. Of these, the largest performance increase came from the fraud history; the other features did not give equally positive results, and most did not improve the scores. Further, more comprehensive studies must be made before these features can be said to be useful or not. Together with the data aggregation, t-SNE was successfully used to visualize high-dimensional data. With t-SNE, an early understanding can be formed of what added features can be expected to contribute to classification. For the best dataset a new cluster could be seen to have formed, which can be interpreted as the data being more descriptive; there the results were also expected to improve, which they did. Feature selection and PCA dimensionality reduction were studied, and PCA showed an improvement in the results. Over- and under-sampling were tested and could not improve the results, although undersampling could maintain them, which is interesting when the dataset grows.
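For illustration of the evaluation setup described above, the following is a minimal sketch (not the thesis's actual pipeline, and using synthetic data rather than the Handelsbanken datasets) of scoring Random Forest and Extra Trees classifiers on a highly imbalanced problem with the Matthews Correlation Coefficient, recall, and precision via scikit-learn:

```python
# Minimal sketch: MCC as the headline metric for a highly imbalanced,
# fraud-style classification task. Synthetic stand-in data (~1% positives).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import matthews_corrcoef, recall_score, precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                    ("Extra Trees", ExtraTreesClassifier(n_estimators=200, random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: MCC={matthews_corrcoef(y_te, pred):.3f} "
          f"recall={recall_score(y_te, pred):.3f} "
          f"precision={precision_score(y_te, pred):.3f}")
```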
|
732 |
EXTREME FAST CHARGING FOR LITHIUM ION BATTERIES: STRUCTURAL ANALYSIS OF ELECTRODES AND SOLVENT FORMULATION OF ELECTROLYTES. Xianyang Wu (10225322) 13 May 2022 (has links)
Fossil fuels have dominated the global energy market for centuries, but the world is now undergoing a major transition from fossil energy to renewable energy, given concerns about global warming and the extreme weather caused by carbon dioxide emissions. Lithium-ion batteries (LIBs) play an irreplaceable role in this transition, given their importance in energy storage for electricity grids and in promoting the mass adoption of battery electric vehicles (BEVs). Extreme fast charging (XFC) of LIBs, which aims to shorten the charging time to 15 minutes, would significantly improve their adoption in both the EV market and grid energy storage. However, XFC is significantly hindered by the relatively sluggish Li+ transport within LIBs.
Herein, the effects of increasing charging rates (from 1C and 4C to 6C) on LiNi0.6Mn0.2Co0.2O2 (NMC622) || graphite cells were systematically probed via various characterization methods. Electrochemical tests of rate and long-term cycling performance verified the significant decrease in available capacity at high charging rates. Structural evolution of the cycled NMC622 cathode and graphite anode was further probed via ex-situ powder diffraction: the lattice parameters a and c of NMC622 undergo irreversible evolution due to the loss of active Li+ within NMC622, while no structural evolution was found for the graphite anode, even after 200 cycles at the 6C (10-minute) charging rate. The aging behavior of the liquid electrolyte was further analyzed via inductively coupled plasma-optical emission spectrometry (ICP-OES) and gas chromatography-mass spectrometry (GC-MS); the increased Li+ concentration at higher charging rates and the appearance of diethyl carbonate (DEC) and dimethyl carbonate (DMC) formed by transesterification both suggest faster aging/degradation of the liquid electrolyte at higher charging rates.
Given the structural evolution of NMC622 caused by irreversible Li+ loss after long-term cycling, the structural evolution of both the NMC622 cathode and the lithiated graphite anode was further studied via operando neutron diffraction on a customized LiNi0.6Mn0.2Co0.2O2 (NMC622) || graphite cell. A quantitative analysis of the collected Bragg peaks for NMC622 and the lithiated graphite anode showed that the structural evolution of NMC622 is rate-independent: its lattice parameters a and c are mainly determined by the Li+ content (x in LixNi0.6Mn0.2Co0.2O2) and follow the same evolution during deintercalation, from the slowest charging at 0.27C to the fastest at 4.4C. For the graphite intercalation compounds (GICs) formed as Li+ intercalates into graphite, the sequential phase transition from pure graphite → stage III (LiC30) → stage II (LiC12) → stage I (LiC6) under 0.27C charging is consistent with previous studies. This sequential phase transition is generally maintained at increasing charging rates, and coexistence of the LiC12 and LiC6 phases was found in lithiated graphite under 4.4C charging, mainly due to the large inhomogeneity at these high charging rates. Meanwhile, for the stage II (LiC12) → stage I (LiC6) transition, which contributes half of the graphite anode's specific capacity, quantitative analysis via the Johnson-Mehl-Avrami-Kolmogorov (JMAK) model suggests a diffusion-controlled, one-dimensional transition, with nucleation kinetics that decrease as the charging rate increases.
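As a rough illustration of the kind of kinetic analysis mentioned above, the following sketch fits the JMAK (Avrami) relation X(t) = 1 - exp(-(kt)^n) to a synthetic transformed-fraction curve; the data, parameter values, and fitting choices are illustrative assumptions, not the thesis's neutron-diffraction phase fractions.

```python
# Sketch of a JMAK (Avrami) fit, X(t) = 1 - exp(-(k*t)**n), of the kind used to
# characterize the LiC12 -> LiC6 transition. Synthetic placeholder data only.
import numpy as np
from scipy.optimize import curve_fit

def jmak(t, k, n):
    return 1.0 - np.exp(-(k * t) ** n)

t = np.linspace(0.1, 60.0, 50)                    # time axis (minutes, illustrative)
x_obs = jmak(t, 0.08, 1.1) + np.random.default_rng(0).normal(0, 0.01, t.size)

(k_hat, n_hat), _ = curve_fit(jmak, t, x_obs, p0=[0.05, 1.0])
print(f"rate constant k = {k_hat:.3f}, Avrami exponent n = {n_hat:.2f}")
# Low Avrami exponents are typically read as diffusion-controlled,
# low-dimensional growth, in line with the interpretation given above.
```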
Based on the LiC12 → LiC6 transition process, strategies to improve Li+ transport were then explored. Cosolvents with lower viscosity - dimethyl carbonate (DMC), ethyl acetate (EA), methyl acetate (MA), and ethyl formate (EF) - were tested by replacing 20% (by weight) of the ethyl methyl carbonate (EMC) in a typical electrolyte of 1.2 M LiPF6 in ethylene carbonate (EC)/EMC solvents (30:70 by weight). Ionic conductivity measurements showed that introducing these cosolvents indeed enhanced Li+ transport, which was further verified by improved rate performance at 2C, 3C, and 4C charging for electrolytes using these cosolvents. Both X-ray absorption spectroscopy (XAS) and X-ray powder diffraction (XRD) indicated an increase in the Ni valence state and structural evolution of NMC622, both resulting from the irreversible loss of active Li+ within the NMC622 cathode. From the long-term cycling performance and further analysis of the interphases formed between the electrolyte and the electrodes, the best performance of the electrolyte using the DMC cosolvent was attributed to the most stable solid electrolyte interphase (SEI) and cathode electrolyte interphase (CEI) formed during cycling.
|
733 |
Vztah anomálií toků vlhkosti, extrémních srážek a povodní ve střední Evropě / Relationship among moisture flux anomalies, extreme precipitation, and floods in central Europe. Gvoždíková, Blanka January 2021 (has links)
Floods associated with extreme precipitation are among the most serious natural hazards and produce substantial human and socio-economic losses in central Europe. One way to reduce the impact of flooding is to increase preparedness through better flood forecasts and warnings, which is not possible without a proper understanding of the physical processes leading to a flood hazard. However, research on floods in relation to causal precipitation and synoptic conditions is usually carried out regionally, even though some events affect areas the size of entire countries or larger. This thesis focuses on exactly these large-scale precipitation and flood events, occurring from the second half of the 20th century up to 2013, for which the size of the affected area is as crucial to the extremity assessment as the magnitude of flood discharges or precipitation totals. The extremity indices used to assess the extreme precipitation and flood events connect both aspects. The larger area of interest defined within central Europe allowed an examination of the spatial structure of the events, the differences between them, and their relation to conditions in the atmosphere. To connect the extremes of precipitation with extremes in atmospheric conditions, the causal circulation was...
|
734 |
Současný český nacionalismus v rámci krajní pravice / The current Czech nationalism within the extreme right. Bauer, David January 2015 (has links)
The aim of the thesis is to describe and analyse the role of nationalism in contemporary Czech extreme right movements. Its author has two fundamental objectives. The first is to evaluate the strength and relevance of contemporary Czech nationalism within extreme right movements. The second consists in analysing nationalism itself, which should reveal the true nature of these organizations and their ideological platform. The thesis presents an overview of the Czech extreme right spectrum. It was essential to select movements that differ from one another and therefore represent various manifestations of Czech right-wing extremism. All three platforms can be classified as extreme right movements strongly resonating with Czech nationalism. They see themselves as patriots who defend conservative values and national traditions. The revue The National Idea represents an attempt to create a sophisticated, intellectual forum providing a platform for ultra-right views and ideas. The D.O.S.T. movement acts as a conservative "people's initiative", standing against multiculturalism and the European Union. The National Party is then an example of an extreme political grouping with traces of populism, xenophobia and outright racism. A content analysis of these three movements is the main topic of the thesis. Examining their goals,...
|
735 |
Současný anarchofeminismus v ČR / The Contemporary Anarcha-feminism in the Czech Republic. Chaloupková, Jana January 2014 (has links)
The diploma thesis focuses on the ideology of anarcha-feminism. One of its goals is to study how the linkage between anarchism and feminism actually works, which elements are important and which are missing, and how the ideology differs over time. The thesis has two parts: the first is focused on the historical phase of anarcha-feminism, especially on Emma Goldman, the American anarchist philosopher who is often labelled the founder of this ideology; the second part is concerned with Feministická skupina 8. března/Anarchofeministická skupina, a contemporary anarcha-feminist collective in the Czech Republic. Feministická skupina 8. března/Anarchofeministická skupina (8 March Feminist Group/Anarchofeminist Group) is the only collective in Czech history to have adopted anarcha-feminist ideology. The thesis uses critical discourse analysis as well as content analysis. In the first chapter we analyze Emma Goldman's essays on feminist topics. In the second chapter we critically analyze the publications of Anarchofeministická skupina - the magazines Přímá cesta and Siréna - then the anarchist magazine A-kontra, and the websites of two current liberal feminist organizations in the Czech Republic - Gender Studies, o.p.s. and Česká ženská lobby. The conclusion compares the results of those...
|
736 |
Modeling and Simulation of Spatial Extremes Based on Max-Infinitely Divisible and Related Processes. Zhong, Peng 17 April 2022 (has links)
The statistical modeling of extreme natural hazards is becoming increasingly important due to climate change, whose effects have been increasingly visible throughout the last decades. It is thus crucial to understand the dependence structure of rare, high-impact events over space and time for realistic risk assessment. For spatial extremes, max-stable processes have played a central role in modeling block maxima. However, the spatial tail dependence strength is persistent across quantile levels in those models, which is often not realistic in practice. This lack of flexibility implies that max-stable processes cannot capture weakening dependence at increasingly extreme levels, resulting in a drastic overestimation of joint tail risk.
To address this, we develop new dependence models in this thesis from the class of max-infinitely divisible (max-id) processes, which contain max-stable processes as a subclass and are flexible enough to capture different types of dependence structures. Furthermore, exact simulation algorithms for general max-id processes are typically not straightforward due to their complex formulations, and both simulation and inference can be computationally prohibitive in high dimensions. Fast and exact algorithms to simulate max-id processes are provided, together with methods to implement our models in high dimensions based on the Vecchia approximation method. These proposed methodologies are illustrated through various environmental datasets, including air temperature data in South-Eastern Europe in an attempt to assess the effect of climate change on heatwave hazards, and sea surface temperature data for the entire Red Sea. In another application focused on assessing how the spatial extent of extreme precipitation has changed over time, we develop new time-varying $r$-Pareto processes, which are the counterparts of max-stable processes for high threshold exceedances.
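To make the notion of weakening tail dependence concrete, the sketch below estimates the empirical pairwise coefficient chi(u) = P(U2 > u | U1 > u) at increasing quantile levels u on synthetic Gaussian-dependent data; a max-stable model would force chi(u) to stabilize at a positive value, whereas decay toward zero signals the weakening dependence that the max-id class is designed to capture. This is an illustrative diagnostic only, not the thesis's models or its Red Sea and heatwave datasets.

```python
# Empirical pairwise tail-dependence chi(u) = P(U2 > u | U1 > u) at increasing
# quantile levels. Gaussian dependence is asymptotically independent, so the
# estimate decays toward 0 as u -> 1. Synthetic data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=100_000)
u1, u2 = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])   # uniform margins

for u in (0.90, 0.95, 0.99, 0.999):
    chi = np.mean(u2[u1 > u] > u)
    print(f"chi({u}) ~ {chi:.3f}")   # decays with u: weakening dependence
```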
|
737 |
Genetic Algorithm Based Design and Optimization of VLSI ASICs and Reconfigurable Hardware. Fernando, Pradeep Ruben 17 October 2008 (has links)
Rapid advances in integration technology have tremendously increased the design complexity of very large scale integrated (VLSI) circuits, necessitating robust optimization techniques in many stages of VLSI design. A genetic algorithm (GA) is a stochastic optimization technique that uses principles derived from the evolutionary process in nature. In this work, genetic algorithms are used to ease the hardware design process of VLSI application-specific integrated circuits (ASICs) and reconfigurable hardware.
VLSI ASIC design suffers from high design complexity and a large number of optimization objectives, requiring hierarchical design approaches and multi-objective optimization techniques. The floorplanning stage of the design cycle becomes highly important in hierarchical design methods. In this work, a multi-objective genetic algorithm based floorplanner has been developed with novel crossover operators to address the multi-objective floorplanning problem for VLSI ASICs. The genetic floorplanner achieves significant wirelength savings (>19% on average) with little or no increase in area (<3% penalty) over previous floorplanners that perform simultaneous area and wirelength minimization.
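As a toy illustration of how area and wirelength can be combined in a single genetic search (it is not the dissertation's floorplanner, its representation, or its novel crossover operators), the sketch below evolves a block ordering packed by a greedy shelf heuristic and scores it with a weighted sum of the two objectives:

```python
# Toy permutation-based genetic "floorplanner": weighted sum of area and
# wirelength as fitness, order crossover + swap mutation. Illustration only.
import random

random.seed(0)
N, STRIP_W = 20, 40
blocks = [(random.randint(2, 10), random.randint(2, 10)) for _ in range(N)]  # (w, h)
nets = [(random.randrange(N), random.randrange(N)) for _ in range(30)]       # 2-pin nets

def evaluate(order):
    """Greedy shelf packing of blocks in the given order; returns (area, wirelength)."""
    x = y = shelf_h = 0
    pos = {}
    for b in order:
        w, h = blocks[b]
        if x + w > STRIP_W:                      # start a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        pos[b] = (x + w / 2, y + h / 2)
        x += w
        shelf_h = max(shelf_h, h)
    area = STRIP_W * (y + shelf_h)
    wl = sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]) for a, b in nets)
    return area, wl

def fitness(order, w_area=1.0, w_wl=0.5):
    area, wl = evaluate(order)
    return w_area * area + w_wl * wl             # lower is better

def order_crossover(p1, p2):
    i, j = sorted(random.sample(range(N), 2))
    child = [None] * N
    child[i:j] = p1[i:j]
    rest = [g for g in p2 if g not in child]     # preserve relative order from p2
    child[:i], child[j:] = rest[:i], rest[i:]
    return child

pop = [random.sample(range(N), N) for _ in range(40)]
for gen in range(100):
    pop.sort(key=fitness)
    parents = pop[:20]
    children = [order_crossover(random.choice(parents), random.choice(parents))
                for _ in range(20)]
    for c in children:                           # swap mutation
        if random.random() < 0.3:
            a, b = random.sample(range(N), 2)
            c[a], c[b] = c[b], c[a]
    pop = parents + children
print("best (area, wirelength):", evaluate(min(pop, key=fitness)))
```

A real floorplanner would use a richer representation (e.g., slicing trees or sequence pairs) and Pareto-based selection rather than a fixed weighted sum; the weights here are arbitrary assumptions.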
Hardware implementation of genetic algorithms is gaining importance because of their proven effectiveness as optimization engines for real-time applications. Earlier hardware implementations suffer from major drawbacks such as absence of GA parameter programmability, rigid pre-defined system architecture, and lack of support for multiple fitness functions. A compact IP core that implements a general purpose GA engine has been designed to realize evolvable hardware in field programmable gate array devices. The designed GA core achieved a speedup of around 5.16x over an analogous software implementation.
Novel reconfigurable analog architectures have been proposed to realize extreme-environment analog electronics. In this work, a digital framework has been developed to realize self-reconfigurable analog arrays (SRAA), where genetic algorithms are used to evolve the required analog functionality and compensate for performance degradation in extreme environments. The framework supports two methods of compensation, namely model-based lookup and genetic algorithm based compensation, and is scalable in terms of the number of fitness evaluation modules. The entire framework has been implemented as a digital ASIC in a leading industry-strength silicon-on-insulator (SOI) technology to obtain high performance and a small form factor.
|
738 |
Étude probabiliste des contraintes de bout en bout dans les systèmes temps réel / Probabilistic study of end-to-end constraints in real-time systems. Maxim, Cristian 11 December 2017 (has links)
Social interaction, education and health are just a few examples of domains in which the rapid evolution of technology has had a great impact on quality of life. Companies rely more and more on embedded systems to increase their productivity, efficiency and value. In factories, the precision of robots tends to replace human versatility. Although connected devices such as drones, smart watches or smart homes have become increasingly popular in recent years, this type of technology has long been used in industries concerned with user safety. The avionics industry has used computers in its products since 1972, with the production of the first A300 aircraft; it achieved astonishing progress with the development of the first Concorde in 1976, which was years ahead of the aircraft of its time and was considered a miracle of technology. Some of the innovations and knowledge acquired for the Concorde are still used in recent models such as the A380 or the A350. An embedded system is a microprocessor-based system that is built to control a function or a range of functions and is not designed to be programmed by the end user in the same way that a personal computer is. A real-time system is an information-processing system that must respond to externally generated input stimuli within a finite and specified period; the correctness of its behaviour depends not only on the logical result but also on the time at which that result is delivered. Real-time systems can be found in industries such as aeronautics, aerospace, automotive or rail, but also in sensor networks, image processing, multimedia applications, medical technologies, robotics, communications, computer games and household systems. In this thesis we concentrate on embedded real-time systems and, for ease of notation, we simply call them real-time systems; we may refer to cyber-physical systems where appropriate. The worst-case execution time (WCET) of a task represents the maximum time it can possibly take to execute. The WCET is obtained after a timing analysis, and often it cannot be determined precisely by enumerating all possible executions. This is why, in industry, measurements are made only on a subset of possible scenarios, the one that would generate the highest execution times, and an upper bound on the execution time is estimated by adding a safety margin to the largest observed time. Timing analysis is a key concept used in real-time systems to assign an upper bound to the WCET of tasks or program fragments. This bound can be obtained either by static analysis or by analysis of measurements. Static and measurement-based methods, in their deterministic approaches, tend to be extremely pessimistic. Unfortunately, this level of pessimism and the resulting over-provisioning cannot be accepted by all real-time systems, and for those cases other approaches should be taken into consideration.
/ In our times, we are surrounded by technologies meant to improve our lives, to ensure our security, or programmed to perform different functions and to respect a series of constraints. We consider them embedded systems, or often parts of cyber-physical systems. An embedded system is a microprocessor-based system that is built to control a function or a range of functions and is not designed to be programmed by the end user in the same way that a PC is. The Worst Case Execution Time (WCET) of a task represents the maximum time it can take to execute. The WCET is obtained after analysis, and most of the time it cannot be accurately determined by exhausting all the possible executions. This is why, in industry, measurements are done only on a subset of possible scenarios (the one that would generate the highest execution times) and an execution time bound is estimated by adding a safety margin to the greatest observed time. Amongst all branches of real-time systems, an important role is played by the Critical Real-Time Embedded Systems (CRTES) domain. CRTESs are widely used in fields like automotive, avionics, railway, and health care. The performance of CRTESs is analyzed not only from the point of view of their correctness, but also from the perspective of time. In the avionics industry, such systems have to undergo a strict process of analysis in order to fulfill a series of certification criteria demanded by the certification authorities, namely the European Aviation Safety Agency (EASA) in Europe or the Federal Aviation Administration (FAA) in the United States. The avionics industry in particular, and the real-time domain in general, are known for being conservative and adopting new technologies only when it becomes inevitable. For the avionics industry this is motivated by the high cost that any change in the existing functional systems would bring: any change in the software or hardware has to undergo another certification process, which costs the manufacturer money, time and resources. Despite this conservative tendency, airplane producers cannot remain indifferent to the constant change in technology and ignore the performance benefits brought by COTS processors, which nowadays are mainly multi-processors. As a curiosity, most of the microprocessors found in airplanes currently flying around the world have less computing power than a modern home PC; their chipsets are specifically designed for embedded applications characterized by low power consumption, predictability and many I/O peripherals. In the current context, where critical real-time systems are being invaded by multi-core platforms, WCET analysis using deterministic approaches becomes difficult, if not impossible. The timing constraints of real-time systems need to be verified in the context of certification. This verification, done throughout the entire development cycle, must take into account increasingly complex architectures. These architectures increase the cost and complexity of current deterministic tools for identifying all possible timing constraints and dependencies that can occur inside the system, at the risk of overlooking extreme cases. An alternative to these problems is the probabilistic approach, which is better adapted to deal with these hazards and uncertainties and which allows a precise modeling of the system. 2. Contributions.
The contribution of the thesis is threefold, comprising the conditions necessary for applying the theory of extremes to execution time measurements, the methods developed using the theory of extremes for analyzing real-time systems, and experimental results. 2.1. Conditions for use of EVT in the real-time domain. In this chapter we establish the environment in which our work is done. The use of EVT in any domain comes with a series of restrictions on the data being analyzed; in our case, the data being analyzed consist of execution time measurements.
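A hedged sketch of the general measurement-based probabilistic timing idea discussed above: fit a Generalized Pareto distribution to execution-time exceedances over a high threshold and extrapolate a bound at a very small exceedance probability. The timings are synthetic, and the threshold and target probability are illustrative assumptions rather than the thesis's method; the suitability conditions on the measurements (the first contribution above) are exactly what such a fit relies on.

```python
# Sketch of measurement-based probabilistic WCET estimation via a GPD fit to
# threshold exceedances (peaks over threshold). Synthetic timings only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
times = rng.gamma(shape=9.0, scale=10.0, size=20_000)      # fake execution times (us)

u = np.quantile(times, 0.95)                               # threshold choice is critical
exceed = times[times > u] - u
c, loc, scale = stats.genpareto.fit(exceed, floc=0.0)      # shape c, fixed loc, scale

# Target per-run exceedance probability 1e-9, rescaled by P(X > u).
p_cond = 1e-9 / (len(exceed) / len(times))
pwcet = u + stats.genpareto.ppf(1.0 - p_cond, c, loc=0.0, scale=scale)
print(f"threshold={u:.1f} us, GPD shape={c:.3f}, pWCET(1e-9) ~ {pwcet:.1f} us")
```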
|
739 |
Apprentissage automatique et extrêmes pour la détection d'anomalies / Machine learning and extremes for anomaly detection. Goix, Nicolas 28 November 2016 (has links)
Anomaly detection is, first of all, a useful data preprocessing step for training a machine learning algorithm. It is also an important component of a wide variety of concrete applications, ranging from finance and insurance to computational biology, health, telecommunications and environmental sciences. Anomaly detection is also increasingly useful in the contemporary world, where a growing number of autonomous systems need to be monitored and diagnosed. Research in anomaly detection includes the design of efficient algorithms together with their theoretical study, but also raises the question of how to evaluate such algorithms, particularly when no labeled data are available, as in many industrial contexts; in other words, not only model design and theoretical study, but also model selection. In this thesis, we address both of these aspects. First, we introduce a criterion alternative to the existing mass-volume criterion for measuring the performance of a scoring function. We then turn to extreme regions, which are of particular interest in anomaly detection for lowering the false alarm rate. Finally, we propose two heuristic methods: one for evaluating the performance of anomaly detection algorithms on high-dimensional data, the other for extending the use of random forests to one-class classification. / Anomaly detection is not only a useful preprocessing step for training machine learning algorithms. It is also a crucial component of many real-world applications, in fields as varied as finance, insurance, telecommunications, computational biology, health and environmental sciences. Anomaly detection is also more and more relevant in the modern world, as an increasing number of autonomous systems need to be monitored and diagnosed. Important research areas in anomaly detection include the design of efficient algorithms and their theoretical study, but also the evaluation of such algorithms, in particular when no labeled data are available, as in many industrial setups; in other words, model design and study, and model selection. In this thesis, we focus on both of these aspects. We first propose a criterion for measuring the performance of any anomaly detection algorithm. Then we focus on extreme regions, which are of particular interest in anomaly detection for obtaining lower false alarm rates. Finally, two heuristic methods are proposed: the first to evaluate anomaly detection algorithms in the case of high-dimensional data, the other to extend the use of random forests to the one-class setting.
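To illustrate the kind of label-free evaluation the abstract refers to, the sketch below computes a Monte-Carlo mass-volume curve for an Isolation Forest scoring function: for a given mass level, a better scoring function encloses that mass in a smaller volume. This illustrates the existing mass-volume idea only, not the thesis's alternative criterion or its one-class forest extension.

```python
# Monte-Carlo mass-volume (MV) evaluation of an unsupervised scoring function.
# A better scoring function captures the same probability mass in a smaller
# Lebesgue volume (a lower MV curve). Synthetic 2-D data only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
X = rng.normal(size=(5_000, 2))                            # unlabeled "normal" data

clf = IsolationForest(random_state=0).fit(X)
s_data = clf.score_samples(X)                              # higher = more normal

# Uniform reference sample over a box enclosing the data, to estimate volumes.
lo, hi = X.min(axis=0), X.max(axis=0)
U = rng.uniform(lo, hi, size=(50_000, 2))
s_unif = clf.score_samples(U)
box_vol = np.prod(hi - lo)

for alpha in (0.90, 0.95, 0.99):                           # mass levels
    t = np.quantile(s_data, 1 - alpha)                     # score level capturing mass alpha
    vol = box_vol * np.mean(s_unif >= t)                   # volume of that level set
    print(f"mass {alpha:.2f}: level-set volume ~ {vol:.2f}")
```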
|
740 |
Estimation non paramétrique adaptative dans la théorie des valeurs extrêmes : application en environnement / Nonparametric adaptive estimation in the extreme value theory: application in ecology. Pham, Quang Khoai 09 January 2015 (has links)
The objective of this thesis is to develop statistical methods based on extreme value theory to estimate probabilities of rare events and conditional extreme quantiles. We consider a sequence of independent random variables $X_{t_1}, X_{t_2}, \dots, X_{t_n}$ associated with times $0 \le t_1 < \dots < t_n \le T_{\max}$, where $X_{t_i}$ has distribution function $F_{t_i}$ and $F_t$ is the conditional distribution of $X$ given $T = t \in [0,T_{\max}]$. For each $t \in [0,T_{\max}]$, we propose a nonparametric estimator of the extreme quantiles of $F_t$. The idea of our approach is to fit, for each $t \in [0,T_{\max}]$, the tail of the distribution $F_t$ with a Pareto distribution of parameter $\theta_{t,\tau}$ above a threshold $\tau$. The parameter $\theta_{t,\tau}$ is estimated using a nonparametric kernel estimator of bandwidth $h$ based on the observations larger than $\tau$. Under some regularity assumptions, we show that the proposed adaptive estimator of $\theta_{t,\tau}$ is consistent and we give its rate of convergence. We propose a sequential testing procedure to determine the threshold $\tau$, and we obtain the bandwidth $h$ by two methods: cross-validation and an adaptive approach. We also propose a method to choose the threshold $\tau$ and the bandwidth $h$ simultaneously. Finally, the proposed procedures are studied on simulated data and on real data, with the aim of supporting the monitoring of aquatic systems. / The objective of this PhD thesis is to develop statistical methods based on the theory of extreme values to estimate the probabilities of rare events and conditional extreme quantiles. We consider independent random variables $X_{t_1}, \dots, X_{t_n}$ associated with a sequence of times $0 \le t_1 < \dots < t_n \le T_{\max}$, where $X_{t_i}$ has distribution function $F_{t_i}$ and $F_t$ is the conditional distribution of $X$ given $T = t \in [0,T_{\max}]$. For each $t \in [0,T_{\max}]$, we propose a nonparametric adaptive estimator for the extreme quantiles of $F_t$. The idea of our approach is to adjust the tail of the distribution function $F_t$ with a Pareto distribution of parameter $\theta_{t,\tau}$ starting from a threshold $\tau$. The parameter $\theta_{t,\tau}$ is estimated using a nonparametric kernel estimator of bandwidth $h$ based on the observations larger than $\tau$. We propose a sequential-testing-based procedure for the choice of the threshold $\tau$, and we determine the bandwidth $h$ by two methods: cross-validation and an adaptive procedure. Under some regularity assumptions, we prove that the adaptive estimator of $\theta_{t,\tau}$ is consistent and we determine its rate of convergence. We also propose a method to choose the threshold $\tau$ and the bandwidth $h$ simultaneously. Finally, we study the proposed procedures by simulation and on a real data set, to contribute to the monitoring of aquatic systems.
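A minimal sketch of the approach described above, assuming a Gaussian kernel in time and a Weissman-type quantile extrapolation; the data are synthetic and all names are illustrative assumptions, not the thesis's implementation.

```python
# Kernel-weighted Hill-type estimate of the conditional tail index theta(t, tau)
# above a threshold tau, plus an extreme-quantile extrapolation. Synthetic data.
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
t_obs = np.sort(rng.uniform(0.0, 1.0, n))                  # observation times in [0, T_max]
gamma_true = 0.3 + 0.2 * t_obs                             # tail index drifting with time
x_obs = rng.pareto(1.0 / gamma_true) + 1.0                 # X_t with Pareto-type tail

def theta_hat(t, tau, h):
    """Kernel-weighted Hill estimator of the tail index at time t, above threshold tau."""
    w = np.exp(-0.5 * ((t_obs - t) / h) ** 2)              # Gaussian kernel weights
    mask = x_obs > tau
    return np.sum(w[mask] * np.log(x_obs[mask] / tau)) / np.sum(w[mask])

def extreme_quantile(t, tau, h, p):
    """Weissman-type extrapolation of the conditional quantile of order 1 - p."""
    w = np.exp(-0.5 * ((t_obs - t) / h) ** 2)
    p_tau = np.sum(w * (x_obs > tau)) / np.sum(w)          # weighted exceedance probability
    return tau * (p_tau / p) ** theta_hat(t, tau, h)

tau, h = np.quantile(x_obs, 0.95), 0.1
for t in (0.2, 0.5, 0.8):
    print(f"t={t}: theta ~ {theta_hat(t, tau, h):.2f}, "
          f"q(0.999) ~ {extreme_quantile(t, tau, h, 0.001):.1f}")
```

The threshold tau and bandwidth h are fixed here for simplicity; the thesis's sequential tests and cross-validation/adaptive choices are what would replace these hard-coded values in practice.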
|