  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Statistical Learning with Imbalanced Data

Shipitsyn, Aleksey January 2017
In this thesis, several sampling methods for statistical learning with imbalanced data have been implemented and evaluated with a new metric, imbalanced accuracy. Several modifications and new algorithms have been proposed for intelligent sampling: Border Links, Clean Border Undersampling, One-Sided Undersampling Modified, DBSCAN Undersampling, Class Adjusted Jittering, Hierarchical Cluster Based Oversampling, DBSCAN Oversampling, Fitted Distribution Oversampling, Random Linear Combinations Oversampling, and Center Repulsion Oversampling. A set of requirements for a satisfactory performance metric for imbalanced learning has been formulated, and a new metric for evaluating classification performance has been developed accordingly. The new metric is based on a combination of the worst class accuracy and the geometric mean. In the testing framework, the nonparametric Friedman test and the post hoc Nemenyi test have been used to assess the performance of classifiers, sampling algorithms, and combinations of classifiers and sampling algorithms on several data sets. A new approach to detecting algorithms with dominating and dominated performance has been proposed, together with a new way of visualizing the results in a network.
From experiments on simulated and several real data sets we conclude that: i) different classifiers are not equally sensitive to sampling algorithms, ii) sampling algorithms have different performance within specific classifiers, iii) oversampling algorithms perform better than undersampling algorithms, iv) Random Oversampling and Random Undersampling outperform many well-known sampling algorithms, v) our proposed algorithms Hierarchical Cluster Based Oversampling, DBSCAN Oversampling with FDO, and Class Adjusted Jittering perform much better than other algorithms, vi) a few good combinations of a classifier and sampling algorithm may boost classification performance, while a few bad combinations may spoil the performance, but the majority of combinations are not significantly different in performance.
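The abstract above names two ingredients of its metric (worst class accuracy and geometric mean) and reports that plain Random Oversampling is a strong baseline. The following is a minimal sketch of those building blocks only. How the thesis combines the two quantities into "imbalanced accuracy" is not stated in the abstract, so no combined formula is given here; the function names are illustrative, not the author's.

```python
import math
import random
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}


def worst_class_accuracy(y_true, y_pred):
    """Accuracy of the class that the classifier handles worst."""
    return min(per_class_accuracy(y_true, y_pred).values())


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of the per-class accuracies."""
    accs = list(per_class_accuracy(y_true, y_pred).values())
    return math.prod(accs) ** (1.0 / len(accs))


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for c, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for r in rows + extra:
            X_out.append(r)
            y_out.append(c)
    return X_out, y_out


# Hypothetical example: 8 majority samples, 2 minority samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]          # one minority sample is misclassified
print(worst_class_accuracy(y_true, y_pred))     # 0.5, although plain accuracy is 0.9
print(geometric_mean_accuracy(y_true, y_pred))  # sqrt(1.0 * 0.5), about 0.707
```

The example shows why plain accuracy is a poor guide on imbalanced data: nine of ten predictions are correct, yet the minority class is recognized only half the time.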
2

Sampling, qualification and analysis of data streams / Échantillonnage, qualification et analyse des flux de données

El Sibai, Rayane 04 July 2018
An environmental monitoring system continuously collects and analyzes the data streams generated by environmental sensors. The goal of the monitoring process is to filter out useful and reliable information and to infer new knowledge that helps the network operator quickly make the right decisions. This whole process, from data collection to data analysis, raises two key problems: data volume and data quality. On the one hand, the throughput of the generated data streams has kept increasing over recent years, producing a large volume of data that is continuously sent to the monitoring system. The data arrival rate is very high compared to the available processing and storage capacities of the monitoring system. Thus, permanent and exhaustive storage of the data is very expensive, and sometimes impossible. On the other hand, in real-world settings such as sensor environments, the data are often of poor quality: they contain noisy, erroneous, and missing values, which can lead to faulty results. In this thesis, we propose a solution called native filtering to deal with the problems of data quality and data volume. Upon receipt of the data streams, the quality of the data is evaluated and improved in real time based on a data quality management model that we also propose in this thesis. Once qualified, the data are summarized using sampling algorithms. In particular, we focus on the analysis of the Chain-sample algorithm, which we compare against other reference algorithms such as probabilistic sampling, deterministic sampling, and weighted sampling. We also propose two new versions of the Chain-sample algorithm that significantly improve its execution time.
Data stream analysis is also addressed in this thesis. We are particularly interested in anomaly detection. Two algorithms are studied: the Moran scatterplot for the detection of spatial anomalies and CUSUM for the detection of temporal anomalies. We have designed a method that improves the estimation of the start time and end time of the anomaly detected by CUSUM. Our work was validated by simulations and by experiments on two different real data sets: data from sensors in the drinking water distribution network, provided as part of the Waves project, and data from the Velib bike-sharing system.
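For readers unfamiliar with CUSUM, the following is a minimal sketch of the textbook one-sided (upward) variant, not the improved method proposed in the thesis. The parameter names `target_mean`, `slack`, and `threshold` are assumptions chosen for illustration. The start of the anomaly is estimated in the classic way, as the first index after the statistic last sat at zero; the thesis describes a refinement of this start and end estimation that is not reproduced here.

```python
def cusum_detect(values, target_mean, slack, threshold):
    """One-sided (upward) CUSUM over a sequence of numbers.

    The statistic accumulates deviations above target_mean + slack and is
    clipped at zero. An alarm is raised when it exceeds threshold.
    Returns a list of (estimated_start_index, alarm_index) pairs.
    """
    alarms = []
    s = 0.0
    start = None  # index where the current positive run of the statistic began
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target_mean - slack))
        if s == 0.0:
            start = None
        elif start is None:
            start = i
        if s > threshold:
            alarms.append((start, i))
            s = 0.0       # restart the statistic after an alarm
            start = None
    return alarms


# Hypothetical stream: ten in-control readings, then an upward shift.
stream = [0.0] * 10 + [3.0] * 2
print(cusum_detect(stream, target_mean=0.0, slack=0.5, threshold=4.0))
# [(10, 11)]: the shift starts at index 10 and the alarm fires at index 11
```

A larger `slack` makes the detector ignore small drifts, and a larger `threshold` trades detection delay for fewer false alarms.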
3

PhenoBee: Drone-Based Robot for Advanced Field Proximal Phenotyping in Agriculture

Ziling Chen (8810570) 19 December 2023
The increasing global need for food security and sustainable agriculture underscores the urgency of advancing field phenotyping for enhanced plant breeding and crop management. Soybean, a major global protein source, is at the forefront of these advancements. Proximal sensing in soybean phenotyping offers a higher signal-to-noise ratio and resolution but has been underutilized in large-scale field applications due to low throughput and high labor costs. Moreover, there is an absence of automated solutions for in vivo proximal phenotyping of dicot plants. This thesis addresses these gaps by introducing a comprehensive, technologically sophisticated approach to modern field phenotyping.
Fully Automated Proximal Hyperspectral Imaging System: The first chapter presents the development of a hyperspectral imaging system integrated with a robotic arm. This system overcomes limitations of traditional imaging, providing enhanced close-range data for accurate plant health assessment.
Robust Leaf Pose Estimation: The second chapter discusses the application of deep learning for accurate leaf pose estimation. This advancement is crucial for in-depth plant analysis, fostering better insights into plant health and growth, thereby contributing to increased crop yield and disease resistance.
PhenoBee, a Drone Mobility Platform: The third chapter introduces PhenoBee, a drone-based platform designed for extensive field phenotyping. This technology significantly broadens the capabilities of field data collection, showcasing its viability for widespread aerial phenotyping.
Adaptive Sampling for Dynamic Waypoint Planning: The final chapter details an adaptive sampling algorithm for efficient, real-time waypoint planning. This approach enhances field scouting efficiency and precision, ensuring optimal data acquisition.
By integrating deep learning, robotic automation, aerial mobility, and intelligent sampling algorithms, the proposed solution revolutionizes the adaptation of in vivo proximal phenotyping on a large scale. The findings of this study highlight the potential to automate agricultural activities with high scalability and to identify nutrient deficiencies, diseases, and chemical damage in crops earlier, thereby preventing yield loss, improving food quality, and expediting the development of agricultural products. Collectively, these advancements pave the way for more effective and efficient plant breeding and crop management, directly contributing to the enhancement of global food production systems. This study not only addresses current limitations in field phenotyping but also sets a new standard for technological innovation in agriculture.
4

Design and Development of Intelligent Security Management Systems: Threat Detection and Response in Cyber-based Infrastructures

Yahya Javed (11792741) 19 December 2021
Cyber-based infrastructures and systems serve as the operational backbone of many industries, and the resilience of such systems against cyber-attacks is of paramount importance. As the complexity and scale of Cyber-based Systems (CBSs) have increased manyfold over the years, the attack surface has also widened, making CBSs more vulnerable to cyber-attacks. This dissertation addresses the challenges in the post-intrusion security management operations of threat detection and threat response in the networks connecting CBSs. In threat detection, the increase in the scale of cyber networks and the rise in the sophistication of cyber-attacks have introduced several challenges. The primary challenge is the requirement to detect complex multi-stage cyber-attacks in real time by processing the immense amount of traffic produced by present-day networks. In threat response, delays in responding to cyber-attacks and the functional interdependencies among the different systems of a CBS have been observed to have catastrophic effects, as a cyber-attack that compromises one constituent system of a CBS can quickly spread to others. This can result in a cascade effect that can impair the operability of the entire CBS. To address the challenges in threat detection, this dissertation proposes PRISM, a hierarchical threat detection architecture that uses a novel attacker behavior model-based sampling technique to minimize the real-time traffic processing overhead. PRISM has a unique multi-layered architecture that monitors network traffic in a distributed manner to provide efficiency in processing and modularity in design. PRISM employs a Hidden Markov Model-based prediction mechanism to identify multi-stage attacks and ascertain the attack progression for a proactive response. Furthermore, PRISM introduces a stream management procedure that rectifies the reordering of alerts collected from distributed alert reporting systems.
To address the challenges in threat response, this dissertation presents TRAP, a novel threat response and recovery architecture that localizes the cyber-attack in a timely manner and simultaneously recovers the affected system functionality. The dissertation presents a comprehensive performance evaluation of PRISM and TRAP through extensive experimentation and shows their effectiveness in identifying threats and responding to them while achieving all of their design objectives.
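To illustrate how a Hidden Markov Model can track the progression of a multi-stage attack from a sequence of alerts, here is a minimal sketch of the standard forward (filtering) recursion. It is not PRISM's mechanism: the stage names, alert types, and all probabilities below are hypothetical values invented for the example.

```python
def _normalize(weights):
    """Scale a dict of non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def forward_filter(observations, states, start_p, trans_p, emit_p):
    """Return P(current state | all observations so far) as a dict."""
    belief = _normalize(
        {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    )
    for obs in observations[1:]:
        # Predict: push the belief through the transition model.
        predicted = {
            s: sum(belief[r] * trans_p[r][s] for r in states) for s in states
        }
        # Update: weight by how well each state explains the new alert.
        belief = _normalize({s: predicted[s] * emit_p[s][obs] for s in states})
    return belief


# Hypothetical three-stage attack model (all numbers are made up).
STATES = ("scan", "exploit", "exfiltrate")
START_P = {"scan": 0.8, "exploit": 0.15, "exfiltrate": 0.05}
TRANS_P = {
    "scan": {"scan": 0.6, "exploit": 0.35, "exfiltrate": 0.05},
    "exploit": {"scan": 0.05, "exploit": 0.6, "exfiltrate": 0.35},
    "exfiltrate": {"scan": 0.05, "exploit": 0.05, "exfiltrate": 0.9},
}
EMIT_P = {
    "scan": {"port_probe": 0.8, "priv_esc": 0.15, "bulk_upload": 0.05},
    "exploit": {"port_probe": 0.2, "priv_esc": 0.7, "bulk_upload": 0.1},
    "exfiltrate": {"port_probe": 0.05, "priv_esc": 0.15, "bulk_upload": 0.8},
}

alerts = ["port_probe", "priv_esc", "bulk_upload"]
belief = forward_filter(alerts, STATES, START_P, TRANS_P, EMIT_P)
print(max(belief, key=belief.get))  # most likely current stage: exfiltrate
```

Applying the transition model once more to the returned belief gives a one-step-ahead forecast of the next stage, which is the kind of information a proactive response could act on.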
5

Application of Randomized Algorithms in Path Planning and Control of a Micro Air Vehicle

Bera, Titas January 2015
This thesis focuses on the design and development of a fixed wing micro air vehicle (MAV) and on the development of randomized sampling based motion planning and control algorithms for path planning and stabilization of the MAV. In addition, the thesis contains probabilistic analyses of the algorithmic properties of randomized sampling based algorithms, such as completeness and asymptotic optimality. The thesis begins with a detailed discussion of the aerodynamic design, computational fluid dynamic (CFD) simulations of the propeller wake, and wind tunnel tests of a 150 mm fixed wing micro air vehicle. The vehicle is designed in such a way that, in spite of the various adverse effects of low Reynolds number aerodynamics and the complex propeller wake interactions with the airframe, it shows a balance of external forces and moments at most operating conditions. This is supported by the CFD analyses and wind tunnel tests presented in this thesis. The thesis also contains reasonably accurate longitudinal and lateral dynamical models of the MAV, which are verified by numerous flight trials. However, a considerable amount of model uncertainty remains in the system description of the MAV. A robust, feedback-stabilized closed-loop flight control law is designed to attenuate the effects of modelling uncertainties and discrete vertical and head-on wind gusts, and to maintain flight stability and performance requirements at all allowable operating conditions. The controller is implemented in the MAV autopilot hardware, with successful closed-loop flight trials. The flight controller is designed using the probabilistic robust control approach, which is based on statistical average case analysis and synthesis techniques. It removes the conservatism present in classical robust feedback design (which is based on worst case design techniques) and the associated sluggish system response characteristics.
Instead of minimizing the effect of the worst case disturbance, a randomized technique synthesizes a controller for which some performance index is minimized in an empirical average sense. In this thesis it is shown that the degree of conservatism in the design and the number of samples used by the randomized sampling based techniques are directly related. In particular, it is shown that as the lower bound on the number of samples decreases, the degree of conservatism in the design increases. Classical motion planning and obstacle avoidance methodologies become computationally expensive as the number of degrees of freedom of the vehicle grows, and these methodologies are therefore largely inapplicable to MAVs with 6 degrees of freedom. The problem of computational complexity can be avoided using randomized sampling based motion planning algorithms such as the probabilistic roadmap method (PRM). However, as a trade-off, these algorithms lack algorithmic completeness properties. In this thesis, it is established that the algorithmic completeness properties depend on the choice of the sampling sequences. The thesis contains an analysis of algorithmic features such as probabilistic completeness and asymptotic optimality of the PRM algorithm and its many variants, under the incremental and independent problem model framework. It is shown that the structure of the random sample sequence affects the solution of the sampling based algorithms. The problem of capturing the connectivity of the configuration space in the presence of obstacles, which is a central problem in randomized motion planning, is also discussed in this thesis. In particular, the success probability of one such randomized algorithm, the Obstacle based Probabilistic Roadmap Method (OBPRM), is estimated using geometric probability theory. A direct relationship between the weak upper bound of the success probability and the geometric features of the obstacles is established.
The thesis also contains a new sampling based algorithm, based on geometric random walk theory, which addresses the problem of capturing the connectivity of the configuration space. The algorithm performs better than other similar algorithms, such as the Randomized Bridge Builder method, on identical benchmark problems. Numerical simulation shows that the algorithm's performance advantage grows as the dimension of the motion planning problem increases. As one of its central objectives, the thesis proposes a pre-processing technique for the state space of the system to enhance the performance of sampling based kinodynamic motion planners such as the rapidly exploring random tree (RRT). This pre-processing technique can be applied not only to the motion planning of the MAV, but also to a wide class of vehicles and complex systems with a large number of degrees of freedom. The pre-processing technique identifies the sequence of regions to be searched for a solution by an RRT planner performing motion planning and obstacle avoidance for an MAV. Numerical simulation shows significant improvement over the basic RRT planner with a small additional computational overhead. A probabilistic analysis of the RRT algorithm and an approximate asymptotic optimality analysis of the solution returned by the algorithm are also presented in this thesis. In particular, it is shown that the RRT algorithm is not asymptotically optimal. An integral part of the motion planning algorithm is the capability of fast collision detection between various geometric objects. Image space based methods, which use Graphics Processing Unit (GPU) hardware and do not use object geometry explicitly, are found to be fast and accurate for this purpose. In this thesis, a new GPU-based collision detection method between two convex or non-convex objects is provided.
The performance of the algorithm, which is an extension of an existing algorithm, is verified with numerous collision detection scenarios.
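As a concrete reference point for the "basic RRT planner" mentioned above, the following is a minimal 2D sketch under simplifying assumptions: a point robot, no vehicle dynamics, a small goal bias (a common variant), and collision checks on new nodes only rather than on the connecting edges. The function name, parameters, and the circular obstacle are hypothetical; none of this reproduces the thesis's pre-processing technique or its kinodynamic planner.

```python
import math
import random


def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        goal_bias=0.05, max_iters=5000, seed=0):
    """Grow a tree from start by repeatedly stepping toward random samples.

    Returns a list of points from start to within goal_tol of goal,
    or None if no such path is found within max_iters iterations.
    """
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample the goal occasionally, otherwise a uniform random point.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Nearest tree node by linear search (fine for a small sketch).
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Move at most `step` from the nearest node toward the sample.
        t = min(1.0, step / d)
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        if not is_free(new):  # simplification: only the new node is checked
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:  # walk back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


def outside_disc(p, center=(5.0, 5.0), radius=2.0):
    """Hypothetical obstacle: a disc in the middle of the workspace."""
    return math.dist(p, center) > radius


path = rrt((1.0, 1.0), (9.0, 9.0), outside_disc, ((0.0, 10.0), (0.0, 10.0)))
print(None if path is None else len(path))
```

The returned path is feasible but generally jagged and longer than necessary, which is consistent with the abstract's statement that RRT is not asymptotically optimal.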
