Global ETD Search

81	Fast and slow machine learning / Apprentissage automatique rapide et lent Montiel López, Jacob 07 March 2019 (has links) L'ère du Big Data a révolutionné la manière dont les données sont créées et traitées. Dans ce contexte, de nombreux défis se posent, compte tenu de la quantité énorme de données disponibles qui doivent être efficacement gérées et traitées afin d’extraire des connaissances. Cette thèse explore la symbiose de l'apprentissage en mode batch et en flux, traditionnellement considérés dans la littérature comme antagonistes, sur le problème de la classification à partir de flux de données en évolution. L'apprentissage en mode batch est une approche bien établie basée sur une séquence finie: d'abord les données sont collectées, puis les modèles prédictifs sont créés, finalement le modèle est appliqué. Par contre, l’apprentissage par flux considère les données comme infinies, rendant le problème d’apprentissage comme une tâche continue (sans fin). De plus, les flux de données peuvent évoluer dans le temps, ce qui signifie que la relation entre les caractéristiques et la réponse correspondante peut changer. Nous proposons un cadre systématique pour prévoir le surendettement, un problème du monde réel ayant des implications importantes dans la société moderne. Les deux versions du mécanisme d'alerte précoce (batch et flux) surpassent les performances de base de la solution mise en œuvre par le Groupe BPCE, la deuxième institution bancaire en France. De plus, nous introduisons une méthode d'imputation évolutive basée sur un modèle pour les données manquantes dans la classification. Cette méthode présente le problème d'imputation sous la forme d'un ensemble de tâches de classification / régression résolues progressivement.Nous présentons un cadre unifié qui sert de plate-forme d'apprentissage commune où les méthodes de traitement par batch et par flux peuvent interagir de manière positive. Nous montrons que les méthodes batch peuvent être efficacement formées sur le réglage du flux dans des conditions spécifiques. Nous proposons également une adaptation de l'Extreme Gradient Boosting algorithme aux flux de données en évolution. La méthode adaptative proposée génère et met à jour l'ensemble de manière incrémentielle à l'aide de mini-lots de données. Enfin, nous présentons scikit-multiflow, un framework open source en Python qui comble le vide en Python pour une plate-forme de développement/recherche pour l'apprentissage à partir de flux de données en évolution. / The Big Data era has revolutionized the way in which data is created and processed. In this context, multiple challenges arise given the massive amount of data that needs to be efficiently handled and processed in order to extract knowledge. This thesis explores the symbiosis of batch and stream learning, which are traditionally considered in the literature as antagonists. We focus on the problem of classification from evolving data streams.Batch learning is a well-established approach in machine learning based on a finite sequence: first data is collected, then predictive models are created, then the model is applied. On the other hand, stream learning considers data as infinite, rendering the learning problem as a continuous (never-ending) task. Furthermore, data streams can evolve over time, meaning that the relationship between features and the corresponding response (class in classification) can change.We propose a systematic framework to predict over-indebtedness, a real-world problem with significant implications in modern society. The two versions of the early warning mechanism (batch and stream) outperform the baseline performance of the solution implemented by the Groupe BPCE, the second largest banking institution in France. Additionally, we introduce a scalable model-based imputation method for missing data in classification. This method casts the imputation problem as a set of classification/regression tasks which are solved incrementally.We present a unified framework that serves as a common learning platform where batch and stream methods can positively interact. We show that batch methods can be efficiently trained on the stream setting under specific conditions. The proposed hybrid solution works under the positive interactions between batch and stream methods. We also propose an adaptation of the Extreme Gradient Boosting (XGBoost) algorithm for evolving data streams. The proposed adaptive method generates and updates the ensemble incrementally using mini-batches of data. Finally, we introduce scikit-multiflow, an open source framework in Python that fills the gap in Python for a development/research platform for learning from evolving data streams. Apprentissage automatique Flux de données Classification Données manquantes Dérive de concept Big data Machine learning Data stream Classification Missing data Concept drift Big data
82	Performance Optimizations and Operator Semantics for Streaming Data Flow Programs Sax, Matthias J. 01 July 2020 (has links) Unternehmen sammeln mehr Daten als je zuvor und müssen auf diese Informationen zeitnah reagieren. Relationale Datenbanken eignen sich nicht für die latenzfreie Verarbeitung dieser oft unstrukturierten Daten. Um diesen Anforderungen zu begegnen, haben sich in der Datenbankforschung seit dem Anfang der 2000er Jahre zwei neue Forschungsrichtungen etabliert: skalierbare Verarbeitung unstrukturierter Daten und latenzfreie Datenstromverarbeitung. Skalierbare Verarbeitung unstrukturierter Daten, auch bekannt unter dem Begriff "Big Data"-Verarbeitung, hat in der Industrie schnell Einzug erhalten. Gleichzeitig wurden in der Forschung Systeme zur latenzfreien Datenstromverarbeitung entwickelt, die auf eine verteilte Architektur, Skalierbarkeit und datenparallele Verarbeitung setzen. Obwohl diese Systeme in der Industrie vermehrt zum Einsatz kommen, gibt es immer noch große Herausforderungen im praktischen Einsatz. Diese Dissertation verfolgt zwei Hauptziele: Zuerst wird das Laufzeitverhalten von hochskalierbaren datenparallelen Datenstromverarbeitungssystemen untersucht. Im zweiten Hauptteil wird das "Dual Streaming Model" eingeführt, das eine Semantik zur gleichzeitigen Verarbeitung von Datenströmen und Tabellen beschreibt. Das Ziel unserer Untersuchung ist ein besseres Verständnis über das Laufzeitverhalten dieser Systeme zu erhalten und dieses Wissen zu nutzen um Anfragen automatisch ausreichende Rechenkapazität zuzuweisen. Dazu werden ein Kostenmodell und darauf aufbauende Optimierungsalgorithmen für Datenstromanfragen eingeführt, die Datengruppierung und Datenparallelität einbeziehen. Das vorgestellte Datenstromverarbeitungsmodell beschreibt das Ergebnis eines Operators als kontinuierlichen Strom von Veränderugen auf einer Ergebnistabelle. Dabei behandelt unser Modell die Diskrepanz der physikalischen und logischen Ordnung von Datenelementen inhärent und erreicht damit eine deterministische Semantik und eine minimale Verarbeitungslatenz. / Modern companies are able to collect more data and require insights from it faster than ever before. Relational databases do not meet the requirements for processing the often unstructured data sets with reasonable performance. The database research community started to address these trends in the early 2000s. Two new research directions have attracted major interest since: large-scale non-relational data processing as well as low-latency data stream processing. Large-scale non-relational data processing, commonly known as "Big Data" processing, was quickly adopted in the industry. In parallel, low latency data stream processing was mainly driven by the research community developing new systems that embrace a distributed architecture, scalability, and exploits data parallelism. While these systems have gained more and more attention in the industry, there are still major challenges to operate them at large scale. The goal of this dissertation is two-fold: First, to investigate runtime characteristics of large scale data-parallel distributed streaming systems. And second, to propose the "Dual Streaming Model" to express semantics of continuous queries over data streams and tables. Our goal is to improve the understanding of system and query runtime behavior with the aim to provision queries automatically. We introduce a cost model for streaming data flow programs taking into account the two techniques of record batching and data parallelization. Additionally, we introduce optimization algorithms that leverage our model for cost-based query provisioning. The proposed Dual Streaming Model expresses the result of a streaming operator as a stream of successive updates to a result table, inducing a duality between streams and tables. Our model handles the inconsistency of the logical and the physical order of records within a data stream natively, which allows for deterministic semantics as well as low latency query execution. Datenstromverarbeitung Datenflussprogram Parallelität Optimierung Verarbeitungssemantik Data Stream Processing Data Flow Program Parallelization Optimization Processing Semantics 004 Informatik ST 265 ddc:004
83	Dynamický definovatelný dashboard / Dynamic Definable Dashboard Počatko, Boris January 2012 (has links) This thesis deals with the design and implementation of a dynamic user-definable dashboard. The user will be able to define conditions dynamically, which will filter out and save only the data he needs. The application will support the changing of the condition definitions and the display of the graphs after they were created. The current implementations available on the internet are usually solutions designed to fit only one type of project and are not designed to meet general guidelines for a dashboard. The dashboard is designed for a smooth cooperation with high load databases and therefore not to slow down the whole solution.
84	Dolování v prostřední MS SQL pomocí inkrementálních algoritmů / Datamining in MS SQL Using Incremental Algorithms David, Lukáš January 2012 (has links) This work deals with issues in data streams mining which nowadays is a very dynamic area in information technology. The thesis describes the general principles of data mining. There are also the principles of data mining in the data streams. Special attention is given to the implemented algorithm CluStream. In the practical part the data stream processing solution was designed and implemented by the MSSQL technology using the above algorithm. The functionality of the algorithm was verified using own data stream generator.
85	Framework pro tvorbu generátorů dat / Framework for Data Generators Kříž, Blažej January 2012 (has links) This master's thesis is focused on the problem of data generation. At the beginning, it presents several applications for data generation and describes the data generation process. Then it deals with development of framework for data generators and demonstrational application for validating the framework.
86	Representing Data Quality in Sensor Data Streaming Environments Lehner, Wolfgang, Klein, Anja 20 May 2022 (has links) Sensors in smart-item environments capture data about product conditions and usage to support business decisions as well as production automation processes. A challenging issue in this application area is the restricted quality of sensor data due to limited sensor precision and sensor failures. Moreover, data stream processing to meet resource constraints in streaming environments introduces additional noise and decreases the data quality. In order to avoid wrong business decisions due to dirty data, quality characteristics have to be captured, processed, and provided to the respective business task. However, the issue of how to efficiently provide applications with information about data quality is still an open research problem. In this article, we address this problem by presenting a flexible model for the propagation and processing of data quality. The comprehensive analysis of common data stream processing operators and their impact on data quality allows a fruitful data evaluation and diminishes incorrect business decisions. Further, we propose the data quality model control to adapt the data quality granularity to the data stream interestingness. info:eu-repo/classification/ddc/004 ddc:004
87	Algorithmes de machine learning adaptatifs pour flux de données sujets à des changements de concept / Adaptive machine learning algorithms for data streams subject to concept drifts Loeffel, Pierre-Xavier 04 December 2017 (has links) Dans cette thèse, nous considérons le problème de la classification supervisée sur un flux de données sujets à des changements de concepts. Afin de pouvoir apprendre dans cet environnement, nous pensons qu’un algorithme d’apprentissage doit combiner plusieurs caractéristiques. Il doit apprendre en ligne, ne pas faire d’hypothèses sur le concept ou sur la nature des changements de concepts et doit être autorisé à s’abstenir de prédire lorsque c’est nécessaire. Les algorithmes en ligne sont un choix évident pour traiter les flux de données. De par leur structure, ils sont capables de continuellement affiner le modèle appris à l’aide des dernières observations reçues. La structure instance based a des propriétés qui la rende particulièrement adaptée pour traiter le problème des flux de données sujet à des changements de concept. En effet, ces algorithmes font très peu d’hypothèses sur la nature du concept qu’ils essaient d’apprendre ce qui leur donne une flexibilité qui les rend capable d’apprendre un vaste éventail de concepts. Une autre force est que stocker certaines des observations passées dans la mémoire peux amener de précieuses meta-informations qui pourront être utilisées par la suite par l’algorithme. Enfin, nous mettons en valeur l’importance de permettre à un algorithme d’apprentissage de s’abstenir de prédire lorsque c’est nécessaire. En effet, les changements de concepts peuvent être la source de beaucoup d’incertitudes et, parfois, l’algorithme peux ne pas avoir suffisamment d’informations pour donner une prédiction fiable. / In this thesis, we investigate the problem of supervised classification on a data stream subject to concept drifts. In order to learn in this environment, we claim that a successful learning algorithm must combine several characteristics. It must be able to learn and adapt continuously, it shouldn’t make any assumption on the nature of the concept or the expected type of drifts and it should be allowed to abstain from prediction when necessary. On-line learning algorithms are the obvious choice to handle data streams. Indeed, their update mechanism allows them to continuously update their learned model by always making use of the latest data. The instance based (IB) structure also has some properties which make it extremely well suited to handle the issue of data streams with drifting concepts. Indeed, IB algorithms make very little assumptions about the nature of the concept they are trying to learn. This grants them a great flexibility which make them likely to be able to learn from a wide range of concepts. Another strength is that storing some of the past observations into memory can bring valuable meta-informations which can be used by an algorithm. Furthermore, the IB structure allows the adaptation process to rely on hard evidences of obsolescence and, by doing so, adaptation to concept changes can happen without the need to explicitly detect the drifts. Finally, in this thesis we stress the importance of allowing the learning algorithm to abstain from prediction in this framework. This is because the drifts can generate a lot of uncertainties and at times, an algorithm might lack the necessary information to accurately predict. Changement de concept Flux de données Apprentissage en ligne Prédiction avec option de rejet Apprentissage automatique Apprentissage basé sur les instances Concept drift Data Stream Learning algorithm 004
88	Dolování v proudu dat / Data Mining in Data Stream Sýkora, Petr January 2009 (has links) This thesis deals with the data mining in data stream which represents fast developing area of information technology. The text describes common principles of data mining, explains what data stream is and shows methods for its preprocessing and algorithms for following data mining. The special attention is given to the VFDT and the CVDT algorithm. The next mentioned are the spatiotemporal data and related data mining. The second part describes the design and implementation of the application for classification over spatiotemporal data stream represented by road traffic data and following prediction of spatiotemporal events (traffic-jams). The classification is performed by the VFDT and CVFDT algorithm. The application has been tested on the data set obtained by the simulation tool SUMO.
89	Potlačení DoS útoků s využitím strojového učení / Mitigation of DoS Attacks Using Machine Learning Goldschmidt, Patrik January 2021 (has links) Útoky typu odoprenia služby (DDoS) sú v dnešných počítačových sieťach stále frekventovanejším bezpečnostným incidentom. Táto práca sa zameriava na detekciu týchto útokov a poskytnutie relevantných informácii za účelom ich mitigácie v reálnom čase. Spomínaná funkcionalita je dosiahnutá s využitím techník prúdového dolovania z dát a strojového učenia. Výsledkom práce je sada nástrojov zastrešujúca celý proces strojového učenia - od vlastnej extrakcie príznakov cez predspracovanie dát až po export natrénovaného modelu pripraveného na nasadenie v produkcii. Experimentálne výsledky vyhodnotené na viacerých reálnych a syntetických dátových sadách poukazujú na presnosť systému väčšiu ako 99% s možnosťou spoľahlivej detekcie prebiehajúceho útoku do 4 sekúnd od jeho začiatku.
90	Исследование и применение современных WEB-технологий для реализации программного комплекса «Умная парковка» : магистерская диссертация / Research of modern WEB-technologies and their implemetation in creating the software solution “Smart parking” Рапопорт, А. А., Rapoport, A. A. January 2017 (has links) В работе актуализируется проблема эффективности использования современных WEB-технологий при реализации WEB-приложений. Предлагаемый вариант решения позволяет успешно интегрировать стек современных WEB-технологий и применить его в разработке программного комплекса для бронирования и поиска свободного места на парковке. / The thesis updates the problem of the effectiveness of using modern WEB-technologies in the implementation of the WEB user interface, as well as the problem of finding and reserving parking spots in large parking spaces. The offered solution allows to integrate the stack of modern WEB-technologies and apply it in the development of a software package for simple reservation and search of free parking spots. WEB-ТЕХНОЛОГИЯ ВЕБ-ПРИЛОЖЕНИЕ ПОТОК ДАННЫХ MASTER'S THESIS WEB TECHNOLOGY SMART PARKING WEB APPLICATION SINGLE-PAGE APPLICATION DATA STREAM

Search results