221

Data Science and Analytics in Industrial Maintenance: Selection, Evaluation, and Application of Data-Driven Methods

Zschech, Patrick 02 October 2020 (has links)
Data-driven maintenance bears the potential to realize various benefits based on multifaceted data assets generated in increasingly digitized industrial environments. By taking advantage of modern methods and technologies from the field of data science and analytics (DSA), it is possible, for example, to gain a better understanding of complex technical processes and to anticipate impending machine faults and failures at an early stage. However, successful implementation of DSA projects requires multidisciplinary expertise, which can rarely be covered by individual employees or single units within an organization. This expertise covers, for example, a solid understanding of the domain, analytical method and modeling skills, experience in dealing with different source systems and data structures, and the ability to transfer suitable solution approaches into information systems. Against this background, various approaches have emerged in recent years to make the implementation of DSA projects more accessible to broader user groups. These include structured procedure models, systematization and modeling frameworks, domain-specific benchmark studies to illustrate best practices, standardized DSA software solutions, and intelligent assistance systems. The present thesis ties in with previous efforts and provides further contributions for their continuation. More specifically, it aims to create supportive artifacts for the selection, evaluation, and application of data-driven methods in the field of industrial maintenance. For this purpose, the thesis covers four artifacts, which were developed in several publications. These artifacts include (i) a comprehensive systematization framework for the description of central properties of recurring data analysis problems in the field of industrial maintenance, (ii) a text-based assistance system that offers advice regarding the most suitable class of analysis methods based on natural language and domain-specific problem descriptions, (iii) a taxonomic evaluation framework for the systematic assessment of data-driven methods under varying conditions, and (iv) a novel solution approach for the development of prognostic decision models in cases of missing label information. Individual research objectives guide the construction of the artifacts as part of a systematic research design. The findings are presented in a structured manner by summarizing the results of the corresponding publications. Moreover, the connections between the developed artifacts as well as related work are discussed. Subsequently, a critical reflection is offered concerning the generalization and transferability of the achieved results. 
Thus, the thesis not only provides a contribution based on the proposed artifacts; it also paves the way for future opportunities, for which a detailed research agenda is outlined.

Contents: List of Figures; List of Tables; List of Abbreviations; 1 Introduction (1.1 Motivation; 1.2 Conceptual Background; 1.3 Related Work; 1.4 Research Design; 1.5 Structure of the Thesis); 2 Systematization of the Field (2.1 The Current State of Research; 2.2 Systematization Framework; 2.3 Exemplary Framework Application); 3 Intelligent Assistance System for Automated Method Selection (3.1 Elicitation of Requirements; 3.2 Design Principles and Design Features; 3.3 Prototypical Instantiation and Evaluation); 4 Taxonomic Framework for Method Evaluation (4.1 Survey of Prognostic Solutions; 4.2 Taxonomic Evaluation Framework; 4.3 Exemplary Framework Application); 5 Method Application Under Industrial Conditions (5.1 Conceptualization of a Solution Approach; 5.2 Prototypical Implementation and Evaluation); 6 Discussion of the Results (6.1 Connections Between Developed Artifacts and Related Work; 6.2 Generalization and Transferability of the Results); 7 Concluding Remarks; Bibliography; Appendix I: Implementation Details; Appendix II: List of Publications (A Publication P1: Focus Area Systematization; B Publication P2: Focus Area Method Selection; C Publication P3: Focus Area Method Selection; D Publication P4: Focus Area Method Evaluation; E Publication P5: Focus Area Method Application)
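Artifact (ii), the text-based assistance system that maps natural-language problem descriptions to a suitable class of analysis methods, can be illustrated with a deliberately small sketch. The method-class labels, training phrases, and pipeline below are invented for illustration and are unrelated to the thesis's actual system.

```python
# Toy illustration of mapping a maintenance problem description to a class of
# analysis methods. All labels and phrases are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "estimate the remaining useful life of a bearing from vibration data",
    "predict how many hours until the pump fails",
    "detect unusual sensor readings in the milling machine",
    "flag abnormal temperature spikes during operation",
    "classify which fault type caused the machine stoppage",
    "identify the failure mode from error logs",
]
method_classes = [
    "prognostics", "prognostics",
    "anomaly detection", "anomaly detection",
    "diagnostics", "diagnostics",
]

# TF-IDF features plus a linear classifier stand in for the assistance logic.
assistant = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
assistant.fit(descriptions, method_classes)

query = "how long until the compressor breaks down given its sensor history"
print(assistant.predict([query])[0])  # expected: "prognostics"
```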
222

Unsupervised anomaly detection for aircraft health monitoring system

Dani, Mohamed Cherif 10 March 2017 (has links)
The limits of technical and fundamental knowledge are a daily challenge for industry. Updating this knowledge is essential for economic competitiveness and for the reliability and maintainability of systems and machines. Today, thanks to these systems and machines, data is expanding dramatically in both volume and generation frequency. Within Airbus, for example, thousands of sensors generate hundreds of megabytes of data per flight, and new aircraft generations integrate hundreds or even thousands of sensors. These sensor data are exploited on the ground or in flight to monitor the state and health of the aircraft and to detect failures, incidents, and changes. Such failures, incidents, and changes are collectively known as anomalies. An anomaly is commonly defined as behavior that does not conform to the normal behavior of the data; some define it as a deviation from a normal model, others as a change. Whatever the definition, detecting these anomalies is important for the proper functioning of the aircraft. Currently, anomaly detection on board is handled by several aeronautical monitoring systems. One of them, the Aircraft Condition Monitoring System (ACMS), continuously records the data generated by the sensors and monitors the aircraft in real time using triggers and predefined conditions (an exceedance approach) programmed by airlines and system designers from a priori knowledge of the system (physical, mechanical, etc.). However, several constraints limit the anomaly detection potential of the ACMS. One is the limitation of expert knowledge, a classic problem in many domains: a trigger detects only the anomalies it was designed for, so the triggers do not cover all system conditions. If a new behavior appears in a sensor, for example after a maintenance action or a part replacement, the predefined conditions will not detect it and may in many cases generate false alarms, since the trigger cannot adapt to the new condition. Another constraint is that triggers are static, unable to adapt their properties to each new condition. Further limitations are discussed in the following chapters. The principal objective of this thesis is to detect anomalies and changes in the ACMS sensor data in order to improve the Aircraft Health Monitoring (AHM) function. The work is based on a two-stage analysis. The first stage is univariate anomaly detection: since no a priori knowledge of the system, documentation, or labeled classes is available, unsupervised learning is used to analyze each sensor independently and detect its anomalies and changes. The second stage is multivariate anomaly detection based on density clustering, whose objective is to filter out anomalies detected in the first stage (false alarms) and to detect suspicious group behaviors. The anomalies detected in both stages can serve as potential new triggers or be used to update existing ones. We also propose a generic anomaly detection concept built on the univariate and multivariate analyses, and finally a new concept for validating anomalies within Airbus. The method is tested on real and synthetic data; the results, and the identification and validation of the anomalies, are discussed in this thesis.
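The two-stage analysis described above can be sketched roughly as follows. The sensor names, window size, and thresholds are hypothetical, and the sketch is a simplified stand-in for the thesis's methods, not their implementation.

```python
# Stage 1: univariate unsupervised detection per sensor (robust z-score).
# Stage 2: density clustering (DBSCAN) over flagged points to filter false
# alarms and surface coherent suspicious groups.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def univariate_anomalies(series: pd.Series, window: int = 50, z: float = 4.0) -> pd.Series:
    """Flag points deviating strongly from a rolling estimate of the sensor."""
    med = series.rolling(window, center=True, min_periods=1).median()
    mad = (series - med).abs().rolling(window, center=True, min_periods=1).median()
    score = (series - med).abs() / (1.4826 * mad + 1e-9)  # robust z-score
    return score > z

# Hypothetical flight-data frame: one column per sensor, one row per timestamp.
df = pd.DataFrame(np.random.randn(1000, 3), columns=["egt", "vibration", "fuel_flow"])
df.loc[500, :] += 8  # inject a synthetic anomaly for demonstration

# Stage 1: per-sensor flags, each sensor analyzed independently.
flags = df.apply(univariate_anomalies)
candidates = df[flags.any(axis=1)]

# Stage 2: DBSCAN labels isolated candidates as noise (-1), standing in for
# filtered false alarms; dense groups suggest coherent suspicious behavior.
if len(candidates) > 0:
    X = StandardScaler().fit_transform(candidates)
    labels = DBSCAN(eps=0.8, min_samples=3).fit_predict(X)
    confirmed = candidates[labels != -1]
    print(f"{len(candidates)} candidates, {len(confirmed)} in dense groups")
```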
223

Aplicación de Data Science en la pequeña empresa, caso: Pollería Mister Pollo / Data science application in a small business: the case of Pollería Mister Pollo

Baldeón Maraví, Brian, Fukushima Castillo, Hugo Kenji, Ochante Quispe, Milagros Karina, Quevedo Trujillo, Haedly Victoria, Tejada Alarcón, Ernesto Rosendo 14 July 2021 (has links)
The purpose of this work is to apply the knowledge and techniques taught in the three Data Science courses, specifically to identify and use the variables found in the business to build a model that supports greater employee retention at the company Mister Pollo. The research follows IBM's data science methodology, which begins with a business understanding phase to identify the organization's problem, analyzing its strengths and weaknesses, followed by phases of data collection and preparation, analysis, interpretation, modeling, and evaluation. The research design is mixed: the initial phase takes a descriptive approach to understand the importance of the variables used, while the second phase becomes predictive through a supervised learning technique, in this case a decision tree model, used to build a tool for evaluating employee retention at the restaurant. This will allow the company's General Manager to draw up an action plan to control and minimize staff turnover, considering the company's different scenarios, profiles, and needs. Finally, the project concludes by evaluating the findings of the selected model to verify that they meet the objectives set by the manager of Míster Pollo in coordination with the work team. / Research paper
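As a rough illustration of the supervised decision-tree approach the abstract describes, consider the sketch below; the feature names and records are invented for illustration and do not come from the study.

```python
# Minimal decision-tree sketch for employee retention: 1 = stayed, 0 = left.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical HR records with invented features.
data = pd.DataFrame({
    "age":          [19, 23, 31, 27, 40, 22, 35, 29],
    "weekly_hours": [48, 40, 44, 48, 40, 48, 42, 46],
    "commute_min":  [60, 20, 35, 75, 15, 50, 30, 80],
    "wage_soles":   [1025, 1200, 1500, 1025, 1800, 1100, 1600, 1050],
    "stayed":       [0, 1, 1, 0, 1, 0, 1, 0],
})
X, y = data.drop(columns="stayed"), data["stayed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules
print("holdout accuracy:", tree.score(X_test, y_test))
```

The readable if-then rules produced by `export_text` are one reason a decision tree suits this use case: a manager can act on them directly when designing a retention plan.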
224

Aplicación de ciencia de datos para identificar los segmentos de clientes de Grupo Deltron / Data science application to identify Grupo Deltron customer segments

Arias Aybar, Italo Daniel, Cueva Angulo, Ricardo Baltazar, Llanos Donayre, Giovani Martin, Pipa Ayala, Evelin, Valdez Brizuela, Ursula Alexandra 16 July 2021 (has links)
The present research work comprises the analysis and resolution of a problem found at the company Grupo Deltron, which has reported a decline in unit sales in its notebook line. The work applies the data science methodology to meet its general objective: identifying the customer clusters that showed significant behavior in notebook-line sales in 2020. The data were provided by a team member with direct access to Grupo Deltron's internal database, which eased collection for the research work; from it, 18 variables with 184,646 records were identified as the foundation for the descriptive and prescriptive analysis approach applied in the work. Visualization tools were also applied to gain greater clarity over the data and to define the research variables, yielding consistent and decisive results for solving the initial problem. / Research paper
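The abstract does not name the clustering technique, so the sketch below uses k-means as one plausible way to identify customer clusters from sales behavior; the three features and the cluster count are assumptions for illustration only.

```python
# Descriptive customer segmentation sketch with k-means on invented features.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical per-customer aggregates from 2020 notebook sales records.
customers = pd.DataFrame({
    "units_2020":     rng.poisson(30, 500),
    "avg_ticket_usd": rng.normal(550, 120, 500).round(2),
    "orders_2020":    rng.poisson(8, 500),
})

# Standardize so no single feature dominates the distance metric.
X = StandardScaler().fit_transform(customers)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
customers["segment"] = km.labels_

# Profile each segment to see which buying behavior it represents.
print(customers.groupby("segment").mean().round(1))
```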
225

Aplicación de ciencia datos para pronosticar la compra de certificados de revisión técnica de vehículos de la empresa FORA / Data science application to forecast the purchase of technical inspection certificates for vehicles from the FORA company

Chávez Cajo, Eva Sofia, Sedano Chuquizuta, Melany Rosario 16 July 2021 (has links)
The object of study of this work is Fora S.A.C., a company dedicated to vehicle technical inspections. External and internal analyses identified the company's essential problem: the Logistics area used an inadequate method for deciding how many vehicle inspection certificates to order at purchase time. This research therefore focuses on identifying the variables that influence the purchase of these certificates and on applying the data science technique best suited to anticipating purchases, reducing the negative consequences for the company's inventory and finances. The data science methodology was applied through an explanatory and exploratory analysis, and the database used in the work was built from sources both internal and external to the company, which makes it possible to answer the data science questions formulated in the project. A quantitative statistical analysis was also carried out to find the correlations among all the variables. Notable findings emerged that will undoubtedly add value to the organization and help the area achieve its objectives, since several solutions and suggestions are proposed for making better purchasing decisions and minimizing the consequences that affect the company. / Research paper
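Below is a minimal sketch of a correlation analysis followed by a simple purchase forecast, in the spirit of the abstract; the variables (monthly inspections, fleet growth, certificate demand) are hypothetical stand-ins for those identified in the study.

```python
# Correlation check, then a linear regression to suggest next month's order.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
months = pd.DataFrame({
    "inspections":  rng.poisson(900, 24),
    "new_vehicles": rng.poisson(120, 24),
})
# Synthetic assumption: certificates consumed track inspections, with noise.
months["certificates"] = (months["inspections"] * 1.02 + rng.normal(0, 15, 24)).round()

# Step 1: correlations between candidate drivers and certificate demand.
print(months.corr().round(2))

# Step 2: regress demand on the drivers and forecast next month's purchase.
model = LinearRegression().fit(months[["inspections", "new_vehicles"]], months["certificates"])
next_month = pd.DataFrame({"inspections": [950], "new_vehicles": [130]})
print("suggested order:", int(model.predict(next_month)[0]))
```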
226

Short-Term electricity consumption prediction: Elområde 4, Sweden

Kothapalli, Anil Kumar January 2021 (has links)
This thesis is part of the coursework for the Master's Programme in Data Science at LTU. Its focus is mainly to review the published literature and identify state-of-the-art methodologies for predicting short-term electricity consumption. This includes the exploration of features and models as well as a discussion of the results attained, with the aim of identifying opportunities to improve forecast results for the southern Swedish bidding area Elområde 4. Different modern methods of forecasting electricity consumption have been studied and experimented with. The work adopts CRISP-DM, a data science methodology.
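As a sketch of what short-term consumption forecasting can look like in practice, the following uses lagged and calendar features on a synthetic hourly series; the data and the gradient-boosting model are assumptions, not the thesis's actual data or final model.

```python
# Short-term load forecasting sketch: lag features + calendar features.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
hours = pd.date_range("2021-01-01", periods=24 * 60, freq="h")
# Synthetic load: daily cycle plus noise, loosely imitating consumption data.
load = 1000 + 200 * np.sin(2 * np.pi * hours.hour / 24) + rng.normal(0, 30, len(hours))
df = pd.DataFrame({"load": load}, index=hours)

# Lags of one day and one week are common short-term forecasting features.
df["lag_24"] = df["load"].shift(24)
df["lag_168"] = df["load"].shift(168)
df["hour"] = df.index.hour
df["weekday"] = df.index.weekday
df = df.dropna()

split = -24 * 7  # hold out the final week for evaluation
X, y = df.drop(columns="load"), df["load"]
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
mae = np.mean(np.abs(model.predict(X[split:]) - y[split:]))
print(f"holdout MAE: {mae:.1f}")
```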
227

Análisis de las características que identifica a un usuario de practisis premium: variables que deciden para convertirse de una cuenta freemium a premium / Analysis of the characteristics that identify a Practisis Premium user: variables that drive conversion from a freemium to a premium account

Bocangel Carbajal, Jose Luis, Chavarria Contreras, Jonathan, Murrugarra Ocharan, Eric Gonzalo, Quispe Vivanco, Cesar Manuel 13 December 2020 (has links)
This research paper analyzes the problem posed by the company Practisis: a shortfall in migrations from Freemium to Premium accounts among users of the Dora software in Peru in 2020. Although the company began operations in January 2019, the Freemium-to-Premium migration rate is very low and directly affects sales. The work applies IBM's data science methodology to identify the variables that influence the migration from Freemium to Premium accounts, which directly affect sales for 2020. The database was obtained directly from the company's platform, which captures customers and tracks each account; from it, 15 variables and 1,348 records were identified. A decision tree model was used as a supervised machine learning technique, which helped identify, according to the variables used, what it takes for a customer to convert a Freemium account to Premium. Finally, the results of this analysis were presented in various charts and tables, together with an analysis in Python, to support the resolution of Practisis's problem. / Research paper
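A hedged sketch of how a decision tree can surface the variables that drive freemium-to-premium conversion follows; the columns are invented examples, not the 15 variables from the study, and the labels are synthetic.

```python
# Decision-tree sketch ranking invented account variables by importance.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 1348  # matches the record count reported in the abstract
accounts = pd.DataFrame({
    "logins_per_week":  rng.poisson(4, n),
    "invoices_created": rng.poisson(10, n),
    "team_size":        rng.integers(1, 15, n),
    "months_active":    rng.integers(1, 24, n),
})
# Synthetic label: heavier, longer-tenured usage converts more often.
p = 1 / (1 + np.exp(-(0.15 * accounts["invoices_created"] + 0.1 * accounts["months_active"] - 3)))
accounts["premium"] = rng.random(n) < p

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(accounts.drop(columns="premium"), accounts["premium"])

# Rank variables by how much the tree relies on them.
importance = pd.Series(tree.feature_importances_, index=accounts.columns[:-1])
print(importance.sort_values(ascending=False).round(3))
```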
228

Particulate Matter Matters

Meyer, Holger J., Gruner, Hannes, Waizenegger, Tim, Woltmann, Lucas, Hartmann, Claudio, Lehner, Wolfgang, Esmailoghli, Mahdi, Redyuk, Sergey, Martinez, Ricardo, Abedjan, Ziawasch, Ziehn, Ariane, Rabl, Tilmann, Markl, Volker, Schmitz, Christian, Serai, Dhiren Devinder, Gava, Tatiane Escobar 15 June 2023 (has links)
For the second time, the Data Science Challenge took place as part of the 18th symposium "Database Systems for Business, Technology and Web" (BTW) of the Gesellschaft für Informatik (GI). The challenge was organized by the University of Rostock and sponsored by IBM and SAP. This year, the challenge focused on the integration, analysis, and visualization of data on particulate matter pollution. After a preselection round, the accepted participants had one month to adapt their approaches to a well-founded problem, the real challenge. The final presentations took place at BTW 2019 in front of the prize jury and the attending audience. In this article, we give a brief overview of the schedule and organization of the Data Science Challenge; in addition, the participants present the problem to be solved and their solutions.
229

New Spatio-temporal Hawkes Process Models For Social Good

Wen-Hao Chiang (12476658) 28 April 2022 (has links)
As more and more datasets with self-exciting properties become available, the demand for robust models that capture contagion across events is also getting stronger. Hawkes processes stand out given their ability to capture a wide range of contagion and self-excitation patterns, including the transmission of infectious disease, earthquake aftershock distributions, near-repeat crime patterns, and overdose clusters. The Hawkes process is flexible in modeling these various applications through parametric and non-parametric kernels that model event dependencies in space, time, and on networks.

In this thesis, we develop new frameworks that integrate Hawkes process models with multi-armed bandit algorithms, high-dimensional marks, and high-dimensional auxiliary data to solve problems in search and rescue, forecasting infectious disease, and early detection of overdose spikes.

In Chapter 3, we develop a method with applications to the crisis of increasing overdose mortality over the last decade. We first encode the molecular substructures found in a drug overdose toxicology report. We then cluster these overdose encodings into different overdose categories and model these categories with spatio-temporal multivariate Hawkes processes. Our results demonstrate that the proposed methodology can improve estimation of the magnitude of an overdose spike based on the substances found in an initial overdose.

In Chapter 4, we build a framework for multi-armed bandit problems arising in event detection where the underlying process is self-exciting. We derive the expected number of events for Hawkes processes given a parametric model for the intensity and then analyze the regret bound of a Hawkes process UCB-normal algorithm. By introducing Hawkes process modeling into the upper confidence bound construction, our models can detect more events of interest under the multi-armed bandit problem setting. We apply the Hawkes bandit model to spatio-temporal data on crime events and earthquake aftershocks. We show that the model can quickly learn to detect hotspot regions, when events are unobserved, while striking a balance between exploitation and exploration.

In Chapter 5, we present a new spatio-temporal framework for integrating Hawkes processes with multi-armed bandit algorithms. Compared to the methods proposed in Chapter 4, the upper confidence bound is constructed through Bayesian estimation of a spatial Hawkes process to balance the trade-off between exploiting and exploring geographic regions. The model is validated on simulated datasets and on real-world datasets such as flooding events and improvised explosive device (IED) attack records. The experimental results show that our model outperforms baseline spatial MAB algorithms on reward and ranking metrics.

In Chapter 6, we demonstrate that the Hawkes process is a powerful tool for modeling infectious disease transmission. We develop models using Hawkes processes with spatio-temporal covariates to forecast COVID-19 transmission at the county level. In the proposed framework, we show how to estimate the dynamic reproduction number of the virus within an EM algorithm through a regression on Google mobility indices. We also include demographic covariates as spatial information to enhance accuracy. The approach is tested on both short-term and long-term forecasting tasks. The results show that the Hawkes process outperforms several benchmark models published in a public forecast repository. The model also provides insights on the important covariates and mobility factors that impact COVID-19 transmission in the U.S.

Finally, in Chapter 7, we discuss implications of the research and future research directions.
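The self-exciting behavior at the heart of these models can be made concrete with a small simulation of a univariate Hawkes process with conditional intensity lambda(t) = mu + sum over past events t_i of alpha * beta * exp(-beta * (t - t_i)), using Ogata's thinning algorithm. The parameter values are arbitrary examples, and this sketch is unrelated to the thesis code.

```python
# Simulate a univariate Hawkes process with exponential kernel via thinning.
import numpy as np

def simulate_hawkes(mu: float, alpha: float, beta: float, horizon: float, seed: int = 0):
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < horizon:
        # Upper bound on the intensity: it only decays between events.
        lam_bar = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)          # candidate next event time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:          # accept with prob lambda(t)/bound
            events.append(t)
    return np.array(events)

events = simulate_hawkes(mu=0.5, alpha=0.6, beta=1.5, horizon=200.0)
# With branching ratio alpha < 1 the process is stationary; its expected rate
# is mu / (1 - alpha), i.e. about 1.25 events per unit time here.
print(len(events), "events; empirical rate:", round(len(events) / 200.0, 2))
```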
230

Predicting Carbon Dioxide Levels and Occupancy with Machine Learning and Environmental Data

Datunaishvili, Giorgi, Khederchah, Christian, Li, Henrik, Salazar, Kevin January 2022 (has links)
Buildings account for the majority of the world's energy use through heating, ventilation, and cooling, yet these systems are not regulated efficiently or effectively: lights and heating are often left running in empty spaces, leading to waste. This project's goal is to mitigate this wastefulness by implementing self-powered environmental sensors whose readings can be used to predict carbon dioxide levels and occupancy; these predictions can then be used to regulate spaces accordingly. The chosen approach was machine learning, used to generate prediction models. Different methods and models were tried, such as Gaussian Process Regression and tree algorithms, and the most effective model for this particular case turned out to be Gaussian Process Regression. Using the accumulated data, a model was built to calculate carbon dioxide values from humidity, temperature, and pressure, achieving an accuracy above 90%. The model for calculating occupancy levels had significantly lower accuracy. The reason the carbon dioxide model succeeded while the occupancy model did not lies in the small size of the dataset used to train it: the carbon dioxide values had greater variance between data points, while the occupancy dataset contained mostly ones and zeros. The occupancy model therefore needs a longer training period to achieve high accuracy and precision, whereas the carbon dioxide model converges with fewer data points because its data have higher variance.
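A hedged sketch of the Gaussian Process Regression setup the abstract describes, predicting CO2 from humidity, temperature, and pressure, follows; the synthetic data and the kernel choice are illustrative assumptions.

```python
# GPR sketch: predict CO2 (ppm) from three environmental features.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 300
humidity = rng.uniform(20, 70, n)        # % relative humidity
temperature = rng.uniform(18, 26, n)     # degrees Celsius
pressure = rng.uniform(990, 1030, n)     # hPa
# Synthetic CO2 loosely tied to the other quantities, plus sensor noise.
co2 = 400 + 3.0 * humidity + 8.0 * (temperature - 18) + rng.normal(0, 10, n)

X = np.column_stack([humidity, temperature, pressure])
X_train, X_test, y_train, y_test = train_test_split(X, co2, random_state=0)

# Anisotropic RBF kernel with a white-noise term; length scales are learned
# per feature during fitting.
kernel = RBF(length_scale=[10.0, 2.0, 10.0]) + WhiteKernel(noise_level=50.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

mean, std = gpr.predict(X_test, return_std=True)  # predictive mean + uncertainty
print("R^2 on holdout:", round(gpr.score(X_test, y_test), 3))
```

One appeal of GPR for this use case is the predictive standard deviation returned alongside the mean, which flags when sensor readings are far from anything seen during training.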
