71
Data driven modelling for environmental water management. Syed, Mofazzal. January 2007.
Management of water quality is generally based on physically based equations or hypotheses describing the behaviour of water bodies. In recent years, models built on the growing volumes of collected data have been gaining popularity; this modelling approach can be called data driven modelling. Observational data represent specific knowledge, whereas a hypothesis represents a generalisation of this knowledge that implies and characterises all such observational data. Traditionally, deterministic numerical models have been used for predicting flow and water quality processes in inland and coastal basins. These models generally take a long time to run and therefore cannot be used as on-line decision support tools that would allow imminent threats to public health or flooding to be predicted. Data driven models, in contrast, are data intensive and have their own limitations: the extrapolation capability of data driven methods is a matter of conjecture, and the extensive data required to build such a model can be time- and resource-consuming to collect, while in the case of predicting the impact of a future development the required data are unlikely to exist at all. The main objective of the study was to develop an integrated approach for rapid prediction of bathing water quality in estuarine and coastal waters. Faecal Coliforms (FC) were used as a water quality indicator, and two of the most popular data mining techniques, Genetic Programming (GP) and Artificial Neural Networks (ANNs), were used to predict FC levels in a pilot basin. To provide enough data for training and testing the neural networks, a calibrated hydrodynamic and water quality model was used to generate input data for the neural networks. A novel non-linear data analysis technique, the Gamma Test, was used to determine the data noise level and the number of data points required for developing smooth neural network models. Details are given of the data driven models, the numerical models and the Gamma Test, and of a series of experiments undertaken to test data driven model performance for different numbers of input parameters and time lags. The response time of the receiving water quality to the input boundary conditions obtained from the hydrodynamic model was shown to be useful knowledge for developing accurate and efficient neural networks. A natural phenomenon such as bacterial decay is affected by a whole host of parameters that cannot be captured accurately using deterministic models alone. Therefore, the data-driven approach was also investigated using field survey data collected in Cardiff Bay to examine the relationship between bacterial decay and other parameters. Both the GP and ANN models gave similar, if not better, predictions of the field data in comparison with the deterministic model, with the added benefit of almost instant prediction of the bacterial levels for this recreational water body. The models were also investigated using idealised and controlled laboratory data for velocity distributions along compound channel reaches with idealised rods located on the floodplain to replicate large vegetation (such as mangrove trees).
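The Gamma Test referred to above is a published technique for estimating, directly from input-output data, the variance of the noise on the output. A minimal sketch of the standard procedure on synthetic data follows; it is an illustration, not the implementation used in the thesis.

```python
import numpy as np
from scipy.spatial import cKDTree

def gamma_test(X, y, p=10):
    """Estimate output noise variance via the Gamma Test.

    For k = 1..p, delta(k) is the mean squared distance to the k-th nearest
    neighbour in input space and gamma(k) is half the mean squared difference
    of the corresponding outputs.  The intercept of the least-squares line
    gamma = A*delta + Gamma estimates the noise variance on y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    tree = cKDTree(X)
    # query k+1 neighbours because the closest "neighbour" is the point itself
    dist, idx = tree.query(X, k=p + 1)
    delta = np.mean(dist[:, 1:] ** 2, axis=0)                    # shape (p,)
    gamma = 0.5 * np.mean((y[idx[:, 1:]] - y[:, None]) ** 2, axis=0)
    A, intercept = np.polyfit(delta, gamma, 1)                   # slope, Gamma statistic
    return intercept, A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(2000, 3))       # e.g. lagged boundary-condition inputs
    noise_var = 0.05
    y = np.sin(X).sum(axis=1) + rng.normal(0, noise_var ** 0.5, 2000)
    gamma_stat, slope = gamma_test(X, y)
    print(f"estimated noise variance ~ {gamma_stat:.4f} (true {noise_var})")
```

The Gamma statistic gives a floor on the mean squared error any smooth model can reach on the data, which is how it guides the size of the training set and the stopping point for neural network training.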
72
DDD metodologija paremto projektavimo įrankio kodo generatoriaus kūrimas ir tyrimas / DDD methodology based design tool's code generator development and research. Valinčius, Kęstutis. 13 August 2010.
The Data Driven Design (DDD) methodology is widely used in software systems. Its aim is to separate, and allow to run in parallel, the work of software developers and scenario designers: core functionality is implemented through interfaces, while dynamic behaviour is provided by scenarios. This introduces a level of abstraction that makes the software product more flexible and easier to maintain and improve, and allows these activities to be carried out in parallel. The main aim of this work was to create an automatic code generator that transforms a graphically modelled scenario into program code. Generating code automatically greatly reduces the probability of syntactic and logical errors; the outcome depends only on the modelled scenario. Code is generated almost instantly and requires no intervention from the developer. This aim was achieved by moving business-logic design into the scenario-design process and implementing the code-generation subsystem as a web service. Thanks to a plug-in (cartridge) system, the generated code is not tied to a specific architecture, technology or application domain. A scenario is modelled in a graphical scenario-design tool and then transformed into a metalanguage, an XML dialect defined by specific rules, from which the final program code is generated. The experimental system was implemented without major difficulties, and modelling a new system with the design tool sped up the development process roughly sevenfold, demonstrating the advantage of the modelling tool over manual programming.
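The abstract does not give the metalanguage schema, so the Python sketch below only illustrates the general scenario-to-code transformation idea; the XML elements, attribute names and generated skeleton are invented for the example and are not the tool's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario metalanguage; the real schema used by the tool is not
# described in the abstract, so this XML is purely illustrative.
SCENARIO_XML = """
<scenario name="OrderProcessing">
  <step id="receive" action="ReceiveOrder"/>
  <step id="check"   action="CheckStock" depends="receive"/>
  <step id="ship"    action="ShipOrder"  depends="check"/>
</scenario>
"""

def generate_code(xml_text: str) -> str:
    """Turn the scenario description into a plain-Python class skeleton."""
    root = ET.fromstring(xml_text)
    lines = [f"class {root.get('name')}Scenario:"]
    steps = root.findall("step")
    for step in steps:
        lines.append(f"    def {step.get('id')}(self):")
        lines.append(f"        # calls the '{step.get('action')}' interface of the core")
        lines.append("        pass")
    lines.append("    def run(self):")
    for step in steps:
        lines.append(f"        self.{step.get('id')}()")
    return "\n".join(lines)

if __name__ == "__main__":
    print(generate_code(SCENARIO_XML))
```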
73
A Spreadsheet Model for Using Web Services and Creating Data-Driven Applications. Chang, Kerry Shih-Ping. 01 April 2016.
Web services have made many kinds of data and computing services available. However, using web services often requires significant programming effort, which limits the people who can take advantage of them to a small group of skilled programmers. In this dissertation, I present a tool called Gneiss that extends the spreadsheet model to support four challenging aspects of using web services: programming two-way data communication with web services, creating interactive GUI applications that use web data sources, using hierarchical data, and using live streaming data. Gneiss contributes innovations in spreadsheet languages, spreadsheet user interfaces and interaction techniques that allow programming tasks which currently require writing complex, lengthy code to instead be done using familiar spreadsheet mechanisms. Spreadsheets are arguably the most successful and popular data tools among people of all programming levels. This work advances the use of spreadsheets to new domains and could benefit a wide range of users, from professional programmers to end-user programmers.
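As a rough, hypothetical illustration of one of the listed challenges, mapping hierarchical web data onto a flat spreadsheet grid, the sketch below flattens a nested JSON response into spreadsheet-style rows. The field names and data are invented, the response is inlined rather than fetched over HTTP, and this is not Gneiss's language or API.

```python
import json

# Inline stand-in for a JSON response from a web service; fields are invented.
RESPONSE = json.loads("""
{"orders": [
  {"id": 1, "customer": "Ada",   "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]},
  {"id": 2, "customer": "Grace", "items": [{"sku": "A1", "qty": 5}]}
]}
""")

def flatten(orders):
    """Expand the nested 'items' lists into one spreadsheet-style row per item,
    repeating the parent order fields on every row."""
    rows = [("order_id", "customer", "sku", "qty")]
    for order in orders:
        for item in order["items"]:
            rows.append((order["id"], order["customer"], item["sku"], item["qty"]))
    return rows

for row in flatten(RESPONSE["orders"]):
    print("\t".join(str(v) for v in row))
```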
74
Data-driven approaches to load modeling and monitoring in smart energy systems. Tang, Guoming. 23 January 2017.
In smart energy systems, a load curve is the time series reported by smart meters, indicating the energy consumption of customers over a certain period of time. The widespread use of load curve data in demand-side management and demand response programs makes it one of the most important resources. To capture load behavior and energy consumption patterns, load curve modeling is widely applied to help utilities and residents make better plans and decisions. In this dissertation, with the help of load curve modeling, we focus on data-driven solutions to three load monitoring problems in different scenarios of smart energy systems, covering residential and datacenter power systems and the research fields of i) data cleansing, ii) energy disaggregation, and iii) fine-grained power monitoring.
First, to improve the data quality for load curve modeling on the supply side, we question regression-based approaches as an efficient way of cleansing load curve data and propose a new approach to analyzing and organizing such data. Our approach adopts a new view, termed a portrait, of the load curve data, analyzing its inherent periodic patterns and re-organizing the data for ease of analysis. Furthermore, we introduce strategies to build virtual portrait datasets and demonstrate how this technique can be used for outlier detection in load curves. To identify corrupted load curve data, we propose an appliance-driven approach that takes advantage of information available on the demand side: it identifies corrupted data in the smart meter readings by solving a carefully designed optimization problem. To solve the problem efficiently, we further develop a sequential local optimization algorithm that tackles the original NP-hard problem by solving an approximate problem in polynomial time.
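As a rough, hypothetical illustration of the portrait idea (re-organizing a load curve by its daily period and flagging readings that deviate strongly from the typical profile for their time slot), a short Python sketch follows; it is not the optimization-based method developed in the dissertation.

```python
import numpy as np

def portrait_outliers(load, readings_per_day=48, z_thresh=6.0):
    """Reshape a load curve into a (days x time-of-day) 'portrait' and flag
    readings far from the median profile of their time slot."""
    days = len(load) // readings_per_day
    portrait = np.asarray(load[:days * readings_per_day]).reshape(days, readings_per_day)
    median = np.median(portrait, axis=0)                        # typical daily profile
    mad = np.median(np.abs(portrait - median), axis=0) + 1e-9   # robust spread per slot
    score = np.abs(portrait - median) / mad
    return np.argwhere(score > z_thresh)                        # (day, slot) of suspects

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(30 * 48)                                      # 30 days, half-hourly readings
    load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.05, t.size)
    load[200] += 3.0                                            # inject a corrupted reading
    print(portrait_outliers(load))
```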
Second, to separate the aggregated energy consumption of a residential house into that of individual appliances, we propose a practical and universal energy disaggregation solution that relies only on readily available appliance information. Based on the sparsity of appliances' switching events, we first build a sparse switching event recovering (SSER) model. Then, making use of the active epochs of switching events, we develop an efficient parallel local optimization algorithm to solve our model and obtain individual appliances' energy consumption. To explore the benefit of introducing low-cost energy meters for energy disaggregation, we propose a semi-intrusive appliance load monitoring (SIALM) approach for settings with a large number of appliances. Instead of using only one meter, multiple meters are distributed in the power network to collect aggregated load data from sub-groups of appliances. The proposed SSER model and parallel optimization algorithm are used for energy disaggregation within each sub-group. We further provide sufficient conditions for unambiguous state recovery of multiple appliances, under which a minimum number of meters is obtained via a greedy clique-covering algorithm.
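The SSER model and its parallel solver are not spelled out in the abstract; the toy Python sketch below only illustrates the sparsity intuition behind event-based disaggregation, attributing each large jump in the aggregate signal to the single appliance whose (assumed, made-up) rated power best matches the jump. It is a simplification, not the thesis's algorithm.

```python
import numpy as np

# Assumed rated powers (watts) of the monitored appliances, for illustration only.
RATINGS = {"kettle": 2000.0, "fridge": 120.0, "oven": 2400.0, "lamp": 60.0}

def match_events(aggregate, ratings=RATINGS, min_jump=40.0):
    """Attribute each significant step in the aggregate load to the appliance
    whose rated power is closest to the step size (one event at a time)."""
    events = []
    diffs = np.diff(np.asarray(aggregate, dtype=float))
    for t, d in enumerate(diffs, start=1):
        if abs(d) < min_jump:
            continue                      # ignore noise-level fluctuations
        name = min(ratings, key=lambda a: abs(abs(d) - ratings[a]))
        events.append((t, name, "on" if d > 0 else "off"))
    return events

if __name__ == "__main__":
    # Toy aggregate: fridge on, kettle on, kettle off, lamp on.
    agg = [0, 120, 120, 2120, 2120, 120, 180, 180]
    for ev in match_events(agg):
        print(ev)
```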
Third, to achieve fine-grained, server-level power monitoring in legacy datacenters, we present a zero-cost, purely software-based solution. With our solution, no power monitoring hardware is needed, leading to much reduced operating cost and hardware complexity. In detail, we establish power mapping functions (PMFs) between the states of servers and their power consumption, and infer the power consumption of each server from the aggregated power of the entire datacenter. We implement and evaluate our solution on a real-world datacenter with 326 servers. The results show that our solution provides high-precision power estimation at both the rack level and the server level. Specifically, with PMFs including only two nonlinear terms, our power estimation i) at the rack level has a mean relative error of 2.18%, and ii) at the server level has mean relative errors of 9.61% and 7.53% for the idle and peak power, respectively.
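The exact form of the PMFs is not given beyond "two nonlinear terms"; the sketch below assumes a quadratic function of per-server CPU utilisation and fits all PMFs jointly from the aggregate datacenter power by least squares, illustrating the general idea rather than the thesis's model.

```python
import numpy as np

def fit_pmfs(util, total_power):
    """util: (T, n) per-server CPU utilisation; total_power: (T,) aggregate power.
    Assume P_i(u) = b_i*u + c_i*u**2 plus a shared idle term, and fit all
    coefficients so that the summed per-server power matches the aggregate."""
    T, n = util.shape
    X = np.hstack([np.ones((T, 1)), util, util ** 2])       # [idle, linear, quadratic]
    theta, *_ = np.linalg.lstsq(X, total_power, rcond=None)
    idle_total, b, c = theta[0], theta[1:1 + n], theta[1 + n:]

    def per_server_power(u_row):
        # split the fitted idle power evenly; individual idle terms are not
        # identifiable from the aggregate alone in this simplified sketch
        return idle_total / n + b * u_row + c * u_row ** 2

    return per_server_power

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    T, n = 500, 4
    util = rng.uniform(0, 1, size=(T, n))
    true_b, true_c, idle = np.array([80, 60, 90, 70]), np.array([40, 55, 30, 45]), 60.0
    total = idle * n + util @ true_b + (util ** 2) @ true_c + rng.normal(0, 2.0, T)
    pmf = fit_pmfs(util, total)
    print(pmf(np.array([0.2, 0.5, 0.8, 0.1])))               # estimated per-server watts
```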
75
A Graphical Analysis of Simultaneously Choosing the Bandwidth and Mixing Parameter for Semiparametric Regression Techniques. Rivers, Derick L. 31 July 2009.
There has been extensive research in the area of semiparametric regression. These techniques deliver substantial improvements over previously developed methods such as ordinary least squares and kernel regression. Two of these hybrid techniques, Model Robust Regression 1 (MRR1) and Model Robust Regression 2 (MRR2), require the choice of an appropriate bandwidth for smoothing and of a mixing parameter that allows a portion of a nonparametric fit to be used in fitting a model that may be misspecified by other regression methods. The current method of choosing the bandwidth and mixing parameter does not guarantee the optimal choice in either case. The immediate objective of the current work is to address this process of choosing the optimal bandwidth and mixing parameter and to examine the behavior of these estimates using 3D plots. The 3D plots allow us to examine how the semiparametric techniques MRR1 and MRR2 behave under the optimal (AVEMSE) selection process compared to data-driven selectors such as PRESS* and PRESS**. It was found that the structure of MRR2 behaved consistently under all conditions. MRR2 displayed a wider range of "acceptable" values for the choice of bandwidth, as opposed to a much more limited choice when using MRR1. These results provide general support for earlier findings by Mays et al. (2000).
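A minimal sketch of the model-robust idea behind MRR2 (an ordinary-least-squares fit plus a mixing parameter lambda times a kernel smooth of the OLS residuals), with the bandwidth and mixing parameter chosen jointly by a leave-one-out grid search, is shown below. It is illustrative only and does not reproduce the PRESS* / PRESS** selectors or the AVEMSE criterion studied in the thesis.

```python
import numpy as np

def ols_fit(x, y):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda xq: beta[0] + beta[1] * xq

def kernel_smooth(x, y, h):
    """Nadaraya-Watson smoother with a Gaussian kernel of bandwidth h."""
    def fit(xq):
        w = np.exp(-0.5 * ((xq[:, None] - x[None, :]) / h) ** 2)
        return (w @ y) / w.sum(axis=1)
    return fit

def mrr2_loo_press(x, y, h, lam):
    """Leave-one-out sum of squared prediction errors for a given (h, lambda)."""
    press = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        para = ols_fit(x[keep], y[keep])
        resid = y[keep] - para(x[keep])
        nonpara = kernel_smooth(x[keep], resid, h)
        pred = para(np.array([x[i]])) + lam * nonpara(np.array([x[i]]))
        press += float((y[i] - pred[0]) ** 2)
    return press

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = np.sort(rng.uniform(0, 10, 60))
    y = 2 + 0.5 * x + np.sin(x) + rng.normal(0, 0.3, 60)     # misspecified by a pure line
    grid = [(h, lam) for h in (0.3, 0.6, 1.0, 2.0) for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
    h_best, lam_best = min(grid, key=lambda p: mrr2_loo_press(x, y, *p))
    print("chosen bandwidth and mixing parameter:", h_best, lam_best)
```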
76
Data Driven Visual Recognition. Aghazadeh, Omid. January 2014.
This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one which models categories and one which is not category based. We are interested in data driven solutions for both kinds of problems. In the category-free part, we study novelty detection in temporal and spatial domains as a category-free recognition problem. Using data driven models, we demonstrate that, based on a few reference exemplars, our methods are able to detect novelties in the ego-motions of people and changes in the static environments surrounding them. In the category-level part, we study object recognition. We consider both object category classification and localization, and propose scalable data driven approaches for both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is demonstrated to adapt to the data better than various baselines, such as the same model initialized with less subtly designed procedures. A nonparametric large-margin classifier is introduced and demonstrated to have a multitude of advantages over its competitors: lower training and testing time costs, the ability to make use of indefinite/invariant and deformable similarity measures, and adaptive complexity are the main features of the proposed model. We also propose a rather realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performance. Based on data-describing measures, which are aggregates of pairwise similarities of the training data, our model characterizes and describes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and to correlate significantly with the observed recognition performance. Utilizing these measures, the model predicts the performance of particular classifiers on distributions similar to the training data. These predictions, when compared to the test performance of the classifiers on the test sets, are reasonably accurate. We discuss various aspects of visual recognition problems: what the interplay between representations and classification tasks is, how different models can better adapt to the training data, and so on. We describe and analyze the aforementioned methods, which are designed to tackle different visual recognition problems but share one common characteristic: being data driven.
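The data-describing measures in the thesis are aggregates of pairwise similarities of the training data; as a rough stand-in (not the actual measures), the sketch below aggregates pairwise RBF similarities into a single within-class versus between-class ratio, which tends to be larger for datasets that are easier to classify.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def similarity_separability(features, labels):
    """Aggregate pairwise RBF similarities into one scalar: the ratio of mean
    within-class to mean between-class similarity.  Larger values indicate a
    dataset whose classes are easier to separate."""
    D = squareform(pdist(features))                 # pairwise Euclidean distances
    sigma = np.median(D[D > 0])                     # simple bandwidth heuristic
    S = np.exp(-(D ** 2) / (2 * sigma ** 2))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return S[same & off_diag].mean() / S[~same].mean()

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    labels = np.array([0] * 50 + [1] * 50)
    easy = np.vstack([rng.normal(0, 0.3, (50, 10)), rng.normal(2, 0.3, (50, 10))])
    hard = np.vstack([rng.normal(0, 2.0, (50, 10)), rng.normal(0.5, 2.0, (50, 10))])
    print("easy dataset score:", round(similarity_separability(easy, labels), 2))
    print("hard dataset score:", round(similarity_separability(hard, labels), 2))
```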
77
A robust and reliable data-driven prognostics approach based on Extreme Learning Machine and Fuzzy Clustering / Une approche robuste et fiable de pronostic guidé par les données robustes et basée sur l'apprentissage automatique extrême et la classification floue. Javed, Kamran. 09 April 2014.
Prognostics and Health Management (PHM) aims at extending the life cycle of engineering assets while reducing exploitation and maintenance costs. For this reason, prognostics is considered a key process with prediction capabilities: accurate estimates of the Remaining Useful Life (RUL) of an equipment enable defining further plans of action to increase safety, minimize downtime, and ensure mission completion and efficient production. Recent advances show that data-driven approaches (mainly based on machine learning methods) are increasingly applied for fault prognostics. They can be seen as black-box models that learn system behavior directly from Condition Monitoring (CM) data, use that knowledge to infer the current state of the system, and predict the future progression of failure. However, approximating the behavior of critical machinery is a challenging task that can result in poor prognostics. The following issues of data-driven prognostics modeling are highlighted. 1) How can raw monitoring data be processed effectively to obtain suitable features that clearly reflect the evolution of degradation? 2) How can degradation states be discriminated and failure criteria defined (which can vary from case to case)? 3) How can one be sure that learned models will be robust enough to show steady performance over uncertain inputs that deviate from the learned experiences, and reliable enough to handle unknown data (i.e., operating conditions, engineering variations, etc.)? 4) How can ease of application be achieved under industrial constraints and requirements? These issues constitute the problems addressed in this thesis and have led to the development of a novel approach that goes beyond conventional methods of data-driven prognostics.
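The Extreme Learning Machine named in the title is a standard building block: hidden-layer weights are drawn at random and only the output weights are solved in closed form by a pseudo-inverse, which is what makes training fast. A minimal regression version on toy degradation data is sketched below; the thesis's feature extraction, fuzzy clustering of degradation states and RUL estimation are not reproduced here.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer feed-forward network trained in one shot."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # random, never-retrained input weights and biases
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)           # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ y          # closed-form output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # toy degradation indicator: health decays non-linearly over time
    t = np.linspace(0, 1, 300)[:, None]
    health = np.exp(-3 * t[:, 0]) + rng.normal(0, 0.02, 300)
    model = ELMRegressor(n_hidden=40).fit(t, health)
    print("predicted health at t=0.9:", model.predict(np.array([[0.9]]))[0])
```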
78
Revenue Generation in Data-driven Healthcare : An exploratory study of how big data solutions can be integrated into the Swedish healthcare system. Jonsson, Hanna; Mazomba, Luyolo. January 2019.
The purpose of this study is to investigate how big data solutions in the Swedish healthcare system can generate revenue. As technology continues to evolve, the use of big data is beginning to transform processes in many different industries, making them more efficient and effective. The opportunities presented by big data have been researched to a large extent in commercial fields; however, research on the use of big data in healthcare is scarce, and this is particularly true in the case of Sweden. Furthermore, there is a lack of research exploring the interface between big data, healthcare and revenue models. This interface is important, as innovation and the integration of big data in healthcare could be affected by the ability of companies to generate revenue from developing such innovations or solutions. Thus, this thesis aims to fill this gap and contribute to the limited body of knowledge that exists on the topic. The study was conducted using qualitative methods: a literature search was carried out and interviews were conducted with individuals who hold managerial positions at Region Västerbotten. The purpose of the interviews was to establish a better understanding of the Swedish healthcare system and how its structure has influenced the use, or lack thereof, of big data in the healthcare delivery process, as well as how this structure enables revenue generation through big data solutions. The collected data were analysed using a grounded theory approach, which includes coding and thematising the empirical data in order to identify the key areas of discussion. The findings revealed that the current state of the Swedish healthcare system does not present an environment in which big data solutions developed for the system can thrive and generate revenue. However, if action is taken to change the current state of the system, revenue generation may be possible in the future. The findings also identified key barriers that need to be overcome in order to increase the integration of big data into the healthcare system: (i) the lack of big data knowledge and expertise, (ii) data protection regulations, (iii) national budget allocation, and (iv) the lack of structured data. Through collaborative work between actors in both the public and private sectors, these barriers can be overcome, and Sweden could be on its way to transforming its healthcare system with big data solutions, thus improving the quality of care provided to its citizens. Key words: big data, healthcare, Swedish healthcare system, AI, revenue models, data-driven revenue models
79
Practice-driven solutions for inventory management problems in data-scarce environments. Wang, Le. 03 June 2019.
Many firms are challenged to make inventory decisions with limited data and high customer service-level requirements. This thesis focuses on heuristic solutions to inventory management problems in data-scarce environments, employing rigorous mathematical frameworks and taking advantage of information that is available in practice but often ignored in the literature. We define a class of inventory models and solutions with demonstrable value in helping firms solve these challenges.
80
Self-Service Business Intelligence : En studie om vilka grundläggande kunskaper en slutanvändare bör inneha vid användningen av SSBI / Self-Service Business Intelligence : A study of the basic knowledge an end user should possess when using SSBI. Johansson, Linus. January 2019.
Because today's business climate is constantly evolving and competition is increasing, organisations need to make data-based decisions at an early stage. Business Intelligence (BI) provides decision makers within organisations with fast and accurate information that can be used as decision support. As the scope of BI has grown from individual departments to entire organisations, great pressure is placed on the experts in IT departments. As a result, end users need an environment that gives them direct access to data for their own analyses and decisions. That environment is reached by implementing Self-Service Business Intelligence (SSBI), which makes the decision-making process more efficient. When SSBI is implemented, the end users concerned need to extend their knowledge in order to exploit the potential that SSBI brings. At present there is a lack of research on what knowledge end users need to possess, which has led to the following research question being examined in this study:
➢ What basic knowledge should an end user possess when using Self-Service Business Intelligence?
The study is based on a literature review and a case study in which interviews with six respondents, all with good knowledge of SSBI, were used for data collection. The results present four areas of basic knowledge that end users should possess in order to increase the likelihood of adopting SSBI more successfully.