  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Real-time forecasting of dietary habits and user health using Federated Learning with privacy guarantees

Horchidan, Sonia-Florina January 2020
Modern health self-monitoring devices and applications, such as Fitbit and MyFitnessPal, empower users to take concrete actions and set fitness and lifestyle goals based on their recorded trends and statistics. Predicting such trends helps users reach long-term targets, because they can adjust their diets and habits at any point. The design and implementation of such a system, which also respects user privacy, is the main objective of our work. This application is modelled as a time-series forecasting problem. Given the historical data of users, we aim to predict their eating and lifestyle habits in real time. We apply the federated learning paradigm to our use case because of the highly distributed nature of our data and the privacy concerns raised by such sensitive recorded information. However, federated learning from heterogeneous sequences of data can be challenging, as even state-of-the-art machine learning techniques for time-series forecasting can encounter difficulties when learning from very irregular data sequences. Specifically, in the proposed healthcare scenario, the machine learning algorithms might fail to cater to users with unique dietary patterns. In this work, we implement a two-step streaming clustering mechanism and group clients that exhibit similar eating and fitness behaviours. The conducted experiments show that learning federatively in this context can achieve very high prediction accuracy, as our predictions deviate from the ground-truth value by no more than 0.025% of the range of each feature. Training separate models for each group of users is shown to be beneficial, especially in terms of training time, but it is highly dependent on the parameters used for the models and the training process. Our experiments show that the configuration used for the general federated model cannot be applied to the clusters of data. However, a decrease in prediction error of more than 45% can be achieved, provided the parameters are optimized for each case. Lastly, this work tackles the problem of data privacy by applying state-of-the-art differential privacy techniques. Our empirical study shows that noising the gradients sent to the server is unsuitable for small datasets and cancels out the benefits obtained by the prior clustering of users. On the other hand, noising the training data achieves remarkable results, obtaining a differential privacy level corresponding to an epsilon value of 0.1 with an increase in the observed mean absolute error by a factor of only 0.21.
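The abstract above does not say which mechanism was used to noise the training data. As a rough illustration only, the sketch below assumes the classic Laplace mechanism applied to each clipped value; the function names, the clipping bounds and the sample figures are hypothetical and are not taken from the thesis.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, lower, upper, epsilon, rng):
    """Clip each value to [lower, upper], then add Laplace noise.

    After clipping, one value can change by at most (upper - lower), so a
    noise scale of (upper - lower) / epsilon is the standard Laplace
    mechanism calibration for a single released value.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


# Hypothetical daily calorie readings, invented for illustration only.
rng = random.Random(0)
calories = [1800.0, 2500.0, 5200.0]
noisy = privatize(calories, lower=0.0, upper=4000.0, epsilon=0.1, rng=rng)
print(noisy)
```

A smaller epsilon gives a larger noise scale, which is the trade-off between privacy level and forecasting error that the abstract reports. A user who contributes many readings would spend privacy budget on each of them, which this sketch does not account for.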
72

Statistical And Spatial Approaches To Marina Master Plan For Turkey

Karanci, Ayse 01 February 2011
Turkey, with its climate, protected bays, and cultural and environmental resources, is an ideal place for yacht tourism. Consequently, yacht tourism is increasing steadily. When the aim is tourist satisfaction, yacht tourism can cause unmitigated development and raise environmental concerns. As the demand for yacht tourism intensifies, sustainable development strategies are needed to maximize natural, cultural and economic benefits. Integrating forecasts into strategic planning is necessary for the sustainable use of coastal resources. In this study, two quantitative forecasting techniques, exponential smoothing and Auto-Regressive Integrated Moving Average (ARIMA) methods, were used to estimate the demand for yacht berthing capacity in Turkey until 2030. Based on environmental, socio-economic and geographic data and on the opinions gathered from stakeholders such as marina operators, local communities and government officials, an allocation model was developed to allocate the predicted demand, seeking social and economic growth while preserving the coastal environment. The Analytic Hierarchy Process (AHP) was used to identify and evaluate the development, social, environmental and geographic priorities. Aiming at a dynamic plan that is responsive to both national and international developments in yacht tourism, potential investment areas were determined for the investments required to accommodate future demand. This study provides a multi-dimensional view of the planning problem and highlights the need for sustainable and dynamic planning in delicate, high-demand areas such as coasts.
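Of the two forecasting techniques named in the abstract above, exponential smoothing is simple enough to sketch in a few lines. The version below is the simple (level-only) variant, which may differ from the variant used in the thesis; the function names and the demand figures are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    levels = [float(series[0])]
    for y in series[1:]:
        levels.append(alpha * y + (1.0 - alpha) * levels[-1])
    return levels


def forecast(series, alpha, horizon):
    """Simple exponential smoothing yields a flat forecast: the last level repeated."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * horizon


# Hypothetical yearly berth-demand figures, invented for illustration only.
demand = [100.0, 110.0, 120.0, 130.0]
print(forecast(demand, alpha=0.5, horizon=3))  # [121.25, 121.25, 121.25]
```

Because the level-only variant produces a flat forecast, a series with a clear upward trend would normally call for a trend-aware variant or for ARIMA, the other method the abstract mentions.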
73

Υπολογιστική νοημοσύνη στην οικονομία και τη θεωρία παιγνίων / Computational intelligence in economics and game theory

Παυλίδης, Νίκος 09 October 2008
The thesis investigates Computational Intelligence methods in Economics and Finance. The first part of the thesis is devoted to computational intelligence methods and unsupervised clustering methods for modeling and forecasting daily exchange rate time series. A methodology is proposed that relies on local approximation, using artificial neural networks, for subregions of the input space that are identified through unsupervised clustering algorithms. Furthermore, we employ genetic programming to construct novel trading rules directly from the data. The performance of the novel rules is compared to that of generalised moving average rules. In the second part of the thesis we employ evolutionary algorithms to compute equilibria, and to estimate their number, in finite strategic games and new economic geography models. In particular, we investigate the capability of evolutionary and swarm intelligence algorithms to compute Nash equilibria and propose an approach for computing more than one equilibrium. Finally, we employ criteria from the theory of fixed-point computation and from topological degree theory to investigate the existence of short-run equilibria in new economic geography models and the computational complexity of computing them.
74

Réseaux de neurones, SVM et approches locales pour la prévision de séries temporelles / Neural networks, SVM and local approaches for time series forecasting

Cherif, Aymen 16 July 2013
Time series forecasting has been studied for many years, with applications in areas such as finance, medicine and transportation. In this thesis, we focus on machine learning methods: neural networks and SVMs. We are also interested in meta-methods that improve predictor performance, and more specifically in the local approach. Following a divide-and-conquer strategy, local approaches cluster the data before assigning a predictor to each resulting subset. We present a modification of the learning algorithm for recurrent neural networks that adapts them to this approach. We also propose two novel clustering techniques suitable for local models: the first is based on Kohonen maps, and the second on binary trees.
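The local approach described in the abstract above (cluster first, then fit one predictor per cluster) can be illustrated with a toy sketch. It swaps in a tiny one-dimensional k-means for the Kohonen maps and binary trees of the thesis, and a per-cluster mean for the neural network or SVM predictors; every name below is hypothetical.

```python
def nearest(centres, point):
    """Index of the centre closest to `point`."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - point))


def kmeans_1d(points, k, iterations=20):
    """Tiny 1-D k-means; centres start at evenly spaced order statistics."""
    ordered = sorted(points)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centres, p)].append(p)
        # An empty group keeps its previous centre.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


def fit_local_predictors(series, k):
    """Cluster the lagged inputs y_t, then fit one predictor per cluster.

    Each local predictor here is simply the mean of the next values
    y_{t+1} observed in its cluster (None if the cluster is empty).
    """
    inputs, targets = series[:-1], series[1:]
    centres = kmeans_1d(inputs, k)
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(inputs, targets):
        c = nearest(centres, x)
        sums[c] += y
        counts[c] += 1
    means = [s / n if n else None for s, n in zip(sums, counts)]
    return centres, means


def predict_next(model, last_value):
    """Route the last observation to its cluster and use that cluster's predictor."""
    centres, means = model
    return means[nearest(centres, last_value)]


# A toy alternating series: low values are followed by high ones and vice versa.
model = fit_local_predictors([0.0, 10.0, 0.0, 10.0, 0.0, 10.0, 0.0], k=2)
print(predict_next(model, 0.0), predict_next(model, 10.0))
```

A single global mean predictor would forecast about 5 for every input on this series, whereas the two local predictors each capture one regime, which is the motivation for the divide-and-conquer strategy.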
75

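Previsão de demanda no médio prazo utilizando redes neurais artificiais em sistemas de distribuição de energia elétrica / Medium-term demand forecasting using artificial neural networks in electric power distribution systems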

Medeiros, Romero Álamo Oliveira de 29 July 2016
Demand forecasting studies are of great importance for electricity companies, which need to allocate their resources well in advance and therefore require medium- and long-term planning. These resources include the purchase of new equipment, the acquisition and construction of transmission lines, scheduled maintenance, and the purchase and sale of energy. In this work, a support tool was developed for experts in strategic planning of power distribution systems, using artificial neural networks for demand forecasting. For the proposed method, a medium-term demand forecasting procedure was implemented for the region supplied by three real substations belonging to the power distribution system managed by the utility Energisa-PB, using a computational model based on Multilayer Perceptron (MLP) artificial neural networks (ANN) with the assistance of the Matlab® environment. The database consisted of real active power measurements from 2008 to 2014, provided by the utility, and the forecast reached a horizon of one year ahead (52 weeks). The ANN was trained with the data from 2008 to 2013, and the result was compared with the real data of 2014. In addition, it was possible to evaluate the performance of the ANN under different aspects (training volume, parameters, configurations, hidden layers, etc.) and to estimate the demand growth in the region supplied by each substation, which can assist the distribution system expansion planning.
76

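Otimização da função de fitness para a evolução de redes neurais com o uso de análise envoltória de dados aplicada à previsão de séries temporais / Optimization of the fitness function for the evolution of neural networks using data envelopment analysis applied to time series forecasting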

SILVA, David Augusto 01 July 2011
Techniques for time series analysis and forecasting have had a strong presence in the literature over the years. Computational resources combined with statistical techniques are improving predictive results, which have become increasingly accurate. Computational methods based on Artificial Neural Networks (ANN) and Evolutionary Computing (EC) offer a new approach to the time series analysis and forecasting problem. These methods belong to the branch of Artificial Intelligence (AI) and are biologically inspired: ANN models are based on the neural structure of intelligent organisms, and EC uses Charles Darwin's concept of natural selection. Both methods acquire experience from prior knowledge and examples of the given problem. In particular, for the time series forecasting problem, the objective is to find the predictive model with the highest forecast performance, where the performance measures are statistical errors. However, there is no universal criterion for identifying the best performance measure. Since the ANNs are the predictive models, the EC constantly evaluates the forecast performance of the ANNs, using a fitness function to guide the predictive model towards an optimal solution. Data Envelopment Analysis (DEA) was employed to determine the best combination of variables based on the relative efficiency of the best models. This work therefore studies the optimization of the fitness function with Data Envelopment Analysis, applied to a hybrid intelligent system for the time series forecasting problem. The data analyzed consist of financial, agribusiness and natural-phenomena series. The C programming language was used to implement the hybrid intelligent system, and the R environment, version 2.12, to analyze the DEA models. In general, using the DEA procedure to evaluate the fitness functions was satisfactory and serves as an additional resource in the field of time series forecasting. It is up to the researcher to assess the results from different perspectives, whether the computational cost of implementing a particular function or which function was more efficient at identifying unwanted combinations, saving time and resources.
77

Machine learning strategies for multi-step-ahead time series forecasting

Ben Taieb, Souhaib 08 October 2014
How much electricity is going to be consumed in the next 24 hours? What will the temperature be for the next three days? How many units of a certain product will be sold over the next few months? Answering these questions often requires forecasting several future observations from a given sequence of historical observations, called a time series.

Historically, time series forecasting has been mainly studied in econometrics and statistics. In the last two decades, machine learning, a field that is concerned with the development of algorithms that can automatically learn from data, has become one of the most active areas of predictive modeling research. This success is largely due to the superior performance of machine learning prediction algorithms in applications as diverse as natural language processing, speech recognition and spam detection. However, there has been very little research at the intersection of time series forecasting and machine learning.

The goal of this dissertation is to narrow this gap by addressing the problem of multi-step-ahead time series forecasting from the perspective of machine learning. To that end, we propose a series of forecasting strategies based on machine learning algorithms.

Multi-step-ahead forecasts can be produced recursively by iterating a one-step-ahead model, or directly using a specific model for each horizon. As a first contribution, we conduct an in-depth study to compare recursive and direct forecasts generated with different learning algorithms for different data generating processes. More precisely, we decompose the multi-step mean squared forecast errors into the bias and variance components, and analyze their behavior over the forecast horizon for different time series lengths. The results and observations made in this study then guide the development of new forecasting strategies.

In particular, we find that choosing between recursive and direct forecasts is not an easy task since it involves a trade-off between bias and estimation variance that depends on many interacting factors, including the learning model, the underlying data generating process, the time series length and the forecast horizon. As a second contribution, we develop multi-stage forecasting strategies that do not treat the recursive and direct strategies as competitors, but seek to combine their best properties. More precisely, the multi-stage strategies generate recursive linear forecasts, and then adjust these forecasts by modeling the multi-step forecast residuals with direct nonlinear models at each horizon, called rectification models. We propose a first multi-stage strategy, which we call the rectify strategy, that estimates the rectification models using the nearest neighbors model. However, because recursive linear forecasts often need only small adjustments with real-world time series, we also consider a second multi-stage strategy, called the boost strategy, that estimates the rectification models using gradient boosting algorithms that use so-called weak learners.

Generating multi-step forecasts using a different model at each horizon provides great modeling flexibility. However, selecting these models independently can lead to irregularities in the forecasts that can increase the forecast variance. The problem is exacerbated with nonlinear machine learning models estimated from short time series. To address this issue, and as a third contribution, we introduce and analyze multi-horizon forecasting strategies that exploit the information contained in other horizons when learning the model for each horizon. In particular, to select the lag order and the hyperparameters of each model, multi-horizon strategies minimize forecast errors over multiple horizons rather than just the horizon of interest.

We compare all the proposed strategies with both the recursive and direct strategies. We first carry out a bias and variance study, then we evaluate the different strategies using real-world time series from two past forecasting competitions. For the rectify strategy, in addition to avoiding the choice between recursive and direct forecasts, the results demonstrate that it performs better than, or at least close to, the best of the recursive and direct forecasts in different settings. For the multi-horizon strategies, the results emphasize the decrease in variance compared to single-horizon strategies, especially with linear or weakly nonlinear data generating processes. Overall, we found that the accuracy of multi-step-ahead forecasts based on machine learning algorithms can be significantly improved if an appropriate forecasting strategy is used to select the model parameters and to generate the forecasts.

Lastly, as a fourth contribution, we participated in the Load Forecasting track of the Global Energy Forecasting Competition 2012. The competition involved a hierarchical load forecasting problem where we were required to backcast and forecast hourly loads for a US utility with twenty geographical zones. Our team, TinTin, ranked fifth out of 105 participating teams, and we were awarded an IEEE Power & Energy Society award.

/ Doctorate in Sciences, Specialization in Computer Science
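The recursive and direct strategies compared in the abstract above can be sketched with the simplest possible learner. The example below assumes a one-lag linear model fitted by ordinary least squares, which is far simpler than the learning algorithms studied in the dissertation; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    Assumes the xs are not all identical, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def recursive_forecast(series, horizon):
    """Fit a single one-step-ahead model and feed its own output back in."""
    a, b = fit_line(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Fit a separate model for each horizon h, mapping y_t to y_{t+h}."""
    forecasts = []
    for h in range(1, horizon + 1):
        a, b = fit_line(series[:-h], series[h:])
        forecasts.append(a + b * series[-1])
    return forecasts


series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(recursive_forecast(series, horizon=3))
print(direct_forecast(series, horizon=3))
```

On this noise-free linear trend both strategies agree. With noisy data they diverge: the recursive strategy accumulates the errors of the one-step model, while the direct strategy fits each horizon on fewer training pairs, which is the bias and variance trade-off the abstract describes.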
78

Extrémní učící se stroje pro předpovídání časových řad / Extreme learning machines for time series prediction

Zmeškal, Jiří January 2018
This thesis examines the use of extreme learning machines and echo state networks for time series forecasting, with the possibility of GPU acceleration. Such predictions are part of nearly everyone's daily life through their use in weather forecasting, regular and stock market prediction, power consumption prediction and many other areas. The thesis first familiarizes the reader with the theoretical basis of extreme learning machines and echo state networks, which generate the majority of the neural network's parameters randomly and avoid iterative processes. Second, the thesis demonstrates the use of programming tools, such as ND4J and the CUDA toolkit, to create custom programs. Finally, the prediction capability and the benefit of GPU acceleration are tested.
79

Inteligentní manažer hry Fantasy Premier League / Intelligent Manager of Fantasy Premier League Game

Vasilišin, Maroš January 2020
The Fantasy Premier League game gives millions of players around the world the chance to become, for a while, the manager of their own club. Results and point scores in the game depend on correctly predicting how players will perform in real football matches. If software for predicting and analysing players' future performances assisted in these decisions, results in the game could improve rapidly. This master's thesis deals with the design and implementation of a prediction model that uses neural networks to predict time series throughout the whole season in the game. Methods were used to process data on players and clubs from the last 4 seasons. The performance and accuracy of the prediction methods were tested on data from the most recent Premier League season, and the algorithm's predictions were close to reality in most cases. A user who followed the prediction model in the game one hundred percent would earn more points than an average player who uses no prediction model.
80

Forecasting in Database Systems

Fischer, Ulrike 18 December 2013
Time series forecasting is a fundamental prerequisite for decision-making processes and crucial in a number of domains such as production planning and energy load balancing. In the past, forecasting was often performed by statistical experts in dedicated software environments outside of current database systems. However, forecasts are increasingly required by non-expert users or have to be computed fully automatically without any human intervention. Furthermore, we can observe an ever increasing data volume and the need for accurate and timely forecasts over large multi-dimensional data sets. As most data subject to analysis is stored in database management systems, a rising trend addresses the integration of forecasting inside a DBMS. Yet, many existing approaches follow a black-box style and try to keep changes to the database system as minimal as possible. While such approaches are more general and easier to realize, they miss significant opportunities for improved performance and usability. In this thesis, we introduce a novel approach that seamlessly integrates time series forecasting into a traditional database management system. In contrast to flash-back queries that allow a view on the data in the past, we have developed a Flash-Forward Database System (F2DB) that provides a view on the data in the future. It supports a new query type - a forecast query - that enables forecasting of time series data and is automatically and transparently processed by the core engine of an existing DBMS. We discuss necessary extensions to the parser, optimizer, and executor of a traditional DBMS. We furthermore introduce various optimization techniques for three different types of forecast queries: ad-hoc queries, recurring queries, and continuous queries. First, we ease the expensive model creation step of ad-hoc forecast queries by reducing the amount of processed data with traditional sampling techniques. 
Second, we decrease the runtime of recurring forecast queries by materializing models in a specialized index structure. However, a large number of time series as well as high model creation and maintenance costs require a careful selection of such models. Therefore, we propose a model configuration advisor that determines a set of forecast models for a given query workload and multi-dimensional data set. Finally, we extend forecast queries with continuous aspects allowing an application to register a query once at our system. As new time series values arrive, we send notifications to the application based on predefined time and accuracy constraints. All of our optimization approaches intend to increase the efficiency of forecast queries while ensuring high forecast accuracy.
