1 |
Wind Speed Prediction using Global and Regional Based Virtual Towers in CFD Simulations
Moubarak, Roger, January 2011
Wind farm assessment is a costly and time-consuming process when it relies on traditional methods such as a met mast. New models have therefore been developed to ease wind farm planning. These are global-regional models, which reduce both cost and time. Several such models are on the market, with differing accuracy. This thesis discusses, and uses in simulations, global-regional model output from the Weather Research and Forecasting (WRF) model and from the European Centre for Medium-Range Weather Forecasts (ECMWF), which currently produces ERA-Interim, a global reanalysis of the data-rich period since 1989. The goal of the master's thesis is to determine whether global-regional weather model data, such as the ERA-Interim global reanalysis data, are useful and efficient for wind assessment, by comparing them with the measured data series from a met mast located in Maglarp, in the south of Sweden. The comparison (a hindcast) shows very promising results for planning a wind farm in that specific area at heights of 100 m, 120 m and 38 m.
|
2 |
Operational Modal Parameter Estimation from Short Time-Data Series
Arora, Rahul, 03 June 2014
No description available.
|
3 |
Order in the random forest
Karlsson, Isak, January 2017
In many domains, repeated measurements are systematically collected to obtain the characteristics of objects or situations that evolve over time or other logical orderings. Although the classification of such data series shares many similarities with traditional multidimensional classification, inducing accurate machine learning models using traditional algorithms is typically infeasible since the order of the values must be considered. In this thesis, the challenges related to inducing predictive models from data series using a class of algorithms known as random forests are studied for the purpose of efficiently and effectively classifying (i) univariate, (ii) multivariate and (iii) heterogeneous data series either directly in their sequential form or indirectly as transformed to sparse and high-dimensional representations. In the thesis, methods are developed to address the challenges of (a) handling sparse and high-dimensional data, (b) data series classification and (c) early time series classification using random forests. The proposed algorithms are empirically evaluated in large-scale experiments and practically evaluated in the context of detecting adverse drug events. In the first part of the thesis, it is demonstrated that minor modifications to the random forest algorithm and the use of a random projection technique can improve the effectiveness of random forests when faced with discrete data series projected to sparse and high-dimensional representations. In the second part of the thesis, an algorithm for inducing random forests directly from univariate, multivariate and heterogeneous data series using phase-independent patterns is introduced and shown to be highly effective in terms of both computational and predictive performance. Then, leveraging the notion of phase-independent patterns, the random forest is extended to allow for early classification of time series and is shown to perform favorably when compared to alternatives.
The conclusions of the thesis not only reaffirm the empirical effectiveness of random forests for traditional multidimensional data but also indicate that the random forest framework can, with success, be extended to sequential data representations.
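The notion of a phase-independent pattern can be illustrated with a small sketch. It assumes a shapelet-style formulation, in which a pattern is compared against every equally long window of a series and only the smallest distance is kept; the abstract does not spell out the thesis's exact definition, so the function names `min_window_distance` and `split_on_pattern`, the Euclidean distance, and the threshold are all hypothetical choices made for illustration.

```python
import math


def min_window_distance(series, pattern):
    """Smallest Euclidean distance between `pattern` and any equally long
    window of `series`; the value does not depend on where the match occurs."""
    m = len(pattern)
    if m == 0 or m > len(series):
        raise ValueError("pattern must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(window, pattern)))
        best = min(best, dist)
    return best


def split_on_pattern(dataset, pattern, threshold):
    """Partition (series, label) pairs the way a single tree node might:
    series containing a close match to `pattern` go left, the rest go right."""
    left, right = [], []
    for series, label in dataset:
        side = left if min_window_distance(series, pattern) <= threshold else right
        side.append((series, label))
    return left, right


# Hypothetical toy data: the same bump at two different phases, plus a flat series.
bump = [0.0, 1.0, 0.0]
dataset = [
    ([0, 0, 0, 1, 0, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 1, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0, 0], "flat"),
]
left, right = split_on_pattern(dataset, bump, threshold=0.5)
```

Both "bump" series end up on the same side of the split even though the bump occurs at different positions, which is the property that makes such a test usable inside a tree of a forest.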
|
4 |
Classificação de sinais de eletroencefalograma usando máquinas de vetores suporte
Chagas, Sandro Luiz das, 27 August 2009
Electroencephalography (EEG) is a clinical method widely used to study brain function and neurological disorders. An EEG is a temporal data series that records the electrical activity of the brain. EEG monitoring systems generate a huge amount of data, which makes complete visual analysis of the EEG infeasible in practice. As a result, there is strong demand for computational methods that can automatically analyze EEG records and extract useful information to support diagnosis. Meeting this demand requires a tool that extracts the relevant features from an EEG record and classifies the EEG based on these features. Statistics calculated over wavelet coefficients have been used successfully to extract features from many kinds of temporal data series, including EEG signals. Support Vector Machines (SVMs) are a machine learning technique with high generalization ability, and they have been used successfully in classification problems by several researchers. This dissertation analyzes the influence of feature vectors based on wavelet coefficients on the classification of EEG signals using different SVM implementations.
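As a rough illustration of feature extraction from wavelet coefficients, the sketch below builds a feature vector from a multi-level Haar transform. The abstract does not say which wavelet family or which statistics the dissertation uses, so the Haar wavelet, the two statistics (mean absolute value and standard deviation), and the names `haar_step` and `wavelet_features` are assumptions made purely for illustration. The classification step is not shown; the resulting vectors are what would be handed to an SVM.

```python
import math
import statistics


def haar_step(signal):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_features(signal, levels=3):
    """Feature vector holding (mean absolute value, standard deviation) of the
    detail coefficients at each level, followed by the same two statistics
    for the final approximation coefficients."""
    if len(signal) < 2 ** levels:
        raise ValueError("signal is too short for the requested number of levels")
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(statistics.fmean([abs(d) for d in detail]))
        features.append(statistics.pstdev(detail))
    features.append(statistics.fmean([abs(a) for a in current]))
    features.append(statistics.pstdev(current))
    return features


# Hypothetical signals: a fast oscillation and a constant baseline.
fast = wavelet_features([1.0, -1.0] * 8)
flat = wavelet_features([1.0] * 16)
```

The fast oscillation concentrates its energy in the first-level detail coefficients, while the constant signal shows up only in the final approximation, so the two signals yield clearly different feature vectors.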
|
5 |
Iterative and Expressive Querying for Big Data Series / Requêtes itératives et expressives pour l’analyse de grandes séries de données
Gogolou, Anna, 15 November 2019
Time series are becoming ubiquitous in modern life, and given their sizes, their analysis is becoming increasingly challenging. Time series analysis involves tasks such as pattern matching, anomaly detection, frequent pattern identification, and time series clustering or classification. These tasks rely on the notion of time series similarity. The data-mining community has proposed several techniques, including many similarity measures (or distance measure algorithms) for calculating the distance between two time series, as well as corresponding indexing techniques and algorithms, in order to address the scalability challenges of similarity search. To effectively support their tasks, analysts need interactive visual analytics systems that combine extremely fast computation, expressive querying interfaces, and powerful visualization tools. We identified two main challenges when considering the creation of such systems: (1) similarity perception and (2) progressive similarity search. The former deals with how people perceive similar patterns and what role visualization plays in time series similarity perception. The latter concerns how quickly we can return updates of progressive similarity search results to users, and how good those results are, when system response times are long and do not support real-time analytics in large data series collections. The goal of this thesis, which lies at the intersection of Databases and Human-Computer Interaction, is to answer and provide solutions to the above challenges. In the first part of the thesis, we studied whether different visual representations (Line Charts, Horizon Graphs, and Color Fields) alter time series similarity perception.
We tried to understand whether automatic similarity search results are perceived in a similar manner, irrespective of the visualization technique, and whether what people perceive as similar with each visualization aligns with different automatic similarity measures and their similarity constraints. Our findings indicate that Horizon Graphs promote as invariant local variations in temporal position or speed, and as a result they align with measures that allow variations in temporal shifting or scaling (i.e., dynamic time warping). On the other hand, Horizon Graphs do not align with measures that allow amplitude and y-offset variations (i.e., measures based on z-normalization), because they exaggerate these differences, while the inverse seems to be the case for Line Charts and Color Fields. Overall, our work indicates that the choice of visualization affects what temporal patterns humans consider similar, i.e., the notion of similarity in time series is visualization-dependent. In the second part of the thesis, we focused on progressive similarity search in large data series collections. We investigated how quickly first approximate answers, and then updates of progressive answers, are detected while similarity search queries execute. Our findings indicate that there is a gap between the time the final answer is found and the time the search algorithm terminates, resulting in inflated waiting times without any improvement. Computing probabilistic estimates of the final answer could help users decide when to stop the search process. We developed, and experimentally evaluated using benchmarks, a new probabilistic learning-based method that computes quality guarantees (error bounds) for progressive k-Nearest Neighbour (k-NN) similarity search results.
Our approach learns from a set of queries and builds prediction models based on two observations: (i) similar queries have similar answers; and (ii) progressive best-so-far (bsf) answers returned by the state-of-the-art data series indexes are good predictors of the final k-NN answer. We provide both initial and incrementally improved estimates of the final answer.
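The gap between finding the final answer and finishing the search can be made concrete with a minimal sketch. It assumes k = 1, Euclidean distance, and a plain linear scan standing in for a data series index; the thesis's probabilistic estimation method is not reproduced here. The names `progressive_nn`, `found_after` and `wasted`, and the toy collection, are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def progressive_nn(query, collection):
    """Scan `collection` and yield (visited, index, distance) every time the
    best-so-far answer improves. The last yielded item is the exact 1-NN."""
    best = math.inf
    for visited, candidate in enumerate(collection, start=1):
        dist = euclidean(query, candidate)
        if dist < best:
            best = dist
            yield visited, visited - 1, dist


# Hypothetical toy collection; the third series is the true nearest neighbour.
collection = [
    [5.0, 5.0, 5.0],
    [1.0, 2.0, 4.0],
    [1.0, 2.0, 3.1],
    [9.0, 9.0, 9.0],
    [7.0, 0.0, 7.0],
    [4.0, 4.0, 4.0],
]
query = [1.0, 2.0, 3.0]

updates = list(progressive_nn(query, collection))
found_after = updates[-1][0]             # series visited when the final answer appeared
wasted = len(collection) - found_after   # series scanned afterwards with no improvement
```

In this toy run the final answer appears halfway through the scan, and the remaining work only confirms it. An estimate of how likely the best-so-far answer is to be final would let a user stop at that point instead of waiting for termination.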
|