Global ETD Search

161	Scalable Architecture for Automating Machine Learning Model Monitoring de la Rúa Martínez, Javier January 2020 (has links) In recent years, more sophisticated tools for exploratory data analysis, data management, Machine Learning (ML) model training and model serving in production have appeared, and the concept of MLOps has become increasingly popular. MLOps is an effort to bring DevOps processes to the ML lifecycle. It aims to automate more of the diverse and repetitive tasks along the cycle and to make the teams and tools involved interoperate more smoothly. In this context, the main cloud providers have built their own ML platforms [4, 34, 61], offered as services in their cloud solutions. Moreover, multiple frameworks have emerged to solve concrete problems such as data testing, data labelling, distributed training or prediction interpretability, and new monitoring approaches have been proposed [32, 33, 65]. Model monitoring is one of the most commonly overlooked stages in the ML lifecycle, although it is a relevant one. Recently, cloud providers have presented their own monitoring tools for use within their platforms [4, 61], while work is ongoing to integrate existing frameworks [72] into open-source model serving solutions [38]. Most of these frameworks fall into one of three groups. Some are built as an extension of an existing platform (i.e., they lack portability). Others follow a scheduled batch processing approach with a minimum period of several hours. The rest limit certain outlier and drift detection algorithms because of the architecture of the platform in which they are integrated. In this work, a scalable, automated, cloud-native architecture for ML model monitoring in a streaming approach is designed and evaluated. 
An experiment was conducted on a 7-node cluster with 250,000 requests at different concurrency rates, using windows of 15 seconds length and a watermark delay of 6 seconds. For 75% of the results, it shows maximum latencies after request time of 5.9 seconds for distance-based outlier detection, 29.92 seconds for windowed statistics and 30.86 seconds for distribution-based data drift detection. / Under de senaste åren har konceptet MLOps blivit alltmer populärt på grund av tillkomsten av mer sofistikerade verktyg för explorativ dataanalys, datahantering, modell-träning och model serving som tjänstgör i produktion. Som ett försök att föra DevOps processer till Machine Learning (ML)-livscykeln, siktar MLOps på mer automatisering i utförandet av mångfaldiga och repetitiva uppgifter längs cykeln samt på smidigare interoperabilitet mellan team och verktyg inblandade. I det här sammanhanget har de största molnleverantörerna byggt sina egna ML-plattformar [4, 34, 61], vilka erbjuds som tjänster i deras molnlösningar. Dessutom har flera ramar tagits fram för att lösa konkreta problem såsom datatestning, datamärkning, distribuerad träning eller tolkning av förutsägelse, och nya övervakningsmetoder har föreslagits [32, 33, 65]. Av alla stadier i ML-livscykeln förbises ofta modellövervakning trots att det är relevant. På senare tid har molnleverantörer presenterat sina egna verktyg att kunna användas inom sina plattformar [4, 61] medan arbetet pågår för att integrera befintliga ramverk [72] med lösningar för modellplatformer med öppen källkod [38]. De flesta av dessa ramverk är antingen byggda som ett tillägg till en befintlig plattform (dvs. saknar portabilitet), följer en schemalagd batchbearbetningsmetod med en lägsta hastighet av ett antal timmar, eller innebär begränsningar för vissa extremvärden och drivalgoritmer på grund av plattformsarkitekturens design där de är integrerade. 
I det här arbetet utformas och utvärderas en skalbar automatiserad molnbaserad arkitektur för ML-modellövervakning i en streaming-metod. Ett experiment som utförts på ett 7-nodskluster med 250.000 förfrågningar vid olika samtidigheter visar maximala latenser på 5,9, 29,92 respektive 30,86 sekunder efter tid för förfrågningen för 75% av avståndsbaserad detektering av extremvärden, windowed statistics och distributionsbaserad datadriftdetektering, med hjälp av windows med 15 sekunders längd och 6 sekunders fördröjning av vattenstämpel. Model Monitoring Streaming Scalability Cloud-native Data Drift Outliers Machine Learning Modellövervakning Streaming-metod Skalbarhet Molnbaserad Dataskift Outlierupptäckt Maskininlärning Computer and Information Sciences Data- och informationsvetenskap
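The reported latencies refer to statistics computed over 15-second windows with a 6-second watermark delay. The sketch below is a minimal, single-process illustration of event-time tumbling windows, not the thesis's distributed implementation. It assumes that the watermark is the largest event time seen so far minus the delay. A window is emitted once the watermark passes its end, and events that arrive for an already-emitted window are dropped. The class name `TumblingWindowStats` and the chosen summary statistics are hypothetical.

```python
from collections import defaultdict
from statistics import mean

WINDOW_S = 15.0          # window length reported in the abstract
WATERMARK_DELAY_S = 6.0  # watermark delay reported in the abstract


class TumblingWindowStats:
    """Hypothetical event-time tumbling-window aggregator with a watermark."""

    def __init__(self, window_s=WINDOW_S, delay_s=WATERMARK_DELAY_S):
        self.window_s = window_s
        self.delay_s = delay_s
        self.max_event_time = float("-inf")
        self.open_windows = defaultdict(list)  # window start -> feature values

    def _watermark(self):
        # Assumption: watermark = latest event time seen minus the allowed delay.
        return self.max_event_time - self.delay_s

    def add(self, event_time, value):
        """Ingest one request feature; return statistics of windows it closes."""
        start = (event_time // self.window_s) * self.window_s
        if start + self.window_s <= self._watermark():
            return []  # late event: its window was already emitted, so drop it
        self.open_windows[start].append(value)
        self.max_event_time = max(self.max_event_time, event_time)

        closed = []
        for s in sorted(self.open_windows):  # sorted() copies the keys, so popping is safe
            if s + self.window_s <= self._watermark():
                vals = self.open_windows.pop(s)
                closed.append((s, {"count": len(vals), "mean": mean(vals),
                                   "min": min(vals), "max": max(vals)}))
        return closed
```

Events at times 1, 5 and 16 emit nothing. An event at time 22 moves the watermark to 16, which closes the window [0, 15). A later event stamped 3 is then discarded as late.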
162	Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications Razzaghi, Talayeh 01 January 2014 (has links) Analysis and predictive modeling of massive datasets is a highly significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and an uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling even more difficult. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Most traditional classification approaches perform poorly when data are imperfect. We propose using CSL with the Support Vector Machine (SVM), a well-known data mining algorithm. The results show that the proposed algorithm produces more accurate classifiers and is more robust to imperfect data. Furthermore, we explore which performance measures are most suitable for imperfect data, and we address real problems in quality control and business analytics. Classification imbalanced data cost sensitive learning outliers weighted support vector machine relaxed support vector machines control chart pattern recognition Engineering Industrial Engineering
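The abstract does not give the exact CSL formulation. The following is therefore only a sketch of the general idea behind cost-sensitive (weighted) SVMs: misclassification costs enter the hinge-loss objective as per-class weights. The inverse-frequency weighting in `class_costs` and both function names are assumptions for illustration, not the thesis's method.

```python
from collections import Counter


def class_costs(y):
    """Assumed scheme: inverse-frequency costs n / (k * n_c), so the rare class costs more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_hinge_objective(w, b, X, y, costs, lam=0.01):
    """Cost-sensitive linear SVM objective for labels in {-1, +1}:
    lam/2 * ||w||^2 + (1/n) * sum_i costs[y_i] * max(0, 1 - y_i * (w.x_i + b))."""
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += costs[yi] * max(0.0, 1.0 - margin)  # weighted hinge loss
    return reg + loss / len(y)
```

With 8 negatives and 2 positives, the costs are 0.625 and 2.5. The two minority points therefore carry as much total weight in the loss as the eight majority points. Any hinge-loss optimizer applied to this objective yields a weighted SVM.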
163	A Robust Dynamic State and Parameter Estimation Framework for Smart Grid Monitoring and Control Zhao, Junbo 30 May 2018 (has links) The enhancement of the reliability, security, and resiliency of electric power systems depends on the availability of fast, accurate, and robust dynamic state estimators. These estimators should be robust to gross errors in the measurements and in the model parameter values. They should also provide good state estimates even in the presence of large dynamical system model uncertainties and non-Gaussian thick-tailed process and observation noise. However, the Kalman filter-based dynamic state estimators currently given in the literature suffer from several important shortcomings that prevent power utilities from adopting them for practical applications. Specifically, they cannot handle (i) dynamic model uncertainty and parameter errors; (ii) non-Gaussian process and observation noise of the nonlinear dynamic system models; (iii) three types of outliers; and (iv) all types of cyber attacks. The three types of outliers, namely observation, innovation, and structural outliers, are caused by either an unreliable dynamical model or real-time synchrophasor measurements with data quality issues, which are common in power systems. To address these challenges, we have pioneered a general theoretical framework that advances both robust statistics and robust control theory for robust dynamic state and parameter estimation of a cyber-physical system. Specifically, the generalized maximum-likelihood-type (GM)-estimator, the unscented Kalman filter (UKF), and the H-infinity filter are integrated into a unified framework to yield various centralized and decentralized robust dynamic state estimators. These new estimators include the GM-iterated extended Kalman filter (GM-IEKF), the GM-UKF, the H-infinity UKF and the robust H-infinity UKF. 
The GM-IEKF is able to handle observation and innovation outliers, but its statistical efficiency is low in the presence of non-Gaussian system process and measurement noise. The GM-UKF addresses this issue. It achieves high statistical efficiency under a broad range of non-Gaussian process and observation noise while remaining robust to observation and innovation outliers. A reformulation of the GM-UKF with multiple hypothesis testing further enables it to handle structural outliers. However, the GM-UKF may yield biased state estimates in the presence of large system uncertainties. To this end, the H-infinity UKF, which relies on robust control theory, is proposed. It is shown that the H-infinity UKF is able to bound the system uncertainties but lacks robustness to outliers and non-Gaussian noise. Finally, a robust H-infinity filter framework is proposed. It uses the H-infinity criterion to bound system uncertainties and relies on the robustness of the GM-estimator to filter out non-Gaussian noise and suppress outliers. Furthermore, these new robust estimators are applied to system bus frequency monitoring and control, and to synchronous generator model parameter calibration. Case studies on several IEEE standard systems show the efficiency and robustness of the proposed estimators. / Ph. D. / The enhancement of the reliability, security, and resiliency of electric power systems depends on the availability of fast, accurate, and robust dynamic state estimators. These estimators should be robust to gross errors in the measurements and in the model parameter values. They should also provide good state estimates even in the presence of large dynamical system model uncertainties and non-Gaussian thick-tailed process and observation noise. There are three types of gross errors or outliers, namely observation, innovation, and structural outliers. 
They can be caused either by an unreliable dynamical model or by real-time synchrophasor measurements with data quality issues, which are common in power systems. The system uncertainties can be induced in several ways, including i) unknowable system inputs, such as noise, parameter variations and actuator failures; ii) unavailable inputs, such as unmeasured mechanical power, the field voltage of the exciter, or an unknown fault location; and iii) inaccurate model parameter values of the synchronous generators, loads, lines, and transformers, among others. However, the current Kalman filter-based dynamic state estimators suffer from several important shortcomings that prevent power utilities from adopting them for practical applications. To address these challenges, this dissertation proposes a general theoretical framework that advances both robust statistics and robust control theory for robust dynamic state and parameter estimation. Specifically, three components are integrated into a unified framework to produce various robust dynamic state estimators: the robust generalized maximum-likelihood-type (GM)-estimator, a nonlinear filter (the unscented Kalman filter, UKF), and the H-infinity filter. These new estimators include the robust GM-IEKF, the robust GM-UKF, the H-infinity UKF and the robust H-infinity UKF. The GM-IEKF deals with observation and innovation outliers but achieves relatively low statistical efficiency in the presence of non-Gaussian system process and measurement noise. To address this, the robust GM-UKF is proposed. It achieves high statistical efficiency under a broad range of non-Gaussian noise while remaining robust to observation and innovation outliers. A reformulation of the GM-UKF with multiple hypothesis testing further enables it to handle all three types of outliers. 
However, the GM-UKF may yield biased state estimates in the presence of large system uncertainties. To this end, the H-infinity UKF, which depends on robust control theory, is proposed. It is able to bound the system uncertainties but lacks robustness to outliers and non-Gaussian noise. Finally, a robust H-infinity filter framework is proposed. It relies on the H-infinity criterion to bound system uncertainties and uses the robustness of the GM-UKF to filter out non-Gaussian noise and suppress outliers. These new robust estimators are applied to system bus frequency monitoring and control, and to synchronous generator model parameter calibration. Case studies on several IEEE standard systems show the efficiency and robustness of the proposed estimators. Kalman filter Robust statistics Power system state estimation Dynamic state estimation Unscented transformation Robust control theory Estimation theory Power system dynamics and control Outliers Cyber attacks Phasor measurement units
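GM-estimators downweight observations whose standardized residuals are large. The sketch below shows only that core idea in its simplest form: a Huber M-estimate of a single constant state, computed by iteratively reweighted least squares with the scale fixed from the median absolute deviation. It omits the leverage (projection-statistics) weights that distinguish a GM-estimator, as well as the UKF and H-infinity machinery. The tuning constant `c = 1.5` and the function names are assumptions.

```python
from statistics import median


def huber_weight(r, c=1.5):
    """IRLS weight psi(r)/r for Huber's loss: 1 inside [-c, c], c/|r| outside."""
    return 1.0 if abs(r) <= c else c / abs(r)


def huber_location(z, c=1.5, max_iter=50, tol=1e-10):
    """Robust estimate of a constant state x from measurements z_i = x + e_i."""
    x = median(z)
    # Robust scale: normalized median absolute deviation (consistent under Gaussian noise).
    scale = 1.4826 * median(abs(zi - x) for zi in z)
    if scale == 0.0:
        return x
    for _ in range(max_iter):
        w = [huber_weight((zi - x) / scale, c) for zi in z]  # downweight gross errors
        x_new = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Take five readings of a quantity near 1 with one gross error at 10. The sample mean is 2.8, while the Huber estimate stays close to 1 because the outlying reading receives a weight of roughly 0.02.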
164	Shluková analýza rozsáhlých souborů dat: nové postupy založené na metodě k-průměrů / Cluster analysis of large data sets: new procedures based on the method k-means Žambochová, Marta January 2005 (has links) Abstract Cluster analysis has become one of the main tools for extracting knowledge from data, a process known as data mining. In this area of data analysis, the data sets processed are often large both in the number of objects and in the number of variables that characterize the objects. Many methods for data clustering have been developed. One of the most widely used is the k-means method, which is suitable for clustering data sets containing a large number of objects. It starts from an initial distribution of objects into clusters and then redistributes the objects among the clusters step by step so as to optimize an objective function. The aims of this Ph.D. thesis were to compare selected variants of the existing k-means method, to characterize their strengths and weaknesses in detail, to propose new alternatives of the method, and to compare them experimentally with existing approaches. These objectives were met. In my work I focused on modifications of the k-means method for clustering large numbers of objects, specifically on the BIRCH k-means, filtering, k-means++ and two-phase algorithms. I examined the time complexity of the algorithms, the effect of the initial distribution and of outliers, and the validity of the resulting clusters. Two real data files and several generated data sets were used. The common and differing features of the investigated methods are summarized at the end of the work. The main aim and contribution of the work is the design of my own modifications that address the bottlenecks of the basic procedure and of the existing variants, together with their programming and verification. Some of the modifications accelerate the processing. 
Applying the main ideas of the k-means++ algorithm to other variants of the k-means method improved their clustering results. The most significant of the proposed changes is a modification of the filtering algorithm that gives the algorithm an entirely new capability: the detection of outliers. An accompanying CD is enclosed. It includes the source code of the programs, written in the MATLAB development environment. The programs were created specifically for the purpose of this work and are intended for experimental use. The CD also contains the data files used in the various experiments.
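The thesis carries ideas from k-means++ into other k-means variants, so a compact sketch of the two standard building blocks may help. The first is k-means++ seeding: each new center is drawn with probability proportional to its squared distance D² from the nearest existing center. The second is a set of Lloyd assignment/update iterations. This is the textbook algorithm, not the author's modified BIRCH, filtering or two-phase variants, and the function names are illustrative.

```python
import random


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: sample new centers with probability proportional to D^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sqdist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break  # if rounding prevents a pick, the while-loop simply retries
    return centers


def kmeans(points, k, iters=100, seed=0):
    """Lloyd iterations starting from k-means++ seeds; returns the final centers."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest center
            j = min(range(k), key=lambda i: sqdist(p, centers[i]))
            clusters[j].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
               for i, cl in enumerate(clusters)]  # update step: cluster means
        if new == centers:
            break
        centers = new
    return centers
```

For two well-separated groups of four points each, D² seeding almost always places one initial center in each group, and Lloyd's iterations then converge to the two group means.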
165	Confiabilidade de rede GPS de referência cadastral municipal - estudo de caso : rede do município de Vitória (ES) / Reliability of network GPS of municipal cadastral reference - study of case : network of the municipal district of Vitória (ES) Amorim, Geraldo Passos 25 March 2004 (has links) A proposta deste trabalho é estudar as teorias de análise de qualidade de rede GPS, baseando-se nas teorias de confiabilidade de rede propostas por Baarda, em 1968. As hipóteses estatísticas para detecção de "outliers" constituem a base desse estudo, pois são fundamentais para elaboração dos testes de detecção de "outliers", localização e eliminação de erros grosseiros e, também, para a análise da confiabilidade da rede. A confiabilidade, que traduz a controlabilidade da rede e depende do número de redundância, é estudada em dois aspectos: confiabilidade interna e confiabilidade externa. A rede de referência cadastral do município de Vitória ES, escolhida para o estudo de caso foi estabelecida por GPS, em 2001, tendo como concepção básica a implantação de 37 pares de vértices intervisíveis, privilegiando locais públicos e de livre acesso. Essa rede foi ajustada em 2001 pela Prefeitura Municipal de Vitória, e as coordenadas ajustadas dos vértices são usadas, desde então, para apoiar todos os levantamentos topográficos e cadastrais realizados no município. O ajustamento dessa rede, em 2001, constituiu-se de um ajustamento simples em que os testes estatísticos de detecção de "outliers", a localização e eliminação dos erros grosseiros não foram levados em conta. A parte prática desta pesquisa compreendeu a medição de 21 novos vetores (linhas bases) para formar uma rede de controle, conforme estabelece a NBR-14166, o ajustamento dessa rede de controle (15 vértices) e o ajustamento da rede principal (78 vértices), tendo por injunção a rede de controle previamente ajustada. 
A principal diferença entre o ajustamento de 2001, feito pela Prefeitura Municipal de Vitória, e o ajustamento de 2004, feito para esta pesquisa, foi a consideração no novo ajustamento dos testes estatísticos baseados nas teorias de confiabilidade propostas por Baarda. A comparação entre os resultados dos dois ajustamentos da rede cadastral de Vitória não apontou diferenças significativas entre as coordenadas ajustadas. / The purpose of this work is to study the theories for analyzing the quality of GPS networks, based on the network reliability theory proposed by Baarda in 1968. The statistical hypotheses for outlier detection form the basis of this study. They are fundamental to building the outlier detection tests, to locating and eliminating observations with gross errors, and to analyzing the reliability of the network. Reliability, which expresses the controllability of the network and depends on the redundancy number, was studied in two aspects: internal reliability and external reliability. The cadastral reference network of the municipal district of Vitória (ES), chosen for the case study, was established by GPS in 2001. Its basic design was the implantation of 37 pairs of intervisible vertices, favoring public places with free access such as sidewalks and median strips. The network was adjusted in 2001 by the Vitória City Hall. Since then, the adjusted coordinates of its vertices have been used to support all topographic and cadastral surveys carried out in the municipal district. The 2001 adjustment was a simple adjustment that did not take into account the statistical tests for outlier detection or the location and elimination of observations with gross errors. 
The practical part of this research consisted of three steps. First, 21 new vectors (baselines) were measured to form a control network, as established by NBR-14166. Second, that control network (15 vertices) was adjusted. Third, the main network (78 vertices) was adjusted, constrained to the previously adjusted control network. The 2001 adjustment was carried out by the Vitória City Hall and the 2004 adjustment was carried out for this research. The main difference between them was that the new adjustment included statistical tests based mainly on the reliability theory proposed by Baarda. A comparison of the results of the 2001 and 2004 adjustments showed that, for the Vitória cadastral network, there was no significant difference between the two adjustments. ajustamento de rede confiabilidade data snooping test detecção de outliers elipse de erros ellipse of error erros grosseiros gross errors hypothesis test network adjustment network of cadastral reference normalized residues outlier's detection rede de referência cadastral reliability resíduos normalizados teste data snooping teste de hipóteses
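Baarda's data snooping, which the 2004 adjustment applied, tests each observation's normalized residual against a critical value. The pure-Python sketch below makes four assumptions:

- a linear(ized) Gauss-Markov model l = A x + e;
- uncorrelated, equal-weight observations;
- a known a priori standard deviation `sigma0`;
- the critical value 3.29, which is commonly associated with a two-sided significance level of 0.001.

It computes the redundancy numbers r_i, the diagonal of Q_vv = I - A (A^T A)^-1 A^T, and the normalized residuals w_i = v_i / (sigma0 * sqrt(r_i)). It then flags every |w_i| above the critical value. The function names are hypothetical, and a real GPS network adjustment would use the full baseline covariance matrices instead of unit weights.

```python
import math


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular normal matrix")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def data_snooping(A, l, sigma0, crit=3.29):
    """Unit-weight least squares followed by Baarda-style data snooping (assumed setup)."""
    m, n = len(A), len(A[0])
    N = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    u = [sum(A[i][p] * l[i] for i in range(m)) for p in range(n)]
    Ninv = invert(N)
    x = [sum(Ninv[p][q] * u[q] for q in range(n)) for p in range(n)]
    v = [sum(A[i][p] * x[p] for p in range(n)) - l[i] for i in range(m)]  # v = A x - l
    # Redundancy numbers: diagonal of Q_vv = I - A N^-1 A^T; they sum to m - n.
    r = [1.0 - sum(A[i][p] * Ninv[p][q] * A[i][q] for p in range(n) for q in range(n))
         for i in range(m)]
    # nan marks an observation with no redundancy (not controlled by the others).
    w = [vi / (sigma0 * math.sqrt(ri)) if ri > 1e-12 else float("nan")
         for vi, ri in zip(v, r)]
    flagged = [i for i, wi in enumerate(w) if abs(wi) > crit]
    return x, v, r, w, flagged
```

As a toy case, take five repeated measurements of one distance with sigma0 = 5 mm. Each redundancy number is 0.8, and a reading 22 mm away from the mean gets |w| of about 5, so it is flagged. The other residuals stay below 1.5.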
166	Extensões dos modelos de regressão quantílica bayesianos / Extensions of bayesian quantile regression models Santos, Bruno Ramos dos 29 April 2016 (has links) Esta tese visa propor extensões dos modelos de regressão quantílica bayesianos, considerando dados de proporção com inflação de zeros, e também dados censurados no zero. Inicialmente, é sugerida uma análise de observações influentes, a partir da representação por mistura localização-escala da distribuição Laplace assimétrica, em que as distribuições a posteriori das variáveis latentes são comparadas com o intuito de identificar possíveis observações aberrantes. Em seguida, é proposto um modelo de duas partes para analisar dados de proporção com inflação de zeros ou uns, estudando os quantis condicionais e a probabilidade da variável resposta ser igual a zero. Além disso, são propostos modelos de regressão quantílica bayesiana para dados contínuos com um componente discreto no zero, em que parte dessas observações é suposta censurada. Esses modelos podem ser considerados mais completos na análise desse tipo de dados, uma vez que a probabilidade de censura é verificada para cada quantil de interesse. E por último, é considerada uma aplicação desses modelos com correlação espacial, para estudar os dados da eleição presidencial no Brasil em 2014. Nesse caso, os modelos de regressão quantílica são capazes de incorporar essa informação espacial a partir do processo Laplace assimétrico. Para todos os modelos propostos foi desenvolvido um pacote do software R, que está exemplificado no apêndice. / This thesis aims to propose extensions of Bayesian quantile regression models, considering proportion data with zero inflation, and also censored data at zero. 
First, an analysis of influential observations is suggested, based on the location-scale mixture representation of the asymmetric Laplace distribution. In this analysis, the posterior distributions of the latent variables are compared in order to identify possible outlying observations. Next, a two-part model is proposed to analyze proportion data with zero or one inflation, studying the conditional quantiles and the probability that the response variable equals zero. Then, Bayesian quantile regression models are proposed for continuous data with a discrete component at zero, where part of these observations is assumed to be censored. These models may be considered more complete for analyzing this type of data, because the censoring probability varies with the quantile of interest. Finally, an application of these models with spatial correlation is considered, to study data from the 2014 presidential election in Brazil. In this example, the quantile regression models incorporate spatial dependence through the asymmetric Laplace process. An R package was developed for all the proposed models and is illustrated in the appendix. Bayesian quantile regression Censored data Dados censurados Distribuição Laplace assimétrica Modelo de duas partes Modelo espacial Observações aberrantes Outliers Regressão quantílica bayesiana Two-part model
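Bayesian quantile regression of this kind rests on the asymmetric Laplace distribution (ALD) and on its location-scale mixture representation, which the thesis uses for its influence analysis. The sketch below writes out three pieces:

- the check loss rho_tau(u) = u (tau - I(u < 0));
- the ALD log-density log(tau (1 - tau) / sigma) - rho_tau((y - mu) / sigma);
- a sampler based on the standard normal-exponential mixture y = mu + theta v + psi sqrt(sigma v) z, with theta = (1 - 2 tau) / (tau (1 - tau)), psi^2 = 2 / (tau (1 - tau)), v exponential with mean sigma and z standard normal.

The exponential variable v is presumably the kind of latent variable whose posterior the thesis inspects for outliers. The function names are illustrative, and the thesis's R package is not reproduced here.

```python
import math
import random


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution ALD(mu, sigma, tau)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def sample_ald_mixture(mu, sigma, tau, rng):
    """Draw from ALD(mu, sigma, tau) through its normal-exponential mixture."""
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    psi = math.sqrt(2.0 / (tau * (1.0 - tau)))
    v = rng.expovariate(1.0 / sigma)  # latent exponential variable with mean sigma
    return mu + theta * v + psi * math.sqrt(sigma * v) * rng.gauss(0.0, 1.0)
```

Under this parameterization, mu is the tau-th quantile of the distribution. In a simulation with tau = 0.25, about a quarter of the draws should therefore fall at or below mu. This property is what lets a linear predictor for mu model a conditional quantile.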
167	從貝氏觀點診斷離群值及具有影響力之觀察值 / Some diagnostics for outliers and influential observations from Bayesian point of view 謝季英, Shieh, Jih Ing Unknown Date (has links) 在線性迴歸分析中，資料的不適當，常導致研究者選擇了不當的模式，為避免此缺失，在分析資料前須先做好診斷工作。本文中將從貝氏觀點提出一些不同的診斷方法以供參考。首先推導出均數移動參數a=(a_1,…,a_k)'的事後分配，並利用a'a/k的事後均數診斷出不當資料點。接著，考慮在個別模式下以β事後分配之總變異及廣義變異為標準，診斷出離群值及具有潛在影響力之觀測值。最後，分別利用(i)β的事後分配(ii)σ²的事後分配(iii)(β,σ²)的聯合事後分配，推導出對應的對稱均方差以做為診斷標準。 / This thesis proposes several diagnostic methods for outliers and influential observations from a Bayesian point of view. We first derive the marginal posterior distribution of the mean-shift parameter a = (a_1, …, a_k)', and then use the posterior mean of a'a/k to detect spurious data points. Second, we use the posterior total variance and generalized variance of β as diagnostic criteria for outliers and influential observations. Finally, we use (i) the posterior distribution of β, (ii) the posterior distribution of σ², and (iii) the joint posterior distribution of (β, σ²) to derive the corresponding symmetric mean square differences, which can be used as diagnostic criteria. 貝氏 離群值 影響力之觀測值 不正當 均數移動 對稱均方差 Bayesian outliers influential observations spurious mean-shift symmetric mean square difference
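The sketch below illustrates only the second criterion, the posterior total and generalized variance of β. It rests on several assumptions. The model is a simple linear regression y = β0 + β1 x + ε with the noninformative prior p(β, σ²) proportional to 1/σ². Under this prior, the marginal posterior of β is a multivariate t distribution with ν = n - 2 degrees of freedom and covariance ν/(ν - 2) · s² (XᵀX)⁻¹. For each observation, the code compares the posterior trace (total variance) and determinant (generalized variance) with and without that observation. The case-deletion comparison and all function names are assumptions, since the abstract does not spell out the thesis's exact procedure.

```python
def posterior_cov_beta(x, y):
    """Posterior covariance of (beta0, beta1) under p(beta, sigma^2) proportional to 1/sigma^2."""
    n = len(x)
    nu = n - 2  # residual degrees of freedom
    if nu <= 2:
        raise ValueError("need at least 5 observations for a finite covariance")
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # det(X^T X) for the design X = [1, x]
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / nu
    k = nu / (nu - 2) * s2
    # k * (X^T X)^-1, with the 2x2 inverse written out explicitly
    return [[k * sxx / det, -k * sx / det], [-k * sx / det, k * n / det]]


def total_and_generalized_variance(cov):
    """Trace and determinant of a 2x2 covariance matrix."""
    return cov[0][0] + cov[1][1], cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]


def case_deletion_ratios(x, y):
    """For each i: (trace ratio, determinant ratio) of the posterior covariance without case i."""
    tv, gv = total_and_generalized_variance(posterior_cov_beta(x, y))
    ratios = []
    for i in range(len(x)):
        cov_i = posterior_cov_beta(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        tv_i, gv_i = total_and_generalized_variance(cov_i)
        ratios.append((tv_i / tv, gv_i / gv))
    return ratios
```

A ratio far below 1 means that deleting the case sharply reduces posterior uncertainty about β, which marks it as a candidate outlier. In the test data, the point (3, 20) lies far off an otherwise nearly linear trend, so its ratios are the smallest.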
168	季節性時間序列之預測─類神經網路模式之探討 / Forecasting Seasonal Time Series : A Neural Network Approach 賴家瑞, Lia, Chia Jui Unknown Date (has links) 本論文主要研究以類神經網路模式預測季節性時間序列之有效性。利用適當地建構樣本訓練集,網路經訓練後可作為季節性時間序列之預測工具。文中亦提出移動學習法以期提高預測之準確度。並以台灣地區每季進口商品與勞務總值作為實證之研究。此季節性時間序列因受離群值之影響而增加其預測困難度。實證結果顯示類神經網路模式之預測表現較傳統之統計方法優異,即使此序列受到離群值之干擾。 / We investigate the effectiveness of neural networks for predicting the future behavior of seasonal time series. With a properly constructed training set, the network can be trained and then used to predict future values of a seasonal time series. A shifting-learning method is also employed to improve forecasting performance. The study examines Taiwan's quarterly imports of goods and services from the first quarter of 1968 to the fourth quarter of 1990. The series is contaminated with outliers, which makes forecasting more difficult. Empirical results show that the model-free neural network approach predicts better than the classical Box-Jenkins approach, even though the series is contaminated with outliers. 季節性時間序列 神經網路 移動學習法 離群值 預測 seasonal time series neural networks shifting learning method outliers forecasting
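The abstract mentions a properly constructed training set and a shifting-learning method but details neither. One plausible reading, stated here as an assumption, has two parts. Training pairs are built from a sliding window of the most recent `n_lags` quarterly values. Before each one-step-ahead forecast, the model is refit on a moving window of recent history. The neural network itself sits behind the hypothetical `fit` and `predict` callables. A seasonal-naive predictor is plugged in only to keep the sketch self-contained; it is not the thesis's model.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Supervised pairs: the previous n_lags values -> the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def rolling_forecast(series: Sequence[float], n_lags: int, train_size: int,
                     fit: Callable, predict: Callable) -> List[float]:
    """One-step-ahead forecasts, refitting on the latest train_size points at each step."""
    preds = []
    for t in range(train_size, len(series)):
        X, y = make_windows(series[t - train_size:t], n_lags)  # moving training window
        model = fit(X, y)
        preds.append(predict(model, list(series[t - n_lags:t])))
    return preds


# Seasonal-naive stand-in for a trained network: repeat the value from four quarters ago.
def fit_seasonal_naive(X, y):
    return None


def predict_seasonal_naive(model, window):
    return window[-4]
```

Replacing the two stand-ins with a network's training and inference routines would leave the windowing and refitting logic unchanged. This separation is the reason the model is passed in as callables.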
169	Phenology in Germany in the 20th century : methods, analyses and models Schaber, Jörg January 2002 (has links) Die Länge der Vegetationsperiode (VP) spielt eine zentrale Rolle für die interannuelle Variation der Kohlenstoffspeicherung terrestrischer Ökosysteme. Die Analyse von Beobachtungsdaten hat gezeigt, dass sich die VP in den letzten Jahrzehnten in den nördlichen Breiten verlängert hat. Dieses Phänomen wurde oft im Zusammenhang mit der globalen Erwärmung diskutiert, da die Phänologie von der Temperatur beeinflusst wird.<br /> <br /> Die Analyse der Pflanzenphänologie in Süddeutschland im 20. Jahrhundert zeigte:<br /> - Die starke Verfrühung der Frühjahrsphasen in dem Jahrzehnt vor 1999 war kein singuläres Ereignis im 20. Jahrhundert. Schon in früheren Dekaden gab es ähnliche Trends. Es konnten Perioden mit unterschiedlichem Trendverhalten identifiziert werden.<br /> - Es gab deutliche Unterschiede in den Trends von frühen und späten Frühjahrsphasen. Die frühen Frühjahrsphasen haben sich stetig verfrüht, mit deutlicher Verfrühung zwischen 1931 und 1948, moderater Verfrühung zwischen 1948 und 1984 und starker Verfrühung zwischen 1984 und 1999. Die späten Frühjahrsphasen hingegen, wechselten ihr Trendverhalten in diesen Perioden von einer Verfrühung zu einer deutlichen Verspätung wieder zu einer starken Verfrühung.<br /> <br /> Dieser Unterschied in der Trendentwicklung zwischen frühen und späten Frühjahrsphasen konnte auch für ganz Deutschland in den Perioden 1951 bis 1984 und 1984 bis 1999 beobachtet werden.<br /> Der bestimmende Einfluss der Temperatur auf die Frühjahrsphasen und ihr modifizierender Einfluss auf die Herbstphasen konnte bestätigt werden. 
Es zeigt sich jedoch, dass <br /> - die Phänologie bestimmende Funktionen der Temperatur nicht mit einem globalen jährlichen CO2 Signal korreliert waren, welches als Index für die globale Erwärmung verwendet wurde<br /> - ein Index für grossräumige regionale Zirkulationsmuster (NAO-Index) nur zu einem kleinen Teil die beobachtete phänologischen Variabilität erklären konnte.<br /> <br /> Das beobachtete unterschiedliche Trendverhalten zwischen frühen und späten Frühjahrsphasen konnte auf die unterschiedliche Entwicklung von März- und Apriltemperaturen zurückgeführt werden. Während sich die Märztemperaturen im Laufe des 20. Jahrhunderts mit einer zunehmenden Variabilität in den letzten 50 Jahren stetig erhöht haben, haben sich die Apriltemperaturen zwischen dem Ende der 1940er und Mitte der 1980er merklich abgekühlt und dann wieder deutlich erwärmt.<br /> Es wurde geschlussfolgert, dass die Verfrühungen in der Frühjahrsphänologie in den letzten Dekaden Teile multi-dekadischer Fluktuationen sind, welche sich nach Spezies und relevanter saisonaler Temperatur unterscheiden. Aufgrund dieser Fluktuationen konnte kein Zusammenhang mit einem globalen Erwärmungsignal gefunden werden.<br /> Im Durchschnitt haben sich alle betrachteten Frühjahrsphasen zwischen 1951 und 1999 in Naturräumen in Deutschland zwischen 5 und 20 Tagen verfrüht. Ein starker Unterschied in der Verfrühung zwischen frühen und späten Frühjahrsphasen liegt an deren erwähntem unterschiedlichen Verhalten. Die Blattverfärbung hat sich zwischen 1951 und 1999 für alle Spezies verspätet, aber nach 1984 im Durchschnitt verfrüht. Die VP hat sich in Deutschland zwischen 1951 und 1999 um ca. 10 Tage verlängert.<br /> Es ist hauptsächlich die Änderung in den Frühjahrphasen, die zu einer Änderung in der potentiell absorbierten Strahlung (PAS) führt. 
Darüber hinaus sind es die späten Frühjahrsphasen, die pro Tag Verfrühung stärker profitieren, da die zusätzlichen Tage länger undwärmer sind als dies für die frühen Phasen der Fall ist. Um die relative Änderung in PAS im Vergleich der Spezies abzuschätzen, müssen allerdings auch die Veränderungen in den Herbstphasen berücksichtigt werden.<br /> Der deutliche Unterschied zwischen frühen und späten Frühjahrsphasen konnte durch die Anwendung einer neuen Methode zur Konstruktion von Zeitreihen herausgearbeitet werden. Der neue methodische Ansatz erlaubte die Ableitung verlässlicher 100-jähriger Zeitreihen und die Konstruktion von lokalen kombinierten Zeitreihen, welche die Datenverfügbarkeit für die Modellentwicklung erhöhten.<br /> Ausser analysierten Protokollierungsfehlern wurden mikroklimatische, genetische und Beobachtereinflüsse als Quellen von Unsicherheit in phänologischen Daten identifiziert. Phänologischen Beobachtungen eines Ortes können schätzungsweise 24 Tage um das parametrische Mittel schwanken.Dies unterstützt die 30-Tage Regel für die Detektion von Ausreissern.<br /> Neue Phänologiemodelle, die den Blattaustrieb aus täglichen Temperaturreihen simulieren, wurden entwickelt. Diese Modelle basieren auf einfachen Interaktionen zwischen aktivierenden und hemmenden Substanzen, welche die Entwicklungsstadien einer Pflanze bestimmen. Im Allgemeinen konnten die neuen Modelle die Beobachtungsdaten besser simulieren als die klassischen Modelle.<br /> <br /> Weitere Hauptresultate waren:<br /> - Der Bias der klassischen Modelle, d.h. Überschätzung von frühen und Unterschätzung von späten Beobachtungen, konnte reduziert, aber nicht vollständig eliminiert werden.<br /> - Die besten Modellvarianten für verschiedene Spezies wiesen darauf hin, dass für die späten Frühjahrsphasen die Tageslänge eine wichtigere Rolle spielt als für die frühen Phasen.<br /> - Die Vernalisation spielte gegenüber den Temperaturen kurz vor dem Blattaustrieb nur eine untergeordnete Rolle. 
/ The length of the vegetation period (VP) plays a central role for the interannual variation of carbon fixation of terrestrial ecosystems. Observational data analysis has indicated that the length of the VP has increased in the last decades in the northern latitudes mainly due to an advancement of bud burst (BB). This phenomenon has been widely discussed in the context of Global Warming because phenology is correlated to temperatures. <br /> <br /> Analyzing the patterns of spring phenology over the last century in Southern Germany provided two main findings:<br /> - The strong advancement of spring phases especially in the decade before 1999 is not a singular event in the course of the 20th century. Similar trends were also observed in earlier decades. Distinct periods of varying trend behavior for important spring phases could be distinguished.<br /> - Marked differences in trend behavior between the early and late spring phases were detected. Early spring phases changed as regards the magnitude of their negative trends from strong negative trends between 1931 and 1948 to moderate negative trends between 1948 and 1984 and back to strong negative trends between 1984 and 1999. Late spring phases showed a different behavior. Negative trends between 1931 and 1948 are followed by marked positive trends between 1948 and 1984 and then strong negative trends between 1984 and 1999.<br /> This marked difference in trend development between early and late spring phases was also found all over Germany for the two periods 1951 to 1984 and 1984 to 1999.<br /> <br /> The dominating influence of temperature on spring phenology and its modifying effect on autumn phenology was confirmed in this thesis. 
However,
- temperature functions determining spring phenology were not significantly correlated with a global annual CO2 signal, which was taken as a proxy for a Global Warming pattern;
- an index for large-scale regional circulation patterns (NAO index) could explain only a small part of the observed phenological variability in spring.

The observed difference in trend behavior between early and late spring phases is explained by the differing behavior of mean March and April temperatures. Mean March temperatures increased on average over the 20th century, accompanied by increasing variation in the last 50 years. April temperatures, however, decreased between the end of the 1940s and the mid-1980s, followed by a marked warming after the mid-1980s.
It can be concluded that the advancement of spring phenology in recent decades is part of multi-decadal fluctuations over the 20th century that vary with the species and the relevant seasonal temperatures. Because of these fluctuations, no correlation with an observed Global Warming signal could be found.
On average, all investigated spring phases advanced by between 5 and 20 days between 1951 and 1999 in all Natural Regions of Germany. A marked difference between late and early spring phases is due to the differing behavior before and after the mid-1980s mentioned above. Leaf coloring (LC) was delayed between 1951 and 1984 for all tree species; after 1984, however, LC advanced. The length of the VP increased between 1951 and 1999 for all considered tree species by an average of ten days throughout Germany.
It is predominantly the change in spring phases that contributes to a change in the potentially absorbed radiation. Additionally, the late spring species are relatively more favored by an advanced BB, because they can also exploit longer days and higher temperatures per day of advancement.
To assess the relative change in potentially absorbed radiation among species, changes in both spring and autumn phenology have to be considered, as well as where in the year these changes occur.
A new time series construction method was developed to detect the marked difference between early and late spring phenology. This method allowed the derivation of reliable time series spanning over 100 years and the construction of locally combined time series, increasing the data available for model development.
Apart from the recording errors analyzed, microclimatic site influences, genetic variation and the observers were identified as sources of uncertainty in phenological observational data. It was concluded that 99% of all phenological observations at a given site vary within approximately 24 days around the parametric mean. This supports the proposed 30-day rule for detecting outliers.
New phenology models that predict local BB from daily temperature time series were developed. These models are based on simple interactions between inhibitory and promotory agents that are assumed to control the developmental status of a plant. Apart from the fact that the new models in general fitted and predicted the observations better than the classical models, the main modeling results were:
- The bias of the classical models, i.e. overestimation of early observations and underestimation of late observations, could be reduced but not completely removed.
- The different model structures favored for each species indicated that photoperiod played a more dominant role for the late spring phases than for the early spring phases.
- Chilling plays only a subordinate role for spring BB compared with temperatures directly preceding BB.
Earth sciences
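The abstract compares its promoter-inhibitor models against "classical models" without specifying them. For orientation only, a minimal sketch of one widely used classical type, a thermal-time (spring-warming) model: bud burst is predicted on the first day on which degree-days accumulated above a base temperature reach a critical sum. This is not the thesis's promoter-inhibitor model, and the start day, base temperature, forcing requirement, function name and input series are all hypothetical.

```python
def predict_bud_burst(daily_temps, t_start=1, t_base=5.0, f_crit=150.0):
    """Predict bud burst (day of year) with a simple thermal-time model.

    daily_temps: mean daily air temperatures in degrees C; index 0 is 1 January.
    t_start:     day of year on which forcing accumulation starts (hypothetical).
    t_base:      base temperature; only the excess above it counts (hypothetical).
    f_crit:      forcing requirement in degree-days (hypothetical).
    Returns None if the requirement is not reached within the series.
    """
    forcing = 0.0
    for doy in range(t_start, len(daily_temps) + 1):
        # Degree-days accrue only on days warmer than the base temperature.
        forcing += max(daily_temps[doy - 1] - t_base, 0.0)
        if forcing >= f_crit:
            return doy
    return None


# Hypothetical spring: 0 degrees C on 1 January, warming by 0.1 degrees C per day.
temps = [0.1 * i for i in range(365)]
print(predict_bud_burst(temps))  # → 106
```

Because only temperatures from `t_start` onward enter the sum, a model of this form has no chilling term at all; the promoter-inhibitor models described in the abstract additionally represent inhibitory agents.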
170	Robust estimation for spatial models and the skill test for disease diagnosis Lin, Shu-Chuan 25 August 2008 (has links) This thesis focuses on (1) statistical methodologies for the robust estimation of spatial models from data containing outliers and (2) the classification accuracy of disease diagnosis. Chapter I, Robust Estimation for Spatial Markov Random Field Models: Markov Random Field (MRF) models are useful in analyzing spatial lattice data collected from semiconductor device fabrication and printed circuit board manufacturing processes or from agricultural field trials. When outliers are present in the data, classical parameter estimation techniques (e.g., least squares) can be inefficient and can mislead the analyst. This chapter extends the MRF model to accommodate outliers and proposes robust parameter estimation methods such as robust M- and RA-estimates. Asymptotic distributions of the estimates are derived for both differentiable and non-differentiable robustifying functions. Extensive simulation studies explore the robustness properties of the proposed methods in situations with various amounts of outliers in different patterns. Analyses of grid data with and without edge information are also provided. Three data sets taken from the literature illustrate the advantages of the methods.
Chapter II, Extending the Skill Test for Disease Diagnosis: For diagnostic tests, we present an extension of the skill plot introduced by Mozer and Briggs (2003). The method is motivated by diagnostic measures for osteoporosis from a study. By restricting the area under the ROC curve (AUC) according to the skill statistic, we obtain an improved diagnostic test for practical applications that takes misclassification costs into account. We also construct relationships between the diseased group and the healthy group, using the Koziol-Green model and the mean-shift model, to improve the skill statistic. Asymptotic properties of the skill statistic are provided.
Simulation studies compare the theoretical results and the estimates under various disease rates and misclassification costs. We apply the proposed method to the classification of osteoporosis data.
True positive rate; False positive rate; Classification; Disease diagnosis; Skill test; Robust estimation; Spatial models; Markov random field models; Spatial lattice data; Koziol-Green model and mean-shift model; Area under the curve; ROC curve; Markov random fields; Lattice theory; Outliers (Statistics)
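Chapter I proposes robust M- and RA-estimates for MRF parameters; those estimators are not reproduced here. As a minimal illustration of the general M-estimation idea only, the sketch below computes a Huber M-estimate of a single location parameter by iteratively reweighted least squares, with the scale held fixed at the normalized median absolute deviation (MAD). The function name, the tuning constant c = 1.345 and the example data are assumptions chosen for illustration.

```python
from statistics import mean, median


def huber_location(x, c=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted least squares.

    x: sequence of observations.
    c: Huber tuning constant (1.345 gives about 95% efficiency at the normal).
    The scale is held fixed at the normalized MAD (1.4826 * MAD).
    """
    mu = median(x)  # robust starting value
    scale = 1.4826 * median([abs(v - mu) for v in x])
    if scale == 0:
        return mu  # MAD is zero (most values identical): fall back to the median
    for _ in range(max_iter):
        # Weight 1 for standardized residuals within [-c, c] and c/|r| beyond,
        # so each outlier's influence w * r is capped at c (bounded influence).
        weights = []
        for v in x:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 30.0]  # hypothetical; 30.0 is an outlier
print(round(mean(data), 2), round(huber_location(data), 2))  # → 12.86 10.04
```

The single outlier drags the least-squares estimate (the sample mean) by almost three units, while the M-estimate stays near the bulk of the data; this is the kind of behavior the chapter's robust estimators aim for in the spatial MRF setting.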
