641 |
Combining Multivariate Statistical Methods and Spatial Analysis to Characterize Water Quality Conditions in the White River Basin, Indiana, U.S.A.
Gamble, Andrew Stephan, 25 February 2011 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / This research performs a comparative study of techniques for combining spatial data and multivariate statistical methods to characterize water quality conditions in a river basin. The study was performed on the White River basin in central Indiana and uses sixteen physical and chemical water quality parameters collected from 44 monitoring sites, along with various spatial data related to land use and land cover, soil characteristics, terrain characteristics, eco-regions, etc. Parameters related to the spatial data were analyzed using ArcHydro tools and included in the multivariate analysis methods to create classification equations that relate spatial and spatio-temporal attributes of the watershed to water quality data at monitoring stations. The study compares the use of various statistical estimates (mean, geometric mean, trimmed mean, and median) of monitored water quality variables to represent annual and seasonal water quality conditions. The relationship between these estimates and the spatial data is then modeled via linear and non-linear multivariate methods. The linear method uses a combination of principal component analysis, cluster analysis, and discriminant analysis, whereas the non-linear method uses a combination of Kohonen self-organizing maps, cluster analysis, and support vector machines. The final models were tested with recent and independent data collected from stations in the Eagle Creek watershed, within the White River basin. In 6 out of 20 models the support vector machine classified the Eagle Creek stations more accurately, and in 2 out of 20 models the linear discriminant analysis model achieved better results; neither the linear nor the non-linear approach had an apparent advantage in the remaining 12 models. This research provides insight into the variability and uncertainty in the interpretation of the various statistical estimates and statistical models when water quality monitoring data are combined with spatial data to characterize general spatial and spatio-temporal trends.
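As a concrete illustration of the linear pipeline described above, the sketch below chains PCA, k-means clustering, and linear discriminant analysis with scikit-learn. It is a minimal sketch on synthetic stand-in data: the number of retained components, the number of clusters, and the random data are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(44, 16))  # stand-in: 44 monitoring sites x 16 water quality estimates

X_std = StandardScaler().fit_transform(X)            # parameters are on different scales
scores = PCA(n_components=5).fit_transform(X_std)    # keep the leading components
classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)  # site groups
lda = LinearDiscriminantAnalysis().fit(scores, classes)  # the "classification equations"
print(lda.score(scores, classes))                    # resubstitution accuracy
```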
|
642 |
Informational attributes behind consumer payment habits and settlement preference / Informativa attribut bakom konsumenternas betalningsvanor och transaktionspreferens
Tchibaline, Alexander; Mårtensson, David, January 2018 (has links)
Sweden is known for being at the forefront of becoming a cashless society. However, cash continues to be an important part of the payment ecosystem, and there are limited studies and data regarding the preference for holding cash. Previous studies have shown that the use of cash is declining in everyday and durable goods retail and that there are behavioral differences between cultures and sociodemographic groups with regard to characteristics and preferences for payment methods. The aim of this study was to investigate the informational attributes that drive the preference for holding cash by studying two extreme cases: Sweden, a pioneer of the cashless society, and Germany, one of the most conservative cash-intensive countries in the Western world. Primary quantitative data were derived from standardized questionnaires issued in both countries by Loomis AB; the data were described and analyzed with applicable statistical procedures. A multivariate analysis was performed on a specific segment of the collected data, where the respondents had been asked to rate cash-associated statements based on their agreement with them. These segments were then examined through a principal component analysis to determine the underlying dimensions of the preferences for cash. Our results suggest that Germans withdrew cash more frequently, across a wider spectrum of denominations, and carried it to a larger extent than Swedes; Swedes made limited withdrawals of small denominations and preferred to carry smaller amounts of cash. The results also show that there were differences in the perception of and preferences for cash between sociodemographic groups, with e.g. older age groups being used to cash and the youngest using it due to 'status quo bias'. The main conclusion is that informational attributes such as security, anonymity, ease of use, and payment infrastructure were the main drivers of the preference for holding cash.
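A minimal sketch of the component-extraction step described above, on a hypothetical ratings matrix (respondents by statements). The matrix size and the Kaiser eigenvalue-greater-than-one retention rule are assumptions for illustration; the abstract does not state the actual retention criterion.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
ratings = rng.integers(1, 8, size=(500, 12)).astype(float)  # stand-in: 500 respondents x 12 Likert items

Z = StandardScaler().fit_transform(ratings)
pca = PCA().fit(Z)
keep = int((pca.explained_variance_ > 1).sum())   # Kaiser criterion (assumed)
loadings = pca.components_[:keep].T * np.sqrt(pca.explained_variance_[:keep])
print(keep, loadings.round(2))                    # inspect loadings to name the dimensions
```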
|
643 |
Automatic Detection of Brain Functional Disorder Using Imaging Data
Dey, Soumyabrata, 01 January 2014 (has links)
Attention Deficit Hyperactivity Disorder (ADHD) has recently been getting a lot of attention, mainly for two reasons. First, it is one of the most commonly found childhood behavioral disorders: around 5-10% of children worldwide are diagnosed with ADHD. Second, the root cause of the problem is still unknown, and therefore no biological measure exists to diagnose ADHD. Instead, doctors need to diagnose it based on clinical symptoms, such as inattention, impulsivity, and hyperactivity, which are all subjective. Functional magnetic resonance imaging (fMRI) data has become a popular tool for understanding the functioning of the brain, such as identifying the brain regions responsible for different cognitive tasks or analyzing the statistical differences in brain functioning between diseased and control subjects, and ADHD is also being studied using fMRI data. In this dissertation we aim to solve the problem of automatic diagnosis of ADHD subjects using their resting-state fMRI (rs-fMRI) data. As a core step of our approach, we model the functions of a brain as a connectivity network, which is expected to capture information about how synchronous different brain regions are in terms of their functional activities. The network is constructed by representing different brain regions as nodes, where any two nodes are connected by an edge if the correlation of the activity patterns of the two nodes is higher than some threshold. The brain regions represented as nodes can be selected at different granularities, e.g. single voxels or clusters of functionally homogeneous voxels. The topological differences between the constructed networks of the ADHD and control groups of subjects are then exploited in the classification approach. We have developed a simple method employing the bag-of-words (BoW) framework for the classification of ADHD subjects. We represent each node in the network by a 4-D feature vector: node degree and 3-D location. The 4-D vectors of all the network nodes of the training data are then grouped into a number of clusters using K-means, where each such cluster is termed a word. Finally, each subject is represented by a histogram (bag) of such words. A support vector machine (SVM) classifier is used for the detection of ADHD subjects from their histogram representations. The method achieves 64% classification accuracy. This simple approach has several shortcomings. First, there is a loss of spatial information when constructing the histogram, because it only counts the occurrences of words, ignoring their spatial positions. Second, features from the whole brain are used for classification, but some brain regions may not contain any useful information and may only increase the feature dimensionality and the noise of the system. Third, in our study we used only one network feature, the degree of a node, which measures the connectivity of the node, while other, more complex network features may be useful for solving the proposed problem. In order to address these shortcomings, we hypothesize that only a subset of the nodes of the network possesses important information for the classification of ADHD subjects, and we have developed a novel algorithm to identify these important nodes. The algorithm generates a different random subset of nodes each time, extracting the features from the subset to compute a feature vector and perform classification.
The subsets are then ranked based on classification accuracy, and the occurrences of each node in the top-ranked subsets are counted. Our algorithm selects the most frequently occurring nodes for the final classification. Furthermore, along with the node degree, we employ three more node features: network cycles, the varying-distance degree, and the edge weight sum. We concatenate the features of the selected nodes in a fixed order to preserve the relative spatial information. Experimental validation suggests that using the features from the nodes selected by our algorithm does help to improve the classification accuracy. Our finding is also in concordance with the existing literature, as the brain regions identified by our algorithm have been found independently by many other studies on ADHD. We achieved a classification accuracy of 69.59% using this approach. However, this method represents each voxel as a node of the network, which makes the number of nodes several thousand; as a result, the network construction step becomes computationally very expensive. Another limitation of the approach is that the network features, which are computed for each node, capture only local structure while ignoring the global structure of the network. Next, in order to capture the global structure of the networks, we use the multi-dimensional scaling (MDS) technique to project all the subjects from an unknown network-space to a low-dimensional space based on their inter-network distance measures. For the purpose of computing the distance between two networks, we represent each node by a set of attributes such as the node degree, the average power, the physical location, the neighbor node degrees, and the average powers of the neighbor nodes. The nodes of the two networks are then mapped in such a way that, over all pairs of nodes, the sum of the attribute distances, which is the inter-network distance, is minimized. To reduce the network computation cost, we ensure that the maximum relevant information is preserved with minimum redundancy. To achieve this, the nodes of the network are constructed from clusters of highly active voxels, where the activity level of a voxel is measured by the average power of its corresponding fMRI time series. Our method shows promise, as we achieve impressive classification accuracies (73.55%) on the ADHD-200 data set. Our results also reveal that the detection rates are higher when classification is performed separately on the male and female groups of subjects. So far, we had only used the fMRI data for solving the ADHD diagnosis problem. Finally, we investigated the following questions. Do structural brain images contain useful information related to the ADHD diagnosis problem? Can the classification accuracy of the automatic diagnosis system be improved by combining the information of the structural and functional brain data? Toward that end, we developed a new method to combine the information of structural and functional brain images in a late fusion framework. For structural data we input the gray matter (GM) brain images to a convolutional neural network (CNN). The output of the CNN is a feature vector per subject, which is used to train the SVM classifier. For the functional data we compute the average power of each voxel based on its fMRI time series; the average power of a voxel's fMRI time series measures its activity level.
We found significant differences in the voxel power distribution patterns of the ADHD and control groups of subjects. The local binary pattern (LBP) texture feature is applied to the voxel power map to capture these differences. We achieved 74.23% accuracy using GM features, 77.30% using LBP features, and 79.14% using the combined information. In summary, this dissertation demonstrates that structural and functional brain imaging data are useful for the automatic detection of ADHD subjects, as we achieve impressive classification accuracies on the ADHD-200 data set. Our study also helps to identify the brain regions that are useful for ADHD subject classification. These findings can help in understanding the pathophysiology of the disorder. Finally, we expect that our approaches will contribute toward the development of a biological measure for the diagnosis of ADHD.
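A minimal sketch of the bag-of-words classification step described in this abstract, on synthetic stand-in networks. The subject count, node count, codebook size, and SVM kernel are assumptions for illustration, not the dissertation's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Stand-in data: 40 subjects, each a network of 200 nodes, every node
# described by a 4-D vector (node degree, x, y, z location)
subjects = [np.column_stack([rng.poisson(10, 200),
                             rng.uniform(0, 60, size=(200, 3))])
            for _ in range(40)]
labels = rng.integers(0, 2, size=40)               # 0 = control, 1 = ADHD

codebook = KMeans(n_clusters=16, n_init=10, random_state=0)
codebook.fit(np.vstack(subjects))                  # cluster node vectors into "words"

def bow_histogram(net):
    words = codebook.predict(net)
    return np.bincount(words, minlength=16) / len(words)

X = np.array([bow_histogram(s) for s in subjects])  # one histogram (bag) per subject
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.score(X, labels))                         # resubstitution accuracy
```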
|
644 |
Risk Measurement and Performance Attribution for IRS Portfolios Using a Generalized Optimization Method for Term Structure Estimation
Gerdin Börjesson, Fredrik; Eduards, Christoffer, January 2021 (has links)
With the substantial size of the interest rate markets, the importance of accurate pricing, risk measurement, and performance attribution cannot be overstated. However, the models used in these markets often have underlying issues with capturing the market's fundamental behavior. With this thesis, we aim to improve the pricing, risk measurement, and performance attribution of interest rate swap portfolios. The paper is divided into six main parts, by subject, to aid in achieving these goals. To begin with, we validate all cash flows with SEB to increase the validity of the results. Next, we implement an optimization-based model developed by Jörgen Blomvall to estimate multiple yield curves. From innovations of the daily in-sample curves, risk factors are computed with principal component analysis. These risk factors are then used to simulate one-day-ahead and ten-day-ahead scenarios for the multiple yield curves using a Monte Carlo method, and risk measures are computed from the simulated scenarios. When backtested, these risk measurements give an indication of the overall accuracy of the methodology, including the estimated curves, the derived risk factors, and the simulation methodology. Along with the simulation, monetary performance attribution for the portfolios is performed on each out-of-sample day. The performance attribution indicates what drives the value change in the portfolio and can be used to evaluate the estimated yield curves and derived risk factors. The risk measurement and performance attribution are done for three different portfolios of interest rate swaps on the EUR, USD, and SEK markets; however, the risk factors are estimated only from EUR data and used for all portfolios. The main difference from previous work in this area is that a multiple yield curve environment is studied for all implementations. Different PCA algorithms are evaluated to increase the precision and speed of the risk factor calculation. Mean-reverting risk factors are developed in the simulation framework, along with a Latin hypercube sampling method that accounts for dependence in the random variables to reduce variance. We also study the EUR and SEK markets, while the focus in previous literature is on the USD market. Lastly, we calculate and backtest the risk measures value-at-risk and expected shortfall for one-day and ten-day horizons. Four different PCA methods are implemented: a bidiagonal divide-and-conquer SVD algorithm, a randomized SVD method, an Arnoldi method, and an optimization-based PCA algorithm. We opt for the first due to its high accuracy and its ability to calculate all eigenpairs; however, we recommend using the Arnoldi method in future implementations and further studying the optimization-based method. The Latin hypercube sampling with dependence method produces random variables with the same correlation as the input variables. In the simulation, we produce results that pass all backtests for the risk measures for the USD portfolio. For the EUR and SEK portfolios, the risk measures are shown to be too conservative. The results of the mean reversion method indicate that it produces slightly less conservative estimates for the ten-day horizon. In the performance attribution, we show that we are able to produce results with small error terms, indicating accurately estimated term structures, risk factors, and pricing.
We conclude that we partly fulfill the stated purpose of this thesis, having produced accurate pricing and satisfactory performance attribution results for all portfolios, and stable risk measures for the USD portfolio. However, it is not possible to state with certainty that improved risk measurements have been achieved for the EUR and SEK portfolios, so we present several alternative approaches to remedy this in future implementations.
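A minimal sketch of the risk-factor extraction and Monte Carlo risk measurement chain described above, on synthetic stand-in data. The curve dimension, factor count, scenario count, and bucketed sensitivities are illustrative assumptions, and the thesis's multiple-curve setting, mean reversion, and Latin hypercube sampling are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in: 1000 daily innovations of a 20-point zero-coupon curve (in basis points)
A = rng.normal(size=(20, 20))
innov = rng.multivariate_normal(np.zeros(20), A @ A.T / 400, size=1000)

# Risk factors via PCA (eigendecomposition of the sample covariance)
cov = np.cov(innov, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1][:3]
E, lam = vecs[:, order], vals[order]        # 3 factors: level, slope, curvature

# Monte Carlo one-day scenarios and portfolio risk measures
z = rng.standard_normal((10_000, 3))
scenarios = (z * np.sqrt(lam)) @ E.T        # simulated curve shifts
dv01 = rng.uniform(-1.0, 1.0, size=20)      # hypothetical bucketed sensitivities
pnl = scenarios @ dv01
var99 = -np.quantile(pnl, 0.01)             # 99% value-at-risk
es99 = -pnl[pnl <= -var99].mean()           # 99% expected shortfall
print(round(var99, 2), round(es99, 2))
```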
|
645 |
The linguistic and cognitive mechanisms underlying language tests in healthy adults : a principal component analysis
Bresolin Goncalves, Ana Paula, 04 1900 (has links)
For a more accurate and time-efficient language assessment process, it is important to identify the cognitive mechanisms that sustain commonly used language tasks. One way to do so is to explore the shared variance across language tasks using principal component analysis. Few studies have applied this technique to investigate these mechanisms in normal language functioning. Our main goal was therefore to explore how a set of language tasks group together, in order to investigate the underlying cognitive mechanisms of commonly used tasks. We assessed 201 healthy participants aged between 18 and 75 years (mean = 45.29, SD = 15.06) and with a formal education of between 5 and 23 years (mean = 11.10, SD = 4.68); of these, 62.87% were female. We used two language batteries: the Montreal-Toulouse language assessment battery and the Montreal Communication Evaluation Battery, brief version. Using a principal component analysis with a direct-oblimin rotation, we identified four language components: pictorial semantics (auditory comprehension, naming, and written naming tasks), language-executive (unconstrained, semantic, and phonological verbal fluency tasks), transcoding and semantics (reading, dictation, and semantic judgment tasks), and pragmatics (indirect speech act interpretation and metaphor interpretation tasks). These four components explained 59.64% of the total variance. Secondarily, we verified the association between these components and two executive measures in a subset of 33 participants. Cognitive flexibility was assessed by the time B minus time A score of the Trail Making Test, and working memory by the total number of correct answers on the n-back test. The language-executive component was associated with a better cognitive flexibility score (r = -.355) and the transcoding and semantics component with better working memory performance (r = .397). Our findings confirm the heterogeneity of the processes underlying language tasks and their intrinsic relationship to other cognitive components, such as executive functions.
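A minimal sketch of a principal component extraction with direct-oblimin rotation, using the third-party factor_analyzer package on a hypothetical stand-in score matrix. The task count and the use of the 'principal' extraction method are assumptions for illustration, not the study's exact settings.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # third-party: pip install factor-analyzer

rng = np.random.default_rng(4)
scores = rng.normal(size=(201, 11))      # stand-in: 201 participants x 11 task scores

# Direct-oblimin is an oblique rotation, so the extracted components
# are allowed to correlate, which is plausible for language measures
fa = FactorAnalyzer(n_factors=4, rotation="oblimin", method="principal")
fa.fit(scores)
print(fa.loadings_.round(2))             # pattern matrix: task-to-component loadings
print(fa.get_factor_variance()[2][-1])   # cumulative proportion of variance explained
```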
|
646 |
Market Surveillance Using Empirical Quantile Model and Machine Learning / Marknadsövervakning med hjälp av empirisk kvantilmodell och maskininlärning
Landberg, Daniel, January 2022 (has links)
In recent years, financial trading has become more accessible. This has led to more market participants and more trades taking place each day. The increased activity also implies an increasing number of abusive trades, and market surveillance systems are developed and used to detect them. In this thesis, two different methods were tested for detecting abusive trades in high-dimensional data. One was based on empirical quantiles, and the other on an unsupervised machine learning technique called isolation forest. The empirical quantile method applies empirical quantiles to dimensionally reduced data to determine whether a datapoint is an outlier. Principal component analysis (PCA) is used to reduce the dimensionality of the data and handle the correlation between features. Isolation forest is a machine learning method that detects outliers by sorting each datapoint into a tree structure; if a datapoint ends up close to the root, it is more likely to be an outlier. Isolation forest has been proven to successfully detect outliers in high-dimensional datasets but had not previously been tested for market surveillance. The performance of both methods was evaluated using recall and run time. The conclusion was that the empirical quantile method did not detect outliers accurately when all dimensions of the data were used; it most likely suffered from the curse of dimensionality and could not handle high-dimensional data. However, its performance increased when the dimensionality was reduced. Isolation forest performed better than the empirical quantile method and detected 99% of all outliers by classifying 226 datapoints as outliers in a dataset of 1882 datapoints containing 184 true outliers.
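A minimal sketch of the two detectors compared above, on synthetic stand-in data with the abstract's proportions (1882 points, 184 true outliers). The feature count, quantile cutoffs, component count, and contamination rate are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# Stand-in: 1882 trades x 15 features, the last 184 of them anomalous
X = np.vstack([rng.normal(size=(1698, 15)),
               rng.normal(loc=4.0, size=(184, 15))])

# Empirical-quantile detector on PCA-reduced data
scores = PCA(n_components=3).fit_transform(X)
lo, hi = np.quantile(scores, [0.01, 0.99], axis=0)
quantile_flags = ((scores < lo) | (scores > hi)).any(axis=1)

# Isolation forest: points isolated near the root of random trees are outliers
iso = IsolationForest(contamination=0.1, random_state=0).fit(X)
iso_flags = iso.predict(X) == -1

truth = np.arange(len(X)) >= 1698
print(quantile_flags[truth].mean(), iso_flags[truth].mean())  # recall of each method
```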
|
647 |
Ecosystem services in a rural landscape of southwest Ohio
Lin, Meimei, 10 December 2012 (has links)
No description available.
|
648 |
Learning Latent Temporal Manifolds for Recognition and Prediction of Multiple Actions in Streaming Videos using Deep Networks
Nair, Binu Muraleedharan, 03 June 2015 (has links)
No description available.
|
649 |
[pt] EFEITO DAS INTERVENÇÕES DO BCB NA CURVA DE CUPOM CAMBIAL / [en] THE EFFECT OF BRAZIL CENTRAL BANK'S INTERVENTIONS ON THE CUPOM CAMBIAL CURVE
VICTOR AUGUSTO MESQUITA CRAVEIRO, 05 February 2020 (links)
[en] In this study, we estimate the impact on the cupom cambial curve of the most recent and most widely adopted currency intervention measure of the Central Bank of Brazil (BCB): the issuance of foreign exchange swaps. The BCB's objective with this intervention was to provide the private sector with a hedge against exchange rate volatility. This paper focuses on the effect of these measures on the cupom cambial curve because of the importance of this curve for the correct pricing of the future dollar market, given that, in Brazil, the exchange rate is formed in the future dollar price rather than the spot price, as is more common in other countries. Through a principal component analysis of the cupom cambial curve, we extract its first three components (level, slope, and curvature) and regress them on independent variables representing the series of swaps issued by the Central Bank. The results indicate that foreign exchange swaps generate significant changes in the overall level of the cupom cambial curve. Reverse swaps, in contrast, show no statistically significant impact on the level but do affect the slope of the curve.
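A minimal sketch of the two-stage analysis described above (PCA of the curve, then regressing a component on intervention variables), on synthetic stand-in data. The curve dimensions, the random walk for rates, and the swap issuance series are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
# Stand-in: 500 days x 10 maturities of the cupom cambial curve (in %)
curve = 3.0 + np.cumsum(0.02 * rng.normal(size=(500, 10)), axis=0)
swaps = rng.normal(size=500)                       # hypothetical daily net swap issuance

changes = np.diff(curve, axis=0)                   # daily curve changes
vals, vecs = np.linalg.eigh(np.cov(changes, rowvar=False))
pcs = changes @ vecs[:, ::-1][:, :3]               # level, slope, curvature scores

X = sm.add_constant(swaps[1:])                     # align regressor with differenced series
model = sm.OLS(pcs[:, 0], X).fit()                 # effect of issuance on the level factor
print(model.params, model.pvalues)
```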
|
650 |
Identifying factors that correlate with Enhanced Biological Phosphorus Removal upsets at Lundåkraverket / Undersökning av faktorer som påverkar biologisk fosforavskiljning vid Lundåkraverket
Niranjan, Rounak, January 2021 (links)
The enhanced biological phosphorus removal (EBPR) process is characterized as the most sustainable process for removing phosphorus from wastewater, albeit with high variability in performance efficiency. Unpredictable upsets in the EBPR system are thus the norm across several wastewater treatment plants throughout Sweden, forcing operators to dose higher volumes of chemicals to meet the effluent requirements. As future effluent requirements are getting stricter, and since higher chemical usage is environmentally and economically unsustainable, this investigation was set up to evaluate which environmental, operational, and/or wastewater characteristics correlate with EBPR upsets at a full-scale wastewater treatment plant (WWTP), specifically the Lundåkra WWTP operated by Nordvästra Skånes Vatten och Avlopp (NSVA). The data used in the investigation were collected between 1 January 2018 and 31 December 2020 for a large number of parameters known to play a key role in biological phosphorus removal. Online sensors as well as external and internal analyses contributed to the data, which included parameters such as total flow at the plant, pH of the incoming water, temperature in the aeration basins, dissolved oxygen (DO) levels in the aeration basins, nitrate in the aeration basins, and sludge content in the aeration basins. Other relevant parameters, such as the hydraulic retention time (HRT) in the treatment units, the sludge retention time (SRT) in the aeration basin, and the organic loading rate (OLR), were calculated. Before the start of this investigation, two possible explanations were presumed: (i) upsets as a result of unsuitable environmental conditions and/or errors in the operational strategy at the plant, and (ii) upsets as a result of toxicity, specifically from higher concentrations of metals in the influent. Traditional statistical methods, namely t-distributed stochastic neighbor embedding (t-SNE), Spearman rank correlation, and principal component analysis, were used to test the first presumed explanation. The t-SNE plot showed that the upsets did not cluster into one large group but instead clumped into smaller groups scattered across the length of the scale in both dimensions. This points toward the multivariate dependency of the EBPR process and shows that upsets can occur even under an operational strategy that otherwise produces good results. This, in turn, suggests that a non-included parameter, such as the daily metal concentrations in the influent, could be responsible for some or all of the upsets. The principal component analysis (PCA) plot, although noisy, suggested an improvement strategy built around the key variables, namely nitrate in aeration basins 1 and 2, sludge content in the aeration basin, SRT in the aeration basin, O2 in aeration basins 1 and 2, and pH of the incoming water; it is therefore recommended that an improvement strategy be devised around them. Multiple causal factors increase the complexity of the analysis by decreasing the correlation coefficients; however, incorporating scatterplots presents a clearer picture. The parameters nitrate in aeration basins 1 and 2 and sludge content in the aeration basin showed the strongest correlations with phosphate values at the end of biological treatment, at -0.32 and 0.42, respectively. The results also open the door to future research and provide direction for further investigations.
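A minimal sketch of the correlation and embedding analyses described above, on a synthetic stand-in dataset. The parameter count, the synthetic relationship to phosphate, and the upset threshold are illustrative assumptions, not the plant's actual data.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

rng = np.random.default_rng(7)
# Stand-in: three years of daily plant data, 10 process parameters
X = rng.normal(size=(1096, 10))
phosphate = 0.4 * X[:, 3] - 0.3 * X[:, 5] + rng.normal(scale=0.5, size=1096)

# Spearman rank correlation of each parameter with effluent phosphate
rho, _ = spearmanr(np.column_stack([X, phosphate]))
print(rho[-1, :-1].round(2))                      # per-parameter correlation coefficients

# 2-D t-SNE embedding to see whether upset days form one cluster
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    StandardScaler().fit_transform(X))
upsets = phosphate > np.quantile(phosphate, 0.9)  # assumed upset definition
print(emb[upsets][:5])                            # coordinates of upset days (plot in practice)
```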
|