Global ETD Search

61	Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human Perspective Coleman, Ashley B. January 2015 No description available. Computer Science Feature Extraction Dimensionality Reduction Principal Component Analysis Multi-dimensional Scaling Isomap Kernel Principal Component Analysis
62	Utilização de análise de componentes principais em séries temporais / Use of principal component analysis in time series Teixeira, Sérgio Coichev 12 April 2013 One of the main objectives of principal component analysis (PCA) is to reduce the number of observed variables to a smaller set of uncorrelated variables, called principal components, which help the researcher understand the variability and correlation structure of the observed data. The technique is simple and widely used across many fields. The components are constructed from the linear relationships among the observed variables, as measured by the covariance matrix or the correlation matrix. However, these matrices can fail to capture important information when the data are sequentially correlated in time (autocorrelated), so part of the information needed to interpret the components is lost. This work studies PCA techniques that make it possible to interpret or analyse the autocorrelation structure of the observed data. To that end, it explores PCA in the frequency domain, which gives more specific and detailed results for autocorrelated data than classical PCA. In the SSA (Singular Spectrum Analysis) and MSSA (Multichannel Singular Spectrum Analysis) methods, the principal component analysis is based on correlation over time and between the different observed variables. These techniques are widely used on atmospheric data to identify patterns such as trend and periodicity. Principal Component Analysis SSA MSSA
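The abstract's point that a covariance matrix ignores autocorrelation can be illustrated with a small sketch. It is not taken from the thesis: the two series, the seed and the helper names `covariance` and `lag1_autocorrelation` are assumptions made for illustration. Shuffling the time order of both series with the same permutation leaves their covariance unchanged, while the lag-1 autocorrelation of the series collapses.

```python
import math
import random


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation of a sequence."""
    n = len(x)
    mean_x = sum(x) / n
    num = sum((x[t] - mean_x) * (x[t + 1] - mean_x) for t in range(n - 1))
    den = sum((v - mean_x) ** 2 for v in x)
    return num / den


# Two hypothetical series built on a slow sine wave (strongly autocorrelated).
time_axis = range(200)
x = [math.sin(2 * math.pi * t / 50) for t in time_axis]
y = [0.5 * v + 0.1 * math.cos(t) for t, v in zip(time_axis, x)]

# Apply the same random permutation to the time order of both series.
rng = random.Random(0)
order = list(time_axis)
rng.shuffle(order)
x_shuffled = [x[i] for i in order]
y_shuffled = [y[i] for i in order]

cov_original = covariance(x, y)
cov_shuffled = covariance(x_shuffled, y_shuffled)    # same up to rounding
acf_original = lag1_autocorrelation(x)               # close to 1
acf_shuffled = lag1_autocorrelation(x_shuffled)      # close to 0
```

Because classical PCA sees only the covariance (or correlation) matrix, it would return the same components for the original and the shuffled data, which is the motivation the abstract gives for frequency-domain PCA, SSA and MSSA.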
64	Financial Time Series Analysis using Pattern Recognition Methods Zeng, Zhanggui January 2008 Doctor of Philosophy / This thesis presents research on financial time series analysis using pattern recognition methods. The first part focuses on univariate time series analysis. First, the probabilities of basic patterns are used to represent the features of a section of a time series. This feature suppresses noise in the time series through statistical probability, and experiments show that it works well for time series with repeated patterns. Second, a multiscale Gaussian gravity is introduced to pattern clustering as a measure of pattern relationship that can also describe the direction of that relationship. By searching for the Gaussian-gravity-guided nearest neighbour of each pattern, this clustering method can easily determine the boundaries of the clusters. Third, a method is presented by which unsupervised pattern classification can be transformed into multiscale supervised pattern classification using multiscale supervisory time series or multiscale filtered time series. The second part focuses on multivariate time series analysis using pattern recognition. A systematic method is proposed to find the independent variables of a group of share prices by time series clustering, principal component analysis, independent component analysis, and object recognition. Time series clustering and principal component analysis reduce the number of dependent variables and simplify the multivariate analysis. Independent component analysis aims to find the ideal independent variables of the group of shares. Object recognition is expected to identify those independent variables that are similar to the independent components. This method offers a new clue to understanding the stock market and to modelling a large time series database. 
Financial time series Pattern recognition Probabilistic relaxation clustering Multi-scale pattern clustering Principal component analysis Bayesian classification Independent component analysis Object matching
66	A Framework For Analysing Investable Risk Premia Strategies / Ett ramverk för analys av investerbarariskpremiestrategier Sandqvist, Joakim, Byström, Erik January 2014 The focus of this study is to map, classify and analyse how different fully implementable risk premia strategies perform and how they are affected by different economic environments. The results are of interest to practitioners who currently invest in, or are considering investing in, risk premia strategies. The study also makes a theoretical contribution, since there is currently little published work on this subject. A combination of the statistical methods cluster tree, spanning tree and principal component analysis is used, first to categorise the investigated risk premia strategies into clusters based on their correlation characteristics and second to find the strategies’ most important return drivers. Lastly, the study analyses how the clusters of strategies perform in different macroeconomic environments, here represented by inflation and growth. The results show that the three most important drivers for the investigated risk premia strategies are a crisis factor, an equity directional factor and an interest rate factor. These three components explained about 18 percent, 14 percent and 10 percent of the variation in the data, respectively. The results also show that all four clusters, despite containing different types of risk premia strategies, experienced positive total returns during all macroeconomic phases sampled in this study. These results can be seen as indicative of a lower macroeconomic sensitivity among the risk premia strategies and more of an “alpha-like” behaviour. Risk premia cluster tree spanning tree principal component analysis macroeconomics Economics and Business
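Figures such as "18 percent, 14 percent and 10 percent of the variation" are each component's eigenvalue divided by the sum of all eigenvalues of the covariance or correlation matrix. A minimal sketch follows. The eigenvalues are hypothetical and not taken from the thesis, and the function names are illustrative.

```python
from itertools import accumulate


def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component, largest first."""
    total = sum(eigenvalues)
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


def cumulative_explained_variance(eigenvalues):
    """Running total of the explained-variance shares."""
    return list(accumulate(explained_variance_ratio(eigenvalues)))


# Hypothetical eigenvalues of a 4x4 correlation matrix (they sum to 4).
hypothetical_eigenvalues = [0.6, 2.0, 0.4, 1.0]
ratios = explained_variance_ratio(hypothetical_eigenvalues)
cumulative = cumulative_explained_variance(hypothetical_eigenvalues)
```

With these made-up inputs the leading component carries half of the total variance. In the study, the three leading components together account for roughly 42 percent.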
67	A Multi-Level Extension of the Hierarchical PCA Framework with Applications to Portfolio Construction with Futures Contracts / En flernivåsutbyggnad av ramverket för Hierarkisk PCA med tillämpningar på portföljallokering med terminskontrakt Bjelle, Kajsa January 2023 With an increasingly globalised market and a growing asset universe, estimating the market covariance matrix becomes ever more challenging. In recent years, methods aimed at mitigating these issues have been developed extensively. This thesis takes its starting point in the recently developed Hierarchical Principal Component Analysis, in which a priori known information is taken into account when modelling the market correlation matrix. While it shows promising results, the current framework only allows for fairly simple hierarchies with a depth of one. This thesis introduces a generalisation of the framework that allows for an arbitrary hierarchical depth and evaluates the method in a risk-based portfolio allocation setting with futures contracts. It also introduces a shrinkage method called Hierarchical Shrinkage, which uses the hierarchical structure to further regularise the matrix. The proposed models are evaluated with respect to how well-conditioned they are, how well they predict eigenportfolio risk, and how portfolios perform when the models are used to form the Minimum Variance Portfolio. The proposed models result in sparse and easy-to-interpret eigenvector structures, improved risk prediction, lower condition numbers and longer holding periods, while achieving Sharpe ratios on par with the benchmarks. Portfolio construction asset allocation principal component analysis hierarchical principal component analysis hierarchical shrinkage eigenportfolio risk Other Mathematics
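For context on why the conditioning of the estimated matrix matters, the textbook fully invested minimum variance portfolio has weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1), so any instability in inverting Σ feeds directly into the weights. The sketch below uses that standard formula, with short positions allowed. It is an assumption that the thesis uses this form, since the abstract does not state its exact formulation, and the helper names and the covariance matrix are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def min_variance_weights(cov):
    """Weights proportional to inverse(cov) @ ones, rescaled to sum to 1."""
    raw = solve_linear_system(cov, [1.0] * len(cov))
    total = sum(raw)
    return [v / total for v in raw]


# Hypothetical covariance of two uncorrelated assets with variances 1 and 4.
weights = min_variance_weights([[1.0, 0.0], [0.0, 4.0]])
```

For uncorrelated assets the weights are proportional to the inverse variances, so the lower-variance asset receives four times the weight of the other.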
68	Understanding particulate matter - Material analyses of real-life diesel particulate filters and correlation to vehicles’ operational data / Att förstå partiklar - Analyser av verkliga dieselpartikelfilter och korrelationer till fordonsdriftparametrar Nordin, Linus January 2021 The purpose of this study was to investigate the impact of operational parameters on a number of measurable ash-related quantities in the diesel particulate filters (DPFs) of heavy-duty vehicles. Previous studies show that ash packing density, ash flow and the distribution of ash inside a DPF depend on parameters such as temperature, exhaust flow and how much oil a vehicle consumes. There is reason to believe that these parameters are also affected by how a vehicle is operated, which is why different operational parameters were analysed for correlation with the measured ash quantities. The operational parameters investigated were average speed, number of stops per 100 km, idling percentage and fuel consumption. The study began with method development for measuring the ash weight of DPFs and compared three methods, named I, II and III. Method II, which weighs a piece of filter substrate before and after the ash is cleaned out of it with pressurized air, was chosen as the most reliable and useful: it was faster, needed less of each DPF to complete the analysis, and could be used on DPF samples that had not been examined before they were used in a vehicle. The ash weight, together with the volumetric filling degree and the known inlet volume of the DPF, was used to calculate the ash packing density. The filling degree and ash distribution profile were measured by image analysis of microscope images of sawed cross sections of the filter piece. The correlation study was then carried out with these methods, and the measurements were correlated with operational data extracted from databases at Scania CV. To study which parameters were correlated with each other, a principal component analysis (PCA) was performed with the operational and measured variables as a data matrix. The PCA showed that three principal components (PCs) accounted for >90 % of the variation in the data and that the plug/wall ratio, a numerical value for the ash distribution, was strongly positively correlated with a vehicle's average speed and negatively correlated with number of stops, idling percentage and fuel consumption. Ash flow showed a weaker positive correlation with idling percentage, number of stops and fuel consumption, while oil consumption showed an even weaker correlation with these parameters. This indicates that oil consumption should not be treated as a constant percentage of fuel consumption for all vehicles when calculating DPF service intervals. The ash packing density showed no to very low correlation with any other variable in the study. This may be because the DPFs with a high percentage of wall ash had seen significantly less use than the other samples, so the ash may not have had time to pack tightly in the filter channels. Diesel particulate filter Ash measurement method development Correlation study Operational parameters Principal component analysis Chemical Engineering
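The abstract says the ash packing density was calculated from the ash weight, the volumetric filling degree and the inlet volume, but it does not give the formula. One plausible reading, stated here as an assumption, is mass divided by the ash-filled volume, where the ash-filled volume is the filling degree times the inlet volume. The function name, the units and the numbers below are illustrative only.

```python
def ash_packing_density(ash_mass_g, filling_degree, inlet_volume_l):
    """Ash packing density in g/L, assuming density = mass / (fill * volume).

    ash_mass_g      -- ash mass in grams
    filling_degree  -- fraction of the inlet volume occupied by ash (0..1]
    inlet_volume_l  -- inlet channel volume in litres
    """
    if not 0.0 < filling_degree <= 1.0:
        raise ValueError("filling_degree must be in the interval (0, 1]")
    if inlet_volume_l <= 0.0:
        raise ValueError("inlet_volume_l must be positive")
    ash_volume_l = filling_degree * inlet_volume_l
    return ash_mass_g / ash_volume_l


# Hypothetical sample: 50 g of ash filling a quarter of a 10 L inlet volume.
density = ash_packing_density(50.0, 0.25, 10.0)
```

Under this reading, an error in the measured filling degree propagates inversely into the density, which is one reason the image-analysis step matters.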
69	Daily pattern recognition of dynamic origin-destination matrices using clustering and kernel principal component analysis / Daglig mönsterigenkänning av dynamiska Origin-Destination-matriser med hjälp av clustering och kernel principal component analysis Dong, Zhiwu January 2021 The Origin-Destination (OD) matrix plays an important role in traffic management and urban planning. However, OD estimation demands large-scale data collection, which in the past was done mostly through surveys with numerous limitations. With the development of communication technology and artificial intelligence, the transportation industry faces new opportunities and challenges. Sensors bring big data characterized by the 4V (Volume, Variety, Velocity, Value) to the transportation domain. This allows traffic practitioners to receive data covering large areas and long time periods, even several years. At the same time, artificial intelligence provides new opportunities and challenges in processing massive data, and advances in computer science have brought revolutionary progress to the field of transportation. Together, these technologies enable large-scale data collection that can be used to extract and estimate dynamic OD matrices for small time intervals and long time periods. Using Stockholm as a case study, this thesis estimates dynamic OD matrices from data collected at the tolls located around Stockholm municipality. These dynamic OD matrices are used to analyse the day-to-day characteristics of the traffic flow through Stockholm. In other words, the typical day-types of traffic through the city centre are identified and studied. The study analyses data collected by 58 sensors around Stockholm, containing nearly 100 million vehicle observations (12 GB). It also studies how dimensionality reduction affects the identification of the most common day-types by clustering. The dimensionality reduction techniques considered are Principal Component Analysis (PCA) and its variant Kernel PCA (KPCA). The results show that dimensionality reduction significantly reduces computational cost while still producing reasonable day-types. The day-type clusters reveal expected as well as unexpected patterns and could therefore be useful in traffic management, urban planning, and the design of congestion tax strategy. origin-destination pattern recognition principal component analysis kernel principal component analysis k-means cluster Transport Systems and Logistics
70	Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study Bonner, Ashley J. 10 1900 Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become flooded and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but the methods have not been tested in comparative detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. Sparse PCA methods were also applied to a real gene expression dataset. Results: All Sparse PCA methods showed improvements upon classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. The optimal choice of Sparse PCA method differs as within-group correlation and across-group variances vary; thankfully, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, concise groups of gene expressions were detected with the most sparse methods. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data. / Master of Science (MSc) Principal Component Analysis (PCA) High Dimensional Data Simulations Loading Vectors Tuning Parameters Applied Statistics Biostatistics Multivariate Analysis Statistical Methodology
