311 |
Analysis of Transactional Data with Long Short-Term Memory Recurrent Neural Networks
Nawaz, Sabeen January 2020 (has links)
An issue authorities and banks face is fraud related to payments and transactions, where huge monetary losses occur to a party or where money laundering schemes are carried out. Previous work in the field of machine learning for fraud detection has addressed the issue as a supervised learning problem. In this thesis, we propose a model which can be used in a fraud detection system with transactions and payments that are unlabeled. The proposed model is a Long Short-Term Memory in an auto-encoder decoder network (LSTM-AED), which is trained and tested on transformed data. The data is transformed by reducing it to principal components and clustering it with K-means. The model is trained to reconstruct the sequence with high accuracy. Our results indicate that the LSTM-AED performs better than a random sequence generating process in learning and reconstructing a sequence of payments. We also found that a huge loss of information occurs in the pre-processing stages. / Unauthorized transactions and payment fraud can lead to large financial losses for banks and authorities. In machine learning, this problem has previously been handled with classifiers via supervised learning. In this thesis we propose a model that can be used in a system for detecting fraud. The model is applied to unlabeled data with many different variables. The model used is a Long Short-Term Memory in an auto-encoder decoder network. The data is transformed with PCA and clustered with K-means. The model is trained to reconstruct a sequence of payments with high accuracy. Our results show that the LSTM-AED performs better than a model that only guesses the next point in the sequence. The results also show that much of the information in the data is lost when it is pre-processed and transformed.
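A minimal sketch of the preprocessing and encoder-decoder idea described above, assuming a scikit-learn and TensorFlow/Keras stack; the feature matrix, sequence length, layer sizes and cluster count are illustrative assumptions rather than the configuration used in the thesis:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from tensorflow.keras import layers, models

# Hypothetical transaction table: rows = payments, columns = numeric attributes.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))

# Reduce to principal components and cluster them with K-means, as in the abstract.
pcs = PCA(n_components=5).fit_transform(X)
clusters = KMeans(n_clusters=8, n_init=10).fit_predict(pcs)
feats = np.column_stack([pcs, clusters.astype(float)])   # how the clusters enter is an assumption

# Group consecutive payments into fixed-length sequences for the LSTM.
seq_len, n_feat = 10, feats.shape[1]
usable = (len(feats) // seq_len) * seq_len
seqs = feats[:usable].reshape(-1, seq_len, n_feat)

# Encoder-decoder: compress each sequence to a latent vector, then reconstruct it.
model = models.Sequential([
    layers.Input(shape=(seq_len, n_feat)),
    layers.LSTM(32),                                # encoder
    layers.RepeatVector(seq_len),                   # repeat the latent state per timestep
    layers.LSTM(32, return_sequences=True),         # decoder
    layers.TimeDistributed(layers.Dense(n_feat)),
])
model.compile(optimizer="adam", loss="mse")
model.fit(seqs, seqs, epochs=5, batch_size=32, verbose=0)

# Sequences the model reconstructs poorly are the candidates flagged for review.
recon = model.predict(seqs, verbose=0)
anomaly_score = np.mean((recon - seqs) ** 2, axis=(1, 2))
```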
|
312 |
Deriving the Time-Course of the Dominant Frequency of Atrial Fibrillation from a Long Term in vivo Sheep Model using QRST Removal Techniques
Price, Nicholas F. January 2011 (has links)
No description available.
|
313 |
Nonlinear Wavelet Compression Methods for Ion Analyses and Dynamic Modeling of Complex Systems
Cao, Libo January 2004 (has links)
No description available.
|
314 |
Facial Expression Recognition by Using Class Mean Gabor Responses with Kernel Principal Component Analysis
Chung, Koon Yin C. 16 April 2010 (has links)
No description available.
|
315 |
Characterizing the Quaternary Hydrostratigraphy of Buried Valleys using Multi-Parameter Borehole Geophysics, Georgetown, Ontario
Brennan, Andrew N. 10 1900 (has links)
In 2009, the Regional Municipality of Halton and McMaster University initiated a 2-year collaborative study (Georgetown Aquifer Characterization Study-GACS) of the groundwater resource potential of Quaternary sediments near Georgetown, Ontario. As part of that study, this thesis investigated the Quaternary infill stratigraphy of the Middle Sixteen Mile Creek (MSMC) and Cedarvale (CV) buried valley systems using newly acquired core and borehole geophysical data. Multi-parameter geophysical log suites (natural gamma, EM conductivity, resistivity, magnetic susceptibility, full-waveform sonic, caliper) were acquired in 16 new boreholes (16 m to 55 m depth), in pre-existing monitoring wells and from archival data. Characteristic log responses (electrofacies) were identified and correlated with core to produce a detailed subsurface model of a 20-km² area to the southwest of Georgetown. Nine distinctive lithostratigraphic units were identified and their geometry mapped across the study area as structure contour and isochore thickness maps. The subsurface model shows that the CV valley truncates the Late Wisconsin MSMC stratigraphy along a channelized erosional unconformity and is a younger (post-glacial?) sediment-hosted valley system. Model results demonstrate the high level of stratigraphic heterogeneity and complexity that is inherent in bedrock valley systems and provide a geological framework for understanding groundwater resource availability.
Principal component analysis (PCA) was applied to selected log suites to evaluate the potential for objective lithologic classification using log data. Gamma, resistivity and conductivity logs were most useful for lithologic typing, while p-wave velocity and resistivity logs were more diagnostic of compact diamict units. Cross plots of the first and second principal components of log parameters discriminated silts and clays/shales from sand/gravel and diamict lithofacies. The results show that PCA is a viable method for predicting subsurface lithology in un-cored boreholes and can assist in the identification of hydrostratigraphic units. / Master of Science (MSc)
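As a rough, hedged illustration of the PCA step described above (not the study's actual data or workflow), standardised multi-parameter logs can be projected onto their first two principal components and cross-plotted; the log matrix below is synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a log suite: rows = depth samples, columns = log parameters
# (e.g. gamma, resistivity, conductivity, magnetic susceptibility, p-wave velocity).
rng = np.random.default_rng(1)
logs = rng.normal(size=(500, 5))

Z = StandardScaler().fit_transform(logs)        # the parameters have very different units
scores = PCA(n_components=2).fit_transform(Z)

plt.scatter(scores[:, 0], scores[:, 1], s=5)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Cross plot of the first two principal components")
plt.show()
```

Clusters in such a cross plot can then be compared against cored intervals to assign lithofacies to samples from un-cored boreholes.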
|
316 |
An integrated approach to the taxonomic identification of prehistoric shell ornaments
Demarchi, B., O'Connor, Sonia A., de Lima Ponzoni, A., de Almeida Rocha Ponzoni, R., Sheridan, A., Penkman, K.E.H., Hancock, Y., Wilson, J. 17 May 2014 (has links)
Yes / Shell beads appear to have been one of the earliest examples of personal adornments. Marine shells identified far from the shore evidence long-distance transport and imply networks of exchange and negotiation. However, worked beads lose taxonomic clues to identification, and this may be compounded by taphonomic alteration. Consequently, the significance of this key early artefact may be underestimated. We report the use of bulk amino acid composition of the stable intra-crystalline proteins preserved in shell biominerals and the application of pattern recognition methods to a large dataset (777 samples) to demonstrate that taxonomic identification can be achieved at genus level. Amino acid analyses are fast (<2 hours per sample) and micro-destructive (sample size <2 mg). Their integration with non-destructive techniques provides a valuable and affordable tool, which can be used by archaeologists and museum curators to gain insight into early exploitation of natural resources by humans. Here we combine amino acid analyses, macro- and microstructural observations (by light microscopy and scanning electron microscopy) and Raman spectroscopy to try to identify the raw material used for beads discovered at the Early Bronze Age site of Great Cornard (UK). Our results show that at least two shell taxa were used and we hypothesise that these were sourced locally.
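The paper's specific pattern-recognition pipeline is not reproduced here; as one hedged illustration of classifying bulk amino acid compositions to genus, composition vectors from reference shells of known genus can be projected with PCA and an unknown bead assigned by a nearest-neighbour rule (all data below are synthetic):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical reference set: rows = shells of known genus,
# columns = relative concentrations of the amino acids measured (each row sums to 1).
rng = np.random.default_rng(2)
compositions = rng.dirichlet(np.ones(8), size=300)
genera = rng.integers(0, 4, size=300)

clf = make_pipeline(StandardScaler(), PCA(n_components=4),
                    KNeighborsClassifier(n_neighbors=5))
print("cross-validated accuracy:", cross_val_score(clf, compositions, genera, cv=5).mean())

# A worked bead of unknown taxon is then assigned the genus of its nearest references.
clf.fit(compositions, genera)
unknown = rng.dirichlet(np.ones(8), size=1)
print("predicted genus:", clf.predict(unknown))
```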
|
317 |
Development and Application of Novel Computer Vision and Machine Learning Techniques
Depoian, Arthur Charles, II 08 1900 (has links)
The following thesis proposes solutions to problems in two main areas of focus, computer vision and machine learning. Chapter 2 utilizes traditional computer vision methods implemented in a novel manner to successfully identify overlays contained in broadcast footage. The remaining chapters explore machine learning algorithms and apply them in various manners to big data, multi-channel image data, and ECG data. L1 and L2 principal component analysis (PCA) algorithms are implemented and tested against each other in Python, providing a metric for future implementations. Selected algorithms from this set are then applied in conjunction with other methods to solve three distinct problems. The first problem is big data error detection, where PCA is effectively paired with statistical signal processing methods to create a weighted controlled algorithm. The second problem is an implementation of image fusion built to detect and remove noise from multispectral satellite imagery, which performs at a high level. The final problem examines ECG medical data classification. PCA is integrated into a neural network solution that achieves a small performance degradation while requiring less than 20% of the full data size.
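The thesis does not state which L1-PCA formulation was implemented; as one hedged possibility, the sketch below contrasts ordinary (L2) PCA with an L1-norm-maximisation fixed-point iteration in the style of Kwak (2008), whose leading direction is less sensitive to gross outliers:

```python
import numpy as np

def l2_pca(X, k):
    """Standard (L2) PCA loadings via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def l1_pc1(X, n_iter=200):
    """One L1-norm principal direction via a fixed-point iteration:
    maximises sum_i |x_i . w| instead of sum_i (x_i . w)**2."""
    Xc = X - X.mean(axis=0)
    start = Xc[np.argmax(np.linalg.norm(Xc, axis=1))]
    w = start / np.linalg.norm(start)
    for _ in range(n_iter):
        s = np.sign(Xc @ w)
        s[s == 0] = 1.0
        w_new = Xc.T @ s
        w_new = w_new / np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(200, 5)),
               10.0 * rng.normal(size=(5, 5))])    # a few gross outliers
print("L2 leading direction:", l2_pca(X, 1).ravel())
print("L1 leading direction:", l1_pc1(X))          # less pulled towards the outliers
```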
|
318 |
Multiscale process monitoring with singular spectrum analysis
Krishnannair, Syamala 12 1900 (has links)
Thesis (MScEng (Process Engineering))--University of Stellenbosch, 2010. / Thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Engineering (Extractive Metallurgy) in the Department of Process Engineering
at the University of Stellenbosch / ENGLISH ABSTRACT: Multivariate statistical process control (MSPC) approaches are now widely used for performance monitoring, fault detection and diagnosis in chemical processes. Conventional MSPC approaches are based on latent variable projection methods such as principal component analysis and partial least squares. These methods are suitable for handling linearly correlated data sets, with minimal autocorrelation in the variables. Industrial plant data invariably violate these conditions, and several extensions to conventional MSPC methodologies have been proposed to account for these limitations.
In practical situations process data usually contain contributions at multiple scales because of different events occurring at different localizations in time and frequency. To account for such multiscale nature, monitoring techniques that decompose observed data at different scales are necessary. Hence the use of standard MSPC methodologies may lead to unreliable results due to false alarms and significant loss of information.
In this thesis a multiscale methodology based on singular spectrum analysis is proposed. Singular spectrum analysis (SSA) is a linear method that extracts information from short, noisy time series by decomposing the data into deterministic and stochastic components without prior knowledge of the dynamics affecting the time series. These components can be classified as independent additive time series: a slowly varying trend, periodic series and aperiodic noise. SSA performs this decomposition by projecting the original time series onto a data-adaptive vector basis obtained from the series itself by principal component analysis (PCA).
The proposed method in this study treats each process variable as a time series, and the autocorrelation between the variables is explicitly accounted for. The data-adaptive nature of SSA makes the proposed method more flexible than other spectral techniques that use fixed basis functions. Application of the proposed technique is demonstrated using simulated and industrial data and the Tennessee Eastman Challenge process. A comparative analysis is also given using the simulated data and the Tennessee Eastman process. It is found that in most cases the proposed method detects process changes and faults of different magnitudes more accurately than classical statistical process control (SPC) based on latent variable methods, as well as wavelet-based multiscale SPC. / AFRIKAANSE OPSOMMING: Multivariate statistical process control (MSPC) approaches are currently widely used for performance monitoring, fault detection and diagnosis in chemical processes. Conventional MSPC is based on latent variable projection methods such as principal component analysis and partial least squares. These methods are suited to handling linearly correlated data sets with minimal autocorrelation. Industrial plant data invariably violate these conditions, and several MSPC extensions have been proposed to account for these limitations.
Process data from practical operating conditions usually contain contributions at multiple scales, owing to different events occurring at different localisations in time and frequency. Monitoring methods that decompose the observed data at different scales are needed to account for such multiscale behaviour. The use of standard MSPC may therefore lead to unreliable results through false alarms and significant loss of information.
In this thesis a multiscale methodology based on the use of singular spectrum analysis (SSA) is proposed. SSA is a linear method that extracts information from short and noisy time series by decomposing the data into deterministic and stochastic components, without any prior knowledge of the dynamics affecting the time series. These components can be classified as independent, additive time series: slowly varying trends, periodic series and aperiodic noise. SSA achieves this decomposition by projecting the original time series onto a data-adaptive vector basis obtained from the series itself through principal component analysis.
The method proposed in this study treats each process variable as a time series, and the autocorrelation between variables is explicitly taken into account. Because SSA adapts to the data, the proposed method is more flexible than other spectral methods that use fixed basis functions. Application of the proposed technique is demonstrated with simulated process data and the Tennessee Eastman process. A comparative analysis is also carried out with the simulated process data and the Tennessee Eastman process. In most cases the proposed method is found to detect process changes and faults of different magnitudes more accurately than classical statistical process control (SPC) based on latent variable methods, as well as wavelet-based multiscale SPC.
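A minimal single-series SSA sketch, assuming the standard embed / decompose / diagonal-average steps; the window length and the grouping of components into a trend and a periodic part are illustrative choices, and the multiscale monitoring scheme built on top of the decomposition is not shown:

```python
import numpy as np

def ssa_decompose(x, L):
    """Basic singular spectrum analysis of a 1-D series x with window length L."""
    N, K = len(x), len(x) - L + 1
    # 1. Embedding: build the L x K trajectory (Hankel) matrix.
    traj = np.column_stack([x[i:i + L] for i in range(K)])
    # 2. SVD of the trajectory matrix.
    U, s, Vt = np.linalg.svd(traj, full_matrices=False)
    # 3. One additive component per singular triple, recovered by diagonal averaging.
    comps = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])
        flipped = Xi[::-1]
        comps.append(np.array([flipped.diagonal(k).mean() for k in range(-L + 1, K)]))
    return np.array(comps)          # comps.sum(axis=0) recovers x

t = np.arange(300)
rng = np.random.default_rng(4)
x = 0.02 * t + np.sin(2 * np.pi * t / 25) + 0.3 * rng.normal(size=300)
comps = ssa_decompose(x, L=60)
trend = comps[0]                    # leading component: typically the slowly varying part
periodic = comps[1] + comps[2]      # paired components often capture one oscillation
```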
|
319 |
市場風險因子情境產生方法之研究 / Methodology for Risk Factors Scenario Generation
陳育偉, Chen, Yu-Wei Unknown Date (has links)
As financial incidents keep occurring, risk management has become an important issue for banks, securities firms, insurers and the other financial industries. Among the available tools, the Value-at-Risk (VaR) model is the one most commonly used by banks and securities firms to measure their market risk. In the Monte Carlo simulation approach to VaR, the positions held in a portfolio are expressed in terms of appropriate market risk factors, scenarios for those risk factors are generated, and pricing formulas are then applied to obtain the lowest portfolio value over a given holding period at a given confidence level; subtracting this lowest value from the original value gives the largest possible loss (Jorion, 2007). / To generate market risk factor scenarios with Monte Carlo simulation, the covariance matrix of the risk factors must first be estimated and then used to simulate thousands of risk factor scenarios. This study incorporates the concept of a time-varying covariance matrix into the Monte Carlo simulation and reduces the number of market risk factors, combining the simulation with the Constant, UWMA, EWMA, Orthogonal EWMA, Orthogonal GARCH, PCA EWMA and PCA GARCH models to generate future scenarios of the market risk factors and to compare how well each method measures risk over short and long horizons. The results show that the PCA EWMA model performs best, so financial institutions are advised to adopt the PCA EWMA model to control the short-horizon and long-horizon market risk of their portfolios.
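A rough sketch of the scenario-generation step described above, using a RiskMetrics-style EWMA covariance estimate and a Cholesky factorisation to draw correlated risk-factor scenarios; the returns, exposures and decay factor are made-up illustrations, and the thesis's preferred PCA EWMA variant (applying the exponentially weighted update to principal component scores) is not reproduced here:

```python
import numpy as np

def ewma_cov(returns, lam=0.94):
    """Exponentially weighted covariance matrix of risk-factor returns."""
    cov = np.cov(returns.T)                        # initialise with the sample covariance
    for r in returns:
        r = r.reshape(-1, 1)
        cov = lam * cov + (1.0 - lam) * (r @ r.T)
    return cov

def simulate_scenarios(cov, n_scenarios=10000, seed=5):
    """Draw correlated one-period risk-factor return scenarios."""
    chol = np.linalg.cholesky(cov)
    z = np.random.default_rng(seed).standard_normal((n_scenarios, cov.shape[0]))
    return z @ chol.T

# Toy setup: three risk factors, 500 days of returns, linear exposures to each factor.
returns = np.random.default_rng(6).standard_normal((500, 3)) * 0.01
exposures = np.array([1e6, -5e5, 2e5])

scenarios = simulate_scenarios(ewma_cov(returns))
pnl = scenarios @ exposures
var_99 = -np.percentile(pnl, 1)                    # one-period Value-at-Risk at 99% confidence
print(var_99)
```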
|
320 |
"Spaghetti "主成份分析之延伸-應用於時間相關之區間型台灣股價資料 / An extension of Spaghetti PCA for time dependent interval data陳品達, Chen, Pin-Da Unknown Date (has links)
ABSTRACT
In recent years, principal component analysis for interval-valued data has not yet matured in some fields, for example stock price data, which are closely tied to time; this motivated the analysis of time-dependent interval data (Irpino, 2006. Pattern Recognition Letters 27, 504-513). This thesis continues that line of work and studies time-dependent interval data on Taiwan stock prices. The method of Irpino (2006) considers only each week's opening and closing prices; to obtain more information we propose three methods. The first method adds each week's highest (lowest) price, turning the two-point analysis into a three-point analysis. The second method considers the highest and lowest prices simultaneously, giving a four-point analysis. Both methods yield information the original method cannot, namely the degree of stability of a company, with the second method being the more accurate. The third method follows the suggestion of Irpino (2006) and changes the distribution of the interval; its results differ little from those of the original method.
We collected weekly stock price data for thirty semiconductor companies and forty-seven companies of the TSEC Taiwan 50 index in the Taiwan financial market over the seventeen weeks from September 1 to December 26, 2008 for the empirical analysis. Taking the Taiwan 50 as an example, the analysis shows that the outlooks of Delta Electronics Incorporation (No. 17) and the Hon Hai Technology Group (Foxconn, No. 24) are favorable, whereas those of Integrated Technology Express (No. 10) and President Chain Store Corporation (No. 35) are not; the price movements of these four companies over January 5 to 7, 2009 indeed bore this out. In addition, the results show that companies in the financial sector are more stable than those in the electronics sector.
Keywords: principal component analysis, interval data, time dependent / ABSTRACT
The methods for principal component analysis of interval data have not yet matured in some areas, for example stock price data, which are closely related to time, so an analysis of time dependent interval data was proposed (Irpino, 2006. Pattern Recognition Letters 27, 504-513). In this paper, we apply this approach to stock price data in Taiwan. The original "Spaghetti" PCA in Irpino (2006) considered only the starting and ending prices for each week. In order to get more information we propose three methods. In Method 1 we add the highest (lowest) price for each week to the analysis, which changes from two points to three points. In Method 2 we consider both the highest and lowest prices, which gives a four-point analysis. These two methods provide more information than the original one; for example, they indicate the degree of stability of a company. In Method 3 we follow the suggestion from Irpino (2006) and change the distribution of intervals from uniform to beta. However, the result is similar to the original result.
In our approach, we collect stock price data from 37 semiconductor companies and 47 companies of the TSEC Taiwan 50 index in the Taiwan financial market during the 17 weeks from September 1 to December 26, 2008. For the TSEC Taiwan 50 index, the analysis indicates that the future trends of Delta (Delta Electronics Incorporation, No. 17) and Foxconn (Foxconn Electronics Incorporation, No. 24) are optimistic, while those of ITE (Integrated Technology Express, No. 10) and 7-ELEVEn (President Chain Store Corporation, No. 35) are not; the movements of these four companies from January 5th to 7th indeed bore out these results. What is more, the financial companies are steadier than those in the electronics industry.
Keywords: Principal component analysis; Interval data; Time dependent
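Irpino's "Spaghetti" PCA itself is not reproduced here; as a simplified, hedged illustration of the four-point idea behind Method 2, each week can be summarised by its opening, highest, lowest and closing prices, the per-company feature vectors fed to an ordinary PCA, and the average weekly high-low range used as a crude stability proxy (all numbers below are synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

n_companies, n_weeks = 47, 17
rng = np.random.default_rng(7)

# Synthetic weekly open/high/low/close prices, shape (companies, weeks, 4).
ohlc = np.abs(rng.normal(loc=50.0, scale=10.0, size=(n_companies, n_weeks, 4)))

# Flatten each company's 17 weeks of four prices into one feature vector.
X = ohlc.reshape(n_companies, n_weeks * 4)
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# A crude stability proxy: the average weekly high-low range per company.
weekly_range = ohlc[:, :, 1] - ohlc[:, :, 2]
stability = weekly_range.mean(axis=1)
```

The two columns of `scores` place each company in a plane where similar weekly price behaviour plots close together; this is only a stand-in for the interval-based analysis of the thesis.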
|