Global ETD Search

1	Extending the Growing Hierarchical Self Organizing Maps for a Large Mixed-Attribute Dataset Using Spark MapReduce Malondkar, Ameya Mohan January 2015 In this thesis work, we propose a Map-Reduce variant of the Growing Hierarchical Self Organizing Map (GHSOM) called MR-GHSOM, which is capable of handling mixed attribute datasets of massive size. The Self Organizing Map (SOM) has proved to be a useful unsupervised data analysis algorithm. It projects high-dimensional data onto a lower-dimensional grid of neurons. However, the SOM is limited by its static structure and its inability to mirror the hierarchical relations in the data. The GHSOM overcomes these shortcomings by providing a dynamic structure that adapts its shape to the input data. It can grow both in the size of the individual neuron layers, to represent data at the desired granularity, and in depth, to model the hierarchical relations in the data. However, training the GHSOM requires multiple passes over the input dataset, which makes it difficult to use for massive datasets. The MR-GHSOM removes this obstacle. It is implemented using the Apache Spark cluster computing engine and leverages the popular Map-Reduce programming model, which lets us exploit the dynamic capabilities of the GHSOM even for a large dataset. Moreover, the conventional GHSOM algorithm can handle datasets with numeric attributes only, because it relies heavily on Euclidean dissimilarity measures between attribute vectors. The MR-GHSOM further extends the GHSOM to handle mixed attribute datasets, that is, datasets with both numeric and categorical attributes. It accomplishes this by adopting the distance hierarchy approach to managing mixed attribute datasets. 
The proposed MR-GHSOM is thus capable of handling massive datasets containing mixed attributes. To demonstrate its effectiveness in clustering mixed attribute datasets, we present the results produced by the MR-GHSOM on some popular datasets. We further train our MR-GHSOM on a Census dataset containing mixed attributes and provide an analysis of the results. Keywords: Self Organizing Map; Map-Reduce; Growing Hierarchical Self Organizing Map; GHSOM; SOM; Apache Spark
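The abstract above does not spell out how the training passes are distributed. As a rough illustration only, one epoch of a batch SOM update can be written as a map step over data partitions followed by a reduce step. The sketch below is plain Python rather than Spark, covers a single flat SOM layer with numeric attributes (no hierarchy growth, no distance hierarchy), and all function names are hypothetical; it should not be read as the thesis's implementation.

```python
from functools import reduce
import math


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def map_partition(weights, grid, partition, sigma):
    """Map step: neighborhood-weighted sums for one partition of the data."""
    n, dim = len(weights), len(weights[0])
    num = [[0.0] * dim for _ in range(n)]  # sum of h * x per neuron
    den = [0.0] * n                        # sum of h per neuron
    for x in partition:
        b = best_matching_unit(weights, x)
        for j in range(n):
            d2 = (grid[j][0] - grid[b][0]) ** 2 + (grid[j][1] - grid[b][1]) ** 2
            h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood on the grid
            den[j] += h
            for k in range(dim):
                num[j][k] += h * x[k]
    return num, den


def reduce_partials(a, b):
    """Reduce step: element-wise sum of two partial accumulators."""
    num = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [p + q for p, q in zip(a[1], b[1])]
    return num, den


def batch_epoch(weights, grid, partitions, sigma):
    """One batch-SOM epoch: each neuron moves to its weighted mean input."""
    partials = (map_partition(weights, grid, p, sigma) for p in partitions)
    num, den = reduce(reduce_partials, partials)
    return [[v / d for v in row] if d > 0 else list(w)
            for row, d, w in zip(num, den, weights)]
```

Because the reduce step is a plain sum, the result does not depend on how the data is split into partitions (up to floating-point rounding), which is the property that makes this kind of update a natural fit for Map-Reduce.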
2	Exploring financial reporting fraud 徐國英 Unknown Date Financial reporting fraud leads not only to significant investment risks for external stockholders but also to financial crises in the capital market. Although fraudulent financial reporting has drawn much attention, it has received far less research than the prediction of financial distress or bankruptcy. Furthermore, one purpose of exploring the various forms of financial reporting fraud is to obtain a better understanding of a corporation by investigating its financial and corporate governance indicators. This study addresses the challenge by proposing an approach with the following four phases: (1) to identify a set of financial and corporate governance indicators that are significantly correlated with financial reporting fraud; (2) to use the Growing Hierarchical Self-Organizing Map (GHSOM) to cluster the data of normal and fraudulent listed corporations; (3) to extract knowledge about financial reporting fraud by observing the hierarchical relationships displayed in the trained GHSOM; and (4) to justify the extracted knowledge. The proposed approach is feasible because researchers claim that the GHSOM can discover hidden hierarchical relationships in high-dimensional data. Keywords: Financial Reporting Fraud; Growing Hierarchical Self-Organizing Map; Knowledge Extraction
3	A dual approach for decision support in financial fraud detection Huang, Shin Ying Unknown Date The Growing Hierarchical Self-Organizing Map (GHSOM) is an unsupervised neural network that extends the Self-Organizing Map (SOM). Its unsupervised learning properties, such as adaptive group size and hierarchical structure, make it suitable for discovering the statistically salient features of the clustered groups, and it can be used to build a classifier that distinguishes abnormal data from regular data based on the spatial relationships between them. Therefore, this study exploits these advantages of the GHSOM and proposes a novel dual approach (i.e., a proposal of a DSS architecture) with two GHSOMs. Fraud and non-fraud samples are clustered separately, and the approach starts by pairing each fraud group with its non-fraud counterpart. Then, classification rules are formed based on a certain spatial hypothesis, and a feature extraction mechanism is applied to extract features from the fraud clustered groups. 
The dominant classification rule is adopted to identify suspected samples, and the results of the feature extraction mechanism are used to pinpoint their relevant input variables and potential fraud activities as a further decision aid. Specifically, for the financial fraud detection (FFD) domain, a non-fraud (fraud) GHSOM tree is constructed by clustering the non-fraud (fraud) samples, and a non-fraud-central (fraud-central) rule is then tuned on all the training samples to determine the optimal discrimination boundary within each leaf node of the non-fraud (fraud) GHSOM tree. The optimization yields an adjustable and effective rule for classifying fraud and non-fraud samples. With the DSS architecture implemented on the proposed dual approach, decision makers can objectively set their weightings of type I and type II errors. The classification rule that dominates the other is adopted for analyzing samples. Dominance of the non-fraud-central rule implies that most fraud samples cluster around their non-fraud counterpart, whereas dominance of the fraud-central rule implies that most non-fraud samples cluster around their fraud counterpart. In addition, a feature extraction mechanism is developed to uncover the regularity of input variables and fraud categories based on the training samples of each leaf node of the fraud GHSOM tree. The mechanism extracts variable features and fraud patterns to explore the characteristics of fraud samples within the same leaf node. This can help decision makers such as capital providers evaluate the integrity of the investigated samples, and facilitate further analysis to reach prudent credit decisions. The experimental results of detecting fraudulent financial reporting (FFR), a sub-field of FFD, confirm the spatial relationship between fraud and non-fraud samples. 
The DSS architecture implemented on the proposed dual approach achieves better classification performance than the SVM, SOM+LDA, GHSOM+LDA, SOM, BPNN and DT methods, which shows its applicability to evaluating the reliability of decisions based on financial numbers. Moreover, following SOM theory, the relevant input variables and fraud categories extracted from the GHSOM apply to all samples classified into the same leaf node. This principle allows the extracted pre-warning signals to be used to assess the reliability of the investigated samples and to build a knowledge base for further analysis in reaching a prudent decision. The DSS architecture based on the proposed dual approach could be applied to other FFD scenarios that rely on financial numbers as a basis for decision making. Keywords: Growing Hierarchical Self-Organizing Map; Unsupervised Neural Networks; Classification; Financial Fraud Detection; Fraudulent Financial Reporting
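The abstract says that the discrimination boundary in each leaf node is tuned according to the decision maker's weighting of type I and type II errors, but it does not give the rule's exact form. The following is a minimal sketch under stated assumptions: a sample is flagged as fraud when its distance to a leaf's non-fraud center exceeds a cutoff, a type I error is a non-fraud sample flagged as fraud, and a type II error is a fraud sample that is missed. The function name and these definitions are hypothetical and are not taken from the thesis.

```python
def tune_threshold(distances, is_fraud, w_type1=1.0, w_type2=1.0):
    """Pick the distance cutoff that minimizes the weighted error cost.

    Assumed rule (hypothetical): flag a sample as fraud when distance > cutoff.
    """
    candidates = sorted(set(distances))
    # Also try a cutoff below every sample, which flags everything as fraud.
    candidates = [candidates[0] - 1.0] + candidates
    best_cut, best_cost = None, float("inf")
    for cut in candidates:
        # Type I (assumed): non-fraud sample flagged as fraud.
        type1 = sum(1 for d, f in zip(distances, is_fraud) if d > cut and not f)
        # Type II (assumed): fraud sample not flagged.
        type2 = sum(1 for d, f in zip(distances, is_fraud) if d <= cut and f)
        cost = w_type1 * type1 + w_type2 * type2
        if cost < best_cost:  # ties keep the smaller cutoff
            best_cut, best_cost = cut, cost
    return best_cut, best_cost
```

Raising `w_type1` relative to `w_type2` pushes the cutoff outward so that fewer non-fraud samples are flagged; raising `w_type2` does the opposite. This mirrors, in a simplified way, how a risk preference could shift the boundary.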
