1 |
機器學習分類方法DCG 與其他方法比較(以紅酒為例) / A supervised learning study of comparison between DCG tree and other machine learning methods in a wine quality dataset. 楊俊隆 (Yang, Jiun Lung), Unknown Date
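With the arrival of the big data era, machine learning has become a popular subject. Its methods fall mainly into supervised and unsupervised learning, that is, classification and clustering. This study weights the distance matrix with the fitted results of logistic regression and, centering on the data cloud geometry tree (DCG-tree) clustering method, examines whether clustering first and then classifying yields better predictions on red wine data that contain categorical variables. It also compares the predictive performance of various machine learning methods under supervised learning with that of unsupervised learning followed by a classifier. The thesis first introduces common classification and clustering algorithms and analyzes their strengths, weaknesses, and assumptions. It then introduces the DCG-tree algorithm and details its steps. Finally, it introduces the weighted DCG (WDCG) algorithm, which applies weighting to the DCG-tree algorithm, and uses the red wine data to compare the prediction accuracy of the classification and clustering methods. / Machine learning has become a popular topic with the arrival of the big data era. Machine learning algorithms are often categorized as supervised or unsupervised, namely classification or clustering methods. In this study, we first introduce the advantages, disadvantages, and limitations of traditional classification and clustering algorithms. Next, we introduce the DCG-tree and WDCG algorithms and extend the idea of WDCG to cases with label size = 3. The distance matrix is modified using the fitted results of logistic regression. Lastly, using a real wine dataset, we compare the performance of WDCG with that of traditional classification methods. The study shows that an unsupervised learning algorithm with logistic regression as the classifier performs better than the traditional classification methods alone.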
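The abstract does not say how the logistic regression fit enters the distance matrix. As a rough sketch only, assume each feature's squared difference is scaled by the absolute value of its fitted coefficient. The function name `weighted_distance_matrix`, the weighting rule, and the sample values below are all hypothetical and are not taken from the thesis:

```python
import math

def weighted_distance_matrix(rows, coefs):
    """Pairwise weighted Euclidean distances between samples.

    rows  : list of feature vectors, one per sample
    coefs : one logistic regression coefficient per feature (assumed
            to be fitted elsewhere); using |coef| as the feature
            weight is an assumption, not the thesis's stated rule
    """
    weights = [abs(c) for c in coefs]
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(w * (a - b) ** 2
                              for w, a, b in zip(weights, rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist

# Illustrative values only; not data from the thesis.
samples = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]
coefs = [1.0, -0.25]
D = weighted_distance_matrix(samples, coefs)
```

Under this assumption, features with larger fitted coefficients contribute more to the distance, so a clustering step run on `D` would tend to separate samples along the features that the classifier finds informative.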
|
2 |
非監督式新細胞認知機神經網路之研究 / Studies on the Unsupervised Neocognitron. 陳彥勳 (Chen, Yen-Shiun), Unknown Date
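This thesis uses the unsupervised neocognitron neural network to recognize printed Chinese characters.
Two modifications to the unsupervised neocognitron are proposed. First, the cells of the Us1 sublayer do not learn. Instead, they directly use 12 manually specified local features, while the S sublayers after Us1 still determine the local features they detect through unsupervised learning. Second, an upper limit is set during learning to restrict the number of representatives generated. The purpose of this design is to avoid an uneven allocation of cell-planes. In this study, the neocognitron adopting both modifications is called Model 1, and the neocognitron using only the second modification is called Model 2.
The experiments in this thesis are divided into two parts. The first part has four experiments, all of which use the same training and test patterns. There are two sets of training patterns: the first contains the five Chinese characters “川”, “三”, “大”, “人”, and “台”, and the second contains the characters “零”, “壹”, “貳”, “參”, and “肆”. The training patterns are all in the MingLight (細明體) font, while the test patterns use nine other fonts. The main purpose of the first experiment is to test the performance of Model 1. The results show that Model 1 learns successfully with ease and that its recognition rate is acceptable. The other three experiments examine the relationship between certain parameter values and system performance. These parameters are the size of the S-column, the number of cell-planes, and the size of the cells' receptive field. All three experiments use Model 1.
The second part has two experiments, whose main purpose is to compare the performance of Model 1 and Model 2. The first experiment uses the same training patterns as the first part. The results show that Model 1 learns successfully more easily and that the system performs well. The second experiment uses 17 Chinese characters as training patterns: “零”, “壹”, “貳”, “參”, “肆”, “伍”, “陸”, “柒”, “捌”, “玖”, “拾”, “佰”, “仟”, “萬”, “億”, “圓”, and “角”. The results show that Model 1 remains a good system. / In this study, we investigate the feasibility of applying the unsupervised neocognitron to the recognition of printed Chinese characters.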
Two modifications to the unsupervised neocognitron are proposed. The first sets the input connections of the first layer manually, while all subsequent layers are trained without supervision. The second concerns the selection of representatives: during learning, the number of cell-planes that send representatives for each training pattern has an upper bound. The unsupervised neocognitron implementing both modifications is named Model 1, and the one implementing only the second modification is named Model 2.
Experiments in this study are grouped into two parts, called Part I and Part II. In Part I, four experiments are conducted, and each experiment is run with two sets of training patterns. The first, called the simple training set, consists of the five printed Chinese characters “川”, “三”, “大”, “人”, and “台”, each of size 25*25 in the MingLight font. The second, called the complex training set, contains another five printed Chinese characters, “零”, “壹”, “貳”, “參”, and “肆”, in the same font and size. After training, the same characters in nine other fonts are presented to test the generalization of the network.
The objective of the first experiment of Part I is to investigate the performance of Model 1. Simulation results show that Model 1 learns successfully with ease. In the other three experiments, the effect of choosing different values for some parameters is investigated. The parameters include the size of the S-column, the number of cell-planes, and the receptive field of cells.
In Part II, the performance of Model 1 and Model 2 is compared. In the first experiment, Model 1 and Model 2 are trained to recognize the simple and complex training sets described above. Experimental results show that Model 1 learns successfully more readily and that its performance is acceptable. In the second experiment, 17 training patterns are presented during the learning process: “零”, “壹”, “貳”, “參”, “肆”, “伍”, “陸”, “柒”, “捌”, “玖”, “拾”, “佰”, “仟”, “萬”, “億”, “圓”, and “角”. The simulation results suggest that Model 1 is a promising approach for the recognition of printed Chinese characters.
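The upper bound on representatives can be illustrated with a small sketch. It assumes that candidate cells (one winner per S-column) have already been computed, that each cell-plane may contribute at most one representative, and that the strongest responses are the ones retained when the bound is reached. The abstract does not state the retention rule, so that choice, the function name `select_representatives`, and the sample values are hypothetical:

```python
def select_representatives(candidates, max_reps):
    """Pick at most max_reps representatives, at most one per cell-plane.

    candidates : list of (plane_id, position, response) tuples, assumed
                 to be the S-column winners for one training pattern
    max_reps   : upper bound on the number of cell-planes allowed to
                 send a representative (the second modification)
    """
    best_per_plane = {}
    for plane_id, position, response in candidates:
        current = best_per_plane.get(plane_id)
        # keep only the strongest candidate within each cell-plane
        if current is None or response > current[2]:
            best_per_plane[plane_id] = (plane_id, position, response)
    # assumption: when over the bound, keep the strongest responses
    ranked = sorted(best_per_plane.values(), key=lambda c: c[2], reverse=True)
    return ranked[:max_reps]

# Illustrative values only; not data from the thesis.
candidates = [
    (0, (1, 1), 0.9),
    (0, (2, 3), 0.4),
    (1, (0, 2), 0.7),
    (2, (4, 4), 0.8),
    (3, (3, 0), 0.1),
]
reps = select_representatives(candidates, max_reps=2)
```

With a bound of 2, only two cell-planes are updated for this pattern, which leaves the remaining cell-planes available for features of other training patterns.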
|