1 |
在Spark大數據平台上分析DBpedia開放式資料:以電影票房預測為例 / Analyzing DBpedia Linked Open Data (LOD) on Spark: Movie Box Office Prediction as an Example
劉文友, Liu, Wen Yu (Unknown Date)
In recent years, Linked Open Data (LOD) has been recognized as holding a large amount of latent value. How to collect and integrate diverse LOD sources and make them available to data analysts for extraction and analysis has become an important research challenge. LOD is published in the Resource Description Framework (RDF) format and can be queried with SPARQL. However, large volumes of RDF data still lack an integrated, high-performance, and easily scalable system for storage, querying, and analysis, and research on analytics pipelines for big RDF data remains incomplete. Taking movie box office prediction as an example, this study uses the DBpedia LOD dataset, links it with an external movie database (e.g., IMDb), and performs large-scale graph analysis on the Apache Spark big data platform. We first build box office prediction models with two algorithms, Naïve Bayes and Bayesian networks, and use the Bayesian Information Criterion (BIC) to select the best Bayesian network structure. We then compute multi-class ROC curves and AUC values to evaluate the accuracy of the prediction models in this case study.
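The two evaluation steps named above, BIC-based structure selection and multi-class AUC, can be illustrated with a small pure-Python sketch. This is not the thesis's Spark implementation. It uses the standard count-based BIC for a discrete Bayesian network (log-likelihood minus (log N / 2) times the number of free parameters). Because the abstract does not say which multi-class AUC variant was used, the sketch assumes one-vs-rest macro averaging. The variable names `box_office` and `genre` and all data values are hypothetical.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """BIC of a discrete Bayesian network structure (higher is better).

    data:    list of dicts mapping variable name -> observed discrete value
    parents: dict mapping each variable to the list of its parent variables
    """
    n = len(data)
    # Cardinality of each variable, taken from the values observed in the data.
    states = {v: {row[v] for row in data} for v in parents}
    score = 0.0
    for var, pa in parents.items():
        # N_ijk: counts of (parent configuration, child value); N_ij: parent configuration counts.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        score += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # q_i parent configurations, each contributing r_i - 1 free parameters.
        q = math.prod(len(states[p]) for p in pa)
        score -= 0.5 * math.log(n) * q * (len(states[var]) - 1)
    return score


def ovr_macro_auc(y_true, proba, classes):
    """Macro-averaged one-vs-rest AUC (an assumed multi-class variant).

    proba: one dict per sample, mapping class -> predicted probability.
    """
    aucs = []
    for c in classes:
        pos = [p[c] for y, p in zip(y_true, proba) if y == c]
        neg = [p[c] for y, p in zip(y_true, proba) if y != c]
        if not pos or not neg:
            continue  # AUC is undefined without both positives and negatives
        # AUC = P(random positive scores above random negative); ties count 1/2.
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: "genre" is perfectly determined by "box_office".
rows = [{"box_office": "high" if i % 2 else "low",
         "genre": "A" if i % 2 else "B"} for i in range(20)]
with_edge = bic_score(rows, {"box_office": [], "genre": ["box_office"]})
without = bic_score(rows, {"box_office": [], "genre": []})
print(with_edge > without)  # True: the edge is worth its extra parameters

classes = ["low", "mid", "high"]
y = ["low", "mid", "high"]
proba = [{"low": 0.7, "mid": 0.2, "high": 0.1},
         {"low": 0.2, "mid": 0.6, "high": 0.2},
         {"low": 0.1, "mid": 0.2, "high": 0.7}]
print(ovr_macro_auc(y, proba, classes))  # 1.0: every class is perfectly ranked
```

In a structure search, `bic_score` would be evaluated for each candidate parent set and the highest-scoring structure kept; the penalty term is what stops the search from simply adding every possible edge.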
|
2 |
建構台灣銀行業預警系統-貝氏網路模型之運用 / Bayesian model for bank failure risk in Taiwan
黃薰儀, Huang, Hsun Yi (Unknown Date)
International studies have analyzed bank vulnerability at the country level, but they have neither defined nor predicted systemic crises in Taiwan, even though Taiwan has experienced banking crises over the past twenty years. Against this background, this study constructs a domestic early warning system for Taiwan's banking sector and establishes leading indicators of banking crises, aiming both to follow international practice and to develop a model suited to Taiwan's particular circumstances. The study exploits two properties of the Bayesian network model, (1) posterior values and (2) its probabilistic nature, and builds a system-wide model from bank-level (microeconomic) data. Researchers can therefore assess the financial condition of each bank and issue early warnings for individual banks. The posterior property allows many financial ratios to be considered at the same time, while the probabilistic property helps researchers gauge the severity of a crisis and extend the model to the macroeconomic level.
We develop two methods for building the system-wide model. The first, the percentage method, is based on the ratio of distressed banks to the total number of banks. The second, the weighted average method, gives larger weights to banks with higher distress probabilities and smaller weights to those with lower probabilities, and combines them into a weighted average probability.
We compared our inferences with the 2009 report of the project commissioned by the Taiwan Financial Services Roundtable, 台灣金融危機領先指標之研究 (A Study of Leading Indicators of Financial Crises in Taiwan). Both methods follow the same trend as the occurrence of crises, but once the crisis-signal thresholds we specify are taken into account, the weighted average method clearly predicts better than the percentage method. In addition, the model predicts crises arising from individual banks markedly better than crises caused by macroeconomic shocks.
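The two aggregation methods can be contrasted with a short sketch. The abstract does not give the exact formulas, so two assumptions are made here. First, the percentage method counts a bank as distressed when its posterior distress probability exceeds a threshold (0.5 is a hypothetical choice). Second, the weighted average method uses each bank's own probability as its weight, giving sum(p_i^2) / sum(p_i), so that high-probability banks dominate. The per-bank probabilities would come from the bank-level Bayesian network; the values below are made up.

```python
def percentage_indicator(probs, threshold=0.5):
    """Share of banks whose distress probability exceeds the (assumed) threshold."""
    flagged = sum(1 for p in probs if p > threshold)
    return flagged / len(probs)


def weighted_average_indicator(probs):
    """Weighted mean of distress probabilities, assuming weight w_i = p_i.

    Banks with higher probabilities get larger weights: sum(p_i^2) / sum(p_i).
    """
    total = sum(probs)
    if total == 0:
        return 0.0  # no bank shows any distress probability
    return sum(p * p for p in probs) / total


# Hypothetical posteriors: one bank in serious distress, three sound banks.
probs = [0.9, 0.1, 0.1, 0.1]
print(percentage_indicator(probs))                   # 0.25
print(round(weighted_average_indicator(probs), 3))   # 0.7
print(round(sum(probs) / len(probs), 3))             # 0.3 (plain mean, for comparison)
```

Under these assumptions the weighted indicator reacts much more strongly than either the head count or the plain mean to a single bank with a high distress probability. A crisis signal would then be raised when the indicator crosses a chosen cut-off.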
|
3 |
以情境與行為意向分析為基礎之持續性概念重構個人化影像標籤系統 / Continuous Reconceptualization of Personalized Photograph Tagging System Based on Contextuality and Intention
李俊輝 (Unknown Date)
In the digital era, people find it hard to interpret the huge volume of their personal life records; the records must be retrieved or tagged before their underlying meaning can be understood. Focusing on the tedious and periodic general events in personal memory, this study examines two issues: (1) the semanticization of episodic memory and (2) the trade-off between common and personal knowledge. Building on the automatic photo-tagging function of a lifelong digital archiving platform, we propose the Continuous Reconceptualization Model, which is indexed by temporal-spatial information and integrates three factors, common knowledge, the person's current life situation, and personal preferences, to simulate the cognitive process of exemplar categorization that people go through when tagging each photo, and thereby eases the difficulty of annotating general events. In the experiments, we implemented a common knowledge model, a personal knowledge model, and the Continuous Reconceptualization Model, recruited nine participants to label photos, and compared the three models on the accuracy of their cognitive modeling and their tagging efficiency. The results show that the Continuous Reconceptualization Model overcomes the limitations of the common and personal knowledge models, namely the auto-tagging problems posed by revisited places, seasonal activities, non-continuous activities, and information boundaries. The study thus demonstrates automatic semantic tagging of general events in personal memory.
|
4 |
利用貝氏網路建構綜合觀念學習模型之初步研究 / An Exploration of Applying Bayesian Networks for Mapping the Learning Processes of Composite Concepts
王鈺婷, Wang, Yu-Ting (Unknown Date)
In this thesis, I use Bayesian networks to represent the relations among concepts in pedagogical domains. These comprise basic concepts and composite concepts, where a composite concept is derived from two or more basic concepts; the learning process of a composite concept is the way students integrate those basic concepts as they learn. Understanding these learning processes can help teachers and test designers see students' learning paths and revise their teaching methods or item-writing guidelines, so that they can provide adaptive course content and assessments. To uncover the latent learning processes from students' item response patterns, I propose two methods, a mutual information-based approach and a heuristic based on the chi-square test, and use the item responses of simulated students in a simulated environment to infer how they learn composite concepts.
Preliminary experiments showed that, under some particular assumptions, the proposed methods have a good chance of finding the learning processes hidden in the simulated system. I therefore went a step further and proposed a search-based strategy for finding the learning processes of larger structures more efficiently. Although the search method did not produce very satisfactory results, the experiments revealed that the uncertainty in students' item response patterns is the main challenge to the system's accuracy, and that the relations between student groups and concept competence also substantially influence the results. Although the proposed methods cannot perfectly recover how students integrate every composite concept, the experimental process and results do provide many clues about the latent learning-process structure.
Finally, I examined how students with different competence levels and learning states can be classified from their test results. I used the nearest neighbor algorithm, the k-means clustering algorithm, and variations of these two algorithms to test whether students' competence categories can be distinguished from their item responses. Experimental results showed that when each concept is assessed by multiple test items, these methods classify students' competence with good accuracy, and that the more test items the assessment uses, the higher the classification accuracy. I hope these explorations and results can contribute to adaptive learning.
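The two dependence measures behind the proposed methods can be illustrated on binary item responses. The sketch below computes empirical mutual information and the Pearson chi-square statistic between responses to an item testing a basic concept and an item testing a composite concept. A strong dependence would suggest that the basic concept feeds into the composite one. This is only an illustration of the measures, not the thesis's full structure-learning procedure. The response vectors are hypothetical, and the 3.841 cut-off is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def chi_square(xs, ys):
    """Pearson chi-square statistic of the contingency table of xs against ys."""
    n = len(xs)
    obs = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (obs[(x, y)] - expected) ** 2 / expected
    return stat


CHI2_CRIT_DF1_05 = 3.841  # critical value for a 2x2 table (1 degree of freedom), alpha = 0.05

# Hypothetical responses (1 = correct) from eight simulated students.
basic = [1, 1, 1, 1, 0, 0, 0, 0]
composite_dep = [1, 1, 1, 1, 0, 0, 0, 0]   # tracks the basic item exactly
composite_ind = [1, 0, 1, 0, 1, 0, 1, 0]   # unrelated to the basic item

for name, comp in [("dependent", composite_dep), ("independent", composite_ind)]:
    mi = mutual_information(basic, comp)
    chi2 = chi_square(basic, comp)
    print(name, round(mi, 4), round(chi2, 2), chi2 > CHI2_CRIT_DF1_05)
```

With real response data the dependence is never this clean. Guessing and slipping blur the contingency table, which is consistent with the abstract's observation that the uncertainty in item response patterns is the main obstacle to accuracy.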
|