  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Analyzing DBpedia Linked Open Data (LOD) on Spark: Movie Box Office Prediction as an Example

劉文友, Liu, Wen Yu Unknown Date
In recent years, Linked Open Data (LOD) has been recognized as holding considerable latent value. Collecting and integrating diverse LOD sources and delivering them to data analysts for extraction and analysis has become an important research challenge. LOD is published in the Resource Description Framework (RDF) format and can be queried with SPARQL. For large volumes of RDF data, however, there is still no high-performance, scalable, integrated system for storage, querying, and analysis, and research on analytics pipelines for big RDF data remains incomplete. This study takes movie box office prediction as an example: it uses the DBpedia LOD dataset, links it to an external movie database (e.g., IMDb), and performs large-scale graph analysis on the Apache Spark big data platform. Box office prediction models are first built with two algorithms, naive Bayes classification and Bayesian networks, and the Bayesian Information Criterion (BIC) is used to select the optimal Bayesian network structure. Multiclass ROC curves and AUC values are then computed to evaluate the accuracy of the prediction models.
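The abstract states that BIC was used to choose the best Bayesian network structure but does not give the scoring details. As a minimal sketch, assuming complete discrete data and maximum-likelihood parameters, the standard decomposable BIC for a discrete Bayesian network is the log-likelihood minus (log N)/2 times the number of free parameters. The variable names (`budget`, `box_office`) and the toy rows are made up for illustration and are not the thesis's features or data; the sketch also runs on a single machine rather than on Spark.

```python
import math
from collections import Counter


def bic_score(rows, structure, cardinality):
    """Decomposable BIC of a discrete Bayesian network on complete data.

    rows:        list of dicts mapping variable name -> observed state
    structure:   dict mapping each variable to a tuple of its parents
    cardinality: dict mapping each variable to its number of states
    """
    n = len(rows)
    log_likelihood = 0.0
    free_params = 0
    for var, parents in structure.items():
        # N_ijk: count of (parent configuration j, state k of var)
        joint = Counter((tuple(r[p] for p in parents), r[var]) for r in rows)
        # N_ij: count of parent configuration j
        parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
        # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij)
        for (config, _state), n_ijk in joint.items():
            log_likelihood += n_ijk * math.log(n_ijk / parent_totals[config])
        # q_i parent configurations, each with (r_i - 1) free probabilities
        q_i = math.prod(cardinality[p] for p in parents)  # empty product is 1
        free_params += q_i * (cardinality[var] - 1)
    return log_likelihood - 0.5 * math.log(n) * free_params


# Hypothetical toy data: box office mostly follows budget.
rows = (
    [{"budget": "low", "box_office": "low"}] * 9
    + [{"budget": "high", "box_office": "high"}] * 9
    + [{"budget": "low", "box_office": "high"},
       {"budget": "high", "box_office": "low"}]
)
cardinality = {"budget": 2, "box_office": 2}
no_edge = {"budget": (), "box_office": ()}
with_edge = {"budget": (), "box_office": ("budget",)}

for name, structure in [("no edge", no_edge), ("budget -> box_office", with_edge)]:
    print(f"{name}: BIC = {bic_score(rows, structure, cardinality):.3f}")
```

On these toy rows the structure with the edge `budget -> box_office` scores higher than the empty structure, because the gain in log-likelihood outweighs the penalty for the one extra parameter. A structure search would compare many candidate graphs this way and keep the highest-scoring one.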
2

Development and evaluation of an automatic text annotation system for supporting digital humanities research

劉鎮宇, Liu, Chen Yu Unknown Date
In traditional humanities research, scholars have mostly worked with texts in printed form, such as rare ancient books and historical documents. With the arrival of the information society, however, many research institutions have digitized these paper materials and built digital archive databases, greatly changing the humanities research environment and the channels through which knowledge is acquired. Text research based on digital reading has consequently become an inevitable trend. This study therefore develops an "automatic text annotation system" to support digital humanities research. Using the concept of Linked Data, the system gathers and integrates resources from different databases and annotates texts automatically, so that users can consult resources from other databases in real time while interpreting a text. It also provides a user-friendly reading interface with text annotations to help humanities scholars interpret material through reading.
Using an experimental research method, this study compares the automatic text annotation system with the MARKUS semi-automatic text annotation system to determine whether they differ significantly in reading achievement and technology acceptance when supporting humanities scholars in interpreting texts. Semi-structured in-depth interviews were also conducted to understand scholars' opinions and perceptions of the automatic text annotation system, and the study further analyzes whether reading achievement, technology acceptance, and user behavior history with the system are correlated. The experimental results show that reading achievement with the automatic text annotation system was higher than with the MARKUS semi-automatic text annotation system, but the difference was not statistically significant, whereas technology acceptance of the automatic text annotation system was significantly higher than that of MARKUS. The interviews indicate that the automatic system's reading interface is simple and clear, making it better suited to reading than MARKUS, and that the ease of use and usefulness of the reading interface are key factors in whether humanities scholars accept a system to support digital humanities research. A comparison of similar functions in the two systems also found that vocabulary lookup, linking to source websites, and adding annotations were more intuitive and easier to use in the automatic text annotation system. In addition, humanities scholars generally considered sentence segmentation more important than automatic word segmentation, and among the linked source databases, Moedict (萌典) was the most helpful.
Finally, no significant correlation was found between reading achievement and user behavior history with the automatic text annotation system.
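The abstract says the system gathers resources from several databases through Linked Data and annotates texts automatically, but it does not describe the matching step. A minimal sketch, assuming the linked resources have already been merged into an in-memory lexicon that maps each term to its (source, URI) pairs, is greedy longest-match lookup (forward maximum matching), a common baseline for unsegmented Chinese text. This is a swapped-in technique, not necessarily the thesis's method, and the lexicon entries, source names, and example.org URIs below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # index of the first character of the matched term
    end: int      # index one past the last character
    term: str
    links: tuple  # (source name, URI) pairs merged from several databases


def annotate(text, lexicon):
    """Greedy longest-match (forward maximum matching) annotation."""
    max_len = max((len(term) for term in lexicon), default=0)
    annotations = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so that 史記 wins over 史.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            i += 1  # no known term starts here; move on one character
        else:
            annotations.append(
                Annotation(i, i + len(match), match, tuple(lexicon[match]))
            )
            i += len(match)
    return annotations


# Hypothetical lexicon; source names and example.org URIs are placeholders.
lexicon = {
    "司馬遷": [("person-db", "https://example.org/person/sima-qian")],
    "史記": [("book-db", "https://example.org/work/shiji"),
             ("dictionary", "https://example.org/dict/shiji")],
    "史": [("dictionary", "https://example.org/dict/shi")],
}

for a in annotate("司馬遷著史記", lexicon):
    print(a.start, a.end, a.term, [uri for _, uri in a.links])
```

Preferring the longest match keeps 史記 as one annotated title instead of tagging 史 alone, and each annotation carries links from every source that lists the term. A fuller system would also need to resolve overlapping or ambiguous terms.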
